Re: Suspicious redundant assignment in COPY FROM

2023-09-07 Thread Jingtang Zhang
Michael Paquier wrote on Friday, September 8, 2023, at 13:42:

Thanks, Michael~


> The assignment of num_phys_attrs could be kept at the same place as
> on HEAD, a bit closer to the palloc0() where it is used.
>

Agreed with this principle. Patch is modified and attached.

--
Jingtang


v2-0001-Remove-redundant-assignment-in-copyfrom.patch
Description: Binary data


Re: Suspicious redundant assignment in COPY FROM

2023-09-07 Thread Michael Paquier
On Fri, Sep 08, 2023 at 12:23:17PM +0800, Jingtang Zhang wrote:
> Hi all, while reading the COPY FROM code I found some suspicious
> redundant assignments for the tuple descriptor and the number of attributes.
> Is this behavior intentional, or was it accidentally introduced by the
> refactoring in c532d15? Patch is attached.

This looks like a copy-pasto to me, as the tuple descriptor coming
from the relation is just used for sanity checks on the attributes,
depending on the options given by the caller of COPY.

The assignment of num_phys_attrs could be kept at the same place as
on HEAD, a bit closer to the palloc0() where it is used.
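
For illustration, a minimal sketch of that arrangement (hedged: these lines
are illustrative only, not a quote of copyfrom.c, and the exact types may
differ):

/* Hedged sketch only: keep the assignment next to the allocations using it. */
num_phys_attrs = tupDesc->natts;
defmap = (int *) palloc0(num_phys_attrs * sizeof(int));
defexprs = (ExprState **) palloc0(num_phys_attrs * sizeof(ExprState *));
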
--
Michael


signature.asc
Description: PGP signature


Re: Eager page freeze criteria clarification

2023-09-07 Thread Andres Freund
Hi,

On 2023-09-07 21:45:22 -0700, Andres Freund wrote:
> In contrast to that, freezing will almost always trigger an FPI (except for
> empty pages, but we imo ought to stop setting empty pages all frozen [1]).
> 
> 
> Yep, a quick experiment confirms that:
> 
> DROP TABLE IF EXISTS foo;
> CREATE TABLE foo AS SELECT generate_series(1, 1000);
> CHECKPOINT;
> VACUUM (VERBOSE) foo;
> 
> checksums off: WAL usage: 44249 records, 3 full page images, 2632091 bytes
> checksums on: WAL usage: 132748 records, 44253 full page images, 388758161 
> bytes
> 
> 
> I initially was confused by the 3x wal records - I was expecting 2x. The
> reason is that with checksums on, we emit an FPI during the visibility check,
> which then triggers the current heuristic for opportunistic freezing. The
> saving grace is that WAL volume is completely dominated by the FPIs:
> 
> Type                           N      (%)  Record size      (%)    FPI size      (%)  Combined size      (%)
> ----                           -      ---  -----------      ---    --------      ---  -------------      ---
> XLOG/FPI_FOR_HINT          44253 ( 33.34)      2168397 (  7.84)   361094232 (100.00)      363262629 ( 93.44)
> Transaction/INVALIDATION       1 (  0.00)           78 (  0.00)           0 (  0.00)             78 (  0.00)
> Standby/INVALIDATIONS          1 (  0.00)           90 (  0.00)           0 (  0.00)             90 (  0.00)
> Heap2/FREEZE_PAGE          44248 ( 33.33)     22876120 ( 82.72)           0 (  0.00)       22876120 (  5.88)
> Heap2/VISIBLE              44248 ( 33.33)      2610642 (  9.44)       16384 (  0.00)        2627026 (  0.68)
> Heap/INPLACE                   1 (  0.00)          188 (  0.00)           0 (  0.00)            188 (  0.00)
>                         --------            ---------             ----------            -------------
> Total                     132752             27655515 [7.11%]     361110616 [92.89%]       388766131 [100%]
> 
> In realistic tables, where rows are wider than a single int, FPI_FOR_HINT
> dominates even further, as the FREEZE_PAGE would be smaller if there weren't
> 226 tuples on each page...

The above is not a great demonstration of the overhead of setting all-visible,
as the FPIs are triggered via FPI_FOR_HINTs, independently of setting
all-visible. Adding "SELECT count(*) FROM foo" before the checkpoint sets the
hint bits earlier and results in:

checksum off:

WAL usage: 44249 records, 3 full page images, 2627915 bytes

Type                           N      (%)  Record size      (%)  FPI size      (%)  Combined size      (%)
----                           -      ---  -----------      ---  --------      ---  -------------      ---
Transaction/INVALIDATION       1 (  0.00)           78 (  0.00)         0 (  0.00)             78 (  0.00)
Standby/INVALIDATIONS          1 (  0.00)           90 (  0.00)         0 (  0.00)             90 (  0.00)
Heap2/VISIBLE              44248 ( 99.99)      2610642 ( 99.99)     16384 ( 95.15)        2627026 ( 99.96)
Heap/INPLACE                   1 (  0.00)           53 (  0.00)       836 (  4.85)            889 (  0.03)
                        --------            ---------            --------            -------------
Total                      44251              2610863 [99.34%]     17220 [0.66%]         2628083 [100%]

checksums on:

WAL usage: 44252 records, 44254 full page images, 363935830 bytes

Type                           N      (%)  Record size      (%)  FPI size      (%)  Combined size      (%)
----                           -      ---  -----------      ---  --------      ---  -------------      ---
XLOG/FPI_FOR_HINT              3 (  0.01)          147 (  0.01)     24576 (  0.01)          24723 (  0.01)
Transaction/INVALIDATION       1 (  0.00)           78 (  0.00)         0 (  0.00)             78 (  0.00)
Standby/INVALIDATIONS          1 (  0.00)           90 (  0.00)         0 (  0.00)             90 (  0.00)
Heap2/VISIBLE              44248 ( 99.99)      2831882 ( 99.99)

Re: Eager page freeze criteria clarification

2023-09-07 Thread Peter Geoghegan
On Thu, Sep 7, 2023 at 9:45 PM Andres Freund  wrote:
> I.e. setting an, otherwise unmodified, page all-visible won't trigger an FPI
> if checksums are disabled, but will FPI with checksums enabled. I think that's
> a substantial difference in WAL volume for insert-only workloads...

Note that all RDS Postgres users get page-level checksums. Overall,
the FPI trigger mechanism is going to noticeably improve performance
characteristics for many users. Simple and crude though it is.

> Type                           N      (%)  Record size      (%)    FPI size      (%)  Combined size      (%)
> ----                           -      ---  -----------      ---    --------      ---  -------------      ---
> XLOG/FPI_FOR_HINT          44253 ( 33.34)      2168397 (  7.84)   361094232 (100.00)      363262629 ( 93.44)
> Transaction/INVALIDATION       1 (  0.00)           78 (  0.00)           0 (  0.00)             78 (  0.00)
> Standby/INVALIDATIONS          1 (  0.00)           90 (  0.00)           0 (  0.00)             90 (  0.00)
> Heap2/FREEZE_PAGE          44248 ( 33.33)     22876120 ( 82.72)           0 (  0.00)       22876120 (  5.88)
> Heap2/VISIBLE              44248 ( 33.33)      2610642 (  9.44)       16384 (  0.00)        2627026 (  0.68)
> Heap/INPLACE                   1 (  0.00)          188 (  0.00)           0 (  0.00)            188 (  0.00)
>                         --------            ---------             ----------            -------------
> Total                     132752             27655515 [7.11%]     361110616 [92.89%]       388766131 [100%]
>
> In realistic tables, where rows are wider than a single int, FPI_FOR_HINT
> dominates even further, as the FREEZE_PAGE would be smaller if there weren't
> 226 tuples on each page...

If FREEZE_PAGE WAL volume is really what holds back further high level
improvements in this area, then it can be worked on directly -- it's
not a fixed cost. It wouldn't be particularly difficult, in fact.
These are records that still mostly consist of long runs of contiguous
page offset numbers. They're ideally suited for compression using some
kind of simple variant of run length encoding. The freeze plan
deduplication stuff in 16 made a big difference, but it's still not
very hard to improve matters here.
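
As a rough illustration of the kind of encoding being suggested (a hedged
sketch only, not the WAL format or any existing patch), runs of contiguous
offsets can be collapsed into (start, length) pairs:

/*
 * Hedged sketch: run-length encode a sorted array of page offset numbers as
 * (start, length) pairs.  Long runs of contiguous offsets, as produced when
 * most tuples on a page are frozen, collapse to a handful of pairs.
 */
static int
encode_offset_runs(const OffsetNumber *offsets, int noffsets,
				   OffsetNumber *starts, OffsetNumber *lengths)
{
	int			nruns = 0;

	for (int i = 0; i < noffsets; i++)
	{
		if (nruns > 0 &&
			offsets[i] == starts[nruns - 1] + lengths[nruns - 1])
			lengths[nruns - 1]++;	/* extends the current run */
		else
		{
			starts[nruns] = offsets[i];
			lengths[nruns] = 1;
			nruns++;
		}
	}
	return nruns;
}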

-- 
Peter Geoghegan




Re: Eager page freeze criteria clarification

2023-09-07 Thread Andres Freund
Hi,

On 2023-09-06 16:21:31 -0400, Robert Haas wrote:
> On Wed, Sep 6, 2023 at 12:20 PM Peter Geoghegan  wrote:
> > If VACUUM freezes too aggressively, then (pretty much by definition)
> > we can be sure that the next VACUUM will scan the same pages -- there
> > may be some scope for VACUUM to "learn from its mistake" when we err
> > in the direction of over-freezing. But when VACUUM makes the opposite
> > mistake (doesn't freeze when it should have), it won't scan those same
> > pages again for a long time, by design. It therefore has no plausible
> > way of "learning from its mistakes" before it becomes an extremely
> > expensive and painful lesson (which happens whenever the next
> > aggressive VACUUM takes place). This is in large part a consequence of
> > the way that VACUUM dutifully sets pages all-visible whenever
> > possible. That behavior interacts badly with many workloads, over
> > time.
>
> I think this is an insightful commentary with which I partially agree.
> As I see it, the difference is that when you make the mistake of
> marking something all-visible or freezing it too aggressively, you
> incur a price that you pay almost immediately. When you make the
> mistake of not marking something all-visible when it would have been
> best to do so, you incur a price that you pay later, when the next
> VACUUM happens. When you make the mistake of not marking something
> all-frozen when it would have been best to do so, you incur a price
> that you pay even later, not at the next VACUUM but at some VACUUM
> further off. So there are different trade-offs. When you pay the price
> for a mistake immediately or nearly immediately, it can potentially
> harm the performance of the foreground workload, if you're making a
> lot of mistakes.

We have to make a *lot* of mistakes to badly harm the foreground workload. As
long as we don't constantly trigger FPIs, the worst case effects of freezing
unnecessarily aren't that large, particularly if we keep the number of
XLogInsert()s constant (via the combining of records that Melanie is working
on).  Once/if we want to opportunistically freeze when it *does* trigger an
FPI, we *do* need to be more certain that it's not pointless work.

We could further bound the worst case overhead by having a range
representation for freeze plans. That should be quite doable: we could add
another flag to xl_heap_freeze_plan.frzflags indicating that the entries in
the offset array are start/end tids rather than individual tids.
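
A minimal sketch of what decoding such a representation could look like
(XLH_FREEZE_RANGE and the pairwise interpretation of the offset array are
hypothetical, not part of any existing WAL format):

#define XLH_FREEZE_RANGE	0x04	/* hypothetical frzflags bit */

/*
 * Hedged sketch: expand a freeze plan's offset array.  With the hypothetical
 * flag set, the array holds ntuples/2 (start, end) pairs; otherwise it holds
 * ntuples individual offsets, as today.  Returns the number of offsets
 * written to "out".
 */
static int
expand_freeze_offsets(const xl_heap_freeze_plan *plan,
					  const OffsetNumber *in, OffsetNumber *out)
{
	int			n = 0;

	if (plan->frzflags & XLH_FREEZE_RANGE)
	{
		for (int i = 0; i < plan->ntuples; i += 2)
			for (OffsetNumber off = in[i]; off <= in[i + 1]; off++)
				out[n++] = off;
	}
	else
	{
		for (int i = 0; i < plan->ntuples; i++)
			out[n++] = in[i];
	}
	return n;
}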


> That sucks. On the other hand, when you defer paying
> the price until some later bulk operation, the costs of all of your
> mistakes get added up and then you pay the whole price all at once,
> which means you can be suddenly slapped with an enormous bill that you
> weren't expecting. That sucks, too, just in a different way.

It's particularly bad in the case of freezing because there's practically no
backpressure against deferring more work than the system can handle. If we had
made foreground processes freeze one page for every unfrozen page they create,
once the table reaches a certain percentage of old unfrozen pages, it'd be a
different story...


I think it's important that we prevent exploding the WAL volume due to
opportunistic freezing the same page over and over, but as long as we take
care to not do that, I think the impact on the foreground is going to be
small.

I'm sure that we can come up with cases where it's noticeable, e.g. because
the system is already completely bottlenecked by WAL IO throughput and a small
increase in WAL volume is going to push things over the edge. But such systems
are going to be in substantial trouble at the next anti-wraparound vacuum and
will be out of commission for days once they hit the anti-wraparound shutdown
limits.


Some backpressure in the form of a small performance decrease for foreground
work might even be good there.



> > VACUUM simply ignores such second-order effects. Perhaps it would be
> > practical to address some of the issues in this area by avoiding
> > setting pages all-visible without freezing them, in some general
> > sense. That at least creates a kind of symmetry between mistakes in
> > the direction of under-freezing and mistakes in the direction of
> > over-freezing. That might enable VACUUM to course-correct in either
> > direction.
> >
> > Melanie is already planning on combining the WAL records (PRUNE,
> > FREEZE_PAGE, and VISIBLE). Perhaps that'll weaken the argument for
> > setting unfrozen pages all-visible even further.
>
> Yeah, so I think the question here is whether it's ever a good idea to
> mark a page all-visible without also freezing it. If it's not, then we
> should either mark fewer pages all-visible, or freeze more of them.
> Maybe I'm all wet here, but I think it depends on the situation. If a
> page is already dirty and has had an FPI since the last checkpoint,
> then it's pretty appealing to freeze whenever we mark all-visible. We
> still have to consider 

Re: Correct the documentation for work_mem

2023-09-07 Thread David Rowley
On Fri, 8 Sept 2023 at 15:24, Bruce Momjian  wrote:
> Adjusted patch attached.

This looks mostly fine to me modulo "sort or hash".  I do see many
instances of "and/or" in the docs. Maybe that would work better.

David




Re: old_snapshot_threshold bottleneck on replica

2023-09-07 Thread Thomas Munro
On Fri, Sep 8, 2023 at 4:22 PM Peter Geoghegan  wrote:
> This looks right to me.

Thanks, pushed.




Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

2023-09-07 Thread Kyotaro Horiguchi
At Fri, 08 Sep 2023 14:17:16 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> Ditching cmd.exe seems like a big hassle. So, on the flip side, I
> tried to identify the postmaster PID using the shell's PID, and it
> seems to work. The APIs used are available from XP/2003 onwards.

Cleaned it up a bit.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 807d7023a9..25d3354dc6 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -32,6 +32,9 @@
 #ifdef WIN32	/* on Unix, we don't need libpq */
 #include "pqexpbuffer.h"
 #endif
+#ifdef WIN32
+#include <tlhelp32.h>
+#endif
 
 
 typedef enum
@@ -425,6 +428,90 @@ free_readfile(char **optlines)
  * start/test/stop routines
  */
 
+#ifdef WIN32
+/*
+ * Find the PID of the launched postmaster.
+ *
+ * On Windows, the cmd.exe doesn't support the exec command. As a result, we
+ * don't directly get the postmaster's PID. This function identifies the PID of
+ * the postmaster started by the child cmd.exe.
+ *
+ * Returns the postmaster's PID. If the shell is alive but the postmaster is
+ * missing, returns 0. Otherwise terminates this command with an error.
+ */
+DWORD
+win32_find_postmaster_pid(int shell_pid)
+{
+	HANDLE hSnapshot;
+	DWORD flags;
+	DWORD p_pid = shell_pid;
+	PROCESSENTRY32 ppe;
+	DWORD pm_pid = 0;
+	bool shell_exists = false;
+	bool multiple_children = false;
+	DWORD last_error;
+
+	/* Create a process snapshot */
+	hSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, shell_pid);
+	if (hSnapshot == INVALID_HANDLE_VALUE)
+	{
+		write_stderr(_("%s: CreateToolhelp32Snapshot failed\n"),
+	 progname);
+		exit(1);
+	}
+
+	/* start iterating on the snapshot */
+	ppe.dwSize = sizeof(PROCESSENTRY32);
+	if (!Process32First(hSnapshot, &ppe))
+	{
+		write_stderr(_("%s: Process32First failed: errcode=%08x\n"),
+	 progname, GetLastError());
+		exit(1);
+	}
+
+	/* iterate over the snapshot */
+	do
+	{
+		if (ppe.th32ProcessID == shell_pid)
+			shell_exists = true;
+		if (ppe.th32ParentProcessID == shell_pid)
+		{
+			if (pm_pid > 0)
+				multiple_children = true;
+			pm_pid = ppe.th32ProcessID;
+		}
+	}
+	while (Process32Next(hSnapshot, &ppe));
+
+	/* avoid multiple calls, primarily for clarity, not out of necessity */
+	last_error = GetLastError();
+	if (last_error != ERROR_NO_MORE_FILES)
+	{
+		write_stderr(_("%s: Process32Next failed: errcode=%08x\n"),
+	 progname, last_error);
+		exit(1);
+	}
+	CloseHandle(hSnapshot);
+
+	/* assuming the launching shell executes a single process */
+	if (multiple_children)
+	{
+		write_stderr(_("%s: launcher shell executed multiple processes\n"),
+	 progname);
+		exit(1);
+	}
+
+	/* check if the process is still alive */
+	if (!shell_exists)
+	{
+		write_stderr(_("%s: launcher shell died\n"), progname);
+		exit(1);
+	}
+
+	return pm_pid;
+}
+#endif
+
 /*
  * Start the postmaster and return its PID.
  *
@@ -609,7 +696,11 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 			/* File is complete enough for us, parse it */
 			pid_t		pmpid;
 			time_t		pmstart;
-
+#ifndef WIN32
+			pid_t		wait_pid = pm_pid;
+#else
+			pid_t		wait_pid = win32_find_postmaster_pid(pm_pid);
+#endif
 			/*
 			 * Make sanity checks.  If it's for the wrong PID, or the recorded
 			 * start time is before pg_ctl started, then either we are looking
@@ -619,14 +710,8 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 			 */
 			pmpid = atol(optlines[LOCK_FILE_LINE_PID - 1]);
 			pmstart = atol(optlines[LOCK_FILE_LINE_START_TIME - 1]);
-			if (pmstart >= start_time - 2 &&
-#ifndef WIN32
-pmpid == pm_pid
-#else
-			/* Windows can only reject standalone-backend PIDs */
-pmpid > 0
-#endif
-)
+
+			if (pmstart >= start_time - 2 && pmpid == wait_pid)
 			{
 /*
  * OK, seems to be a valid pidfile from our child.  Check the


Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

2023-09-07 Thread Kyotaro Horiguchi
At Thu, 7 Sep 2023 10:53:41 +, "Hayato Kuroda (Fujitsu)" 
 wrote in 
> My first idea was to move the checking part above, but this may not handle
> the case where the postmaster is still alive (not sure this is a real
> issue). Do we have to add a new indicator which ensures the identity of
> processes for Windows?
> Please tell me how you feel.

It doesn't seem to work as expected. We still lose the relationship
between the PID file and the launched postmaster.

> > Now, I
> > also recall that the processes spawned by pg_ctl on Windows make the
> > status handling rather tricky to reason about..
> 
> Did you say about the below comment? Currently I have no idea to make
> codes more proper, sorry.
> 
> ```
>* On Windows, we may be checking the postmaster's parent 
> shell, but
>* that's fine for this purpose.
> ```

Ditching cmd.exe seems like a big hassle. So, on the flip side, I
tried to identify the postmaster PID using the shell's PID, and it
seems to work. The APIs used are available from XP/2003 onwards.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 807d7023a9..d54cc6ab54 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -32,6 +32,9 @@
 #ifdef WIN32	/* on Unix, we don't need libpq */
 #include "pqexpbuffer.h"
 #endif
+#ifdef WIN32
+#include <tlhelp32.h>
+#endif
 
 
 typedef enum
@@ -425,6 +428,91 @@ free_readfile(char **optlines)
  * start/test/stop routines
  */
 
+#ifdef WIN32
+/*
+ * Find the PID of the launched postmaster.
+ *
+ * On Windows, the cmd.exe doesn't support the exec command. As a result, we
+ * don't directly get the postmaster's PID. This function identifies the PID of
+ * the postmaster started by the child cmd.exe.
+ *
+ * Returns the postmaster's PID. If the shell is alive but the postmaster is
+ * missing, returns 0. Otherwise terminates this command with an error.
+ */
+DWORD
+win32_find_postmaster_pid(int shell_pid)
+{
+	HANDLE hSnapshot;
+	DWORD flags;
+	DWORD p_pid = shell_pid;
+	PROCESSENTRY32 ppe;
+	DWORD pm_pid = 0;
+	bool shell_exists = false;
+	bool multiple_children = false;
+	DWORD last_error;
+
+	/* Create a process snapshot */
+	hSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, shell_pid);
+	if (hSnapshot == INVALID_HANDLE_VALUE)
+	{
+		write_stderr(_("%s: CreateToolhelp32Snapshot failed\n"),
+	 progname);
+		exit(1);
+	}
+
+	/* start iterating on the snapshot */
+	ppe.dwSize = sizeof(PROCESSENTRY32);
+	if (!Process32First(hSnapshot, &ppe))
+	{
+		write_stderr(_("%s: Process32First failed: errcode=%08x\n"),
+	 progname, GetLastError());
+		exit(1);
+	}
+
+	/* iterate over the snapshot */
+	do
+	{
+		if (ppe.th32ProcessID == shell_pid)
+			shell_exists = true;
+		if (ppe.th32ParentProcessID == shell_pid)
+		{
+			if (pm_pid > 0)
+				multiple_children = true;
+			pm_pid = ppe.th32ProcessID;
+		}
+	}
+	while (Process32Next(hSnapshot, &ppe));
+
+	/* avoid multiple calls, primarily for clarity, not out of necessity */
+	last_error = GetLastError();
+	if (last_error != ERROR_NO_MORE_FILES)
+	{
+		write_stderr(_("%s: Process32Next failed: errcode=%08x\n"),
+	 progname, last_error);
+		exit(1);
+	}
+	CloseHandle(hSnapshot);
+
+	/* assuming the launching shell executes a single process */
+	if (multiple_children)
+	{
+		write_stderr(_("%s: launcher shell executed multiple processes\n"),
+	 progname);
+		exit(1);
+	}
+
+	/* Check if the process is gone for any reason */
+	if (!shell_exists)
+	{
+		/* Report the condition as server start failure */
+		write_stderr(_("%s: launcher shell died\n"), progname);
+		exit(1);
+	}
+
+	return pm_pid;
+}
+#endif
+
 /*
  * Start the postmaster and return its PID.
  *
@@ -609,7 +697,11 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 			/* File is complete enough for us, parse it */
 			pid_t		pmpid;
 			time_t		pmstart;
-
+#ifndef WIN32
+			pid_t		wait_pid = pm_pid;
+#else
+			pid_t		wait_pid = win32_find_postmaster_pid(pm_pid);
+#endif
 			/*
 			 * Make sanity checks.  If it's for the wrong PID, or the recorded
 			 * start time is before pg_ctl started, then either we are looking
@@ -619,21 +711,15 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 			 */
 			pmpid = atol(optlines[LOCK_FILE_LINE_PID - 1]);
 			pmstart = atol(optlines[LOCK_FILE_LINE_START_TIME - 1]);
-			if (pmstart >= start_time - 2 &&
-#ifndef WIN32
-pmpid == pm_pid
-#else
-			/* Windows can only reject standalone-backend PIDs */
-pmpid > 0
-#endif
-)
+
+			if (pmstart >= start_time - 2 && pmpid == wait_pid)
 			{
 /*
  * OK, seems to be a valid pidfile from our child.  Check the
  * status line (this assumes a v10 or later server).
  */
 char	   *pmstatus = optlines[LOCK_FILE_LINE_PM_STATUS - 1];
-
+
 if (strcmp(pmstatus, PM_STATUS_READY) == 0 ||
 	strcmp(pmstatus, PM_STATUS_STANDBY) == 0)
 {


Re: Optimize planner memory consumption for huge arrays

2023-09-07 Thread Lepikhov Andrei


On Wed, Sep 6, 2023, at 8:09 PM, Ashutosh Bapat wrote:
> Hi Lepikhov,
>
> Thanks for using my patch and I am glad that you found it useful.
>
> On Mon, Sep 4, 2023 at 10:56 AM Lepikhov Andrei
>  wrote:
>>
>> Hi, hackers,
>>
>> Looking at the planner behaviour with the memory consumption patch [1], I 
>> figured out that arrays increase memory consumption by the optimizer 
>> significantly. See init.sql in attachment.
>> The point here is that the planner does small memory allocations for each 
>> element during estimation. As a result, it looks like the planner consumes 
>> about 250 bytes for each integer element.
>
> I guess the numbers you mentioned in init.sql are total memory used by
> the planner (as reported by the patch in the thread) when planning
> that query and not memory consumed by Const nodes themselves. Am I
> right? I think the measurements need to be explained better and also
> the realistic scenario you are trying to optimize.

Yes, it is the total memory consumed by the planner - I used the numbers
generated by your patch [1]. I kept increasing the number of elements in the
array to factor out the memory the planner consumes for other purposes. As you
can see, an array with 1 element consumes 12 kB of memory, and one with 1E4
elements consumes 2.6 MB. All of that increment comes solely from the
enlargement of the array: (2600 kB - 12 kB) / 1E4 ≈ 260 bytes. So I conclude
that each 4-byte element causes about 260 bytes of memory consumption.
This scenario came from a user complaint: they had strict restrictions on
memory usage and were stuck in this unusual memory-usage case.

> I guess, the reason you think that partitioning will increase the
> memory consumed is because each partition will have the clause
> translated for it. Selectivity estimation for each partition will
> create those many Const nodes and hence consume memory. Am I right?

Yes.

> Can you please measure the memory consumed with and without your
> patch.

Done. See the test case and results in 'init_parts.sql' in the attachment. A
short summary is below. I varied the number of elements from 1 to 1E4 and the
number of partitions from 1 to 100. As you can see, partitioning adds a lot of
memory consumption by itself, but the effect of the patch is still visible.

master:
elems       1       1E1     1E2     1E3     1E4
parts
1           28kB    50kB    0.3MB   2.5MB   25MB
10          45kB    143kB   0.6MB   4.8MB   47MB
100         208kB   125kB   3.3MB   27MB    274MB

patched:
elems       1       1E1     1E2     1E3     1E4
parts
1           28kB    48kB    0.25MB  2.2MB   22.8MB
10          44kB    100kB   313kB   2.4MB   23.7MB
100         208kB   101kB   0.9MB   3.7MB   32.4MB

Just for comparison, without partitioning:
elems       1       1E1     1E2     1E3     1E4
master:     12kB    14kB    37kB    266kB   2.5MB
patched:    12kB    11.5kB  13kB    24kB    141kB

>> It is maybe not a problem most of the time. However, in the case of 
>> partitions, memory consumption multiplies by each partition. Such a corner 
>> case looks weird, but the fix is simple. So, why not?
>
> With vectorized operations becoming the norm these days, it's possible
> to have thousands of elements in the array of an ANY or IN clause. It
> will also be common to have thousands of partitions. But I think what
> we need to do here is to write a selectivity estimation function which
> takes a const array and returns a selectivity without requiring a
> Const node to be created for each element.

Maybe you're right. Could you show some examples of vectorized usage of
Postgres so I can understand your idea more clearly?
Here I propose only a quick, simple solution; I don't think it would change
the direction of development.
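
For what it's worth, a minimal sketch of the kind of fix being discussed (not
necessarily what the attached diff does; elemtype, elem_values, and so on
stand in for what scalararraysel() already has at hand):

/*
 * Hedged sketch: build one Const and overwrite its value per element instead
 * of allocating a fresh Const node for every array element.
 */
Const	   *c = makeConst(elemtype, -1, collation, elemlen,
						  (Datum) 0, true, elembyval);

for (int i = 0; i < num_elems; i++)
{
	c->constvalue = elem_values[i];
	c->constisnull = elem_nulls[i];
	/* ... run the per-element selectivity estimate using 'c' ... */
}
/* one allocation in total, instead of one per element */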

>> The diff in the attachment is a proof of concept showing how to reduce the
>> waste of memory. Having benchmarked a bit, I didn't find any overhead.
>>
>
> You might want to include your benchmarking results as well.

There is nothing interesting here: pgbench TPS and planning time for the
cases above don't change.

[1] Report planning memory in EXPLAIN ANALYZE

-- 
Regards,
Andrei Lepikhov

init_parts.sql
Description: application/sql


Re: Eager page freeze criteria clarification

2023-09-07 Thread Andres Freund
Hi,

On 2023-09-06 10:46:03 -0400, Robert Haas wrote:
> On Fri, Sep 1, 2023 at 9:07 PM Peter Geoghegan  wrote:
> > Why not also avoid setting pages all-visible? The WAL records aren't
> > too much smaller than most freeze records these days -- 64 bytes on
> > most systems. I realize that the rules for FPIs are a bit different
> > when page-level checksums aren't enabled, but fundamentally it's the
> > same situation. No?
>
> It's an interesting point. AFAIK, whether or not page-level checksums
> are enabled doesn't really matter here.

I think it does matter:

void
visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
  uint8 flags)
...
recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, 
flags);

/*
 * If data checksums are enabled (or wal_log_hints=on), we
 * need to protect the heap page from being torn.
 *
 * If not, then we must *not* update the heap page's LSN. In
 * this case, the FPI for the heap page was omitted from the
 * WAL record inserted above, so it would be incorrect to
 * update the heap page's LSN.
 */
if (XLogHintBitIsNeeded())
{
PageheapPage = BufferGetPage(heapBuf);

PageSetLSN(heapPage, recptr);
}

and

/*
 * Perform XLogInsert for a heap-visible operation.  'block' is the block
 * being marked all-visible, and vm_buffer is the buffer containing the
 * corresponding visibility map block.  Both should have already been modified
 * and dirtied.
 *
 * snapshotConflictHorizon comes from the largest xmin on the page being
 * marked all-visible.  REDO routine uses it to generate recovery conflicts.
 *
 * If checksums or wal_log_hints are enabled, we may also generate a full-page
 * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
 * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
 * update the heap page's LSN.
 */
XLogRecPtr
log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
 TransactionId snapshotConflictHorizon, uint8 
vmflags)
{

I.e. setting an, otherwise unmodified, page all-visible won't trigger an FPI
if checksums are disabled, but will FPI with checksums enabled. I think that's
a substantial difference in WAL volume for insert-only workloads...

In contrast to that, freezing will almost always trigger an FPI (except for
empty pages, but we imo ought to stop setting empty pages all frozen [1]).


Yep, a quick experiment confirms that:

DROP TABLE IF EXISTS foo;
CREATE TABLE foo AS SELECT generate_series(1, 1000);
CHECKPOINT;
VACUUM (VERBOSE) foo;

checksums off: WAL usage: 44249 records, 3 full page images, 2632091 bytes
checksums on: WAL usage: 132748 records, 44253 full page images, 388758161 bytes


I initially was confused by the 3x wal records - I was expecting 2x. The
reason is that with checksums on, we emit an FPI during the visibility check,
which then triggers the current heuristic for opportunistic freezing. The
saving grace is that WAL volume is completely dominated by the FPIs:

Type                           N      (%)  Record size      (%)    FPI size      (%)  Combined size      (%)
----                           -      ---  -----------      ---    --------      ---  -------------      ---
XLOG/FPI_FOR_HINT          44253 ( 33.34)      2168397 (  7.84)   361094232 (100.00)      363262629 ( 93.44)
Transaction/INVALIDATION       1 (  0.00)           78 (  0.00)           0 (  0.00)             78 (  0.00)
Standby/INVALIDATIONS          1 (  0.00)           90 (  0.00)           0 (  0.00)             90 (  0.00)
Heap2/FREEZE_PAGE          44248 ( 33.33)     22876120 ( 82.72)           0 (  0.00)       22876120 (  5.88)
Heap2/VISIBLE              44248 ( 33.33)      2610642 (  9.44)       16384 (  0.00)        2627026 (  0.68)
Heap/INPLACE                   1 (  0.00)          188 (  0.00)           0 (  0.00)            188 (  0.00)
                        --------            ---------             ----------            -------------
Total                     132752             27655515 [7.11%]     361110616 [92.89%]       388766131 [100%]

In realistic tables, where rows are wider than a single int, FPI_FOR_HINT
dominates even further, as the FREEZE_PAGE would be smaller if there weren't
226 tuples on each 

Re: Impact of checkpointer during pg_upgrade

2023-09-07 Thread Michael Paquier
On Fri, Sep 08, 2023 at 09:14:59AM +0530, Amit Kapila wrote:
> On Fri, Sep 8, 2023 at 9:00 AM Zhijie Hou (Fujitsu)
>  wrote:
>>>  I
>>> mean that doing the latter is beneficial for the sake of any patch
>>> committed and
>>> as a long-term method to rely on.
> 
> What is your worry here? Are you worried that unknowingly in the
> future we could add some other way to invalidate slots during upgrades
> that we won't be able to detect?

Exactly.  A safety belt would not hurt, especially if the belt added
is simple.  The idea of a backend side elog(ERROR) with
isBinaryUpgrade is tempting in the invalidation slot path.
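
A sketch of the kind of check being discussed (placement and message are
illustrative only, e.g. somewhere in InvalidatePossiblyObsoleteSlot(), with
"s" being the slot under consideration):

/* Hedged sketch: slot invalidation is never expected in binary-upgrade mode. */
if (IsBinaryUpgrade)
	elog(ERROR, "replication slot \"%s\" invalidated during binary upgrade",
		 NameStr(s->data.name));
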
--
Michael


signature.asc
Description: PGP signature


Re: persist logical slots to disk during shutdown checkpoint

2023-09-07 Thread Michael Paquier
On Fri, Sep 08, 2023 at 09:04:43AM +0530, Amit Kapila wrote:
> I have added something on these lines and also changed the other
> comment pointed out by you. In the passing, I made minor cosmetic
> changes as well.

+ * We can flush dirty replication slots at regular intervals by any
+ * background process like bgwriter but checkpoint is a convenient location.

I don't see a need to refer to the bgwriter here.  On the contrary, it
can be confusing as one could think that this flush happens in the
bgwriter, but that's not the case currently as only the checkpointer
does that.

+* We won't ensure that the slot is persisted after the

s/persisted/flushed/?  Or just refer to the "slot's data being
flushed", or refer to "the slot's data is made durable" instead?  The
use of "persist" here is confusing, because a slot's persistence
refers to it as being a *candidate* for flush (compared to an
ephemeral slot), and it does not refer to the *fact* of flushing its
data to make sure that it survives a crash.  In the context of this
patch, the LSN value tracked in the slot's in-memory data refers to
the last point where the slot's data has been flushed.

+/*
+ * This is used to track the last persisted confirmed_flush LSN value to
+ * detect any divergence in the in-memory and on-disk values for the same.
+ */

"This value tracks is the last LSN where this slot's data has been
flushed to disk.  This is used during a checkpoint shutdown to decide
if a logical slot's data should be forcibly flushed or not."

Hmm.  WAL senders are shut down *after* the checkpointer and *after*
the shutdown checkpoint is finished (see PM_SHUTDOWN and
PM_SHUTDOWN_2) because we want the WAL senders to acknowledge the
checkpoint record before shutting down the primary.  In order to limit
the number of records to work on after a restart, what this patch is
proposing is an improvement.  Perhaps it would be better to document
that we don't care about the potential concurrent activity of logical
WAL senders in this case and that the LSN we are saving at is a best
effort, meaning that last_saved_confirmed_flush is just here to reduce
the damage on a follow-up restart?  The comment added in
CheckPointReplicationSlots() goes in this direction, but perhaps this
potential concurrent activity should be mentioned?
--
Michael


signature.asc
Description: PGP signature


Suspicious redundant assignment in COPY FROM

2023-09-07 Thread Jingtang Zhang
Hi all, while reading the COPY FROM code I found some suspicious
redundant assignments for the tuple descriptor and the number of attributes.
Is this behavior intentional, or was it accidentally introduced by the
refactoring in c532d15? Patch is attached.


0001-Remove-redundant-assignment-in-copyfrom.patch
Description: Binary data


Re: old_snapshot_threshold bottleneck on replica

2023-09-07 Thread Peter Geoghegan
On Thu, Sep 7, 2023 at 8:49 PM Thomas Munro  wrote:
> The code moved around quite a few times over several commits and quite
> a lot since then, which is why I didn't go for straight revert, but
> clearly the manual approach risked missing things.

It's not a big deal, obviously.

> I think the
> attached removes all unused 'snapshot' arguments from AM-internal
> functions.  Checked by compiling with clang's -Wunused-parameters, and
> then searching for 'snapshot', and excluding the expected cases.

This looks right to me.

Thanks
-- 
Peter Geoghegan




Re: Document that server will start even if it's unable to open some TCP/IP ports

2023-09-07 Thread Nathan Bossart
On Thu, Sep 07, 2023 at 07:13:44PM -0400, Bruce Momjian wrote:
> On Thu, Sep  7, 2023 at 02:54:13PM -0700, Nathan Bossart wrote:
>> IMO the phrase "open a port" is kind of nonstandard.  I think we should say
>> something along the lines of
>> 
>>  If listen_addresses is not empty, the server will start only if it can
>>  listen on at least one of the specified addresses.  A warning will be
>>  emitted for any addresses that the server cannot listen on.
> 
> Good idea, updated patch attached.

I still think we should say "listen on an address" instead of "open a
port," but otherwise it LGTM.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Eager page freeze criteria clarification

2023-09-07 Thread Andres Freund
Hi,

On 2023-09-06 10:35:17 -0400, Robert Haas wrote:
> On Wed, Sep 6, 2023 at 1:09 AM Andres Freund  wrote:
> > Yea, it'd be useful to have a reasonably approximate wall clock time for the
> > last modification of a page. We just don't have infrastructure for 
> > determining
> > that. We'd need an LSN->time mapping (xid->time wouldn't be particularly
> > useful, I think).
> >
> > A very rough approximate modification time can be computed by assuming an 
> > even
> > rate of WAL generation, and using the LSN at the time of the last vacuum and
> > the time of the last vacuum, to compute the approximate age.
> >
> > For a while I thought that'd not give us anything that just using LSNs gives
> > us, but I think it might allow coming up with a better cutoff logic: Instead
> > of using a cutoff like "page LSN is older than 10% of the LSNs since the 
> > last
> > vacuum of the table", it would allow us to approximate "page has not been
> > modified in the last 15 seconds" or such.  I think that might help avoid
> > unnecessary freezing on tables with very frequent vacuuming.
> 
> Yes. I'm uncomfortable with the last-vacuum-LSN approach mostly
> because of the impact on very frequently vacuumed tables, and
> secondarily because of the impact on very infrequently vacuumed
> tables.
> 
> Downthread, I proposed using the RedoRecPtr of the latest checkpoint
> rather than the LSN of the previous vacuum. I still like that idea.

Assuming that "downthread" references
https://postgr.es/m/CA%2BTgmoYb670VcDFbekjn2YQOKF9a7e-kBFoj2WJF1HtH7YPaWQ%40mail.gmail.com
could you sketch out the logic you're imagining a bit more?


> It's a value that we already have, with no additional bookkeeping. It
> varies over a much narrower range than the interval between vacuums on
> a table. The vacuum interval could be as short as tens of seconds or as
> long as years, while the checkpoint interval is almost always going to
> be between a few minutes at the low end and some tens of minutes at
> the high end, hours at the very most. That's quite appealing.

The reason I was thinking of using the "lsn at the end of the last vacuum" is
that it seems to be more adaptive to the frequency of vacuuming.

On the one hand, if a table is rarely autovacuumed because it is huge,
(InsertLSN-RedoRecPtr) might or might not be representative of the workload
over a longer time. On the other hand, if a page in a frequently vacuumed
table has an LSN from around the last vacuum (or even before), it should be
frozen, but will appear to be recent in RedoRecPtr based heuristics?


Perhaps we can mix both approaches. We can use the LSN and time of the last
vacuum to establish an LSN->time mapping that's reasonably accurate for a
relation. For infrequently vacuumed tables we can use the time between
checkpoints to establish a *more aggressive* cutoff for freezing than what a
percent-of-time-since-last-vacuum approach would provide. If e.g. a table gets
vacuumed every 100 hours and checkpoint timeout is 1 hour, no realistic
percent-of-time-since-last-vacuum setting will allow freezing, as all dirty
pages will be too new. To allow freezing a decent proportion of those, we
could allow freezing pages that lived longer than ~20%
time-between-recent-checkpoints.
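
To make the arithmetic concrete, a sketch of the kind of approximation being
discussed (all names are hypothetical; this is not code from any patch):

/*
 * Hedged sketch: approximate a page's "age" in seconds by assuming WAL was
 * generated at a constant rate since the end of the last vacuum of the table.
 */
static double
approx_page_age_secs(XLogRecPtr page_lsn,
					 XLogRecPtr last_vacuum_lsn, TimestampTz last_vacuum_time,
					 XLogRecPtr current_lsn, TimestampTz now)
{
	double		wal_bytes = (double) (current_lsn - last_vacuum_lsn);
	double		elapsed_secs = (now - last_vacuum_time) / 1000000.0;
	double		bytes_per_sec;

	if (wal_bytes <= 0 || elapsed_secs <= 0)
		return 0;
	bytes_per_sec = wal_bytes / elapsed_secs;

	/* WAL generated since the page was last stamped, at the assumed rate */
	return (double) (current_lsn - page_lsn) / bytes_per_sec;
}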


Hm, possibly stupid idea: What about using shared_buffers residency as a
factor? If vacuum had to read in a page to vacuum it, a) we would need read IO
to freeze it later, as we'll soon evict the page via the ringbuffer b)
non-residency indicates the page isn't constantly being modified?


> Also, I think the time between checkpoints actually matters here, because in
> some sense we're looking to get dirty, already-FPI'd pages frozen before
> they get written out, or before a new FPI becomes necessary, and checkpoints
> are one way for the first of those things to happen and the only way for the
> second one to happen.

Intuitively it seems easier to take care of that by checking if an FPI is
needed or not?

I guess we, eventually, might want to also freeze if an FPI would be
generated, if we are reasonably certain that the page isn't going to be
modified again soon (e.g. when table stats indicate there effectively aren't any
updates / deletes). Although perhaps it's better to take care of that via
freeze-on-writeout style logic.

I suspect that even for freeze-on-writeout we might end up needing some
heuristics.

Greetings,

Andres Freund




Re: Row pattern recognition

2023-09-07 Thread Tatsuo Ishii
Hi,

> Hello!
> 
>> (1) I completely changed the pattern matching engine so that it
>> performs backtracking. Now the engine evaluates all pattern elements
>> defined in PATTERN against each row, saving matched pattern variables
>> in a string per row. For example, if the pattern elements A and B
>> evaluated to true, a string "AB" is created for the current row.
> 
> If I understand correctly, this strategy assumes that one row's
> membership in a pattern variable is independent of the other rows'
> membership. But I don't think that's necessarily true:
> 
> DEFINE
>   A AS PREV(CLASSIFIER()) IS DISTINCT FROM 'A',
>   ...

But:

UP AS price > PREV(price)

also depends on the previous row, no? Can you please elaborate on how your
example could break the current implementation? I cannot test it because
CLASSIFIER is not implemented yet.

>> See row_is_in_reduced_frame, search_str_set and search_str_set_recurse
>> in nodeWindowAggs.c for more details. For now I use a naive depth
>> first search and probably there is a lot of rooms for optimization
>> (for example rewriting it without using
>> recursion).
> 
> The depth-first match is doing a lot of subtle work here. For example, with
> 
> PATTERN ( A+ B+ )
> DEFINE A AS TRUE,
>B AS TRUE
> 
> (i.e. all rows match both variables), and three rows in the partition,
> our candidates will be tried in the order
> 
> aaa
> aab
> aba
> abb
> ...
> bbb
> 
> The only possible matches against our regex `^a+b+` are "aab" and "abb",
> and that order matches the preferment order, so it's fine. But it's easy
> to come up with a pattern where that's the wrong order, like
> 
> PATTERN ( A+ (B|A)+ )
> 
> Now "aaa" will be considered before "aab", which isn't correct.

Can you explain a little bit more? I think 'aaa' matches the regular
expression 'a+(b|a)+', so there should be no problem with it being considered
before "aab".

> Similarly, the assumption that we want to match the longest string only
> works because we don't allow alternation yet.

Can you please clarify more on this?

> Cool, I will give this piece some more thought. Do you mind if I try to
> add some more complicated pattern quantifiers to stress the
> architecture, or would you prefer to tackle that later? Just alternation
> by itself will open up a world of corner cases.

Do you mean you want to provide a better patch for the pattern
matching part? That would be helpful, because I am currently working
on the aggregation part and have no time to do it. However, the
aggregation work affects the v5 patch: it needs refactoring. So can
you wait until I release the v6 patch? I hope it will be released in two
weeks or so.

> On 8/9/23 01:41, Tatsuo Ishii wrote:
>> - I found a bug with pattern matching code. It creates a string for
>>   subsequent regular expression matching. It uses the initial letter
>>   of each define variable name. For example, if the varname is "foo",
>>   then "f" is used. Obviously this makes trouble if we have two or
>>   more variables starting with same "f" (e.g. "food"). To fix this, I
>>   assign [a-z] to each variable instead of its initial letter. However
>>   this way limits us not to have more than 26 variables. I hope 26 is
>>   enough for most use cases.
> 
> There are still plenty of alphanumerics left that could be assigned...
> 
> But I'm wondering if we might want to just implement the NFA directly?
> The current implementation's Cartesian explosion can probably be pruned
> aggressively, but replaying the entire regex match once for every
> backtracked step will still duplicate a lot of work.

Not sure if you mean implementing a new regular expression engine
besides src/backend/regexp. I am afraid that's not trivial work. The
current regexp code consists of over 10k lines. What do you think?

> I've attached another test case; it looks like last_value() is depending
> on some sort of side effect from either first_value() or nth_value(). I
> know the window frame itself is still under construction, so apologies
> if this is an expected failure.

Thanks. Fortunately, the current code I am working on passes the new
test. I will include it in the next (v6) patch.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: old_snapshot_threshold bottleneck on replica

2023-09-07 Thread Thomas Munro
On Fri, Sep 8, 2023 at 2:00 PM Thomas Munro  wrote:
> On Fri, Sep 8, 2023 at 1:53 PM Peter Geoghegan  wrote:
> > Thanks for working on this. Though I wonder why you didn't do
> > something closer to a straight revert of the feature. Why is nbtree
> > still passing around snapshots needlessly?

The code moved around quite a few times over several commits and quite
a lot since then, which is why I didn't go for straight revert, but
clearly the manual approach risked missing things.  I think the
attached removes all unused 'snapshot' arguments from AM-internal
functions.  Checked by compiling with clang's -Wunused-parameters, and
then searching for 'snapshot', and excluding the expected cases.

> > Also, why are there still many comments referencing the feature?
> > There's the one above should_attempt_truncation(), for example.
> > Another appears above init_toast_snapshot(). Are these just
> > oversights, or was it deliberate? You said something about retaining
> > vestiges.

Stray comments removed.
From 7d57de24c5a4e9ab3dacca9231a9893b909439f6 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Fri, 8 Sep 2023 14:29:13 +1200
Subject: [PATCH] Remove some more "snapshot too old" vestiges.

Commit f691f5b8 removed the logic, but left behind some now-useless
Snapshot arguments to various AM-internal functions, and missed a couple
of comments.

Reported-by: Peter Geoghegan 
Discussion: https://postgr.es/m/CAH2-Wznj9qSNXZ1P1uWTUD_FeaTezbUazb416EPwi4Qr_jR_6A%40mail.gmail.com

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index 94a9759322..dbb83d80f8 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -2694,7 +2694,7 @@ bt_rootdescend(BtreeCheckState *state, IndexTuple itup)
 	 */
 	Assert(state->readonly && state->rootdescend);
 	exists = false;
-	stack = _bt_search(state->rel, NULL, key, &lbuf, BT_READ, NULL);
+	stack = _bt_search(state->rel, NULL, key, &lbuf, BT_READ);
 
 	if (BufferIsValid(lbuf))
 	{
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index d4fec654bb..a7538f32c2 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -169,7 +169,7 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
 	MemoryContext oldcxt = CurrentMemoryContext;
 	bool		autosummarize = BrinGetAutoSummarize(idxRel);
 
-	revmap = brinRevmapInitialize(idxRel, &pagesPerRange, NULL);
+	revmap = brinRevmapInitialize(idxRel, &pagesPerRange);
 
 	/*
 	 * origHeapBlk is the block number where the insertion occurred.  heapBlk
@@ -202,7 +202,7 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
 
 			lastPageTuple =
 				brinGetTupleForHeapBlock(revmap, lastPageRange, &buf, &off,
-										 NULL, BUFFER_LOCK_SHARE, NULL);
+										 NULL, BUFFER_LOCK_SHARE);
 			if (!lastPageTuple)
 			{
 bool		recorded;
@@ -222,7 +222,7 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
 		}
 
 		brtup = brinGetTupleForHeapBlock(revmap, heapBlk, &buf, &off,
-										 NULL, BUFFER_LOCK_SHARE, NULL);
+										 NULL, BUFFER_LOCK_SHARE);
 
 		/* if range is unsummarized, there's nothing to do */
 		if (!brtup)
@@ -332,8 +332,7 @@ brinbeginscan(Relation r, int nkeys, int norderbys)
 	scan = RelationGetIndexScan(r, nkeys, norderbys);
 
 	opaque = palloc_object(BrinOpaque);
-	opaque->bo_rmAccess = brinRevmapInitialize(r, &opaque->bo_pagesPerRange,
-											   scan->xs_snapshot);
+	opaque->bo_rmAccess = brinRevmapInitialize(r, &opaque->bo_pagesPerRange);
 	opaque->bo_bdesc = brin_build_desc(r);
 	scan->opaque = opaque;
 
@@ -537,8 +536,7 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
 		MemoryContextResetAndDeleteChildren(perRangeCxt);
 
 		tup = brinGetTupleForHeapBlock(opaque->bo_rmAccess, heapBlk, &buf,
-									   &off, &size, BUFFER_LOCK_SHARE,
-									   scan->xs_snapshot);
+									   &off, &size, BUFFER_LOCK_SHARE);
 		if (tup)
 		{
 			gottuple = true;
@@ -880,7 +878,7 @@ brinbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	/*
 	 * Initialize our state, including the deformed tuple state.
 	 */
-	revmap = brinRevmapInitialize(index, &pagesPerRange, NULL);
+	revmap = brinRevmapInitialize(index, &pagesPerRange);
 	state = initialize_brin_buildstate(index, revmap, pagesPerRange);
 
 	/*
@@ -1458,8 +1456,7 @@ summarize_range(IndexInfo *indexInfo, BrinBuildState *state, Relation heapRel,
 		 * the same.)
 		 */
 		phtup = brinGetTupleForHeapBlock(state->bs_rmAccess, heapBlk, &phbuf,
-										 &offset, &phsz, BUFFER_LOCK_SHARE,
-										 NULL);
+										 &offset, &phsz, BUFFER_LOCK_SHARE);
 		/* the placeholder tuple must exist */
 		if (phtup == NULL)
 			elog(ERROR, "missing placeholder tuple");
@@ -1496,7 +1493,7 @@ brinsummarize(Relation index, Relation heapRel, BlockNumber pageRange,
 	Buffer		buf;
 	BlockNumber startBlk;
 
-	revmap = brinRevmapInitialize(index, &pagesPerRange, NULL);
+	revmap = brinRevmapInitialize(index, &pagesPerRange);
 
 	/* determine range of pages to process */
 	heapNumBlocks = RelationGetNumberOfBlocks(heapRel);
@@ -1537,7 +1534,7 @@ brinsummarize(Relation index, Relation heapRel, BlockNumber pageRange,
 		

Re: Impact of checkpointer during pg_upgrade

2023-09-07 Thread Amit Kapila
On Fri, Sep 8, 2023 at 9:00 AM Zhijie Hou (Fujitsu)
 wrote:
>
> On Friday, September 8, 2023 10:58 AM Michael Paquier  
> wrote:
> >
> > On Fri, Sep 08, 2023 at 08:18:14AM +0530, Amit Kapila wrote:
> > > This validation tries to ensure that we don't have any bugs/issues in
> > > our patch. It may be a candidate for assert but I feel even if we
> > > encounter any bug it is better to fix the bug.
> >
> > My guess is that an elog-like error is more adapted so as we are able to 
> > detect
> > problems in more cases, but perhaps an assert may be enough for the
> > buildfarm.  If there is anything in the backend that causes slots to become
> > invalidated, I agree that any issue causing that should be fixed, but isn't 
> > the
> > point different here?  Having a check at the end of an upgrade is a means to
> > improve the detection rate of bugs where slots get invalidated, so it is 
> > actually
> > helpful to have one anyway?  I am not sure what is your strategy here, do 
> > you
> > mean to keep a check at the end of pg_upgrade only in the patch to validate 
> > it?
> > Or do you mean to add something in pg_upgrade as part of the feature?
> >

We can do whatever the consensus is, but I feel such an end-check is,
to some extent, only helpful for testing a patch before commit, not
otherwise.

> >  I
> > mean that doing the latter is beneficial for the sake of any patch
> > committed and
> > as a long-term method to rely on.
>

What is your worry here? Are you worried that unknowingly in the
future we could add some other way to invalidate slots during upgrades
that we won't be able to detect?

> I feel adding a check at pg_upgrade may not 100% detect the slot invalidation
> if we check by querying the old cluster to get the slot info, because the
> invalidation can happen before the first time we fetch the info or after the
> last time we fetch the info(e.g. shutdown checkpoint could also invalidate
> slots)
>
> Personally, I think if we really want to add a check, it might be better to 
> put
> it at server side, Like: reporting an ERROR at server side when invalidating
> the slot(InvalidatePossiblyObsoleteSlot) if in upgrade mode.
>

I don't know whether we really need to be extra careful in this case;
the same could be said about other consistency checks on the old
cluster.

-- 
With Regards,
Amit Kapila.




Re: Impact of checkpointer during pg_upgrade

2023-09-07 Thread Michael Paquier
On Fri, Sep 08, 2023 at 03:30:23AM +, Zhijie Hou (Fujitsu) wrote:
> I feel adding a check at pg_upgrade may not 100% detect the slot invalidation
> if we check by querying the old cluster to get the slot info, because the
> invalidation can happen before the first time we fetch the info or after the
> last time we fetch the info(e.g. shutdown checkpoint could also invalidate
> slots)
> 
> Personally, I think if we really want to add a check, it might be better to 
> put
> it at server side, Like: reporting an ERROR at server side when invalidating
> the slot(InvalidatePossiblyObsoleteSlot) if in upgrade mode.

Yeah, that may be enough to paint one isBinaryUpgrade in the
invalidation path of the backend, with an elog(ERROR), as that would be
an unexpected state.

> Having said that I feel it's fine if we don't add this check as setting
> max_slot_wal_keep_size to -1 looks sufficient.

I would do both, FWIW, to stay on the safe side.  And both are
non-invasive.
--
Michael


signature.asc
Description: PGP signature


Re: persist logical slots to disk during shutdown checkpoint

2023-09-07 Thread Amit Kapila
On Thu, Sep 7, 2023 at 3:38 PM Ashutosh Bapat
 wrote:
>
> On Thu, Sep 7, 2023 at 1:43 PM Amit Kapila  wrote:
> >
> > On Thu, Sep 7, 2023 at 1:18 PM Michael Paquier  wrote:
> > >
> > > On Thu, Sep 07, 2023 at 11:56:28AM +0530, Amit Kapila wrote:
> > > > Thanks, the patch looks good to me as well.
> > >
> > > +   /* This is used to track the last saved confirmed_flush LSN value */
> > > +   XLogRecPtr  last_saved_confirmed_flush;
> > >
> > > This does not feel sufficient, as the comment explaining what this
> > > variable does uses the same terms as the variable name (aka it is the
> > > last save of the confirmed_lsn).  Why it it here and why it is useful?
> > > In which context and/or code paths is it used?  Okay, there are some
> > > explanations when saving a slot, restoring a slot or when a checkpoint
> > > processes slots, but it seems important to me to document more things
> > > in ReplicationSlot where this is defined.
> > >
> >
> > Hmm, this is quite debatable as different people feel differently
> > about this. The patch author kept it where it is now but in one of my
> > revisions, I rewrote and added it in the ReplicationSlot. Then
> > Ashutosh argued that it is better to keep it near where we are saving
> > the slot (aka where the patch has) [1]. Anyway, as I also preferred
> > the core part of the theory about this variable to be in
> > ReplicationSlot, so I'll move it there before commit unless someone
> > argues against it or has any other comments.
>
> If we want it to be in ReplicationSlot, I suggest we just say, - saves
> last confirmed flush LSN to detect any divergence in the in-memory and
> on-disk confirmed flush LSN cheaply.
>

I have added something on these lines and also changed the other
comment pointed out by you. In the passing, I made minor cosmetic
changes as well.

-- 
With Regards,
Amit Kapila.


v10-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patch
Description: Binary data


RE: Impact of checkpointer during pg_upgrade

2023-09-07 Thread Zhijie Hou (Fujitsu)
On Friday, September 8, 2023 10:58 AM Michael Paquier  
wrote:
> 
> On Fri, Sep 08, 2023 at 08:18:14AM +0530, Amit Kapila wrote:
> > This validation tries to ensure that we don't have any bugs/issues in
> > our patch. It may be a candidate for assert but I feel even if we
> > encounter any bug it is better to fix the bug.
> 
> My guess is that an elog-like error is more adapted so as we are able to 
> detect
> problems in more cases, but perhaps an assert may be enough for the
> buildfarm.  If there is anything in the backend that causes slots to become
> invalidated, I agree that any issue causing that should be fixed, but isn't 
> the
> point different here?  Having a check at the end of an upgrade is a means to
> improve the detection rate of bugs where slots get invalidated, so it is 
> actually
> helpful to have one anyway?  I am not sure what is your strategy here, do you
> mean to keep a check at the end of pg_upgrade only in the patch to validate 
> it?
> Or do you mean to add something in pg_upgrade as part of the feature?  I
> mean that doing the latter is beneficial for the sake of any patch committed
> and
> as a long-term method to rely on.

I feel adding a check at pg_upgrade may not 100% detect the slot invalidation
if we check by querying the old cluster to get the slot info, because the
invalidation can happen before the first time we fetch the info or after the
last time we fetch the info(e.g. shutdown checkpoint could also invalidate
slots)

Personally, I think if we really want to add a check, it might be better to put
it at server side, Like: reporting an ERROR at server side when invalidating
the slot(InvalidatePossiblyObsoleteSlot) if in upgrade mode.

Having said that I feel it's fine if we don't add this check as setting
max_slot_wal_keep_size to -1 looks sufficient.


Best Regards,
Hou zj




Re: Impact of checkpointer during pg_upgrade

2023-09-07 Thread Michael Paquier
On Fri, Sep 08, 2023 at 08:18:14AM +0530, Amit Kapila wrote:
> This validation tries to ensure that we don't have any bugs/issues in
> our patch. It may be a candidate for assert but I feel even if we
> encounter any bug it is better to fix the bug.

My guess is that an elog-like error is more adapted so as we are able
to detect problems in more cases, but perhaps an assert may be enough
for the buildfarm.  If there is anything in the backend that causes
slots to become invalidated, I agree that any issue causing that
should be fixed, but isn't the point different here?  Having a check
at the end of an upgrade is a means to improve the detection rate of
bugs where slots get invalidated, so it is actually helpful to have
one anyway?  I am not sure what is your strategy here, do you mean to
keep a check at the end of pg_upgrade only in the patch to validate
it?  Or do you mean to add something in pg_upgrade as part of the
feature?  I mean that doing the latter is beneficial for the sake of
any patch committed and as a long-term method to rely on.
--
Michael


signature.asc
Description: PGP signature


Re: Impact of checkpointer during pg_upgrade

2023-09-07 Thread Amit Kapila
On Fri, Sep 8, 2023 at 5:37 AM Michael Paquier  wrote:
>
> On Thu, Sep 07, 2023 at 03:33:52PM +0530, Amit Kapila wrote:
> > I think if we just make max_slot_wal_keep_size to -1 that should be
> > sufficient to not let any slots get invalidated during upgrade. Do you
> > have anything else in mind?
>
> Forcing wal_keep_size while on it may be a good thing.
>

I had thought about it but couldn't come up with a reason to force
wal_keep_size for this purpose.

> > If we do (b) either via GUCs or IsBinaryUpgrade check we don't need to
> > do any of (a), (b), or (d). I feel that would be a minimal and
> > sufficient fix to prevent any side impact of checkpointer on slots
> > during an upgrade.
>
> I could get into the addition of a post-upgrade check to make sure
> that nothing got invalidated while the upgrade was running, FWIW.
>

This validation tries to ensure that we don't have any bugs/issues in
our patch. It may be a candidate for assert but I feel even if we
encounter any bug it is better to fix the bug.

-- 
With Regards,
Amit Kapila.




Re: old_snapshot_threshold bottleneck on replica

2023-09-07 Thread Thomas Munro
On Fri, Sep 8, 2023 at 1:53 PM Peter Geoghegan  wrote:
> On Tue, Sep 5, 2023 at 12:58 AM Thomas Munro  wrote:
> > I hope we get "snapshot too old" back one day.
>
> Thanks for working on this. Though I wonder why you didn't do
> something closer to a straight revert of the feature. Why is nbtree
> still passing around snapshots needlessly?
>
> Also, why are there still many comments referencing the feature?
> There's the one above should_attempt_truncation(), for example.
> Another appears above init_toast_snapshot(). Are these just
> oversights, or was it deliberate? You said something about retaining
> vestiges.

Oh.  Not intentional.  Looking now...




Re: old_snapshot_threshold bottleneck on replica

2023-09-07 Thread Peter Geoghegan
On Tue, Sep 5, 2023 at 12:58 AM Thomas Munro  wrote:
> I hope we get "snapshot too old" back one day.

Thanks for working on this. Though I wonder why you didn't do
something closer to a straight revert of the feature. Why is nbtree
still passing around snapshots needlessly?

Also, why are there still many comments referencing the feature?
There's the one above should_attempt_truncation(), for example.
Another appears above init_toast_snapshot(). Are these just
oversights, or was it deliberate? You said something about retaining
vestiges.

-- 
Peter Geoghegan




Re: SQL:2011 application time

2023-09-07 Thread jian he
On Sat, Sep 2, 2023 at 5:58 AM Paul Jungwirth
 wrote:
>
>
> I don't quite understand this part:
>
>  >> Also, your implementation
>  >> requires at least one non-overlaps column, which also seems like a
>  >> confusing restriction.
>
> That's just a regular non-temporal constraint. Right? If I'm missing
> something let me know.
>

For a range primary key, is it fine to expect it to be unique, not
null, and also not overlapping? (I am not sure how hard it is to implement.)

-
quote from 7IWD2-02-Foundation-2011-12.pdf. 4.18.3.2 Unique
constraints, page 97 of 1483.

4.18.3.2 Unique constraints In addition to the components of every
table constraint descriptor, a unique constraint descriptor includes:
— An indication of whether it was defined with PRIMARY KEY or UNIQUE.
— The names and positions of the unique columns specified in the
<unique column list>.
— If <without overlap specification> is specified, then the name of
the period specified.

If the table descriptor for base table T includes a unique constraint
descriptor indicating that the unique constraint was defined with
PRIMARY KEY, then the columns of that unique constraint constitute the
primary key of T. A table that has a primary key cannot have a proper
supertable.
A unique constraint that does not include a <without overlap
specification> on a table T is satisfied if and only if there do not
exist two rows R1 and R2 of T such that R1 and R2 have the same
non-null values in the unique columns. If a unique constraint UC on a
table T includes a <without overlap specification> WOS, then let
<application time period name> ATPN be the <application time period
name> contained in WOS. UC is
satisfied if and only if there do not exist two rows R1 and R2 of T
such that R1 and R2 have the same non-null values in the unique
columns and the ATPN period values of R1 and R2 overlap. In addition,
if the unique constraint was defined with PRIMARY KEY, then it
requires that none of the values in the specified column or columns be
a null value.
-
Based on the above, the unique constraint does not specify that the
column list must be of a range type. Take UNIQUE (a, c WITHOUT OVERLAPS):
here column "a" can be a range type (which has the overlap property),
but it does not have to be.
In fact, many of your primary key and foreign key regression tests use
something like '[11,11]' (which makes it easier to understand), which
is logically a non-range usage.
So in UNIQUE (a, c WITHOUT OVERLAPS), does it make sense for column "a"
to be a non-range data type?
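
For what it's worth, here is a concrete sketch of the case I am asking about,
with a plain integer column in the key next to the range column (syntax as in
your patch; index/extension details are whatever the patch requires):

```
-- sketch only, using the WITHOUT OVERLAPS syntax from this patch
CREATE TABLE reservation (
    room_id int,
    booked  daterange,
    CONSTRAINT reservation_pk
        PRIMARY KEY (room_id, booked WITHOUT OVERLAPS)
);
-- room_id is an ordinary, non-range column; only "booked" carries the
-- overlap semantics, so two rows with the same room_id must not have
-- overlapping booked ranges
```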




Re: Simple CustomScan example

2023-09-07 Thread Michael Paquier
On Thu, Sep 07, 2023 at 06:29:27PM +0200, Drouvot, Bertrand wrote:
> You may find the one in this repo:
> https://github.com/bdrouvot/pg_directpaths simple enough.

I'll repeat a remark I have made exactly yesterday on a different
thread: having a test module for custom scans in src/test/modules/ to
test these APIs and have a usable template would be welcome :D
--
Michael


signature.asc
Description: PGP signature


Re: [BackendXidGetPid] only access allProcs when xid matches

2023-09-07 Thread Michael Paquier
On Thu, Sep 07, 2023 at 04:34:20PM +0530, Ashutosh Bapat wrote:
> I would still love to see some numbers. It's not that hard. Either
> using my micro benchmark or Michael's. We may or may not see an
> improvement. But at least we tried. But Michael feels otherwise,
> changing it to RoC is fine.

I have hardcoded a small SQL function that calls BackendXidGetPid() in
a tight loop a few hundred millions of times, and I see no difference
in runtime between HEAD and the patch under -O2.  The argument about
readability still applies for me, so applied.
--
Michael


signature.asc
Description: PGP signature


Re: psql help message contains excessive indentations

2023-09-07 Thread Kyotaro Horiguchi
At Thu, 7 Sep 2023 13:06:35 +0200, Alvaro Herrera  
wrote in 
> On 2023-Sep-07, Yugo NAGATA wrote:
> > +   HELP0("  \\watch [[i=]SEC] [c=N] [m=MIN]\n"
> > + " execute query every SEC seconds, up 
> > to N times\n"
> > + " stop if less than MIN rows are 
> > returned\n");
> 
> Yeah, this looks right to me -- the whole help entry as a single
> translatable unit, instead of three separately translatable lines.

+1

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2023-09-07 Thread Thomas Munro
On Fri, Sep 8, 2023 at 11:48 AM Michael Paquier  wrote:
> On Thu, Sep 07, 2023 at 01:44:11PM +0200, Daniel Gustafsson wrote:
> > Sadly I wouldn't be the least bit surprised if there are 1.0.2 users on 
> > modern
> > operating systems, especially given its LTS status (which OpenSSL hasn't 
> > even
> > capped but sells by "for as long as it remains commercially viable to do so"
> > basis).
>
> Yes, I would not be surprised by that either.  TBH, I don't like much
> the fact that we rely on OpenSSL to decide when we should cut it.
> Particularly since all the changes given to it after it got EOL'd are
> close source at this point.

Right, just like there are people running ancient PostgreSQL and
buying support.  That's not relevant to PostgreSQL 17 IMHO, which
should target contemporary distributions.

BTW I'm not asking anyone to do anything here, I just didn't want to
allow the "RHEL ELS" and "closed source OpenSSL [sic]" theories
mentioned on this thread to go unchallenged.  Next time I'm trying to
clean up some other cruft in our tree, I don't want this thread to be
cited as evidence that that is our policy, because I don't buy it, it
doesn't make any sense.  Of course there is someone, somewhere selling
support for anything you can think of.  There are companies that
support VAXen.  There's a company in Irvine, California selling and
supporting modern drop-in replacements for PDP 11s for production use.




Re: Correct the documentation for work_mem

2023-09-07 Thread Bruce Momjian
On Fri, Apr 21, 2023 at 01:15:01PM -0400, Tom Lane wrote:
> Peter Eisentraut  writes:
> > On 21.04.23 16:28, Imseih (AWS), Sami wrote:
> >> I suggest a small doc fix:
> >> “Note that for a complex query, several sort or hash operations might be 
> >> running simultaneously;”
> 
> > Here is a discussion of these terms: 
> > https://takuti.me/note/parallel-vs-concurrent/
> 
> > I think "concurrently" is the correct word here.
> 
> Probably, but it'd do little to remove the confusion Sami is on about,
> especially since the next sentence uses "concurrently" to describe the
> other case.  I think we need a more thorough rewording, perhaps like
> 
> -   Note that for a complex query, several sort or hash operations might 
> be
> -   running in parallel; each operation will generally be allowed
> +   Note that a complex query may include several sort or hash
> +   operations; each such operation will generally be allowed
> to use as much memory as this value specifies before it starts
> to write data into temporary files.  Also, several running
> sessions could be doing such operations concurrently.
> 
> I also find this wording a bit further down to be poor:
> 
> Hash-based operations are generally more sensitive to memory
> availability than equivalent sort-based operations.  The
> memory available for hash tables is computed by multiplying
> work_mem by
> hash_mem_multiplier.  This makes it
> 
> I think "available" is not le mot juste, and it's also unclear from
> this whether we're speaking of the per-hash-table limit or some
> (nonexistent) overall limit.  How about
> 
> -   memory available for hash tables is computed by multiplying
> +   memory limit for a hash table is computed by multiplying

Adjusted patch attached.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6bc1b215db..45d1bb4b7b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1829,9 +1829,10 @@ include_dir 'conf.d'
 (such as a sort or hash table) before writing to temporary disk files.
 If this value is specified without units, it is taken as kilobytes.
 The default value is four megabytes (4MB).
-Note that for a complex query, several sort or hash operations might be
-running in parallel; each operation will generally be allowed
-to use as much memory as this value specifies before it starts
+Note that a complex query might perform several sort or hash
+operations at the same time, with each operation generally being
+allowed to use as much memory as this value specifies before
+it starts
 to write data into temporary files.  Also, several running
 sessions could be doing such operations concurrently.
 Therefore, the total memory used could be many times the value
@@ -1845,7 +1846,7 @@ include_dir 'conf.d'

 Hash-based operations are generally more sensitive to memory
 availability than equivalent sort-based operations.  The
-memory available for hash tables is computed by multiplying
+memory limit for a hash table is computed by multiplying
 work_mem by
 hash_mem_multiplier.  This makes it
 possible for hash-based operations to use an amount of memory
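
As an illustration of the behavior the reworded sentences describe (not part
of the patch), the per-operation limits work out roughly like this:

```
-- illustration only
SET work_mem = '4MB';
SET hash_mem_multiplier = 2.0;
-- a single sort may use up to ~4MB before spilling to disk, while a single
-- hash table may use up to work_mem * hash_mem_multiplier = ~8MB; a complex
-- query with several such operations, or several concurrent sessions, can
-- therefore use a multiple of these limits in total
```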


Re: Impact of checkpointer during pg_upgrade

2023-09-07 Thread Michael Paquier
On Thu, Sep 07, 2023 at 03:33:52PM +0530, Amit Kapila wrote:
> I think if we just make max_slot_wal_keep_size to -1 that should be
> sufficient to not let any slots get invalidated during upgrade. Do you
> have anything else in mind?

Forcing wal_keep_size while on it may be a good thing.

> If we do (b) either via GUCs or IsBinaryUpgrade check we don't need to
> do any of (a), (b), or (d). I feel that would be a minimal and
> sufficient fix to prevent any side impact of checkpointer on slots
> during an upgrade.

I could get into the addition of a post-upgrade check to make sure
that nothing got invalidated while the upgrade was running, FWIW.
--
Michael


signature.asc
Description: PGP signature


Re: Row pattern recognition

2023-09-07 Thread Jacob Champion
Hello!

> (1) I completely changed the pattern matching engine so that it
> performs backtracking. Now the engine evaluates all pattern elements
> defined in PATTERN against each row, saving matched pattern variables
> in a string per row. For example, if the pattern elements A and B
> evaluated to true, a string "AB" is created for the current row.

If I understand correctly, this strategy assumes that one row's
membership in a pattern variable is independent of the other rows'
membership. But I don't think that's necessarily true:

DEFINE
  A AS PREV(CLASSIFIER()) IS DISTINCT FROM 'A',
  ...

> See row_is_in_reduced_frame, search_str_set and search_str_set_recurse
> in nodeWindowAggs.c for more details. For now I use a naive depth
> first search and probably there is a lot of rooms for optimization
> (for example rewriting it without using
> recursion).

The depth-first match is doing a lot of subtle work here. For example, with

PATTERN ( A+ B+ )
DEFINE A AS TRUE,
   B AS TRUE

(i.e. all rows match both variables), and three rows in the partition,
our candidates will be tried in the order

aaa
aab
aba
abb
...
bbb

The only possible matches against our regex `^a+b+` are "aab" and "abb",
and that order matches the preferment order, so it's fine. But it's easy
to come up with a pattern where that's the wrong order, like

PATTERN ( A+ (B|A)+ )

Now "aaa" will be considered before "aab", which isn't correct.

Similarly, the assumption that we want to match the longest string only
works because we don't allow alternation yet.

> Suggestions/patches are welcome.

Cool, I will give this piece some more thought. Do you mind if I try to
add some more complicated pattern quantifiers to stress the
architecture, or would you prefer to tackle that later? Just alternation
by itself will open up a world of corner cases.

> With the new engine, cases above do not fail anymore. See new
> regression test cases. Thanks for providing valuable test cases!

You're very welcome!

On 8/9/23 01:41, Tatsuo Ishii wrote:
> - I found a bug with pattern matching code. It creates a string for
>   subsequent regular expression matching. It uses the initial letter
>   of each define variable name. For example, if the varname is "foo",
>   then "f" is used. Obviously this makes trouble if we have two or
>   more variables starting with same "f" (e.g. "food"). To fix this, I
>   assign [a-z] to each variable instead of its initial letter. However
>   this way limits us not to have more than 26 variables. I hope 26 is
>   enough for most use cases.

There are still plenty of alphanumerics left that could be assigned...

But I'm wondering if we might want to just implement the NFA directly?
The current implementation's Cartesian explosion can probably be pruned
aggressively, but replaying the entire regex match once for every
backtracked step will still duplicate a lot of work.

--

I've attached another test case; it looks like last_value() is depending
on some sort of side effect from either first_value() or nth_value(). I
know the window frame itself is still under construction, so apologies
if this is an expected failure.

Thanks!
--Jacob

diff --git a/src/test/regress/expected/rpr.out b/src/test/regress/expected/rpr.out
index 340ccf242c..bd35e8389e 100644
--- a/src/test/regress/expected/rpr.out
+++ b/src/test/regress/expected/rpr.out
@@ -89,6 +89,44 @@ SELECT company, tdate, price, first_value(price) OVER w, 
last_value(price) OVER
  company2 | 07-10-2023 |  1300 | || 
 (20 rows)
 
+-- last_value() should remain consistent
+SELECT company, tdate, price, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ INITIAL
+ PATTERN (START UP+ DOWN+)
+ DEFINE
+  START AS TRUE,
+  UP AS price > PREV(price),
+  DOWN AS price < PREV(price)
+);
+ company  |   tdate| price | last_value 
+--++---+
+ company1 | 07-01-2023 |   100 |140
+ company1 | 07-02-2023 |   200 |   
+ company1 | 07-03-2023 |   150 |   
+ company1 | 07-04-2023 |   140 |   
+ company1 | 07-05-2023 |   150 |   
+ company1 | 07-06-2023 |90 |120
+ company1 | 07-07-2023 |   110 |   
+ company1 | 07-08-2023 |   130 |   
+ company1 | 07-09-2023 |   120 |   
+ company1 | 07-10-2023 |   130 |   
+ company2 | 07-01-2023 |50 |   1400
+ company2 | 07-02-2023 |  2000 |   
+ company2 | 07-03-2023 |  1500 |   
+ company2 | 07-04-2023 |  1400 |   
+ company2 | 07-05-2023 |  1500 |   
+ company2 | 07-06-2023 |60 |   1200
+ company2 | 07-07-2023 |  1100 |   
+ company2 | 07-08-2023 |  1300 |   
+ company2 | 07-09-2023 |  1200 |   
+ company2 | 07-10-2023 |  1300 |   
+(20 rows)
+
 -- omit "START" in DEFINE but it is ok because "START 

Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2023-09-07 Thread Michael Paquier
On Thu, Sep 07, 2023 at 01:44:11PM +0200, Daniel Gustafsson wrote:
> Sadly I wouldn't be the least bit surprised if there are 1.0.2 users on modern
> operating systems, especially given its LTS status (which OpenSSL hasn't even
> capped but sells by "for as long as it remains commercially viable to do so"
> basis).

Yes, I would not be surprised by that either.  TBH, I don't like much
the fact that we rely on OpenSSL to decide when we should cut it.  
Particularly since all the changes given to it after it got EOL'd are
closed source at this point.

> That being said, my gut feeling is that 3.x has gotten pretty good
> market penetration.

Perhaps.
--
Michael


signature.asc
Description: PGP signature


Re: Document that server will start even if it's unable to open some TCP/IP ports

2023-09-07 Thread Bruce Momjian
On Thu, Sep  7, 2023 at 02:54:13PM -0700, Nathan Bossart wrote:
> Thanks for picking this up.
> 
> On Thu, Sep 07, 2023 at 03:33:57PM -0400, Bruce Momjian wrote:
> >   The default value is  > class="systemname">localhost,
> >   which allows only local TCP/IP loopback 
> > connections to be
> > - made.  While client authentication ( > + made.  If listen_addresses is not empty,
> > + the server will start if it can open a port
> > + on at least one TCP/IP address.  A warning will be emitted for
> > + any TCP/IP address which cannot be opened.
> 
> I think we should move this sentence to before the sentence about the
> default value.  That way, "If the list is empty, ..." is immediately
> followed by "If the list is not empty, ..."
> 
> IMO the phrase "open a port" is kind of nonstandard.  I think we should say
> something along the lines of
> 
>   If listen_addresses is not empty, the server will start only if it can
>   listen on at least one of the specified addresses.  A warning will be
>   emitted for any addresses that the server cannot listen on.

Good idea, updated patch attached.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6bc1b215db..eaf2a42261 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -656,10 +656,16 @@ include_dir 'conf.d'
  :: allows listening for all IPv6 addresses.
  If the list is empty, the server does not listen on any IP interface
  at all, in which case only Unix-domain sockets can be used to connect
- to it.
+ to it.  If the list is not empty, the server will start if it can
+ open a port on at least one TCP/IP address.
+ A warning will be emitted for any TCP/IP address which cannot
+ be opened.
  The default value is localhost,
  which allows only local TCP/IP loopback connections to be
- made.  While client authentication (
+   
+ While client authentication () allows fine-grained control
  over who can access the server, listen_addresses
  controls which interfaces accept connection attempts, which


Re: Add 'worker_type' to pg_stat_subscription

2023-09-07 Thread Nathan Bossart
On Thu, Sep 07, 2023 at 12:36:29PM +1200, Peter Smith wrote:
> Modified as suggested. PSA v3.

Thanks.  I've attached v4 with a couple of small changes.  Notably, I've
moved the worker_type column to before the pid column, as it felt more
natural to me to keep the PID columns together.  I've also added an
elog(ERROR, ...) for WORKERTYPE_UNKNOWN, as that seems to be the standard
practice elsewhere.  That being said, are we absolutely confident that this
really cannot happen?  I haven't looked too closely, but if there is a
small race or something that could cause us to see a worker with this type,
perhaps it would be better to actually list it as "unknown".  Thoughts?
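
For reference, once the attached patch is applied the new column can be seen
with a query like this (illustrative only; the possible values follow the C
code in the patch below):

```
SELECT subname, worker_type, pid, leader_pid, relid
  FROM pg_stat_subscription;
-- worker_type is one of 'apply', 'parallel apply', or
-- 'table synchronization'
```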

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 85b826752b1b10176809918975ce7b495dadd3db Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 7 Sep 2023 15:24:23 -0700
Subject: [PATCH v4 1/1] add worker type to pg_stat_subscription

---
 doc/src/sgml/monitoring.sgml   | 11 +++
 src/backend/catalog/system_views.sql   |  1 +
 src/backend/replication/logical/launcher.c | 18 +-
 src/include/catalog/pg_proc.dat|  6 +++---
 src/test/regress/expected/rules.out|  3 ++-
 5 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 4ff415d6a0..17f9323f23 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1993,6 +1993,17 @@ description | Waiting for a newly initialized WAL file to reach durable storage
   
  
 
+ 
+  
+   worker_type text
+  
+  
+   Type of the subscription worker process.  Possible types are
+   apply, parallel, and
+   table synchronization.
+  
+ 
+
  
   
pid integer
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 77b06e2a7a..fcb14976c0 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -949,6 +949,7 @@ CREATE VIEW pg_stat_subscription AS
 SELECT
 su.oid AS subid,
 su.subname,
+st.worker_type,
 st.pid,
 st.leader_pid,
 st.relid,
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 7882fc91ce..501910b445 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -1278,7 +1278,7 @@ GetLeaderApplyWorkerPid(pid_t pid)
 Datum
 pg_stat_get_subscription(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_SUBSCRIPTION_COLS	9
+#define PG_STAT_GET_SUBSCRIPTION_COLS	10
 	Oid			subid = PG_ARGISNULL(0) ? InvalidOid : PG_GETARG_OID(0);
 	int			i;
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
@@ -1339,6 +1339,22 @@ pg_stat_get_subscription(PG_FUNCTION_ARGS)
 		else
 			values[8] = TimestampTzGetDatum(worker.reply_time);
 
+		switch (worker.type)
+		{
+			case WORKERTYPE_APPLY:
+values[9] = CStringGetTextDatum("apply");
+break;
+			case WORKERTYPE_PARALLEL_APPLY:
+values[9] = CStringGetTextDatum("parallel apply");
+break;
+			case WORKERTYPE_TABLESYNC:
+values[9] = CStringGetTextDatum("table synchronization");
+break;
+			case WORKERTYPE_UNKNOWN:
+/* Should never happen. */
+elog(ERROR, "unknown worker type");
+		}
+
 		tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
 			 values, nulls);
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9805bc6118..f0b7b9cbd8 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5484,9 +5484,9 @@
   proname => 'pg_stat_get_subscription', prorows => '10', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'oid',
-  proallargtypes => '{oid,oid,oid,int4,int4,pg_lsn,timestamptz,timestamptz,pg_lsn,timestamptz}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{subid,subid,relid,pid,leader_pid,received_lsn,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time}',
+  proallargtypes => '{oid,oid,oid,int4,int4,pg_lsn,timestamptz,timestamptz,pg_lsn,timestamptz,text}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{subid,subid,relid,pid,leader_pid,received_lsn,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time,worker_type}',
   prosrc => 'pg_stat_get_subscription' },
 { oid => '2026', descr => 'statistics: current backend PID',
   proname => 'pg_backend_pid', provolatile => 's', proparallel => 'r',
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 5058be5411..2c60400ade 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2118,6 +2118,7 @@ pg_stat_ssl| SELECT pid,
   WHERE (client_port IS NOT NULL);
 pg_stat_subscription| SELECT su.oid AS subid,
 su.subname,
+

Re: Eliminate redundant tuple visibility check in vacuum

2023-09-07 Thread Melanie Plageman
On Thu, Sep 7, 2023 at 3:30 PM Robert Haas  wrote:
>
> On Thu, Sep 7, 2023 at 3:10 PM Melanie Plageman
>  wrote:
> > I can fix it by changing the type of PruneResult->off_loc to be an
> > OffsetNumber pointer. This does mean that PruneResult will be
> > initialized partially by heap_page_prune() callers. I wonder if you
> > think that undermines the case for making a new struct.
>
> I think that it undermines the case for including that particular
> argument in the struct. It's not really a Prune*Result* if the caller
> initializes it in part. It seems fairly reasonable to still have a
> PruneResult struct for the other stuff, though, at least to me. How do
> you see it?

Yes, I think off_loc probably didn't belong in PruneResult to begin with.
It is inherently not a result of pruning since it will only be used in
the event that pruning doesn't complete (it errors out).

In the attached v4 patch set, I have both PruneResult and off_loc as
parameters to heap_page_prune(). I have also added a separate commit
which adds comments both above heap_page_prune()'s call site in
lazy_scan_prune() and in the heap_page_prune() function header
clarifying the various points we discussed.

> > I still want to eliminate the NULL check of off_loc in
> > heap_page_prune() by making it a required parameter. Even though
> > on-access pruning does not have an error callback mechanism which uses
> > the offset, it seems better to have a pointless local variable in
> > heap_page_prune_opt() than to do a NULL check for every tuple pruned.
>
> It doesn't seem important to me unless it improves performance. If
> it's just stylistic, I don't object, but I also don't really see a
> reason to care.

I did some performance testing but, as expected, I couldn't concoct a
scenario where the overhead was noticeable in a profile. So much else
is happening in that code, the NULL check basically doesn't matter
(even though it isn't optimized out).

I mostly wanted to remove the NULL checks because I found them
distracting (so, a stylistic complaint). However, upon further
reflection, I actually think it is better if heap_page_prune_opt()
passes NULL. heap_page_prune() has no error callback mechanism that
could use it, and passing a valid value implies otherwise. Also, as
you said, off_loc will always be InvalidOffsetNumber when
heap_page_prune() returns normally, so having heap_page_prune_opt()
pass a dummy value might actually be more confusing for future
programmers.

> > > I haven't run the regression suite with 0001 applied. I'm guessing
> > > that you did, and that they passed, which if true means that we don't
> > > have a test for this, which is a shame, although writing such a test
> > > might be a bit tricky. If there is a test case for this and you didn't
> > > run it, woops.
> >
> > There is no test coverage for the vacuum error callback context
> > currently (tests passed for me). I looked into how we might add such a
> > test. First, I investigated what kind of errors might occur during
> > heap_prune_satisfies_vacuum(). Some of the multixact functions called
> > by HeapTupleSatisfiesVacuumHorizon() could error out -- for example
> > GetMultiXactIdMembers(). It seems difficult to trigger the errors in
> > GetMultiXactIdMembers(), as we would have to cause wraparound. It
> > would be even more difficult to ensure that we hit those specific
> > errors from a call stack containing heap_prune_satisfies_vacuum(). As
> > such, I'm not sure I can think of a way to protect future developers
> > from repeating my mistake--apart from improving the comment like you
> > mentioned.
>
> 004_verify_heapam.pl has some tests that intentionally corrupt pages
> and then use pg_amcheck to detect the corruption. Such an approach
> could probably also be used here. But it's a pain to get such tests
> right, because any change to the page format due to endianness,
> different block size, or whatever can make carefully-written tests go
> boom.

Cool! I hadn't examined how these tests work until now. I took a stab
at writing a test in the existing 0004_verify_heapam.pl. The simplest
thing would be if we could just vacuum the corrupted table ("test")
after running pg_amcheck and compare the error context to our
expectation. I found that this didn't work, though. In an assert
build, vacuum trips an assert before it hits an error while vacuuming
a corrupted tuple in the "test" table. There might be a way of
modifying the existing test code to avoid this, but I tried the next
easiest thing -- corrupting a tuple in the other existing table in the
file, "junk". This is possible to do, but we have to repeat a lot of
the setup code done for the "test" table to get the line pointer array
and loop through and corrupt a tuple. In order to do this well, I
would want to refactor some of the boilerplate into a function. There
are other fiddly bits as well that I needed to consider. I think a
test like this could be useful coverage of some of the possible
errors 

Re: Possibility to disable `ALTER SYSTEM`

2023-09-07 Thread Joe Conway

On 9/7/23 15:51, Gabriele Bartolini wrote:
I would like to propose a patch that allows administrators to disable 
`ALTER SYSTEM` via either a runt-time option to pass to the Postgres 
server process at startup (e.g. `--disable-alter-system=true`, false by 
default) or a new GUC (or even both), without changing the current 
default method of the server.


Without trying to debate the particulars, a huge +1 for the concept of 
allowing ALTER SYSTEM to be disabled. FWIW I would vote the same for 
control over COPY PROGRAM.
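
For anyone not familiar with it, COPY ... PROGRAM lets a superuser (or a role
granted pg_execute_server_program) run arbitrary shell commands from SQL,
which is why being able to lock it down is attractive. A sketch of what it
permits today:

```
-- runs a shell command on the database server (illustration only)
COPY (SELECT * FROM pg_shadow) TO PROGRAM 'gzip > /tmp/shadow.gz';
```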


Not coincidentally both concepts were built into set_user: 
https://github.com/pgaudit/set_user


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Document that server will start even if it's unable to open some TCP/IP ports

2023-09-07 Thread Nathan Bossart
Thanks for picking this up.

On Thu, Sep 07, 2023 at 03:33:57PM -0400, Bruce Momjian wrote:
>   The default value is  class="systemname">localhost,
>   which allows only local TCP/IP loopback connections 
> to be
> - made.  While client authentication ( + made.  If listen_addresses is not empty,
> + the server will start if it can open a port
> + on at least one TCP/IP address.  A warning will be emitted for
> + any TCP/IP address which cannot be opened.

I think we should move this sentence to before the sentence about the
default value.  That way, "If the list is empty, ..." is immediately
followed by "If the list is not empty, ..."

IMO the phrase "open a port" is kind of nonstandard.  I think we should say
something along the lines of

If listen_addresses is not empty, the server will start only if it can
listen on at least one of the specified addresses.  A warning will be
emitted for any addresses that the server cannot listen on.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Possibility to disable `ALTER SYSTEM`

2023-09-07 Thread Tom Lane
Gabriele Bartolini  writes:
> I would like to propose a patch that allows administrators to disable
> `ALTER SYSTEM` via either a runt-time option to pass to the Postgres server
> process at startup (e.g. `--disable-alter-system=true`, false by default)
> or a new GUC (or even both), without changing the current default method of
> the server.

ALTER SYSTEM is already heavily restricted.  I don't think we need random
kluges added to the permissions system.  I especially don't believe in
kluges to the effect of "superuser doesn't have all permissions anymore".

If you nonetheless feel that that's a good idea for your use case,
you can implement the restriction with an event trigger or the like.

regards, tom lane




Re: Possibility to disable `ALTER SYSTEM`

2023-09-07 Thread Gabriele Bartolini
Hi Joe,

On Thu, 7 Sept 2023 at 21:57, Joe Conway  wrote:

> Without trying to debate the particulars, a huge +1 for the concept of
> allowing ALTER SYSTEM to be disabled. FWIW I would vote the same for
> control over COPY PROGRAM.
>

Oh ... another one of my favourite security friendly features! :)

That sounds like a good idea to me.

Thanks,
Gabriele
-- 
Gabriele Bartolini
Vice President, Cloud Native at EDB
enterprisedb.com


Possibility to disable `ALTER SYSTEM`

2023-09-07 Thread Gabriele Bartolini
Hi everyone,

I would like to propose a patch that allows administrators to disable
`ALTER SYSTEM` via either a run-time option to pass to the Postgres server
process at startup (e.g. `--disable-alter-system=true`, false by default)
or a new GUC (or even both), without changing the current default method of
the server.

The main reason is that this would help improve the “security by default”
posture of Postgres in a Kubernetes/Cloud Native environment - and, in
general, in any environment on VMs/bare metal behind a configuration
management system in which changes should only be made in a declarative way
and versioned like Ansible Tower, to cite one.

Below you find some background information and the longer story behind this
proposal.

Sticking to the Kubernetes use case, I am primarily speaking on behalf of
the CloudNativePG open source operator (cloudnative-pg.io, of which I am
one of the maintainers). However, I am sure that this option could benefit
any operator for Postgres - an operator is the most common and recommended
way to run a complex application like a PostgreSQL database management
system inside Kubernetes.

In this case, the state of a PostgreSQL cluster (for example its number of
replicas, configuration, storage, etc.) is defined in a Custom Resource
Definition in the form of configuration, typically YAML, and the operator
works with Kubernetes to ensure that, at any moment, the requested Postgres
cluster matches the observed one. This is a very basic example in
CloudNativePG:
https://cloudnative-pg.io/documentation/current/samples/cluster-example.yaml

As I was mentioning above, in a Cloud Native environment it is expected
that workloads are secure by default. Without going into much detail, many
decisions have been made in that direction by operators for Postgres,
including CloudNativePG. The goal of this proposal is to provide a way to
ensure that changes to the PostgreSQL configuration in a Kubernetes
controlled Postgres cluster are allowed only through the Kubernetes API.

Basically, if you want to change an option for PostgreSQL, you need to
change the desired state in the Kubernetes resource, then Kubernetes will
converge (through the operator). In simple words, it’s like empowering the
operator to impersonate the PostgreSQL superuser.

However, given that we cannot force this use case, there could be roles
with the login+superuser privileges connecting to the PostgreSQL instance
and potentially “interfering” with the requested state defined in the
configuration by imperatively running “ALTER SYSTEM” commands.

For example: CloudNativePG has a fixed value for some GUCs in order to
manage a full HA cluster, including SSL, log, some WAL and replication
settings. While the operator eventually reconciles those settings, even the
temporary change of those settings in a cluster might be harmful. Think, for
example, of a user who, through `ALTER SYSTEM`, tries to change the WAL level
to minimal, or to change the log settings (we require CSV), potentially
creating issues for the underlying instance and cluster (potentially leaving
it in an unrecoverable state in the case of other, more invasive GUCs).

At the moment, a possible workaround is that `ALTER SYSTEM` can be blocked
by making the postgresql.auto.conf read only, but the returned message is
misleading and that’s certainly bad user experience (which is very
important in a cloud native environment):

```
postgres=# ALTER SYSTEM SET wal_level TO minimal;
ERROR:  could not open file "postgresql.auto.conf": Permission denied
```

For this reason, I would like to propose the option to be given to the
postgres process at startup, in order to be as minimally invasive as possible
(the operator could then start Postgres requesting `ALTER SYSTEM` to be
disabled). That’d be my preference at the moment, if possible.

Alternatively, or in addition, the introduction of a GUC to disable `ALTER
SYSTEM` altogether. This enables tuning this setting through configuration
at the Kubernetes level, only if the operators require it - without
damaging the rest of the users.
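
To make the idea concrete, the behaviour I have in mind would be something
like the following (the GUC/option name and the error text are purely
hypothetical at this point):

```
postgres=# ALTER SYSTEM SET wal_level TO minimal;
ERROR:  ALTER SYSTEM is currently disabled on this server
```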

Before I start writing any lines of code and propose a patch, I would like
first to understand if there’s room for it.

Thanks for your attention and … looking forward to your feedback!

Ciao,
Gabriele
--
Gabriele Bartolini
Vice President, Cloud Native at EDB
enterprisedb.com


Re: Document that server will start even if it's unable to open some TCP/IP ports

2023-09-07 Thread Bruce Momjian
On Mon, Jun 12, 2023 at 11:57:45PM -0700, Gurjeet Singh wrote:
> On Mon, Jun 12, 2023 at 10:59 PM Nathan Bossart
>  wrote:
> > On Sat, May 27, 2023 at 03:17:21PM -0700, Gurjeet Singh wrote:
> > > If listen_addresses is empty, then server won't try to open any TCP/IP
> > > ports. The patch does not change any language related to that.
> >
> > Your proposed change notes that the server only starts if it can listen on
> > at least one TCP/IP address, which I worry might lead folks to think that
> > the server won't start if listen_addresses is empty.
> 
> Perhaps we can prefix that statement with "If listen_addresses is not
> empty", like so:

I came up with a slightly modified doc patch, attached.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f0a50a5f9a..872dcdf325 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -659,7 +659,13 @@ include_dir 'conf.d'
  to it.
  The default value is localhost,
  which allows only local TCP/IP loopback connections to be
- made.  While client authentication (listen_addresses is not empty,
+ the server will start if it can open a port
+ on at least one TCP/IP address.  A warning will be emitted for
+ any TCP/IP address which cannot be opened.
+   
+   
+ While client authentication () allows fine-grained control
  over who can access the server, listen_addresses
  controls which interfaces accept connection attempts, which


Re: Eliminate redundant tuple visibility check in vacuum

2023-09-07 Thread Robert Haas
On Thu, Sep 7, 2023 at 3:10 PM Melanie Plageman
 wrote:
> I can fix it by changing the type of PruneResult->off_loc to be an
> OffsetNumber pointer. This does mean that PruneResult will be
> initialized partially by heap_page_prune() callers. I wonder if you
> think that undermines the case for making a new struct.

I think that it undermines the case for including that particular
argument in the struct. It's not really a Prune*Result* if the caller
initializes it in part. It seems fairly reasonable to still have a
PruneResult struct for the other stuff, though, at least to me. How do
you see it?

(One could also argue that this is a somewhat more byzantine way of
doing error reporting than would be desirable, but fixing that problem
doesn't seem too straightforward so perhaps it's prudent to leave it
well enough alone.)

> I still want to eliminate the NULL check of off_loc in
> heap_page_prune() by making it a required parameter. Even though
> on-access pruning does not have an error callback mechanism which uses
> the offset, it seems better to have a pointless local variable in
> heap_page_prune_opt() than to do a NULL check for every tuple pruned.

It doesn't seem important to me unless it improves performance. If
it's just stylistic, I don't object, but I also don't really see a
reason to care.

> > I haven't run the regression suite with 0001 applied. I'm guessing
> > that you did, and that they passed, which if true means that we don't
> > have a test for this, which is a shame, although writing such a test
> > might be a bit tricky. If there is a test case for this and you didn't
> > run it, woops.
>
> There is no test coverage for the vacuum error callback context
> currently (tests passed for me). I looked into how we might add such a
> test. First, I investigated what kind of errors might occur during
> heap_prune_satisfies_vacuum(). Some of the multixact functions called
> by HeapTupleSatisfiesVacuumHorizon() could error out -- for example
> GetMultiXactIdMembers(). It seems difficult to trigger the errors in
> GetMultiXactIdMembers(), as we would have to cause wraparound. It
> would be even more difficult to ensure that we hit those specific
> errors from a call stack containing heap_prune_satisfies_vacuum(). As
> such, I'm not sure I can think of a way to protect future developers
> from repeating my mistake--apart from improving the comment like you
> mentioned.

004_verify_heapam.pl has some tests that intentionally corrupt pages
and then use pg_amcheck to detect the corruption. Such an approach
could probably also be used here. But it's a pain to get such tests
right, because any change to the page format due to endianness,
different block size, or whatever can make carefully-written tests go
boom.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Should consider materializing the cheapest inner path in consider_parallel_nestloop()

2023-09-07 Thread Robert Haas
On Tue, Sep 5, 2023 at 8:07 AM Richard Guo  wrote:
> Yeah, this seems an omission in commit 45be99f8.

It's been a while, but I think I omitted this deliberately because I
didn't really understand the value of it and wanted to keep the
planning cost down.

The example query provided here seems rather artificial. Surely few
people write a join clause that references neither of the tables being
joined. Is there a more realistic case where this makes a big
difference?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Eliminate redundant tuple visibility check in vacuum

2023-09-07 Thread Melanie Plageman
On Thu, Sep 7, 2023 at 1:37 PM Robert Haas  wrote:
>
> On Wed, Sep 6, 2023 at 5:21 PM Melanie Plageman
>  wrote:
> > Yeah, I think this is a fair concern. I have addressed it in the
> > attached patches.
> >
> > I thought a lot about whether or not adding a PruneResult which
> > contains only the output parameters and result of heap_page_prune() is
> > annoying since we have so many other *Prune* data structures. I
> > decided it's not annoying. In some cases, four outputs don't merit a
> > new structure. In this case, I think it declutters the code a bit --
> > independent of any other patches I may be writing :)
>
> I took a look at 0001 and I think that it's incorrect. In the existing
> code, *off_loc gets updated before each call to
> heap_prune_satisfies_vacuum(). This means that if an error occurs in
> heap_prune_satisfies_vacuum(), *off_loc will as of that moment contain
> the relevant offset number. In your version, the relevant offset
> number will only be stored in some local structure to which the caller
> doesn't yet have access. The difference is meaningful. lazy_scan_prune
> passes off_loc as vacrel->offnum, which means that if an error
> happens, vacrel->offnum will have the right value, and so when the
> error context callback installed by heap_vacuum_rel() fires, namely
> vacuum_error_callback(), it can look at vacrel->offnum and know where
> the error happened. With your patch, I think that would no longer
> work.

You are right. That is a major problem. Thank you for finding that. I
was able to confirm the breakage by patching in an error to
heap_page_prune() after we have set off_loc and confirming that the
error context has the offset in master and is missing it with my patch
applied.

I can fix it by changing the type of PruneResult->off_loc to be an
OffsetNumber pointer. This does mean that PruneResult will be
initialized partially by heap_page_prune() callers. I wonder if you
think that undermines the case for making a new struct.

I still want to eliminate the NULL check of off_loc in
heap_page_prune() by making it a required parameter. Even though
on-access pruning does not have an error callback mechanism which uses
the offset, it seems better to have a pointless local variable in
heap_page_prune_opt() than to do a NULL check for every tuple pruned.

> I haven't run the regression suite with 0001 applied. I'm guessing
> that you did, and that they passed, which if true means that we don't
> have a test for this, which is a shame, although writing such a test
> might be a bit tricky. If there is a test case for this and you didn't
> run it, woops.

There is no test coverage for the vacuum error callback context
currently (tests passed for me). I looked into how we might add such a
test. First, I investigated what kind of errors might occur during
heap_prune_satisfies_vacuum(). Some of the multixact functions called
by HeapTupleSatisfiesVacuumHorizon() could error out -- for example
GetMultiXactIdMembers(). It seems difficult to trigger the errors in
GetMultiXactIdMembers(), as we would have to cause wraparound. It
would be even more difficult to ensure that we hit those specific
errors from a call stack containing heap_prune_satisfies_vacuum(). As
such, I'm not sure I can think of a way to protect future developers
from repeating my mistake--apart from improving the comment like you
mentioned.

- Melanie




Re: GUC for temporarily disabling event triggers

2023-09-07 Thread Robert Haas
On Thu, Sep 7, 2023 at 1:57 AM Michael Paquier  wrote:
> +   GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
>
> I am a bit surprised by these two additions.  Setting this GUC at
> file-level can be useful, as is documenting it in the control file if
> it provides some control of how a statement behaves, no?

Yeah, I don't think these options should be used.

> +   Allow temporarily disabling execution of event triggers in order to
> +   troubleshoot and repair faulty event triggers. All event triggers will
> +   be disabled by setting it to true. Setting the 
> value
> +   to false will not disable any event triggers, this
> +   is the default value. Only superusers and users with the appropriate
> +   SET privilege can change this setting.
>
> Event triggers are disabled if setting this GUC to false, while true,
> the default, allows event triggers.  The values are reversed in this
> description.

Woops.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: A minor adjustment to get_cheapest_path_for_pathkeys

2023-09-07 Thread Robert Haas
On Thu, Sep 7, 2023 at 5:32 AM Aleksander Alekseev
 wrote:
> v2 LGTM.

Committed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: MergeJoin beats HashJoin in the case of multiple hash clauses

2023-09-07 Thread Bruce Momjian


Does anyone else have an opinion on this patch?  It looks promising.

---

On Wed, Jun 28, 2023 at 04:53:06PM +0300, Alena Rybakina wrote:
> Hi!
> 
> On 15.06.2023 11:30, Andrey Lepikhov wrote:
> 
> Hi, all.
> 
> Some of my clients use JOINs with three or four clauses. Quite frequently,
> I see complaints about an unreasonable switch of the JOIN algorithm to
> Merge Join instead of Hash Join. Quick research has shown one weak place -
> the estimation of the average bucket size in final_cost_hashjoin (see
> q2.sql in the attachment), which uses a very conservative strategy.
> Unlike the estimation of groups, here we use the smallest ndistinct value
> across all buckets instead of multiplying them (or trying to do
> multivariate analysis).
> It works fine for the case of one clause. But if we have many clauses, and
> if each has a high ndistinct value, we will overestimate the average size
> of a bucket and, as a result, prefer to use Merge Join. As the example in
> the attachment shows, this leads to a worse plan than possible, sometimes
> drastically worse.
> I assume this is done out of fear of functional dependencies between hash
> clause components. But in my opinion, here we should go the same way as in
> the estimation of groups.
> The attached patch shows a sketch of the solution.
> 
> 
> This problem is very important.
> 
> Honestly, I'm still learning your code and looking for cases in which your
> patch changes things for the worse or for the better. But I have already
> found something that seemed interesting to me. I have found several other
> interesting cases where your patch can solve some problem and choose a more
> correct plan, but with a focus on memory consumption.
> 
> To make it easier to evaluate, I added a hook to your patch that makes it
> easier to switch between your way and the original way of estimating the
> size of buckets (diff_estimate.diff).
> 
> Here are other cases where your fix improves the query plan.
> 
> 
> First of all, I changed the way the tables are created, to look at the
> behavior of the query plan in terms of planning and execution time:
> 
> DROP TABLE IF EXISTS a,b CASCADE;
> CREATE TABLE a AS
>   SELECT ((3*gs) % 300) AS x, ((3*gs+1) % 300) AS y, ((3*gs+2) % 300) AS z
>   FROM generate_series(1,1e5) AS gs;
> CREATE TABLE b AS
>   SELECT gs % 90 AS x, gs % 49 AS y, gs %100 AS z, 'abc' || gs AS payload
>   FROM generate_series(1,1e5) AS gs;
> ANALYZE a,b;
> 
> SET enable_cost_size = 'on';
> EXPLAIN ANALYZE
> SELECT * FROM a,b
> WHERE a.x=b.x AND a.y=b.y AND a.z=b.z;
> 
> SET enable_cost_size = 'off';
> EXPLAIN ANALYZE
> SELECT * FROM a,b
> WHERE a.x=b.x AND a.y=b.y AND a.z=b.z;
> 
> 
>     QUERY PLAN     
> ---
>  Hash Join (actual time=200.872..200.879 rows=0 loops=1)
>    Hash Cond: ((b.x = a.x) AND (b.y = a.y) AND (b.z = a.z))
>    ->  Seq Scan on b (actual time=0.029..15.946 rows=10 loops=1)
>    ->  Hash (actual time=97.645..97.649 rows=10 loops=1)
>  Buckets: 131072  Batches: 1  Memory Usage: 5612kB
>  ->  Seq Scan on a (actual time=0.024..17.153 rows=10 loops=1)
>  Planning Time: 2.910 ms
>  Execution Time: 201.949 ms
> (8 rows)
> 
> SET
>     QUERY PLAN     
> ---
>  Merge Join (actual time=687.415..687.416 rows=0 loops=1)
>    Merge Cond: ((b.y = a.y) AND (b.x = a.x) AND (b.z = a.z))
>    ->  Sort (actual time=462.022..536.716 rows=10 loops=1)
>  Sort Key: b.y, b.x, b.z
>  Sort Method: external merge  Disk: 3328kB
>  ->  Seq Scan on b (actual time=0.017..12.326 rows=10 loops=1)
>    ->  Sort (actual time=111.295..113.196 rows=16001 loops=1)
>  Sort Key: a.y, a.x, a.z
>  Sort Method: external sort  Disk: 2840kB
>  ->  Seq Scan on a (actual time=0.020..10.129 rows=10 loops=1)
>  Planning Time: 0.752 ms
>  Execution Time: 688.829 ms
> (12 rows)
> 
> Secondly, I found another case that is not about the planner preferring
> merge join over hash join, but where we can see that the plan has become
> better due to lower memory consumption, and planning also takes less time.
> 
> Here, with the same query, the planning time was reduced by a factor of 5,
> and the number of buckets by a factor of 128, so memory consumption also
> decreased:
> 
> DROP TABLE IF EXISTS a,b CASCADE;
> 
> CREATE TABLE a AS
>   SELECT ((3*gs) % 300) AS x, ((3*gs+1) % 300) AS y, ((3*gs+2) % 300) AS z
>   FROM generate_series(1,600) AS gs;
> CREATE TABLE b AS
>   SELECT gs % 90 AS x, gs % 49 AS y, gs %100 AS z, 'abc' || gs AS payload
>   FROM generate_series(1,1e5) AS gs;
> 

Re: pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption?

2023-09-07 Thread Bruce Momjian
On Mon, Jul 10, 2023 at 02:37:24PM -0700, Nikolay Samokhvalov wrote:
> Maybe. It will require changes in other parts of this doc.
> Thinking (here: 
> https://gitlab.com/postgres/postgres/-/merge_requests/18/diffs)
> 
> Meanwhile, attached is v2
> 
> thanks for the comments

I looked over this issue thoroughly and I think I see the cause of the
confusion.  In step 8 we say:

8. Stop both servers
Streaming replication and log-shipping standby servers can remain
   ---
running until a later step.

Of course this has to be "must" and it would be good to explain why,
which I have done in the attached patch.

Secondly, in step 9 we say "verify the LSNs", but have a parenthetical
sentence that explains why they might not match:

(There will be a mismatch if old standby servers were shut down before
the old primary or if the old standby servers are still running.)

People might take that to mean that it is okay if this is the reason
they don't match, which is incorrect.  Better to tell them to keep the
streaming replication and log-shipping servers running so we don't need
that sentence.

The instructions are already long so I am hesitant to add more text
without a clear purpose.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..f9c16a5cdb 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -380,8 +380,8 @@ NET STOP postgresql-
 
 
 
- Streaming replication and log-shipping standby servers can
- remain running until a later step.
+ Streaming replication and log-shipping standby servers must be
+ running during this shutdown so they receive all changes.
 

 
@@ -394,8 +394,6 @@ NET STOP postgresql-
  servers are caught up by running pg_controldata
  against the old primary and standby clusters.  Verify that the
  Latest checkpoint location values match in all clusters.
- (There will be a mismatch if old standby servers were shut down
- before the old primary or if the old standby servers are still running.)
  Also, make sure wal_level is not set to
  minimal in the postgresql.conf file on the
  new primary cluster.


Re: Wrong command name in writeable-CTE related error messages

2023-09-07 Thread Markus Winand

> On 23.05.2023, at 19:40, Tom Lane  wrote:
> 
> Markus Winand  writes:
>> I noticed that errors due to writable CTEs in read-only or non-volatile 
>> context say the offensive command is SELECT.
> 
> Good point.
> 
>> My first thought was that these error messages should mention INSERT, but 
>> after looking into the source I’m not sure anymore. The name of the command 
>> is obtained from CreateCommandName(). After briefly looking around it 
>> doesn’t seem to be trivial to introduce something along the line of 
>> CreateModifyingCommandName().
> 
> Yeah, you would have to inspect the plan tree pretty carefully to
> determine that.
> 
> Given the way the test is written, maybe it'd make sense to forget about
> mentioning the command name, and instead identify the table we are
> complaining about:
> 
> ERROR: table "foo" cannot be modified in a read-only transaction

Attached patch takes the active form:

cannot modify table "foo" in a read-only transaction

It obtains the table name by searching rtable for an RTE_RELATION with 
rellockmode == RowExclusiveLock. Not sure if there are any cases where that 
falls apart.
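
To make the change concrete, this is the kind of exchange the patch aims for
(sketch only; the new wording is the one shown above):

```
-- assumes a table foo(i int) exists
BEGIN TRANSACTION READ ONLY;
WITH w AS (INSERT INTO foo VALUES (1) RETURNING *)
SELECT * FROM w;
-- before: ERROR:  cannot execute SELECT in a read-only transaction
-- after:  ERROR:  cannot modify table "foo" in a read-only transaction
ROLLBACK;
```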

> I don't see any huge point in using PreventCommandIfReadOnly if we
> go that way, so no refactoring is needed: just test XactReadOnly
> directly.

As there are several places where this is needed, the patch introduces some 
utility functions.

More interestingly, I found that BEGIN ATOMIC bodies of non-volatile functions 
happily accept data-modifying statements and FOR UPDATE. While they fail at 
runtime, it was my expectation that this would be caught at CREATE time. The 
attached patch also takes care of this by walking the Query tree and looking 
for resultRelation and hasForUpdate — assuming that non-volatile functions 
should have neither. Let me know if this is desired behavior or not.

-markus


0001-immutable-stable-create-time-validation.patch
Description: Binary data


Re: Eliminate redundant tuple visibility check in vacuum

2023-09-07 Thread Robert Haas
On Wed, Sep 6, 2023 at 5:21 PM Melanie Plageman
 wrote:
> Yeah, I think this is a fair concern. I have addressed it in the
> attached patches.
>
> I thought a lot about whether or not adding a PruneResult which
> contains only the output parameters and result of heap_page_prune() is
> annoying since we have so many other *Prune* data structures. I
> decided it's not annoying. In some cases, four outputs don't merit a
> new structure. In this case, I think it declutters the code a bit --
> independent of any other patches I may be writing :)

I took a look at 0001 and I think that it's incorrect. In the existing
code, *off_loc gets updated before each call to
heap_prune_satisfies_vacuum(). This means that if an error occurs in
heap_prune_satisfies_vacuum(), *off_loc will as of that moment contain
the relevant offset number. In your version, the relevant offset
number will only be stored in some local structure to which the caller
doesn't yet have access. The difference is meaningful. lazy_scan_prune
passes off_loc as vacrel->offnum, which means that if an error
happens, vacrel->offnum will have the right value, and so when the
error context callback installed by heap_vacuum_rel() fires, namely
vacuum_error_callback(), it can look at vacrel->offnum and know where
the error happened. With your patch, I think that would no longer
work.
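
To make the pattern concrete, this is roughly the shape being defended (a
sketch only; check_tuple_sketch stands in for heap_prune_satisfies_vacuum()
and is not a real function). The callee keeps *off_loc pointing at whatever
offset it is currently processing, so the error context callback installed
by heap_vacuum_rel() can report it if an ERROR is raised, and the value is
only reset on the way out:

static void
prune_items_sketch(Page page, OffsetNumber maxoff, OffsetNumber *off_loc)
{
	OffsetNumber offnum;

	for (offnum = FirstOffsetNumber; offnum <= maxoff; offnum++)
	{
		/* publish the offset before doing work that can error out */
		*off_loc = offnum;
		check_tuple_sketch(page, offnum);	/* may elog(ERROR) */
	}

	/* nothing is being processed anymore */
	*off_loc = InvalidOffsetNumber;
}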

I haven't run the regression suite with 0001 applied. I'm guessing
that you did, and that they passed, which if true means that we don't
have a test for this, which is a shame, although writing such a test
might be a bit tricky. If there is a test case for this and you didn't
run it, woops. This is also why I think it's *extremely* important for
the header comment of a function that takes pointer parameters to
document the semantics of those pointers. Normally they are in
parameters or out parameters or in-out parameters, but here it's
something even more complicated. The existing header comment says
"off_loc is the offset location required by the caller to use in error
callback," which I didn't really understand until I actually looked at
what the code is doing, so I consider that somebody could've done a
better job writing this comment, but in theory you could've also
noticed that, at least AFAICS, there's no way for the function to
return with *off_loc set to anything other than InvalidOffsetNumber.
That means that the statements which set *off_loc to other values must
have some other purpose.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] pgrowlocks: Make mode names consistent with docs

2023-09-07 Thread Bruce Momjian
On Fri, Jun 30, 2023 at 10:45:57AM -0500, David Cook wrote:
> I noticed that pgrowlocks will use different names for shared locks depending
> on whether the locks are intermediated by a multixact or not. Particularly, if
> a single transaction has locked a row, it may return "For Key Share" or "For
> Share" in the "modes" array, while if multiple transactions have locked a row,
> it may return "Key Share" or "Share". The documentation of the pgrowlocks
> function only lists "Key Share" and "Share" as possible modes. (The four
> exclusive lock modes use the correct names in both cases)
> 
> The attached patch (against the master branch) fixes this discrepancy, by 
> using
> "Key Share" and "Share" in the single transaction case, since that matches the
> documentation. I also updated the test's expected output so it passes again.

You are right something is wrong.  However, I looked at your patch and I
am thinking we need to go the other way and add "For" in the upper
block, rather than removing it in the lower one.  I have two reasons. 
Looking at the code block:

		case MultiXactStatusUpdate:
			snprintf(buf, NCHARS, "Update");
			break;
		case MultiXactStatusNoKeyUpdate:
			snprintf(buf, NCHARS, "No Key Update");
			break;
		case MultiXactStatusForUpdate:
			snprintf(buf, NCHARS, "For Update");
			break;
		case MultiXactStatusForNoKeyUpdate:
			snprintf(buf, NCHARS, "For No Key Update");
			break;
		case MultiXactStatusForShare:
			snprintf(buf, NCHARS, "Share");
			break;
		case MultiXactStatusForKeyShare:
			snprintf(buf, NCHARS, "Key Share");
			break;

You will notice there are "For" and non-"For" versions of "Update" and
"No Key Update", and that "For" appears in the output strings for the "For"
macro versions of update.  However, "For" does not appear in the output for
the "Share" and "Key Share" cases, even though their macros have "For" in
their names.

Second, notice that the "For" and non-"For" output strings do match the
macros in the block below, which suggests it is correct, and the non-"For"
block comes later:

	values[Atnum_modes] = palloc(NCHARS);
	if (infomask & HEAP_XMAX_LOCK_ONLY)
	{
		if (HEAP_XMAX_IS_SHR_LOCKED(infomask))
			snprintf(values[Atnum_modes], NCHARS, "{For Share}");
		else if (HEAP_XMAX_IS_KEYSHR_LOCKED(infomask))
			snprintf(values[Atnum_modes], NCHARS, "{For Key Share}");
		else if (HEAP_XMAX_IS_EXCL_LOCKED(infomask))
		{
			if (tuple->t_data->t_infomask2 & HEAP_KEYS_UPDATED)
				snprintf(values[Atnum_modes], NCHARS, "{For Update}");
			else
				snprintf(values[Atnum_modes], NCHARS, "{For No Key Update}");
		}
		else
			/* neither keyshare nor exclusive bit it set */
			snprintf(values[Atnum_modes], NCHARS,
					 "{transient upgrade status}");
	}
	else
	{
		if (tuple->t_data->t_infomask2 & HEAP_KEYS_UPDATED)
			snprintf(values[Atnum_modes], NCHARS, "{Update}");
		else
			snprintf(values[Atnum_modes], NCHARS, "{No Key Update}");
	}

I therefore suggest this attached patch, which should be marked as an
incompatibility in PG 17.
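
Concretely, the direction suggested above amounts to making the two Share
cases in the upper block spell out the "For" as well, along these lines (a
sketch of the intent, not the attached patch itself):

	case MultiXactStatusForShare:
		snprintf(buf, NCHARS, "For Share");
		break;
	case MultiXactStatusForKeyShare:
		snprintf(buf, NCHARS, "For Key Share");
		break;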

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/contrib/pgrowlocks/expected/pgrowlocks.out b/contrib/pgrowlocks/expected/pgrowlocks.out
index 725467266a..77431bfe18 100644
--- a/contrib/pgrowlocks/expected/pgrowlocks.out
+++ b/contrib/pgrowlocks/expected/pgrowlocks.out
@@ -136,10 +136,10 @@ a|b
 (2 rows)
 
 step s2_rowlocks: SELECT locked_row, multi, modes FROM pgrowlocks('multixact_conflict');
-locked_row|multi|modes  
---+-+---
-(0,1) |t|{"Key Share",Share}
-(0,2) |t|{"Key Share",Share}
+locked_row|multi|modes
+--+-+-
+(0,1) |t|{"For Key Share","For Share"}
+(0,2) |t|{"For Key Share","For Share"}
 (2 rows)
 
 step s1_commit: COMMIT;
@@ -161,10 +161,10 @@ a|b
 (2 rows)
 
 step s2_rowlocks: SELECT locked_row, multi, modes FROM pgrowlocks('multixact_conflict');
-locked_row|multi|modes
---+-+-
-(0,1) |t|{"Key Share","For No Key Update"}
-(0,2) |t|{"Key Share","For No Key Update"}
+locked_row|multi|modes
+--+-+-
+(0,1) |t|{"For Key Share","For No Key Update"}
+(0,2) |t|{"For Key Share","For No Key Update"}
 (2 rows)
 
 step s1_commit: COMMIT;
@@ -186,10 +186,10 @@ a|b
 (2 rows)
 
 step s2_rowlocks: SELECT locked_row, multi, modes FROM pgrowlocks('multixact_conflict');
-locked_row|multi|modes 
---+-+--
-(0,1) |t|{"Key Share","For Update"}
-(0,2) |t|{"Key Share","For Update"}

Re: Simple CustomScan example

2023-09-07 Thread Drouvot, Bertrand

Hi,

On 9/6/23 9:32 PM, Chris Cleveland wrote:

I'm trying to write a custom scan. It's pretty confusing. I've read the 
documentation at
https://www.postgresql.org/docs/current/custom-scan.html 
, and I've scanned the
code in Citus Columnar and in Timescale, both of which are quite complex.

Is anyone aware of code with a simple example of a custom scan? 


You may find the one in this repo: https://github.com/bdrouvot/pg_directpaths 
simple enough.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Build the docs if there are changes in docs and don't run other tasks if the changes are only in docs

2023-09-07 Thread Nazir Bilal Yavuz
Hi,

There is an old work item about building the docs only if there are changes
in the docs, and otherwise not building them. I wanted to add to that
idea: if the changes are only in the docs, don't run any task except the
docs-building task; this could help save more CI time. I attached two
patches.

I assumed that docs-related changes are limited to changes under the doc
folder, but I am not really sure about that.

v1-0001-Only-built-the-docs-if-there-are-changes-are-in-t.patch:
This patch creates another task named 'Building the Docs' and moves the
docs-building script from the 'CompilerWarnings' task to this new task.
The new task only runs if there are changes in the docs
(under doc/**) or in the CI files ('.cirrus.yml',
'.cirrus.tasks.yml') and if a specific OS is not requested.

v1-0002-Just-run-the-Build-the-Docs-task-if-the-changes-a.patch:
This patch adds that: if the changes are *only* in the docs (under the
doc/**), *only* run building the docs task.

As a summary:
1- If the changes are not in the docs: don't run the build-the-docs task.
2- If the changes are in the docs or in the CI files: run the build-the-docs task.
3- If the changes are only in the docs: only run the build-the-docs task.
4- If 'ci-os-only:' is set (there could be changes in the docs): don't
run the build-the-docs task.

Any kind of feedback would be appreciated.

Regards,
Nazir Bilal Yavuz
Microsoft
From 43e18ed9eb328b4703ad3d8cb95d7d4b6140db89 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Thu, 7 Sep 2023 17:58:06 +0300
Subject: [PATCH v1 1/2] Only build the docs if there are changes in the
 docs

Building the docs was triggered although there were no changes in the docs.
So, a new 'Building the Docs' task is created. This task only
runs if a specific OS is not requested and if there are changes in the docs
or in the CI files.
---
 .cirrus.tasks.yml | 67 ---
 1 file changed, 52 insertions(+), 15 deletions(-)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index e137769850d..21e276beb3b 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -740,21 +740,6 @@ task:
   make -s -j${BUILD_JOBS} clean
   time make -s -j${BUILD_JOBS} world-bin
 
-  ###
-  # Verify docs can be built
-  ###
-  # XXX: Only do this if there have been changes in doc/ since last build
-  always:
-docs_build_script: |
-  time ./configure \
---cache gcc.cache \
-CC="ccache gcc" \
-CXX="ccache g++" \
-CLANG="ccache clang"
-  make -s -j${BUILD_JOBS} clean
-  time make -s -j${BUILD_JOBS} -C doc
-
-  ###
   # Verify headerscheck / cpluspluscheck succeed
   #
   # - Don't use ccache, the files are uncacheable, polluting ccache's
@@ -777,3 +762,55 @@ task:
 
   always:
 upload_caches: ccache
+
+
+task:
+  name: Build the Docs
+  depends_on: SanityCheck
+  # Only run if a specific OS is not requested and if there are changes in docs
+  # or in the CI files.
+  skip: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:.*' || !changesInclude('doc/**', '.cirrus.yml', '.cirrus.tasks.yml')
+
+  env:
+CPUS: 4
+BUILD_JOBS: 4
+IMAGE_FAMILY: pg-ci-bullseye
+CCACHE_MAXSIZE: "10M"
+CCACHE_DIR: "/tmp/ccache_dir"
+
+LINUX_CONFIGURE_FEATURES: *LINUX_CONFIGURE_FEATURES
+LINUX_MESON_FEATURES: *LINUX_MESON_FEATURES
+
+  <<: *linux_task_template
+
+  sysinfo_script: |
+id
+uname -a
+cat /proc/cmdline
+ulimit -a -H && ulimit -a -S
+gcc -v
+clang -v
+export
+
+  ccache_cache:
+folder: $CCACHE_DIR
+
+  setup_additional_packages_script: |
+#apt-get update
+#DEBIAN_FRONTEND=noninteractive apt-get -y install ...
+
+  setup_script: echo "COPT=-Werror" > src/Makefile.custom
+
+  # Verify docs can be built
+  always:
+docs_build_script: |
+  time ./configure \
+--cache gcc.cache \
+CC="ccache gcc" \
+CXX="ccache g++" \
+CLANG="ccache clang"
+  make -s -j${BUILD_JOBS} clean
+  time make -s -j${BUILD_JOBS} -C doc
+
+  always:
+upload_caches: ccache
-- 
2.40.1

From d82eb7349924f6a2a32b05435086a91a0c767796 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Thu, 7 Sep 2023 17:10:45 +0300
Subject: [PATCH v1 2/2] Just run the Build the Docs task if the changes are
 only in docs

If the changes are only in the docs, skip all tasks except 'Build the
Docs' task.
---
 .cirrus.tasks.yml | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index 21e276beb3b..4ccf75cefda 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -3,6 +3,11 @@
 # For instructions on how to enable the CI integration in a repository and
 # further details, see src/tools/ci/README
 
+# When changes are only in docs, skip all tasks except
+# 'Build the Docs' task.
+#
+# This skip is overridden in 'SanityCheck' and 'Build the Docs' task.
+skip: changesIncludeOnly('doc/**')
 
 env:
   # The lower depth accelerates 

Re: Fix bogus Asserts in calc_non_nestloop_required_outer

2023-09-07 Thread Aleksander Alekseev
Hi Richard,

> As stated in [1], all paths arriving here are parameterized by top
> parents, so we should check against top_parent_relids if it exists in
> the two Asserts.
>
> Attached is a patch fixing that.

It's probably just because of my limited experience with the optimizer,
but I don't find the proposed change particularly straightforward. I
would suggest adding a comment before the Asserts; a detailed
commit message would also be helpful.

Other than that I can confirm that both branches in the Assert's are
executed and the tests pass in different test environments.

--
Best regards,
Aleksander Alekseev




Re: Output affected rows in EXPLAIN

2023-09-07 Thread Damir Belyalov
> This creates a bug, not fixes one.  It's intentional that "insert into a"
> is shown as returning zero rows, because that's what it did.  If you'd
> written "insert ... returning", you'd have gotten a different result:
>
Maybe I didn't understand you correctly, but I didn't touch the number of
affected rows in the EXPLAIN output.
It's just a simple patch that adds one line after the commands EXPLAIN
INSERT, EXPLAIN UPDATE, and EXPLAIN DELETE.
It was done because the commands INSERT/UPDATE/DELETE report such a line
after execution: "UPDATE 7" or "INSERT 0 4".
EXPLAIN (ANALYZE) INSERT/UPDATE/DELETE does the same thing as these
commands, but doesn't output this line. So I added it.


Patch is fixed. There is no row "EXPLAIN" in queries like:
postgres=# explain (analyze) select * from t;
  QUERY PLAN
---
 Seq Scan on t  (cost=0.00..35.50 rows=2550 width=4) (actual
time=0.064..0.075 rows=5 loops=1)
 Planning Time: 1.639 ms
 Execution Time: 0.215 ms
(3 rows)

EXPLAIN


What about EXPLAIN INSERT/UPDATE/DELETE queries without ANALYZE?
Now they output a tag with 0 affected (inserted) rows at the end:
"INSERT 0 0", "UPDATE 0". Example:
explain update a set n = 2;
 QUERY PLAN

 Update on a  (cost=0.00..35.50 rows=0 width=0)
   ->  Seq Scan on a  (cost=0.00..35.50 rows=2550 width=10)
(2 rows)

UPDATE 0

Regards,
Damir Belyalov
Postgres Professional
From c6cbc6fa9ddf24f29bc19ff115224dd76e351db1 Mon Sep 17 00:00:00 2001
From: Damir Belyalov 
Date: Tue, 5 Sep 2023 15:04:01 +0300
Subject: [PATCH 1/2] Output affected rows in EXPLAIN.

---
 src/backend/commands/explain.c | 10 +-
 src/backend/tcop/cmdtag.c  |  2 +-
 src/backend/tcop/pquery.c  |  8 +++-
 src/backend/tcop/utility.c | 27 ++-
 src/bin/psql/common.c  |  5 +++--
 src/include/commands/explain.h |  3 ++-
 src/include/tcop/cmdtag.h  |  1 +
 src/include/tcop/cmdtaglist.h  |  3 +++
 src/interfaces/libpq/fe-exec.c |  4 +++-
 9 files changed, 55 insertions(+), 8 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8570b14f62..453e545ba5 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -162,7 +162,7 @@ static void escape_yaml(StringInfo buf, const char *str);
  */
 void
 ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
-			 ParamListInfo params, DestReceiver *dest)
+			 ParamListInfo params, DestReceiver *dest, uint64 *processed)
 {
 	ExplainState *es = NewExplainState();
 	TupOutputState *tstate;
@@ -173,6 +173,9 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	bool		timing_set = false;
 	bool		summary_set = false;
 
+	if (processed)
+		*processed = 0;
+
 	/* Parse options list. */
 	foreach(lc, stmt->options)
 	{
@@ -311,6 +314,9 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	end_tup_output(tstate);
 
 	pfree(es->str->data);
+
+	if (processed)
+		*processed = es->es_processed;
 }
 
 /*
@@ -649,6 +655,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	 */
 	INSTR_TIME_SET_CURRENT(starttime);
 
+	es->es_processed += queryDesc->estate->es_processed;
+
 	ExecutorEnd(queryDesc);
 
 	FreeQueryDesc(queryDesc);
diff --git a/src/backend/tcop/cmdtag.c b/src/backend/tcop/cmdtag.c
index 4bd713a0b4..9e6fdbd8af 100644
--- a/src/backend/tcop/cmdtag.c
+++ b/src/backend/tcop/cmdtag.c
@@ -146,7 +146,7 @@ BuildQueryCompletionString(char *buff, const QueryCompletion *qc,
 	 */
 	if (command_tag_display_rowcount(tag) && !nameonly)
 	{
-		if (tag == CMDTAG_INSERT)
+		if (tag == CMDTAG_INSERT || tag == CMDTAG_EXPLAIN_INSERT)
 		{
 			*bufp++ = ' ';
 			*bufp++ = '0';
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5565f200c3..ba0b33cc67 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -775,7 +775,13 @@ PortalRun(Portal portal, long count, bool isTopLevel, bool run_once,
 if (qc && portal->qc.commandTag != CMDTAG_UNKNOWN)
 {
	CopyQueryCompletion(qc, &portal->qc);
-	qc->nprocessed = nprocessed;
+	if (portal->qc.commandTag == CMDTAG_EXPLAIN ||
+		portal->qc.commandTag == CMDTAG_EXPLAIN_INSERT ||
+		portal->qc.commandTag == CMDTAG_EXPLAIN_UPDATE ||
+		portal->qc.commandTag == CMDTAG_EXPLAIN_DELETE)
+		qc->nprocessed = portal->qc.nprocessed;
+	else
+		qc->nprocessed = nprocessed;
 }
 
 /* Mark portal not active */
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index e3ccf6c7f7..8975d046f9 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -867,7 +867,32 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 			break;
 
 		case T_ExplainStmt:
-			ExplainQuery(pstate, (ExplainStmt *) parsetree, params, dest);
+			{
+Query	   *query;
+uint64		processed;
+

Re: Subscription statistics are not dropped at DROP SUBSCRIPTION in some cases

2023-09-07 Thread Masahiko Sawada
On Tue, Sep 5, 2023 at 11:32 AM Zhijie Hou (Fujitsu)
 wrote:
>
> On Monday, September 4, 2023 10:42 PM Masahiko Sawada  
> wrote:
>
> Hi,
>
> > On Mon, Sep 4, 2023 at 9:38 PM Zhijie Hou (Fujitsu) 
> > wrote:
> > >
> > > On Wednesday, July 5, 2023 2:53 PM Masahiko Sawada
> >  wrote:
> > >
> > > > Thanks for reviewing the patch. Pushed.
> > >
> > > My colleague Vignesh noticed that the newly added test cases were failing 
> > > in
> > BF animal sungazer[1].
> >
> > Thank you for reporting!
> >
> > >
> > > The test failed to drop the slot which is active on publisher.
> > >
> > > --error-log--
> > > This failure is because pg_drop_replication_slot fails with "replication 
> > > slot
> > "test_tab2_sub" is active for PID 55771638":
> > > 2023-09-02 09:00:04.806 UTC [12910732:4] 026_stats.pl LOG:  statement:
> > > SELECT pg_drop_replication_slot('test_tab2_sub')
> > > 2023-09-02 09:00:04.807 UTC [12910732:5] 026_stats.pl ERROR:
> > > replication slot "test_tab2_sub" is active for PID 55771638
> > > 2023-09-02 09:00:04.807 UTC [12910732:6] 026_stats.pl STATEMENT:
> > > SELECT pg_drop_replication_slot('test_tab2_sub')
> > > -
> > >
> > > I think the reason is that the test DISABLEd the subscription before
> > > dropping the slot, while "ALTER SUBSCRIPTION DISABLE" doesn't wait for
> > > the walsender to release the slot, so it's possible that the walsender
> > > is still alive when calling
> > > pg_drop_replication_slot() to drop the slot on
> > > publisher(pg_drop_xxxslot() doesn't wait for slot to be released).
> >
> > I agree with your analysis.
> >
> > >
> > > I think we can wait for the slot to become inactive before dropping like:
> > > $node_primary->poll_query_until('otherdb',
> > > "SELECT NOT EXISTS (SELECT 1 FROM pg_replication_slots
> > WHERE active_pid IS NOT NULL)"
> > > )
> > >
> >
> > I prefer this approach but it would be better to specify the slot name in 
> > the
> > query.
> >
> > > Or we can just don't drop the slot as it’s the last testcase.
> >
> > Since we might add other tests after that in the future, I think it's 
> > better to drop
> > the replication slot (and subscription).
> >
> > >
> > > Another thing might be worth considering is we can add one parameter
> > > in
> > > pg_drop_replication_slot() to let user control whether to wait or not,
> > > and the test can be fixed as well by passing nowait=false to the func.
> >
> > While it could be useful in general we cannot use this approach for this 
> > issue
> > since it cannot be backpatched to older branches. We might want to discuss 
> > it
> > on a new thread.
> >
> > I've attached a draft patch to fix this issue. Feedback is very welcome.
>
> Thanks, it looks good to me.

Thank you for reviewing the patch.

I'll push the attached patch to master, v16, and v15 tomorrow, barring
any objections.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


v1-0001-Stabilize-subscription-stats-test.patch
Description: Binary data


Re: persist logical slots to disk during shutdown checkpoint

2023-09-07 Thread Amit Kapila
On Thu, Sep 7, 2023 at 4:30 PM Ashutosh Bapat
 wrote:
>
> On Thu, Sep 7, 2023 at 4:11 PM Amit Kapila  wrote:
> >
> > On Thu, Sep 7, 2023 at 3:38 PM Ashutosh Bapat
> >  wrote:
> > >
> > >   * This needn't actually be part of a checkpoint, but it's a convenient
> > > - * location.
> > > + * location. is_shutdown is true in case of a shutdown checkpoint.
> > >
> > > Relying on the first sentence, if we decide to not persist the
> > > replication slot at the time of checkpoint, would that be OK? It
> > > doesn't look like a convenience thing to me any more.
> > >
> >
> > Instead of removing that comment, how about something like this: "This
> > needn't actually be part of a checkpoint except for shutdown
> > checkpoint, but it's a convenient location."?
> >
>
> I find the wording a bit awkward. My version would be "Checkpoint is a
> convenient location to persist all the slots. But in a shutdown
> checkpoint, indicated by is_shutdown = true, we also update
> confirmed_flush." But please feel free to choose whichever version you
> are comfortable with.
>

I think saying we also update confirmed_flush appears unclear to me.
So, I tried another version by changing the entire comment to:
"Normally, we can flush dirty replication slots at regular intervals
by any background process like bgwriter but checkpoint is a convenient
location to persist. Additionally, in case of a shutdown checkpoint,
we also identify the slots for which confirmed_flush has been updated
since the last time it persisted and flush them."

-- 
With Regards,
Amit Kapila.




Re: psql help message contains excessive indentations

2023-09-07 Thread Yugo NAGATA
On Thu, 7 Sep 2023 13:06:35 +0200
Alvaro Herrera  wrote:

> On 2023-Sep-07, Yugo NAGATA wrote:
> 
> > Yes. So, I mean how about fixing \watch description as the attached patch.
> 
> > diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
> > index 38c165a627..12280c0e54 100644
> > --- a/src/bin/psql/help.c
> > +++ b/src/bin/psql/help.c
> > @@ -200,9 +200,9 @@ slashUsage(unsigned short int pager)
> > HELP0("  \\gset [PREFIX] execute query and store result in psql 
> > variables\n");
> > HELP0("  \\gx [(OPTIONS)] [FILE] as \\g, but forces expanded output 
> > mode\n");
> > HELP0("  \\q quit psql\n");
> > -   HELP0("  \\watch [[i=]SEC] [c=N] [m=MIN]\n");
> > -   HELP0("  execute query every SEC seconds, up to 
> > N times\n");
> > -   HELP0("  stop if less than MIN rows are 
> > returned\n");
> > +   HELP0("  \\watch [[i=]SEC] [c=N] [m=MIN]\n"
> > + " execute query every SEC seconds, up 
> > to N times\n"
> > + " stop if less than MIN rows are 
> > returned\n");
> 
> Yeah, this looks right to me -- the whole help entry as a single
> translatable unit, instead of three separately translatable lines.

Thank you for your explanation; I understand it now. I had just been
imitating other places, and I didn't realize each line was a separate
translatable unit.

Regards,
Yugo Nagata

> -- 
> Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
> "Estoy de acuerdo contigo en que la verdad absoluta no existe...
> El problema es que la mentira sí existe y tu estás mintiendo" (G. Lama)


-- 
Yugo NAGATA 




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-07 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thank you for reviewing! PSA new version.

> ==
> src/bin/pg_upgrade/check.c
> 
> 1. check_new_cluster_logical_replication_slots
> 
> + res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
> + max_replication_slots = atoi(PQgetvalue(res, 0, 0));
> +
> + if (PQntuples(res) != 1)
> + pg_fatal("could not determine max_replication_slots");
> 
> Shouldn't the PQntuples check be *before* the PQgetvalue and
> assignment to max_replication_slots?

Right, fixed. Also, the checking was added at the first query.
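
In other words, the check now happens before the value is read, roughly like
this (a sketch of the reordering, not the exact hunk from the new version):

	res = executeQueryOrDie(conn, "SHOW max_replication_slots;");

	if (PQntuples(res) != 1)
		pg_fatal("could not determine max_replication_slots");

	max_replication_slots = atoi(PQgetvalue(res, 0, 0));
	PQclear(res);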

> 2. check_new_cluster_logical_replication_slots
> 
> + res = executeQueryOrDie(conn, "SHOW wal_level;");
> + wal_level = PQgetvalue(res, 0, 0);
> +
> + if (PQntuples(res) != 1)
> + pg_fatal("could not determine wal_level");
> 
> Shouldn't the PQntuples check be *before* the PQgetvalue and
> assignment to wal_level?

Fixed.

> 3. check_old_cluster_for_valid_slots
> 
> I saw that similar code with scripts like this is doing PG_REPORT:
> 
> pg_log(PG_REPORT, "fatal");
> 
> but that PG_REPORT is missing from this function.

Added.

> src/bin/pg_upgrade/function.c
> 
> 4. get_loadable_libraries
> 
> @@ -42,11 +43,12 @@ library_name_compare(const void *p1, const void *p2)
>   ((const LibraryInfo *) p2)->dbnum;
>  }
> 
> -
>  /*
>   * get_loadable_libraries()
> 
> ~
> 
> Removing that blank line (above this function) should not be included
> in the patch.

Restored the blank.

> 5. get_loadable_libraries
> 
> + /*
> + * Allocate a memory for extensions and logical replication output
> + * plugins.
> + */
> + os_info.libraries = pg_malloc_array(LibraryInfo,
> + totaltups + count_old_cluster_logical_slots());
> 
> /Allocate a memory/Allocate memory/

Fixed.

> 6. get_loadable_libraries
> + /*
> + * Store the name of output plugins as well. There is a possibility
> + * that duplicated plugins are set, but the consumer function
> + * check_loadable_libraries() will avoid checking the same library, so
> + * we do not have to consider their uniqueness here.
> + */
> + for (slotno = 0; slotno < slot_arr->nslots; slotno++)
> 
> /Store the name/Store the names/

Fixed.

> src/bin/pg_upgrade/info.c
> 
> 7. get_old_cluster_logical_slot_infos
> 
> + i_slotname = PQfnumber(res, "slot_name");
> + i_plugin = PQfnumber(res, "plugin");
> + i_twophase = PQfnumber(res, "two_phase");
> + i_caughtup = PQfnumber(res, "caughtup");
> + i_conflicting = PQfnumber(res, "conflicting");
> +
> + for (slotnum = 0; slotnum < num_slots; slotnum++)
> + {
> + LogicalSlotInfo *curr = [slotnum];
> +
> + curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
> + curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
> + curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
> + curr->caughtup = (strcmp(PQgetvalue(res, slotnum, i_caughtup), "t") == 0);
> + curr->conflicting = (strcmp(PQgetvalue(res, slotnum, i_conflicting),
> "t") == 0);
> + }
> 
> Saying "tup" always looks like it should be something tuple-related.
> IMO it will be better to call all these "caught_up" instead of
> "caughtup":
> 
> "caughtup" ==> "caught_up"
> i_caughtup ==> i_caught_up
> curr->caughtup ==> curr->caught_up

Fixed. The alias was also fixed.

> 8. print_slot_infos
> 
> +static void
> +print_slot_infos(LogicalSlotInfoArr *slot_arr)
> +{
> + int slotnum;
> +
> + if (slot_arr->nslots > 1)
> + pg_log(PG_VERBOSE, "Logical replication slots within the database:");
> +
> + for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
> + {
+ LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
> +
> + pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
> +slot_info->slotname,
> +slot_info->plugin,
> +slot_info->two_phase);
> + }
> +}
> 
> Although it makes no functional difference, it might be neater if the
> for loop is also within that "if (slot_arr->nslots > 1)" condition.

Hmm, but that would make print_slot_infos() diverge more from
print_rel_infos(), and I thought they should stay similar. Instead, I added a
quick return. Thoughts?

> src/bin/pg_upgrade/pg_upgrade.h
> 
> 9.
> +/*
> + * Structure to store logical replication slot information
> + */
> +typedef struct
> +{
> + char*slotname; /* slot name */
> + char*plugin; /* plugin */
> + bool two_phase; /* can the slot decode 2PC? */
> + bool caughtup; /* Is confirmed_flush_lsn the same as latest
> + * checkpoint LSN? */
> + bool conflicting; /* Is the slot usable? */
> +} LogicalSlotInfo;
> 
> 9a.
> + bool caughtup; /* Is confirmed_flush_lsn the same as latest
> + * checkpoint LSN? */
> 
> caughtup ==> caught_up

Fixed.

> 9b.
> + bool conflicting; /* Is the slot usable? */
> 
> The field name has the opposite meaning of the wording of the comment.
> (e.g. it is usable when it is NOT conflicting, right?).
> 
> Maybe there needs a better field name, or a better comment, or both.
> AFAICT from other code pg_fatal message 'conflicting' is always
> interpreted as 'lost' so maybe the field should be called that?

Changed to 

Re: add (void) cast inside advance_aggregates for function ExecEvalExprSwitchContext

2023-09-07 Thread Daniel Gustafsson
> On 3 Sep 2023, at 05:16, jian he  wrote:

> In src/backend/executor/nodeAgg.c
> 817: advance_aggregates(AggState *aggstate)
> 
> Do we need to add "(void)" before ExecEvalExprSwitchContext?

I don't think we need to, but we could since we are in fact discarding the
return value.  Did you get a compiler warning on unchecked return, and if so
with which flags?

--
Daniel Gustafsson





Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2023-09-07 Thread Daniel Gustafsson
> On 7 Sep 2023, at 13:30, Thomas Munro  wrote:

> I don't like the idea that our *next* release's library version
> horizon is controlled by Red Hat's "ELS" phase.

Agreed.  If we instead fence it by "only non-EOL version" then 1.1.1 is also on
the chopping block for v17 as it goes EOL in 4 days from now with 1.1.1w (which
contains a CVE, going out with a bang).  Not sure what the best strategy is,
but whichever we opt for I think the most important point is to document it
clearly.

> These hypothetical users that want to run
> an OS even older than that and don't know how to get modern crypto
> libraries on it but insist on a shiny new PostgreSQL release and build
> it from source because there are no packages available... don't exist?

Sadly I wouldn't be the least bit surprised if there are 1.0.2 users on modern
operating systems, especially given its LTS status (which OpenSSL hasn't even
capped, but instead sells on a "for as long as it remains commercially viable
to do so" basis).  That being said, my gut feeling is that 3.x has gotten
pretty good market penetration.

--
Daniel Gustafsson





Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2023-09-07 Thread Thomas Munro
On Wed, May 24, 2023 at 11:03 PM Daniel Gustafsson  wrote:
> > On 24 May 2023, at 11:52, Michael Paquier  wrote:
> > On Wed, May 24, 2023 at 11:36:56AM +0200, Daniel Gustafsson wrote:
> >> 1.0.2 is also an LTS version available commercially for premium support
> >> customers of OpenSSL (1.1.1 will become an LTS version as well), with 
> >> 1.0.2zh
> >> slated for release next week.  This raises the likelyhood of Postgres
> >> installations using 1.0.2 in production still, and for some time to come.
> >
> > Good point.  Indeed, that makes it pretty clear that not dropping
> > 1.0.2 would be the best option for the time being, so 0001 would be
> > enough.
>
> I think that's the right move re 1.0.2 support.  1.0.2 is also the version in
> RHEL7 which is in ELS until 2026.

I don't mind either way if we rip out OpenSSL 1.0.2 support now or
later, other than a general feeling that cryptography must be about
the worst possible category of software to keep supporting for years
after it has been declared EOL.

But.. I don't like the idea that our *next* release's library version
horizon is controlled by Red Hat's "ELS" phase.  The
yum.postgresql.org team aren't packaging 17 for RHEL7 AFAICS, which is
as it should be if you ask me, because the 10 year maintenance phase
ends before 17 will ship.  These hypothetical users that want to run
an OS even older than that and don't know how to get modern crypto
libraries on it but insist on a shiny new PostgreSQL release and build
it from source because there are no packages available... don't exist?




Re: Add resource intensiveness as a reason to not running tests by default

2023-09-07 Thread Daniel Gustafsson
> On 7 Sep 2023, at 13:09, Nazir Bilal Yavuz  wrote:

> With f47ed79cc8, the test suite doesn't run 'wal_consistency_checking'
> by default because it is resource intensive; but the regress docs don't
> state resource intensiveness as a reason for not running tests by
> default. So, I created a patch for updating the docs.

Agreed, the current wording lacks the mention of skipping tests due to high
resource usage.  Patch looks good from a quick skim, I'll backpatch it down to
15 which is where PG_TEST_EXTRA was first used in this capacity.

--
Daniel Gustafsson





Add resource intensiveness as a reason to not running tests by default

2023-09-07 Thread Nazir Bilal Yavuz
Hi,

With f47ed79cc8, the test suite doesn't run 'wal_consistency_checking'
by default because it is resource intensive; but the regress docs don't
state resource intensiveness as a reason for not running tests by
default. So, I created a patch for updating the docs.

Any kind of feedback would be appreciated.

Regards,
Nazir Bilal Yavuz
Microsoft
From 847a06a3a0546a14f07654ff79da4039c5c7cc34 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Thu, 7 Sep 2023 13:33:46 +0300
Subject: [PATCH v1] Update the reasons of tests not running by default on docs

With f47ed79cc8, the test suite doesn't run 'wal_consistency_checking' by
default because it is resource intensive. However, the regress
documentation only states 'not secure to run on a multiuser system'
or 'requiring special software' as reasons to not run the tests
by default. Therefore, update the docs by adding 'resource intensive'
as a reason to not run tests by default.
---
 doc/src/sgml/regress.sgml | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index de065c0564a..69f627d7f43 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -253,11 +253,11 @@ make check-world -j8 >/dev/null
 
   
Some test suites are not run by default, either because they are not secure
-   to run on a multiuser system or because they require special software.  You
-   can decide which test suites to run additionally by setting the
-   make or environment variable
-   PG_TEST_EXTRA to a whitespace-separated list, for
-   example:
+   to run on a multiuser system, because they require special software or
+   because they are resource intensive.  You can decide which test suites to
+   run additionally by setting the make or environment
+   variable PG_TEST_EXTRA to a whitespace-separated list,
+   for example:
 
 make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance'
 
-- 
2.40.1



Re: psql help message contains excessive indentations

2023-09-07 Thread Alvaro Herrera
On 2023-Sep-07, Yugo NAGATA wrote:

> Yes. So, I mean how about fixing \watch description as the attached patch.

> diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
> index 38c165a627..12280c0e54 100644
> --- a/src/bin/psql/help.c
> +++ b/src/bin/psql/help.c
> @@ -200,9 +200,9 @@ slashUsage(unsigned short int pager)
>   HELP0("  \\gset [PREFIX] execute query and store result in psql 
> variables\n");
>   HELP0("  \\gx [(OPTIONS)] [FILE] as \\g, but forces expanded output 
> mode\n");
>   HELP0("  \\q quit psql\n");
> - HELP0("  \\watch [[i=]SEC] [c=N] [m=MIN]\n");
> - HELP0("  execute query every SEC seconds, up to 
> N times\n");
> - HELP0("  stop if less than MIN rows are 
> returned\n");
> + HELP0("  \\watch [[i=]SEC] [c=N] [m=MIN]\n"
> +   " execute query every SEC seconds, up 
> to N times\n"
> +   " stop if less than MIN rows are 
> returned\n");

Yeah, this looks right to me -- the whole help entry as a single
translatable unit, instead of three separately translatable lines.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Estoy de acuerdo contigo en que la verdad absoluta no existe...
El problema es que la mentira sí existe y tu estás mintiendo" (G. Lama)




Re: [BackendXidGetPid] only access allProcs when xid matches

2023-09-07 Thread Ashutosh Bapat
On Thu, Sep 7, 2023 at 3:05 PM Junwang Zhao  wrote:
>
> On Thu, Sep 7, 2023 at 5:07 PM Ashutosh Bapat
>  wrote:
> >
> > Forgot to attach the patch.
>
> LGTM
>
> Should I change the status to ready for committer now?
>

I would still love to see some numbers. It's not that hard. Either
using my micro benchmark or Michael's. We may or may not see an
improvement. But at least we tried. But Michael feels otherwise,
changing it to RoC is fine.

-- 
Best Wishes,
Ashutosh Bapat




Re: missing privilege check after not-null constraint rework

2023-09-07 Thread Alvaro Herrera
On 2023-Sep-05, Alvaro Herrera wrote:

> On 2023-Sep-05, Alvaro Herrera wrote:
> 
> > Here's a fix to move the privilege check on constraint dropping from
> > ATExecDropConstraint to dropconstraint_internal.

I have pushed this.  It's just a fixup for an embarrassing bug in
b0e96f311985.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Pensar que el espectro que vemos es ilusorio no lo despoja de espanto,
sólo le suma el nuevo terror de la locura" (Perelandra, C.S. Lewis)




Re: persist logical slots to disk during shutdown checkpoint

2023-09-07 Thread Ashutosh Bapat
On Thu, Sep 7, 2023 at 4:11 PM Amit Kapila  wrote:
>
> On Thu, Sep 7, 2023 at 3:38 PM Ashutosh Bapat
>  wrote:
> >
> >   * This needn't actually be part of a checkpoint, but it's a convenient
> > - * location.
> > + * location. is_shutdown is true in case of a shutdown checkpoint.
> >
> > Relying on the first sentence, if we decide to not persist the
> > replication slot at the time of checkpoint, would that be OK? It
> > doesn't look like a convenience thing to me any more.
> >
>
> Instead of removing that comment, how about something like this: "This
> needn't actually be part of a checkpoint except for shutdown
> checkpoint, but it's a convenient location."?
>

I find the wording a bit awkward. My version would be "Checkpoint is a
convenient location to persist all the slots. But in a shutdown
checkpoint, indicated by is_shutdown = true, we also update
confirmed_flush." But please feel free to choose whichever version you
are comfortable with.

-- 
Best Wishes,
Ashutosh Bapat




RE: pg_ctl start may return 0 even if the postmaster has been already started on Windows

2023-09-07 Thread Hayato Kuroda (Fujitsu)
Dear Michael,

Thank you for replying!

> Not failing on `pg_ctl start` if the command is run on a data folder
> that has already been started previously by a different command with a
> postmaster still alive feels like cheating, because pg_ctl is lying
> about its result.  If pg_ctl wants to start a cluster but is not able
> to do it, either because the postmaster failed at startup or because
> the cluster has already started, it should report a failure.

I have the same feeling as you. Users may use the return code in their batch
files and decide what to do based on the wrong status. Reporting the status
more accurately is nice.

My first idea is to move the checking part up, but this may not handle
the case where the postmaster is still alive (not sure this is a real issue).
Do we have to add a new indicator which ensures the identity of processes on
Windows? Please tell me how you feel.

> Now, I
> also recall that the processes spawned by pg_ctl on Windows make the
> status handling rather tricky to reason about..

Did you mean the comment below? Currently I have no idea how to make
the code more proper, sorry.

```
 * On Windows, we may be checking the postmaster's parent 
shell, but
 * that's fine for this purpose.
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



move_checking.patch
Description: move_checking.patch


Re: persist logical slots to disk during shutdown checkpoint

2023-09-07 Thread Amit Kapila
On Thu, Sep 7, 2023 at 3:38 PM Ashutosh Bapat
 wrote:
>
>   * This needn't actually be part of a checkpoint, but it's a convenient
> - * location.
> + * location. is_shutdown is true in case of a shutdown checkpoint.
>
> Relying on the first sentence, if we decide to not persist the
> replication slot at the time of checkpoint, would that be OK? It
> doesn't look like a convenience thing to me any more.
>

Instead of removing that comment, how about something like this: "This
needn't actually be part of a checkpoint except for shutdown
checkpoint, but it's a convenient location."?

-- 
With Regards,
Amit Kapila.




Re: Impact of checkpointer during pg_upgrade

2023-09-07 Thread Dilip Kumar
On Thu, Sep 7, 2023 at 3:34 PM Amit Kapila  wrote:
>
> On Thu, Sep 7, 2023 at 12:55 PM Michael Paquier  wrote:
> >
> > (Just dumping what I have in mind while reading the thread.)
> >
> > On Sat, Sep 02, 2023 at 10:08:51AM +0530, Amit Kapila wrote:
> > > During pg_upgrade, we start the server for the old cluster which can
> > > allow the checkpointer to remove the WAL files. It has been noticed
> > > that we do generate certain types of WAL records (e.g
> > > XLOG_RUNNING_XACTS, XLOG_CHECKPOINT_ONLINE, and XLOG_FPI_FOR_HINT)
> > > even during pg_upgrade for old cluster, so additional WAL records
> > > could let checkpointer decide that certain WAL segments can be removed
> > > (e.g. say wal size crosses max_slot_wal_keep_size_mb) and invalidate
> > > the slots. Currently, I can't see any problem with this but for future
> > > work where we want to migrate logical slots during an upgrade[1], we
> > > need to decide what to do for such cases. The initial idea we had was
> > > that if the old cluster has some invalid slots, we won't allow an
> > > upgrade unless the user removes such slots or uses some option like
> > > --exclude-slots. It is quite possible that slots got invalidated
> > > during pg_upgrade due to no user activity. Now, even though the
> > > possibility of the same is less I think it is worth considering what
> > > should be the behavior.
> > >
> > > The other possibilities apart from not allowing an upgrade in such a
> > > case could be (a) Before starting the old cluster, we fetch the slots
> > > directly from the disk using some tool like [2] and make the decisions
> > > based on that state; (b) During the upgrade, we don't allow WAL to be
> > > removed if it can invalidate slots; (c) Copy/Migrate the invalid slots
> > > as well but for that, we need to expose an API to invalidate the
> > > slots; (d) somehow distinguish the slots that are invalidated during
> > > an upgrade and then simply copy such slots because anyway we ensure
> > > that all the WAL required by slot is sent before shutdown.
> >
> > Checking for any invalid slots at the beginning of the upgrade and
> > complain sounds like a good thing to do, because these are not
> > expected to begin with, no?  That's similar to a pre-check requirement
> > that the slots should have fed the subscribers with all the data
> > available up to the shutdown checkpoint when the publisher was stopped
> > before running pg_upgrade.  So (a) is a good idea to prevent an
> > upgrade based on a state we don't expect from the start, as something
> > in check.c, I assume.
> >
> > On top of that, (b) sounds like a good idea to me anyway to be more
> > defensive.  But couldn't we do that just like we do for autovacuum and
> > force the GUCs that could remove the slot's WAL to not do their work
> > instead?
> >
>
> I think if we just make max_slot_wal_keep_size to -1 that should be
> sufficient to not let any slots get invalidated during upgrade. Do you
> have anything else in mind?

This seems like a good solution to the problem.
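
For illustration only, that could look like forcing the GUC when pg_upgrade
starts the old postmaster, much as autovacuum is already forced off there;
the surrounding option-building code and variable names below are
assumptions, not the actual server.c text:

	/* keep slot-required WAL around; avoid invalidations during upgrade */
	snprintf(extra_opts, sizeof(extra_opts),
			 "%s -c autovacuum=off -c max_slot_wal_keep_size=-1",
			 other_opts);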

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: persist logical slots to disk during shutdown checkpoint

2023-09-07 Thread Ashutosh Bapat
On Thu, Sep 7, 2023 at 1:43 PM Amit Kapila  wrote:
>
> On Thu, Sep 7, 2023 at 1:18 PM Michael Paquier  wrote:
> >
> > On Thu, Sep 07, 2023 at 11:56:28AM +0530, Amit Kapila wrote:
> > > Thanks, the patch looks good to me as well.
> >
> > +   /* This is used to track the last saved confirmed_flush LSN value */
> > +   XLogRecPtr  last_saved_confirmed_flush;
> >
> > This does not feel sufficient, as the comment explaining what this
> > variable does uses the same terms as the variable name (aka it is the
> > last save of the confirmed_lsn).  Why it it here and why it is useful?
> > In which context and/or code paths is it used?  Okay, there are some
> > explanations when saving a slot, restoring a slot or when a checkpoint
> > processes slots, but it seems important to me to document more things
> > in ReplicationSlot where this is defined.
> >
>
> Hmm, this is quite debatable as different people feel differently
> about this. The patch author kept it where it is now but in one of my
> revisions, I rewrote and added it in the ReplicationSlot. Then
> Ashutosh argued that it is better to keep it near where we are saving
> the slot (aka where the patch has) [1]. Anyway, as I also preferred
> the core part of the theory about this variable to be in
> ReplicationSlot, so I'll move it there before commit unless someone
> argues against it or has any other comments.

If we want it to be in ReplicationSlot, I suggest we just say, - saves
last confirmed flush LSN to detect any divergence in the in-memory and
on-disk confirmed flush LSN cheaply.

When to detect that divergence and what to do when there is divergence
should be documented at relevant places in the code. In future, if we
expand the When and How we use this variable, the comment in
ReplicationSlot will be insufficient.

We follow this commenting style at several places e.g.
/* any outstanding modifications? */
bool just_dirtied;
bool dirty;

how and when these variables are used is commented upon in the relevant code.

  * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * location. is_shutdown is true in case of a shutdown checkpoint.

Relying on the first sentence, if we decide to not persist the
replication slot at the time of checkpoint, would that be OK? It
doesn't look like a convenience thing to me any more.

-- 
Best Wishes,
Ashutosh Bapat




Re: Impact of checkpointer during pg_upgrade

2023-09-07 Thread Amit Kapila
On Thu, Sep 7, 2023 at 12:55 PM Michael Paquier  wrote:
>
> (Just dumping what I have in mind while reading the thread.)
>
> On Sat, Sep 02, 2023 at 10:08:51AM +0530, Amit Kapila wrote:
> > During pg_upgrade, we start the server for the old cluster which can
> > allow the checkpointer to remove the WAL files. It has been noticed
> > that we do generate certain types of WAL records (e.g
> > XLOG_RUNNING_XACTS, XLOG_CHECKPOINT_ONLINE, and XLOG_FPI_FOR_HINT)
> > even during pg_upgrade for old cluster, so additional WAL records
> > could let checkpointer decide that certain WAL segments can be removed
> > (e.g. say wal size crosses max_slot_wal_keep_size_mb) and invalidate
> > the slots. Currently, I can't see any problem with this but for future
> > work where we want to migrate logical slots during an upgrade[1], we
> > need to decide what to do for such cases. The initial idea we had was
> > that if the old cluster has some invalid slots, we won't allow an
> > upgrade unless the user removes such slots or uses some option like
> > --exclude-slots. It is quite possible that slots got invalidated
> > during pg_upgrade due to no user activity. Now, even though the
> > possibility of the same is less I think it is worth considering what
> > should be the behavior.
> >
> > The other possibilities apart from not allowing an upgrade in such a
> > case could be (a) Before starting the old cluster, we fetch the slots
> > directly from the disk using some tool like [2] and make the decisions
> > based on that state; (b) During the upgrade, we don't allow WAL to be
> > removed if it can invalidate slots; (c) Copy/Migrate the invalid slots
> > as well but for that, we need to expose an API to invalidate the
> > slots; (d) somehow distinguish the slots that are invalidated during
> > an upgrade and then simply copy such slots because anyway we ensure
> > that all the WAL required by slot is sent before shutdown.
>
> Checking for any invalid slots at the beginning of the upgrade and
> complain sounds like a good thing to do, because these are not
> expected to begin with, no?  That's similar to a pre-check requirement
> that the slots should have fed the subscribers with all the data
> available up to the shutdown checkpoint when the publisher was stopped
> before running pg_upgrade.  So (a) is a good idea to prevent an
> upgrade based on a state we don't expect from the start, as something
> in check.c, I assume.
>
> On top of that, (b) sounds like a good idea to me anyway to be more
> defensive.  But couldn't we do that just like we do for autovacuum and
> force the GUCs that could remove the slot's WAL to not do their work
> instead?
>

I think if we just make max_slot_wal_keep_size to -1 that should be
sufficient to not let any slots get invalidated during upgrade. Do you
have anything else in mind?

> An upgrade is a special state of the cluster, and I'm not
> much into painting more checks based on IsBinaryUpgrade to prevent WAL
> segments to be removed while we can have a full control of what we
> want with just the GUCs that force the hand of the slots.  That just
> seems simpler to me, and the WAL recycling logic is already complex
> particularly with all the GUCs popping lately to force some conditions
> to do the WAL recycling and/or removal.
>
> During the upgrade, if some segments are removed and some of the slots
> we expect to still be valid once the upgrade is done are marked as
> invalid and become unusable, I think that we should just copy these
> slots, but also update their state data so as they can still be used
> with what we expect, as these could be wanted by the subscribers.
> That's what you mean with (d), I assume. Do you think that it would
> be possible to force the slot's data on the publishers so as they use
> a local LSN based on the new WAL we are resetting at?
>

Yes.

>
> At least that
> seems more friendly to me as this limits the set of manipulations to
> do on the slots for the end-user.  The protection from (b) ought to be
> enough, in itself.  (c) overlaps with (a), especially if we want to be
> able to read or write some of the slot's data offline.
>

If we do (b), either via GUCs or an IsBinaryUpgrade check, we don't need to
do any of (a), (c), or (d). I feel that would be a minimal and
sufficient fix to prevent any side impact of the checkpointer on slots
during an upgrade.

-- 
With Regards,
Amit Kapila.




Re: Should consider materializing the cheapest inner path in consider_parallel_nestloop()

2023-09-07 Thread tender wang
Richard Guo  于2023年9月5日周二 18:51写道:

>
> On Tue, Sep 5, 2023 at 4:52 PM tender wang  wrote:
>
>>I recently ran benchmark[1] on master, but I found a performance problem
>> as below:
>> ...
>>
>> I debug the code and find consider_parallel_nestloop() doesn't consider
>> materialized form of the cheapest inner path.
>>
>
> Yeah, this seems an omission in commit 45be99f8.  I reviewed the patch
> and here are some comments.
>
> * I think we should not consider materializing the cheapest inner path
>   if we're doing JOIN_UNIQUE_INNER, because in this case we have to
>   unique-ify the inner path.
>

 That's right. The V2 patch has been fixed.


> * I think we can check if it'd be parallel safe before creating the
>   material path, thus avoid the creation in unsafe cases.
>

Agreed.



> * I don't think the test case you added works for the code changes.
>   Maybe a plan likes below is better:
>

  Agreed.

explain (costs off)
> select * from tenk1, tenk2 where tenk1.two = tenk2.two;
>   QUERY PLAN
> --
>  Gather
>Workers Planned: 4
>->  Nested Loop
>  Join Filter: (tenk1.two = tenk2.two)
>  ->  Parallel Seq Scan on tenk1
>  ->  Materialize
>->  Seq Scan on tenk2
> (7 rows)
>
> Thanks
> Richard
>
From 096c645d7c8d06df3a4e6a8aa7fc4edd1c0a3755 Mon Sep 17 00:00:00 2001
From: "tender.wang" 
Date: Thu, 7 Sep 2023 17:55:04 +0800
Subject: [PATCH v2] Parallel seq scan should consider material inner path in
 nestloop case.

---
 src/backend/optimizer/path/joinpath.c | 19 +++
 src/test/regress/expected/select_parallel.out | 24 +++
 src/test/regress/sql/select_parallel.sql  |  9 +++
 3 files changed, 52 insertions(+)

diff --git a/src/backend/optimizer/path/joinpath.c 
b/src/backend/optimizer/path/joinpath.c
index 821d282497..87111ad643 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -2004,10 +2004,25 @@ consider_parallel_nestloop(PlannerInfo *root,
 {
 	JoinType	save_jointype = jointype;
ListCell   *lc1;
+   Path *matpath = NULL;
+   Path *inner_cheapest_total = innerrel->cheapest_total_path;
 
if (jointype == JOIN_UNIQUE_INNER)
jointype = JOIN_INNER;
 
+   /*
+* Consider materializing the cheapest inner path, unless we're
+* doing JOIN_UNIQUE_INNER or enable_material is off or the subpath
+* is parallel unsafe or the path in question materializes its output 
anyway.
+*/
+   if (save_jointype != JOIN_UNIQUE_INNER &&
+   enable_material &&
+   inner_cheapest_total != NULL &&
+   inner_cheapest_total->parallel_safe &&
+   !ExecMaterializesOutput(inner_cheapest_total->pathtype))
+   matpath = (Path *)
+   create_material_path(innerrel, inner_cheapest_total);
+
foreach(lc1, outerrel->partial_pathlist)
{
Path   *outerpath = (Path *) lfirst(lc1);
@@ -2064,6 +2079,10 @@ consider_parallel_nestloop(PlannerInfo *root,
try_partial_nestloop_path(root, joinrel, 
outerpath, mpath,

  pathkeys, jointype, extra);
}
+   /* Also consider materialized form of the cheapest inner path */
+   if (matpath != NULL && matpath->parallel_safe)
+   try_partial_nestloop_path(root, joinrel, outerpath, 
matpath,
+ 
pathkeys, jointype, extra);
}
 }
 
diff --git a/src/test/regress/expected/select_parallel.out 
b/src/test/regress/expected/select_parallel.out
index d88353d496..5b9f5c58cc 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -844,6 +844,30 @@ select * from
 (12 rows)
 
 reset enable_material;
+-- test materialized form of the cheapest inner path
+set min_parallel_table_scan_size = '512kB';
+explain(costs off)
+select count(*) from tenk1, int4_tbl where tenk1.two < int4_tbl.f1;
+ QUERY PLAN 
+
+ Finalize Aggregate
+   ->  Gather
+ Workers Planned: 4
+ ->  Partial Aggregate
+   ->  Nested Loop
+ Join Filter: (tenk1.two < int4_tbl.f1)
+ ->  Parallel Seq Scan on tenk1
+ ->  Materialize
+   ->  Seq Scan on int4_tbl
+(9 rows)
+
+select count(*) from tenk1, int4_tbl where tenk1.two < int4_tbl.f1;
+ count 
+---
+ 2
+(1 row)
+
+set min_parallel_table_scan_size = 0;
 reset enable_hashagg;
 -- check parallelized int8 aggregate (bug #14897)
 explain (costs off)
diff --git 

Re: information_schema and not-null constraints

2023-09-07 Thread Alvaro Herrera
On 2023-Sep-06, Alvaro Herrera wrote:

> On 2023-Sep-04, Alvaro Herrera wrote:
> 
> > In reference to [1], 0001 attached to this email contains the updated
> > view definitions that I propose.
> 
> Given the downthread discussion, I propose the attached.  There are no
> changes to v2, other than dropping the test part.

Pushed.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: [BackendXidGetPid] only access allProcs when xid matches

2023-09-07 Thread Junwang Zhao
On Thu, Sep 7, 2023 at 5:07 PM Ashutosh Bapat
 wrote:
>
> Forgot to attach the patch.

LGTM

Should I change the status to ready for committer now?

>
> On Thu, Sep 7, 2023 at 1:22 PM Ashutosh Bapat
>  wrote:
> >
> > Hi Junwang,
> > We leave a line blank after variable declaration as in the attached patch.
> >
> > Otherwise the patch looks good to me.
> >
> > The function modified by the patch is only used by extension
> > pgrowlocks. Given that the function will be invoked as many times as
> > the number of locked rows in the relation, the patch may show some
> > improvement and thus be more compelling. One way to measure
> > performance is to create a table with millions of rows, SELECT all
> > rows with FOR SHARE/UPDATE clause. Then run pgrowlock() on that
> > relation. This will invoke the given function a million times. That
> > way we might be able to catch some miniscule improvement per row.
> >
> > If the performance is measurable, we can mark the CF entry as ready
> > for committer.
> >
> > --
> > Best Wishes,
> > Ashutosh Bapat
> >
> > On Thu, Aug 10, 2023 at 1:48 PM Junwang Zhao  wrote:
> > >
> > > On Thu, Aug 10, 2023 at 4:11 PM Ashutosh Bapat
> > >  wrote:
> > > >
> > > > Please add this to commitfest so that it's not forgotten.
> > > >
> > >
> > > Added [1], thanks
> > >
> > > [1]: https://commitfest.postgresql.org/44/4495/
> > >
> > > > On Wed, Aug 9, 2023 at 8:37 PM Junwang Zhao  wrote:
> > > > >
> > > > > On Wed, Aug 9, 2023 at 10:46 PM Ashutosh Bapat
> > > > >  wrote:
> > > > > >
> > > > > > On Wed, Aug 9, 2023 at 9:30 AM Junwang Zhao  
> > > > > > wrote:
> > > > > > >
> > > > > > > In function `BackendXidGetPid`, when looping every proc's
> > > > > > > TransactionId, there is no need to access its PGPROC since 
> > > > > > > there
> > > > > > > is shared memory access: `arrayP->pgprocnos[index]`.
> > > > > > >
> > > > > > > Though the compiler can optimize this kind of inefficiency, I
> > > > > > > believe we should ship with better code.
> > > > > > >
> > > > > >
> > > > > > Looks good to me. However, I would just move the variable 
> > > > > > declaration
> > > > > > with their assignments inside the if () rather than combing the
> > > > > > expressions. It more readable that way.
> > > > >
> > > > > yeah, make sense, also checked elsewhere using the original style,
> > > > > attachment file
> > > > > keep that style, thanks ;)
> > > > >
> > > > > >
> > > > > > --
> > > > > > Best Wishes,
> > > > > > Ashutosh Bapat
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Regards
> > > > > Junwang Zhao
> > > >
> > > >
> > > >
> > > > --
> > > > Best Wishes,
> > > > Ashutosh Bapat
> > >
> > >
> > >
> > > --
> > > Regards
> > > Junwang Zhao
>
>
>
> --
> Best Wishes,
> Ashutosh Bapat



-- 
Regards
Junwang Zhao




Re: A minor adjustment to get_cheapest_path_for_pathkeys

2023-09-07 Thread Aleksander Alekseev
Hi,

>> I agree. I don't think the patch submitter is obliged to try to write
>> a good commit message, but people who contribute regularly or are
>> posting large stacks of complex patches are probably well-advised to
>> try. It makes life easier for committers and even for reviewers trying
>> to make sense of their patches.
>
>
> Fair point.  So I had a go at writing a commit message for this patch as
> attached.  Thanks for all the reviews.

+1 to Robert's and Andy's arguments above. IMO the problem with the
patch was that it was declared as a performance improvement. In such
cases we typically ask the authors to prove that the actual
improvement took place and that there were no degradations.

If we consider the patch merely as a refactoring that improves
readability, I see no reason not to merge it.

v2 LGTM.

-- 
Best regards,
Aleksander Alekseev




Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500

2023-09-07 Thread Jian Guo
Hi Jian He,

Thanks for fixing the compiler warnings; it seems the CI used a somewhat old
compiler and complained:

 ISO C90 forbids mixed declarations and code 
[-Werror=declaration-after-statement]

But later C standards have relaxed this requirement: ISO C99 and later allow
declarations and code to be freely mixed within compound statements:
https://gcc.gnu.org/onlinedocs/gcc/Mixed-Labels-and-Declarations.html


From: jian he 
Sent: Wednesday, September 6, 2023 14:00
To: Jian Guo 
Cc: Tomas Vondra ; Hans Buschmann 
; pgsql-hackers@lists.postgresql.org 

Subject: Re: Wrong rows estimations with joins of CTEs slows queries by more 
than factor 500


On Tue, Aug 22, 2023 at 10:35 AM Jian Guo  wrote:
>
> Sure, Tomas.
>
> Here is the PG Commitfest link: 
> https://commitfest.postgresql.org/44/4510/
> 

hi.
wandering around http://cfbot.cputube.org/
there is a compiler warning: 
https://cirrus-ci.com/task/6052087599988736

I slightly edited the code to make the compiler warning go away.

I am not sure if the following duplicate comment from (rte->rtekind ==
RTE_SUBQUERY && !rte->inh) branch is correct.
/*
* OK, recurse into the subquery.  Note that the original setting
* of vardata->isunique (which will surely be false) is left
* unchanged in this situation.  That's what we want, since even
* if the underlying column is unique, the subquery may have
* joined to other tables in a way that creates duplicates.
*/

Index varnoSaved = var->varno;
here varnoSaved should be int?

The attached image is the coverage report.
If I understand the coverage report correctly,
`
if (rel->subroot) examine_simple_variable(rel->subroot, var, vardata);
`
the above never actually executed?



Re: [BackendXidGetPid] only access allProcs when xid matches

2023-09-07 Thread Ashutosh Bapat
Forgot to attach the patch.

On Thu, Sep 7, 2023 at 1:22 PM Ashutosh Bapat
 wrote:
>
> Hi Junwang,
> We leave a line blank after variable declaration as in the attached patch.
>
> Otherwise the patch looks good to me.
>
> The function modified by the patch is only used by extension
> pgrowlocks. Given that the function will be invoked as many times as
> the number of locked rows in the relation, the patch may show some
> improvement and thus be more compelling. One way to measure
> performance is to create a table with millions of rows, SELECT all
> rows with FOR SHARE/UPDATE clause. Then run pgrowlock() on that
> relation. This will invoke the given function a million times. That
> way we might be able to catch some miniscule improvement per row.
>
> If the performance is measurable, we can mark the CF entry as ready
> for committer.
>
> --
> Best Wishes,
> Ashutosh Bapat
>
> On Thu, Aug 10, 2023 at 1:48 PM Junwang Zhao  wrote:
> >
> > On Thu, Aug 10, 2023 at 4:11 PM Ashutosh Bapat
> >  wrote:
> > >
> > > Please add this to commitfest so that it's not forgotten.
> > >
> >
> > Added [1], thanks
> >
> > [1]: https://commitfest.postgresql.org/44/4495/
> >
> > > On Wed, Aug 9, 2023 at 8:37 PM Junwang Zhao  wrote:
> > > >
> > > > On Wed, Aug 9, 2023 at 10:46 PM Ashutosh Bapat
> > > >  wrote:
> > > > >
> > > > > On Wed, Aug 9, 2023 at 9:30 AM Junwang Zhao  wrote:
> > > > > >
> > > > > > In function `BackendXidGetPid`, when looping every proc's
> > > > > > TransactionId, there is no need to access its PGPROC since there
> > > > > > is shared memory access: `arrayP->pgprocnos[index]`.
> > > > > >
> > > > > > Though the compiler can optimize this kind of inefficiency, I
> > > > > > believe we should ship with better code.
> > > > > >
> > > > >
> > > > > Looks good to me. However, I would just move the variable declaration
> > > > > with their assignments inside the if () rather than combing the
> > > > > expressions. It more readable that way.
> > > >
> > > > yeah, make sense, also checked elsewhere using the original style,
> > > > attachment file
> > > > keep that style, thanks ;)
> > > >
> > > > >
> > > > > --
> > > > > Best Wishes,
> > > > > Ashutosh Bapat
> > > >
> > > >
> > > >
> > > > --
> > > > Regards
> > > > Junwang Zhao
> > >
> > >
> > >
> > > --
> > > Best Wishes,
> > > Ashutosh Bapat
> >
> >
> >
> > --
> > Regards
> > Junwang Zhao



-- 
Best Wishes,
Ashutosh Bapat
From 7ad91841dda0f49e7bcc44809b58230ed67d80f7 Mon Sep 17 00:00:00 2001
From: Zhao Junwang 
Date: Wed, 9 Aug 2023 11:43:21 +0800
Subject: [PATCH] only access allProcs when xid matches

In function `BackendXidGetPid`, when looping every proc's
TransactionId, there is no need to access its PGPROC since there
is shared memory access: `arrayP->pgprocnos[index]`.

Though the compiler can optimize this kind of inefficiency, I
believe we should ship with better code.

Signed-off-by: Zhao Junwang 
---
 src/backend/storage/ipc/procarray.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index bfbf7f903f..d93475b2bd 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -3174,11 +3174,11 @@ BackendXidGetPid(TransactionId xid)
 
 	for (index = 0; index < arrayP->numProcs; index++)
 	{
-		int			pgprocno = arrayP->pgprocnos[index];
-		PGPROC	   *proc = &allProcs[pgprocno];
-
 		if (other_xids[index] == xid)
 		{
+			int			pgprocno = arrayP->pgprocnos[index];
+			PGPROC	   *proc = &allProcs[pgprocno];
+
 			result = proc->pid;
 			break;
 		}
-- 
2.25.1



Re: Create shorthand for including all extra tests

2023-09-07 Thread Daniel Gustafsson
> On 4 Sep 2023, at 23:09, Noah Misch  wrote:

> I could imagine categories for filesystem bytes and RAM bytes.  Also, while
> needs-private-lo has a bounded definition, "slow" doesn't.  If today's one
> "slow" test increases check-world duration by 1.1x, we may not let a
> 100x-increase test use the same keyword.

Agreed, the names should be descriptive enough to contain a boundary.  Any new
test which is orders of magnitude slower than an existing test suite most
likely will have one or more boundary characteristics not shared with existing
suites.  The test in 20210423204306.5osfpkt2ggaed...@alap3.anarazel.de for
autovacuum wraparound comes to mind as one that would warrant a new category.

> If one introduced needs-private-lo, the present spelling of "all" would be
> "needs-private-lo wal_consistency_checking".

I think it makes sense to invent a new PG_TEST_EXTRA category which (for now)
only contains wal_consistency_checking to make it consistent, such that "all"
can be achieved by a set of categories.

--
Daniel Gustafsson





Re: psql - pager support - using invisible chars for signalling end of report

2023-09-07 Thread Daniel Gustafsson
> On 6 Sep 2023, at 05:07, Thomas Munro  wrote:

> This sounds better than the QUERY_SEPARATOR hack from commit
> 664d757531e, and similar kludges elsewhere.  I think Pavel and David
> are right about NUL being impossible in psql query output, no?

+1, I would love to be able to rip out that hack.

--
Daniel Gustafsson





Re: Obsolete reference to pg_relation in comment

2023-09-07 Thread Daniel Gustafsson
> On 6 Sep 2023, at 21:13, Bruce Momjian  wrote:
> On Wed, Jul 26, 2023 at 05:14:08PM -0400, Tom Lane wrote:

>> I think we should reword this to just generically claim that holding
>> the Relation reference open for the whole transaction reduces overhead.
> 
> How is this attached patch?

Reads good to me, +1.

--
Daniel Gustafsson





Re: A failure in 031_recovery_conflict.pl on Debian/s390x

2023-09-07 Thread Christoph Berg
Re: Thomas Munro
> I have now disabled the test in 15 and 16 (like the older branches).
> I'll see about getting the fixes into master today, and we can
> contemplate back-patching later, after we've collected a convincing
> volume of test results from the build farm, CI and hopefully your
> s390x master snapshot builds (if that is a thing).

I have now seen the problem on 15 as well, in case I had not reported
that yet.

I think the 16/17 builds look good now with the patches applied, but
there's still a lot of noise from the build box having random other
problems (network disconnect and other silly things), so it still
needs a lot of hand-holding :-/.

Christoph




Re: [BackendXidGetPid] only access allProcs when xid matches

2023-09-07 Thread Junwang Zhao
On Thu, Sep 7, 2023 at 4:04 PM Michael Paquier  wrote:
>
> On Thu, Sep 07, 2023 at 01:22:07PM +0530, Ashutosh Bapat wrote:
> > Otherwise the patch looks good to me.
> >
> > The function modified by the patch is only used by extension
> > pgrowlocks. Given that the function will be invoked as many times as
> > the number of locked rows in the relation, the patch may show some
> > improvement and thus be more compelling. One way to measure
> > performance is to create a table with millions of rows, SELECT all
> > rows with FOR SHARE/UPDATE clause. Then run pgrowlock() on that
> > relation. This will invoke the given function a million times. That
> > way we might be able to catch some miniscule improvement per row.
> >
> > If the performance is measurable, we can mark the CF entry as ready
> > for committer.
>
> So, is the difference measurable?  Assuming that your compiler does
I have checked with my compiler; this patch gives the same assembly code
as before.
> not optimize that, my guess is no because the cycles are going to be
> eaten by the other system calls in pgrowlocks.  You could increase the
> number of proc entries and force a large loop around
> BackendXidGetPid() to see a difference of runtime, but that's going
> to require a lot more proc entries to see any kind of difference
> outside the scope of a usual ~1% (somewhat magic number!) noise with
> such micro benchmarks.
>
> -int         pgprocno = arrayP->pgprocnos[index];
> -PGPROC   *proc = &allProcs[pgprocno];
> -
>  if (other_xids[index] == xid)
>  {
> +PGPROC   *proc = &allProcs[arrayP->pgprocnos[index]];
>  result = proc->pid;
>  break;
>
> Saying that, moving the declarations into the inner loop is usually a
> good practice, but I would just keep two variables instead of one for
> the sake of readability.  That's a nit, sure.

I remember I split this into two lines in v2 patch. Whatever, I attached
a v3 with a suggestion from Ashutosh Bapat.

> --
> Michael



-- 
Regards
Junwang Zhao


v3-0001-BackendXidGetPid-only-access-allProcs-when-xid-ma.patch
Description: Binary data


Re: persist logical slots to disk during shutdown checkpoint

2023-09-07 Thread Amit Kapila
On Thu, Sep 7, 2023 at 1:18 PM Michael Paquier  wrote:
>
> On Thu, Sep 07, 2023 at 11:56:28AM +0530, Amit Kapila wrote:
> > Thanks, the patch looks good to me as well.
>
> +   /* This is used to track the last saved confirmed_flush LSN value */
> +   XLogRecPtr  last_saved_confirmed_flush;
>
> This does not feel sufficient, as the comment explaining what this
> variable does uses the same terms as the variable name (aka it is the
> last save of the confirmed_lsn).  Why is it here and why is it useful?
> In which context and/or code paths is it used?  Okay, there are some
> explanations when saving a slot, restoring a slot or when a checkpoint
> processes slots, but it seems important to me to document more things
> in ReplicationSlot where this is defined.
>

Hmm, this is quite debatable as different people feel differently
about this. The patch author kept it where it is now but in one of my
revisions, I rewrote and added it in the ReplicationSlot. Then
Ashutosh argued that it is better to keep it near where we are saving
the slot (aka where the patch has) [1]. Anyway, as I also preferred
the core part of the theory about this variable to be in
ReplicationSlot, so I'll move it there before commit unless someone
argues against it or has any other comments.

[1] - 
https://www.postgresql.org/message-id/CAExHW5uXq_CU80XJtmWbPJinRjJx54kbQJ9DT%3DUFySKXpjVwrw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: [BackendXidGetPid] only access allProcs when xid matches

2023-09-07 Thread Michael Paquier
On Thu, Sep 07, 2023 at 01:22:07PM +0530, Ashutosh Bapat wrote:
> Otherwise the patch looks good to me.
> 
> The function modified by the patch is only used by extension
> pgrowlocks. Given that the function will be invoked as many times as
> the number of locked rows in the relation, the patch may show some
> improvement and thus be more compelling. One way to measure
> performance is to create a table with millions of rows, SELECT all
> rows with FOR SHARE/UPDATE clause. Then run pgrowlock() on that
> relation. This will invoke the given function a million times. That
> way we might be able to catch some miniscule improvement per row.
> 
> If the performance is measurable, we can mark the CF entry as ready
> for committer.

So, is the difference measurable?  Assuming that your compiler does
not optimize that, my guess is no because the cycles are going to be
eaten by the other system calls in pgrowlocks.  You could increase the 
number of proc entries and force a large loop around
BackendXidGetPid() to see a difference of runtime, but that's going
to require a lot more proc entries to see any kind of difference
outside the scope of a usual ~1% (somewhat magic number!) noise with
such micro benchmarks. 

-int         pgprocno = arrayP->pgprocnos[index];
-PGPROC   *proc = &allProcs[pgprocno];
-
 if (other_xids[index] == xid)
 {
+PGPROC   *proc = &allProcs[arrayP->pgprocnos[index]];
 result = proc->pid;
 break;

Saying that, moving the declarations into the inner loop is usually a
good practice, but I would just keep two variables instead of one for
the sake of readability.  That's a nit, sure.
--
Michael


signature.asc
Description: PGP signature


Re: psql help message contains excessive indentations

2023-09-07 Thread Kyotaro Horiguchi
At Thu, 7 Sep 2023 16:08:10 +0900, Yugo NAGATA  wrote in 
> On Thu, 07 Sep 2023 15:36:10 +0900 (JST)
> Kyotaro Horiguchi  wrote:
> 
> > At Thu, 7 Sep 2023 15:02:49 +0900, Yugo NAGATA  wrote 
> > in 
> > > I wonder this better to fix this in similar way to other places where the
> > > description has multiple lines., like "\g [(OPTIONS)] [FILE]".
> > 
> > It looks correct to me:
> 
> Yes. So, I mean how about fixing \watch description as the attached patch.

Ah. I see. I thought you meant that line needed the same change.

> > 
> > >  \errverboseshow most recent error message at maximum 
> > > verbosity
> > >  \g [(OPTIONS)] [FILE]  execute query (and send result to file or |pipe);
> > > \g with no arguments is equivalent to a semicolon
> > >  \gdesc describe result of query, without executing it

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: SLRUs in the main buffer pool - Page Header definitions

2023-09-07 Thread Bagga, Rishu
Alekseev, Aleksander (aleksan...@timescale.com) wrote:

> It looks like the patch in *this* thread was never registered on the
> commitfest and/or tested by CF bot, unless I'm missing something.
> Unfortunately it's a bit late to register it for the September CF
> especially considering the fact that it doesn't apply at the moment.

> This being said, please consider submitting the patch for the upcoming
> CF. Also, please consider joining the efforts and having one thread
> with a single patchset rather than different threads with different
> competing patches. This will simplify the work of the reviewers a lot.


Hi Aleksander,

Thank you for letting me know about this. I’ll follow up on the original 
thread within the next couple weeks with a new and updated patch for the 
next commitfest.


Sincerely,

Rishu Bagga, Amazon Web Services (AWS)




Re: CHECK Constraint Deferrable

2023-09-07 Thread Himanshu Upadhyaya
Attached is v2 of the patch, rebased against the latest HEAD.

Thanks,
Himanshu
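
For readers who have not followed the thread from the start, the feature lets
the standard deferrable-constraint syntax apply to CHECK constraints, roughly
along the lines below (an illustrative sketch only, not taken from the patch's
regression tests; table and constraint names are made up):

CREATE TABLE orders (
    qty int CONSTRAINT qty_positive CHECK (qty > 0) DEFERRABLE INITIALLY IMMEDIATE
);

BEGIN;
SET CONSTRAINTS qty_positive DEFERRED;
INSERT INTO orders VALUES (-1);   -- accepted for now, re-checked at commit
UPDATE orders SET qty = 5;        -- violation fixed before commit
COMMIT;                           -- deferred re-validation passes here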
From cf6057ebeffd026ae075ec43d573eca1164eff5b Mon Sep 17 00:00:00 2001
From: Himanshu Upadhyaya 
Date: Thu, 7 Sep 2023 13:19:14 +0530
Subject: [PATCH v2] Implementation of "CHECK Constraint" to make it

Deferrable.
---
 src/backend/access/common/tupdesc.c   |   2 +
 src/backend/catalog/heap.c|  50 --
 src/backend/commands/constraint.c | 116 ++
 src/backend/commands/copyfrom.c   |  10 +-
 src/backend/commands/tablecmds.c  |   8 ++
 src/backend/commands/trigger.c|  43 ++--
 src/backend/executor/execMain.c   |  33 +-
 src/backend/executor/execReplication.c|  10 +-
 src/backend/executor/nodeModifyTable.c|  29 +++---
 src/backend/parser/gram.y |   2 +-
 src/backend/parser/parse_utilcmd.c|   9 +-
 src/backend/utils/cache/relcache.c|   2 +
 src/include/access/tupdesc.h  |   2 +
 src/include/catalog/heap.h|   2 +
 src/include/catalog/pg_proc.dat   |   5 +
 src/include/commands/trigger.h|   2 +
 src/include/executor/executor.h   |  42 +++-
 src/test/regress/expected/constraints.out |  98 ++
 src/test/regress/sql/constraints.sql  |  99 ++
 19 files changed, 518 insertions(+), 46 deletions(-)

diff --git a/src/backend/access/common/tupdesc.c b/src/backend/access/common/tupdesc.c
index 7c5c390503..098cb27932 100644
--- a/src/backend/access/common/tupdesc.c
+++ b/src/backend/access/common/tupdesc.c
@@ -204,6 +204,8 @@ CreateTupleDescCopyConstr(TupleDesc tupdesc)
 cpy->check[i].ccbin = pstrdup(constr->check[i].ccbin);
 cpy->check[i].ccvalid = constr->check[i].ccvalid;
 cpy->check[i].ccnoinherit = constr->check[i].ccnoinherit;
+cpy->check[i].ccdeferrable = constr->check[i].ccdeferrable;
+cpy->check[i].ccdeferred = constr->check[i].ccdeferred;
 			}
 		}
 
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index b42711f574..b595a60b42 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -52,17 +52,20 @@
 #include "catalog/pg_statistic.h"
 #include "catalog/pg_subscription_rel.h"
 #include "catalog/pg_tablespace.h"
+#include "catalog/pg_trigger.h"
 #include "catalog/pg_type.h"
 #include "catalog/storage.h"
 #include "commands/tablecmds.h"
 #include "commands/typecmds.h"
 #include "miscadmin.h"
+#include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/optimizer.h"
 #include "parser/parse_coerce.h"
 #include "parser/parse_collate.h"
 #include "parser/parse_expr.h"
 #include "parser/parse_relation.h"
+#include "parser/parser.h"
 #include "parser/parsetree.h"
 #include "partitioning/partdesc.h"
 #include "pgstat.h"
@@ -102,7 +105,8 @@ static ObjectAddress AddNewRelationType(const char *typeName,
 static void RelationRemoveInheritance(Oid relid);
 static Oid	StoreRelCheck(Relation rel, const char *ccname, Node *expr,
 		  bool is_validated, bool is_local, int inhcount,
-		  bool is_no_inherit, bool is_internal);
+		  bool is_no_inherit, bool is_internal,
+		  bool is_deferrable, bool initdeferred);
 static void StoreConstraints(Relation rel, List *cooked_constraints,
 			 bool is_internal);
 static bool MergeWithExistingConstraint(Relation rel, const char *ccname, Node *expr,
@@ -2050,13 +2054,15 @@ SetAttrMissing(Oid relid, char *attname, char *value)
 static Oid
 StoreRelCheck(Relation rel, const char *ccname, Node *expr,
 			  bool is_validated, bool is_local, int inhcount,
-			  bool is_no_inherit, bool is_internal)
+			  bool is_no_inherit, bool is_internal,
+			  bool is_deferrable, bool initdeferred)
 {
 	char	   *ccbin;
 	List	   *varList;
 	int			keycount;
 	int16	   *attNos;
 	Oid			constrOid;
+	CreateTrigStmt *trigger;
 
 	/*
 	 * Flatten expression to string form for storage.
@@ -2113,8 +2119,10 @@ StoreRelCheck(Relation rel, const char *ccname, Node *expr,
 		CreateConstraintEntry(ccname,	/* Constraint Name */
 			  RelationGetNamespace(rel),	/* namespace */
 			  CONSTRAINT_CHECK, /* Constraint Type */
-			  false,	/* Is Deferrable */
-			  false,	/* Is Deferred */
+			  is_deferrable,	/* Is Check Constraint
+ * deferrable */
+			  initdeferred, /* Is Check Constraint initially
+			 * deferred */
 			  is_validated,
 			  InvalidOid,	/* no parent constraint */
 			  RelationGetRelid(rel),	/* relation */
@@ -2142,6 +2150,36 @@ StoreRelCheck(Relation rel, const char *ccname, Node *expr,
 			  is_no_inherit,	/* connoinherit */
 			  is_internal); /* internally constructed? */
 
+	/*
+	 * If the constraint is deferrable, create the deferred trigger to
+	 * re-validate the check constraint.(The trigger will be given an internal
+	 * dependency on the constraint by CreateTrigger, so there's no need to do
+	 * anything more here.)
+	 */
+	if 

Re: [BackendXidGetPid] only access allProcs when xid matches

2023-09-07 Thread Ashutosh Bapat
Hi Junwang,
We leave a line blank after variable declaration as in the attached patch.

Otherwise the patch looks good to me.

The function modified by the patch is only used by extension
pgrowlocks. Given that the function will be invoked as many times as
the number of locked rows in the relation, the patch may show some
improvement and thus be more compelling. One way to measure
performance is to create a table with millions of rows, SELECT all
rows with FOR SHARE/UPDATE clause. Then run pgrowlock() on that
relation. This will invoke the given function a million times. That
way we might be able to catch some miniscule improvement per row.

If the performance is measurable, we can mark the CF entry as ready
for committer.
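
To make the suggested benchmark concrete, it could look roughly like this
(a sketch only; the table name and row count are illustrative):

-- Session 1: build a table and take a row lock on every row, keeping the
-- transaction open.
CREATE EXTENSION IF NOT EXISTS pgrowlocks;
CREATE TABLE rowlock_bench AS
    SELECT g AS id FROM generate_series(1, 1000000) AS g;

BEGIN;
SELECT count(*) FROM (SELECT * FROM rowlock_bench FOR SHARE) AS locked;

-- Session 2: while session 1 keeps its transaction open, time the
-- pgrowlocks() scan, which invokes the modified function once per locked row.
\timing on
SELECT count(*) FROM pgrowlocks('rowlock_bench');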

-- 
Best Wishes,
Ashutosh Bapat

On Thu, Aug 10, 2023 at 1:48 PM Junwang Zhao  wrote:
>
> On Thu, Aug 10, 2023 at 4:11 PM Ashutosh Bapat
>  wrote:
> >
> > Please add this to commitfest so that it's not forgotten.
> >
>
> Added [1], thanks
>
> [1]: https://commitfest.postgresql.org/44/4495/
>
> > On Wed, Aug 9, 2023 at 8:37 PM Junwang Zhao  wrote:
> > >
> > > On Wed, Aug 9, 2023 at 10:46 PM Ashutosh Bapat
> > >  wrote:
> > > >
> > > > On Wed, Aug 9, 2023 at 9:30 AM Junwang Zhao  wrote:
> > > > >
> > > > > In function `BackendXidGetPid`, when looping every proc's
> > > > > TransactionId, there is no need to access its PGPROC since there
> > > > > is shared memory access: `arrayP->pgprocnos[index]`.
> > > > >
> > > > > Though the compiler can optimize this kind of inefficiency, I
> > > > > believe we should ship with better code.
> > > > >
> > > >
> > > > Looks good to me. However, I would just move the variable declaration
> > > > with their assignments inside the if () rather than combing the
> > > > expressions. It more readable that way.
> > >
> > > yeah, make sense, also checked elsewhere using the original style,
> > > attachment file
> > > keep that style, thanks ;)
> > >
> > > >
> > > > --
> > > > Best Wishes,
> > > > Ashutosh Bapat
> > >
> > >
> > >
> > > --
> > > Regards
> > > Junwang Zhao
> >
> >
> >
> > --
> > Best Wishes,
> > Ashutosh Bapat
>
>
>
> --
> Regards
> Junwang Zhao




Re: persist logical slots to disk during shutdown checkpoint

2023-09-07 Thread Michael Paquier
On Thu, Sep 07, 2023 at 11:56:28AM +0530, Amit Kapila wrote:
> Thanks, the patch looks good to me as well.

+   /* This is used to track the last saved confirmed_flush LSN value */
+   XLogRecPtr  last_saved_confirmed_flush;

This does not feel sufficient, as the comment explaining what this
variable does uses the same terms as the variable name (aka it is the
last save of the confirmed_lsn).  Why is it here and why is it useful?
In which context and/or code paths is it used?  Okay, there are some
explanations when saving a slot, restoring a slot or when a checkpoint
processes slots, but it seems important to me to document more things
in ReplicationSlot where this is defined.

(Just passing by, I have not checked the patch logic in details but
that looks under-documented to me.)
--
Michael


signature.asc
Description: PGP signature


Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

2023-09-07 Thread Michael Paquier
On Thu, Sep 07, 2023 at 07:07:36AM +, Hayato Kuroda (Fujitsu) wrote:
> # Problem
> 
> The "pg_ctl start" command returns 0 (succeeded) even if the cluster has
> already been started. This occurs on Windows environment, and when the command
> is executed just after postmaster starts.

Not failing on `pg_ctl start` if the command is run on a data folder
that has already been started previously by a different command with a
postmaster still alive feels like cheating, because pg_ctl is lying
about its result.  If pg_ctl wants to start a cluster but is not able
to do it, either because the postmaster failed at startup or because
the cluster has already started, it should report a failure.  Now, I
also recall that the processes spawned by pg_ctl on Windows make the
status handling rather tricky to reason about..
--
Michael


signature.asc
Description: PGP signature


Re: Can a role have indirect ADMIN OPTION on another role?

2023-09-07 Thread Ashutosh Sharma
On Thu, Sep 7, 2023 at 3:43 AM David G. Johnston
 wrote:
>
> On Wed, Sep 6, 2023 at 1:55 PM Ashutosh Sharma  wrote:
>>
>> But what if roleB doesn't want to give roleA access to
>> the certain objects it owns.
>
>
> Not doable - roleA can always pretend they are roleB one way or another since 
> roleA made roleB.
>

Okay. It makes sense. Thanks!

--
With Regards,
Ashutosh Sharma.




Re: Can a role have indirect ADMIN OPTION on another role?

2023-09-07 Thread Ashutosh Sharma
On Thu, Sep 7, 2023 at 12:20 AM Robert Haas  wrote:
>
> On Wed, Sep 6, 2023 at 1:33 PM Ashutosh Sharma  wrote:
> > Actually I have one more question. With this new design, assuming that
> > createrole_self_grant is set to 'set, inherit' in postgresql.conf and
> > if roleA creates roleB. So, in this case, roleA will inherit
> > permissions of roleB which means roleA will have access to objects
> > owned by roleB. But what if roleB doesn't want to give roleA access to
> > the certain objects it owns. As an example let's say that roleB
> > creates a table 't' and by default (with this setting) roleA will have
> > access to this table, but for some reason roleB does not want roleA to
> > have access to it. So what's the option for roleB? I've tried running
> > "revoke select on table t from roleA" but that doesn't seem to be
> > working. the only option that works is roleA himself set inherit
> > option on roleB to false - "grant roleB to roleA with inherit false;"
>
> It doesn't matter what roleB wants. roleA is strictly more powerful
> than roleB and can do whatever they want to roleB or roleB's objects
> regardless of how roleB feels about it.
>
> In the same way, the superuser is strictly more powerful than either
> roleA or roleB and can override any security control that either one
> of them put in place.
>
> Neither roleB nor roleA has any right to hide their data from the
> superuser, and roleB has no right to hide data from roleA. It's a
> hierarchy. If you're on top, you're in charge, and that's it.
>
> Here again, it can't really meaningfully work in any other way.
> Suppose you were to add a feature to allow roleB to hide data from
> roleA. Given that roleA has the ability to change roleB's password,
> how could that possibly work? When you give one user the ability to
> administer another user, that includes the right to change that user's
> password, change whether they can log in, drop the role, give the
> privileges of that role to themselves or other users, and a whole
> bunch of other super-powerful stuff. You can't really give someone
> that level of power over another account and, at the same time, expect
> the account being administered to be able to keep the more powerful
> account from doing stuff. It just can't possibly work. If you want
> roleB to be able to resist roleA, you have to give roleA less power.
>

I agree with you. Thank you once again.
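
To spell the scenario out, it amounts to something like the following sketch
(role and table names are illustrative, and it assumes a superuser session to
start from):

-- Superuser session.
CREATE ROLE rolea LOGIN CREATEROLE;

-- Acting as rolea, with createrole_self_grant in effect.
SET ROLE rolea;
SET createrole_self_grant = 'set, inherit';
CREATE ROLE roleb LOGIN;            -- rolea is implicitly granted roleb

RESET ROLE;
GRANT CREATE ON SCHEMA public TO roleb;   -- only so roleb can create a table below

SET ROLE roleb;
CREATE TABLE t (x int);
REVOKE SELECT ON t FROM rolea;      -- no effect: rolea holds no direct grant to revoke

SET ROLE rolea;
TABLE t;                            -- still allowed via the inherited membership

-- The only way to cut off the inheritance is on rolea's own membership:
GRANT roleb TO rolea WITH INHERIT FALSE;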

--
With Regards,
Ashutosh Sharma.




Re: Impact of checkpointer during pg_upgrade

2023-09-07 Thread Michael Paquier
(Just dumping what I have in mind while reading the thread.)

On Sat, Sep 02, 2023 at 10:08:51AM +0530, Amit Kapila wrote:
> During pg_upgrade, we start the server for the old cluster which can
> allow the checkpointer to remove the WAL files. It has been noticed
> that we do generate certain types of WAL records (e.g
> XLOG_RUNNING_XACTS, XLOG_CHECKPOINT_ONLINE, and XLOG_FPI_FOR_HINT)
> even during pg_upgrade for old cluster, so additional WAL records
> could let checkpointer decide that certain WAL segments can be removed
> (e.g. say wal size crosses max_slot_wal_keep_size_mb) and invalidate
> the slots. Currently, I can't see any problem with this but for future
> work where we want to migrate logical slots during an upgrade[1], we
> need to decide what to do for such cases. The initial idea we had was
> that if the old cluster has some invalid slots, we won't allow an
> upgrade unless the user removes such slots or uses some option like
> --exclude-slots. It is quite possible that slots got invalidated
> during pg_upgrade due to no user activity. Now, even though the
> possibility of the same is less I think it is worth considering what
> should be the behavior.
> 
> The other possibilities apart from not allowing an upgrade in such a
> case could be (a) Before starting the old cluster, we fetch the slots
> directly from the disk using some tool like [2] and make the decisions
> based on that state; (b) During the upgrade, we don't allow WAL to be
> removed if it can invalidate slots; (c) Copy/Migrate the invalid slots
> as well but for that, we need to expose an API to invalidate the
> slots; (d) somehow distinguish the slots that are invalidated during
> an upgrade and then simply copy such slots because anyway we ensure
> that all the WAL required by slot is sent before shutdown.

Checking for any invalid slots at the beginning of the upgrade and
complaining sounds like a good thing to do, because these are not
expected to begin with, no?  That's similar to a pre-check requirement
that the slots should have fed the subscribers with all the data
available up to the shutdown checkpoint when the publisher was stopped
before running pg_upgrade.  So (a) is a good idea to prevent an
upgrade based on a state we don't expect from the start, as something
in check.c, I assume.
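
As a rough illustration, such a pre-check against the old cluster could be as
simple as the query below (a sketch only; the exact criteria for what counts
as an invalid slot are part of what is being discussed here):

-- Logical slots whose required WAL has already been removed would make
-- pg_upgrade refuse to run under option (a).
SELECT slot_name, plugin, wal_status
FROM pg_replication_slots
WHERE slot_type = 'logical' AND wal_status = 'lost';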

On top of that, (b) sounds like a good idea to me anyway to be more
defensive.  But couldn't we do that just like we do for autovacuum and
force the GUCs that could remove the slot's WAL to not do their work
instead?  An upgrade is a special state of the cluster, and I'm not
much into painting more checks based on IsBinaryUpgrade to prevent WAL
segments to be removed while we can have a full control of what we
want with just the GUCs that force the hand of the slots.  That just
seems simpler to me, and the WAL recycling logic is already complex
particularly with all the GUCs popping lately to force some conditions
to do the WAL recycling and/or removal.
 
During the upgrade, if some segments are removed and some of the slots
we expect to still be valid once the upgrade is done are marked as
invalid and become unusable, I think that we should just copy these
slots, but also update their state data so that they can still be used
with what we expect, as these could be wanted by the subscribers.
That's what you mean by (d), I assume.  Do you think that it would
be possible to force the slot's data on the publishers so that they use
a local LSN based on the new WAL we are resetting at?  At least that
seems more friendly to me as this limits the set of manipulations to
do on the slots for the end-user.  The protection from (b) ought to be
enough, in itself.  (c) overlaps with (a), especially if we want to be
able to read or write some of the slot's data offline.
--
Michael


signature.asc
Description: PGP signature


Re: psql help message contains excessive indentations

2023-09-07 Thread Yugo NAGATA
On Thu, 07 Sep 2023 15:36:10 +0900 (JST)
Kyotaro Horiguchi  wrote:

> At Thu, 7 Sep 2023 15:02:49 +0900, Yugo NAGATA  wrote in 
> > On Thu, 07 Sep 2023 14:29:56 +0900 (JST)
> > Kyotaro Horiguchi  wrote:
> > >  >  \q quit psql
> > >  >  \watch [[i=]SEC] [c=N] [m=MIN]
> > > !>  execute query every SEC seconds, up to N times
> > > !>  stop if less than MIN rows are returned
> > > 
> > > The last two lines definitely have some extra indentation.
> > 
> > Agreed.
> > 
> > > I've attached a patch that fixes this.
> > 
> > I wonder this better to fix this in similar way to other places where the
> > description has multiple lines., like "\g [(OPTIONS)] [FILE]".
> 
> It looks correct to me:

Yes. So, I mean how about fixing \watch description as the attached patch.

> 
> >  \errverboseshow most recent error message at maximum verbosity
> >  \g [(OPTIONS)] [FILE]  execute query (and send result to file or |pipe);
> > \g with no arguments is equivalent to a semicolon
> >  \gdesc describe result of query, without executing it
> 
> regards.
> 
> -- 
> Kyotaro Horiguchi
> NTT Open Source Software Center


-- 
Yugo NAGATA 
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index 38c165a627..12280c0e54 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -200,9 +200,9 @@ slashUsage(unsigned short int pager)
 	HELP0("  \\gset [PREFIX] execute query and store result in psql variables\n");
 	HELP0("  \\gx [(OPTIONS)] [FILE] as \\g, but forces expanded output mode\n");
 	HELP0("  \\q quit psql\n");
-	HELP0("  \\watch [[i=]SEC] [c=N] [m=MIN]\n");
-	HELP0("  execute query every SEC seconds, up to N times\n");
-	HELP0("  stop if less than MIN rows are returned\n");
+	HELP0("  \\watch [[i=]SEC] [c=N] [m=MIN]\n"
+		  " execute query every SEC seconds, up to N times\n"
+		  " stop if less than MIN rows are returned\n");
 	HELP0("\n");
 
 	HELP0("Help\n");

