Re: Patch: Add parse_type Function

2024-02-24 Thread Jim Jones



On 24.02.24 14:46, David E. Wheeler wrote:
> What’s the protocol for marking a patch ready for committer?
I guess after the review of the last assigned reviewer


v9 applies cleanly, all tests pass and documentation builds correctly.

Just a very small observation:

The fact that a completely invalid type returns NULL ..

SELECT to_regtypemod('foo');
 to_regtypemod
---------------
 
(1 row)


.. but a "partially" valid one returns an error might be confusing

postgres=# SELECT to_regtypemod('timestamp(-4)');
ERROR:  syntax error at or near "-"
LINE 1: SELECT to_regtypemod('timestamp(-4)');
  ^
CONTEXT:  invalid type name "timestamp(-4)"

postgres=# SELECT to_regtypemod('text(-4)');
ERROR:  type modifier is not allowed for type "text"


This behaviour is mentioned in the documentation, so I'd say it is ok.

+    ). Failure to extract a valid potential
+    type name results in an error; however, if the extracted name is
+    not known to the system, this function will return NULL.

I would personally prefer either NULL or an error in both cases, but I
can totally live with the current design.

OTOH, format_type returns "???" and it seems to be fine, so don't take
this comment too seriously :)

SELECT format_type(-1,-1);
 format_type
-------------
 ???
(1 row)


Other than that, LGTM.

Thanks David!

Best,
Jim

-- 
Jim





Re: Optimize planner memory consumption for huge arrays

2024-02-24 Thread Tom Lane
I wrote:
> Tomas Vondra  writes:
>> On 2/19/24 16:45, Tom Lane wrote:
>>> Tomas Vondra  writes:
>>>> For example, I don't think we expect selectivity functions to allocate
>>>> long-lived objects, right? So maybe we could run them in a dedicated
>>>> memory context, and reset it aggressively (after each call).

>>> That could eliminate a whole lot of potential leaks.  I'm not sure 
>>> though how much it moves the needle in terms of overall planner
>>> memory consumption.

>> It was an ad hoc thought, inspired by the issue at hand. Maybe it would
>> be possible to find similar "boundaries" in other parts of the planner.

> Here's a quick and probably-incomplete implementation of that idea.
> I've not tried to study its effects on memory consumption, just made
> sure it passes check-world.

I spent a bit more time on this patch.  One thing I was concerned
about was whether it causes any noticeable slowdown, and it seems that
it does: testing with "pgbench -S" I observe perhaps 1% slowdown.
However, we don't necessarily need to reset the temp context after
every single usage.  I experimented with resetting it every tenth
time, and that got me from 1% slower than HEAD to 1% faster.  Of
course "every tenth time" is very ad hoc.  I wondered if we could
make it somehow conditional on how much memory had been consumed
in the temp context, but there doesn't seem to be any cheap way
to get that.  Applying something like MemoryContextMemConsumed
would surely be a loser.  I'm not sure if it'd be worth extending
the mcxt.c API to provide something like "MemoryContextResetIfBig",
with some internal rule that would be cheap to apply like "reset
if we have any non-keeper blocks".

I also looked into whether it really does reduce overall memory
consumption noticeably, by collecting stats about planner memory
consumption during the core regression tests.  The answer is that
it barely helps.  I see the average used space across all planner
invocations drop from 23344 bytes to 23220, and the worst-case
numbers hardly move at all.  So that's a little discouraging.
But of course the regression tests prefer not to deal in very
large/expensive test cases, so maybe it's not surprising that
I don't see much win in this test.

Anyway, 0001 attached is a cleaned-up patch with the every-tenth-
time rule, and 0002 (not meant for commit) is the quick and
dirty instrumentation patch I used for collecting usage stats.

Even though this seems of only edge-case value, I'd much prefer
to do this than the sort of ad-hoc patching initially proposed
in this thread.

regards, tom lane

diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index c949dc1866..4678c778b0 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -24,6 +24,7 @@
 #include "statistics/statistics.h"
 #include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
+#include "utils/memutils.h"
 #include "utils/selfuncs.h"
 
 /*
@@ -693,6 +694,7 @@ clause_selectivity_ext(PlannerInfo *root,
 	Selectivity s1 = 0.5;		/* default for any unhandled clause type */
 	RestrictInfo *rinfo = NULL;
 	bool		cacheable = false;
+	MemoryContext saved_cxt;
 
 	if (clause == NULL)			/* can this still happen? */
 		return s1;
@@ -756,6 +758,21 @@ clause_selectivity_ext(PlannerInfo *root,
 			clause = (Node *) rinfo->clause;
 	}
 
+	/*
+	 * Run all the remaining work in the short-lived planner_tmp_cxt, which
+	 * we'll reset at the bottom.  This allows selectivity-related code to not
+	 * be too concerned about leaking memory.
+	 */
+	saved_cxt = MemoryContextSwitchTo(root->glob->planner_tmp_cxt);
+
+	/*
+	 * This function can be called recursively, in which case we'd better not
+	 * reset planner_tmp_cxt until we exit the topmost level.  Use of
+	 * planner_tmp_cxt_depth also makes it safe for other places to use and
+	 * reset planner_tmp_cxt in the same fashion.
+	 */
+	root->glob->planner_tmp_cxt_depth++;
+
 	if (IsA(clause, Var))
 	{
 		Var		   *var = (Var *) clause;
@@ -967,6 +984,20 @@ clause_selectivity_ext(PlannerInfo *root,
 			rinfo->outer_selec = s1;
 	}
 
+	/* Exit and optionally clean up the planner_tmp_cxt */
+	MemoryContextSwitchTo(saved_cxt);
+	root->glob->planner_tmp_cxt_usage++;
+	Assert(root->glob->planner_tmp_cxt_depth > 0);
+	if (--root->glob->planner_tmp_cxt_depth == 0)
+	{
+		/* It's safe to reset the tmp context, but do we want to? */
+		if (root->glob->planner_tmp_cxt_usage >= PLANNER_TMP_CXT_USAGE_RESET)
+		{
+			MemoryContextReset(root->glob->planner_tmp_cxt);
+			root->glob->planner_tmp_cxt_usage = 0;
+		}
+	}
+
 #ifdef SELECTIVITY_DEBUG
 	elog(DEBUG4, "clause_selectivity: s1 %f", s1);
 #endif			/* SELECTIVITY_DEBUG */
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index be4e182869..df6a1fd330 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -307,6 +307,12 @@ 

Re: Relation bulk write facility

2024-02-24 Thread Thomas Munro
On Sun, Feb 25, 2024 at 11:16 AM Thomas Munro  wrote:
> On Sun, Feb 25, 2024 at 11:06 AM Heikki Linnakangas  wrote:
> > Regarding the issue at hand, perhaps we should define PG_IO_ALIGN_SIZE as 
> > 16 on AIX, if that's the best the linker can do on that platform.
>
> You'll probably get either an error or silently fall back to buffered
> I/O, if direct I/O is enabled and you try to read/write a badly
> aligned buffer.  That's documented (they offer finfo() to query it,
> but it's always 4KB for the same sort of reasons as it is on every
> other OS).

I guess it's the latter ("to work efficiently" sounds like it isn't
going to reject the request):

https://www.ibm.com/docs/en/aix/7.3?topic=tuning-direct-io

If you make it < 4KB then all direct I/O would be affected, not just
this one place, so then you might as well just not allow direct I/O on
AIX at all, to avoid giving a false impression that it does something.
(Note that if we think the platform lacks O_DIRECT we don't make those
assertions about alignment).

FWIW I'm aware of one other thing that is wrong with our direct I/O
support on AIX: it should perhaps be using a different flag.  I
created a wiki page to defer thinking about any AIX issues
until/unless at least one real, live user shows up, which hasn't
happened yet:  https://wiki.postgresql.org/wiki/AIX




Re: Relation bulk write facility

2024-02-24 Thread Thomas Munro
On Sun, Feb 25, 2024 at 11:06 AM Heikki Linnakangas  wrote:
> Regarding the issue at hand, perhaps we should define PG_IO_ALIGN_SIZE as 16 
> on AIX, if that's the best the linker can do on that platform.

You'll probably get either an error or silently fall back to buffered
I/O, if direct I/O is enabled and you try to read/write a badly
aligned buffer.  That's documented (they offer finfo() to query it,
but it's always 4KB for the same sort of reasons as it is on every
other OS).




Re: Relation bulk write facility

2024-02-24 Thread Heikki Linnakangas
On 24 February 2024 23:29:36 EET, Andres Freund  wrote:
>Hi,
>
>On 2024-02-24 11:50:24 -0800, Noah Misch wrote:
>> > We see this happen with both xlc and gcc (new enough to know how to do
>> > this).  One idea would be that the AIX *linker* is unable to align it,
>> > as that is the common tool-chain component here (and unlike stack and
>> > heap objects, this scope is the linker's job).  There is a
>> > pre-existing example of a zero-buffer that is at file scope like that:
>> > pg_prewarm.c.  Perhaps it doesn't get tested?
>> >
>> > Hmm.
>>
>> GCC docs do say "For some linkers, the maximum supported alignment may be 
>> very
>> very small.", but AIX "man LD" says "data sections are aligned on a boundary
>> so as to satisfy the alignment of all CSECTs in the sections".  It also has 
>> -H
>> and -K flags to force some particular higher alignment.
>
>Some xlc manual [1] states that
>
>  n must be a positive power of 2, or NIL. NIL can be specified as either
>  __attribute__((aligned())) or __attribute__((aligned)); this is the same as
>  specifying the maximum system alignment (16 bytes on all UNIX platforms).
>
>Which does seem to suggest that this is a platform restriction.

My reading of that paragraph is that you can set it to any power of two, and it
should work. 16 bytes is just what you get if you set it to NIL.

>Let's just drop AIX. This isn't the only alignment issue we've found and the
>solution for those isn't so much a fix as forcing everyone to carefully only
>look into one direction and not notice the cliffs to either side.

I think the way that decision should go is that as long as someone is willing
to step up and do the work to keep AIX support going, we support it. To be
clear, that someone is not me. Anyone willing to do it?

Regarding the issue at hand, perhaps we should define PG_IO_ALIGN_SIZE as 16 on 
AIX, if that's the best the linker can do on that platform.

We could also make the allocation 2*PG_IO_ALIGN_SIZE and round up the starting 
address ourselves to PG_IO_ALIGN_SIZE. Or allocate it in the heap.

- Heikki




Re: Relation bulk write facility

2024-02-24 Thread Andres Freund
Hi,

On 2024-02-24 11:50:24 -0800, Noah Misch wrote:
> > We see this happen with both xlc and gcc (new enough to know how to do
> > this).  One idea would be that the AIX *linker* is unable to align it,
> > as that is the common tool-chain component here (and unlike stack and
> > heap objects, this scope is the linker's job).  There is a
> > pre-existing example of a zero-buffer that is at file scope like that:
> > pg_prewarm.c.  Perhaps it doesn't get tested?
> >
> > Hmm.
>
> GCC docs do say "For some linkers, the maximum supported alignment may be very
> very small.", but AIX "man LD" says "data sections are aligned on a boundary
> so as to satisfy the alignment of all CSECTs in the sections".  It also has -H
> and -K flags to force some particular higher alignment.

Some xlc manual [1] states that

  n must be a positive power of 2, or NIL. NIL can be specified as either
  __attribute__((aligned())) or __attribute__((aligned)); this is the same as
  specifying the maximum system alignment (16 bytes on all UNIX platforms).

Which does seem to suggest that this is a platform restriction.


Let's just drop AIX. This isn't the only alignment issue we've found and the
solution for those isn't so much a fix as forcing everyone to carefully only
look into one direction and not notice the cliffs to either side.

Greetings,

Andres Freund

[1] 
https://www.ibm.com/docs/en/SSGH2K_13.1.2/com.ibm.compilers.aix.doc/proguide.pdf




Re: Psql meta-command conninfo+

2024-02-24 Thread Nathan Bossart
On Sat, Feb 17, 2024 at 02:53:43PM +, Maiquel Grassi wrote:
> The "pg_stat_ssl" view is available from >= PG 9.5, and the
> "pg_stat_gssapi" view is available from >= PG 12. The "compression"
> column was removed from the "pg_stat_ssl" from >= PG 14. All of these
> cases introduce greater complexity in maintaining the SQL. The central
> idea from the beginning has always been to show the user all the
> information from \conninfo and extend it in \conninfo+.

IMHO we should use the views whenever possible (for the reason stated
above [0]).  I think it's okay if we need to fall back to a different
approach for older versions.  But presumably we'll discontinue psql support
for these old server versions at some point, at which point we can simply
delete the dead code that doesn't use the views.

> The absence of the "compression" column in version 14 and above makes
> dealing with this even more complicated, and not showing it seems to
> contradict \conninfo.

I would be okay with using PQsslAttribute() for all versions for this one
since any remaining support for this feature is on its way out.  Once psql
no longer supports any versions that allow SSL compression, we could
probably remove it from \conninfo[+] completely or hard-code it to "off".

> SSL support has been available since version 7.1 (see documentation);
> if there was support before that, I can't say. In this regard, it may
> seem strange, but there are still many legacy systems running older
> versions of PostgreSQL. Just yesterday, I assisted a client who is
> still using PG 8.2. In these cases, using the "pg_stat_ssl" and
> "pg_stat_gssapi" views would not be possible because they don't exist
> on the server. I believe that psql should cover as many cases as
> possible when it comes to compatibility with older versions (even
> those no longer supported). In this case, concerning SSL and GSS, I
> think libpq is better prepared to handle this.

At the moment, the psql support cutoff appears to be v9.2 (see commit
cf0cab8), which has been out of support for over 6 years.  If \conninfo+
happens to work for older versions, then great, but I don't think we should
expend too much energy in this area.

[0] https://postgr.es/m/20240216155449.GA1236054%40nathanxps13

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Relation bulk write facility

2024-02-24 Thread Noah Misch
On Sun, Feb 25, 2024 at 09:13:47AM +1300, Thomas Munro wrote:
> On Sun, Feb 25, 2024 at 9:12 AM Thomas Munro  wrote:
> > On Sun, Feb 25, 2024 at 8:50 AM Noah Misch  wrote:
> > > On GNU/Linux x64, gcc correctly records alignment=2**12 for the associated
> > > section (.rodata for bulk_write.o zero_buffer, .bss for pg_prewarm.o
> > > blockbuffer).  If I'm reading this right, neither AIX gcc nor xlc is 
> > > marking
> > > the section with sufficient alignment, in bulk_write.o or pg_prewarm.o:
> >
> > Ah, that is a bit of a hazard that we should probably document.
> >
> > I guess the ideas to fix this would be: use smgrzeroextend() instead
> > of this coding, and/or perhaps look at the coding of pg_pwrite_zeros()
> > (function-local static) for any other place that needs such a thing,
> > if it would be satisfied by function-local scope?

True.  Alternatively, could arrange for "#define PG_O_DIRECT 0" on AIX, which
disables the alignment assertions (and debug_io_direct).

> Erm, wait, how does that function-local static object work differently?

I don't know specifically, but I expect they're different parts of the gcc
implementation.  Aligning an xcoff section may entail some xcoff-specific gcc
component.  Aligning a function-local object just changes the early
instructions of the function; it's independent of the object format.




Re: Relation bulk write facility

2024-02-24 Thread Thomas Munro
On Sun, Feb 25, 2024 at 9:12 AM Thomas Munro  wrote:
> On Sun, Feb 25, 2024 at 8:50 AM Noah Misch  wrote:
> > On GNU/Linux x64, gcc correctly records alignment=2**12 for the associated
> > section (.rodata for bulk_write.o zero_buffer, .bss for pg_prewarm.o
> > blockbuffer).  If I'm reading this right, neither AIX gcc nor xlc is marking
> > the section with sufficient alignment, in bulk_write.o or pg_prewarm.o:
>
> Ah, that is a bit of a hazard that we should probably document.
>
> I guess the ideas to fix this would be: use smgrzeroextend() instead
> of this coding, and/or perhaps look at the coding of pg_pwrite_zeros()
> (function-local static) for any other place that needs such a thing,
> if it would be satisfied by function-local scope?

Erm, wait, how does that function-local static object work differently?




Re: Relation bulk write facility

2024-02-24 Thread Thomas Munro
On Sun, Feb 25, 2024 at 8:50 AM Noah Misch  wrote:
> On GNU/Linux x64, gcc correctly records alignment=2**12 for the associated
> section (.rodata for bulk_write.o zero_buffer, .bss for pg_prewarm.o
> blockbuffer).  If I'm reading this right, neither AIX gcc nor xlc is marking
> the section with sufficient alignment, in bulk_write.o or pg_prewarm.o:

Ah, that is a bit of a hazard that we should probably document.

I guess the ideas to fix this would be: use smgrzeroextend() instead
of this coding, and/or perhaps look at the coding of pg_pwrite_zeros()
(function-local static) for any other place that needs such a thing,
if it would be satisfied by function-local scope?




Re: Relation bulk write facility

2024-02-24 Thread Noah Misch
On Sun, Feb 25, 2024 at 07:52:16AM +1300, Thomas Munro wrote:
> On Sun, Feb 25, 2024 at 6:24 AM Noah Misch  wrote:
> > On Fri, Feb 23, 2024 at 04:27:34PM +0200, Heikki Linnakangas wrote:
> > > Committed this. Thanks everyone!
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2024-02-24%2015%3A13%3A14
> > got:
> > TRAP: failed Assert("(uintptr_t) buffer == TYPEALIGN(PG_IO_ALIGN_SIZE, 
> > buffer)"), File: "md.c", Line: 472, PID: 43188608
> >
> > with this stack trace:
> > #5  0x10005cf0 in ExceptionalCondition (conditionName=0x1015d790 
> >  "`", fileName=0x0, lineNumber=16780064) at assert.c:66
> > #6  0x102daba8 in mdextend (reln=0x1042628c , 
> > forknum=812540744, blocknum=33, buffer=0x306e6000, skipFsync=812539904) at 
> > md.c:472
> > #7  0x102d6760 in smgrextend (reln=0x306e6670, forknum=812540744, 
> > blocknum=33, buffer=0x306e6000, skipFsync=812539904) at smgr.c:541
> > #8  0x104c8dac in smgr_bulk_flush (bulkstate=0x306e6000) at bulk_write.c:245
> 
> So that's:
> 
> static const PGIOAlignedBlock zero_buffer = {{0}};  /* worth BLCKSZ */
> 
> ...
> smgrextend(bulkstate->smgr, bulkstate->forknum,
>            bulkstate->pages_written++,
>            &zero_buffer,
>            true);
> 
> ... where PGIOAlignedBlock is:
> 
> typedef union PGIOAlignedBlock
> {
> #ifdef pg_attribute_aligned
> pg_attribute_aligned(PG_IO_ALIGN_SIZE)
> #endif
> char        data[BLCKSZ];
> ...
> 
> We see this happen with both xlc and gcc (new enough to know how to do
> this).  One idea would be that the AIX *linker* is unable to align it,
> as that is the common tool-chain component here (and unlike stack and
> heap objects, this scope is the linker's job).  There is a
> pre-existing example of a zero-buffer that is at file scope like that:
> pg_prewarm.c.  Perhaps it doesn't get tested?
> 
> Hmm.

GCC docs do say "For some linkers, the maximum supported alignment may be very
very small.", but AIX "man LD" says "data sections are aligned on a boundary
so as to satisfy the alignment of all CSECTs in the sections".  It also has -H
and -K flags to force some particular higher alignment.

On GNU/Linux x64, gcc correctly records alignment=2**12 for the associated
section (.rodata for bulk_write.o zero_buffer, .bss for pg_prewarm.o
blockbuffer).  If I'm reading this right, neither AIX gcc nor xlc is marking
the section with sufficient alignment, in bulk_write.o or pg_prewarm.o:

$ /opt/cfarm/binutils-latest/bin/objdump --section-headers 
~/farm/*/HEAD/pgsqlkeep.*/src/backend/storage/smgr/bulk_write.o

/home/nm/farm/gcc64/HEAD/pgsqlkeep.2024-02-24_00-03-22/src/backend/storage/smgr/bulk_write.o:
 file format aix5coff64-rs6000

Sections:
Idx Name  Size  VMA   LMA   File off  Algn
  0 .text 277c      00f0  2**2
  CONTENTS, ALLOC, LOAD, RELOC, CODE
  1 .data 00e4  277c  277c  286c  2**3
  CONTENTS, ALLOC, LOAD, RELOC, DATA
  2 .debug0001f7ea      2950  2**3
  CONTENTS

/home/nm/farm/xlc32/HEAD/pgsqlkeep.2024-02-24_15-12-23/src/backend/storage/smgr/bulk_write.o:
 file format aixcoff-rs6000

Sections:
Idx Name  Size  VMA   LMA   File off  Algn
  0 .text 0880      0180  2**2
  CONTENTS, ALLOC, LOAD, RELOC, CODE
  1 .data 410c  0880  0880  0a00  2**3
  CONTENTS, ALLOC, LOAD, RELOC, DATA
  2 .bss    498c  498c    2**3
  ALLOC
  3 .debug8448      4b24  2**3
  CONTENTS
  4 .except   0018      4b0c  2**3
  CONTENTS, LOAD

$ /opt/cfarm/binutils-latest/bin/objdump --section-headers 
~/farm/*/HEAD/pgsqlkeep.*/contrib/pg_prewarm/pg_prewarm.o

/home/nm/farm/gcc32/HEAD/pgsqlkeep.2024-01-21_03-16-12/contrib/pg_prewarm/pg_prewarm.o:
 file format aixcoff-rs6000

Sections:
Idx Name  Size  VMA   LMA   File off  Algn
  0 .text 0a6c      00b4  2**2
  CONTENTS, ALLOC, LOAD, RELOC, CODE
  1 .data 0044  0a6c  0a6c  0b20  2**3
  CONTENTS, ALLOC, LOAD, RELOC, DATA
  2 .bss  2550  0ab0  0ab0    2**3
  ALLOC
  3 .debug0001c50e      0b64  2**3
  CONTENTS

/home/nm/farm/gcc64/HEAD/pgsqlkeep.2024-02-15_17-13-04/contrib/pg_prewarm/pg_prewarm.o:
 file format aix5coff64-rs6000

Sections:
Idx Name  Size  VMA   LMA   File off  Algn
  0 .text 0948      0138  2**2
  CONTENTS, ALLOC, LOAD, 

Re: Relation bulk write facility

2024-02-24 Thread Thomas Munro
On Sun, Feb 25, 2024 at 6:24 AM Noah Misch  wrote:
> On Fri, Feb 23, 2024 at 04:27:34PM +0200, Heikki Linnakangas wrote:
> > Committed this. Thanks everyone!
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2024-02-24%2015%3A13%3A14
> got:
> TRAP: failed Assert("(uintptr_t) buffer == TYPEALIGN(PG_IO_ALIGN_SIZE, 
> buffer)"), File: "md.c", Line: 472, PID: 43188608
>
> with this stack trace:
> #5  0x10005cf0 in ExceptionalCondition (conditionName=0x1015d790 
>  "`", fileName=0x0, lineNumber=16780064) at assert.c:66
> #6  0x102daba8 in mdextend (reln=0x1042628c , 
> forknum=812540744, blocknum=33, buffer=0x306e6000, skipFsync=812539904) at 
> md.c:472
> #7  0x102d6760 in smgrextend (reln=0x306e6670, forknum=812540744, 
> blocknum=33, buffer=0x306e6000, skipFsync=812539904) at smgr.c:541
> #8  0x104c8dac in smgr_bulk_flush (bulkstate=0x306e6000) at bulk_write.c:245

So that's:

static const PGIOAlignedBlock zero_buffer = {{0}};  /* worth BLCKSZ */

...
smgrextend(bulkstate->smgr, bulkstate->forknum,
           bulkstate->pages_written++,
           &zero_buffer,
           true);

... where PGIOAlignedBlock is:

typedef union PGIOAlignedBlock
{
#ifdef pg_attribute_aligned
pg_attribute_aligned(PG_IO_ALIGN_SIZE)
#endif
char        data[BLCKSZ];
...

We see this happen with both xlc and gcc (new enough to know how to do
this).  One idea would be that the AIX *linker* is unable to align it,
as that is the common tool-chain component here (and unlike stack and
heap objects, this scope is the linker's job).  There is a
pre-existing example of a zero-buffer that is at file scope like that:
pg_prewarm.c.  Perhaps it doesn't get tested?

Hmm.




Re: Adding OLD/NEW support to RETURNING

2024-02-24 Thread Tomas Vondra
On 12/4/23 13:14, Dean Rasheed wrote:
> I have been playing around with the idea of adding support for OLD/NEW
> to RETURNING, partly motivated by the discussion on the MERGE
> RETURNING thread [1], but also because I think it would be a very
> useful addition for other commands (UPDATE in particular).
> 

Sounds reasonable ...

> This was discussed a long time ago [2], but that previous discussion
> didn't lead to a workable patch, and so I have taken a different
> approach here.
> 

Presumably the 2013 thread went nowhere because of some implementation
problems, not simply because the author lost interest and disappeared?
Would it be helpful for this new patch to briefly summarize what the
main issues were and how this new approach deals with that? (It's hard
to say if reading the old thread is necessary/helpful for understanding
this new patch, and time is a scarce resource.)

> My first thought was that this would only really make sense for UPDATE
> and MERGE, since OLD/NEW are pretty pointless for INSERT/DELETE
> respectively. However...
> 
> 1. For an INSERT with an ON CONFLICT ... DO UPDATE clause, returning
> OLD might be very useful, since it provides a way to see which rows
> conflicted, and return the old conflicting values.
> 
> 2. If a DELETE is turned into an UPDATE by a rule (e.g., to mark rows
> as deleted, rather than actually deleting them), then returning NEW
> can also be useful. (I admit, this is a somewhat obscure use case, but
> it's still possible.)
> 
> 3. In a MERGE, we need to be able to handle all 3 command types anyway.
> 
> 4. It really isn't any extra effort to support INSERT and DELETE.
> 
> So in the attached very rough patch (no docs, minimal testing) I have
> just allowed OLD/NEW in RETURNING for all command types (except, I
> haven't done MERGE here - I think that's best kept as a separate
> patch). If there is no OLD/NEW row in a particular context, it just
> returns NULLs. The regression tests contain examples of  1 & 2 above.
> 
> 
> Based on Robert Haas' suggestion in [2], the patch works by adding a
> new "varreturningtype" field to Var nodes. This field is set during
> parse analysis of the returning clause, which adds new namespace
> aliases for OLD and NEW, if tables with those names/aliases are not
> already present. So the resulting Var nodes have the same
> varno/varattno as they would normally have had, but a different
> varreturningtype.
> 

No opinion on whether varreturningtype is the right approach - it sounds
like it's working better than the 2013 patch, but I won't pretend my
knowledge of this code is sufficient to make judgments beyond that.

> For the most part, the rewriter and parser are then untouched, except
> for a couple of places necessary to ensure that the new field makes it
> through correctly. In particular, none of this affects the shape of
> the final plan produced. All of the work to support the new Var
> returning type is done in the executor.
> 
> This turns out to be relatively straightforward, except for
> cross-partition updates, which was a little trickier since the tuple
> format of the old row isn't necessarily compatible with the new row,
> which is in a different partition table and so might have a different
> column order.
> 

So we just "remap" the attributes, right?

> One thing that I've explicitly disallowed is returning OLD/NEW for
> updates to foreign tables. It's possible that could be added in a
> later patch, but I have no plans to support that right now.
> 

Sounds like an acceptable restriction, as long as it's documented.

What are the challenges for supporting OLD/NEW for foreign tables? I
guess we'd need to ask the FDW handler to tell us if it can support
OLD/NEW for this table (and only allow it for postgres_fdw with
sufficiently new server version), and then deparse the SQL.

I'm asking because this seems like a nice first patch idea, unless there's
some major obstacle that I don't see ...

> 
> One difficult question is what names to use for the new aliases. I
> think OLD and NEW are the most obvious and natural choices. However,
> there is a problem - if they are used in a trigger function, they will
> conflict. In PL/pgSQL, this leads to an error like the following:
> 
> ERROR:  column reference "new.f1" is ambiguous
> LINE 3: RETURNING new.f1, new.f4
>   ^
> DETAIL:  It could refer to either a PL/pgSQL variable or a table column.
> 
> That's the same error that you'd get if a different alias name had
> been chosen, and it happened to conflict with a user-defined PL/pgSQL
> variable, except that in that case, the user could just change their
> variable name to fix the problem, which is not possible with the
> automatically-added OLD/NEW trigger variables. As a way round that, I
> added a way to optionally change the alias used in the RETURNING list,
> using the following syntax:
> 
>   RETURNING [ WITH ( { OLD | NEW } AS output_alias [, ...] ) ]
> * | output_expression [ [ AS 

Re: Relation bulk write facility

2024-02-24 Thread Noah Misch
On Fri, Feb 23, 2024 at 04:27:34PM +0200, Heikki Linnakangas wrote:
> Committed this. Thanks everyone!

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2024-02-24%2015%3A13%3A14
got:
TRAP: failed Assert("(uintptr_t) buffer == TYPEALIGN(PG_IO_ALIGN_SIZE, 
buffer)"), File: "md.c", Line: 472, PID: 43188608

with this stack trace:
#5  0x10005cf0 in ExceptionalCondition (conditionName=0x1015d790 
 "`", fileName=0x0, lineNumber=16780064) at assert.c:66
#6  0x102daba8 in mdextend (reln=0x1042628c , 
forknum=812540744, blocknum=33, buffer=0x306e6000, skipFsync=812539904) at 
md.c:472
#7  0x102d6760 in smgrextend (reln=0x306e6670, forknum=812540744, blocknum=33, 
buffer=0x306e6000, skipFsync=812539904) at smgr.c:541
#8  0x104c8dac in smgr_bulk_flush (bulkstate=0x306e6000) at bulk_write.c:245
#9  0x107baf24 in _bt_blwritepage (wstate=0x100d0a14 
, buf=0x304f13b0, blkno=807631240) at nbtsort.c:638
#10 0x107bccd8 in _bt_buildadd (wstate=0x104c9384 , 
state=0x304eb190, itup=0xe10, truncextra=805686672) at nbtsort.c:984
#11 0x107bc86c in _bt_sort_dedup_finish_pending (wstate=0x3b6, state=0x19, 
dstate=0x3) at nbtsort.c:1036
#12 0x107bc188 in _bt_load (wstate=0x10, btspool=0x0, btspool2=0x0) at 
nbtsort.c:1331
#13 0x107bd4ec in _bt_leafbuild (btspool=0x101589fc 
, btspool2=0x0) at nbtsort.c:571
#14 0x107be028 in btbuild (heap=0x304d2a00, index=0x4e1f, indexInfo=0x3) at 
nbtsort.c:329
#15 0x1013538c in index_build (heapRelation=0x2, indexRelation=0x10bdc518 
, indexInfo=0x2, isreindex=10, parallel=false) at 
index.c:3047
#16 0x101389e0 in index_create (heapRelation=0x1001aac0 , 
indexRelationName=0x20 , 
indexRelationId=804393376, parentIndexRelid=805686672,
parentConstraintId=268544704, relFileNumber=805309688, 
indexInfo=0x3009a328, indexColNames=0x30237a20, accessMethodId=403, 
tableSpaceId=0, collationIds=0x304d29d8, opclassIds=0x304d29f8,
opclassOptions=0x304d2a18, coloptions=0x304d2a38, reloptions=0, flags=0, 
constr_flags=0, allow_system_table_mods=false, is_internal=false, 
constraintId=0x2ff211b4) at index.c:1260
#17 0x1050342c in DefineIndex (tableId=19994, stmt=0x2ff21370, 
indexRelationId=0, parentIndexId=0, parentConstraintId=0, total_parts=0, 
is_alter_table=false, check_rights=true, check_not_in_use=true,
skip_build=false, quiet=false) at indexcmds.c:1204
#18 0x104b4474 in ProcessUtilitySlow (pstate=, 
pstmt=0x3009a408, queryString=0x30099730 "CREATE INDEX dupindexcols_i ON 
dupindexcols (f1, id, f1 text_pattern_ops);",

If there are other ways I should poke at it, let me know.





Re: Removing unneeded self joins

2024-02-24 Thread Noah Misch
Hello,

On Sat, Feb 24, 2024 at 01:02:01PM +0200, Alexander Korotkov wrote:
> On Sat, Feb 24, 2024 at 7:12 AM Noah Misch  wrote:
> > On Sat, Feb 24, 2024 at 12:36:59AM +0200, Alexander Korotkov wrote:
> > > On Thu, Feb 22, 2024 at 10:51 AM Andrei Lepikhov 
> > >  wrote:
> > > > On 21/2/2024 14:26, Richard Guo wrote:
> > > > > I think the right fix for these issues is to introduce a new element
> > > > > 'sublevels_up' in ReplaceVarnoContext, and enhance 
> > > > > replace_varno_walker
> > > > > to 1) recurse into subselects with sublevels_up increased, and 2)
> > > > > perform the replacement only when varlevelsup is equal to 
> > > > > sublevels_up.
> > > > This code looks good. No idea how we have lost it before.
> > >
> > > Thanks to Richard for the patch and to Andrei for review.  I also find
> > > code looking good.  Pushed with minor edits from me.
> >
> > I feel this, commit 466979e, misses a few of our project standards:
> >
> > - The patch makes many non-whitespace changes to existing test queries.  This
> >   makes it hard to review the consequences of the non-test part of the patch.
> >   Did you minimize such edits?  Of course, not every such edit is avoidable.
> >
> > - The commit message doesn't convey whether this is refactoring or is a bug
> >   fix.  This makes it hard to write release notes, among other things.  From
> >   this mailing list thread, I gather it's a bug fix in 489072ab7a, hence
> >   v17-specific.  The commit message for 489072ab7a is also silent about that
> >   commit's status as refactoring or as a bug fix.
> >
> > - Normally, I could answer the previous question by reading the test case
> >   diffs.  However, in addition to the first point about non-whitespace
> >   changes, the first three join.sql patch hunks just change whitespace.
> >   Worse, since they move line breaks, "git diff -w" doesn't filter them out.
> >
> > To what extent are those community standards vs. points of individual
> > committer preference?  Please tell me where I'm wrong here.
> 
> I agree that commit 466979e is my individual committer failure.  I
> should have written a better, clearer commit message and separated the
> tests refactoring from the bug fix.
> 
> I'm not so sure about 489072ab7a (except it provides a wrong fix).  It
> has a "Reported-by:" field meaning it's a problem reported by a
> particular person.  The "Discussion:" points directly to the reported
> test case.  And the commit contains the relevant test case.  The commit
> message could be more wordy though.

Agreed, the first and third points don't apply to 489072ab7a.  Thanks to that,
one can deduce from its new test case query that it fixes a bug.  It sounds
like we agree about commit 466979e, so that's good.




Re: incremental backup mishandles XLOG_DBASE_CREATE_FILE_COPY

2024-02-24 Thread Noah Misch
On Sat, Feb 24, 2024 at 04:16:24PM +0530, Robert Haas wrote:
> On Sat, Feb 24, 2024 at 10:05 AM Noah Misch  wrote:
> > On Fri, Feb 23, 2024 at 08:47:52PM +0530, Robert Haas wrote:
> > > I thought about whether there were any other WAL records that have
> > > similar problems to XLOG_DBASE_CREATE_FILE_COPY and didn't come up
> > > with anything. If anyone knows of any similar cases, please let me
> > > know.
> >
> > Regarding records the summarizer potentially can't ignore that don't deal in
> > relfilenodes, these come to mind:
> >
> > XLOG_DBASE_DROP - covered in this thread's patch
> > XLOG_RELMAP_UPDATE
> > XLOG_TBLSPC_CREATE
> > XLOG_TBLSPC_DROP
> > XLOG_XACT_PREPARE
> 
> At present, only relation data files are ever sent incrementally; I
> don't think any of these touch those.

Agreed, those don't touch relation data files.  I think you've got all
relation data file mutations.  XLOG_DBASE_CREATE_FILE_COPY and XLOG_DBASE_DROP
are the only record types that touch a relation data file without mentioning
it in XLogRecordBlockHeader, XACT_XINFO_HAS_RELFILELOCATORS, or an RM_SMGR_ID
rlocator field.

> > Also, any record that writes XIDs needs to update nextXid; likewise for other
> > ID spaces.  See the comment at "XLOG stuff" in heap_lock_tuple().  Perhaps you
> > don't summarize past a checkpoint, making that irrelevant.
> 
> I'm not quite following this. It's true that we summarize from one
> redo pointer to the next; but also, our summary is only trying to
> ascertain which relation data blocks have been modified. Therefore, I
> don't understand the relevance of nextXid here.

No relevance, given incremental backup is incremental with respect to relation
data blocks only.




Re: Functions to return random numbers in a given range

2024-02-24 Thread Tomas Vondra
Hi Dean,

I did a quick review and a little bit of testing on the patch today. I
think it's a good/useful idea, and I think the code is ready to go (the
code is certainly much cleaner than anything I'd written ...).

I do have one minor comment regarding the docs - it refers to "random
functions" in a couple of places, which sounds as if it were talking
about some functions arbitrarily taken from some list, although it
clearly means "functions generating random numbers". (I realize this
might just be due to me not being a native speaker.)


Did you think about adding more functions generating either other types
of data distributions (now we have uniform and normal), or random data
for other data types (I often need random strings, for example)?

Of course, I'm not saying this patch needs to do that. But perhaps it
might affect how we name stuff to make it "extensible".


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: locked reads for atomics

2024-02-24 Thread Nathan Bossart
On Fri, Feb 23, 2024 at 05:34:49PM -0800, Andres Freund wrote:
> On 2024-02-23 14:58:12 -0600, Nathan Bossart wrote:
>> +/*
>> + * pg_atomic_write_membarrier_u32 - write with barrier semantics.
>> + *
>> + * The write is guaranteed to succeed as a whole, i.e., it's not possible to
>> + * observe a partial write for any reader.  Note that this correctly interacts
>> + * with both pg_atomic_compare_exchange_u32() and
>> + * pg_atomic_read_membarrier_u32().  While this may be less performant than
>> + * pg_atomic_write_u32() and pg_atomic_unlocked_write_u32(), it may be easier
>> + * to reason about correctness with this function in less performance-sensitive
>> + * code.
>> + *
>> + * Full barrier semantics.
>> + */
> 
> The callout to pg_atomic_unlocked_write_u32() is wrong. The reason to use
> pg_atomic_unlocked_write_u32() is for variables where we do not ever want to
> fall back to spinlocks/semaphores, because the underlying variable isn't
> actually shared. In those cases using the other variants is a bug. The only
> use of pg_atomic_unlocked_write_u32() is for temp-table buffers, which share
> the data structure with the shared-buffers case.

I removed the reference to pg_atomic_unlocked_write_u32().

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From 647eea91cc07e2d6a4d6634bcb933df245711fa0 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 23 Feb 2024 14:23:15 -0600
Subject: [PATCH v4 1/2] Introduce atomic read/write functions with full
 barrier semantics.

Writing correct code using atomic variables is often difficult due
to the implied memory barrier semantics (or lack thereof) of the
underlying operations.  This commit introduces atomic read/write
functions with full barrier semantics to ease this cognitive load
in areas that are not performance critical.  For example, some
spinlocks protect a single value, and these new functions make it
easy to convert the value to an atomic variable (thus eliminating
the need for the spinlock) without modifying the barrier semantics
previously provided by the spinlock.  However, since these
functions are less performant than the other atomic reads/writes,
they are not suitable for every use-case.

The base implementations for these new functions are atomic
exchanges (for writes) and atomic fetch/adds with 0 (for reads).
These implementations can be overwritten with better architecture-
specific versions as they are discovered.

This commit leaves converting existing code to use these new
functions as a future exercise.

Reviewed-by: Andres Freund, Yong Li, Jeff Davis
Discussion: https://postgr.es/m/20231110205128.GB1315705%40nathanxps13
---
 src/include/port/atomics.h | 58 ++
 src/include/port/atomics/generic.h | 36 +++
 2 files changed, 94 insertions(+)

diff --git a/src/include/port/atomics.h b/src/include/port/atomics.h
index bf151037f7..c3b14440cf 100644
--- a/src/include/port/atomics.h
+++ b/src/include/port/atomics.h
@@ -239,6 +239,26 @@ pg_atomic_read_u32(volatile pg_atomic_uint32 *ptr)
 	return pg_atomic_read_u32_impl(ptr);
 }
 
+/*
+ * pg_atomic_read_membarrier_u32 - read with barrier semantics.
+ *
+ * This read is guaranteed to return the current value, provided that the value
+ * is only ever updated via operations with barrier semantics, such as
+ * pg_atomic_compare_exchange_u32() and pg_atomic_write_membarrier_u32().
+ * While this may be less performant than pg_atomic_read_u32(), it may be
+ * easier to reason about correctness with this function in less performance-
+ * sensitive code.
+ *
+ * Full barrier semantics.
+ */
+static inline uint32
+pg_atomic_read_membarrier_u32(volatile pg_atomic_uint32 *ptr)
+{
+	AssertPointerAlignment(ptr, 4);
+
+	return pg_atomic_read_membarrier_u32_impl(ptr);
+}
+
 /*
  * pg_atomic_write_u32 - write to atomic variable.
  *
@@ -276,6 +296,26 @@ pg_atomic_unlocked_write_u32(volatile pg_atomic_uint32 *ptr, uint32 val)
 	pg_atomic_unlocked_write_u32_impl(ptr, val);
 }
 
+/*
+ * pg_atomic_write_membarrier_u32 - write with barrier semantics.
+ *
+ * The write is guaranteed to succeed as a whole, i.e., it's not possible to
+ * observe a partial write for any reader.  Note that this correctly interacts
+ * with both pg_atomic_compare_exchange_u32() and
+ * pg_atomic_read_membarrier_u32().  While this may be less performant than
+ * pg_atomic_write_u32(), it may be easier to reason about correctness with
+ * this function in less performance-sensitive code.
+ *
+ * Full barrier semantics.
+ */
+static inline void
+pg_atomic_write_membarrier_u32(volatile pg_atomic_uint32 *ptr, uint32 val)
+{
+	AssertPointerAlignment(ptr, 4);
+
+	pg_atomic_write_membarrier_u32_impl(ptr, val);
+}
+
 /*
  * pg_atomic_exchange_u32 - exchange newval with current value
  *
@@ -429,6 +469,15 @@ pg_atomic_read_u64(volatile pg_atomic_uint64 *ptr)
 	return pg_atomic_read_u64_impl(ptr);
 }
 
+static inline uint64
+pg_atomic_read_membarrier_u64(volatile 

Re: RFC: Logging plan of the running query

2024-02-24 Thread James Coleman
On Fri, Feb 23, 2024 at 10:23 AM Robert Haas  wrote:
>
> On Fri, Feb 23, 2024 at 7:50 PM Julien Rouhaud  wrote:
> > On Fri, Feb 23, 2024 at 10:22:32AM +0530, Robert Haas wrote:
> > > On Thu, Feb 22, 2024 at 6:25 AM James Coleman  wrote:
> > > > This is potentially a bit of a wild idea, but I wonder if having some
> > > > kind of argument to CHECK_FOR_INTERRUPTS() signifying we're in
> > > > "normal" as opposed to "critical" (using that word differently than
> > > > the existing critical sections) would be worth it.
> > >
> > > It's worth considering, but the definition of "normal" vs. "critical"
> > > might be hard to pin down. Or, we might end up with a definition that
> > > is specific to this particular case and not generalizable to others.
> >
> > But it doesn't have to be all or nothing right?  I mean each call could say
> > what the situation is like in their context, like
> > CHECK_FOR_INTERRUPTS(GUARANTEE_NO_HEAVYWEIGHT_LOCK | GUARANTEE_WHATEVER), and
> > slowly tag calls as needed, similarly to how we already add CFI based on user
> > reports.
>
> Absolutely. My gut feeling is that it's going to be simpler to pick a
> small number of places that are safe and sufficient for this
> particular feature and add an extra call there, similar to how we do
> vacuum_delay_point(). The reason I think that's likely to be better is
> that it will likely require changing only a relatively small number of
> places. If we instead start annotating CFIs, well, we've got hundreds
> of those. That's a lot more to change, and it also inconveniences
> third-party extension authors and people doing back-patching. I'm not
> here to say it can't work; I just think it's likely not the easiest
> path.

Yes, I suspect it's not the easiest path. I have a small hunch it
might end up paying more dividends in the future: this isn't the only
feature that regularly turns into a thorny discussion for the same
reason each time (basically "there is no way to safely interrupt
another backend at an arbitrary point"), and if we were able to
generalize a solution we may have multiple wins (a very obvious one in
my mind is the inability of auto_explain to run an explain at the
precise time it's most useful: when statement_timeout fires).

But it's also possible there are simpler ways that get us more than
this scenario, so I might be wrong; it's merely a hunch.

Regards,
James Coleman




Re: Patch: Add parse_type Function

2024-02-24 Thread David E. Wheeler
On Feb 21, 2024, at 19:13, David E. Wheeler  wrote:

> Thanks. Anyone else? Or is it ready for committer?

What’s the protocol for marking a patch ready for committer?

Thanks,

David





Re: locked reads for atomics

2024-02-24 Thread Andres Freund
Hi,

On 2024-02-23 10:25:00 -0800, Jeff Davis wrote:
> On Fri, 2024-02-23 at 10:17 -0600, Nathan Bossart wrote:
> > The idea is
> > to provide an easy way to remove spinlocks, etc. and use atomics for
> > less
> > performance-sensitive stuff.  The implementations are intended to be
> > relatively inexpensive and might continue to improve in the future,
> > but the
> > functions are primarily meant to help reason about correctness.
> 
> To be clear:
> 
>   x = pg_atomic_[read|write]_membarrier_u64();
> 
> is semantically equivalent to:
> 
>   pg_memory_barrier();
>   x = pg_atomic_[read|write]_u64();
>   pg_memory_barrier();
> ?
> 
> If so, that does seem more convenient.

Kinda. Semantically I think that's correct; however, it doesn't commonly make
sense to have both of those memory barriers, so you wouldn't really write code
like that, and thus comparing on the basis of convenience doesn't quite seem
right.

Rather than convenience, I think performance and simplicity are better
arguments. If you're going to execute a read and then a memory barrier, it's
going to be faster to just do a single atomic operation. And it's a bit
simpler to analyze which "side" of the read/write the barrier is needed on.

Greetings,

Andres Freund




Re: locked reads for atomics

2024-02-24 Thread Andres Freund
Hi,

On 2024-02-23 14:58:12 -0600, Nathan Bossart wrote:
> +/*
> + * pg_atomic_write_membarrier_u32 - write with barrier semantics.
> + *
> + * The write is guaranteed to succeed as a whole, i.e., it's not possible to
> + * observe a partial write for any reader.  Note that this correctly interacts
> + * with both pg_atomic_compare_exchange_u32() and
> + * pg_atomic_read_membarrier_u32().  While this may be less performant than
> + * pg_atomic_write_u32() and pg_atomic_unlocked_write_u32(), it may be easier
> + * to reason about correctness with this function in less performance-sensitive
> + * code.
> + *
> + * Full barrier semantics.
> + */

The callout to pg_atomic_unlocked_write_u32() is wrong. The reason to use
pg_atomic_unlocked_write_u32() is for variables where we do not ever want to
fall back to spinlocks/semaphores, because the underlying variable isn't
actually shared. In those cases using the other variants is a bug. The only
use of pg_atomic_unlocked_write_u32() is for temp-table buffers, which share
the data structure with the shared-buffers case.

Greetings,

Andres Freund




Re: POC, WIP: OR-clause support for indexes

2024-02-24 Thread jian he
Hi.
I wrote a first draft of the documentation patch.
It's under the section Planner Method Configuration (runtime-config-query.html),
but this feature's main logic is in src/backend/parser/parse_expr.c,
so the placement may be slightly inconsistent, as mentioned by others.

Feel free to polish it further.


v1-0001-Add-enable_or_transformation-doc-entry.no-cfbot
Description: Binary data


Re: Removing unneeded self joins

2024-02-24 Thread Alexander Korotkov
Hi, Noah!

On Sat, Feb 24, 2024 at 7:12 AM Noah Misch  wrote:
> On Sat, Feb 24, 2024 at 12:36:59AM +0200, Alexander Korotkov wrote:
> > On Thu, Feb 22, 2024 at 10:51 AM Andrei Lepikhov  wrote:
> > > On 21/2/2024 14:26, Richard Guo wrote:
> > > > I think the right fix for these issues is to introduce a new element
> > > > 'sublevels_up' in ReplaceVarnoContext, and enhance replace_varno_walker
> > > > to 1) recurse into subselects with sublevels_up increased, and 2)
> > > > perform the replacement only when varlevelsup is equal to sublevels_up.
> > > This code looks good. No idea how we have lost it before.
> >
> > Thanks to Richard for the patch and to Andrei for review.  I also think
> > the code looks good.  Pushed with minor edits from me.
>
> I feel this, commit 466979e, misses a few of our project standards:
>
> - The patch makes many non-whitespace changes to existing test queries.  This
>   makes it hard to review the consequences of the non-test part of the patch.
>   Did you minimize such edits?  Of course, not every such edit is avoidable.
>
> - The commit message doesn't convey whether this is refactoring or is a bug
>   fix.  This makes it hard to write release notes, among other things.  From
>   this mailing list thread, I gather it's a bug fix in 489072ab7a, hence
>   v17-specific.  The commit message for 489072ab7a is also silent about that
>   commit's status as refactoring or as a bug fix.
>
> - Normally, I could answer the previous question by reading the test case
>   diffs.  However, in addition to the first point about non-whitespace
>   changes, the first three join.sql patch hunks just change whitespace.
>   Worse, since they move line breaks, "git diff -w" doesn't filter them out.
>
> To what extent are those community standards vs. points of individual
> committer preference?  Please tell me where I'm wrong here.

I agree that commit 466979e is my individual committer failure.  I
should have written a better, clearer commit message and separated the
tests refactoring from the bug fix.

I'm not so sure about 489072ab7a (except it provides a wrong fix).  It
has a "Reported-by:" field meaning it's a problem reported by a
particular person.  The "Discussion:" points directly to the reported
test case.  And the commit contains the relevant test case.  The commit
message could be more wordy though.

--
Regards,
Alexander Korotkov




Re: incremental backup mishandles XLOG_DBASE_CREATE_FILE_COPY

2024-02-24 Thread Robert Haas
On Sat, Feb 24, 2024 at 10:05 AM Noah Misch  wrote:
> Regarding records the summarizer potentially can't ignore that don't deal in
> relfilenodes, these come to mind:
>
> XLOG_DBASE_DROP - covered in this thread's patch
> XLOG_RELMAP_UPDATE
> XLOG_TBLSPC_CREATE
> XLOG_TBLSPC_DROP
> XLOG_XACT_PREPARE

At present, only relation data files are ever sent incrementally; I
don't think any of these touch those.

> Also, any record that writes XIDs needs to update nextXid; likewise for other
> ID spaces.  See the comment at "XLOG stuff" in heap_lock_tuple().  Perhaps you
> don't summarize past a checkpoint, making that irrelevant.

I'm not quite following this. It's true that we summarize from one
redo pointer to the next; but also, our summary is only trying to
ascertain which relation data blocks have been modified. Therefore, I
don't understand the relevance of nextXid here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com