Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-24 Thread John Naylor
On Wed, Aug 24, 2022 at 11:56 PM Nathan Bossart
 wrote:
>
> On Wed, Aug 24, 2022 at 11:59:25AM +0700, John Naylor wrote:
> > It seems "scalar" would be a bad choice since it already means
> > (confusingly) operating on the least significant element of a vector.
> > I'm thinking of *_has and *_has_le, matching the already existing in
> > the earlier patch *_has_zero.
>
> That seems reasonable to me.

Okay, done that way, also in v9:
- a convenience macro in the test suite which is handy now and can be
used for 32-bit element tests if we like
- more tests
- pgindent and some additional comment smithing
- split out the json piece for a later commit
- For the following comment, pgindent will put spaced operands on a
separate line, which is not great for readability. Also, our other
reference to the Stanford bithacks page keeps the in-page link, and I
see no reason to exclude it -- if the anchor goes missing, the whole
page will still load. So I put back those two details.

+ * To find bytes <= c, we can use bitwise operations to find bytes < c+1,
+ * but it only works if c+1 <= 128 and if the highest bit in v is not set.
+ * Adapted from
+ * https://graphics.stanford.edu/~seander/bithacks.html#HasLessInWord
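For anyone following along, the underlying word-at-a-time trick looks like
this in isolation (a standalone sketch; the helper name is invented for
illustration):

#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch: true if any byte in 'v' is less than 'n', for n <= 128 and no
 * byte of 'v' having its high bit set (the restrictions quoted above).
 * Bytes <= c are then found by testing for bytes < c+1.
 */
static inline bool
word_has_byte_less_than(uint64_t v, uint8_t n)
{
	return ((v - UINT64_C(0x0101010101010101) * n) &
			~v & UINT64_C(0x8080808080808080)) != 0;
}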

I think I'll go ahead and commit 0001 in a couple days pending further comments.

--
John Naylor
EDB: http://www.enterprisedb.com
From 606c14de59a68ed88bff75f7250c37ad082fbd9f Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Thu, 25 Aug 2022 13:32:28 +0700
Subject: [PATCH v9 2/2] Speed up json_lex_string via vector operations

TODO: tests
---
 src/common/jsonapi.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/common/jsonapi.c b/src/common/jsonapi.c
index fefd1d24d9..87e1d0b192 100644
--- a/src/common/jsonapi.c
+++ b/src/common/jsonapi.c
@@ -19,6 +19,7 @@
 
 #include "common/jsonapi.h"
 #include "mb/pg_wchar.h"
+#include "port/pg_lfind.h"
 
 #ifndef FRONTEND
 #include "miscadmin.h"
@@ -844,7 +845,7 @@ json_lex_string(JsonLexContext *lex)
 		}
 		else
 		{
-			char	   *p;
+			char	   *p = s;
 
 			if (hi_surrogate != -1)
 				return JSON_UNICODE_LOW_SURROGATE;
@@ -853,7 +854,13 @@ json_lex_string(JsonLexContext *lex)
 			 * Skip to the first byte that requires special handling, so we
 			 * can batch calls to appendBinaryStringInfo.
 			 */
-			for (p = s; p < end; p++)
+			while (p < end - sizeof(Vector8) &&
+				   !pg_lfind8('\\', (uint8 *) p, sizeof(Vector8)) &&
+				   !pg_lfind8('"', (uint8 *) p, sizeof(Vector8)) &&
+				   !pg_lfind8_le(0x1F, (uint8 *) p, sizeof(Vector8)))
+				p += sizeof(Vector8);
+
+			for (; p < end; p++)
 			{
 				if (*p == '\\' || *p == '"')
 					break;
-- 
2.36.1

From fe7d8f2471e2e0b37ccdc1a0a2d7fc5bb93af7d9 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Sat, 20 Aug 2022 21:14:01 -0700
Subject: [PATCH v9 1/2] Add optimized functions for linear search within byte
 arrays

In a similar vein to b6ef167564, add pg_lfind8() and pg_lfind8_le()
to search for bytes equal to, or less than or equal to, a given byte,
respectively. To abstract away platform details, add helper functions
and typedefs to simd.h.

John Naylor and Nathan Bossart, per suggestion from Andres Freund

Discussion: https://www.postgresql.org/message-id/CAFBsxsGzaaGLF%3DNuq61iRXTyspbO9rOjhSqFN%3DV6ozzmta5mXg%40mail.gmail.com
---
 src/include/port/pg_lfind.h   |  68 +++-
 src/include/port/simd.h   | 154 ++
 .../test_lfind/expected/test_lfind.out|  18 +-
 .../modules/test_lfind/sql/test_lfind.sql |   4 +-
 .../modules/test_lfind/test_lfind--1.0.sql|  10 +-
 src/test/modules/test_lfind/test_lfind.c  | 100 +++-
 6 files changed, 345 insertions(+), 9 deletions(-)

diff --git a/src/include/port/pg_lfind.h b/src/include/port/pg_lfind.h
index fb125977b2..a4e13dffec 100644
--- a/src/include/port/pg_lfind.h
+++ b/src/include/port/pg_lfind.h
@@ -1,7 +1,8 @@
 /*-
  *
  * pg_lfind.h
- *	  Optimized linear search routines.
+ *	  Optimized linear search routines using SIMD intrinsics where
+ *	  available.
  *
  * Copyright (c) 2022, PostgreSQL Global Development Group
  *
@@ -15,6 +16,70 @@
 
 #include "port/simd.h"
 
+/*
+ * pg_lfind8
+ *
+ * Return true if there is an element in 'base' that equals 'key', otherwise
+ * return false.
+ */
+static inline bool
+pg_lfind8(uint8 key, uint8 *base, uint32 nelem)
+{
+	uint32		i;
+
+	/* round down to multiple of vector length */
+	uint32		tail_idx = nelem & ~(sizeof(Vector8) - 1);
+	Vector8		chunk;
+
+	for (i = 0; i < tail_idx; i += sizeof(Vector8))
+	{
+		vector8_load(&chunk, &base[i]);
+		if (vector8_has(chunk, key))
+			return true;
+	}
+
+	/* Process the remaining elements one at a time. */
+	for (; i < nelem; i++)
+	{
+		if (key == base[i])
+			return true;
+	}
+
+	return false;
+}
+
+/*
+ * pg_lfind8_le
+ *
+ * Return true if there is an element in 'base' that is less than or
+ * equal to 'key', otherwise return false.
+ */
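The rest of the function presumably mirrors pg_lfind8() above; a sketch,
assuming the vector8_has_le() helper agreed on upthread:

static inline bool
pg_lfind8_le(uint8 key, uint8 *base, uint32 nelem)
{
	uint32		i;

	/* round down to multiple of vector length */
	uint32		tail_idx = nelem & ~(sizeof(Vector8) - 1);
	Vector8		chunk;

	for (i = 0; i < tail_idx; i += sizeof(Vector8))
	{
		vector8_load(&chunk, &base[i]);
		if (vector8_has_le(chunk, key))
			return true;
	}

	/* Process the remaining elements one at a time. */
	for (; i < nelem; i++)
	{
		if (base[i] <= key)
			return true;
	}

	return false;
}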

Re: Letter case of "admin option"

2022-08-24 Thread Kyotaro Horiguchi
Thanks for the comment.

At Tue, 23 Aug 2022 09:58:47 -0400, Robert Haas  wrote 
in 
> On Mon, Aug 22, 2022 at 9:29 PM Kyotaro Horiguchi
>  wrote:
> In the case of GRANT, that's more ambiguous, because the word OPTION
> actually appears in the syntax. But isn't that sort of accidental?

Yeah, I think so.  My intention is to let the translators do their work
more mechanically.  A capital-letter word is automatically recognized
as a keyword and can then be copied as-is.

I would translate "ADMIN OPTION" as "ADMIN OPTION" in Japanese, but
"admin option" is translated to "管理者オプション", which makes it a
bit hard for readers to see the connection to "ADMIN OPTION" (or
ADMIN ). I guess this is somewhat similar to using "You need to
give capability to administrate the role" to suggest that users add
WITH ADMIN OPTION to the role.

Maybe Álvaro has a similar difficulty on it.

> It's quite possible to give someone the right to administer a role
> without ever mentioning the OPTION keyword:

Mmm.. Fair point.

> In short, I'm wondering whether we should regard ADMIN as the name of
> the option, but OPTION as part of the GRANT syntax, and hence
> capitalize it "ADMIN option". However, if the non-English speakers on
> this list have a strong preference for something else I'm certainly
> not going to fight about it.

"ADMIN option" which is translated into "ADMINオプション" is fine by
me.  I hope Álvaro thinks the same way.

What do you think about the attached?


> > In passing, I met the following code in the same file.
> >
> > >   if (!have_createrole_privilege() &&
> > >   !is_admin_of_role(currentUserId, roleid))
> > >   ereport(ERROR,
> > >   
> > > (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> > >errmsg("must have admin option on 
> > > role \"%s\"",
> > >   rolename)));
> >
> > The message seems a bit short in that it only mentions the admin option
> > while omitting the CREATEROLE privilege.  "must have CREATEROLE privilege or
> > admin option on role %s" might be better.  Or we could say just
> > "insufficient privilege" or "permission denied" in the main error
> > message, then provide "CREATEROLE privilege or admin option on role %s
> > is required" in a DETAIL or HINT message.  The message was added by
> > c33d575899 along with the have_createrole_privilege() call, so it is
> > unclear to me whether it is intentional or not.
> 
> Yeah, I wasn't sure what to do about this. We do not mention superuser
> privileges in every message where they theoretically apply, because it
> would make a lot of messages longer for not much benefit. CREATEROLE
> is a similar case and I think, but am not sure, that we treat it
> similarly. So in my mind it is a judgement call what to do here, and
> if other people think that what I picked wasn't best, we can change
> it.
> 
> For what it's worth, I'm hoping to eventually remove the CREATEROLE
> exception here. The superuser exception will remain, of course.

If it were simply "permission denied", I wouldn't think about the
details and would then seek out a way to be allowed. But I don't mean
to fight over this for now.

For the record, I would prefer the following message for this sort of
failure.

ERROR:  permission denied
DETAIL:  CREATEROLE or ADMIN option is required for the role.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/doc/src/sgml/information_schema.sgml b/doc/src/sgml/information_schema.sgml
index 350c75bc31..5f661552d8 100644
--- a/doc/src/sgml/information_schema.sgml
+++ b/doc/src/sgml/information_schema.sgml
@@ -183,8 +183,8 @@
 
   
The view administrable_role_authorizations
-   identifies all roles that the current user has the admin option
-   for.
+   identifies all roles that the current user has the ADMIN
+   option for.
   
 
   
diff --git a/doc/src/sgml/ref/alter_group.sgml b/doc/src/sgml/ref/alter_group.sgml
index b9e641818c..4b528498df 100644
--- a/doc/src/sgml/ref/alter_group.sgml
+++ b/doc/src/sgml/ref/alter_group.sgml
@@ -55,7 +55,7 @@ ALTER GROUP group_name RENAME TO REVOKE. Note that
GRANT and REVOKE have additional
options which are not available with this command, such as the ability
-   to grant and revoke ADMIN OPTION, and the ability to
+   to grant and revoke ADMIN option, and the ability to
specify the grantor.
   
 
diff --git a/doc/src/sgml/ref/createuser.sgml b/doc/src/sgml/ref/createuser.sgml
index c6a7c603f7..22cd885c82 100644
--- a/doc/src/sgml/ref/createuser.sgml
+++ b/doc/src/sgml/ref/createuser.sgml
@@ -82,7 +82,8 @@ PostgreSQL documentation
   

 Indicates role that will be immediately added as a member of the new
-role with admin option, giving it the right to grant membership in the
+role with ADMIN option, giving it the right to
+grant membership in the
 new role to others.  Multiple roles to a

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-24 Thread Bharath Rupireddy
On Tue, Aug 23, 2022 at 11:37 PM Nathan Bossart
 wrote:
>
> On Wed, Aug 17, 2022 at 12:39:26PM +0530, Bharath Rupireddy wrote:
> > + /*
> > +  * We're only handling directories here, skip if it's not ours. Also, skip
> > +  * if the caller has already performed this check.
> > +  */
> > + if (!slot_dir_checked &&
> > + lstat(path, &statbuf) == 0 && !S_ISDIR(statbuf.st_mode))
> >   return;
>
> There was previous discussion about removing this stanza from
> ReorderBufferCleanupSerializedTXNs() completely [0].  If that is still
> possible, I think I would choose that over having callers specify whether
> to do the directory check.
>
> [0] https://postgr.es/m/20220329224832.GA560657%40nathanxps13

From [0]:

> My guess is that this was done because ReorderBufferCleanupSerializedTXNs()
> is also called from ReorderBufferAllocate() and ReorderBufferFree().
> However, it is odd that we just silently return if the slot path isn't a
> directory in those cases. I think we could use get_dirent_type() in
> StartupReorderBuffer() as you suggested, and then we could let ReadDir()
> ERROR for non-directories for the other callers of
> ReorderBufferCleanupSerializedTXNs(). WDYT?

Firstly, removing lstat() completely from
ReorderBufferCleanupSerializedTXNs() is a behavioural change.
ReorderBufferCleanupSerializedTXNs() returning silently to its callers
ReorderBufferAllocate() and ReorderBufferFree() when the slot path
isn't a directory makes complete sense to me, because the callers are
trying to clean something up, and if it doesn't exist, that's okay --
they can go ahead. Also, ReorderBufferAllocate() and
ReorderBufferFree() are used quite widely, and I don't want to risk
changing their behaviour.

> > + /* we're only handling directories here, skip if it's not one */
> > + sprintf(path, "pg_replslot/%s", logical_de->d_name);
> > + if (get_dirent_type(path, logical_de, false, LOG) != PGFILETYPE_DIR)
> > + continue;
>
> IIUC an error in get_dirent_type() could cause slots to be skipped here,
> which is a behavior change.

That behaviour hasn't changed, no? Currently, if lstat() fails in
ReorderBufferCleanupSerializedTXNs(), it returns to
StartupReorderBuffer()'s while loop, which in effect continues with
the other slots; this patch does nothing new.

So, no new patch, please find the latest v16 patch at [1].

[1] 
https://www.postgresql.org/message-id/CALj2ACXZPMYXrPZ%2BCUE0_5DKDnxyqDGRftSgPPx--PWhWB3RtQ%40mail.gmail.com

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/




Re: use ARM intrinsics in pg_lfind32() where available

2022-08-24 Thread Nathan Bossart
On Thu, Aug 25, 2022 at 10:38:34AM +0700, John Naylor wrote:
> On Thu, Aug 25, 2022 at 1:01 AM Nathan Bossart  
> wrote:
>> On Wed, Aug 24, 2022 at 11:07:03AM +0700, John Naylor wrote:
>> > - Can a user on ARM64 ever get a runtime fault if the machine attempts
>> > to execute NEON instructions?
>>
>> IIUC yes, although I'm not sure how likely it is in practice.
> 
> Given the quoted part above, it doesn't seem likely, but we should try
> to find out for sure, because a runtime fault is surely not acceptable
> even on a toy system.

The ARM literature appears to indicate that Neon support is pretty standard
on aarch64, and AFAICT it's pretty common to just assume it's available.
As originally suspected, I believe that simply checking for __aarch64__
would be sufficient, but I don't think it would be unreasonable to also
check for __ARM_NEON to be safe.
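Concretely, the guard being discussed would look something like this (a
sketch; USE_NEON and the Vector8 typedef follow the simd.h patch's
conventions):

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define USE_NEON
typedef uint8x16_t Vector8;
#endif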

>> Interestingly, Clang still defines __ARM_NEON__ even when
>> +nosimd is specified.
> 
> POLA violation, but if no one has complained to them, it's a good bet
> the instructions are always available.

Sorry, I should've been more specific.  In my testing, I could include or
omit __ARM_NEON using +[no]simd, but __ARM_NEON__ (with two underscores at
the end) was always there.  My brief research seems to indicate this might
be unique to Darwin, but in the end, it looks like __ARM_NEON (without the
trailing underscores) is the most widely used.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: add checkpoint stats of snapshot and mapping files of pg_logical dir

2022-08-24 Thread Bharath Rupireddy
On Tue, Aug 23, 2022 at 11:19 PM Nathan Bossart
 wrote:
>
> On Wed, Aug 17, 2022 at 11:17:24AM +0530, Bharath Rupireddy wrote:
> > + "logical decoding file(s) 
> > processing time=%ld.%03d s",
>
> I would suggest shortening this to something like "logical decoding
> processing" or "logical replication processing."

"logical decoding processing" seems fine to me, see [1]. I've found
other error messages using "logical decoding".

 errmsg("logical decoding requires wal_level >= logical")));
 errmsg("logical decoding requires a database connection")));
 errmsg("logical decoding cannot be used while in recovery")));
 errmsg("logical decoding cannot be used while in recovery")));
(errmsg("logical decoding found consistent point at %X/%X",
(errmsg("logical decoding found initial starting point
at %X/%X",
(errmsg("logical decoding found initial consistent
point at %X/%X",

> >   CheckPointRelationMap();
> >   CheckPointReplicationSlots();
> > +
> > + CheckpointStats.l_dec_ops_start_t = GetCurrentTimestamp();
> >   CheckPointSnapBuild();
> >   CheckPointLogicalRewriteHeap();
> > + CheckpointStats.l_dec_ops_end_t = GetCurrentTimestamp();
> > +
> >   CheckPointReplicationOrigin();
>
> Shouldn't we include CheckPointReplicationSlots() and
> CheckPointReplicationOrigin() in this new stat?

In my experience, I haven't seen those functions take a long time.
CheckPointReplicationSlots() runs a loop over all slots, may not take
a huge amount of time, and deals with all the slots, not just the
logical ones.  CheckPointReplicationOrigin(), on the other hand, does
deal with logical replication and has a heavyweight operation,
XLogFlush(), for all logical slots, hence counting it is reasonable.

On Wed, Aug 24, 2022 at 10:04 AM Michael Paquier  wrote:
>
> On Wed, Aug 24, 2022 at 10:48:01AM +0900, Kyotaro Horiguchi wrote:
> > By the way, I think we use INSTR_TIME_* macros to measure internal
> > durations (mainly for the monotonic clock characteristics, and to
> > reduce performance degradation on Windows?). I'm not sure that's
> > crucial here but I don't think there's any reason to use
> > GetCurrentTimestamp() instead.
>
> This implies two calls of gettimeofday(), but that does not worry me
> much in this code path.  There is some consistency with
> CheckpointGuts() where we take timestamps for the sync requests.

+1 to use GetCurrentTimestamp() for consistency.
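Putting it together, the timed region would look roughly like this (a
sketch based on the hunk quoted above, with CheckPointReplicationOrigin()
pulled inside the timed span per the reasoning earlier; field names as in
the patch):

	CheckpointStats.l_dec_ops_start_t = GetCurrentTimestamp();
	CheckPointSnapBuild();
	CheckPointLogicalRewriteHeap();
	CheckPointReplicationOrigin();
	CheckpointStats.l_dec_ops_end_t = GetCurrentTimestamp();

	/* later, when building the log message (sketch): */
	long		secs;
	int			usecs;

	TimestampDifference(CheckpointStats.l_dec_ops_start_t,
						CheckpointStats.l_dec_ops_end_t, &secs, &usecs);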

I've also improved the commit message a bit.

Please review the v12 patch attached.

[1] LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s)
added, 0 removed, 0 recycled; write=0.003 s, sync=0.002 s, total=0.011 s;
sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB,
estimate=0 kB; lsn=0/1496528, redo lsn=0/14964F0; logical decoding
processing=0.005 s

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/


v12-0001-Add-logical-decoding-processing-time-to-checkpoi.patch
Description: Binary data


Re: static libpq (and other libraries) overwritten on aix

2022-08-24 Thread Noah Misch
On Wed, Aug 24, 2022 at 08:43:04PM -0700, Andres Freund wrote:
> On 2022-08-20 10:42:13 -0700, Andres Freund wrote:
> > 0004 aix: when building with gcc, tell gcc we're building a shared library
> > 
> >   That's the gcc -shared issue I explained in the email I'm replying to.
> > 
> >   We should probably consider building executables with -shared-libgcc too,
> >   that shrinks them a decent amount (e.g. 1371684 -> 1126765 for psql). But
> >   I've not done that here.
> > 
> > 
> > 0005 aix: No need to use mkldexport when we want to export all symbols
> > 
> >   This makes the building of shared libraries a lot more similar to other
> >   platforms. Export files are only used when an exports.txt is present and
> >   there's no more intermediary static libraries.
> > 
> > 
> > 0006 configure: Expand -fvisibility checks to more compilers, add -qvisibility
> > 
> >   This isn't strictly speaking part of the same "thread" of work, but I don't
> >   want to touch aix more often than I have to... I'll post it in the other
> >   thread too.
> > 
> >   I did just test that this passes at least some tests on aix with xlc and
> >   solaris with sunpro.
> 
> Any comments here?

I don't know much about them, but they sound like the sort of thing that can't
cause subtle bugs.  If they build and test the first time, they're probably
valid.  You may as well push them.




Re: static libpq (and other libraries) overwritten on aix

2022-08-24 Thread Andres Freund
Hi,

On 2022-08-20 10:42:13 -0700, Andres Freund wrote:
> On 2022-08-20 01:35:22 -0700, Andres Freund wrote:
> > I'll send in a patch series tomorrow, too tired for today.
> 
> Here it goes.

> 0001 aix: Fix SHLIB_EXPORTS reference in VPATH builds
> 
>   That's mostly so I could even build. It's not quite right in the sense that
>   we don't depend on the file, but that's a preexisting issue. Could be folded
>   in with 0005, which fixes that aspect. Or it could be backpatched as the
>   minimal fix.
> 
> 
> 0002 Remove SUBSYS.o rule in common.mk, hasn't been used in a long time
> 0003 Remove rule to generate postgres.o, not needed for 20+ years
> 
>   Both obvious, I think.

Pushed these, given that they're all pretty trivial.



> 0004 aix: when building with gcc, tell gcc we're building a shared library
> 
>   That's the gcc -shared issue I explained in the email I'm replying to.
> 
>   We should probably consider building executables with -shared-libgcc too,
>   that shrinks them a decent amount (e.g. 1371684 -> 1126765 for psql). But
>   I've not done that here.
> 
> 
> 0005 aix: No need to use mkldexport when we want to export all symbols
> 
>   This makes the building of shared libraries a lot more similar to other
>   platforms. Export files are only used when an exports.txt is present and
>   there's no more intermediary static libraries.
> 
> 
> 0006 configure: Expand -fvisibility checks to more compilers, add -qvisibility
> 
>   This isn't strictly speaking part of the same "thread" of work, but I don't
>   want to touch aix more often than I have to... I'll post it in the other
>   thread too.
> 
>   I did just test that this passes at least some tests on aix with xlc and
>   solaris with sunpro.

Any comments here?

Greetings,

Andres Freund




Re: use ARM intrinsics in pg_lfind32() where available

2022-08-24 Thread John Naylor
On Thu, Aug 25, 2022 at 1:01 AM Nathan Bossart  wrote:
>
> On Wed, Aug 24, 2022 at 11:07:03AM +0700, John Naylor wrote:
> > The important thing is: if we compile with __aarch64__ as a target:
> > - Will the compiler emit the intended instructions from the intrinsics
> > without extra flags?
>
> My testing with GCC and Clang did not require any extra flags.  GCC appears
> to enable it by default for aarch64 [0].  AFAICT this is the case for Clang
> as well, but that is based on the code and my testing (I couldn't find any
> documentation for this).

I guess you meant this part: "‘simd’ Enable Advanced SIMD
instructions. This also enables floating-point instructions. This is
on by default for all possible values for options -march and -mcpu."

> > - Can a user on ARM64 ever get a runtime fault if the machine attempts
> > to execute NEON instructions?
>
> IIUC yes, although I'm not sure how likely it is in practice.

Given the quoted part above, it doesn't seem likely, but we should try
to find out for sure, because a runtime fault is surely not acceptable
even on a toy system.

> > "I have been able to compile for
> > __aarch64__ without __ARM_NEON" doesn't really answer that question --
> > what exactly did this entail?
>
> Compiling with something like -march=armv8-a+nosimd prevents defining
> __ARM_NEON.

Okay, that's unsurprising.

> Interestingly, Clang still defines __ARM_NEON__ even when
> +nosimd is specified.

POLA violation, but if no one has complained to them, it's a good bet
the instructions are always available.

> > I took a quick look around at Debian code search, *BSD, Apple, and a
> > few other places, and I can't find it. Then, I looked at the
> > discussions around commit 5c7603c318872a42e "Add ARM64 (aarch64)
> > support to s_lock.h", and the proposed patch [1] only had __aarch64__
> > . When it was committed, the platform was vaporware and I suppose we
> > included "__aarch64" as a prophylactic measure because no other reason
> > was given. It doesn't seem to exist anywhere, so unless someone can
> > demonstrate otherwise, I'm going to rip it out soon.
>
> This is what I found, too, so +1.  I've attached a patch for this.

Thanks, I'll push this soon. I wondered if the same reasoning applies
to __arm__ / __arm nowadays, but a quick search does indicate that
__arm exists (existed?), at least.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: pg_stat_have_stats() returns true for dropped indexes (or for index creation transaction rolled back)

2022-08-24 Thread Kyotaro Horiguchi
Good catch, and thanks for the patch!

(The file name would correctly be v2-0001-...:)

At Tue, 23 Aug 2022 09:58:03 +0200, "Drouvot, Bertrand"  
wrote in 
> Agree it's better to move it to heap_create(): it's done in the new
> version attached.

+1 (not considering stats splitting)

> We'll see later on if it needs to be duplicated for the table/index
> split work.

The code changes look good to me.

> >> It does contain additional calls to pgstat_create_relation() and
> >> pgstat_drop_relation() as well as additional TAP tests.
> > Would be good to add a test for CREATE INDEX / DROP INDEX / REINDEX
> > CONCURRENTLY as well.
> >
> > Might be worth adding a test to stats.sql or stats.spec in the main
> > regression
> > tests. Perhaps that's best where the aforementioned things should be
> > tested?
> 
> Yeah that sounds better, I'm also adding more tests around table
> creation while at it.
> 
> I ended up adding the new tests in stats.sql.

+-- pg_stat_have_stats returns true for table creation inserting data
+-- pg_stat_have_stats returns true for committed index creation
+

Not sure we need this, as we check that already in the same file. (In
other words, if we failed this, it should have failed earlier.) Maybe
we only need checks for drop operations and reindex cases?

We have other variable-numbered stats kinds
FUNCTION/REPLSLOT/SUBSCRIPTION. Don't we need the same for these?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: pg_regress: lookup shellprog in $PATH

2022-08-24 Thread Tom Lane
I wrote:
> Given the lack of complaints about how pg_ctl works, I'd be inclined
> to follow its lead and just hard-wire "/bin/sh", removing the whole
> SHELLPROG/shellprog dance.  I have not heard of anyone using the
> theoretical ability to compile pg_regress with some other value.

git blame blames that whole mechanism on me: 60cfe25e68d.  It looks
like the reason I did it like that is that I was replacing use of
system(3) with execl(), and system(3) is defined thus by POSIX:

execl(<shell path>, "sh", "-c", command, (char *)0);

where <shell path> is an unspecified pathname for the sh utility.

Using SHELL for the "unspecified path" is already a bit of a leap
of faith, since users are allowed to make it point at a non-Bourne
shell.  I don't see any strong reason to preserve that behavior.

regards, tom lane




Re: pg_regress: lookup shellprog in $PATH

2022-08-24 Thread Tom Lane
Gurjeet Singh  writes:
> Please see attached the one-letter patch that fixes this problem. I have
> chosen to replace the execl() call with execlp(), which performs a lookup
> in $PATH, and finds the 'sh' to use for running the postmaster.

I can't say that I think this is a great fix.  It creates security
questions that did not exist before, even without the point you
make about Windows considering execlp deprecated.

Given the lack of complaints about how pg_ctl works, I'd be inclined
to follow its lead and just hard-wire "/bin/sh", removing the whole
SHELLPROG/shellprog dance.  I have not heard of anyone using the
theoretical ability to compile pg_regress with some other value.

> The Nixpkgs and NixOS distro includes all the supported versions of
> Postgres, so one would assume they would've also encountered, and solved,
> this problem. But they didn't. My best guess as to why is that they
> never bothered to run `make check` on their built binaries.

TBH, it's not clear to me that that project is competent enough to
be something we should take into account.  But in any case, I'd
rather see us using fewer ways to do this, not more.

regards, tom lane




Re: shadow variables - pg15 edition

2022-08-24 Thread Justin Pryzby
On Wed, Aug 24, 2022 at 10:47:31PM +1200, David Rowley wrote:
> On Wed, 24 Aug 2022 at 14:39, Justin Pryzby  wrote:
> > Attached are half of the remainder of what I've written, ready for review.
> 
> Thanks for the patches.

> 4. "Repurpose" (variables have the same purpose and may as well use
> the same variable)

> Would you be able to write a patch for #4.

The first of the patches that I sent yesterday was all about "repurposed" vars
from outer scope (lc, l, isnull, save_errno), and was 70% of your list of vars
to repurpose.

Here, I've included the rest of your list.

Plus another patch for vars which I'd already written patches to repurpose, but
which aren't classified as "repurpose" on your list.

For subselect.c, you could remove some more "lc" vars and re-use the "l" var
for consistency (but I suppose you won't want that).

-- 
Justin
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 87b243e0d4b..a090cada400 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3017,46 +3017,45 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
}
pgstat_report_wait_end();
 
if (save_errno)
{
/*
 * If we fail to make the file, delete it to release disk space
 */
unlink(tmppath);
 
close(fd);
 
errno = save_errno;
 
ereport(ERROR,
(errcode_for_file_access(),
 errmsg("could not write to file \"%s\": %m", 
tmppath)));
}
 
pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_SYNC);
if (pg_fsync(fd) != 0)
{
-   int save_errno = errno;
-
+   save_errno = errno;
close(fd);
errno = save_errno;
ereport(ERROR,
(errcode_for_file_access(),
 errmsg("could not fsync file \"%s\": %m", 
tmppath)));
}
pgstat_report_wait_end();
 
if (close(fd) != 0)
ereport(ERROR,
(errcode_for_file_access(),
 errmsg("could not close file \"%s\": %m", 
tmppath)));
 
/*
 * Now move the segment into place with its final name.  Cope with
 * possibility that someone else has created the file while we were
 * filling ours: if so, use ours to pre-create a future log segment.
 */
installed_segno = logsegno;
 
/*
 * XXX: What should we use as max_segno? We used to use XLOGfileslop when
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 9be04c8a1e7..dacc989d855 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -16777,45 +16777,44 @@ PreCommit_on_commit_actions(void)
oids_to_truncate = lappend_oid(oids_to_truncate, oc->relid);
break;
case ONCOMMIT_DROP:
oids_to_drop = lappend_oid(oids_to_drop, oc->relid);
break;
}
}
 
/*
 * Truncate relations before dropping so that all dependencies between
 * relations are removed after they are worked on.  Doing it like this
 * might be a waste as it is possible that a relation being truncated will
 * be dropped anyway due to its parent being dropped, but this makes the
 * code more robust because of not having to re-check that the relation
 * exists at truncation time.
 */
if (oids_to_truncate != NIL)
heap_truncate(oids_to_truncate);
 
if (oids_to_drop != NIL)
{
ObjectAddresses *targetObjects = new_object_addresses();
-   ListCell   *l;
 
foreach(l, oids_to_drop)
{
ObjectAddress object;
 
object.classId = RelationRelationId;
object.objectId = lfirst_oid(l);
object.objectSubId = 0;
 
Assert(!object_address_present(&object, targetObjects));
 
add_exact_object_address(&object, targetObjects);
}
 
/*
 * Since this is an automatic drop, rather than one directly 
initiated
 * by the user, we pass the PERFORM_DELETION_INTERNAL flag.
 */
performMultipleDeletions(targetObjects, DROP_CASCADE,
						 PERFORM_DELETION_INTERNAL | PERFORM_DELETION_QUIETLY);
 
 #ifdef USE_ASSERT_CHECKING
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index dbdfe8bd2d4..3670d1f1861 10

Re: shadow variables - pg15 edition

2022-08-24 Thread David Rowley
On Wed, 24 Aug 2022 at 22:47, David Rowley  wrote:
> 5. "Refactor" (fix the code to make it better)

> I have some ideas on how to fix the two #5s, so I'm going to go and do that 
> now.

I've attached a patch which I think improves the code in
gistRelocateBuildBuffersOnSplit() so that there's no longer a shadowed
variable. I also benchmarked this method in a tight loop and can
measure no performance change from getting the loop index this way vs
the old way.
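For reference, the two ways of obtaining the loop index (sketch):

	/* old way: carry a separate counter */
	i = 0;
	foreach(lc, splitinfo)
	{
		/* ... use i ... */
		i++;
	}

	/* new way: derive the index from the ListCell itself */
	foreach(lc, splitinfo)
	{
		int			i = foreach_current_index(lc);

		/* ... use i ... */
	}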

This only fixes one of the #5s I mentioned. I ended up scrapping my
idea to fix the shadowed 'i' in get_qual_for_range() as it became too
complex.  The idea was to use list_cell_number() to find out how far
we looped in the forboth() loop.  It turned out that 'i' was used in
the subsequent loop in "j = i;". The fix just became too complex and I
didn't think it was worth the risk of breaking something just to get
rid of the shadowed 'i'.

David
diff --git a/src/backend/access/gist/gistbuildbuffers.c b/src/backend/access/gist/gistbuildbuffers.c
index eabf746018..c6c7dfe4c2 100644
--- a/src/backend/access/gist/gistbuildbuffers.c
+++ b/src/backend/access/gist/gistbuildbuffers.c
@@ -543,8 +543,7 @@ gistRelocateBuildBuffersOnSplit(GISTBuildBuffers *gfbb, GISTSTATE *giststate,
GISTNodeBuffer *nodeBuffer;
BlockNumber blocknum;
IndexTuple  itup;
-   int splitPagesCount = 0,
-   i;
+   int splitPagesCount = 0;
GISTENTRY   entry[INDEX_MAX_KEYS];
boolisnull[INDEX_MAX_KEYS];
GISTNodeBuffer oldBuf;
@@ -595,11 +594,11 @@ gistRelocateBuildBuffersOnSplit(GISTBuildBuffers *gfbb, GISTSTATE *giststate,
 * Fill relocation buffers information for node buffers of pages produced
 * by split.
 */
-   i = 0;
foreach(lc, splitinfo)
{
GISTPageSplitInfo *si = (GISTPageSplitInfo *) lfirst(lc);
GISTNodeBuffer *newNodeBuffer;
+   int i = foreach_current_index(lc);
 
/* Decompress parent index tuple of node buffer page. */
gistDeCompressAtt(giststate, r,
@@ -618,8 +617,6 @@ gistRelocateBuildBuffersOnSplit(GISTBuildBuffers *gfbb, GISTSTATE *giststate,
 
relocationBuffersInfos[i].nodeBuffer = newNodeBuffer;
relocationBuffersInfos[i].splitinfo = si;
-
-   i++;
}
 
/*


Re: Tracking last scan time

2022-08-24 Thread David Rowley
On Thu, 25 Aug 2022 at 03:03, Bruce Momjian  wrote:
>
> On Wed, Aug 24, 2022 at 04:01:21PM +0100, Dave Page wrote:
> > On Wed, 24 Aug 2022 at 15:18, Bruce Momjian  wrote:
> > Would it be simpler to allow the sequential and index scan columns to be
> > cleared so you can look later to see if it is non-zero?  Should we allow
> >
> > I don't think so, because then stat values wouldn't necessarily correlate 
> > with
> > each other, and you wouldn't know when any of them were last reset unless we
> > started tracking each individual reset. At least now you can see when they 
> > were
> > all reset, and you know they were reset at the same time.
>
> Yeah, true.  I was more asking if these two columns are in some way
> special or if people would want a more general solution, and if so, is
> that something we want in core Postgres.

Back when I used to do a bit of PostgreSQL DBA stuff, I had a nightly
job set up to record the state of pg_stat_all_tables and put that into
another table along with the current date. I then had a view that did
some calculations with col - LAG(col) OVER (PARTITION BY relid ORDER
BY date) to fetch the numerical values for each date.  I didn't ever
want to reset the stats because it messes with autovacuum. If you zero
out n_ins_since_vacuum more often than auto-vacuum would trigger, then
bad things happen over time (we should really warn about that in the
docs).

I don't have a particular opinion about the patch, I'm just pointing
out that there are other ways. Even just writing down the numbers on a
post-it note and coming back in a month to see if they've changed is
enough to tell if the table or index has been used.

We do also need to consider, now that stats are stored in shared
memory, that any fields we add take up RAM.

David




Re: pg_upgrade failing for 200+ million Large Objects

2022-08-24 Thread Nathan Bossart
On Wed, Mar 24, 2021 at 12:05:27PM -0400, Jan Wieck wrote:
> On 3/24/21 12:04 PM, Jan Wieck wrote:
>> In any case I changed the options so that they behave the same way, the
>> existing -o and -O (for old/new postmaster options) work. I don't think
>> it would be wise to have option forwarding work differently between
>> options for postmaster and options for pg_dump/pg_restore.
> 
> Attaching the actual diff might help.

I'd like to revive this thread, so I've created a commitfest entry [0] and
attached a hastily rebased patch that compiles and passes the tests.  I am
aiming to spend some more time on this in the near future.

[0] https://commitfest.postgresql.org/39/3841/

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index c8a70d9bc1..faf1953e18 100644
--- a/src/bin/pg_dump/parallel.c
+++ b/src/bin/pg_dump/parallel.c
@@ -858,6 +858,11 @@ RunWorker(ArchiveHandle *AH, ParallelSlot *slot)
 	 */
 	WaitForCommands(AH, pipefd);
 
+	/*
+	 * Close any open BLOB batch transaction.
+	 */
+	CommitBlobTransaction((Archive *)AH);
+
 	/*
 	 * Disconnect from database and clean up.
 	 */
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index fcc5f6bd05..f16ecdecc0 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -220,6 +220,8 @@ typedef struct Archive
 	int			numWorkers;		/* number of parallel processes */
 	char	   *sync_snapshot_id;	/* sync snapshot id for parallel operation */
 
+	int			blobBatchSize;	/* # of blobs to restore per transaction */
+
 	/* info needed for string escaping */
 	int			encoding;		/* libpq code for client_encoding */
 	bool		std_strings;	/* standard_conforming_strings */
@@ -290,6 +292,7 @@ extern void WriteData(Archive *AH, const void *data, size_t dLen);
 extern int	StartBlob(Archive *AH, Oid oid);
 extern int	EndBlob(Archive *AH, Oid oid);
 
+extern void	CommitBlobTransaction(Archive *AH);
 extern void CloseArchive(Archive *AH);
 
 extern void SetArchiveOptions(Archive *AH, DumpOptions *dopt, RestoreOptions *ropt);
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 233198afc0..7cfbed5e75 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -68,6 +68,7 @@ typedef struct _parallelReadyList
 	bool		sorted;			/* are valid entries currently sorted? */
 } ParallelReadyList;
 
+static int		blobBatchCount = 0;
 
 static ArchiveHandle *_allocAH(const char *FileSpec, const ArchiveFormat fmt,
 			   const int compression, bool dosync, ArchiveMode mode,
@@ -266,6 +267,8 @@ CloseArchive(Archive *AHX)
 	int			res = 0;
 	ArchiveHandle *AH = (ArchiveHandle *) AHX;
 
+	CommitBlobTransaction(AHX);
+
 	AH->ClosePtr(AH);
 
 	/* Close the output */
@@ -279,6 +282,23 @@ CloseArchive(Archive *AHX)
 		pg_fatal("could not close output file: %m");
 }
 
+/* Public */
+void
+CommitBlobTransaction(Archive *AHX)
+{
+	ArchiveHandle *AH = (ArchiveHandle *) AHX;
+
+	if (blobBatchCount > 0)
+	{
+		ahprintf(AH, "--\n");
+		ahprintf(AH, "-- End BLOB restore batch\n");
+		ahprintf(AH, "--\n");
+		ahprintf(AH, "COMMIT;\n\n");
+
+		blobBatchCount = 0;
+	}
+}
+
 /* Public */
 void
 SetArchiveOptions(Archive *AH, DumpOptions *dopt, RestoreOptions *ropt)
@@ -3489,6 +3509,57 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData)
 {
 	RestoreOptions *ropt = AH->public.ropt;
 
+	/* We restore BLOBs in batches to reduce XID consumption */
+	if (strcmp(te->desc, "BLOB") == 0 && AH->public.blobBatchSize > 0)
+	{
+		if (blobBatchCount > 0)
+		{
+			/* We are inside a BLOB restore transaction */
+			if (blobBatchCount >= AH->public.blobBatchSize)
+			{
+/*
+ * We did reach the batch size with the previous BLOB.
+ * Commit and start a new batch.
+ */
+ahprintf(AH, "--\n");
+ahprintf(AH, "-- BLOB batch size reached\n");
+ahprintf(AH, "--\n");
+ahprintf(AH, "COMMIT;\n");
+ahprintf(AH, "BEGIN;\n\n");
+
+blobBatchCount = 1;
+			}
+			else
+			{
+/* This one still fits into the current batch */
+blobBatchCount++;
+			}
+		}
+		else
+		{
+			/* Not inside a transaction, start a new batch */
+			ahprintf(AH, "--\n");
+			ahprintf(AH, "-- Start BLOB restore batch\n");
+			ahprintf(AH, "--\n");
+			ahprintf(AH, "BEGIN;\n\n");
+
+			blobBatchCount = 1;
+		}
+	}
+	else
+	{
+		/* Not a BLOB. If we have a BLOB batch open, close it. */
+		if (blobBatchCount > 0)
+		{
+			ahprintf(AH, "--\n");
+			ahprintf(AH, "-- End BLOB restore batch\n");
+			ahprintf(AH, "--\n");
+			ahprintf(AH, "COMMIT;\n\n");
+
+			blobBatchCount = 0;
+		}
+	}
+
 	/* Select owner, schema, tablespace and default AM as necessary */
 	_becomeOwner(AH, te);
 	_selectOutputSchema(AH, te->namespace);
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index d25709ad5f..17c0dd7f0c 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -196,11 +196,20 @@ static inl

Re: SQL/JSON features for v15

2022-08-24 Thread Andrew Dunstan


On 2022-08-24 We 20:05, Nikita Glukhov wrote:
>
>
> v8 is a highly WIP patch, which I failed to finish today.
> Even some test cases fail now, and they simply show unfinished
> things like casts to bytea (they can be simply removed) and missing
> safe input functions.
>

Thanks for your work, please keep going.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Strip -mmacosx-version-min options from plperl build

2022-08-24 Thread Andrew Dunstan


On 2022-08-24 We 18:56, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 2022-08-24 We 09:30, Tom Lane wrote:
>>> Presumably this is caused by not having
 -Wl,--export-all-symbols
>>> which is something we ought to be injecting for ourselves if we
>>> aren't doing anything to export the magic-block constant explicitly.
>>> But I too am confused why we haven't seen this elsewhere.
>> Me too. I note that we have -Wl,--out-implib=libplperl.a but we don't
>> appear to do anything with libplperl.a.
> I've poked around and formed a vague theory, based on noting this
> from the ld(1) man page:
>
>--export-all-symbols
>... When symbols are
>explicitly exported via DEF files or implicitly exported via
>function attributes, the default is to not export anything else
>unless this option is given.
>
> So we could explain the behavior if, say, plperl's _PG_init were
> explicitly marked with __attribute__((visibility("default"))) while
> its Pg_magic_func was not.  That would work anyway as long as
> --export-all-symbols was being used at link time, and would produce
> the observed symptom as soon as it wasn't.
>
> Now, seeing that both of those functions are surely marked with
> PGDLLEXPORT in the source code, how could such a state of affairs
> arise?  What I'm thinking about is that _PG_init's marking will be
> determined by the extern declaration for it in fmgr.h, while
> Pg_magic_func's marking will be determined by the extern declaration
> obtained from expanding PG_MODULE_MAGIC.  And there are a boatload
> of Perl-specific header files read between those points in plperl.c.
>
> In short: if the Cygwin Perl headers redefine PGDLLEXPORT (unlikely)
> or somehow #define "__attribute__()" or "visibility()" into no-ops
> (perhaps more likely) then we could explain this failure, and that
> would also explain why it doesn't fail elsewhere.
>
> I can't readily check this, since I have no idea exactly which version
> of the Perl headers lorikeet uses.
>
>   



It's built against cygwin perl 5.32.


I don't see anything like that in perl.h. It's certainly using
__attribute__() a lot.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Strip -mmacosx-version-min options from plperl build

2022-08-24 Thread Tom Lane
Andrew Dunstan  writes:
> On 2022-08-24 We 09:30, Tom Lane wrote:
>> Presumably this is caused by not having
>> > -Wl,--export-all-symbols
>> which is something we ought to be injecting for ourselves if we
>> aren't doing anything to export the magic-block constant explicitly.
>> But I too am confused why we haven't seen this elsewhere.

> Me too. I note that we have -Wl,--out-implib=libplperl.a but we don't
> appear to do anything with libplperl.a.

I've poked around and formed a vague theory, based on noting this
from the ld(1) man page:

    --export-all-symbols
        ... When symbols are
        explicitly exported via DEF files or implicitly exported via
        function attributes, the default is to not export anything else
        unless this option is given.

So we could explain the behavior if, say, plperl's _PG_init were
explicitly marked with __attribute__((visibility("default"))) while
its Pg_magic_func was not.  That would work anyway as long as
--export-all-symbols was being used at link time, and would produce
the observed symptom as soon as it wasn't.

Now, seeing that both of those functions are surely marked with
PGDLLEXPORT in the source code, how could such a state of affairs
arise?  What I'm thinking about is that _PG_init's marking will be
determined by the extern declaration for it in fmgr.h, while
Pg_magic_func's marking will be determined by the extern declaration
obtained from expanding PG_MODULE_MAGIC.  And there are a boatload
of Perl-specific header files read between those points in plperl.c.

In short: if the Cygwin Perl headers redefine PGDLLEXPORT (unlikely)
or somehow #define "__attribute__()" or "visibility()" into no-ops
(perhaps more likely) then we could explain this failure, and that
would also explain why it doesn't fail elsewhere.
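A sketch of that hypothesized sequence (the no-op #define is pure
conjecture; declarations abbreviated):

/* early in plperl.c, via fmgr.h -- attribute still intact: */
extern PGDLLEXPORT void _PG_init(void);	/* implicitly exported via the attribute */

/* ...then, if one of the many Perl headers did something like: */
#define __attribute__(x)		/* every later attribute becomes a no-op */

/* ...the declaration PG_MODULE_MAGIC later expands to would lose its marking: */
extern PGDLLEXPORT const Pg_magic_struct *Pg_magic_func(void);	/* not exported */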

I can't readily check this, since I have no idea exactly which version
of the Perl headers lorikeet uses.

regards, tom lane




Re: shadow variables - pg15 edition

2022-08-24 Thread David Rowley
On Thu, 25 Aug 2022 at 02:00, Justin Pryzby  wrote:
>
> On Wed, Aug 24, 2022 at 10:47:31PM +1200, David Rowley wrote:
> > I was hoping we'd already caught all of the #1s in 421892a19, but I
> > caught a few of those in some of your other patches. One you'd done
> > another way and some you'd done the rescope but just put it in the
> > wrong patch.  The others had not been done yet. I just pushed
> > f959bf9a5 to fix those ones.
>
> This fixed pg_get_statisticsobj_worker() but not pg_get_indexdef_worker() nor
> pg_get_partkeydef_worker().

The latter two can't be fixed in the same way as
pg_get_statisticsobj_worker(), which is why I left them alone.  We can
deal with those when getting onto the next category of warnings, which
I believe should be the "Repurpose" category.  If you look at the
shadow_analysis spreadsheet then you can see how I've categorised
each.  I'm not pretending those are all 100% accurate; in various
cases the choice of category was subjective.  My aim here is to fix as many
of the warnings as possible in the safest way possible for the
particular warning. This is why pg_get_statisticsobj_worker() wasn't
fixed in the same pass as pg_get_indexdef_worker() and
pg_get_partkeydef_worker().

David




Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2022-08-24 Thread Zhihong Yu
Hi,
I was looking at 0004-COPY_IGNORE_ERRORS.patch

+ * Ignore constraints if IGNORE_ERRORS is enabled
+ */
+static void
+safeExecConstraints(CopyFromState cstate, ResultRelInfo *resultRelInfo,
+					TupleTableSlot *myslot, EState *estate)

I think the existing ExecConstraints() can be extended to check
cstate->opts.ignore_errors so that it can selectively ignore
constraint violations.

This way you don't need safeExecConstraints().
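i.e., roughly this shape at each violation site inside ExecConstraints()
(a sketch only; the flag plumbing and the soft-failure return are invented
to illustrate the suggestion):

	if (violated)
	{
		if (ignore_errors)		/* e.g. from cstate->opts.ignore_errors */
			return false;		/* soft failure: COPY skips this tuple */

		ereport(ERROR,
				(errcode(ERRCODE_CHECK_VIOLATION),
				 errmsg("new row for relation \"%s\" violates check constraint \"%s\"",
						RelationGetRelationName(rel), failed)));
	}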

Cheers


Re: archive modules

2022-08-24 Thread Nathan Bossart
On Wed, Aug 24, 2022 at 10:05:55AM +0200, talk to ben wrote:
> This view does not display <link linkend="runtime-config-custom">customized options</link>
> -   until the extension module that defines them has been loaded.
> +   until the extension module that defines them has been loaded. Therefore, any
> +   option defined in a library that is dynamically loaded in a separate process
> +   will not be visible in the view, unless the module is manually loaded
> +   beforehand. This case applies for example to an archive module loaded by the
> +   archiver process.

I would suggest something like:

This view does not display customized options until the extension
module that defines them has been loaded by the backend process
executing the query (e.g., via shared_preload_libraries, the LOAD
command, or a call to a user-defined C function).  For example, since
the archive_library is only loaded by the archiver process, this view
will not display any customized options defined by archive modules
unless special action is taken to load them into the backend process
executing the query.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




pg_regress: lookup shellprog in $PATH

2022-08-24 Thread Gurjeet Singh
I'm trying to build Postgres using the Nix language and the Nix package
manager on macOS (see [1]). After some work I was able to build, and even
run Postgres. But `make check` failed with the error

pg_regress: could not exec "sh": No such file or directory

The reason is that pg_regress uses execl() function to execute the shell
recorded in its shellprog variable, whose value is provided using the SHELL
variable via the makefile. Specifically, this error is because execl()
function expects the full path of the executable as the first parameter,
and it does _not_ perform a lookup in $PATH.
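The difference in a nutshell (both calls per POSIX; the command string is
illustrative):

#include <unistd.h>

/* execl() takes a pathname and performs no $PATH search: */
execl("/bin/sh", "sh", "-c", "echo hello", (char *) NULL);

/* execlp() takes a file name and searches $PATH for it: */
execlp("sh", "sh", "-c", "echo hello", (char *) NULL);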

Using execl() in pg_regress has worked in the past, because the default
value of SHELL used by GNU make (and I presume other make implementations)
is /bin/sh, and /bin/sh is expected to be present on any Unix-like system.

But in Nixpkgs (the default and central package repository of the Nix/NixOS
distro), they have chosen to patch GNU make, turning the default value of
SHELL from '/bin/sh' into just 'sh'; see their patch [2]. They did this
because Nixpkgs considers any files outside the Nix Store (the path
/nix/store/, by default) to be "impure". They want packagers to use
$PATH (consisting solely of paths that begin with /nix/store/...) to
look up their binaries and other files.

So when pg_regress tries to run a program (the postmaster, in this case),
the execl() function complains that it could not find 'sh',  since there's
no file ./sh in the directory where pg_regress is being run.

Please see attached the one-letter patch that fixes this problem. I have
chosen to replace the execl() call with execlp(), which performs a lookup
in $PATH, and finds the 'sh' to use for running the postmaster. This patch
does _not_ cause 'make check' or any other failures when Postgres is built
with non-Nix build tools available on macOS.

There is one other use of execl(), in pg_ctl.c, but that is safe from the
behaviour introduced by Nixpkgs, since that call site uses the absolute
path /bin/sh, and hence there's no ambiguity in where to look for the
executable.

There are no previous uses of execlp() in Postgres, which made me rethink
this patch. But I think it's safe to use execlp() since it's part of POSIX,
and it's also supported by Windows (see [3]; they say the function name is
"deprecated" but the function is "supported" in the same paragraph!!).

There's one mention of execl in src/pl/plperl/ppport.h, and since it's a
generated file, I believe execlp now also needs to be added to that list.
But I'm not sure how to generate that file, or if it really needs to be
generated and included in this patch; maybe the file is re-generated during
a release process. Need advice on that.

GNU make's docs clearly explain (see [4]) the special handling of variable
SHELL, and it seems impossible to pass this variable from an env variable
down to the GNUmakefile of interest. The only alternative I see for us
being able to pass a custom value via SHELL, is to detect and declare the
SHELL variable in one of our higher-level files; and I don't think that'd
be a good idea.

We could propose to Nixpkgs community that they stop patching make, and
leave the default SHELL value alone. But I see very low likelihood of them
accepting our arguments, or changing their ways.

It took many days of debugging, troubleshooting, etc., to get to this
root cause. I first tried to coax autoconf, make, etc. to pass my custom
SHELL through to pg_regress' makefile. Changing CONFIG_SHELL, or SHELL does
not have any impact. Then I had to read the Nixpkgs code, and work out the
archaic ways the packages are defined, and after much code wrangling I was
able to find out that _they_ changed the default value of SHELL by patching
the make sources.

The Nix language is not so bad, but the way it's used to write code in the
Nix community leaves a lot to be desired; ad-hoc environment variable
naming, polluting the built environment with all kinds of variables, almost
non-existent comments, no standards on indentation, etc. These reasons made
me decide to use the plain Nix language as much as possible, and not rely
on Nixpkgs, whenever I can avoid it.

The Nixpkgs and NixOS distro includes all the supported versions of
Postgres, so one would assume they would've also encountered, and solved,
this problem. But they didn't. My best guess as to why is that they
never bothered to run `make check` on their built binaries.

[1]: https://github.com/DrPostgres/HaathiFarm/blob/master/default.nix
[2]:
https://github.com/NixOS/nixpkgs/blob/release-22.05/pkgs/development/tools/build-managers/gnumake/0001-No-impure-bin-sh.patch
[3]:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/execlp?view=msvc-170
[4]:
https://www.gnu.org/software/make/manual/html_node/Choosing-the-Shell.html

Best regards,
Gurjeet
http://Gurje.et


pg_regress_use_execlp_to_lookup_shellprog.patch
Description: Binary data


Re: Making Vars outer-join aware

2022-08-24 Thread Tom Lane
Richard Guo  writes:
> On Sun, Aug 21, 2022 at 6:52 AM Tom Lane  wrote:
>> What I'm thinking we should do about this, once we detect that
>> this identity is applicable, is to generate *both* forms of Pbc,
>> either adding or removing the varnullingrels bits depending on
>> which form we got from the parser.

> Do you mean we generate two RestrictInfos for Pbc in the case of
> identity 3, one with varnullingrels and one without varnullingrels, and
> choose the appropriate one when forming join paths?

Right.

> Do we need to also
> generate two SpecialJoinInfos for the B/C join in the first order, with
> and without the A/B join in its min_lefthand?

No, the SpecialJoinInfos would stay as they are now.  It's already the
case that the first join's min_righthand would contain only B, and
the second one's min_righthand would contain only C.

regards, tom lane




Re: making relfilenodes 56 bits

2022-08-24 Thread Robert Haas
On Mon, Aug 1, 2022 at 7:57 AM Dilip Kumar  wrote:
> I have fixed the other comments, and also addressed Alvaro's comments by
> using %lld instead of INT64_FORMAT inside ereport and wherever he
> suggested.
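(For reference, the convention in question: INT64_FORMAT varies by
platform, so it cannot appear in translatable message strings; the value
is cast to long long and printed with %lld instead. A minimal example,
message text illustrative:)

	ereport(ERROR,
			(errmsg("unexpected relfilenumber %lld",
					(long long) relfilenumber)));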

Notwithstanding the ongoing discussion about the exact approach for
the main patch, it seemed OK to push the preparatory patch you posted
here, so I have now done that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

2022-08-24 Thread Ranier Vilela
Em qua., 24 de ago. de 2022 às 16:41, Robert Haas 
escreveu:

> On Wed, Aug 24, 2022 at 3:20 PM Ranier Vilela  wrote:
> > But, these same arguments apply to Designated Initializers [1].
> >
> > like:
> > struct foo a = {
> >.i = 0,
> >.b = 0,
> > };
> >
> > That is slowly being introduced and IMHO brings the same problems with
> padding bits.
>
> Yep. I don't find that an improvement over a MemSet on the struct
> either, if we're just using it to fill in zeroes.
>
> If we're using it to fill in non-zero values, though, then there's a
> reasonable argument that it offers some notational convenience.
>
Even in that case, it still hides bugs.
All arguments against {0} apply entirely to this initialization type.
Because the padding bits remain uninitialized.
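For example (standard C semantics; the struct and names are illustrative):

#include <string.h>

struct foo
{
	char		b;				/* padding bytes typically follow 'b' */
	long		i;
};

void
example(void)
{
	struct foo	a = {.b = 0, .i = 0};	/* members zeroed; padding unspecified */
	struct foo	m;

	memset(&m, 0, sizeof(m));	/* every byte zeroed, padding included */
	(void) a;
}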

Note that since all major compilers currently do initialize the padding
bits with {0}, this misbehavior may turn out to have no practical effect
in the future.

regards,
Ranier Vilela


Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

2022-08-24 Thread Robert Haas
On Wed, Aug 24, 2022 at 3:20 PM Ranier Vilela  wrote:
> But, these same arguments apply to Designated Initializers [1].
>
> like:
> struct foo a = {
>.i = 0,
>.b = 0,
> };
>
> That is slowly being introduced and IMHO brings the same problems with 
> padding bits.

Yep. I don't find that an improvement over a MemSet on the struct
either, if we're just using it to fill in zeroes.

If we're using it to fill in non-zero values, though, then there's a
reasonable argument that it offers some notational convenience.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

2022-08-24 Thread Ranier Vilela
Em qua., 24 de ago. de 2022 às 16:00, Tom Lane  escreveu:

> Peter Eisentraut  writes:
> > On 24.08.22 16:30, Alvaro Herrera wrote:
> >> If you do this, you're creating a potential backpatching hazard.  This
> >> is OK if we get something in return, so a question to ask is whether
> >> there is any benefit in doing it.
>
> > I don't follow how this is a backpatching hazard.
>
> Call me a troglodyte, but I don't follow how it's an improvement.
> It looks to me like an entirely random change that doesn't get rid
> of assumptions about what the bits are, it just replaces one set of
> assumptions with a different set.  Moreover, the new set of assumptions
> may include "there are no padding bits in here", which is mighty fragile
> and hard to verify.  So I frankly do not find this a stylistic improvement.
>
But, these same arguments apply to Designated Initializers [1].

like:
struct foo a = {
   .i = 0,
   .b = 0,
};

That is slowly being introduced and IMHO brings the same problems with
padding bits.

regards,
Ranier Vilela

[1] https://interrupt.memfault.com/blog/c-struct-padding-initialization


Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

2022-08-24 Thread Robert Haas
On Wed, Aug 24, 2022 at 3:00 PM Tom Lane  wrote:
> Call me a troglodyte, but I don't follow how it's an improvement.
> It looks to me like an entirely random change that doesn't get rid
> of assumptions about what the bits are, it just replaces one set of
> assumptions with a different set.  Moreover, the new set of assumptions
> may include "there are no padding bits in here", which is mighty fragile
> and hard to verify.  So I frankly do not find this a stylistic improvement.

Ditto.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

2022-08-24 Thread Tom Lane
Peter Eisentraut  writes:
> On 24.08.22 16:30, Alvaro Herrera wrote:
>> If you do this, you're creating a potential backpatching hazard.  This
>> is OK if we get something in return, so a question to ask is whether
>> there is any benefit in doing it.

> I don't follow how this is a backpatching hazard.

Call me a troglodyte, but I don't follow how it's an improvement.
It looks to me like an entirely random change that doesn't get rid
of assumptions about what the bits are, it just replaces one set of
assumptions with a different set.  Moreover, the new set of assumptions
may include "there are no padding bits in here", which is mighty fragile
and hard to verify.  So I frankly do not find this a stylistic improvement.

regards, tom lane




Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

2022-08-24 Thread Peter Eisentraut

On 24.08.22 16:30, Alvaro Herrera wrote:

On 2022-Aug-19, David Zhang wrote:

Should these obviously possible replacement of the standard library function
"memset" be considered as well? For example, something like the attached one
which is focusing on the pageinspect extension only.


If you do this, you're creating a potential backpatching hazard.  This
is OK if we get something in return, so a question to ask is whether
there is any benefit in doing it.


I don't follow how this is a backpatching hazard.




Re: ICU for global collation

2022-08-24 Thread Peter Eisentraut

On 24.08.22 10:59, Julien Rouhaud wrote:

I have not tested the patch in details but
this looks rather sane to me on a quick read.  Peter?

Patch looks good to me too.


Committed, thanks.

(This should conclude all the issues discussed in this thread recently.)




Re: [RFC] building postgres with meson - v11

2022-08-24 Thread samay sharma
Hi,

On Wed, Aug 24, 2022 at 8:30 AM Andres Freund  wrote:

> Hi,
>
> On 2022-08-24 11:39:06 +0200, Peter Eisentraut wrote:
> > I have looked at your branch at 0545eec895:
> >
> > 258f6dc0a7 Don't hardcode tmp_check/ as test directory for tap tests
> > 8ecc33cf04 Split TESTDIR into TESTLOGDIR and TESTDATADIR
> >
> > I think these patches are split up a bit incorrectly.  If you apply
> > the first patch by itself, then the output appears in tab_comp_dir/
> > directly under the source directory.  And then the second patch moves
> > it to tmp_check/tap_comp_dir/.  If there is an intent to apply these
> > patches separately somehow, this should be cleaned up.
>
> How is that happening with that version of the patch? The test puts
> tap_comp_dir under TESTDIR, and TESTDIR is $(CURDIR)/tmp_check. There was
> an
> earlier version of the patch that was split one more time that did have
> that
> problem, but I don't quite see how that version has it?
>
>
> > I haven't checked the second patch in detail yet, but it looks like
> > the thought was that the first patch is about ready to go.
> >
> > 834a40e609 meson: prereq: Extend gendef.pl in preparation for meson
> >
> > I'm not qualified to check that in detail, but it looks reasonable
> > enough to me.
> >
> > See attached patch (0001) for a perlcritic fix.
>
> Thanks.
>
>
> > 97a0b096e8 meson: prereq: Add src/tools/gen_export.pl
> >
> > This produces leading whitespace in the output files that at least on
> > darwin wasn't there before.  See attached patch (0002).  This should
> > be checked again on other platforms as well.
>
> Hm, to me the indentation as is makes more sense, but ...
>
> > Other than that this looks good.  Attached is a small cosmetic patch
> (0003).
>
> I wonder if we should rewrite this in python - I chose perl because I
> thought
> we could share it, but as you pointed out, that's not possible, because we
> don't want to depend on perl during the autoconf build from a tarball.
>
>
> > e0a8387660 solaris: Use versioning scripts instead of -Bsymbolic
> >
> > This looks like a good idea.  The documentation clearly states that
> > -Bsymbolic shouldn't be used, at least not in the way we have been
> > doing.  Might as well go ahead with this and give it a whirl on the
> > build farm.
>
> Cool. I looked at this because I was confused about getting warnings with
> autoconf that I wasn't getting with meson.
>
>
> > 0545eec895 meson: Add docs
> >
> > We should think more about how to arrange the documentation.  We
> > probably don't want to copy-and-paste all the introductory and
> > requirements information.  I think we can make this initially much
> > briefer, like the Windows installation chapter.  For example, instead
> > of documenting each setup option again, just mention which ones exist
> > and then point (link) to the configure chapter for details.
>
> The current docs, including the windows ones, are already hard to follow. I
> think we should take some care to not make the meson bits even more
> confusing. Cross referencing left and right seems problematic from that
> angle.
>

On Configure options:

To add to the above, very few sections are an exact copy-paste. The
arguments and default behaviors of quite a few configure options are
different. The change in default behavior and arguments is primarily due to
"auto" features which get enabled if the dependencies are found, whereas
with make we have explicit --enable or --disable options which don't take
any arguments.

Also, a few instructions / commands which worked with make need to be done
a bit differently due to environment variables etc., which also has to be
communicated.

Communicating these differences and nuances with cross-referencing would
make it confusing, as most of this information is in the explanation
paragraphs.

On requirements:

They are also a bit different, e.g. readline is not a "required" thing
anymore, while perl, flex, and bison are required, etc. Also, these are
bullet points with information inlined rather than separate sections, so
cross-referencing here would also be hard.

Regards,
Samay


> > I spent a bit of time with the test suites.  I think there is a
> > problem in that selecting a test suite directly, like
> >
> > meson test -C _build --suite recovery
> >
> > doesn't update the tmp_install.  So if this is the first thing you run
> > after a build, everything will fail.  Also, if you run this later, the
> > tmp_install doesn't get updated, so you're not testing up-to-date
> > code.
>
> At the moment creation of the tmp_install is its own test suite. I don't
> know
> if that's the best way, or what the best way is, but that explains that
> fact. You can do the above without the issue by specifying
> --suite setup --suite recovery.
>
>
> Greetings,
>
> Andres Freund
>


Re: use ARM intrinsics in pg_lfind32() where available

2022-08-24 Thread Nathan Bossart
On Wed, Aug 24, 2022 at 11:07:03AM +0700, John Naylor wrote:
> The important thing is: if we compile with __aarch64__ as a target:
> - Will the compiler emit the intended instructions from the intrinsics
> without extra flags?

My testing with GCC and Clang did not require any extra flags.  GCC appears
to enable it by default for aarch64 [0].  AFAICT this is the case for Clang
as well, but that is based on the code and my testing (I couldn't find any
documentation for this).

> - Can a user on ARM64 ever get a runtime fault if the machine attempts
> to execute NEON instructions?

IIUC yes, although I'm not sure how likely it is in practice.

> "I have been able to compile for
> __aarch64__ without __ARM_NEON" doesn't really answer that question --
> what exactly did this entail?

Compiling with something like -march=armv8-a+nosimd prevents defining
__ARM_NEON.  Interestingly, Clang still defines __ARM_NEON__ even when
+nosimd is specified.
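
To check a given toolchain, something like this works with both GCC and
Clang (a sketch):

/*
 * Dump the predefined macros, e.g.:
 *
 *     cc -march=armv8-a+nosimd -dM -E - </dev/null | grep -i 'neon\|aarch64'
 *
 * In code, the combination at issue for the NEON intrinsics is:
 */
#if defined(__aarch64__) && !defined(__ARM_NEON)
#error "aarch64 target without NEON: a non-SIMD fallback would be needed"
#endif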

> I took a quick look around at Debian code search, *BSD, Apple, and a
> few other places, and I can't find it. Then, I looked at the
> discussions around commit 5c7603c318872a42e "Add ARM64 (aarch64)
> support to s_lock.h", and the proposed patch [1] only had __aarch64__
> . When it was committed, the platform was vaporware and I suppose we
> included "__aarch64" as a prophylactic measure because no other reason
> was given. It doesn't seem to exist anywhere, so unless someone can
> demonstrate otherwise, I'm going to rip it out soon.

This is what I found, too, so +1.  I've attached a patch for this.

[0] https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
diff --git a/src/include/port/atomics.h b/src/include/port/atomics.h
index f7cd0f6f20..b14ce832bf 100644
--- a/src/include/port/atomics.h
+++ b/src/include/port/atomics.h
@@ -63,8 +63,7 @@
  * compiler barrier.
  *
  */
-#if defined(__arm__) || defined(__arm) || \
-	defined(__aarch64__) || defined(__aarch64)
+#if defined(__arm__) || defined(__arm) || defined(__aarch64__)
 #include "port/atomics/arch-arm.h"
 #elif defined(__i386__) || defined(__i386) || defined(__x86_64__)
 #include "port/atomics/arch-x86.h"
diff --git a/src/include/port/atomics/arch-arm.h b/src/include/port/atomics/arch-arm.h
index 9fe8f1b95f..7449f8404a 100644
--- a/src/include/port/atomics/arch-arm.h
+++ b/src/include/port/atomics/arch-arm.h
@@ -21,7 +21,7 @@
  * 64 bit atomics on ARM32 are implemented using kernel fallbacks and thus
  * might be slow, so disable entirely. On ARM64 that problem doesn't exist.
  */
-#if !defined(__aarch64__) && !defined(__aarch64)
+#if !defined(__aarch64__)
 #define PG_DISABLE_64_BIT_ATOMICS
 #else
 /*
@@ -29,4 +29,4 @@
  * general purpose register is atomic.
  */
 #define PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY
-#endif  /* __aarch64__ || __aarch64 */
+#endif  /* __aarch64__ */
diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h
index cc83d561b2..65aa66c598 100644
--- a/src/include/storage/s_lock.h
+++ b/src/include/storage/s_lock.h
@@ -256,7 +256,7 @@ spin_delay(void)
  * We use the int-width variant of the builtin because it works on more chips
  * than other widths.
  */
-#if defined(__arm__) || defined(__arm) || defined(__aarch64__) || defined(__aarch64)
+#if defined(__arm__) || defined(__arm) || defined(__aarch64__)
 #ifdef HAVE_GCC__SYNC_INT32_TAS
 #define HAS_TEST_AND_SET
 
@@ -277,7 +277,7 @@ tas(volatile slock_t *lock)
  * high-core-count ARM64 processors.  It seems mostly a wash for smaller gear,
  * and ISB doesn't exist at all on pre-v7 ARM chips.
  */
-#if defined(__aarch64__) || defined(__aarch64)
+#if defined(__aarch64__)
 
 #define SPIN_DELAY() spin_delay()
 
@@ -288,9 +288,9 @@ spin_delay(void)
 		" isb;\n");
 }
 
-#endif	 /* __aarch64__ || __aarch64 */
+#endif	 /* __aarch64__ */
 #endif	 /* HAVE_GCC__SYNC_INT32_TAS */
-#endif	 /* __arm__ || __arm || __aarch64__ || __aarch64 */
+#endif	 /* __arm__ || __arm || __aarch64__ */
 
 
 /*


Re: [RFC] building postgres with meson - v11

2022-08-24 Thread Andres Freund
Hi,

On 2022-08-17 14:53:17 -0700, Andres Freund wrote:
> > - In the top-level meson.build, the "renaming" of the Windows system
> >   name
> >
> >   host_system = host_machine.system() == 'windows' ? 'win32' :
> > host_machine.system()
> >   build_system = build_machine.system() == 'windows' ? 'win32' :
> > build_machine.system()
> >
> >   seems unnecessary to me.  Why not stick with the provided names?
> 
> Because right now we also use it for things like choosing the "source" for
> pg_config_os.h (i.e. include/port/{darwin,linux,win32,..}.h). And it seemed
> easier to just have one variable name for all of it.

I am now changing this so that there's an additional 'portname' variable for
this purpose. Otherwise the meson names are used.

Greetings,

Andres Freund




Re: Stack overflow issue

2022-08-24 Thread Tom Lane
I wrote:
> I think most likely we should report this to Snowball upstream
> and see what they think is an appropriate fix.

Done at [1], and I pushed the other fixes.  Thanks again for the report!

regards, tom lane

[1] 
https://lists.tartarus.org/pipermail/snowball-discuss/2022-August/001734.html




Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2022-08-24 Thread Damir Belyalov
>
From 09befdad45a6b1ae70d6c5abc90d1c2296e56ee1 Mon Sep 17 00:00:00 2001
From: Damir Belyalov 
Date: Fri, 15 Oct 2021 11:55:18 +0300
Subject: [PATCH] COPY_IGNORE_ERRORS with GUC for replay_buffer size

---
 doc/src/sgml/config.sgml  |  17 ++
 doc/src/sgml/ref/copy.sgml|  19 ++
 src/backend/commands/copy.c   |   8 +
 src/backend/commands/copyfrom.c   | 114 +++-
 src/backend/commands/copyfromparse.c  | 169 ++
 src/backend/parser/gram.y |   8 +-
 src/backend/utils/misc/guc.c  |  11 ++
 src/backend/utils/misc/postgresql.conf.sample |   2 +
 src/bin/psql/tab-complete.c   |   3 +-
 src/include/commands/copy.h   |   6 +
 src/include/commands/copyfrom_internal.h  |  19 ++
 src/include/parser/kwlist.h   |   1 +
 src/test/regress/expected/copy2.out   | 130 ++
 src/test/regress/sql/copy2.sql| 116 
 14 files changed, 617 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 37fd80388c..69373b8d8c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1961,6 +1961,23 @@ include_dir 'conf.d'
   
  
 
+ 
+  replay_buffer_size (integer)
+  
+   replay_buffer_size configuration parameter
+  
+  
+  
+   
+Specifies the size of the buffer for the COPY FROM IGNORE_ERRORS option. The
+buffer is created when a subtransaction begins and accumulates tuples until an
+error occurs, at which point it starts replaying the stored tuples. The buffer
+size determines the size of the subtransaction; therefore, on large tables it
+should be increased to avoid hitting the maximum number of subtransactions.
+   
+  
+ 
+
  
   max_stack_depth (integer)
   
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8aae711b3b..7ff6f6dea7 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -34,6 +34,7 @@ COPY { table_name [ ( format_name
 FREEZE [ boolean ]
+IGNORE_ERRORS [ boolean ]
 DELIMITER 'delimiter_character'
 NULL 'null_string'
 HEADER [ boolean | MATCH ]
@@ -233,6 +234,24 @@ COPY { table_name [ ( 

 
+   
+IGNORE_ERRORS
+
+ 
+  Drop rows that contain malformed data while copying. These are rows
+  containing syntax errors in data, rows with too many or too few columns,
+  rows that result in constraint violations, or rows containing columns
+  where the data type's input function raises an error.
+  Outputs warnings about rows with incorrect data (at most 100 warnings
+  are reported) and the total number of errors.
+  The option is implemented with subtransactions and a buffer created in
+  them, which accumulates tuples until an error occurs.
+  The size of the buffer determines the size of the subtransaction block.
+  It is controlled by the GUC parameter "replay_buffer_size" (1000 by default).
+ 
+
+   
+

 DELIMITER
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3ac731803b..fead1aba46 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -402,6 +402,7 @@ ProcessCopyOptions(ParseState *pstate,
 {
 	bool		format_specified = false;
 	bool		freeze_specified = false;
+	bool		ignore_errors_specified = false;
 	bool		header_specified = false;
 	ListCell   *option;
 
@@ -442,6 +443,13 @@ ProcessCopyOptions(ParseState *pstate,
 			freeze_specified = true;
 			opts_out->freeze = defGetBoolean(defel);
 		}
+		else if (strcmp(defel->defname, "ignore_errors") == 0)
+		{
+			if (ignore_errors_specified)
+errorConflictingDefElem(defel, pstate);
+			ignore_errors_specified = true;
+			opts_out->ignore_errors = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "delimiter") == 0)
 		{
 			if (opts_out->delim)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index a976008b3d..7e997d15c6 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -73,6 +73,11 @@
 /* Trim the list of buffers back down to this number after flushing */
 #define MAX_PARTITION_BUFFERS	32
 
+/*
+ * GUC parameters
+ */
+int		replay_buffer_size;
+
 /* Stores multi-insert data related to a single relation in CopyFrom. */
 typedef struct CopyMultiInsertBuffer
 {
@@ -100,12 +105,13 @@ typedef struct CopyMultiInsertInfo
 	int			ti_options;		/* table insert options */
 } CopyMultiInsertInfo;
 
-
 /* non-export function prototypes */
 static char *limit_printout_length(const char *str);
 
 static void ClosePipeFromProgram(CopyFromState cstate);
 
+static void safeExecConstraints(CopyFromState cstate, ResultRelInfo *resultRelInfo, TupleTableSlot *myslot, EState *estate);
+
 /*
  * error context callback for COPY FROM
  *
@@ -521,6 +527,61 @@ CopyMulti

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-24 Thread Nathan Bossart
On Wed, Aug 24, 2022 at 11:59:25AM +0700, John Naylor wrote:
> It seems "scalar" would be a bad choice since it already means
> (confusingly) operating on the least significant element of a vector.
> I'm thinking of *_has and *_has_le, matching the already existing in
> the earlier patch *_has_zero.

That seems reasonable to me.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2022-08-24 Thread Damir Belyalov
>
> > +   /* Buffer was filled, commit subtransaction and prepare
> > to replay */
> > +   ReleaseCurrentSubTransaction();
> What is actually being committed by this ReleaseCurrentSubTransaction()?
> It seems to me that just safecstate->replay_buffer is fulfilled before
> this commit.
>
All tuples are collected in replay_buffer, which is created in the
''oldcontext'' of CopyFrom() (not in the context of a subtransaction). That's
why it isn't cleaned up when you use RollbackAndReleaseCurrentSubTransaction().
Subtransactions are needed for better safety: there is no error when copying
from a file into the replay_buffer, but an error may occur at the next stage -
when adding a tuple to the table. There may also be other errors that are not
obvious at first glance.

> I feel it might be better to have it as a parameter rather than fixed at
> 1000.
>
Yes, I thought about it too, so I created another version with a GUC
parameter - replay_buffer_size - and attached it. But I think users won't
need to change replay_buffer_size. Also, replay_buffer does the same thing
as the MultiInsert buffer, and the MultiInsert buffer is defined by a
constant = 1000.

> NextCopyFrom() is in copyfromparse.c while safeNextCopyFrom() is in
> copyfrom.c.
> Since safeNextCopyFrom() is an analog of NextCopyFrom(), as commented, would
> it be natural to put them in the same file?
>
Sure, corrected it.


> > 188 +   safecstate->errors++;
> > 189 +   if (safecstate->errors <= 100)
> > 190 +   ereport(WARNING,
> > 191 +   (errcode(errdata->sqlerrcode),
> > 192 +   errmsg("%s", errdata->context)));
>
> It would be better to state in the manual that errors exceeding 100 are
> not displayed.
> Or, it might be acceptable to output all errors that exceed 100.
>
It'll be too complicated to create a new parameter just for showing the
given number of errors. I think 100 is an optimal size.


> > +typedef struct SafeCopyFromState
> > +{
> > +#defineREPLAY_BUFFER_SIZE 1000
> > +   HeapTuple   replay_buffer[REPLAY_BUFFER_SIZE]; /* accumulates
> > tuples for replaying it after an error */
> > +   int saved_tuples;   /* # of tuples in
> > replay_buffer */
> > +   int replayed_tuples;/* # of tuples was
> > replayed from buffer */
> > +   int errors; /* total # of errors */
> > +   boolreplay_is_active;
> > +   boolbegin_subtransaction;
> > +   boolprocessed_remaining_tuples; /* for case of
> > replaying last tuples */
> > +   boolskip_row;
>
> It would be helpful to add comments about skip_row, etc.
>
Corrected it.


> WARNING:  FIND 0 ERRORS
> When there were no errors, this WARNING seems not necessary.
>
Corrected it.

Added to this patch: processing of other errors and constraints, and tests
for them.
I had to create another function, safeExecConstraints(), only for processing
constraints, because ExecConstraints() comes after NextCopyFrom() and is not
inside PG_TRY. This complicated the code a little.
Maybe it is a good approach to create a new function SafeCopyFrom() and do
all the ''safe copying'' in PG_TRY, but it would almost duplicate the
CopyFrom() code.
Or maybe create a function only for the for(;;) loop, but then we have the
same problem of duplicating code and passing a lot of variables (which are
created at the beginning of CopyFrom()) to that function.
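
Roughly, the shape under discussion is the usual subtransaction + PG_TRY()
pattern. A sketch only - safeExecConstraints() is the patch's function,
everything else is elided or assumed:

BeginInternalSubTransaction(NULL);

PG_TRY();
{
    /* insert the buffered tuple; constraint checks can error out here */
    safeExecConstraints(cstate, resultRelInfo, myslot, estate);
    /* ... heap insert etc. elided ... */
    ReleaseCurrentSubTransaction();
}
PG_CATCH();
{
    ErrorData  *errdata;

    MemoryContextSwitchTo(oldcontext);
    errdata = CopyErrorData();
    FlushErrorState();

    /* count and report the error, then roll the subtransaction back */
    RollbackAndReleaseCurrentSubTransaction();
    ereport(WARNING,
            (errcode(errdata->sqlerrcode),
             errmsg("%s", errdata->context)));
}
PG_END_TRY();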


Re: O(n) tasks cause lengthy startups and checkpoints

2022-08-24 Thread Nathan Bossart
On Thu, Aug 11, 2022 at 04:09:21PM -0700, Nathan Bossart wrote:
> Here is a rebased patch set for cfbot.  There are no other differences
> between v7 and v8.

Another rebase for cfbot.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 6810355cb3d1a03326b152aebe3c907f7544be4f Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 5 Jan 2022 19:24:22 +
Subject: [PATCH v9 1/6] Introduce custodian.

The custodian process is a new auxiliary process that is intended
to help offload tasks that could otherwise delay startup and
checkpointing.  This commit simply adds the new process; it does
not yet do anything useful.
---
 src/backend/postmaster/Makefile |   1 +
 src/backend/postmaster/auxprocess.c |   8 +
 src/backend/postmaster/custodian.c  | 383 
 src/backend/postmaster/postmaster.c |  44 ++-
 src/backend/storage/ipc/ipci.c  |   3 +
 src/backend/storage/lmgr/proc.c |   1 +
 src/backend/utils/activity/wait_event.c |   3 +
 src/backend/utils/init/miscinit.c   |   3 +
 src/include/miscadmin.h |   3 +
 src/include/postmaster/custodian.h  |  32 ++
 src/include/storage/proc.h  |  11 +-
 src/include/utils/wait_event.h  |   1 +
 12 files changed, 488 insertions(+), 5 deletions(-)
 create mode 100644 src/backend/postmaster/custodian.c
 create mode 100644 src/include/postmaster/custodian.h

diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 3a794e54d6..e1e1d1123f 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -18,6 +18,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	custodian.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
index 7765d1c83d..c275271c95 100644
--- a/src/backend/postmaster/auxprocess.c
+++ b/src/backend/postmaster/auxprocess.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "postmaster/auxprocess.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/custodian.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/walreceiver.h"
@@ -74,6 +75,9 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 		case CheckpointerProcess:
 			MyBackendType = B_CHECKPOINTER;
 			break;
+		case CustodianProcess:
+			MyBackendType = B_CUSTODIAN;
+			break;
 		case WalWriterProcess:
 			MyBackendType = B_WAL_WRITER;
 			break;
@@ -153,6 +157,10 @@ AuxiliaryProcessMain(AuxProcType auxtype)
 			CheckpointerMain();
 			proc_exit(1);
 
+		case CustodianProcess:
+			CustodianMain();
+			proc_exit(1);
+
 		case WalWriterProcess:
 			WalWriterMain();
 			proc_exit(1);
diff --git a/src/backend/postmaster/custodian.c b/src/backend/postmaster/custodian.c
new file mode 100644
index 00..e90f5d0d1f
--- /dev/null
+++ b/src/backend/postmaster/custodian.c
@@ -0,0 +1,383 @@
+/*-
+ *
+ * custodian.c
+ *
+ * The custodian process handles a variety of non-critical tasks that might
+ * otherwise delay startup, checkpointing, etc.  Offloaded tasks should not
+ * be synchronous (e.g., checkpointing shouldn't wait for the custodian to
+ * complete a task before proceeding).  However, tasks can be synchronously
+ * executed when necessary (e.g., single-user mode).  The custodian is not
+ * an essential process and can shut down quickly when requested.  The
+ * custodian only wakes up to perform its tasks when its latch is set.
+ *
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *   src/backend/postmaster/custodian.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include "libpq/pqsignal.h"
+#include "pgstat.h"
+#include "postmaster/custodian.h"
+#include "postmaster/interrupt.h"
+#include "storage/bufmgr.h"
+#include "storage/condition_variable.h"
+#include "storage/fd.h"
+#include "storage/proc.h"
+#include "storage/procsignal.h"
+#include "storage/smgr.h"
+#include "utils/memutils.h"
+
+static void DoCustodianTasks(bool retry);
+static CustodianTask CustodianGetNextTask(void);
+static void CustodianEnqueueTask(CustodianTask task);
+static const struct cust_task_funcs_entry *LookupCustodianFunctions(CustodianTask task);
+
+typedef struct
+{
+	slock_t		cust_lck;
+
+	CustodianTask task_queue_elems[NUM_CUSTODIAN_TASKS];
+	int			task_queue_head;
+} CustodianShmemStruct;
+
+static CustodianShmemStruct *CustodianShmem;
+
+typedef void (*CustodianTaskFunction) (void);
+typedef void (*CustodianTaskHandleArg) (Datum arg);
+
+struct cust_task_funcs_entry
+{
+	CustodianTask task;
+	CustodianTaskFunction task_func;		/* performs task */
+	CustodianTaskHandleArg handle_arg_func;	/* handles additional info in request */
+};
+
+/*
+ * Add new tasks here.
+ *
+ * task_func is the logic that will be executed via DoCustodianTasks() when the
+ * 

Re: Stack overflow issue

2022-08-24 Thread Tom Lane
Egor Chindyaskin  writes:
> Firstly, we decided to test the regex directory functions and found 6 of them
> that lack the check_stack_depth() call.

> zaptreesubs
> markst
> next
> nfatree
> numst
> repeat

I took a closer look at these.  I think the markst, numst, and nfatree
cases are non-issues.  They are recursing on a subre tree that was just
built by parse(), so parse() must have successfully recursed the same
number of levels.  parse() surely has a larger stack frame, and it
does have a stack overflow guard (in subre()), so it would have failed
cleanly before making a data structure that could be hazardous here.
Also, having markst error out would be problematic for the reasons
discussed in its comment, so I'm disinclined to try to add checks
that have no use.

BTW, I wonder why your test didn't notice freesubre()?  But the
same analysis applies there, as does the concern that we can't
just error out.

Likewise, zaptreesubs() can't recurse more levels than cdissect() did,
and that has a stack check, so I'm not very excited about adding
another one there.

I believe that repeat() is a non-issue because (a) the number of
recursion levels in it is limited by DUPMAX, which is generally going
to be 255, or at least not enormous, and (b) it will recurse at most
once before calling dupnfa(), which contains stack checks.

I think next() is a legit issue, although your example doesn't crash
for me.  I suppose that's because my compiler turned the tail recursion
into a loop, and I suggest that we fix it by doing that manually.
(Richard's proposed fix is wrong anyway: we can't just throw elog(ERROR)
in the regex code without creating memory leaks.)
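
The conversion is mechanical; schematically (hypothetical types, not the
actual regex code):

/* Before: a self tail call, flattened only at the compiler's discretion */
static void
walk_recursive(struct node *n)
{
    process(n);
    if (n->next != NULL)
        walk_recursive(n->next);
}

/* After: an explicit loop, bounded stack regardless of optimization level */
static void
walk_iterative(struct node *n)
{
    for (;;)
    {
        process(n);
        if (n->next == NULL)
            break;
        n = n->next;
    }
}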

regards, tom lane




Re: [RFC] building postgres with meson - v11

2022-08-24 Thread Andres Freund
Hi,

On 2022-08-24 11:39:06 +0200, Peter Eisentraut wrote:
> I have looked at your branch at 0545eec895:
> 
> 258f6dc0a7 Don't hardcode tmp_check/ as test directory for tap tests
> 8ecc33cf04 Split TESTDIR into TESTLOGDIR and TESTDATADIR
> 
> I think these patches are split up a bit incorrectly.  If you apply
> the first patch by itself, then the output appears in tab_comp_dir/
> directly under the source directory.  And then the second patch moves
> it to tmp_check/tap_comp_dir/.  If there is an intent to apply these
> patches separately somehow, this should be cleaned up.

How is that happening with that version of the patch? The test puts
tap_comp_dir under TESTDIR, and TESTDIR is $(CURDIR)/tmp_check. There was an
earlier version of the patch that was split one more time that did have that
problem, but I don't quite see how that version has it?


> I haven't checked the second patch in detail yet, but it looks like
> the thought was that the first patch is about ready to go.
> 
> 834a40e609 meson: prereq: Extend gendef.pl in preparation for meson
> 
> I'm not qualified to check that in detail, but it looks reasonable
> enough to me.
> 
> See attached patch (0001) for a perlcritic fix.

Thanks.


> 97a0b096e8 meson: prereq: Add src/tools/gen_export.pl
> 
> This produces leading whitespace in the output files that at least on
> darwin wasn't there before.  See attached patch (0002).  This should
> be checked again on other platforms as well.

Hm, to me the indentation as is makes more sense, but ...

> Other than that this looks good.  Attached is a small cosmetic patch (0003).

I wonder if we should rewrite this in python - I chose perl because I thought
we could share it, but as you pointed out, that's not possible, because we
don't want to depend on perl during the autoconf build from a tarball.


> e0a8387660 solaris: Use versioning scripts instead of -Bsymbolic
> 
> This looks like a good idea.  The documentation clearly states that
> -Bsymbolic shouldn't be used, at least not in the way we have been
> doing.  Might as well go ahead with this and give it a whirl on the
> build farm.

Cool. I looked at this because I was confused about getting warnings with
autoconf that I wasn't getting with meson.


> 0545eec895 meson: Add docs
> 
> We should think more about how to arrange the documentation.  We
> probably don't want to copy-and-paste all the introductory and
> requirements information.  I think we can make this initially much
> briefer, like the Windows installation chapter.  For example, instead
> of documenting each setup option again, just mention which ones exist
> and then point (link) to the configure chapter for details.

The current docs, including the windows ones, are already hard to follow. I
think we should take some care to not make the meson bits even more
confusing. Cross referencing left and right seems problematic from that angle.


> I spent a bit of time with the test suites.  I think there is a
> problem in that selecting a test suite directly, like
> 
> meson test -C _build --suite recovery
> 
> doesn't update the tmp_install.  So if this is the first thing you run
> after a build, everything will fail.  Also, if you run this later, the
> tmp_install doesn't get updated, so you're not testing up-to-date
> code.

At the moment creation of the tmp_install is its own test suite. I don't know
if that's the best way, or what the best way is, but that explains that
fact. You can do the above without the issue by specifying
--suite setup --suite recovery.


Greetings,

Andres Freund




Re: Extending outfuncs support to utility statements

2022-08-24 Thread Peter Eisentraut

On 13.07.22 00:38, Tom Lane wrote:

Peter Eisentraut  writes:

This is also needed to be able to store utility statements in (unquoted)
SQL function bodies.  I have some in-progress code for that that I need
to dust off.  IIRC, there are still some nontrivial issues to work
through on the reading side.  I don't have a problem with enabling the
outfuncs side in the meantime.


BTW, I experimented with trying to enable WRITE_READ_PARSE_PLAN_TREES
for utility statements, and found that the immediate problem is that
Constraint and a couple of other node types lack read functions
(they're the ones marked "custom_read_write, no_read" in parsenodes.h).
They have out functions, so writing the inverses seems like it's just
something nobody ever got around to.  Perhaps there are deeper problems
lurking behind that one, though.


Here are patches for that.

v1-0001-Fix-reading-of-most-negative-integer-value-nodes.patch
v1-0002-Fix-reading-of-BitString-nodes.patch

These are some of those lurking problems.

v1-0003-Add-read-support-for-some-missing-raw-parse-nodes.patch

This adds the read support for the missing nodes.

The above patches are candidates for committing.

At this point we have one structural problem left: char * node fields 
output with WRITE_STRING_FIELD() (ultimately outToken()) don't 
distinguish between empty strings and NULL values.  A write/read 
roundtrip ends up as NULL for an empty string.  This shows up in the 
regression tests for commands such as


CREATE TABLESPACE regress_tblspace LOCATION '';
CREATE SUBSCRIPTION regress_addr_sub CONNECTION '' ...

Handling this will need some expansion of the output format.

v1-0004-XXX-Turn-on-WRITE_READ_PARSE_PLAN_TREES-for-testi.patch
v1-0005-Implement-WRITE_READ_PARSE_PLAN_TREES-for-raw-par.patch
v1-0006-Enable-WRITE_READ_PARSE_PLAN_TREES-of-rewritten-u.patch

This is for testing the above.  Note that in 0005 we need some special 
handling for float values to preserve the full precision across 
write/read.  I suppose this could be unified with the code that preserves 
the location fields when doing write/read checking.


v1-0007-Enable-utility-statements-in-unquoted-SQL-functio.patch

This demonstrates what the ultimate goal is.  A few more tests should be 
added eventually.

From 89def573ced57fc320ad191613dac62c8992c27a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 12 Aug 2022 21:15:40 +0200
Subject: [PATCH v1 1/7] Fix reading of most-negative integer value nodes

The main parser checks whether a literal fits into an int when
deciding whether it should be put into an Integer or Float node.  The
parser processes integer literals without signs.  So a most-negative
integer literal will not fit into Integer and will end up as a Float
node.

The node tokenizer did this differently.  It included the sign when
checking whether the literal fit into int.  So a most-negative integer
would indeed fit that way and end up as an Integer node.

In order to preserve the node structure correctly, we need the node
tokenizer to also analyze integer literals without sign.

There are a number of test cases in the regression tests that have a
most-negative integer argument of some utility statement, so this
issue is easily reproduced under WRITE_READ_PARSE_PLAN_TREES.
---
 src/backend/nodes/read.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/nodes/read.c b/src/backend/nodes/read.c
index 4a54996b63..a9cb81b129 100644
--- a/src/backend/nodes/read.c
+++ b/src/backend/nodes/read.c
@@ -267,7 +267,7 @@ nodeTokenType(const char *token, int length)
char   *endptr;
 
errno = 0;
-   (void) strtoint(token, &endptr, 10);
+   (void) strtoint(numptr, &endptr, 10);
if (endptr != token + length || errno == ERANGE)
return T_Float;
return T_Integer;
-- 
2.37.1
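
(A standalone illustration of the asymmetry the commit message above
describes; assumes 64-bit long and is purely illustrative:)

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    /* the main parser lexes the literal without its sign ... */
    long        unsigned_form = strtol("2147483648", NULL, 10);

    /* ... while the node tokenizer used to keep the sign */
    long        signed_form = strtol("-2147483648", NULL, 10);

    /*
     * Prints "0 1": only the signed form fits in a 32-bit int, so the two
     * sides used to disagree about Integer vs. Float for INT_MIN.
     */
    printf("%d %d\n",
           unsigned_form <= INT_MAX,
           signed_form >= INT_MIN);
    return 0;
}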

From 8a469d0a7195be4ecc65155b82c5836dfb13b920 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 12 Aug 2022 21:16:26 +0200
Subject: [PATCH v1 2/7] Fix reading of BitString nodes

The node tokenizer went out of its way to store BitString node values
without the leading 'b'.  But everything else in the system stores the
leading 'b'.  This would break if a BitString node is
read-printed-read.

Also, the node tokenizer didn't know that BitString node tokens could
also start with 'x'.
---
 src/backend/nodes/read.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/src/backend/nodes/read.c b/src/backend/nodes/read.c
index a9cb81b129..fe84f140ee 100644
--- a/src/backend/nodes/read.c
+++ b/src/backend/nodes/read.c
@@ -288,7 +288,7 @@ nodeTokenType(const char *token, int length)
retval = T_Boolean;
else if (*token == '"' && length > 1 && token[length - 1] == '"')
retval = T_String;
-   else if (*token == 'b')
+   else if (*token == 'b' || *token == 'x')
retval = T_BitString;
  

Re: ecpg assertion on windows

2022-08-24 Thread Andres Freund
Hi,

On 2022-08-24 00:32:53 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2022-08-24 00:18:27 -0400, Tom Lane wrote:
> >> But if the regression tests are triggering use of uninitialized values, how
> >> could we have failed to detect that?  Either valgrind or unstable behavior
> >> should have found this ages ago.
> 
> > I think it's just different criteria for when to report issues. Valgrind
> > reports uninitialized memory only when there's a conditional branch 
> > depending
> > on it or such. Whereas this seems to trigger when passing an uninitialized
> > value to a function by value, even if it's then not relied upon.
> 
> If the value is not actually relied on, then it's a false positive.

My understanding is that, formally speaking, passing an undefined value by
value to a function is "relying on it" and is undefined behaviour. Hard to
believe it'll cause any compiler to go haywire and eat the computer, but ...
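
I.e. roughly this shape (illustrative):

static int
sink(int arg)
{
    (void) arg;                 /* the argument is never examined */
    return 0;
}

static void
demo(void)
{
    int         u;              /* deliberately left uninitialized */

    (void) sink(u);             /* Valgrind stays quiet, since no branch
                                 * depends on the value; the check discussed
                                 * here fires on the by-value pass itself */
}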


> I don't say we shouldn't fix it, because we routinely jump through
> hoops to silence various sorts of functionally-harmless warnings.
> But let's be clear about whether there's a real bug here.

Yea.

Greetings,

Andres Freund




Re: Tracking last scan time

2022-08-24 Thread Dave Page
On Wed, 24 Aug 2022 at 16:03, Bruce Momjian  wrote:

> On Wed, Aug 24, 2022 at 04:01:21PM +0100, Dave Page wrote:
> > On Wed, 24 Aug 2022 at 15:18, Bruce Momjian  wrote:
> >
> > On Tue, Aug 23, 2022 at 10:55:09AM +0100, Dave Page wrote:
> > > Often it is beneficial to review one's schema with a view to
> removing
> > indexes
> > > (and sometimes tables) that are no longer required. It's very
> difficult
> > to
> > > understand when that is the case by looking at the number of scans
> of a
> > > relation as, for example, an index may be used infrequently but
> may be
> > critical
> > > in those times when it is used.
> > >
> > > The attached patch against HEAD adds optional tracking of the last
> scan
> > time
> > > for relations. It updates pg_stat_*_tables with new last_seq_scan
> and
> > > last_idx_scan columns, and pg_stat_*_indexes with a last_idx_scan
> column
> > to
> > > help with this.
> >
> > Would it be simpler to allow the sequential and index scan columns
> to be
> > cleared so you can look later to see if it is non-zero?  Should we
> allow
> >
> > I don't think so, because then stat values wouldn't necessarily
> correlate with
> > each other, and you wouldn't know when any of them were last reset
> unless we
> > started tracking each individual reset. At least now you can see when
> they were
> > all reset, and you know they were reset at the same time.
>
> Yeah, true.  I was more asking if these two columns are in some way
> special or if people would want a more general solution, and if so, is
> that something we want in core Postgres.
>

They're special in the sense that they're the ones you're most likely going
to look at to see how much a relation is used, I think (at least, I'd look
at them rather than the tuple counts).

There are certainly other things for which a last-usage value may be
useful - functions/procedures, for example, or views. The benefits of
removing unused objects of those types are far, far lower than for indexes
or tables, of course.

There are other potential use cases for similar timestamps, such as object
creation times (and the creating user), but those are more useful for
auditing than for monitoring and optimisation.

-- 
Dave Page
Blog: https://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: https://www.enterprisedb.com


Re: Inconsistencies around defining FRONTEND

2022-08-24 Thread Andres Freund
Hi,

On 2022-08-24 10:40:01 -0400, Robert Haas wrote:
> pg_rewind and pg_waldump seem to need the xlogreader code moved to
> src/common, as Andres proposes. I'm not volunteering to tackle that
> right now but I think it might be a good thing to do sometime.

The easier way would be to just keep their current method of building, but do
it as part of src/common.

Greetings,

Andres Freund




Re: Tracking last scan time

2022-08-24 Thread Bruce Momjian
On Wed, Aug 24, 2022 at 04:01:21PM +0100, Dave Page wrote:
> On Wed, 24 Aug 2022 at 15:18, Bruce Momjian  wrote:
> 
> On Tue, Aug 23, 2022 at 10:55:09AM +0100, Dave Page wrote:
> > Often it is beneficial to review one's schema with a view to removing
> indexes
> > (and sometimes tables) that are no longer required. It's very difficult
> to
> > understand when that is the case by looking at the number of scans of a
> > relation as, for example, an index may be used infrequently but may be
> critical
> > in those times when it is used.
> >
> > The attached patch against HEAD adds optional tracking of the last scan
> time
> > for relations. It updates pg_stat_*_tables with new last_seq_scan and
> > last_idx_scan columns, and pg_stat_*_indexes with a last_idx_scan column
> to
> > help with this.
> 
> Would it be simpler to allow the sequential and index scan columns to be
> cleared so you can look later to see if it is non-zero?  Should we allow
> 
> I don't think so, because then stat values wouldn't necessarily correlate with
> each other, and you wouldn't know when any of them were last reset unless we
> started tracking each individual reset. At least now you can see when they 
> were
> all reset, and you know they were reset at the same time.

Yeah, true.  I was more asking if these two columns are in some way
special or if people would want a more general solution, and if so, is
that something we want in core Postgres.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: Tracking last scan time

2022-08-24 Thread Dave Page
On Wed, 24 Aug 2022 at 15:18, Bruce Momjian  wrote:

> On Tue, Aug 23, 2022 at 10:55:09AM +0100, Dave Page wrote:
> > Often it is beneficial to review one's schema with a view to removing
> indexes
> > (and sometimes tables) that are no longer required. It's very difficult
> to
> > understand when that is the case by looking at the number of scans of a
> > relation as, for example, an index may be used infrequently but may be
> critical
> > in those times when it is used.
> >
> > The attached patch against HEAD adds optional tracking of the last scan
> time
> > for relations. It updates pg_stat_*_tables with new last_seq_scan and
> > last_idx_scan columns, and pg_stat_*_indexes with a last_idx_scan column
> to
> > help with this.
>
> Would it be simpler to allow the sequential and index scan columns to be
> cleared so you can look later to see if it is non-zero?  Should we allow
> arbitrary clearing of stat columns?
>

I don't think so, because then stat values wouldn't necessarily correlate
with each other, and you wouldn't know when any of them were last reset
unless we started tracking each individual reset. At least now you can see
when they were all reset, and you know they were reset at the same time.

-- 
Dave Page
Blog: https://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: https://www.enterprisedb.com


Re: Inconsistencies around defining FRONTEND

2022-08-24 Thread Robert Haas
On Tue, Aug 23, 2022 at 9:55 PM Andres Freund  wrote:
> We could, if we make xlogreader.c and the rmgrdesc routines built as part of
> src/common. I don't really see how otherwise.

After a little bit of study, I agree.

It looks to me like -DFRONTEND can be removed from
src/fe_utils/Makefile and probably also src/common/unicode/Makefile
without changing anything else, because the C files in those
directories seem to be frontend-only and they already include
"postgres_fe.h". I think we should go ahead and do that, and also
apply the patch I posted yesterday with whatever bikeshedding seems
appropriate.

It doesn't really seem like we have a plausible alternative to the
current system for src/common or src/port.
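
(For context: those directories are compiled twice, and the sources rely on
the usual pattern sketched below, so the FRONTEND define has to come from
the build system.)

#ifdef FRONTEND
#include "postgres_fe.h"
#else
#include "postgres.h"
#endif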

pg_rewind and pg_waldump seem to need the xlogreader code moved to
src/common, as Andres proposes. I'm not volunteering to tackle that
right now but I think it might be a good thing to do sometime.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

2022-08-24 Thread Alvaro Herrera
On 2022-Aug-19, David Zhang wrote:

> Should these obviously possible replacement of the standard library function
> "memset" be considered as well? For example, something like the attached one
> which is focusing on the pageinspect extension only.

If you do this, you're creating a potential backpatching hazard.  This
is OK if we get something in return, so a question to ask is whether
there is any benefit in doing it.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Most hackers will be perfectly comfortable conceptualizing users as entropy
 sources, so let's move on."   (Nathaniel Smith)




Re: Strip -mmacosx-version-min options from plperl build

2022-08-24 Thread Andrew Dunstan


On 2022-08-24 We 09:30, Tom Lane wrote:
> Peter Eisentraut  writes:
>> This patch has failed on Cygwin lorikeet:
>> CREATE EXTENSION plperl;
>> +ERROR:  incompatible library 
>> "/home/andrew/bf/root/HEAD/inst/lib/postgresql/plperl.dll": missing 
>> magic block
> Presumably this is caused by not having
>
>> -Wl,--export-all-symbols
> which is something we ought to be injecting for ourselves if we
> aren't doing anything to export the magic-block constant explicitly.
> But I too am confused why we haven't seen this elsewhere.
>
>   


Me too. I note that we have -Wl,--out-implib=libplperl.a but we don't
appear to do anything with libplperl.a.
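
(For reference, the magic block in question is the one every loadable module
emits via the fmgr.h macro; if the linker doesn't export the symbol,
CREATE EXTENSION fails exactly as above.)

#include "fmgr.h"

PG_MODULE_MAGIC;                /* expands to Pg_magic_func(), which the
                                 * backend looks up when loading the library */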


cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: replacing role-level NOINHERIT with a grant-level option

2022-08-24 Thread tushar

On 8/24/22 12:28 AM, Robert Haas wrote:

This patch needed to be rebased pretty extensively after commit
ce6b672e4455820a0348214be0da1a024c3f619f. Here is a new version.

Thanks, Robert, I have retested this patch with my previous scenarios
and things look good to me.

--
regards, tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





RE: Perform streaming logical transactions by background workers and parallel apply

2022-08-24 Thread houzj.f...@fujitsu.com
On Thu, Aug 18, 2022 11:44 AM Peter Smith  wrote:
> Here are my review comments for patch v21-0001:
> 
> Note - There are some "general" comments which will result in lots of 
> smaller changes. The subsequent "detailed" review comments have some 
> overlap with these general comments but I expect some will be missed 
> so please search/replace to fix all code related to those general 
> comments.

Thanks for your comments.

> 1. GENERAL - main_worker_pid and replorigin_session_setup
> 
> Quite a few of my subsequent review comments below are related to the 
> somewhat tricky (IMO) change to the code for this area. Here is a 
> summary of some things that can be done to clean/simplify this logic.
> 
> 1a.
> Make the existing replorigin_session_setup function just be a wrapper 
> that delegates to the other function passing the acquired_by as 0.
> This is because in every case but one (in the apply bg worker main) we 
> are always passing 0, and IMO there is no need to spread the messy 
> extra param to places that do not use it.

Not sure about this. I feel an interface change should
be fine in a major release.
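
(For reference, the wrapper shape being suggested would be roughly this --
a sketch; the _ext name and signature here are made up:)

void
replorigin_session_setup(RepOriginId node)
{
    replorigin_session_setup_ext(node, 0);  /* acquired_by = 0, as every
                                             * existing caller passes */
}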

> 17. src/backend/replication/logical/applybgworker.c - 
> LogicalApplyBgworkerMain
> 
> + MyLogicalRepWorker->last_send_time = MyLogicalRepWorker-
> >last_recv_time =
> + MyLogicalRepWorker->reply_time = 0;
> +
> + InitializeApplyWorker();
> 
> Lots of things happen within InitializeApplyWorker(). I think this 
> call deserves at least some comment to say it does lots of common 
> initialization. And same for the other caller or this in the apply 
> main worker.

I feel we can refer to the comments above/in the function InitializeApplyWorker.

> 19.
> + toc = shm_toc_create(PG_LOGICAL_APPLY_SHM_MAGIC,
> dsm_segment_address(seg),
> + segsize);

Since toc is just the same as the input address, which I think should not be NULL,
I think it's fine to skip the check here, like we did in other code.

shm_toc_create(uint64 magic, void *address, Size nbytes)
{
shm_toc    *toc = (shm_toc *) address;

> 20. src/backend/replication/logical/applybgworker.c - 
> apply_bgworker_setup
> 
> I think this function could be refactored to be cleaner and share more 
> common logic.
> 
> SUGGESTION
> 
> /* Setup shared memory, and attempt launch. */ if 
> (apply_bgworker_setup_dsm(wstate))
> {
> bool launched;
> launched = logicalrep_worker_launch(MyLogicalRepWorker->dbid,
> MySubscription->oid,
> MySubscription->name,
> MyLogicalRepWorker->userid,
> InvalidOid,
> dsm_segment_handle(wstate->dsm_seg));
> if (launched)
> {
> ApplyBgworkersList = lappend(ApplyBgworkersList, wstate); 
> MemoryContextSwitchTo(oldcontext);
> return wstate;
> }
> else
> {
> dsm_detach(wstate->dsm_seg);
> wstate->dsm_seg = NULL;
> }
> }
> 
> pfree(wstate);
> MemoryContextSwitchTo(oldcontext);
> return NULL;

Not sure about this.

> 36. src/backend/replication/logical/tablesync.c - 
> process_syncing_tables
> 
> @@ -589,6 +590,9 @@ process_syncing_tables_for_apply(XLogRecPtr
> current_lsn)
>  void
>  process_syncing_tables(XLogRecPtr current_lsn)  {
> + if (am_apply_bgworker())
> + return;
> +
> 
> Perhaps there should be a comment to describe why process_syncing_tables
> should be skipped for the apply background worker?

I might refactor this function soon, so I didn't change it for now.
But I will consider it.

> 39. src/backend/replication/logical/worker.c - 
> handle_streamed_transaction
> 
> + /* Not in streaming mode and not in apply background worker. */ if 
> + (!(in_streamed_transaction || am_apply_bgworker()))
>   return false;
> IMO if you wanted to write the comment in that way then the code 
> should have matched it more closely like:
> if (!in_streamed_transaction && !am_apply_bgworker())
> 
> OTOH, if you want to keep the code as-is then the comment should be 
> worded slightly differently.

I feel both the in_streamed_transaction flag and being in a bgworker indicate
that we are in streaming mode, so it seems the original /* Not in streaming
mode */ comment should be fine.

> 44. src/backend/replication/logical/worker.c - InitializeApplyWorker
> 
> 
> +/*
> + * Initialize the database connection, in-memory subscription and 
> +necessary
> + * config options.
> + */
>  void
> -ApplyWorkerMain(Datum main_arg)
> 44b.
> Should there be some more explanation in this comment to say that this 
> is common code for both the appl main workers and apply background 
> workers?
> 
> 44c.
> Following on from #44b, consider renaming this to something like
> CommonApplyWorkerInit() to emphasize it is called from multiple 
> places?

Not sure about this. If we change the bgworker name to parallel
apply worker in the future, it might be worth emphasizing this, so
I will consider it.

> 52.
> 
> +/* Apply background worker setup and interactions */ extern 
> +ApplyBgworkerInfo *apply_bgworker_start(TransactionId xid); extern 
> +ApplyBgworkerInfo *apply_bgworker_find(TransactionId xid); extern 
> +void apply_bgworker_wait_for(ApplyBgworkerInfo *wstate,  
> +ApplyBgwo

Re: Tracking last scan time

2022-08-24 Thread Bruce Momjian
On Tue, Aug 23, 2022 at 10:55:09AM +0100, Dave Page wrote:
> Often it is beneficial to review one's schema with a view to removing indexes
> (and sometimes tables) that are no longer required. It's very difficult to
> understand when that is the case by looking at the number of scans of a
> relation as, for example, an index may be used infrequently but may be 
> critical
> in those times when it is used.
> 
> The attached patch against HEAD adds optional tracking of the last scan time
> for relations. It updates pg_stat_*_tables with new last_seq_scan and
> last_idx_scan columns, and pg_stat_*_indexes with a last_idx_scan column to
> help with this.

Would it be simpler to allow the sequential and index scan columns to be
cleared so you can look later to see if it is non-zero?  Should we allow
arbitrary clearing of stat columns?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: Stack overflow issue

2022-08-24 Thread Tom Lane
Richard Guo  writes:
> Attached adds the checks in these places. But I'm not sure about the
> snowball case. Can we edit src/backend/snowball/libstemmer/*.c directly?

No, that file is generated code, as it says right at the top.

I think most likely we should report this to Snowball upstream
and see what they think is an appropriate fix.

regards, tom lane




Re: shadow variables - pg15 edition

2022-08-24 Thread Justin Pryzby
On Wed, Aug 24, 2022 at 10:47:31PM +1200, David Rowley wrote:
> I was hoping we'd already caught all of the #1s in 421892a19, but I
> caught a few of those in some of your other patches. One you'd done
> another way and some you'd done the rescope but just put it in the
> wrong patch.  The others had not been done yet. I just pushed
> f959bf9a5 to fix those ones.

This fixed pg_get_statisticsobj_worker() but not pg_get_indexdef_worker() nor
pg_get_partkeydef_worker().

(Also, I'd mentioned that my fixes for those deliberately re-used the
outer-scope vars, which isn't what you did, and it's why I didn't include them
with the patch for inner-scope).

> I really think #2s should be done last. I'm not as comfortable with
> the renaming and we might want to discuss tactics on that. We could
> either opt to rename the shadowed or shadowing variable, or both.  If
> we rename the shadowing variable, then pending patches or forward
> patches could use the wrong variable.  If we rename the shadowed
> variable then it's not impossible that backpatching could go wrong
> where the new code intends to reference the outer variable using the
> newly named variable, but when that's backpatched it uses the variable
> with the same name in the inner scope.  Renaming both would make the
> problem more obvious.  I'm not sure which is best.  The answer may
> depend on how many lines the variable is in scope for. If it's just
> for a few lines then the hunk context would conflict and the committer
> would likely notice the issue when resolving the conflict.

Yes, the hope is to limit the change to variables that are only used a couple
of times within a few lines.  It's also possible that these will break patches
in development, but that's normal for any change at all.

> I'll study #7 a bit more. My eyes glazed over a bit from doing all
> that analysis, so I might be mistaken about that being a bug.

I reported this last week.
https://www.postgresql.org/message-id/20220819211824.gx26...@telsasoft.com

-- 
Justin




Re: Stack overflow issue

2022-08-24 Thread Tom Lane
Egor Chindyaskin  writes:
> Therefore, Alexander Lakhin and I decided to deal with this issue, and
> Alexander developed a methodology. We processed src/backend/*/*.c with "clang
> -emit-llvm  ... | opt -analyze -print-callgraph" to find all the functions
> that call themselves directly. I checked each of them for features that
> protect against stack overflows.
> We analyzed 4 directories: regex, tsearch, snowball and adt.
> Firstly, we decided to test the regex directory functions and found 6 of them
> that lack the check_stack_depth() call.

Nice work!  I wonder if you can make the regex crashes reachable by
reducing the value of max_stack_depth enough that it's hit before
reaching the "regular expression is too complex" limit.

regards, tom lane




Re: Handle infinite recursion in logical replication setup

2022-08-24 Thread vignesh C
On Mon, Aug 22, 2022 at 9:19 AM houzj.f...@fujitsu.com
 wrote:
>
> On Thursday, August 18, 2022 11:13 AM Amit Kapila  
> wrote:
> >
> > On Wed, Aug 17, 2022 at 12:34 PM Peter Smith 
> > wrote:
> > >
> > > On Wed, Aug 17, 2022 at 4:33 PM Amit Kapila 
> > wrote:
> > > >
> > > > On Wed, Aug 17, 2022 at 8:48 AM houzj.f...@fujitsu.com
> > > >  wrote:
> > > > >
> > > > > On Tuesday, August 2, 2022 8:00 PM Amit Kapila
> >  wrote:
> > > > > > On Tue, Jul 26, 2022 at 9:07 AM Amit Kapila
> >  wrote:
> > > > >
> > > > > Thanks for the summary.
> > > > >
> > > > > I think it's fine to make the user use the copy_data option more
> > > > > carefully to prevent duplicate copies by reporting an ERROR.
> > > > >
> > > > > But I also have similar concern with Sawada-san as it's possible
> > > > > for user to receive an ERROR in some unexpected cases.
> > > > >
> > > > > For example I want to build bi-directional setup between two nodes:
> > > > >
> > > > > Node A: TABLE test (has actual data) Node B: TABLE test (empty)
> > > > >
> > > > > Step 1:
> > > > > CREATE PUBLICATION on both Node A and B.
> > > > >
> > > > > Step 2:
> > > > > CREATE SUBSCRIPTION on Node A with (copy_data = on)
> > > > > -- this is fine as there is no data on Node B
> > > > >
> > > > > Step 3:
> > > > > CREATE SUBSCRIPTION on Node B with (copy_data = on)
> > > > > -- this should be fine as user needs to copy data from Node A to
> > > > > Node B,
> > > > > -- but we still report an error for this case.
> > > > >
> > > > > It looks a bit strict to report an ERROR in this case and it seems
> > > > > not easy to avoid this. So, personally, I think it might be better
> > > > > to document the correct steps to build the bi-directional
> > > > > replication and probably also document the steps to recover if the
> > > > > user accidentally did a duplicate initial copy, if not documented yet.
> > > > >
> > > > > In addition, we could also LOG some additional information about
> > > > > the ORIGIN and the initial copy, which might help the user to
> > > > > analyze if needed.
> > > > >
> > > >
> > > > But why LOG instead of WARNING? I feel in this case there is a
> > > > chance of inconsistent data so a WARNING like "publication "pub1"
> > > > could have data from multiple origins" can be given when the user
> > > > has specified
> > > > options: "copy_data = on, origin = NONE" while creating a
> > > > subscription. We give a WARNING during subscription creation when
> > > > the corresponding publication doesn't exist, eg.
> > > >
> > > > postgres=# create subscription sub1 connection 'dbname = postgres'
> > > > publication pub1;
> > > > WARNING:  publication "pub1" does not exist in the publisher
> > > >
> > > > Then, we can explain in docs how users can avoid data
> > > > inconsistencies while setting up replication.
> > > >
> > >
> > > I was wondering if this copy/origin case really should be a NOTICE.
> > >
> >
> > We usually give NOTICE for some sort of additional implicit information,
> > e.g., when we create a slot during the CREATE SUBSCRIPTION command:
> > "NOTICE: created replication slot "sub1" on publisher". IMO, this is
> > likely to be a problem of data inconsistency, so I think here we can
> > choose between WARNING and LOG. I prefer WARNING but am okay with LOG as
> > well if others feel so. I think we can change this later as well if
> > required. We do have the option to not do anything and just document it,
> > but I feel it is better to give the user some indication of the problem
> > here because not everyone reads each update of the documentation.
> >
> > Jonathan, Sawada-San, Hou-San, and others, what do you think is the best way
> > to move forward here?
>
> I think it's fine to throw a WARNING in this case given that there is a
> chance of inconsistent data.

Since there were no objections to changing it to throw a warning, I have
made the changes for the same.

I have made one change for the documentation patch.
The current steps state that (a minimal SQL sketch of steps 5 and 6 follows the list):
1. Create a publication on primary3.
2. Lock table t1 on primary2 and primary3 in EXCLUSIVE mode until the
setup is completed. There is no need to lock table t1 on primary1
because any data changes made will be synchronized while creating the
subscription with copy_data = true.
3. Create a subscription on primary1 to subscribe to primary3.
4. Create a subscription on primary2 to subscribe to primary3.
5. Create a subscription on primary3 to subscribe to primary1. Use
copy_data = true so that the existing table data is copied during
initial sync.
6. Create a subscription on primary3 to subscribe to primary2. Use
copy_data = false because the initial table data would have been
already copied in the previous step.
7. Now the replication setup between primaries primary1, primary2 and
primary3 is complete. Incremental changes made on any primary will be
replicated to the other two primaries.
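
As mentioned above, a minimal SQL sketch of steps 5 and 6 (the publication
and subscription names and connection strings here are illustrative):

    -- On primary3, step 5:
    CREATE SUBSCRIPTION sub3_p1 CONNECTION 'host=primary1 dbname=postgres'
        PUBLICATION pub_p1 WITH (copy_data = true, origin = none);
    -- On primary3, step 6:
    CREATE SUBSCRIPTION sub3_p2 CONNECTION 'host=primary2 dbname=postgres'
        PUBLICATION pub_p2 WITH (copy_data = false, origin = none);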

We found a problem with the proposed steps. Here the problem is that
we lock table t1 on primary2/primary3 in step2, then we create 

RE: Perform streaming logical transactions by background workers and parallel apply

2022-08-24 Thread houzj.f...@fujitsu.com
On Mon, Aug 22, 2022 20:50 PM Kuroda, Hayato/黒田 隼人  
wrote:
> Dear Wang,
> 
> Thank you for updating the patch! Followings are comments about 
> v23-0001 and v23-0005.

Thanks for your comments.

> v23-0001
> 
> 01. logical-replication.sgml
> 
> +  
> +   When the streaming mode is parallel, the finish LSN of
> +   failed transactions may not be logged. In that case, it may be necessary to
> +   change the streaming mode to on and cause the same
> +   conflicts again so the finish LSN of the failed transaction will be written
> +   to the server log. For the usage of finish LSN, please refer to <link
> +   linkend="sql-altersubscription">ALTER SUBSCRIPTION ...
> +   SKIP</link>.
> +  
> 
> I was not sure about streaming='off' mode. Is there any reason that 
> only the ON mode is focused on?

Added off.

> 02. protocol.sgml
> 
> +  
> +   Int64 (XLogRecPtr)
> +   
> +
> + The LSN of the abort. This field is available since protocol version
> + 4.
> +
> +   
> +  
> +
> +  
> +   Int64 (TimestampTz)
> +   
> +
> + Abort timestamp of the transaction. The value is in number
> + of microseconds since PostgreSQL epoch (2000-01-01). This field is
> + available since protocol version 4.
> +
> +   
> +  
> +
> 
> It seems that changes are in the variablelist for stream commit.
> I think these are included in the stream abort message, so they should be moved.

Fixed.

> 03. decode.c
> 
> -   ReorderBufferForget(ctx->reorder, 
> parsed->subxacts[i], buf-
> >origptr);
> +   ReorderBufferForget(ctx->reorder, 
> + parsed->subxacts[i], buf-
> >origptr,
> +   
> + commit_time);
> }
> -   ReorderBufferForget(ctx->reorder, xid, buf->origptr);
> +   ReorderBufferForget(ctx->reorder, xid, buf->origptr, 
> + commit_time);
> 
> 'commit_time' has been passed as the argument 'abort_time'; I think it may 
> be confusing.
> How about adding a comment above, like:
> "In case of streamed transactions, they are regarded as being aborted 
> at commit_time"

IIRC, I feel the comment above the loop might be clearer about this,
but I will think about it again. 

> 04. launcher.c
> 
> 04.a
> 
> +   worker->main_worker_pid = is_subworker ? MyProcPid : 0;
> 
> You can use InvalidPid instead of 0.
> (I thought pid should be represented by the datatype pid_t, but in 
> some codes it is defined as int...)
> 
> 04.b
> 
> +   worker->main_worker_pid = 0;
> 
> You can use InvalidPid instead of 0, same as above.

Improved

> 05. origin.c
> 
>  void
> -replorigin_session_setup(RepOriginId node)
> +replorigin_session_setup(RepOriginId node, int acquired_by)
> 
> IIUC the same slot can be used only when the apply main worker has 
> already acquired the slot and the subworker for the same subscription 
> tries to acquire it, but that cannot be understood from the comments.
> How about adding comments, or an assertion that acquired_by is same as 
> session_replication_state->acquired_by ?
> Moreover acquired_by should be compared with InvalidPid, based on 
> above comments.

I think we already check inside this function that the passed 'acquired_by'
and the slot's acquired_by are equal.

I am not sure if it's a good idea to use InvalidPid here, as we set
session_replication_state->acquired_by (an int) to 0 (instead of -1) to
indicate that no worker has acquired it.

> 06. proto.c
> 
>  void
>  logicalrep_write_stream_abort(StringInfo out, TransactionId xid,
> - TransactionId 
> subxid)
> + ReorderBufferTXN 
> *txn, XLogRecPtr abort_lsn,
> + bool 
> + write_abort_lsn
> 
> I think write_abort_lsn may not be needed, because abort_lsn can be 
> used for controlling whether abort_XXX fields should be filled or not.

I think if the subscriber's version is lower than 16 (which won't handle the
abort_XXX fields), then we don't need to send the abort_XXX fields either.

> 07. worker.c
> 
> +/*
> + * The number of changes during one streaming block (only for apply
> background
> + * workers)
> + */
> +static uint32 nchanges = 0;
> 
> This variable is used only by the main apply worker, so the comment 
> seems incorrect.
> How about "...(only for SUBSTREAM_PARALLEL case)"?

The previous comment seemed a bit confusing. I tried to improve it to this:
```
The number of changes sent to apply background workers during one streaming 
block.
```

> v23-0005
> 
> 08. monitoring.sgml
> 
> I cannot decide which option proposed in [1] is better, but the following 
> descriptions are needed in both cases.
> (In [2] I had intended to propose something like option 2)
> 
> 08.a
> 
> You can add a description that the field 'relid' will be NULL even for 
> an apply background worker.
> 
> 08.b
> 
>

Re: Strip -mmacosx-version-min options from plperl build

2022-08-24 Thread Tom Lane
Peter Eisentraut  writes:
> This patch has failed on Cygwin lorikeet:
> CREATE EXTENSION plperl;
> +ERROR:  incompatible library 
> "/home/andrew/bf/root/HEAD/inst/lib/postgresql/plperl.dll": missing 
> magic block

Presumably this is caused by not having

> -Wl,--export-all-symbols

which is something we ought to be injecting for ourselves if we
aren't doing anything to export the magic-block constant explicitly.
But I too am confused why we haven't seen this elsewhere.

regards, tom lane




Re: Tracking last scan time

2022-08-24 Thread Dave Page
Hi

On Tue, 23 Aug 2022 at 13:07, Greg Stark  wrote:

> On Tue, 23 Aug 2022 at 11:00, Dave Page  wrote:
> >
> > Often it is beneficial to review one's schema with a view to removing
> indexes (and sometimes tables) that are no longer required. It's very
> difficult to understand when that is the case by looking at the number of
> scans of a relation as, for example, an index may be used infrequently but
> may be critical in those times when it is used.
>
> I think this is easy to answer in a prometheus/datadog/etc world since
> you can consult the history of the count to see when it was last
> incremented. (Or do effectively that continuously.)
>

Yes. But not every PostgreSQL instance is monitored in that way.


>
> I guess that just reinforces the idea that it should be optional.
> Perhaps there's room for some sort of general feature for controlling
> various time series aggregates like max(), min(), sum() or, uhm,
> timeoflastchange() on whatever stats you want. That would let us
> remove a bunch of stuff from pg_stat_statements and let users turn on
> just the ones they want. And also let users enable things like time of
> last rollback or conflict etc. But that's just something to think
> about down the road.
>

It's certainly an interesting idea.


>
> > The attached patch against HEAD adds optional tracking of the last scan
> time for relations. It updates pg_stat_*_tables with new last_seq_scan and
> last_idx_scan columns, and pg_stat_*_indexes with a last_idx_scan column to
> help with this.
> >
> > Due to the use of gettimeofday(), those values are only maintained if a
> new GUC, track_scans, is set to on. By default, it is off.
>
> Bikeshedding warning -- "track_scans" could equally apply to almost
> any stats about scans. I think the really relevant thing here is the
> times, not the scans. I think the GUC should be "track_scan_times". Or
> could that still be confused with scan durations? Maybe
> "track_scan_timestamps"?
>

The latter seems reasonable.


>
> You could maybe make the gettimeofday cheaper by doing it less often.
> Like, skipping the increment if the old timestamp is newer than 1s
> before the transaction start time (I think that's available free if
> some other GUC is enabled but I don't recall). Or isn't this callback
> normally happening after transaction end? So xactStopTimestamp might
> be available already?
>

Something like:

    if (pgstat_track_scan_timestamps && lstats->t_counts.t_numscans &&
        tabentry->lastscan + USECS_PER_SEC <
        GetCurrentTransactionStopTimestamp())
        tabentry->lastscan = GetCurrentTimestamp();

?

-- 
Dave Page
Blog: https://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: https://www.enterprisedb.com


Re: Logical replication support for generic wal record

2022-08-24 Thread Natarajan R
Thanks, I'll check it out.

On Wed, 24 Aug 2022 at 18:00, Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Wed, Aug 24, 2022 at 5:12 PM Natarajan R  wrote:
> >
> >
> > On Mon, 22 Aug 2022 at 12:16, Bharath Rupireddy <
> bharath.rupireddyforpostg...@gmail.com> wrote:
> >>
> >> On Mon, Aug 22, 2022 at 11:59 AM Natarajan R 
> wrote:
> >> >
> >> > Hi All,
> >> >
> >> > I am writing a postgres extension which writes only generic wal
> record, but this wal is not recognized by logical replication decoder. I
> have a basic understanding of how logical replication(COPY command for
> initial sync, wal replica for final sync) works, can you please tell us a
> way to support this?
> >>
> >> "Generic" resource manager doesn't have a decoding API, see [1], which
> >> means that the generic WAL records will not get decoded.
> >>
> >> Can you be more specific about the use-case? Why use only "Generic"
> >> type WAL records? Why not use "LogicalMessage" type WAL records if you
> >> want your WAL records to be decoded?
> >>
> > I am writing an extension which implements the postgres table access method
> > interface[1] with master-slave architecture. With the help of doc[1] I
> > decided to go with generic WAL to achieve crash safety and also for
> > streaming replication. It seems like generic WAL couldn't help with logical
> > replication...
> > But, I don't have knowledge of the "LogicalMessage" resource manager; I need
> > to explore it.
> >
> >
> > [1] https://www.postgresql.org/docs/current/tableam.html
>
> I think the 'Custom WAL Resource Managers' feature would serve the
> exact same purpose (as also pointed out by Amit upthread); you may
> want to explore that feature [1]. Here's a sample extension using that
> feature [2]; though it serves a different purpose, it helps to understand
> the usage of custom WAL rmgrs.
>
> I notice that the docs [3] don't mention the feature in the right
> place; IMO, it can be improved to refer to the custom-rmgr.sgml page,
> cc-ing Jeff Davis for his thoughts. This would help developers quickly
> try the feature out and saves time.
>
> [1]
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=5c279a6d350205cc98f91fb8e1d3e4442a6b25d1
> [2] https://github.com/BRupireddy/pg_synthesize_wal
> [3] https://www.postgresql.org/docs/devel/tableam.html
> "For crash safety, an AM can use postgres' WAL, or a custom
> implementation. If WAL is chosen, either Generic WAL Records can be
> used, or a new type of WAL records can be implemented. Generic WAL
> Records are easy, but imply higher WAL volume. Implementation of a new
> type of WAL record currently requires modifications to core code
> (specifically, src/include/access/rmgrlist.h)."
>
> --
> Bharath Rupireddy
> RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
>


Re: Stack overflow issue

2022-08-24 Thread mahendrakar s
Hi Richard,

The patch is looking good to me. I would request others to take a look at it
as well.

Thanks,
Mahendrakar.

On Wed, 24 Aug 2022 at 17:24, Richard Guo  wrote:

>
> On Wed, Aug 24, 2022 at 7:12 PM Richard Guo 
> wrote:
>
>>
>> On Wed, Aug 24, 2022 at 6:49 PM Alvaro Herrera 
>> wrote:
>>
>>> On 2022-Aug-24, mahendrakar s wrote:
>>>
>>> > Hi,
>>> > Can we have a parameter to control the recursion depth in these cases
>>> to
>>> > avoid crashes?
>>>
>>> We already have one (max_stack_depth).  The problem is lack of calling
>>> the control function in a few places.
>>
>>
>> Thanks Egor and Alexander for the work! I think we can just add
>> check_stack_depth checks in these cases.
>>
>
> The attached patch adds the checks in these places. But I'm not sure about the
> snowball case. Can we edit src/backend/snowball/libstemmer/*.c directly?
>
> Thanks
> Richard
>


Re: Logical replication support for generic wal record

2022-08-24 Thread Bharath Rupireddy
On Wed, Aug 24, 2022 at 5:12 PM Natarajan R  wrote:
>
>
> On Mon, 22 Aug 2022 at 12:16, Bharath Rupireddy 
>  wrote:
>>
>> On Mon, Aug 22, 2022 at 11:59 AM Natarajan R  wrote:
>> >
>> > Hi All,
>> >
>> > I am writing a postgres extension which writes only generic wal record, 
>> > but this wal is not recognized by logical replication decoder. I have a 
>> > basic understanding of how logical replication(COPY command for initial 
>> > sync, wal replica for final sync) works, can you please tell us a way to 
>> > support this?
>>
>> "Generic" resource manager doesn't have a decoding API, see [1], which
>> means that the generic WAL records will not get decoded.
>>
>> Can you be more specific about the use-case? Why use only "Generic"
>> type WAL records? Why not use "LogicalMessage" type WAL records if you
>> want your WAL records to be decoded?
>>
> I am writing an extension which implements the postgres table access method 
> interface[1] with master-slave architecture. With the help of doc[1] I 
> decided to go with generic WAL to achieve crash safety and also for streaming 
> replication. It seems like generic WAL couldn't help with logical 
> replication...
> But, I don't have knowledge of the "LogicalMessage" resource manager; I need 
> to explore it.
>
>
> [1] https://www.postgresql.org/docs/current/tableam.html

I think the 'Custom WAL Resource Managers' feature would serve the
exact same purpose (as also pointed out by Amit upthread); you may
want to explore that feature [1]. Here's a sample extension using that
feature [2]; though it serves a different purpose, it helps to understand
the usage of custom WAL rmgrs.

I notice that the docs [3] don't mention the feature in the right
place; IMO, it can be improved to refer to the custom-rmgr.sgml page,
cc-ing Jeff Davis for his thoughts. This would help developers quickly
try the feature out and saves time.

[1] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=5c279a6d350205cc98f91fb8e1d3e4442a6b25d1
[2] https://github.com/BRupireddy/pg_synthesize_wal
[3] https://www.postgresql.org/docs/devel/tableam.html
"For crash safety, an AM can use postgres' WAL, or a custom
implementation. If WAL is chosen, either Generic WAL Records can be
used, or a new type of WAL records can be implemented. Generic WAL
Records are easy, but imply higher WAL volume. Implementation of a new
type of WAL record currently requires modifications to core code
(specifically, src/include/access/rmgrlist.h)."

--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/




Re: Change pfree to accept NULL argument

2022-08-24 Thread David Rowley
On Wed, 24 Aug 2022 at 23:07, David Rowley  wrote:
> One counter argument to that is for cases like list_free_deep().
> Right now if I'm not mistaken there's a bug (which I just noticed) in
> list_free_private() that would trigger if you have a List of Lists and
> one of the inner Lists is NIL.  The code in list_free_private() just
> seems to go off and pfree() whatever is stored in the element, which I
> think would crash if it found a NIL List.  If pfree() was to handle
> NULLs at least that wouldn't have been a crash, but in reality, we
> should probably fix that with recursion if we detect the element IsA
> List type. If we don't use recursion, then the "free" does not seem
> very "deep". (Or maybe it's too late to make it go deeper as it might
> break existing code.)

Hmm, that was a false alarm. It seems list_free_deep() can't really
handle freeing sublists as the list elements might be non-Node types,
which of course have no node tag, so we can't check for sub-Lists.

David




Re: standby promotion can create unreadable WAL

2022-08-24 Thread Robert Haas
On Wed, Aug 24, 2022 at 4:40 AM Kyotaro Horiguchi
 wrote:
> Me, too.  There are two ways to deal with this, I think. One is to start
> writing new records from abortedContRecPtr as if it did not
> exist. Another is to copy the WAL file up to missingContRecPtr. Since the
> first segment of the new timeline doesn't need to be identical to the
> last one of the previous timeline, I think the former way is
> cleaner.

I agree, mostly because that gets us back to the way all of this
worked before the contrecord stuff went in. This case wasn't broken
then, because the breakage had to do with it being unsafe to back up
and rewrite WAL that might have already been shipped someplace, and
that's not an issue when we're first creating a totally new timeline.
It seems safer to me to go back to the way this worked before the fix
went in than to change over to a new system.

Honestly, in a vacuum, I might prefer to get rid of this thing where
the WAL segment gets copied over from the old timeline to the new, and
just always switch TLIs at segment boundaries. And while we're at it,
I'd also like TLIs to be 64-bit random numbers instead of integers
assigned in ascending order. But those kinds of design changes seem
best left for a future master-only development effort. Here, we need
to back-patch the fix, and should try to just unbreak what's currently
broken.

> XLogInitNewTimeline or nearby seems to be the place for the fix
> to me. Would clearing abortedRecPtr and missingContrecPtr just before the
> call to findNewestTimeLine work?

Hmm, yeah, that seems like a good approach.
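
As a sketch (untested), that could be as simple as the following, placed
just before the new TLI is chosen:

    /*
     * Forget any incomplete continuation record inherited from the old
     * timeline; the new timeline starts with a clean slate.
     */
    abortedRecPtr = InvalidXLogRecPtr;
    missingContrecPtr = InvalidXLogRecPtr;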

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Stack overflow issue

2022-08-24 Thread Richard Guo
On Wed, Aug 24, 2022 at 7:12 PM Richard Guo  wrote:

>
> On Wed, Aug 24, 2022 at 6:49 PM Alvaro Herrera 
> wrote:
>
>> On 2022-Aug-24, mahendrakar s wrote:
>>
>> > Hi,
>> > Can we have a parameter to control the recursion depth in these cases to
>> > avoid crashes?
>>
>> We already have one (max_stack_depth).  The problem is lack of calling
>> the control function in a few places.
>
>
> Thanks Egor and Alexander for the work! I think we can just add
> check_stack_depth checks in these cases.
>

The attached patch adds the checks in these places. But I'm not sure about the
snowball case. Can we edit src/backend/snowball/libstemmer/*.c directly?

Thanks
Richard


v1-0001-add-check_stack_depth-in-more-places.patch
Description: Binary data


Re: Fix typo in func.sgml

2022-08-24 Thread Shinya Kato

On 2022-08-24 20:47, David Rowley wrote:
On Wed, 24 Aug 2022 at 22:44, Shinya Kato 
 wrote:

I've found a duplicate "a a" in func.sgml and fixed it.
Patch is attached.


Thanks. Pushed.

David

Thanks for pushing!

--
Regards,

--
Shinya Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Fix typo in func.sgml

2022-08-24 Thread David Rowley
On Wed, 24 Aug 2022 at 22:44, Shinya Kato  wrote:
> I've found a duplicate "a a" in func.sgml and fixed it.
> Patch is attached.

Thanks. Pushed.

David




Re: Logical replication support for generic wal record

2022-08-24 Thread Natarajan R
On Mon, 22 Aug 2022 at 12:16, Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Mon, Aug 22, 2022 at 11:59 AM Natarajan R 
> wrote:
> >
> > Hi All,
> >
> > I am writing a postgres extension which writes only generic wal record,
> but this wal is not recognized by logical replication decoder. I have a
> basic understanding of how logical replication(COPY command for initial
> sync, wal replica for final sync) works, can you please tell us a way to
> support this?
>
> "Generic" resource manager doesn't have a decoding API, see [1], which
> means that the generic WAL records will not get decoded.
>
> Can you be more specific about the use-case? Why use only "Generic"
> type WAL records? Why not use "LogicalMessage" type WAL records if you
> want your WAL records to be decoded?
>
I am writing an extension which implements the postgres table access method
interface[1] with master-slave architecture. With the help of doc[1] I
decided to go with generic WAL to achieve crash safety and also for
streaming replication. It seems like generic WAL couldn't help with logical
replication...
But, I don't have knowledge of the "LogicalMessage" resource manager; I need to
explore it.


[1] https://www.postgresql.org/docs/current/tableam.html
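
For what it's worth, logical messages can also be emitted from SQL via
pg_logical_emit_message(); a minimal sketch (the prefix and payload here
are illustrative):

    -- Writes a LogicalMessage WAL record, which logical decoding plugins
    -- can receive; Generic WAL records have no decoding support.
    SELECT pg_logical_emit_message(true, 'my_extension', 'some payload');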


Re: Stack overflow issue

2022-08-24 Thread Richard Guo
On Wed, Aug 24, 2022 at 6:49 PM Alvaro Herrera 
wrote:

> On 2022-Aug-24, mahendrakar s wrote:
>
> > Hi,
> > Can we have a parameter to control the recursion depth in these cases to
> > avoid crashes?
>
> We already have one (max_stack_depth).  The problem is lack of calling
> the control function in a few places.


Thanks Egor and Alexander for the work! I think we can just add
check_stack_depth checks in these cases.
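
For reference, the usual idiom is just a guard at the top of each recursive
function; a minimal sketch (the function here is made up, not one of the
reported sites):

    #include "postgres.h"
    #include "miscadmin.h"

    static int
    depth_sum(int *vals, int n)
    {
        check_stack_depth();    /* errors out before the stack overflows */
        if (n == 0)
            return 0;
        return vals[0] + depth_sum(vals + 1, n - 1);
    }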

Thanks
Richard


Re: Change pfree to accept NULL argument

2022-08-24 Thread David Rowley
On Tue, 23 Aug 2022 at 13:17, David Rowley  wrote:
> I think making pfree() accept NULL is a bad idea.

One counter argument to that is for cases like list_free_deep().
Right now if I'm not mistaken there's a bug (which I just noticed) in
list_free_private() that would trigger if you have a List of Lists and
one of the inner Lists is NIL.  The code in list_free_private() just
seems to go off and pfree() whatever is stored in the element, which I
think would crash if it found a NIL List.  If pfree() was to handle
NULLs at least that wouldn't have been a crash, but in reality, we
should probably fix that with recursion if we detect the element IsA
List type. If we don't use recursion, then the "free" does not seem
very "deep". (Or maybe it's too late to make it go deeper as it might
break existing code.)
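
A minimal sketch of that scenario (note that the follow-up elsewhere in this
thread concluded it was a false alarm, since list elements need not be
Nodes):

    #include "postgres.h"
    #include "nodes/pg_list.h"

    static void
    demo(void)
    {
        /* outer list whose second element is NIL, i.e. a NULL pointer */
        List   *outer = list_make2(list_make1(palloc(16)), NIL);

        /* list_free_deep() hands every element to pfree(), including the NULL */
        list_free_deep(outer);
    }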

David




Re: Stack overflow issue

2022-08-24 Thread mahendrakar s
Thanks.

On Wed, 24 Aug, 2022, 4:19 pm Alvaro Herrera, 
wrote:

> On 2022-Aug-24, mahendrakar s wrote:
>
> > Hi,
> > Can we have a parameter to control the recursion depth in these cases to
> > avoid crashes?
>
> We already have one (max_stack_depth).  The problem is lack of calling
> the control function in a few places.
>
> --
> Álvaro Herrera   48°01'N 7°57'E  —
> https://www.EnterpriseDB.com/
>


Re: Stack overflow issue

2022-08-24 Thread Alvaro Herrera
On 2022-Aug-24, mahendrakar s wrote:

> Hi,
> Can we have a parameter to control the recursion depth in these cases to
> avoid crashes?

We already have one (max_stack_depth).  The problem is lack of calling
the control function in a few places.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: [BUG] parenting a PK constraint to a self-FK one (Was: Self FK oddity when attaching a partition)

2022-08-24 Thread Alvaro Herrera
On 2022-Aug-24, Jehan-Guillaume de Rorthais wrote:

> I was naively wondering about such a patch, but was worrying about potential
> side effects on ATExecAttachPartitionIdx(), DetachPartitionFinalize() and
> DefineIndex(), at which I haven't had a single glance. Did you have a look?

No.  But AFAIR all the code there is supposed to worry about unique
constraints and PK only, not FKs.  So if something changes, then most 
likely it was wrong to begin with.

> I did a quick ATTACH + DETACH test, and it seems DETACH partly fails with its
> housecleaning:

Ugh.  More fixes required, then.

> Looking for a few minutes at ATExecDetachPartitionFinalize(), it seems it only
> supports removing the parental link on the FK, not cleaning the FKs added
> during the ATTACH DDL. That explains the FK child1->parent left behind. But in
> fact, this makes me wonder if this part of the code ever considered the
> implications of self-FKs during the ATTACH and DETACH process?

No, or at least I don't remember thinking about self-referencing FKs.
If there are no tests for it, then that's likely what happened.

> Why are TWO FKs created in the first place during the ATTACH DDL?

That's probably a bug too.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"The eagle never lost so much time, as
when he submitted to learn of the crow." (William Blake)




Re: shadow variables - pg15 edition

2022-08-24 Thread David Rowley
On Wed, 24 Aug 2022 at 14:39, Justin Pryzby  wrote:
> Attached are half of the remainder of what I've written, ready for review.

Thanks for the patches.

I started to do some analysis of the remaining warnings and put them
in the attached spreadsheet. I put each of the remaining warnings into
a category of how I think they should be fixed.

These categories are:

1. "Rescope" (adjust scope of outer variable to move it into a deeper scope)
2. "Rename" (a variable needs to be renamed)
3. "RenameOrScope" (a variable needs renamed or we need to something
more extreme to rescope)
4. "Repurpose" (variables have the same purpose and may as well use
the same variable)
5. "Refactor" (fix the code to make it better)
6. "Remove" (variable is not needed)

There's also:
7. "Bug?" (might be a bug)
8. "?"  (I don't know)

I was hoping we'd already caught all of the #1s in 421892a19, but I
caught a few of those in some of your other patches. One you'd done
another way and some you'd done the rescope but just put it in the
wrong patch.  The others had not been done yet. I just pushed
f959bf9a5 to fix those ones.

I really think #2s should be done last. I'm not as comfortable with
the renaming and we might want to discuss tactics on that. We could
either opt to rename the shadowed or shadowing variable, or both.  If
we rename the shadowing variable, then pending patches or forward
patches could use the wrong variable.  If we rename the shadowed
variable then it's not impossible that backpatching could go wrong
where the new code intends to reference the outer variable using the
newly named variable, but when that's backpatched it uses the variable
with the same name in the inner scope.  Renaming both would make the
problem more obvious.  I'm not sure which is best.  The answer may
depend on how many lines the variable is in scope for. If it's just
for a few lines then the hunk context would conflict and the committer
would likely notice the issue when resolving the conflict.

For #3, I just couldn't decide the best fix.  Many of these could be
moved into an inner scope, but it would require indenting a large
amount of code, e.g. in a switch() statement's "case:" to allow
variables to be declared within the case.

I think probably #4 should be next to do (maybe after #5)

I have some ideas on how to fix the two #5s, so I'm going to go and do that now.

There's only 1 #6. I'm not so sure on that yet. The variable being
assigned to the variable is the current time and I'm not sure if we
can reuse the existing variable or not as time may have moved on
sufficiently.

I'll study #7 a bit more. My eyes glazed over a bit from doing all
that analysis, so I might be mistaken about that being a bug.

For #8s.  These are the PG_TRY() ones. I see you had a go at fixing
that by moving the nested PG_TRY()s to a helper function. I don't
think that's a good fix. If we were to ever consider making
-Wshadow=compatible-local in a standard build, then we'd basically be
saying that nested PG_TRYs are not allowed. I don't think that'll fly.
I'd rather find a better way to fix those.  I see we can't make use of
##__LINE__ in the variable name since PG_TRY()'s friends use the
variables too and they'd be on a different line. We maybe could have
an "ident" parameter in the macro that we ##ident onto the variables
names, but that would break existing code.

> The first patch removes 2ndary, "inner" declarations, where that seems
> reasonably safe and consistent with existing practice (and probably what the
> original authors intended or would have written).

Would you be able to write a patch for #4. I'll do #5 now. You could
do a draft patch for #2 as well, but I think it should be committed
last, if we decide it's a good move to make. It may be worth having
the discussion about if we actually want to run
-Wshadow=compatible-local as a standard build flag before we rename
anything.

David


shadow_analysis.ods
Description: application/vnd.oasis.opendocument.spreadsheet


Fix typo in func.sgml

2022-08-24 Thread Shinya Kato

Hi hackers,

I've found a duplicate "a a" in func.sgml and fixed it.
Patch is attached.


--
Regards,

--
Shinya Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 8dd63c0455..f87afefeae 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18033,7 +18033,7 @@ FROM
 or array, but if it is CONDITIONAL it will not be
 applied to a single array or object. UNCONDITIONAL
 is the default.
-If the result is a a scalar string, by default the value returned will have
+If the result is a scalar string, by default the value returned will have
 surrounding quotes making it a valid JSON value. However, this behavior
 is reversed if OMIT QUOTES is specified.
 The ON ERROR and ON EMPTY


Re: [BUG] parenting a PK constraint to a self-FK one (Was: Self FK oddity when attaching a partition)

2022-08-24 Thread Jehan-Guillaume de Rorthais
On Tue, 23 Aug 2022 18:30:06 +0200
Alvaro Herrera  wrote:

> On 2022-Aug-23, Jehan-Guillaume de Rorthais wrote:
> 
> Hi,
> 
> [...]
> 
> > However, it seems get_relation_idx_constraint_oid(), introduced in
> > eb7ed3f3063, assumes there could be only ONE constraint depending on an
> > index. But in fact, multiple constraints can rely on the same index, e.g.
> > the PK and a self-referencing FK. In consequence, when looking for a
> > constraint depending on an index for the given relation, either the FK or the
> > PK can appear first depending on various conditions. It is then possible
> > to trick it into making a FK constraint a parent of a PK...
> 
> Hmm, wow, that sounds extremely stupid.  I think a sufficient fix might
> be to have get_relation_idx_constraint_oid ignore any constraints that
> are not unique or primary keys.  I tried your scenario with the attached
> and it seems to work correctly.  Can you confirm?  (I only ran the
> pg_regress tests, not anything else for now.)
>
> If this is OK, we should make this API quirkiness very explicit in the
> comments, so the patch needs to be a few lines larger in order to be
> committable.  Also, perhaps the check should be that contype equals
> either primary or unique, rather than it doesn't equal foreign.

I was naively wondering about such a patch, but was worrying about potential
side effects on ATExecAttachPartitionIdx(), DetachPartitionFinalize() and
DefineIndex(), at which I haven't had a single glance. Did you have a look?

I did a quick ATTACH + DETACH test, and it seems DETACH partly fails with its
housecleaning:

  DROP TABLE IF EXISTS parent, child1;
  
  CREATE TABLE parent (
  id bigint NOT NULL default 1,
  no_part smallint NOT NULL,
  id_abc bigint,
  FOREIGN KEY (id_abc, no_part) REFERENCES parent(id, no_part)
  ON UPDATE RESTRICT ON DELETE RESTRICT,
  PRIMARY KEY (id, no_part)
  )
  PARTITION BY LIST (no_part);
  
  CREATE TABLE child1 (
  id bigint NOT NULL default 1,
  no_part smallint NOT NULL,
  id_abc bigint,
  PRIMARY KEY (id, no_part),
  CONSTRAINT child1 CHECK ((no_part = 1))
  );

  \C 'Before ATTACH'
  SELECT oid, conname, conparentid, conrelid, confrelid
  FROM pg_constraint
  WHERE conrelid in ('parent'::regclass, 'child1'::regclass)
  ORDER BY 1;

  ALTER TABLE parent ATTACH PARTITION child1 FOR VALUES IN ('1');

  \C 'After ATTACH'
  SELECT oid, conname, conparentid, conrelid, confrelid
  FROM pg_constraint
  WHERE conrelid in ('parent'::regclass, 'child1'::regclass)
  ORDER BY 1;

  ALTER TABLE parent DETACH PARTITION child1;

  \C 'After DETACH'
  SELECT oid, conname, conparentid, conrelid, confrelid
  FROM pg_constraint
  WHERE conrelid in ('parent'::regclass, 'child1'::regclass)
  ORDER BY 1;


Before ATTACH
oid  |  conname   | conparentid | conrelid | confrelid 
  ---++-+--+---
   24711 | parent_pkey|   0 |24706 | 0
   24712 | parent_id_abc_no_part_fkey |   0 |24706 | 24706
   24721 | child1 |   0 |24717 | 0
   24723 | child1_pkey|   0 |24717 | 0
  (4 rows)
  
 After ATTACH
oid  |   conname   | conparentid | conrelid | confrelid 
  ---+-+-+--+---
   24711 | parent_pkey |   0 |24706 | 0
   24712 | parent_id_abc_no_part_fkey  |   0 |24706 | 24706
   24721 | child1  |   0 |24717 | 0
   24723 | child1_pkey |   24711 |24717 | 0
   24724 | parent_id_abc_no_part_fkey1 |   24712 |24706 | 24717
   24727 | parent_id_abc_no_part_fkey  |   24712 |24717 | 24706
  (6 rows)
  
After DETACH
oid  |  conname   | conparentid | conrelid | confrelid 
  ---++-+--+---
   24711 | parent_pkey|   0 |24706 | 0
   24712 | parent_id_abc_no_part_fkey |   0 |24706 | 24706
   24721 | child1 |   0 |24717 | 0
   24723 | child1_pkey|   0 |24717 | 0
   24727 | parent_id_abc_no_part_fkey |   0 |24717 | 24706
  (5 rows)

Looking for a few minutes at ATExecDetachPartitionFinalize(), it seems it only
supports removing the parental link on the FK, not cleaning the FKs added during
the ATTACH DDL. That explains the FK child1->parent left behind. But in fact,
this makes me wonder if this part of the code ever considered the implications
of self-FKs during the ATTACH and DETACH process? Why are TWO FKs created in the
first place during the ATTACH DDL?

Regards,




Re: Making Vars outer-join aware

2022-08-24 Thread Richard Guo
On Sun, Aug 21, 2022 at 6:52 AM Tom Lane  wrote:

> What I'm thinking we should do about this, once we detect that
> this identity is applicable, is to generate *both* forms of Pbc,
> either adding or removing the varnullingrels bits depending on
> which form we got from the parser.  Then, when we come to forming
> join paths, use the appropriate variant depending on which join
> order we're considering.  build_join_rel() already has the concept
> that the join restriction list varies depending on which input
> relations we're trying to join, so this doesn't require any
> fundamental restructuring -- only a way to identify which
> RestrictInfos to use or ignore for a particular join.  That will
> probably require some new field in RestrictInfo, but I'm not
> fussed about that because there are other fields we'll be able
> to remove as this work progresses.


Do you mean we generate two RestrictInfos for Pbc in the case of
identity 3, one with varnullingrels and one without varnullingrels, and
choose the appropriate one when forming join paths? Do we need to also
generate two SpecialJoinInfos for the B/C join in the first order, with
and without the A/B join in its min_lefthand?

Thanks
Richard


Re: Strip -mmacosx-version-min options from plperl build

2022-08-24 Thread Peter Eisentraut

On 19.08.22 09:12, Peter Eisentraut wrote:
After analyzing the source code of ExtUtils::Embed's ldopts, I think we 
can also do this by subtracting $Config{ldflags}, since


my $linkage = "$ccdlflags $ldflags @archives $ld_or_bs";

and we really just want the $ld_or_bs part. (@archives should be empty 
for our uses.)


This would get rid of -mmacosx-version-min and -arch and all the things 
you showed, including -L/opt/local/lib, which is probably there so that 
the build of Perl itself could look there for things, but we don't need it.
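
A sketch of that subtraction, assuming ExtUtils::Embed and Config (the real
logic would live in the configure machinery, e.g. config/perl.m4):

    use Config;
    use ExtUtils::Embed ();

    # what "perl -MExtUtils::Embed -e ldopts" currently hands us ...
    my $ldopts = ExtUtils::Embed::ldopts(1);

    # ... minus the flags inherited from Perl's own build, leaving
    # roughly just the $ld_or_bs part mentioned above
    $ldopts =~ s/\Q$Config{ldflags}\E//;
    print "$ldopts\n";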


This patch has failed on Cygwin lorikeet:

Before:

checking for flags to link embedded Perl...
  -Wl,--enable-auto-import -Wl,--export-all-symbols 
-Wl,--enable-auto-image-base -fstack-protector-strong 
-L/usr/lib/perl5/5.32/x86_64-cygwin-threads/CORE -lperl -lpthread -ldl 
-lcrypt


After:

checking for flags to link embedded Perl... 
-L/usr/lib/perl5/5.32/x86_64-cygwin-threads/CORE -lperl -lpthread -ldl 
-lcrypt


That's as designed.  But the plperl tests fail:

CREATE EXTENSION plperl;
+ERROR:  incompatible library 
"/home/andrew/bf/root/HEAD/inst/lib/postgresql/plperl.dll": missing 
magic block

+HINT:  Extension libraries are required to use the PG_MODULE_MAGIC macro.

Among the now-dropped options, we can discount -Wl,--enable-auto-import, 
because that is used anyway via src/template/cygwin.


So one of the options

-Wl,--export-all-symbols
-Wl,--enable-auto-image-base
-fstack-protector-strong

is needed.  These options aren't used for any other shared libraries 
AFAICT, so nothing is clear to me.





Re: Stack overflow issue

2022-08-24 Thread mahendrakar s
Hi,
Can we have a parameter to control the recursion depth in these cases to
avoid crashes?
Just a thought.

Thanks,
Mahendrakar.

On Wed, 24 Aug, 2022, 3:21 pm Егор Чиндяскин,  wrote:

> Hello, I recently got a server crash (bug #17583 [1]) caused by a stack
> overflow.
>
> Tom Lane and Richard Guo, in a discussion of this bug, suggested that
> there could be more such places.
> Therefore, Alexander Lakhin and I decided to deal with this issue and
> Alexander developed a methodology. We processed src/backend/*/*.c with
> "clang -emit-llvm  ... | opt -analyze -print-calgraph" to find all the
> functions that call themselves directly. I checked each of them for
> features that protect against stack overflows.
> We analyzed 4 catalogs: regex, tsearch, snowball and adt.
> Firstly, we decided to test the regex catalog functions and found 6 of
> them that lack the check_stack_depth() call.
>
> zaptreesubs
> markst
> next
> nfatree
> numst
> repeat
>
> We have tried to exploit the recursion in the function zaptreesubs():
> select regexp_matches('a' || repeat(' a', 11000), '(.)(' || repeat(' \1',
> 11000) || ')?');
>
> ERROR:  invalid regular expression: regular expression is too complex
>
> repeat():
> select regexp_match('abc01234xyz',repeat('a{0,2}',11));
>
> ERROR:  invalid regular expression: regular expression is too complex
>
> numst():
> select regexp_match('abc01234xyz',repeat('(.)\1e',11));
>
> ERROR:  invalid regular expression: regular expression is too complex
>
> markst():
> markst is called in the code after v->tree = parse(...);
> it is necessary that the tree be successfully parsed, but with a nesting
> level of about 100,000 this will not work - stack protection will work
> during parsing and v->ntree = numst(...); is also there.
>
> next():
> we were able to crash the server with the following query:
> (printf "SELECT regexp_match('abc', 'a"; for ((i=1;i<100;i++)); do
> printf "(?#)"; done; printf "b')" ) | psql
>
> Secondly, we have tried to exploit the recursion in the adt catalog
> functions and Alexander was able to crash the server with the following
> query:
>
> regex_selectivity_sub():
> SELECT * FROM pg_proc WHERE proname ~ ('(a' || repeat('|', 20) ||
> 'b)');
>
> And this query:
>
> (n=10;
> printf "SELECT polygon '((0,0),(0,100))' <@ polygon
> '((-20,100),";
> for ((i=1;i<$n;i++)); do printf "(10,$(( 30 +
> $i))),(-10,$((80 + $i))),"; done;
> printf "(20,90),(20,0))';"
> ) | psql
>
> Thirdly, in the snowball catalog, Alexander tried to exploit the
> recursion in the r_stem_suffix_chain_before_ki function and crashed a
> server using this query:
>
> r_stem_suffix_chain_before_ki():
> SELECT ts_lexize('turkish_stem', repeat('lerdeki', 100));
>
> The last one is the tsearch catalog. We have found 4 functions that didn't
> have the check_stack_depth() call:
>
> SplitToVariants
> mkANode
> mkSPNode
> LexizeExec
>
> We have tried to exploit the recursion in the SplitToVariants function and
> Alexander crashed a server using this:
>
> SplitToVariants():
> CREATE TEXT SEARCH DICTIONARY ispell (Template=ispell,
> DictFile=ispell_sample,AffFile=ispell_sample);
> SELECT ts_lexize('ispell', repeat('bally', 1));
>
> After trying to exploit the recursion in the LexizeExec function Alexander
> made this conclusion:
>
> LexizeExec has two branches "ld->curDictId == InvalidOid" (usual mode) and
> "ld->curDictId != InvalidOid" (multiword mode) - we start with the first
> one, then make a recursive call to switch to the multiword mode, but then we
> return to the usual mode again.
>
> mkANode and mkSPNode deal with the dictionary structs, not with
> user-supplied data, so we believe these functions are not vulnerable.
>
> [1]
> https://www.postgresql.org/message-id/flat/CAMbWs499ytQiH4mLMhRxRWP-iEUz3-DSinpAD-cUCtVo_23Wtg%40mail.gmail.com#03ad703cf4bc8d28ccba69913e1e8106
>


Re: making relfilenodes 56 bits

2022-08-24 Thread Amit Kapila
On Tue, Aug 23, 2022 at 3:28 PM Dilip Kumar  wrote:
>
> On Tue, Aug 23, 2022 at 3:16 PM Amit Kapila  wrote:
>
> > One more thing we may want to think about is what if there are tables
> > created by extension? For example, I think BDR creates some tables
> > like node_group, conflict_history, etc. Now, I think if such an
> > extension is created by default, both old and new clusters will have
> > those tables. Isn't there a chance of relfilenumber conflict in such
> > cases?
>
> Shouldn't they behave as normal user tables? Because before the upgrade
> the new cluster cannot have any tables other than system tables anyway, and
> those tables created by an extension should also be restored just as other
> user tables are.
>

Right.

-- 
With Regards,
Amit Kapila.




Re: making relfilenodes 56 bits

2022-08-24 Thread Amit Kapila
On Tue, Aug 23, 2022 at 8:00 PM Robert Haas  wrote:
>
> On Tue, Aug 23, 2022 at 2:06 AM Dilip Kumar  wrote:
> > OTOH, if we keep the two separate ranges for the user and system table
> > then we don't need all this complex logic of conflict checking.
>
> True. That's the downside. The question is whether it's worth adding
> some complexity to avoid needing separate ranges.
>
> Honestly, if we don't care about having separate ranges, we can do
> something even simpler and just make the starting relfilenumber for
> system tables same as the OID. Then we don't have to do anything at
> all, outside of not changing the OID assigned to pg_largeobject in a
> future release. Then as long as pg_upgrade is targeting a new cluster
> with completely fresh databases that have not had any system table
> rewrites so far, there can't be any conflict.
>
> And perhaps that is the best solution after all, but while it is
> simple in terms of code, I feel it's a bit complicated for human
> beings. It's very simple to understand the scheme that Amit proposed:
> if there's anything in the new cluster that would conflict, we move it
> out of the way. We don't have to assume the new cluster hasn't had any
> table rewrites. We don't have to nail down starting relfilenumber
> assignments for system tables. We don't have to worry about
> relfilenumber or OID assignments changing between releases.
> pg_largeobject is not a special case. There are no special ranges of
> OIDs or relfilenumbers required. It just straight up works -- all the
> time, no matter what, end of story.
>

This sounds simple to understand. It seems we always create new system
tables in the new cluster before the upgrade, so I think it is safe to
assume there won't be any table rewrite in it. OTOH, if the
relfilenumber allocation scheme is robust to deal with table rewrites
then we probably don't need to worry about this assumption changing in
the future.

-- 
With Regards,
Amit Kapila.




Re: Pluggable toaster

2022-08-24 Thread Nikita Malakhov
Hi hackers!

I've rebased the actual branch onto the latest master and re-created the
patches. Checked with git am; all applied correctly. Please check the attached
patches.
Rebased branch resides here:
https://github.com/postgrespro/postgres/tree/toasterapi_clean

Just to remind: I've decided to leave only the first 2 patches for review
and send the less significant changes after the main patch is straightened out.
So, here is
v14-0001-toaster-interface.patch - main TOAST API patch, with reference TOAST
mechanics left as-is.
v14-0002-toaster-default.patch - reference TOAST re-implemented via TOAST API.

On Tue, Aug 23, 2022 at 11:27 AM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> Hi Nikita,
>
> > I've decided to leave only the first 2 patches for review and send less
> significant
> > changes after the main patch will be straightened out.
> > So, here is
> > v13-0001-toaster-interface.patch - main TOAST API patch, with reference
> TOAST
> > mechanics left as-is.
> > v13-0002-toaster-default.patch - reference TOAST re-implemented via
> TOAST API.
> > [...]
>
> Great! Thank you.
>
> Unfortunately the patchset still seems to have difficulties passing
> the CI checks (see http://cfbot.cputube.org/ ). Any chance we may see
> a version rebased to the current `master` branch for the September CF?
>
> --
> Best regards,
> Aleksander Alekseev
>


-- 
Regards,
Nikita Malakhov
https://postgrespro.ru/


v14-0002-toaster-default.patch.gz
Description: GNU Zip compressed data


v14-0001-toaster-interface.patch.gz
Description: GNU Zip compressed data


Stack overflow issue

2022-08-24 Thread Егор Чиндяскин

Hello, I recently got a server crash (bug #17583 [1]) caused by a stack 
overflow. 
 
Tom Lane and Richard Guo, in a discussion of this bug, suggested that there 
could be more such places. 
Therefore, Alexander Lakhin and I decided to deal with this issue and Alexander 
developed a methodology. We processed src/backend/*/*.c with "clang -emit-llvm  
... | opt -analyze -print-callgraph" to find all the functions that call 
themselves directly. I checked each of them for features that protect against 
stack overflows.
We analyzed 4 catalogs: regex, tsearch, snowball and adt.
Firstly, we decided to test the regex catalog functions and found 6 of them 
that lack the check_stack_depth() call.
 
zaptreesubs
markst
next
nfatree
numst
repeat
 
We have tried to exploit the recursion in the function zaptreesubs():
select regexp_matches('a' || repeat(' a', 11000), '(.)(' || repeat(' \1', 
11000) || ')?');
 
ERROR:  invalid regular expression: regular expression is too complex
 
repeat():
select regexp_match('abc01234xyz',repeat('a{0,2}',11));
 
ERROR:  invalid regular expression: regular expression is too complex
 
numst():
select regexp_match('abc01234xyz',repeat('(.)\1e',11));
 
ERROR:  invalid regular expression: regular expression is too complex
 
markst():
markst is called in the code after v->tree = parse(...);
it is necessary that the tree be successfully parsed, but with a nesting level 
of about 100,000 this will not work - stack protection will work during parsing 
and v->ntree = numst(...); is also there.
 
next():
we were able to crash the server with the following query:
(printf "SELECT regexp_match('abc', 'a"; for ((i=1;i<100;i++)); do printf 
"(?#)"; done; printf "b')" ) | psql
 
Secondly, we have tried to exploit the recursion in the adt catalog functions 
and Alexander was able to crash the server with the following query:
 
regex_selectivity_sub(): 
SELECT * FROM pg_proc WHERE proname ~ ('(a' || repeat('|', 20) || 'b)');
 
And this query:
 
(n=10;
printf "SELECT polygon '((0,0),(0,100))' <@ polygon '((-20,100),";
for ((i=1;i<$n;i++)); do printf "(10,$(( 30 + $i))),(-10,$((80 
+ $i))),"; done;
printf "(20,90),(20,0))';"
) | psql
 
Thirdly, in the snowball catalog, Alexander tried to exploit the recursion in 
the r_stem_suffix_chain_before_ki function and crashed a server using this 
query:
 
r_stem_suffix_chain_before_ki():
SELECT ts_lexize('turkish_stem', repeat('lerdeki', 100));
 
The last one is the tsearch catalog. We have found 4 functions that didn't have 
the check_stack_depth() call: 
 
SplitToVariants
mkANode
mkSPNode
LexizeExec
 
We have tried to exploit the recursion in the SplitToVariants function and 
Alexander crashed a server using this:
 
SplitToVariants():
CREATE TEXT SEARCH DICTIONARY ispell (Template=ispell, 
DictFile=ispell_sample,AffFile=ispell_sample);
SELECT ts_lexize('ispell', repeat('bally', 1));
 
After trying to exploit the recursion in the LexizeExec function Alexander made 
this conclusion: 
 
LexizeExec has two branches "ld->curDictId == InvalidOid" (usual mode) and 
"ld->curDictId != InvalidOid" (multiword mode) - we start with the first one, 
then make a recursive call to switch to the multiword mode, but then we return to 
the usual mode again.
 
mkANode and mkSPNode deal with the dictionary structs, not with user-supplied 
data, so we believe these functions are not vulnerable.
 
[1] 
https://www.postgresql.org/message-id/flat/CAMbWs499ytQiH4mLMhRxRWP-iEUz3-DSinpAD-cUCtVo_23Wtg%40mail.gmail.com#03ad703cf4bc8d28ccba69913e1e8106

Re: [RFC] building postgres with meson - v11

2022-08-24 Thread Peter Eisentraut

I have looked at your branch at 0545eec895:

258f6dc0a7 Don't hardcode tmp_check/ as test directory for tap tests
8ecc33cf04 Split TESTDIR into TESTLOGDIR and TESTDATADIR

I think these patches are split up a bit incorrectly.  If you apply
the first patch by itself, then the output appears in tab_comp_dir/
directly under the source directory.  And then the second patch moves
it to tmp_check/tab_comp_dir/.  If there is an intent to apply these
patches separately somehow, this should be cleaned up.

I haven't checked the second patch in detail yet, but it looks like
the thought was that the first patch is about ready to go.

834a40e609 meson: prereq: Extend gendef.pl in preparation for meson

I'm not qualified to check that in detail, but it looks reasonable
enough to me.

See attached patch (0001) for a perlcritic fix.

97a0b096e8 meson: prereq: Add src/tools/gen_export.pl

This produces leading whitespace in the output files that at least on
darwin wasn't there before.  See attached patch (0002).  This should
be checked again on other platforms as well.

Other than that this looks good.  Attached is a small cosmetic patch (0003).

40e363b263 meson: prereq: Refactor PG_TEST_EXTRA logic in autoconf build

Since I last looked, this has been turned into a meson option.  Which
is probably the best solution.  But then we should probably make this
a configure option as well.  Otherwise, it could get a bit confusing.
For example, I just unset PG_TEST_EXTRA in my environment to test
something with the meson build, but I was unaware that meson captures
the value at setup time, so my unsetting had no effect.

In any case, maybe adjust the regular expressions to check for word
boundaries, to maintain the original "whitespace-separated"
specification.  For example,

elsif ($ENV{PG_TEST_EXTRA} !~ /\bssl\b/)

e0a8387660 solaris: Use versioning scripts instead of -Bsymbolic

This looks like a good idea.  The documentation clearly states that
-Bsymbolic shouldn't be used, at least not in the way we have been
doing.  Might as well go ahead with this and give it a whirl on the
build farm.

0545eec895 meson: Add docs

We should think more about how to arrange the documentation.  We
probably don't want to copy-and-paste all the introductory and
requirements information.  I think we can make this initially much
briefer, like the Windows installation chapter.  For example, instead
of documenting each setup option again, just mention which ones exist
and then point (link) to the configure chapter for details.


I spent a bit of time with the test suites.  I think there is a
problem in that selecting a test suite directly, like

meson test -C _build --suite recovery

doesn't update the tmp_install.  So if this is the first thing you run
after a build, everything will fail.  Also, if you run this later, the
tmp_install doesn't get updated, so you're not testing up-to-date
code.

From 2f25b48271bceb7aa1551e015b03fc20b9aff162 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 24 Aug 2022 11:23:52 +0200
Subject: [PATCH 1/3] Fix for perlcritic

---
 src/tools/msvc/gendef.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/tools/msvc/gendef.pl b/src/tools/msvc/gendef.pl
index f08268e781..cfbdef9007 100644
--- a/src/tools/msvc/gendef.pl
+++ b/src/tools/msvc/gendef.pl
@@ -174,7 +174,7 @@ sub usage
 {
if (-d $in)
{
-   push @files, <$in/*.obj>;
+   push @files, glob "$in/*.obj";
}
else
{
-- 
2.37.1

From a64c90e756d6996b7d8d9d63d42e30c72d4ce098 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 24 Aug 2022 11:24:22 +0200
Subject: [PATCH 2/3] Fix whitespace in output of gen_export.pl

---
 src/tools/gen_export.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/tools/gen_export.pl b/src/tools/gen_export.pl
index 6a8b196ad8..1265564473 100644
--- a/src/tools/gen_export.pl
+++ b/src/tools/gen_export.pl
@@ -56,7 +56,7 @@
}
elsif ($format eq 'darwin')
{
-   print $output_handle "_$1\n";
+   print $output_handle "_$1\n";
}
elsif ($format eq 'gnu')
{
-- 
2.37.1

From 5b79baf6bbf4fa62f153a5e96e97f8d5a6345821 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 24 Aug 2022 11:24:36 +0200
Subject: [PATCH 3/3] Some Perl code simplification in gen_export.pl

---
 src/tools/gen_export.pl | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/src/tools/gen_export.pl b/src/tools/gen_export.pl
index 1265564473..727105ba08 100644
--- a/src/tools/gen_export.pl
+++ b/src/tools/gen_export.pl
@@ -48,7 +48,7 @@
{
# don't do anything with a comment
}
-   elsif (/^([^\s]+)\s+([^\s]+)/)
+   elsif (/^(\S+)\s+(\S+)/)
{
if ($format eq 'aix')
{
@@ -79,5 +79,3 @@
 };
 ";
 }
-
-exit(0);
-- 
2.37.1


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-08-24 Thread Peter Smith
Here are some review comments for the patch v8-0001:

==

1. Commit message

1a.
Majority of the logic on the subscriber side has already existed in the code.

SUGGESTION
The majority of the logic on the subscriber side already exists in the code.

~

1b.
Second, when REPLICA IDENTITY IS FULL on the publisher and an index is
used on the subscriber...

SUGGESTION
Second, when REPLICA IDENTITY FULL is on the publisher and an index is
used on the subscriber...

~

1c.
Still, below I try to show case the potential improvements using an
index on the subscriber
`pgbench_accounts(bid)`. With the index, the replication catches up
around ~5 seconds.
When the index is dropped, the replication takes around ~300 seconds.

"show case" -> "showcase"

~

1d.
In the above text, what was meant by "catches up around ~5 seconds"?
e.g. Did it mean *improves* by ~5 seconds, or *takes* ~5 seconds?

~

1e.
// create one indxe, even on a low cardinality column

typo "indxe"

==

2. GENERAL

2a.
There are lots of single-line comments that start lowercase, but by
convention, I think they should start uppercase.

e.g. + /* we should always use at least one attribute for the index scan */
e.g. + /* we might not need this if the index is unique */
e.g. + /* avoid expensive equality check if index is unique */
e.g. + /* unrelated Path, skip */
e.g. + /* simple case, we already have an identity or pkey */
e.g. + /* indexscans are disabled, use seq. scan */
e.g. + /* target is a regular table */

~~

2b.
There are some excess blank lines between the functions. By convention,
I think 1 blank line is normal, but here there are sometimes 2.

~~

2c.
There are some new function comments which include their function name
in the comment. This seems unnecessary.

e.g. GetCheapestReplicaIdentityFullPath
e.g. FindUsableIndexForReplicaIdentityFull
e.g. LogicalRepUsableIndex

==

3. src/backend/executor/execReplication.c - build_replindex_scan_key

- int attoff;
+ int index_attoff;
+ int scankey_attoff;
  bool isnull;
  Datum indclassDatum;
  oidvector  *opclass;
  int2vector *indkey = &idxrel->rd_index->indkey;
- bool hasnulls = false;
-
- Assert(RelationGetReplicaIndex(rel) == RelationGetRelid(idxrel) ||
-RelationGetPrimaryKeyIndex(rel) == RelationGetRelid(idxrel));

  indclassDatum = SysCacheGetAttr(INDEXRELID, idxrel->rd_indextuple,
  Anum_pg_index_indclass, &isnull);
  Assert(!isnull);
  opclass = (oidvector *) DatumGetPointer(indclassDatum);
+ scankey_attoff = 0;

Maybe just assign scankey_attoff = 0 at the declaration?
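
i.e. just a sketch of what I mean:

	int			scankey_attoff = 0;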

~~~

4.

+ /*
+ * There are two cases to consider. First, if the index is a primary or
+ * unique key, we cannot have any indexes with expressions. So, at this
+ * point we are sure that the index we deal is not these.
+ */

"we deal" -> "we are dealing with" ?

~~~

5.

+ /*
+ * For a non-primary/unique index with an additional expression, do
+ * not have to continue at this point. However, the below code
+ * assumes the index scan is only done for simple column references.
+ */
+ continue;

Is this one of those comments that ought to have an "XXX" prefix as a
note for the future?

~~~

6.

- int pkattno = attoff + 1;
...
  /* Initialize the scankey. */
- ScanKeyInit(&skey[attoff],
- pkattno,
+ ScanKeyInit(&skey[scankey_attoff],
+ index_attoff + 1,
  BTEqualStrategyNumber,
Wondering if it would have been simpler if you just did:
int pkattno = index_attoff + 1;

~~~

7.

- skey[attoff].sk_flags |= SK_ISNULL;
+ skey[scankey_attoff].sk_flags |= SK_ISNULL;
+ skey[scankey_attoff].sk_flags |= SK_SEARCHNULL;

SUGGESTION
skey[scankey_attoff].sk_flags |= (SK_ISNULL | SK_SEARCHNULL);

~~~

8. src/backend/executor/execReplication.c - RelationFindReplTupleByIndex

@@ -128,28 +171,44 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
  TransactionId xwait;
  Relation idxrel;
  bool found;
+ TypeCacheEntry **eq;
+ bool indisunique;
+ int scankey_attoff;

  /* Open the index. */
  idxrel = index_open(idxoid, RowExclusiveLock);
+ indisunique = idxrel->rd_index->indisunique;
+
+ /* we might not need this if the index is unique */
+ eq = NULL;

Maybe just default assign eq = NULL in the declaration?
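
i.e. a sketch:

	TypeCacheEntry **eq = NULL;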

~~~

9.

+ scan = index_beginscan(rel, idxrel, &snap,
+scankey_attoff, 0);

Unnecessary wrapping?
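
i.e. this seems to fit on one line:

	scan = index_beginscan(rel, idxrel, &snap, scankey_attoff, 0);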

~~~

10.

+ /* we only need to allocate once */
+ if (eq == NULL)
+ eq = palloc0(sizeof(*eq) * outslot->tts_tupleDescriptor->natts);

But shouldn't you also free this 'eq' before the function returns, to
prevent leaking memory?
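
e.g. perhaps something like this near the end of the function (a rough,
untested sketch, assuming the function still ends with the usual
endscan/close/return sequence):

	/* Free the equality-check array allocated for the non-unique case */
	if (eq != NULL)
		pfree(eq);

	index_endscan(scan);

	/* Don't release lock until commit. */
	index_close(idxrel, NoLock);

	return found;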

==

11. src/backend/replication/logical/relation.c - logicalrep_rel_open

+ /*
+ * Finding a usable index is an infrequent operation, it is performed
+ * only when first time an operation is performed on the relation or
+ * after invalidation of the relation cache entry (e.g., such as ANALYZE).
+ */

SUGGESTION (minor rewording)
Finding a usable index is an infrequent task. It is performed only
when an operation is first performed on the relation, or after
invalidation of the relation cache entry (e.g., ANALYZE).

Re: ICU for global collation

2022-08-24 Thread Julien Rouhaud
On Wed, Aug 24, 2022 at 01:38:44PM +0900, Michael Paquier wrote:
> On Tue, Aug 23, 2022 at 08:59:02PM +0300, Marina Polyakova wrote:
> > My colleague Andrew Bille found another bug in master
> > (b4e936859dc441102eb0b6fb7a104f3948c90490) and REL_15_STABLE
> > (2c63b0930aee1bb5c265fad4a65c9d0b62b1f9da): pg_collation.colliculocale is
> > not dumped. See check_icu_locale.sh:
> > 
> > In the old cluster:
> > SELECT collname, colliculocale FROM pg_collation WHERE collname =
> > 'testcoll_backwards'
> >   collname  |   colliculocale
> > +---
> >  testcoll_backwards | @colBackwards=yes
> > (1 row)
> > 
> > In the new cluster:
> > SELECT collname, colliculocale FROM pg_collation WHERE collname =
> > 'testcoll_backwards'
> >   collname  | colliculocale
> > +---
> >  testcoll_backwards |
> > (1 row)
> > 
> > diff_dump_colliculocale.patch works for me.
> 
> Ugh.  Good catch, again!

+1

> I have not tested the patch in detail, but
> this looks rather sane to me on a quick read.  Peter?

Patch looks good to me too.




Re: standby promotion can create unreadable WAL

2022-08-24 Thread Kyotaro Horiguchi
Nice find!

At Wed, 24 Aug 2022 11:09:44 +0530, Dilip Kumar  wrote in 
> On Tue, Aug 23, 2022 at 12:06 AM Robert Haas  wrote:
> 
> > Nothing that uses xlogreader is going to be able to bridge the gap
> > between file #4 and file #5. In this case it doesn't matter very much,
> > because we immediately write a checkpoint record into file #5, so if
> > we crash we won't try to replay file #4 anyway. However, if anything
> > did try to look at file #4 it would get confused. Maybe that can
> > happen if this is a streaming standby, where we only write an
> > end-of-recovery record upon promotion, rather than a checkpoint, or
> > maybe if there are cascading standbys someone could try to actually
> > use the 00020004 file for something. I'm not sure. But
> > unless I'm missing something, that file is bogus, and our only hope of
> > not having problems is that perhaps no one will ever look at it.
> 
> Yeah, this analysis looks correct to me.

(I didn't reproduce the case but understand what is happening.)

Me, too.  There are two ways to deal with this, I think. One is to
start writing new records from abortedContRecPtr as if it did not
exist. The other is to copy the WAL file up to missingContRecPtr.
Since the first segment of the new timeline doesn't need to be
identical to the last one of the previous timeline, I think the former
way is cleaner.  XLogInitNewTimeline or somewhere near it seems to me
to be the place for the fix. Would clearing abortedRecPtr and
missingContrecPtr just before the call to findNewestTimeLine work?


diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 87b243e0d4..27e01153e7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5396,6 +5396,13 @@ StartupXLOG(void)
 */
XLogInitNewTimeline(EndOfLogTLI, EndOfLog, newTLI);
 
+   /*
+* EndOfLog doesn't cover an aborted contrecord even if the last
+* record was one; in that case the next timeline starts writing
+* from there.  Forget about aborted and missing contrecords, if any.
+*/
+   abortedRecPtr = missingContrecPtr = InvalidXLogRecPtr;
+
/*
 * Remove the signal files out of the way, so that we don't
 * accidentally re-enter archive recovery mode in a subsequent crash.


> > I think that the cause of this problem is this code right here:
> >
> > /*
> >  * Actually, if WAL ended in an incomplete record, skip the parts that
> >  * made it through and start writing after the portion that persisted.
> >  * (It's critical to first write an OVERWRITE_CONTRECORD message, which
> >  * we'll do as soon as we're open for writing new WAL.)
> >  */
> > if (!XLogRecPtrIsInvalid(missingContrecPtr))
> > {
> > Assert(!XLogRecPtrIsInvalid(abortedRecPtr));
> > EndOfLog = missingContrecPtr;
> > }
> 
> Yeah, this statement as well as another statement that creates the
> overwrite contrecord.  After changing these two lines the problem is
> fixed for me, although I haven't yet thought through all the
> scenarios to confirm it is safe in every case.  I agree that after
> the timeline changes we are pointing to the end of the last valid
> record, and we can start writing the next record from that point
> onward.  But I think we need to think hard about whether it will
> break any case for which the overwrite contrecord was actually
> introduced.
> 
> diff --git a/src/backend/access/transam/xlog.c
> b/src/backend/access/transam/xlog.c
> index 7602fc8..3d38613 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -5491,7 +5491,7 @@ StartupXLOG(void)
>  * (It's critical to first write an OVERWRITE_CONTRECORD message, which
>  * we'll do as soon as we're open for writing new WAL.)
>  */
> -   if (!XLogRecPtrIsInvalid(missingContrecPtr))
> +   if (newTLI == endOfRecoveryInfo->lastRecTLI &&
> +       !XLogRecPtrIsInvalid(missingContrecPtr))
> {
> Assert(!XLogRecPtrIsInvalid(abortedRecPtr));
> EndOfLog = missingContrecPtr;
> @@ -5589,7 +5589,7 @@ StartupXLOG(void)
> LocalSetXLogInsertAllowed();
> 
> /* If necessary, write overwrite-contrecord before doing
> anything else */
> -   if (!XLogRecPtrIsInvalid(abortedRecPtr))
> +   if (newTLI == endOfRecoveryInfo->lastRecTLI &&
> +       !XLogRecPtrIsInvalid(abortedRecPtr))
> {
> Assert(!XLogRecPtrIsInvalid(missingContrecPtr));
> CreateOverwriteContrecordRecord(abortedRecPtr, missingContrecPtr, newTLI);

This also seems to work, of course.

# However, I haven't managed to reproduce that, yet...

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: archive modules

2022-08-24 Thread talk to ben
Nathan Bossart  writes:
> On one hand, it seems like folks will commonly encounter this behavior
> with this module, so this feels like a natural place for such a note.

Yes, I looked there first.

Would this addition to the pg_settings description be better?
From 5346a8a0451e222e6592baacb994e6a0f884898d Mon Sep 17 00:00:00 2001
From: benoit 
Date: Mon, 22 Aug 2022 12:00:46 +0200
Subject: [PATCH] basic_archive parameter visibility doc patch

Module parameters are only visible from the pg_settings view once
the module is loaded. Since an archive module is loaded by the archiver
process, the parameters are never visible from the view. This patch
adds a note about this in the pg_settings system view documentation.
---
 doc/src/sgml/system-views.sgml | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 9728039e71..929838dfa8 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3275,7 +3275,11 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
   
This view does not display customized options
-   until the extension module that defines them has been loaded.
+   until the extension module that defines them has been loaded. Therefore, any
+   option defined in a library that is dynamically loaded in a separate process
+   will not be visible in the view, unless the module is manually loaded
+   beforehand. This applies, for example, to an archive module loaded by
+   the archiver process.
   
 
   
-- 
2.37.1



Re: Schema variables - new implementation for Postgres 15

2022-08-24 Thread Erik Rijkers

On 24-08-2022 at 08:37, Pavel Stehule wrote:

> I fixed these.
>
> [v20220824-1-*.patch]

Hi Pavel,

I noticed just now that variable assignment (i.e., LET) unexpectedly 
(for me anyway) casts the type of the input value. Surely that's wrong? 
The documentation says clearly enough:


'The result must be of the same data type as the session variable.'


Example:

create variable x integer;
let x=1.5;
select x, pg_typeof(x);
 x | pg_typeof
---+---
 2 | integer
(1 row)


Is this correct?

If such casts (there are several) are intended, then the text of the 
documentation should be changed.


Thanks,

Erik





Add semi-join pushdown to postgres_fdw

2022-08-24 Thread Alexander Pyhalov

Hi.

It's possible to extend deparsing in postgres_fdw so that we can push
down semi-joins that don't refer to the inner reltarget. This allows
us to push down joins in queries like

SELECT * FROM ft1 t1 WHERE t1.c1 < 10 AND t1.c3 IN (SELECT c3 FROM ft2 
t2 WHERE date(c5) = '1970-01-17'::date);



EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE t1.c1 < 10 AND 
t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE date(c5) = '1970-01-17'::date);
 
                                QUERY PLAN
---------------------------------------------------------------------------
 Foreign Scan
   Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
   Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2)
   Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, 
r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 10)) AND (EXISTS 
(SELECT NULL FROM "S 1"."T 1" r3 WHERE ((date(r3.c5) = 
'1970-01-17'::date)) AND ((r1.c3 = r3.c3))))


Deparsing semi-joins leads to generating (text) conditions like 
'EXISTS (SELECT NULL FROM inner_rel WHERE join_conds)'. Such 
conditions are generated in deparseFromExprForRel() and distributed to 
the nearest WHERE clause, where they are added to the list of AND 
clauses.


--
Best regards,
Alexander Pyhalov,
Postgres Professional

From 7833d67f69287648c4594a5508feed376427f95d Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov 
Date: Fri, 12 Aug 2022 15:02:24 +0300
Subject: [PATCH] postgres_fdw: add support for deparsing semi joins

We deparse semi-joins as EXISTS subqueries. So, deparsing a
semi-join generates an addl_conds condition, which is then added
to the uppermost JOIN's WHERE clause.
---
 contrib/postgres_fdw/deparse.c| 198 +---
 .../postgres_fdw/expected/postgres_fdw.out| 297 --
 contrib/postgres_fdw/postgres_fdw.c   |  78 -
 contrib/postgres_fdw/postgres_fdw.h   |   3 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql | 119 ++-
 5 files changed, 613 insertions(+), 82 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index a9766f97346..fcdc679d51f 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -179,12 +179,13 @@ static void appendLimitClause(deparse_expr_cxt *context);
 static void appendConditions(List *exprs, deparse_expr_cxt *context);
 static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root,
   RelOptInfo *foreignrel, bool use_alias,
-  Index ignore_rel, List **ignore_conds,
+  Index ignore_rel, List **ignore_conds, StringInfo addl_conds,
   List **params_list);
+static void appendWhereClause(List *exprs, StringInfo addl_conds, deparse_expr_cxt *context);
 static void deparseFromExpr(List *quals, deparse_expr_cxt *context);
 static void deparseRangeTblRef(StringInfo buf, PlannerInfo *root,
 			   RelOptInfo *foreignrel, bool make_subquery,
-			   Index ignore_rel, List **ignore_conds, List **params_list);
+			   Index ignore_rel, List **ignore_conds, StringInfo addl_conds, List **params_list);
 static void deparseAggref(Aggref *node, deparse_expr_cxt *context);
 static void appendGroupByClause(List *tlist, deparse_expr_cxt *context);
 static void appendOrderBySuffix(Oid sortop, Oid sortcoltype, bool nulls_first,
@@ -1372,23 +1373,20 @@ deparseFromExpr(List *quals, deparse_expr_cxt *context)
 {
 	StringInfo	buf = context->buf;
 	RelOptInfo *scanrel = context->scanrel;
+	StringInfoData addl_conds;
 
 	/* For upper relations, scanrel must be either a joinrel or a baserel */
 	Assert(!IS_UPPER_REL(context->foreignrel) ||
 		   IS_JOIN_REL(scanrel) || IS_SIMPLE_REL(scanrel));
 
+	initStringInfo(&addl_conds);
 	/* Construct FROM clause */
 	appendStringInfoString(buf, " FROM ");
 	deparseFromExprForRel(buf, context->root, scanrel,
 		  (bms_membership(scanrel->relids) == BMS_MULTIPLE),
-		  (Index) 0, NULL, context->params_list);
-
-	/* Construct WHERE clause */
-	if (quals != NIL)
-	{
-		appendStringInfoString(buf, " WHERE ");
-		appendConditions(quals, context);
-	}
+		  (Index) 0, NULL, &addl_conds, context->params_list);
+	appendWhereClause(quals, &addl_conds, context);
+	pfree(addl_conds.data);
 }
 
 /*
@@ -1600,6 +1598,33 @@ appendConditions(List *exprs, deparse_expr_cxt *context)
 	reset_transmission_modes(nestlevel);
 }
 
+/*
+ * Append WHERE clause, containing conditions
+ * from exprs and addl_conds, to context->buf.
+ */
+static void
+appendWhereClause(List *exprs, StringInfo addl_conds, deparse_expr_cxt *context)
+{
+	StringInfo	buf = context->buf;
+	bool		need_and = false;
+
+	if (exprs != NIL || addl_conds->len > 0)
+		appendStringInfoString(buf, " WHERE ");
+
+	if (exprs != NIL)
+	{
+		appendConditions(exprs, context);
+