Strip -mmacosx-version-min options from plperl build

2022-08-17 Thread Peter Eisentraut
When building on macOS against a Homebrew-provided Perl installation, I 
get these warnings during the build:


ld: warning: object file (SPI.o) was built for newer macOS version 
(12.4) than being linked (11.3)
ld: warning: object file (plperl.o) was built for newer macOS version 
(12.4) than being linked (11.3)

...

This is because the link command uses the option 
-mmacosx-version-min=11.3, which comes in from perl_embed_ldflags (perl 
-MExtUtils::Embed -e ldopts), while the compile commands don't use that 
option, a mismatch that ld considers inconsistent.


I think an appropriate fix is to strip out the undesired option from 
perl_embed_ldflags.  We already do that for other options.  Proposed 
patch attached.

From e50c439f2fced8d1b3af9b3bc142f8d43a781c5a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 18 Aug 2022 08:24:03 +0200
Subject: [PATCH] Strip -mmacosx-version-min options from plperl build

---
 config/perl.m4 | 2 +-
 configure  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/config/perl.m4 b/config/perl.m4
index c823fc8cf0..65f338bda7 100644
--- a/config/perl.m4
+++ b/config/perl.m4
@@ -100,7 +100,7 @@ if test "$PORTNAME" = "win32" ; then
 else
pgac_tmp1=`$PERL -MExtUtils::Embed -e ldopts`
pgac_tmp2=`$PERL -MConfig -e 'print $Config{ccdlflags}'`
-   perl_embed_ldflags=`echo X"$pgac_tmp1" | sed -e "s/^X//" -e "s%$pgac_tmp2%%" -e ["s/ -arch [-a-zA-Z0-9_]*//g"]`
+   perl_embed_ldflags=`echo X"$pgac_tmp1" | sed -e "s/^X//" -e "s%$pgac_tmp2%%" -e ["s/ -arch [-a-zA-Z0-9_]*//g"] -e ["s/ -mmacosx-version-min=[0-9.]*//g"]`
 fi
 AC_SUBST(perl_embed_ldflags)dnl
 if test -z "$perl_embed_ldflags" ; then
diff --git a/configure b/configure
index 176e0f9b00..8e0f648702 100755
--- a/configure
+++ b/configure
@@ -10479,7 +10479,7 @@ if test "$PORTNAME" = "win32" ; then
 else
pgac_tmp1=`$PERL -MExtUtils::Embed -e ldopts`
pgac_tmp2=`$PERL -MConfig -e 'print $Config{ccdlflags}'`
-   perl_embed_ldflags=`echo X"$pgac_tmp1" | sed -e "s/^X//" -e "s%$pgac_tmp2%%" -e "s/ -arch [-a-zA-Z0-9_]*//g"`
+   perl_embed_ldflags=`echo X"$pgac_tmp1" | sed -e "s/^X//" -e "s%$pgac_tmp2%%" -e "s/ -arch [-a-zA-Z0-9_]*//g" -e "s/ -mmacosx-version-min=[0-9.]*//g"`
 fi
 if test -z "$perl_embed_ldflags" ; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
-- 
2.37.1



Re: Perform streaming logical transactions by background workers and parallel apply

2022-08-17 Thread Peter Smith
Hi Wang-san,

FYI, I also checked the latest patch v23-0001 but found that the
v21-0001/v23-0001 differences are minimal, so all my v21* review
comments are still applicable to patch v23-0001.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Compressed pluggable storage experiments

2022-08-17 Thread Natarajan R
Hi all. This is a continuation of the above thread...

>> > 4. In order to use WAL-logging, each page must start with a standard 24
>> > byte PageHeaderData even if it is needless for storage itself. Not a
>> > big deal though. Another (actually documented) WAL-related limitation
>> > is that only generic WAL can be used within an extension. So unless
>> > inserts are made in bulk it's going to require a lot of disk space to
>> > accommodate logs and wide bandwidth for replication.
>>
>> Not sure what to suggest.  Either you should ignore this problem, or
>> you should fix it.

I am working on an environment similar to the above extension (pg_cryogen,
which experiments with pluggable storage APIs) but don't have much knowledge of
PG's logical replication.
Please suggest some approaches to support logical replication for a
table with a custom access method that writes generic WAL records.
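
For context, the generic WAL API that keeps coming up below works roughly
like this (a minimal sketch of an insert path for a hypothetical custom AM;
the function and its callers are illustrative, while the GenericXLog* calls
are the real API from access/generic_xlog.h):

#include "postgres.h"
#include "access/generic_xlog.h"
#include "storage/bufmgr.h"
#include "storage/bufpage.h"
#include "utils/rel.h"

/* Hypothetical insert path; the caller holds an exclusive lock on buf. */
static void
custom_am_insert_logged(Relation rel, Buffer buf, Item item, Size itemsz)
{
	GenericXLogState *state;
	Page		page;

	/* Start a generic WAL record for this relation */
	state = GenericXLogStart(rel);

	/* Get a modifiable image of the page; deltas are computed at finish */
	page = GenericXLogRegisterBuffer(state, buf, 0);

	if (PageAddItem(page, item, itemsz, InvalidOffsetNumber,
					false, false) == InvalidOffsetNumber)
		elog(ERROR, "failed to add item to page");

	/* Emits the page delta to WAL and marks the buffer dirty */
	GenericXLogFinish(state);
}

Because each record carries per-page deltas, batching many inserts into one
generic record per page is what keeps the WAL volume down, which is exactly
the limitation discussed below.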

On Wed, 17 Aug 2022 at 19:04, Tomas Vondra 
wrote:

> On Fri, Oct 18, 2019 at 03:25:05AM -0700, Andres Freund wrote:
> >Hi,
> >
> >On 2019-10-17 12:47:47 -0300, Alvaro Herrera wrote:
> >> On 2019-Oct-10, Ildar Musin wrote:
> >>
> >> > 1. Unlike FDW API, in pluggable storage API there are no routines like
> >> > "begin modify table" and "end modify table" and there is no shared
> >> > state between insert/update/delete calls.
> >>
> >> Hmm.  I think adding a begin/end to modifytable is a reasonable thing to
> >> do (it'd be a no-op for heap and zheap I guess).
> >
> >I'm fairly strongly against that. Adding two additional "virtual"
> >function calls for something that's rarely going to be used, seems like
> >adding too much overhead to me.
> >
>
> That seems a bit strange to me. Sure - if there's an alternative way to
> achieve the desired behavior (clear way to finalize writes etc.), then
> cool, let's do that. But forcing people to use inconvenient workarounds
> seems like a bad thing to me - having a convenient and clear API is
> quite valuable, IMHO.
>
> Let's see if this actually has a measurable overhead first.
>
> >
> >> > 2. It looks like I cannot implement custom storage options. E.g. for
> >> > compressed storage it makes sense to implement different compression
> >> > methods (lz4, zstd etc.) and corresponding options (like compression
> >> > level). But as I can see storage options (like fillfactor etc) are
> >> > hardcoded and are not extensible. Possible solution is to use GUCs
> >> > which would work but is not extremely convenient.
> >>
> >> Yeah, the reloptions module is undergoing some changes.  I expect that
> >> there will be a way to extend reloptions from an extension, at the end
> >> of that set of patches.
> >
> >Cool.
> >
>
> Yep.
>
> >
> >> > 3. A bit surprising limitation that in order to use bitmap scan the
> >> > maximum number of tuples per page must not exceed 291 due to
> >> > MAX_TUPLES_PER_PAGE macro in tidbitmap.c which is calculated based on
> >> > 8kb page size. In case of 1mb page this restriction feels really
> >> > limiting.
> >>
> >> I suppose this is a hardcoded limit that needs to be fixed by patching
> >> core as we make table AM more pervasive.
> >
> >That's not unproblematic - a dynamic limit would make a number of
> >computations more expensive, and we already spend plenty CPU cycles
> >building the tid bitmap. And we'd waste plenty of memory just having all
> >that space for the worst case.  ISTM that we "just" need to replace the
> >TID bitmap with some tree like structure.
> >
>
> I think the zedstore has roughly the same problem, and Heikki mentioned
> some possible solutions to dealing with it in his pgconfeu talk (and it
> was discussed in the zedstore thread, I think).
>
> >
> >> > 4. In order to use WAL-logging, each page must start with a standard 24
> >> > byte PageHeaderData even if it is needless for storage itself. Not a
> >> > big deal though. Another (actually documented) WAL-related limitation
> >> > is that only generic WAL can be used within an extension. So unless
> >> > inserts are made in bulk it's going to require a lot of disk space to
> >> > accommodate logs and wide bandwidth for replication.
> >>
> >> Not sure what to suggest.  Either you should ignore this problem, or
> >> you should fix it.
> >
> >I think if it becomes a problem you should ask for an rmgr ID to use for
> >your extension, which we encode and then allow to set the relevant
> >rmgr callbacks for that rmgr id at startup.  But you should obviously
> >first develop the WAL logging etc, and make sure it's beneficial over
> >generic wal logging for your case.
> >
>
> AFAIK compressed/columnar engines generally implement two types of
> storage - write-optimized store (WOS) and read-optimized store (ROS),
> where the WOS is mostly just an uncompressed append-only buffer, and ROS
> is compressed etc. ISTM the WOS would benefit from a more elaborate WAL
> logging, but ROS should be mostly fine with the generic WAL logging.
>
> But yeah, we should test and measure how beneficial that actually is.

Re: Perform streaming logical transactions by background workers and parallel apply

2022-08-17 Thread Peter Smith
Here are my review comments for patch v21-0001:

Note - There are some "general" comments which will result in lots of
smaller changes. The subsequent "detailed" review comments have some
overlap with these general comments, but I expect some will be missed,
so please search/replace to fix all code related to those general
comments.

==

1. GENERAL - main_worker_pid and replorigin_session_setup

Quite a few of my subsequent review comments below are related to the
somewhat tricky (IMO) change to the code for this area. Here is a
summary of some things that can be done to clean/simplify this logic.

1a.
Make the existing replorigin_session_setup function just be a wrapper
that delegates to the other function, passing acquired_by as 0.
This is because in every case but one (in the apply bg worker main) we
are always passing 0, and IMO there is no need to spread the messy
extra param to places that do not use it.

1b.
'main_worker_pid' is a confusing member name given the way it gets
used - e.g. not even set when you actually *are* the main apply
worker? You can still keep all the same logic, but just change the
name to something more like 'apply_leader_pid' - then the code can
make sense because the main apply workers have no "apply leader" but
the apply background workers do.

1c.
IMO it will be much better to use pid_t and InvalidPid for the type
and the unset values of this member.

1d.
The checks/Asserts for main_worker_pid are confusing to read (e.g.
Assert(worker->main_worker_pid != 0) means the worker is an apply
background worker). IMO there should be convenient macros for these -
then the code can be readable again.
e.g.
#define isApplyMainWorker(worker) (worker->apply_leader_pid == InvalidPid)
#define isApplyBgWorker(worker) (worker->apply_leader_pid != InvalidPid)
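
Putting 1b-1d together, the worker struct and checks might end up looking
like this (a sketch; the names are my suggestions, not the current patch):

#include "miscadmin.h"		/* for InvalidPid */

typedef struct LogicalRepWorker
{
	/* ... existing members ... */

	/* pid of the leader apply worker; InvalidPid for the leader itself */
	pid_t		apply_leader_pid;
} LogicalRepWorker;

#define isApplyMainWorker(worker)	((worker)->apply_leader_pid == InvalidPid)
#define isApplyBgWorker(worker)		((worker)->apply_leader_pid != InvalidPid)

(Parenthesizing the macro argument also avoids surprises if a non-trivial
expression is ever passed.)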

==

2. GENERAL - ApplyBgworkerInfo

I like that the struct ApplyBgworkerState was renamed to the more
appropriate name ApplyBgworkerInfo. But now all the old variable names
(e.g. 'wstate') and parameters must be updated as well. Please
search/replace them all in code and comments.

e.g.
ApplyBgworkerInfo *wstate

should now be something like:
ApplyBgworkerInfo *winfo;

==

3. GENERAL - ApplyBgWorkerStatus --> ApplyBgworkerState

IMO the enum should be changed to ApplyBgWorkerState because the
values all represent the discrete state that the bgworker is at. See
the top StackOverflow answer here [1] which is the same as the point I
am trying to make with this comment.

This is a simple mechanical renaming exercise to improve readability,
but it will impact lots of variables, parameters, function names, and
comments. Please search/replace to get them all.

==

4. Commit message

In addition, the patch extends the logical replication STREAM_ABORT message so
that abort_time and abort_lsn can also be sent which can be used to update the
replication origin in apply background worker when the streaming transaction is
aborted.

4a.
Should this para also mention something about the introduction of
protocol version 4?

4b.
Should this para also mention that these extensions are not strictly
mandatory for the parallel streaming to still work?

==

5. doc/src/sgml/catalogs.sgml

   
-   If true, the subscription will allow streaming of in-progress
-   transactions
+   Controls how to handle the streaming of in-progress transactions:
+   f = disallow streaming of in-progress transactions,
+   t = spill the changes of in-progress transactions to
+   disk and apply at once after the transaction is committed on the
+   publisher,
+   p = apply changes directly using a background
+   worker if available(same as 't' if no worker is available)
   

Missing whitespace before '('

==

6. doc/src/sgml/logical-replication.sgml

@@ -1334,7 +1344,8 @@ CONTEXT:  processing remote data for replication
origin "pg_16395" during "INSER
subscription.  A disabled subscription or a crashed subscription will have
zero rows in this view.  If the initial data synchronization of any
table is in progress, there will be additional workers for the tables
-   being synchronized.
+   being synchronized. Moreover, if the streaming transaction is applied
+   parallelly, there will be additional workers.
   

"applied parallelly" sounds a bit strange.

SUGGESTION-1
Moreover, if the streaming transaction is applied in parallel, there
will be additional workers.

SUGGESTION-2
Moreover, if the streaming transaction is applied using 'parallel'
mode, there will be additional workers.

==

7. doc/src/sgml/protocol.sgml

@@ -3106,6 +3106,11 @@ psql "dbname=postgres replication=database" -c
"IDENTIFY_SYSTEM;"
Version 3 is supported only for server version 15
and above, and it allows streaming of two-phase commits.
   
+  
+   Version 4 is supported only for server version 16
+   and above, and it allows applying stream of large in-progress
+   transactions in parallel.
+  

Re: Cleaning up historical portability baggage

2022-08-17 Thread Thomas Munro
On Mon, Aug 15, 2022 at 5:53 PM Thomas Munro  wrote:
>   Remove configure probe for IPv6.
>   Remove dead ifaddrs.c fallback code.
>   Remove configure probe for net/if.h.
>   Fix macro problem with gai_strerror on Windows.
>   Remove configure probe for netinet/tcp.h.
>   mstcpip.h is not missing on MinGW.

I pushed these except one, plus one more about  which
turned out to be not needed after a bit of archeology.

Here's a slightly better AF_INET6 one.  I'm planning to push it
tomorrow if there are no objections.  It does something a little more
aggressive than the preceding stuff, because SUSv3 says that IPv6 is
an "option".  I don't see that as an issue: it also says that various
other ubiquitous stuff we're using is optional.  Of course, it would
be absurd for a new socket implementation to appear today that can't
talk to a decent chunk of the internet, and all we require here is the
headers.  That optionality was relevant for the transition period a
couple of decades ago.
From f162a15a6d723f8c94d9daa6236149e1f39b0d9a Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Thu, 18 Aug 2022 11:55:10 +1200
Subject: [PATCH] Remove configure probe for sockaddr_in6 and require AF_INET6.

SUSv3  defines struct sockaddr_in6, and all targeted Unix
systems have it.  Windows has it in .  Remove the configure
probe, the macro and a small amount of dead code.

Also remove a mention of IPv6-less builds from the documentation, since
there aren't any.

This is similar to commits f5580882 and 077bf2f2 for Unix sockets.
Even though AF_INET6 is an "optional" component of SUSv3, there are no
known modern operating systems without it, and it seems even less likely
to be omitted from future systems than AF_UNIX.

Discussion: https://postgr.es/m/ca+hukgkernfhmvb_h0upremp4lpzgn06yr2_0tyikjzb-2e...@mail.gmail.com
---
 configure   | 10 --
 configure.ac|  6 --
 doc/src/sgml/client-auth.sgml   |  2 --
 src/backend/libpq/auth.c| 21 -
 src/backend/libpq/hba.c |  5 -
 src/backend/libpq/ifaddr.c  | 18 +-
 src/backend/libpq/pqcomm.c  |  2 --
 src/backend/utils/adt/network.c | 10 --
 src/backend/utils/adt/pgstatfuncs.c | 11 ++-
 src/bin/initdb/initdb.c | 10 --
 src/include/pg_config.h.in  |  3 ---
 src/include/utils/inet.h|  9 -
 src/interfaces/libpq/fe-connect.c   |  2 --
 src/port/inet_net_ntop.c|  5 ++---
 src/tools/ifaddrs/test_ifaddrs.c|  2 --
 src/tools/msvc/Solution.pm  |  1 -
 16 files changed, 9 insertions(+), 108 deletions(-)

diff --git a/configure b/configure
index b7fd6c5f4e..c275330114 100755
--- a/configure
+++ b/configure
@@ -16279,16 +16279,6 @@ cat >>confdefs.h <<_ACEOF
 _ACEOF
 
 
-ac_fn_c_check_type "$LINENO" "struct sockaddr_in6" "ac_cv_type_struct_sockaddr_in6" "$ac_includes_default
-#include 
-"
-if test "x$ac_cv_type_struct_sockaddr_in6" = xyes; then :
-
-$as_echo "#define HAVE_IPV6 1" >>confdefs.h
-
-fi
-
-
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for PS_STRINGS" >&5
 $as_echo_n "checking for PS_STRINGS... " >&6; }
 if ${pgac_cv_var_PS_STRINGS+:} false; then :
diff --git a/configure.ac b/configure.ac
index e5740f4fb5..a97a48a508 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1801,12 +1801,6 @@ AC_CHECK_DECLS([pwritev], [], [AC_LIBOBJ(pwritev)], [#include ])
 # This is probably only present on macOS, but may as well check always
 AC_CHECK_DECLS(F_FULLFSYNC, [], [], [#include ])
 
-AC_CHECK_TYPE([struct sockaddr_in6],
-[AC_DEFINE(HAVE_IPV6, 1, [Define to 1 if you have support for IPv6.])],
-[],
-[$ac_includes_default
-#include ])
-
 AC_CACHE_CHECK([for PS_STRINGS], [pgac_cv_var_PS_STRINGS],
 [AC_LINK_IFELSE([AC_LANG_PROGRAM(
 [#include 
diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index 433759928b..c6f1b70fd3 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -305,8 +305,6 @@ hostnogssenc  database  user
 
   
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index 1545ff9f16..71677b69d8 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -3014,13 +3014,8 @@ PerformRadiusTransaction(const char *server, const char *secret, const char *por
 	int			packetlength;
 	pgsocket	sock;
 
-#ifdef HAVE_IPV6
 	struct sockaddr_in6 localaddr;
 	struct sockaddr_in6 remoteaddr;
-#else
-	struct sockaddr_in localaddr;
-	struct sockaddr_in remoteaddr;
-#endif
 	struct addrinfo hint;
 	struct addrinfo *serveraddrs;
 	int			port;
@@ -3130,18 +3125,12 @@ PerformRadiusTransaction(const char *server, const char *secret, const char *por
 	}
 
 	memset(&localaddr, 0, sizeof(localaddr));
-#ifdef HAVE_IPV6
 	localaddr.sin6_family = serveraddrs[0].ai_family;
 	localaddr.sin6_addr = in6addr_any;
 	if (localaddr.sin6_family == AF_INET6)
 		addrsize = sizeof(struct sockaddr_in6);
 	else
 		ad

Re: shadow variables - pg15 edition

2022-08-17 Thread Justin Pryzby
On Thu, Aug 18, 2022 at 03:17:33PM +1200, David Rowley wrote:
> I'm probably not the only committer to want to run a mile when they
> see someone posting 17 or 26 patches in an email. So maybe "bang for
> buck" is a better method for getting the ball rolling here.  As you
> know, I was recently bitten by local shadows in af7d270dd, so I do
> believe in the cause.
> 
> What do you think?

You already fixed the shadow var introduced in master/pg16, and I sent patches
for the shadow vars added in pg15 (marked as such and presented as 001-008), so
perhaps it's okay to start with that?

BTW, one of the remaining warnings seems to be another buglet, which I'll write
about at a later date.

-- 
Justin




Re: static libpq (and other libraries) overwritten on aix

2022-08-17 Thread Noah Misch
The AIX sections of Makefile.shlib misuse the terms "static" and "shared".
Imagine s/static library/library ending in .a/ and s/shared library/library
ending in .so/.  That yields an accurate description of the makefile rules.

On Wed, Aug 17, 2022 at 12:01:54PM -0700, Andres Freund wrote:
> Two questions:
> 1) Do we continue building static libraries for libpq etc?

Essentially, we don't build static libpq today, and we should continue not
building it.  (The first-built libpq.a is static, but that file is an
implementation detail of the makefile rules.  The surviving libpq.a is a
normal AIX shared library.)

> 2) Do we care about static libraries not surviving on AIX?

No.

>There could also be
>a race in the build rules leading to sometimes static libs, sometimes shared
>libs winning, I think.

Not since commit e8564ef, to my knowledge.


Along the lines of Robert's comment, it could be a nice code beautification to
use a different suffix for the short-lived .a file.  Perhaps _so_inputs.a.

I found this useful years ago:
https://web.archive.org/web/20151003130212/http://seriousbirder.com/blogs/aix-shared-and-static-libraries-explained/




Re: [PATCH] Clarify the comments about varlena header encoding

2022-08-17 Thread John Naylor
On Thu, Aug 18, 2022 at 1:06 AM Aleksander Alekseev
 wrote:
>
> Hi hackers,
>
> I noticed that the comments regarding bit layouts for varlena headers
> in postgres.h are somewhat misleading. For instance, when reading:

I agree it's confusing, but I don't think this patch is the right direction.

> ```
> 00xxxxxx 4-byte length word, aligned, uncompressed data (up to 1G)
> ```
>
> ... one can assume this is a 00xxxxxx byte followed by another 4 bytes
> (which is wrong). Also one can read this as "aligned, uncompressed
> data" (which again is wrong).

- * 00xxxxxx 4-byte length word, aligned, uncompressed data (up to 1G)
+ * 00xxxxxx xxxxxxxx xxxxxxxx xxxxxxxx, uncompressed data (up to 1G)

Maybe "00xx 4-byte length word (aligned)," is more clear about
what is aligned. Also, adding all those xxx's obscures the point that
we only need to examine one byte to figure out what to do next.

> ```
> 10000000 1-byte length word, unaligned, TOAST pointer
> ```
>
> This is misleading too. The comments above this line say that `struct
> varatt_external` is a TOAST pointer. sizeof(varatt_external) = 16,
> plus 1 byte equals 17, right? However the documentation [1] claims the
> result should be 18:

The patch has:

+ * In the third case the va_tag field (see varattrib_1b_e) is used to discern
+ * the specific type and length of the pointer datum. On disk the "xxx" bits
+ * currently always store sizeof(varatt_external) + 2.

...so not sure where 17 came from.

- * 10000000 1-byte length word, unaligned, TOAST pointer
+ * 10000000 xxxxxxxx, TOAST pointer (struct varatt_external)

This implies that the header is two bytes, which is not accurate. That
next byte is a type tag:

/* TOAST pointers are a subset of varattrib_1b with an identifying tag byte */
typedef struct
{
uint8 va_header; /* Always 0x80 or 0x01 */
uint8 va_tag; /* Type of datum */
char va_data[FLEXIBLE_ARRAY_MEMBER]; /* Type-specific data */
} varattrib_1b_e;

...and does not always represent the on-disk length:

/*
 * Type tag for the various sorts of "TOAST pointer" datums.  The peculiar
 * value for VARTAG_ONDISK comes from a requirement for on-disk compatibility
 * with a previous notion that the tag field was the pointer datum's length.
 */
typedef enum vartag_external
{
VARTAG_INDIRECT = 1,
VARTAG_EXPANDED_RO = 2,
VARTAG_EXPANDED_RW = 3,
VARTAG_ONDISK = 18
} vartag_external;

And I don't think the new comments referring to "third case", "first
two cases", etc make it easier to follow.
-- 
John Naylor
EDB: http://www.enterprisedb.com




RE: Perform streaming logical transactions by background workers and parallel apply

2022-08-17 Thread shiy.f...@fujitsu.com
On Wed, Aug 17, 2022 2:28 PM Wang, Wei/王 威  wrote:
> 
> On Tues, August 16, 2022 15:33 PM I wrote:
> > Attach the new patches.
> 
> I found that cfbot has a failure.
> After investigation, I think it is because the worker's exit state is not set
> correctly. So I made some slight modifications.
> 
> Attach the new patches.
> 

Thanks for updating the patch. Here are some comments.

0003 patch
==
1. src/backend/replication/logical/applybgworker.c
+   ereport(ERROR,
+           (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+            errmsg("cannot replicate target relation \"%s.%s\" using "
+                   "subscription parameter streaming=parallel",
+                   rel->remoterel.nspname, rel->remoterel.relname),
+            errdetail("The unique column on subscriber is not the unique "
+                      "column on publisher or there is at least one "
+                      "non-immutable function."),
+            errhint("Please change to use subscription parameter "
+                    "streaming=on.")));

Should we use "%s" instead of "streaming=parallel" and "streaming=on"?

2. src/backend/replication/logical/applybgworker.c
+* If any worker is handling the streaming transaction, this check needs to
+* be performed not only using the apply background worker, but also in the
+* main apply worker. This is because without these restrictions, main

this check needs to be performed not only using the apply background worker, but
also in the main apply worker.
->
this check not only needs to be performed by apply background worker, but also
by the main apply worker

3. src/backend/replication/logical/relation.c
+   if (ukey)
+   {
+       i = -1;
+       while ((i = bms_next_member(ukey, i)) >= 0)
+       {
+           attnum = AttrNumberGetAttrOffset(i + FirstLowInvalidHeapAttributeNumber);
+
+           if (entry->attrmap->attnums[attnum] < 0 ||
+               !bms_is_member(entry->attrmap->attnums[attnum], entry->remoterel.attunique))
+           {
+               entry->parallel_apply = PARALLEL_APPLY_UNSAFE;
+               return;
+           }
+       }
+
+       bms_free(ukey);

It looks like we need to call bms_free() before returning, right?
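
That is, the early-exit path would free the set too, e.g. (a sketch of the
suggested fix):

+           if (entry->attrmap->attnums[attnum] < 0 ||
+               !bms_is_member(entry->attrmap->attnums[attnum], entry->remoterel.attunique))
+           {
+               entry->parallel_apply = PARALLEL_APPLY_UNSAFE;
+               bms_free(ukey);
+               return;
+           }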

4. src/backend/replication/logical/relation.c
+   /* We don't need info for dropped or generated attributes */
+   if (att->attisdropped || att->attgenerated)
+   continue;

Would it be better to change the comment to:
We don't check dropped or generated attributes

5. src/test/subscription/t/032_streaming_apply.pl
+$node_publisher->wait_for_catchup($appname);
+
+# Then we check the foreign key on partition table.
+$node_publisher->wait_for_catchup($appname);

Here, wait_for_catchup() is called twice, we can remove the second one.

6. src/backend/replication/logical/applybgworker.c
+       /* If any workers (or the postmaster) have died, we have failed. */
+       if (status == APPLY_BGWORKER_EXIT)
+           ereport(ERROR,
+                   (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                    errmsg("background worker %u failed to apply transaction %u",
+                           entry->wstate->shared->worker_id,
+                           entry->wstate->shared->stream_xid)));

Should we change the error message to "apply background worker %u failed to
apply transaction %u", to be consistent with the error message in
apply_bgworker_wait_for()?

0004 patch
==
1.
I saw that the commit message says:
If the subscriber exits with an error, this flag will be set true, and
whenever the transaction is applied successfully, this flag is reset false.

"subretry" is set to false if a transaction is applied successfully, it looks
similar to what clear_subscription_skip_lsn() does, so maybe we should remove
the following change in apply_handle_stream_abort()? Or only call
set_subscription_retry() when rollbacking the toplevel transaction.

@@ -1671,6 +1688,9 @@ apply_handle_stream_abort(StringInfo s)
 */
serialize_stream_abort(xid, subxid);
}
+
+   /* Reset the retry flag. */
+   set_subscription_retry(false);
}
 
reset_apply_error_context_info();

2. src/backend/replication/logical/worker.c
+   /* Reset subretry */
+   values[Anum_pg_subscription_subretry - 1] = BoolGetDatum(retry);
+   replaces[Anum_pg_subscription_sub

Re: shadow variables - pg15 edition

2022-08-17 Thread Tom Lane
Michael Paquier  writes:
> A lot of the changes proposed here update the code so that the same
> variable gets used across more code paths by removing declarations,
> but we have two variables defined because both are aimed at being used in
> different contexts (see AttachPartitionEnsureIndexes() in tablecmds.c
> for example).

> Wouldn't it be a saner approach in a lot of cases to rename the
> shadowed variables (aka the ones getting removed in your patches) and
> keep them local to the code paths where we use them?

Yeah.  I do not think a patch of this sort has any business changing
the scopes of variables.  That moves it out of "cosmetic cleanup"
and into "hm, I wonder if this introduces any bugs".  Most hackers
are going to decide that they have better ways to spend their time
than doing that level of analysis for a very noncritical patch.

regards, tom lane




Re: MultiXact\SLRU buffers configuration

2022-08-17 Thread Andrey Borodin



> On 17 Aug 2022, at 00:36, i.laza...@postgrespro.ru wrote:
> 
>> Andrey Borodin wrote 2022-07-23 11:39:
>> Yura, thank you for your benchmarks!
>> We already knew that patch can save the day on pathological workloads,
>> now we have a proof of this.
>> Also there's the evidence that user can blindly increase size of SLRU
>> if they want (with the second patch). So there's no need for hard
>> explanations on how to tune the buffers size.
> 
> Hi @Andrey.Borodin, with some considerations and performance checks from 
> @Yura.Sokolov we simplified your approach as follows:
> 
> 1. Preamble. We feel free to increase any SLRU, since there's no 
> performance degradation at large buffer counts when using your SLRU bucket 
> solution.
> 2. `slru_buffers_size_scale` is the only config param introduced for all 
> SLRUs. It scales each SLRU's upper cap by powers of 2.
> 3. All SLRU buffer counts are capped by both `MBuffers (shared_buffers)` and 
> `slru_buffers_size_scale`.
> 4. Magic initial constants `NUM_*_BUFFERS << slru_buffers_size_scale` are 
> applied for every SLRU.
> 5. All SLRU buffer counts are always powers of 2, and their hash bucket size 
> is always 8.
> 
> There's a patch attached for your consideration. It gathers and simplifies 
> both `v21-0001-Make-all-SLRU-buffer-sizes-configurable.patch` and 
> `v21-0002-Divide-SLRU-buffers-into-8-associative-banks.patch` into a much 
> simpler approach.

I like the idea of one knob instead of one per SLRU. Maybe we could even 
deduce a sane value from NBuffers? That would effectively lead to 0 knobs :)
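
For illustration, deducing it from NBuffers might look something like this
(purely a sketch: the function name and the 1/128 cap are my assumptions,
only slru_buffers_size_scale comes from the patch):

static int
slru_nbuffers(int base_nbuffers)
{
	/* scale the historical default by powers of two ... */
	int		scaled = base_nbuffers << slru_buffers_size_scale;

	/* ... but never let one SLRU grow past a fraction of shared_buffers */
	return Max(base_nbuffers, Min(scaled, NBuffers / 128));
}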

Your patch has the prefix "v22-0006"; does that mean there are 5 previous steps 
in the patchset?

Thank you!


Best regards, Andrey Borodin.



Re: Add last failed connection error message to pg_stat_wal_receiver

2022-08-17 Thread Michael Paquier
On Thu, Aug 04, 2022 at 03:27:11PM +0530, Bharath Rupireddy wrote:
> Good point. The walreceiver can exit for any reason. We can either 1)
> store all the error messages or 2) think of using sigsetjmp but
> that only catches the ERROR kinds, leaving FATAL and PANIC messages.
> The option (1) is simple but there are problems - we may miss storing
> future error messages, good commenting and reviewing may help here and
> all the error messages now need to be stored as strings, which is
> complex. The option (2) seems reasonable but we will miss FATAL and
> PANIC messages (we have many ERRORs, 2 FATALs, 3 PANICs). Maybe a
> combination of option (1) for FATALs and PANICs, and option (2) for
> ERRORs helps.
> 
> Thoughts?

PANIC is not something you'd care about, as the system would go down
and shared memory would be reset (right?) even if restart_after_crash is
enabled.  Perhaps it would help here to use something like a macro to
catch and save the error, in a style similar to what's in hba.c for
example, which is the closest example I can think of, even if on ERROR
we don't really care about the error string anyway as there is nothing
to report back to the SQL views used for the HBA/ident files.
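
As a sketch of that idea (oldcontext is assumed to be the caller's memory
context, and the lastConnError field is hypothetical; the PG_TRY machinery
and CopyErrorData are the real elog.h facilities):

	PG_TRY();
	{
		/* ... the walreceiver code to be covered ... */
	}
	PG_CATCH();
	{
		ErrorData  *edata;

		/* must leave ErrorContext before copying the error */
		MemoryContextSwitchTo(oldcontext);
		edata = CopyErrorData();
		strlcpy(WalRcv->lastConnError, edata->message,
				sizeof(WalRcv->lastConnError));
		FreeErrorData(edata);
		PG_RE_THROW();
	}
	PG_END_TRY();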

FATAL may prove to be tricky though, because I'd expect the error to
be saved in shared memory in this case.  This is particularly critical
as this takes the WAL receiver process down, actually.

Anyway, outside the potential scope of the proposal, there are more
things that I find strange with the code:
- Why isn't the string reset when the WAL receiver is starting up?
It is surely not OK to keep past state that does not reflect what
is actually happening with a receiver currently running.
- pg_stat_wal_receiver (system view) reports no rows if pid is NULL,
which would be the state stored in shared memory after a connection.
This means that one would never be able to see last_conn_error except
when calling directly the SQL function pg_stat_get_wal_receiver().

One could say that we should report a row for this view all the time,
but this creates a compatibility breakage: existing applications
assuming something like (one row <=> WAL receiver running) could
break.
--
Michael




Re: shadow variables - pg15 edition

2022-08-17 Thread David Rowley
On Thu, 18 Aug 2022 at 02:54, Justin Pryzby  wrote:
> The first half of the patches fix shadow variables newly-introduced in v15
> (including one of my own patches), the rest are fixing the lowest hanging 
> fruit
> of the "short list" from COPT=-Wshadow=compatible-local

I wonder if it's better to fix the "big hitters" first.  The idea
there would be to try to reduce the number of these warnings as
quickly and easily as possible.  If we can get the numbers down fairly
significantly without too much effort, then that should provide us
with a bit more motivation to get rid of the remaining ones.

Here are the warnings grouped by the name of the variable:

$ make -s 2>&1 | grep "warning: declaration of" | grep -oP "‘([_a-zA-Z]{1}[_a-zA-Z0-9]*)’" | sort | uniq -c
  2 ‘aclresult’
  3 ‘attnum’
  1 ‘cell’
  1 ‘cell__state’
  2 ‘cmp’
  2 ‘command’
  1 ‘constraintOid’
  1 ‘copyTuple’
  1 ‘data’
  1 ‘db’
  1 ‘_do_rethrow’
  1 ‘dpns’
  1 ‘econtext’
  1 ‘entry’
 36 ‘expected’
  1 ‘first’
  1 ‘found_whole_row’
  1 ‘host’
 20 ‘i’
  1 ‘iclause’
  1 ‘idxs’
  1 ‘i_oid’
  4 ‘isnull’
  1 ‘it’
  2 ‘item’
  1 ‘itemno’
  1 ‘j’
  1 ‘jtc’
  1 ‘k’
  1 ‘keyno’
  7 ‘l’
 13 ‘lc’
  4 ‘lc__state’
  1 ‘len’
  1 ‘_local_sigjmp_buf’
  1 ‘name’
  2 ‘now’
  1 ‘owning_tab’
  1 ‘page’
  1 ‘partitionId’
  2 ‘path’
  3 ‘proc’
  1 ‘proclock’
  1 ‘querytree_list’
  1 ‘range’
  1 ‘rel’
  1 ‘relation’
  1 ‘relid’
  1 ‘rightop’
  2 ‘rinfo’
  1 ‘_save_context_stack’
  1 ‘save_errno’
  1 ‘_save_exception_stack’
  1 ‘slot’
  1 ‘sqlca’
  9 ‘startelem’
  1 ‘stmt_list’
  2 ‘str’
  1 ‘subpath’
  1 ‘tbinfo’
  1 ‘ti’
  1 ‘transno’
  1 ‘ttype’
  1 ‘tuple’
  5 ‘val’
  1 ‘value2’
  1 ‘wco’
  1 ‘xid’
  1 ‘xlogfname’

The top 5 by count here account for about half of the warnings, so
maybe is best to start with those? Likely the ones ending in __state
will fix themselves when you fix the variable with the same name
without that suffix.

The attached patch targets fixing the "expected" variable.

$ ./configure --prefix=/home/drowley/pg CFLAGS="-Wshadow=compatible-local" > /dev/null
$ make clean -s
$ make -j -s 2>&1 | grep "warning: declaration of" | wc -l
153
$ make clean -s
$ patch -p1 < reduce_local_variable_shadow_warnings_in_regress.c.patch
$ make -j -s 2>&1 | grep "warning: declaration of" | wc -l
117

So 36 fewer warnings with the attached.
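
The renamed macro locals matter beyond warning counts: if a call site has
its own variable of the same name, the old macro body shadows it, and in
the worst case self-initializes (a contrived sketch; some_call() is made up):

static void
test_example(void)
{
	uint32		expected = 42;	/* the caller's own "expected" */

	/*
	 * With the old macro body this expands to, roughly:
	 *     uint32 expected = (expected);
	 * i.e. the initializer binds to the new, shadowing declaration --
	 * both a -Wshadow=compatible-local warning and a latent
	 * self-initialization bug.
	 */
	EXPECT_EQ_U32(some_call(), expected);
}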

I'm probably not the only committer to want to run a mile when they
see someone posting 17 or 26 patches in an email. So maybe "bang for
buck" is a better method for getting the ball rolling here.  As you
know, I was recently bitten by local shadows in af7d270dd, so I do
believe in the cause.

What do you think?

David
diff --git a/src/test/regress/regress.c b/src/test/regress/regress.c
index ba3532a51e..6d285255dd 100644
--- a/src/test/regress/regress.c
+++ b/src/test/regress/regress.c
@@ -56,22 +56,22 @@
 
 #define EXPECT_EQ_U32(result_expr, expected_expr)  \
do { \
-   uint32  result = (result_expr); \
-   uint32  expected = (expected_expr); \
-   if (result != expected) \
+   uint32  actual_result = (result_expr); \
+   uint32  expected_result = (expected_expr); \
+   if (actual_result != expected_result) \
 			elog(ERROR, \
 				 "%s yielded %u, expected %s in file \"%s\" line %u", \
-				 #result_expr, result, #expected_expr, __FILE__, __LINE__); \
+				 #result_expr, actual_result, #expected_expr, __FILE__, __LINE__); \
} while (0)
 
 #define EXPECT_EQ_U64(result_expr, expected_expr)  \
do { \
-   uint64  result = (result_expr); \
-   uint64  expected = (expected_expr); \
-   if (result != expected) \
+   uint64  actual_result = (result_expr); \
+   uint64  expected_result = (expected_expr); \
+   if (actual_result != expected_result) \
 			elog(ERROR, \
 				 "%s yielded " UINT64_FORMAT ", expected %s in file \"%s\" line %u", \
-				 #result_expr, result, #expected_expr, __FILE__, __LINE__); \
+				 #result_expr, actual_result, #expected_expr, __FILE__, __LINE__); \
} while (0)
 
 #define LDELIM '('


Re: Handle infinite recursion in logical replication setup

2022-08-17 Thread Amit Kapila
On Wed, Aug 17, 2022 at 12:34 PM Peter Smith  wrote:
>
> On Wed, Aug 17, 2022 at 4:33 PM Amit Kapila  wrote:
> >
> > On Wed, Aug 17, 2022 at 8:48 AM houzj.f...@fujitsu.com
> >  wrote:
> > >
> > > On Tuesday, August 2, 2022 8:00 PM Amit Kapila  
> > > wrote:
> > > > On Tue, Jul 26, 2022 at 9:07 AM Amit Kapila  
> > > > wrote:
> > >
> > > Thanks for the summary.
> > >
> > > I think it's fine to make the user use the copy_data option more 
> > > carefully to
> > > prevent duplicate copies by reporting an ERROR.
> > >
> > > But I also share Sawada-san's concern, as it's possible for the user to
> > > receive an ERROR in some unexpected cases.
> > >
> > > For example I want to build bi-directional setup between two nodes:
> > >
> > > Node A: TABLE test (has actual data)
> > > Node B: TABLE test (empty)
> > >
> > > Step 1:
> > > CREATE PUBLICATION on both Node A and B.
> > >
> > > Step 2:
> > > CREATE SUBSCRIPTION on Node A with (copy_data = on)
> > > -- this is fine as there is no data on Node B
> > >
> > > Step 3:
> > > CREATE SUBSCRIPTION on Node B with (copy_data = on)
> > > -- this should be fine as user needs to copy data from Node A to Node B,
> > > -- but we still report an error for this case.
> > >
> > > It looks a bit strict to report an ERROR in this case and it seems not easy
> > > to avoid. So, personally, I think it might be better to document the correct
> > > steps to build bi-directional replication and probably also document the
> > > steps to recover if the user accidentally did a duplicate initial copy, if
> > > not documented yet.
> > >
> > > In addition, we could also LOG some additional information about the ORIGIN
> > > and initial copy, which might help the user analyze if needed.
> > >
> >
> > But why LOG instead of WARNING? I feel in this case there is a chance
> > of inconsistent data so a WARNING like "publication "pub1" could have
> > data from multiple origins" can be given when the user has specified
> > options: "copy_data = on, origin = NONE" while creating a
> > subscription. We give a WARNING during subscription creation when the
> > corresponding publication doesn't exist, eg.
> >
> > postgres=# create subscription sub1 connection 'dbname = postgres'
> > publication pub1;
> > WARNING:  publication "pub1" does not exist in the publisher
> >
> > Then, we can explain in docs how users can avoid data inconsistencies
> > while setting up replication.
> >
>
> I was wondering if this copy/origin case really should be a NOTICE.
>

We usually give NOTICE for some sort of additional implicit
information, e.g., when we create a slot during CREATE SUBSCRIPTION
command: "NOTICE: created replication slot "sub1" on publisher". IMO,
this is likely to be a problem of data inconsistency so I think here
we can choose between WARNING and LOG. I prefer WARNING but okay with
LOG as well if others feel so. I think we can change this later as
well if required. We do have an option to not do anything and just
document it but I feel it is better to give user some indication of
problem here because not everyone reads each update of documentation.

Jonathan, Sawada-San, Hou-San, and others, what do you think is the
best way to move forward here?

-- 
With Regards,
Amit Kapila.




Re: shadow variables - pg15 edition

2022-08-17 Thread Justin Pryzby
On Thu, Aug 18, 2022 at 09:39:02AM +0900, Michael Paquier wrote:
> On Thu, Aug 18, 2022 at 08:49:14AM +1000, Peter Smith wrote:
> > I'd started looking at these [1] last year and spent a day trying to
> > categorise them all in a spreadsheet (shadows a global, shadows a
> > parameter, shadows a local var etc) but I became swamped by the
> > volume, and then other work/life got in the way.
> > 
> > +1 from me.
> 
> A lot of the changes proposed here update the code so that the same
> variable gets used across more code paths by removing declarations,
> but we have two variables defined because both are aimed at being used in
> different contexts (see AttachPartitionEnsureIndexes() in tablecmds.c
> for example).

> Wouldn't it be a saner approach in a lot of cases to rename the
> shadowed variables (aka the ones getting removed in your patches) and
> keep them local to the code paths where we use them?

The cases where I removed a declaration are ones where the variable either
hasn't yet been assigned in the outer scope (so it's safe to use it first in the
inner scope, since its value is later overwritten in the outer scope), or is
no longer used in the outer scope, so it's safe to re-use it in the inner scope
(as in AttachPartitionEnsureIndexes).  Since you think it's saner, I have
changed the patches to rename them.

In the case of "first", the var is used in two independent loops, the same way,
and re-initialized.  In the case of found_whole_row, the var is ignored, as the
comments say, so it would be silly to declare more vars to be additionally
ignored.

-- 
Justin

PS. I hadn't sent the other patches which rename the variables, having assumed
that the discussion would be bikeshedded to death and derail without having
fixed the lowest-hanging fruit.  I'm attaching those now to see what
happens.
From 97768e5a439bef016e6ebd5221ed148f076c6e3f Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 16 Aug 2022 19:38:57 -0500
Subject: [PATCH 01/26] avoid shadow vars: pg_dump.c: i_oid

backpatch to v15

commit d498e052b4b84ae21b3b68d5b3fda6ead65d1d4d
Author: Robert Haas 
Date:   Fri Jul 8 10:15:19 2022 -0400

Preserve relfilenode of pg_largeobject and its index across pg_upgrade.
---
 src/bin/pg_dump/pg_dump.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index da6605175a0..322947c5609 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -3144,7 +3144,6 @@ dumpDatabase(Archive *fout)
 		PQExpBuffer loHorizonQry = createPQExpBuffer();
 		int			i_relfrozenxid,
 	i_relfilenode,
-	i_oid,
 	i_relminmxid;
 
 		/*
-- 
2.17.1

From ce729535c47d72db775ebcf1f185799c78615148 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 16 Aug 2022 15:55:13 -0500
Subject: [PATCH 02/26] avoid shadow vars: pg_dump.c: tbinfo

backpatch to v15

commit 9895961529ef8ff3fc12b39229f9a93e08bca7b7
Author: Tom Lane 
Date:   Mon Dec 6 13:07:31 2021 -0500

Avoid per-object queries in performance-critical paths in pg_dump.
---
 src/bin/pg_dump/pg_dump.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 322947c5609..5c196d66985 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -7080,21 +7080,21 @@ getConstraints(Archive *fout, TableInfo tblinfo[], int numTables)
 	appendPQExpBufferChar(tbloids, '{');
 	for (int i = 0; i < numTables; i++)
 	{
-		TableInfo  *tbinfo = &tblinfo[i];
+		TableInfo  *mytbinfo = &tblinfo[i];
 
 		/*
 		 * For partitioned tables, foreign keys have no triggers so they must
 		 * be included anyway in case some foreign keys are defined.
 		 */
-		if ((!tbinfo->hastriggers &&
-			 tbinfo->relkind != RELKIND_PARTITIONED_TABLE) ||
-			!(tbinfo->dobj.dump & DUMP_COMPONENT_DEFINITION))
+		if ((!mytbinfo->hastriggers &&
+			 mytbinfo->relkind != RELKIND_PARTITIONED_TABLE) ||
+			!(mytbinfo->dobj.dump & DUMP_COMPONENT_DEFINITION))
 			continue;
 
 		/* OK, we need info for this table */
 		if (tbloids->len > 1)	/* do we have more than the '{'? */
 			appendPQExpBufferChar(tbloids, ',');
-		appendPQExpBuffer(tbloids, "%u", tbinfo->dobj.catId.oid);
+		appendPQExpBuffer(tbloids, "%u", mytbinfo->dobj.catId.oid);
 	}
 	appendPQExpBufferChar(tbloids, '}');
 
-- 
2.17.1

From 478fa745d4ddc38fe15f54d7d396ebf7a106772b Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 16 Aug 2022 16:22:52 -0500
Subject: [PATCH 03/26] avoid shadow vars: pg_dump.c: owning_tab

backpatch to v15

commit 344d62fb9a978a72cf8347f0369b9ee643fd0b31
Author: Peter Eisentraut 
Date:   Thu Apr 7 16:13:23 2022 +0200

Unlogged sequences
---
 src/bin/pg_dump/pg_dump.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 5c196d66985..4b5d8df1e4e 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -16799,21 +16799,21 @@ dumpSequence(Archive *fout, const Ta

Re: fix typos

2022-08-17 Thread John Naylor
On Tue, Aug 16, 2022 at 8:48 AM John Naylor
 wrote:
>
> On Fri, Aug 12, 2022 at 8:55 PM Tom Lane  wrote:
> >
> > John Naylor  writes:
> > > This is really a straw-man proposal, since I'm not volunteering to do
> > > the work, or suggest anybody else should do the same. That being the
> > > case, it seems we should just go ahead with Justin's patch for
> > > consistency. Possibly we could also change the messages to say "ID"?
> >
> > I'd be content if we change the user-facing messages (and documentation
> > if any) to say "ID" not "OID".
>
> The documentation has both, so it makes sense to standardize on "ID".
> The messages all had oid/OID, which I changed in the attached. I think
> I got all the places.
>
> I'm thinking it's not wrong enough to confuse people, but consistency
> is good, so backpatch to v15 and no further. Does anyone want to make
> a case otherwise?

This is done.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: POC: GROUP BY optimization

2022-08-17 Thread David Rowley
On Thu, 18 Aug 2022 at 02:46, Tomas Vondra
 wrote:
> So I don't think the current costing is wrong, but it certainly is more
> complex. But the test does not test what it intended - I have a few ideas
> how to make it work:
>
> 1) increase the number of rows in the table
>
> 2) increase cpu_operator_cost (for that one test?)
>
> 3) tweak the costing somehow, to increase the cost a bit

Why not 4) SET parallel_setup_cost = 0?  There are plenty of other
places where we do just that so we get a parallel plan without having to
generate enough cost to drown out the parallel worker startup cost.

Here are a couple of patches to demo the idea.

David
diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index 4b41ccf1aa..db36e3a150 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -942,6 +942,7 @@ INSERT INTO pagg_tab_ml SELECT i % 30, i % 10, to_char(i % 4, 'FM') FROM gen
 ANALYZE pagg_tab_ml;
 -- For Parallel Append
 SET max_parallel_workers_per_gather TO 2;
+SET parallel_setup_cost = 0;
 -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY
 -- for level 1 only. For subpartitions, GROUP BY clause does not match with
 -- PARTITION KEY, but still we do not see a partial aggregation as array_agg()
@@ -1025,6 +1026,7 @@ SELECT a, sum(b), array_agg(distinct c), count(*) FROM pagg_tab_ml GROUP BY a HA
->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
 (25 rows)
 
+RESET parallel_setup_cost;
 -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY
 -- for level 1 only. For subpartitions, GROUP BY clause does not match with
 -- PARTITION KEY, thus we will have a partial aggregation for them.
diff --git a/src/test/regress/sql/partition_aggregate.sql b/src/test/regress/sql/partition_aggregate.sql
index c17294b15b..ab070fee24 100644
--- a/src/test/regress/sql/partition_aggregate.sql
+++ b/src/test/regress/sql/partition_aggregate.sql
@@ -222,6 +222,7 @@ ANALYZE pagg_tab_ml;
 
 -- For Parallel Append
 SET max_parallel_workers_per_gather TO 2;
+SET parallel_setup_cost = 0;
 
 -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY
 -- for level 1 only. For subpartitions, GROUP BY clause does not match with
@@ -235,6 +236,8 @@ SELECT a, sum(b), array_agg(distinct c), count(*) FROM pagg_tab_ml GROUP BY a HA
 EXPLAIN (COSTS OFF)
 SELECT a, sum(b), array_agg(distinct c), count(*) FROM pagg_tab_ml GROUP BY a HAVING avg(b) < 3;
 
+RESET parallel_setup_cost;
+
 -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY
 -- for level 1 only. For subpartitions, GROUP BY clause does not match with
 -- PARTITION KEY, thus we will have a partial aggregation for them.
diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index a08a3825ff..0dc6d63347 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -942,40 +942,43 @@ INSERT INTO pagg_tab_ml SELECT i % 30, i % 10, to_char(i % 4, 'FM') FROM gen
 ANALYZE pagg_tab_ml;
 -- For Parallel Append
 SET max_parallel_workers_per_gather TO 2;
+SET parallel_setup_cost = 0;
 -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY
 -- for level 1 only. For subpartitions, GROUP BY clause does not match with
 -- PARTITION KEY, but still we do not see a partial aggregation as array_agg()
 -- is not partial agg safe.
 EXPLAIN (COSTS OFF)
 SELECT a, sum(b), array_agg(distinct c), count(*) FROM pagg_tab_ml GROUP BY a HAVING avg(b) < 3 ORDER BY 1, 2, 3;
-                                     QUERY PLAN
--------------------------------------------------------------------------------------
- Sort
-   Sort Key: pagg_tab_ml.a, (sum(pagg_tab_ml.b)), (array_agg(DISTINCT pagg_tab_ml.c))
-   ->  Append
- ->  GroupAggregate
-   Group Key: pagg_tab_ml.a
-   Filter: (avg(pagg_tab_ml.b) < '3'::numeric)
-   ->  Sort
- Sort Key: pagg_tab_ml.a
- ->  Seq Scan on pagg_tab_ml_p1 pagg_tab_ml
- ->  GroupAggregate
-   Group Key: pagg_tab_ml_2.a
-   Filter: (avg(pagg_tab_ml_2.b) < '3'::numeric)
-   ->  Sort
- Sort Key: pagg_tab_ml_2.a
- ->  Append
-   ->  Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
-   ->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
- ->  GroupAggregate
-   Group Key: pagg_tab_ml_5.a
-   Filter: (avg(pagg_tab_ml_5.b) < '3'::numeric)
-   ->  Sort
- Sort Key: pagg_tab_ml_5.a
- ->  Append
-   ->  Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_5
-

Re: Assertion failure on PG15 with modified test_shm_mq test

2022-08-17 Thread Andres Freund
Hi,

On 2022-08-17 15:02:28 +0900, Michael Paquier wrote:
> On Wed, Aug 17, 2022 at 11:15:28AM +0530, Pavan Deolasee wrote:
> > I notice that pgstat_shutdown_hook() is registered as a before-shmem-exit
> > callback. The callback is responsible for detaching from the pgstat shared
> > memory segment. But looks like other parts of the system still expect it to
> > be available during later stages of proc exit.
> > 
> > It's not clear to me if pgstat shutdown should happen later or code that
> > gets executed later in the cycle should not try to use pgstat. It's also
> > entirely possible that my usage of SharedFileSet is completely wrong. If
> > that's the case, please let me know the mistake in the usage.
> 
> That's visibly an issue with shared memory and the stats.  I have
> added an open item.  Andres?

I don't think there's anything that can reasonably be done about this for 15,
as explained upthread. We need a big redesign of the shutdown sequence at some
point, but ...

Greetings,

Andres Freund




Re: shadow variables - pg15 edition

2022-08-17 Thread Michael Paquier
On Thu, Aug 18, 2022 at 08:49:14AM +1000, Peter Smith wrote:
> I'd started looking at these [1] last year and spent a day trying to
> categorise them all in a spreadsheet (shadows a global, shadows a
> parameter, shadows a local var etc) but I became swamped by the
> volume, and then other work/life got in the way.
> 
> +1 from me.

A lot of the changes proposed here update the code so that the same
variable gets used across more code paths by removing declarations,
but we have two variables defined because both are aimed at being used in
different contexts (see AttachPartitionEnsureIndexes() in tablecmds.c
for example).

Wouldn't it be a saner approach in a lot of cases to rename the
shadowed variables (aka the ones getting removed in your patches) and
keep them local to the code paths where we use them?
--
Michael




Re: Assertion failure on PG15 with modified test_shm_mq test

2022-08-17 Thread Andres Freund
Hi,

On 2022-08-17 11:15:28 +0530, Pavan Deolasee wrote:
> I've a slightly modified version of test_shm_mq, that I changed to include
> a shared fileset.  The motivation to do that came because I hit an
> assertion failure with PG15 while doing some development work on BDR and I
> suspected it to be a PG15 bug.

> I notice that pgstat_shutdown_hook() is registered as a before-shmem-exit
> callback. The callback is responsible for detaching from the pgstat shared
> memory segment. But looks like other parts of the system still expect it to
> be available during later stages of proc exit.

> It's not clear to me if pgstat shutdown should happen later or code that
> gets executed later in the cycle should not try to use pgstat. It's also
> entirely possible that my usage of SharedFileSet is completely wrong. If
> that's the case, please let me know the mistake in the usage.

I don't think we have the infrastructure for a nice solution to this at the
moment - we need a fairly large overhaul of process initialization / shutdown
to handle these interdependencies nicely.

We can't move pgstat shutdown into an on_dsm_detach callback because that's too
late to allocate *new* dsm segments, which we might need to do while flushing
out pending stats.

See 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=fa91d4c91f28f4819dc54f93adbd413a685e366a
for a way to avoid the problem.

Greetings,

Andres Freund




Re: Add proper planner support for ORDER BY / DISTINCT aggregates

2022-08-17 Thread David Rowley
On Wed, 17 Aug 2022 at 13:57, Justin Pryzby  wrote:
> But in a few places, it removes the locally-computed group_pathkeys:
>
> -   List   *group_pathkeys = root->group_pathkeys;

> I noticed because that creates a new shadow variable, which seems accidental.

Thanks for the report.

I've just pushed a fix for this that basically just removes the line
you quoted.  Really I should have been using the version of
group_pathkeys that stripped off the pathkeys from the ORDER BY /
DISTINCT aggregates that is calculated earlier in that function.   In
practice, there was no actual bug here as the wrong variable was only
being used in the code path that was handling partial paths.  We never
create any partial paths when there are aggregates with ORDER BY /
DISTINCT  clauses, so in that code path, the two versions of the
group_pathkeys variable would have always been set to the same thing.

It makes sense just to get rid of the shadowed variable since the
value of it will be the same anyway.

David




Re: shared-memory based stats collector - v70

2022-08-17 Thread Andres Freund
Hi,

On 2022-08-17 15:46:42 -0400, Greg Stark wrote:
> Isn't there also a local hash table used to find the entries to reduce
> traffic on the shared hash table? Even if you don't take a snapshot
> does it get entered there? There are definitely still parts of this
> I'm working on a pretty vague understanding of :/

Yes, there is. But it's more about code that generates stats, rather than
reporting functions. While there are backend-local pending stats, we need to
hold a refcount on the shared stats item so that the stats item can't be
dropped and then revived when those local stats are flushed.

Relevant comments from pgstat.c:

 * To avoid contention on the shared hashtable, each backend has a
 * backend-local hashtable (pgStatEntryRefHash) in front of the shared
 * hashtable, containing references (PgStat_EntryRef) to shared hashtable
 * entries. The shared hashtable only needs to be accessed when no prior
 * reference is found in the local hashtable. Besides pointing to the
 * shared hashtable entry (PgStatShared_HashEntry) PgStat_EntryRef also
 * contains a pointer to the shared statistics data, as a process-local
 * address, to reduce access costs.
 *
 * The names for structs stored in shared memory are prefixed with
 * PgStatShared instead of PgStat. Each stats entry in shared memory is
 * protected by a dedicated lwlock.
 *
 * Most stats updates are first accumulated locally in each process as pending
 * entries, then later flushed to shared memory (just after commit, or by
 * idle-timeout). This practically eliminates contention on individual stats
 * entries. For most kinds of variable-numbered pending stats data is stored
 * in PgStat_EntryRef->pending. All entries with pending data are in the
 * pgStatPending list. Pending statistics updates are flushed out by
 * pgstat_report_stat().
 *

pgstat_internal.h has more details about the refcount aspect:

 * Per-object statistics are stored in the "shared stats" hashtable. That
 * table's entries (PgStatShared_HashEntry) contain a pointer to the actual 
stats
 * data for the object (the size of the stats data varies depending on the
 * kind of stats). The table is keyed by PgStat_HashKey.
 *
 * Once a backend has a reference to a shared stats entry, it increments the
 * entry's refcount. Even after stats data is dropped (e.g., due to a DROP
 * TABLE), the entry itself can only be deleted once all references have been
 * released.

Greetings,

Andres Freund




Re: s390x builds on buildfarm

2022-08-17 Thread Andres Freund
Hi,

On 2022-08-10 13:04:40 +, Vivian Kong wrote:
> Are builds being paused on s390x? It looks like the s390x builds were last
> run 15 days ago.  If so, we are wondering what the reason for the pause is and
> what is required to resume the builds.  The OS the builds were running on seems
> to have reached end of life.  Please let me know if we can help with getting
> them updated and resuming the builds.

I realize the question below is likely not your department, but perhaps you
could refer us to the right people?

Does IBM provide any AIX instances to open source projects? We have access to
some via the gcc compile farm, but they're a bit outdated, often very
overloaded, and seem to have some other issues (system perl segfaulting etc).

Greetings,

Andres Freund




Re: TAP output format in pg_regress

2022-08-17 Thread Andres Freund
Hi,

On 2022-08-17 23:04:20 +0200, Daniel Gustafsson wrote:
> Attached is a new version of the pg_regress TAP patch which - per reviewer
> feedback - does away with dual output formats and just converts the existing
> output to be TAP compliant.

Cool.


> while still being TAP compliant enough that running it with prove (with a tiny
> Perl wrapper) works.

Wonder if we could utilize that for making failures of 002_pg_upgrade.pl and
027_stream_regress.pl easier to understand, by reporting pg_regress' steps as
part of the outer test. But that'd probably be at least somewhat hard, due to
the embedded test count etc.


> This version also strips away the verbose output which these days isn't
> terribly useful and mainly consumes vertical space.

Yay.


> Another feature of the patch is to switch error logging to using the common
> frontend logging rather than pg_regress rolling its own.  The output from
> pg_log_error is emitted verbatim by prove as it's on stderr.

Sounds good.


> I based the support on the original TAP specification and not the new v13 or
> v14 since it seemed unclear how well those are supported in testrunners.  If
> v14 is adopted by testrunners there is some better functionality for
> reporting errors that we could use, but for now this version seems a
> safer option.

Yep, that makes sense.

> A normal "make check" with this patch applied now looks like this:
>
>
> +++ regress check in src/test/regress +++
> # running on port 61696 with PID 57910
> ok 1 - test_setup   326 ms
> ok 2 - tablespace   401 ms
> # parallel group (20 tests):  varchar char oid pg_lsn txid int4 regproc money 
> int2 uuid float4 text name boolean int8 enum float8 bit numeric rangetypes
> ok 3 - boolean  129 ms
> ok 4 - char  73 ms
> ok 5 - name 117 ms
> ok 6 - varchar   68 ms
> <...snip...>
ok 210 - memoize    137 ms
> ok 211 - stats  851 ms
> # parallel group (2 tests):  event_trigger oidjoins
> ok 212 - event_trigger  138 ms
> not ok 213 - oidjoins   190 ms
> ok 214 - fast_default   149 ms
> 1..214
> # 1 of 214 tests failed.
> # The differences that caused some tests to fail can be viewed in the
> # file "//src/test/regress/regression.diffs".  A copy of the test 
> summary that you see
> # above is saved in the file "//src/test/regress/regression.out".

I'm happy with that compared to our current output.


> The ending comment isn't currently shown when executed via prove as it has to
> go to STDERR for that to work, and I'm not sure that's something we want to do
> in the general case.  I don't expect running the pg_regress tests via prove is
> going to be the common case.  How does the meson TAP ingestion handle
> stdout/stderr?

It'll parse stdout for tap output and log stderr output separately.


> There is a slight reduction in information, for example around tests run
> serially vs in a parallel group.  This could perhaps be handled by marking the
> test name in some way like "tablespace (serial)" or prefixing with a symbol
> or something.  I can take a stab at that in case we think that level of
> detail is important to preserve.

I think we could just indent the test "description" for tests in parallel
groups:

I.e. instead of

ok 1 - test_setup   326 ms
ok 2 - tablespace   401 ms
# parallel group (20 tests):  varchar char oid pg_lsn txid int4 regproc money 
int2 uuid float4 text name boolean int8 enum float8 bit numeric rangetypes
ok 3 - boolean  129 ms
ok 4 - char  73 ms
ok 5 - name 117 ms
ok 6 - varchar   68 ms

do

ok 1 - test_setup   326 ms
ok 2 - tablespace   401 ms
# parallel group (20 tests):  varchar char oid pg_lsn txid int4 regproc money 
int2 uuid float4 text name boolean int8 enum float8 bit numeric rangetypes
ok 3 -     boolean  129 ms
ok 4 -     char      73 ms
ok 5 -     name     117 ms
ok 6 -     varchar   68 ms

that'd make it sufficiently clear, and is a bit more similar to the current
format?


I wonder if we should indent the test name so that the number doesn't cause
wrapping? And perhaps we can skip the - in that case?

I.e. instead of:

ok 9 - name 117 ms
ok 10 - varchar  68 ms
..
ok 99 - something    60 ms
ok 100 - memoize    137 ms

something like:

ok 9    name        117 ms
ok 10   varchar      68 ms
...
ok 99   something    60 ms
ok 100  memoize     137 ms


with parallel tests:

ok 9    name

Re: shadow variables - pg15 edition

2022-08-17 Thread Peter Smith
On Thu, Aug 18, 2022 at 12:54 AM Justin Pryzby  wrote:
>
> There's been no progress on this in the past discussions.
>
> https://www.postgresql.org/message-id/flat/877k1psmpf.fsf%40mailbox.samurai.com
> https://www.postgresql.org/message-id/flat/CAApHDvpqBR7u9yzW4yggjG%3DQfN%3DFZsc8Wo2ckokpQtif-%2BiQ2A%40mail.gmail.com#2d900bfe18fce17f97ec1f00800c8e27
> https://www.postgresql.org/message-id/flat/MN2PR18MB2927F7B5F690065E1194B258E35D0%40MN2PR18MB2927.namprd18.prod.outlook.com
>
> But an unfortunate consequence of not fixing the historic issues is that it
> precludes the possibility that anyone could be expected to notice if they
> introduce more instances of the same problem (as in the first half of these
> patches).  Then the hole which has already been dug becomes deeper, further
> increasing the burden of fixing the historic issues before being able to use
> -Wshadow.
>
> The first half of the patches fix shadow variables newly-introduced in v15
> (including one of my own patches), the rest are fixing the lowest hanging 
> fruit
> of the "short list" from COPT=-Wshadow=compatible-local
>
> I can't see that any of these are bugs, but it seems like a good goal to move
> towards allowing use of the -Wshadow* options to help avoid future errors, as
> well as cleanliness and readability (rather than allowing it to get harder to
> use -Wshadow).
>

Hey, thanks for picking this up!

I'd started looking at these [1] last year and spent a day trying to
categorise them all in a spreadsheet (shadows a global, shadows a
parameter, shadows a local var etc) but I became swamped by the
volume, and then other work/life got in the way.

+1 from me.

--
[1] 
https://www.postgresql.org/message-id/flat/CAHut%2BPuv4LaQKVQSErtV_%3D3MezUdpipVOMt7tJ3fXHxt_YK-Zw%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Propose a new function - list_is_empty

2022-08-17 Thread Peter Smith
On Thu, Aug 18, 2022 at 1:14 AM Tom Lane  wrote:
>
> I wrote:
> > I'd drop the parens in these particular examples because they are
> > inconsistent with the other parts of the same "if" condition.
>
> After going through the patch I removed most but not all of the
> newly-added parens on those grounds.  I also found a couple more
> spots that could be converted.  Pushed with those changes.
>

Thanks for pushing.

--
Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: Add support for DEFAULT specification in COPY FROM

2022-08-17 Thread Andrew Dunstan


On 2022-08-17 We 17:12, Israel Barth Rubio wrote:
>
>
> Does that address your concerns?
>
> I am attaching the new patch, containing the above test in the regress
> suite.


Thanks, yes, that all looks sane.


Please add this to the next CommitFest if you haven't already done so.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [RFC] building postgres with meson - v11

2022-08-17 Thread Andres Freund
Hi,

On 2022-08-17 15:50:23 +0200, Peter Eisentraut wrote:
> - There are various references to "pch" (pre-compiled headers).  Is
>   there more discussion anywhere about this?  I don't know what this
>   would entail or whether there are any drawbacks to be aware of.  The
>   new *_pch.h files don't have any comments.  Maybe this should be a
>   separate patch later.

It's mainly to make windows builds a bit faster. I've no objection to
separating this out.


> - About relativize_shared_library_references: We have had several
>   patches over the years for working around SIP stuff, and some of
>   them did essentially this, but we decided not to go ahead with them.
>   We could revisit that, but it should be a separate patch, not mixed
>   in with this.

The prior approaches all had issues because they didn't support relative
references IIRC (and thus broke being able to relocate the installation),
which this does.

I just found it very annoying to work on macs without this. And there were at
least two "bug" reports of testers of the meson branch that were just due to
SIP.

I'm ok with splitting it out, but I also think it's a lower risk opportunity
to test that this works.


> - postgresql-extension.pc: Similarly, this ought to be a separate
>   patch.  If we want people to use this, we'll need it in the makefile
>   build system anyway.

Makes sense. I'd like to keep it in the same patch for a short while longer,
to deduplicate some of the code, but then will split it out.


> - -DFRONTEND is used somewhat differently from the makefiles.  For
>   example, meson sets -DFRONTEND for pg_controldata, but the
>   makefiles don't.  Conversely, the makefiles set -DFRONTEND for
>   ecpglib, but meson does not.  This should be checked again to make
>   sure it all matches up.

Yes, should sync that up.

FWIW, meson does add -DFRONTEND for ecpglib. There were a few places that did
add it twice, I'll push a cleanup of that in a bit.


> - Option name spelling should be make consistent about underscores
>   versus hyphens.  Built-in meson options use underscores, so we
>   should make the user-defined ones like that as well (some already
>   do).  (wal-blocksize krb-srvnam system-tzdata tap-tests bsd-auth)

No objection.


> - I have found the variable name "cdata" for configuration_data() to
>   be less than clear.  I see some GNOME projects to it that way, is
>   that where it's from?  systemd uses "conf", maybe that's better.

I don't know where it's from - I don't think I ever looked at gnome
buildsystem stuff. It seems to be the obvious abbreviation for
configuration_data()...  I don't object to conf, but it's not a clear
improvement to me.


> - In the top-level meson.build, the "renaming" of the Windows system
>   name
>
>   host_system = host_machine.system() == 'windows' ? 'win32' :
> host_machine.system()
>   build_system = build_machine.system() == 'windows' ? 'win32' :
> build_machine.system()
>
>   seems unnecessary to me.  Why not stick with the provided names?

Because right now we also use it for things like choosing the "source" for
pg_config_os.h (i.e. include/port/{darwin,linux,win32,..}.h). And it seemed
easier to just have one variable name for all of it.


> - The c99_test ought to be not needed if the c_std project option is
>   used.  Was there a problem with that?

We don't want to force -std=c99 when not necessary, I think. We sometimes use
features from newer (and from gnu) language versions after testing
availability, and if we hardcode the version those will either fail or elicit
warnings.


> - Is there a way to split up the top-level meson.build somehow?  Maybe
>   just throw some stuff into included files?  This might get out of
>   hand at some point.

We can put stuff into config/meson.build or such. But I don't think it's
clearly warranted at this point.


> - The PG_SYSROOT detection gives different results.  On my system,
>   configure produces
>
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk,
>   meson produces
>
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk.
>   src/template/darwin goes out of its way to get a version-specific
>   result, so we need to carry that over somehow.  (The difference does
>   result in differences in the built binaries.)

TBH, I don't really understand the SYSROOT stuff all that well, never having
used a mac in anger (well, only in anger, but ...).

What do you think about extracting the relevant portion of src/template/darwin
into a dedicated shell script that gets called by both?


> Then, some patches from me:
>
> 0001-Change-shared-library-installation-naming-on-macOS.patch
>
> This changes the makefiles to make the shared library file naming on
> macOS match what meson produces.  I don't know what the situation is
> on other platforms.

No opinion on the matter. Seems best to apply separately if we want to?


> 0002-meson-Fix-insta

Re: Add support for DEFAULT specification in COPY FROM

2022-08-17 Thread Israel Barth Rubio
Hello Andrew,

Thanks for reviewing this patch.

It is worth noting that DEFAULT will only take effect if explicitly
specified, meaning the DEFAULT option itself has no default marker. The
usage of \D in the tests was only a suggestion. Also, the NULL marker is
an unquoted empty string by default in CSV mode.

In any case I have manually tested it, and the behavior is consistent with
what we see for NULL when it is defined as \N, in both text and CSV modes.

- NULL as \N:

postgres=# CREATE TEMP TABLE copy_null (id integer primary key, value text);
CREATE TABLE
postgres=# copy copy_null from stdin with (format text, NULL '\N');
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1 \N
>> 2 \\N
>> 3 "\N"
>> \.
COPY 3
postgres=# TABLE copy_null ;
 id | value
+---
  1 |
  2 | \N
  3 | "N"
(3 rows)

postgres=# TRUNCATE copy_null ;
TRUNCATE TABLE
postgres=# copy copy_null from stdin with (format csv, NULL '\N');
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1,\N
>> 2,\\N
>> 3,"\N"
>> \.
COPY 3
postgres=# TABLE copy_null ;
 id | value
+---
  1 |
  2 | \\N
  3 | \N
(3 rows)

- DEFAULT as \D:

postgres=# CREATE TEMP TABLE copy_default (id integer primary key, value
text default 'test');
CREATE TABLE
postgres=# copy copy_default from stdin with (format text, DEFAULT '\D');
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1 \D
>> 2 \\D
>> 3 "\D"
>> \.
COPY 3
postgres=# TABLE copy_default ;
 id | value
+---
  1 | test
  2 | \D
  3 | "D"
(3 rows)

postgres=# TRUNCATE copy_default ;
TRUNCATE TABLE
postgres=# copy copy_default from stdin with (format csv, DEFAULT '\D');
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1,\D
>> 2,\\D
>> 3,"\D"
>> \.
COPY 3
postgres=# TABLE copy_default ;
 id | value
+---
  1 | test
  2 | \\D
  3 | \D
(3 rows)

If you do not specify DEFAULT in COPY FROM, there is no marker for that
option at all. So, if you then load \D in CSV mode, it is taken as a
literal value:

postgres=# CREATE TEMP TABLE copy (id integer primary key, value text
default 'test');
CREATE TABLE
postgres=# copy copy from stdin with (format csv);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1,\D
>> 2,\\D
>> 3,"\D"
>> \.
COPY 3
postgres=# TABLE copy ;
 id | value
+---
  1 | \D
  2 | \\D
  3 | \D
(3 rows)


Does that address your concerns?

I am attaching the new patch, containing the above test in the regress
suite.

Best regards,
Israel.

On Tue, Aug 16, 2022 at 17:27, Andrew Dunstan 
wrote:

>
> On 2022-08-16 Tu 14:12, Israel Barth Rubio wrote:
> > Hello all,
> >
> > With the current implementation of COPY FROM in PostgreSQL we are able to
> > load the DEFAULT value/expression of a column if the column is absent
> > in the
> > list of specified columns. We are not able to explicitly ask that
> > PostgreSQL uses
> > the DEFAULT value/expression in a column that is being fetched from
> > the input
> > file, though.
> >
> > This patch adds support for handling DEFAULT values in COPY FROM. It
> > works
> > similarly to NULL in COPY FROM: whenever the marker that was set for
> > DEFAULT
> > value/expression is read from the input stream, it will evaluate the
> > DEFAULT
> > value/expression of the corresponding column.
> >
> > I'm currently working as a support engineer, and both me and some
> > customers had
> > already faced a situation where we missed an implementation like this
> > in COPY
> > FROM, and had to work around that by using an input file where the
> > column which
> > has a DEFAULT value/expression was removed.
> >
> > That does not solve all issues though, as it might be the case that we
> > just want a
> > DEFAULT value to take place if no other value was set for the column
> > in the input
> > file, meaning we would like to have a column in the input file that
> > sometimes assume
> > the DEFAULT value/expression, and sometimes assume an actual given value.
> >
> > The implementation was performed about one month ago and included all
> > regression
> > tests regarding the changes that were introduced. It was just rebased
> > on top of the
> > master branch before submitting this patch, and all tests are still
> > succeeding.
> >
> > The implementation takes advantage of the logic that was already
> > implemented to
> > handle DEFAULT values for missing columns in COPY FROM. I just
> > modified it to
> > make it available the DEFAULT values/expressions for all columns
> > instead of only
> > for the ones that were missing in the specification. I had to change
> > the variables
> > accordingly, so it would index the correct positions in the new array
> > of DEFAULT
> > values/expressions

Re: TAP output format in pg_regress

2022-08-17 Thread Daniel Gustafsson
Attached is a new version of the pg_regress TAP patch which - per reviewer
feedback - does away with dual output formats and just converts the existing
output to be TAP compliant.  The idea was to keep it fairly human readable
while still being TAP compliant enough that running it with prove (with a tiny
Perl wrapper) works.  This version also strips away the verbose output which
these days isn't terribly useful and mainly consumes vertical space.  Another
feature of the patch is to switch error logging to using the common frontend
logging rather than pg_regress rolling its own.  The output from pg_log_error
is emitted verbatim by prove as it's on stderr.

I based the support on the original TAP specification and not the new v13 or
v14 since it seemed unclear how well those are supported in testrunners.  If
v14 is adopted by testrunners there is some better functionality for reporting
errors that we could use, but for now this version seems a safer option.

A normal "make check" with this patch applied now looks like this:


+++ regress check in src/test/regress +++
# running on port 61696 with PID 57910
ok 1 - test_setup   326 ms
ok 2 - tablespace   401 ms
# parallel group (20 tests):  varchar char oid pg_lsn txid int4 regproc money 
int2 uuid float4 text name boolean int8 enum float8 bit numeric rangetypes
ok 3 - boolean  129 ms
ok 4 - char  73 ms
ok 5 - name 117 ms
ok 6 - varchar   68 ms
<...snip...>
ok 210 - memoize    137 ms
ok 211 - stats  851 ms
# parallel group (2 tests):  event_trigger oidjoins
ok 212 - event_trigger  138 ms
not ok 213 - oidjoins   190 ms
ok 214 - fast_default   149 ms
1..214
# 1 of 214 tests failed.
# The differences that caused some tests to fail can be viewed in the
# file "//src/test/regress/regression.diffs".  A copy of the test summary 
that you see
# above is saved in the file "//src/test/regress/regression.out".


The ending comment isn't currently shown when executed via prove as it has to
go to STDERR for that to work, and I'm not sure that's something we want to do
in the general case.  I don't expect running the pg_regress tests via prove is
going to be the common case.  How does the meson TAP ingestion handle
stdout/stderr?

And for the sake of completeness, even though we all know this by heart, a
similar run from the current output format looks like:


+++ regress check in src/test/regress +++
== creating temporary instance==
== initializing database system   ==
== starting postmaster==
running on port 61696 with PID 61955
== creating database "regression" ==
CREATE DATABASE
ALTER DATABASE
ALTER DATABASE
ALTER DATABASE
ALTER DATABASE
ALTER DATABASE
ALTER DATABASE
== running regression test queries==
test test_setup   ... ok  469 ms
test tablespace   ... ok  415 ms
parallel group (20 tests):  varchar char oid name int2 pg_lsn int4 txid text 
uuid regproc boolean money float4 int8 float8 bit enum numeric rangetypes
 boolean  ... ok  105 ms
 char ... ok   64 ms
 name ... ok   70 ms
 varchar  ... ok   55 ms
<...snip...>
 memoize  ... ok  149 ms
 stats... ok  873 ms
parallel group (2 tests):  event_trigger oidjoins
 event_trigger    ... FAILED  142 ms
 oidjoins ... FAILED  208 ms
test fast_default ... ok  172 ms
== shutting down postmaster   ==


 2 of 214 tests failed.


The differences that caused some tests to fail can be viewed in the
file "//src/test/regress/regression.diffs".  A copy of the test summary 
that you see
above is saved in the file "//src/test/regress/regression.out".


There is a slight reduction in information, for example around tests run
serially vs in a parallel group.  This could perhaps be handled by marking the
test name in some way like "tablespace (serial)" or prefixing with a symbol or
something.  I can take a stab at that in case we think that level of detail is
important to preserve.

--
Daniel Gustafsson   https://vmware.com/



v6-0001-Make-pg_regress-output-format-TAP-compliant.patch
Description: Binary data


Re: RFC: Moving specific attributes from pg_attribute column into attoptions

2022-08-17 Thread Tom Lane
Nikita Malakhov  writes:
> We already had some thoughts on storing, let's call them "optional"
> attributes into 'attoptions' instead of extending the PG_ATTRIBUTE
> table, and here came feedback from Andres Freund with a remark that
> we're increasing the largest catalog table. So we decided to propose
> moving these "optional" attributes from being the PG_ATTRIBUTE column to
> be the part of 'attoptions' column of this table.

This smells very much like what was done eons ago to create the
pg_attrdef catalog.  I don't have any concrete comments to make,
only to suggest that that's an instructive parallel case.  One
thing that comes to mind immediately is whether this stuff could
be unified with pg_attrdef instead of creating Yet Another catalog
that has to be consulted on the way to getting any real work done.

I think that pg_attrdef was originally separated to keep large
default expressions from overrunning the maximum tuple size,
a motivation that disappeared once we could TOAST system tables.
However, nowadays it's still useful for it to be separate because
it simplifies representation of dependencies of default expressions
(pg_depend refers to OIDs of pg_attrdef entries for that).
If we're thinking of moving anything that would need dependency
management then it might need its own catalog, maybe?

On the whole I'm not convinced that what you suggest will be a
net win.  pg_attrdef wins to the extent that there are a lot of
columns with no non-null default and hence no need for any pg_attrdef
entry.  But the minute you move something that most tables need, like
attcompression, you'll just have another bloated catalog to deal with.

> Also, we suggest that options stored in 'attoptions' column could be packed
> as JSON values.

Please, no.  Use of JSON in a SQL database pretty much always
represents a failure to think hard enough about what you need
to store.  Sometimes it's not worth thinking all that hard;
but I strenuously oppose applying that sort of standard in
the system catalogs.

regards, tom lane




plpython causes assertions with python debug build

2022-08-17 Thread Yura Sokolov
Hello, hackers.

I was investigating Valgrind issues with plpython. It turns out
python itself doesn't play well with Valgrind in its default build.

Therefore I built python with valgrind related flags
--with-valgrind --without-pymalloc
and added debug flags just to be sure
--with-pydebug --with-assertions

This causes plpython's tests to fail on python's internal
assertions.
Example backtrace (python version 3.7, postgresql master branch):

#8  0x7fbf02851662 in __GI___assert_fail "!PyErr_Occurred()"
at assert.c:101
#9  0x7fbef9060d31 in _PyType_Lookup
at Objects/typeobject.c:3117
 
#10 0x7fbef90461be in _PyObject_GenericGetAttrWithDict
at Objects/object.c:1231
#11 0x7fbef9046707 in PyObject_GenericGetAttr
at Objects/object.c:1309
 
#12 0x7fbef9043cdf in PyObject_GetAttr
at Objects/object.c:913
#13 0x7fbef90458d9 in PyObject_GetAttrString
at Objects/object.c:818
#14 0x7fbf02499636 in get_string_attr
at plpy_elog.c:569
#15 0x7fbf02498ea5 in PLy_get_error_data
at plpy_elog.c:420
#16 0x7fbf0249763b in PLy_elog_impl
at plpy_elog.c:77

It looks like there are several places where the code tries to get
attributes from error objects, and while the code is prepared for
the attribute to be absent, it doesn't clear the AttributeError
exception in that case.

The attached patch adds 3 calls to PyErr_Clear() in places where
the code handles attribute absence. With this patch the tests
pass cleanly.
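
For context, the CPython convention involved here is that a NULL result
from an attribute lookup leaves an AttributeError pending, and the
caller must clear it before making further Python C API calls.
Schematically:

	PyObject   *val = PyObject_GetAttrString(obj, "sqlstate");

	if (val == NULL)
	{
		PyErr_Clear();		/* attribute absent: swallow the AttributeError */
		return;
	}
	/* ... use val ... */
	Py_DECREF(val);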

There were similar findings before. Calls to PyErr_Clear close to,
but not exactly at, the same places were removed in
  7e3bb08038 Fix access-to-already-freed-memory issue in plpython's error 
handling.
Then one PyErr_Clear was added back in
  1d2f9de38d Fix freshly-introduced PL/Python portability bug.
But it looks like more are needed.

PS. When python is compiled with `--with-valgrind --without-pymalloc`,
Valgrind doesn't complain, so there are no memory-related
issues in plpython.

regards

--

Yura Sokolov
y.sokolov
diff --git a/src/pl/plpython/plpy_elog.c b/src/pl/plpython/plpy_elog.c
index 7c627eacfbf..aa1cf17b366 100644
--- a/src/pl/plpython/plpy_elog.c
+++ b/src/pl/plpython/plpy_elog.c
@@ -359,7 +359,10 @@ PLy_get_sqlerrcode(PyObject *exc, int *sqlerrcode)
 
 	sqlstate = PyObject_GetAttrString(exc, "sqlstate");
 	if (sqlstate == NULL)
+	{
+		PyErr_Clear();
 		return;
+	}
 
 	buffer = PLyUnicode_AsString(sqlstate);
 	if (strlen(buffer) == 5 &&
@@ -395,6 +398,7 @@ PLy_get_spi_error_data(PyObject *exc, int *sqlerrcode, char **detail,
 	}
 	else
 	{
+		PyErr_Clear();
 		/*
 		 * If there's no spidata, at least set the sqlerrcode. This can happen
 		 * if someone explicitly raises a SPI exception from Python code.
@@ -571,6 +575,10 @@ get_string_attr(PyObject *obj, char *attrname, char **str)
 	{
 		*str = pstrdup(PLyUnicode_AsString(val));
 	}
+	else if (val == NULL)
+	{
+		PyErr_Clear();
+	}
 	Py_XDECREF(val);
 }
 


Re: Making Vars outer-join aware

2022-08-17 Thread Tom Lane
Richard Guo  writes:
> BTW, the comment just above the two calls to build_joinrel_tlist says:
>  * Create a new tlist containing just the vars that need to be output from
> Here by 'vars' it means both plain Vars and PlaceHolderVars, right? If
> not we may need to adjust this comment to also include PlaceHolderVars.

I think it did intend just Vars because that's all that
build_joinrel_tlist did; but we really should have updated it when we
invented PlaceHolderVars, and even more so now that build_joinrel_tlist
adds PHVs too.  I changed the wording.

> A minor comment is that seems we can get rid of phid inside
> PlaceHolderInfo, since we do not do linear list searches any more. It's
> some duplicate to the phid inside PlaceHolderVar. Currently there are
> two places referencing PlaceHolderInfo->phid, remove_rel_from_query and
> find_placeholder_info. We can use PlaceHolderVar->phid instead in both
> the two places.

Meh, I'm not excited about that.  I don't think that the phid field
is only there to make the search loops faster; it's the basic
identity of the PlaceHolderInfo.

regards, tom lane




Re: cataloguing NOT NULL constraints

2022-08-17 Thread Marcos Pegoraro
>
> I started this time around from the newest of my patches in those
> threads, but the implementation has changed considerably from what's
> there.
>

I don't know exactly what the scope of the work you're doing will be,
but there is a gap with foreign key constraints too.
It is possible to end up with values that violate an FK constraint if you
disable its checking with session_replication_role or DISABLE TRIGGER ALL.
I know you can create the constraint with "not valid" and it'll be checked
when it is validated. But what if I just forgot to do that ...
So it would be good to have a way to validate constraints that actually
checks, even if the constraint is already marked valid.

drop table if exists tb_pk cascade;
create table tb_pk(key integer not null primary key);
drop table if exists tb_fk cascade;
create table tb_fk(fk_key integer);
alter table tb_fk add constraint fk_pk foreign key (fk_key) references tb_pk (key);
insert into tb_pk values(1);
alter table tb_fk disable trigger all; -- could be session_replication_role too
insert into tb_fk values(5); -- wrong value now on that table

Then, you could check

alter table tb_fk validate constraint fk_pk
or
alter table tb_fk validate all constraints


Re: static libpq (and other libraries) overwritten on aix

2022-08-17 Thread Andres Freund
Hi,

On 2022-08-17 15:28:18 -0400, Robert Haas wrote:
> On Wed, Aug 17, 2022 at 3:02 PM Andres Freund  wrote:
> > 2) Do we care about static libraries not surviving on AIX? There could also
> > be a race in the buildrules leading to sometimes static libs sometimes
> > shared libs winning, I think.
>
> Instead of overwriting the same file, can we not use different
> filenames for different things?

Not easily, as far as I understand. The way one customarily links to shared
libraries on aix is to have an .a archive containing the shared library. That
way the -lpq picks up libpq.a, which then triggers the shared library to be
referenced.

E.g.
andres@gcc119:[/home/andres/src/postgres/build-ac]$ 
LIBPATH=$(pwd)/src/interfaces/libpq ldd src/bin/scripts/clusterdb
src/bin/scripts/clusterdb needs:
 /usr/lib/libc.a(shr_64.o)
 /usr/lib/libpthread.a(shr_xpg5_64.o)
 /usr/lib/libreadline.a(libreadline.so.6)
 
/home/andres/src/postgres/build-ac/src/interfaces/libpq/libpq.a(libpq.so.5)
 /unix
 /usr/lib/libcrypt.a(shr_64.o)
 /usr/lib/libcurses.a(shr42_64.o)
 /usr/lib/libpthreads.a(shr_xpg5_64.o)

Note the .a(libpq.so.5) bit.


Unfortunately that's exactly how one links to a static library as well.

So we'd have to change the name used as -l$this between linking to a shared
libpq and a static libpq.

Greetings,

Andres Freund




Re: shared-memory based stats collector - v70

2022-08-17 Thread Greg Stark
On Tue, 16 Aug 2022 at 08:49, Drouvot, Bertrand  wrote:
>
>
> +   if (p->key.kind != PGSTAT_KIND_RELATION)
> +   continue;

Hm. So presumably this needs to be extended. Either to let the caller
decide which types of stats to return or to somehow return all the
stats intermixed. In my monitoring code I did the latter because I
didn't think going through the hash table repeatedly would be very
efficient. But it's definitely a pretty awkward API since I need a
switch statement that explicitly lists each case and casts the result.

> > 2) When I did the function attached above I tried to avoid returning
> > the whole set and make it possible to process them as they arrive.
>
> Is it the way it has been done? (did not look at your function yet)

I did it with callbacks. It was quick and easy and convenient for my
use case. But in general I don't really like callbacks and would think
some kind of iterator style api would be nicer.
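
Something with roughly this shape, say (an entirely hypothetical
sketch; none of these iterator functions exist today):

	PgStatIterator *it = pgstat_iterate_begin(PGSTAT_KIND_RELATION);
	PgStat_HashKey	key;
	void		   *stats;

	while (pgstat_iterate_next(it, &key, &stats))
	{
		/* format one entry; no table-wide lock held between calls */
	}
	pgstat_iterate_end(it);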

I am handling the stats entries as they turn up. I'm constructing the
text output for each in a callback and buffering up the whole http
response in a string buffer.

I think that's ok but if I wanted to avoid buffering it up and do
network i/o then I would think the thing to do would be to build the
list of entry keys and then loop over that list doing a hash lookup
for each one and generating the response for each out and writing it
to the network. That way there wouldn't be anything locked, not even
the hash table, while doing network i/o. It would mean a lot of
traffic on the hash table though.

> > -- on that note I wonder if I've done something
> > wrong because I noted a few records with InvalidOid where I didn't
> > expect it.
>
> It looks like that InvalidOid for the dbid means that the entry is for a
> shared relation.

Ah yes. I had actually found that but forgotten it.

There's also a database entry with dboid=InvalidOid which is
apparently where background workers with no database attached report
stats.

> I have in mind to add some filtering on the dbid (I think it could be
> useful for a monitoring tool with a persistent connection to one database
> that wants to pull the stats database per database).
>
> I don't think a lookup through the local cache will work if the
> entry/key is related to a database other than the one the API is
> launched from.

Isn't there also a local hash table used to find the entries to reduce
traffic on the shared hash table? Even if you don't take a snapshot
does it get entered there? There are definitely still parts of this
I'm working on a pretty vague understanding of :/

-- 
greg




Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-17 Thread Andres Freund
Hi,

On 2022-08-17 15:21:55 -0400, Robert Haas wrote:
> On Wed, Aug 17, 2022 at 2:45 PM Andres Freund  wrote:
> > Given that the cleanup locks in question are "taken" *after* re-initializing
> > the page, I'm doubtful that's a sane path forward. It seems quite likely to
> > mislead somebody to rely on it working as a cleanup lock in the future.
>
> There's not a horde of people lining up to work on the hash index
> code, but if you feel like writing and testing the more invasive fix,
> I'm not really going to fight you over it.

My problem is that the code right now is an outright lie. At the absolute very
least this code needs a big honking comment saying "we check if we have a
cleanup lock here, but that's just for show, because WE ALREADY OVERWROTE THE
WHOLE PAGE".

Greetings,

Andres Freund




Re: static libpq (and other libraries) overwritten on aix

2022-08-17 Thread Robert Haas
On Wed, Aug 17, 2022 at 3:02 PM Andres Freund  wrote:
> 2) Do we care about static libraries not surviving on AIX? There could also be
>    a race in the buildrules leading to sometimes static libs sometimes shared
>    libs winning, I think.

Instead of overwriting the same file, can we not use different
filenames for different things?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-17 Thread Robert Haas
On Wed, Aug 17, 2022 at 2:45 PM Andres Freund  wrote:
> On 2022-08-17 08:25:06 -0400, Robert Haas wrote:
> > Regarding the question of whether we need a cleanup lock on the new
> > bucket I am not really seeing the advantage of going down that path.
> > Simply fixing this code to take a cleanup lock instead of hoping that
> > it always gets one by accident is low risk and should fix the observed
> > problem. Getting rid of the cleanup lock will be more invasive and I'd
> > like to see some evidence that it's a necessary step before we take
> > the risk of breaking things.
>
> Given that the cleanup locks in question are "taken" *after* re-initializing
> the page, I'm doubtful that's a sane path forward. It seems quite likely to
> mislead somebody to rely on it working as a cleanup lock in the future.

There's not a horde of people lining up to work on the hash index
code, but if you feel like writing and testing the more invasive fix,
I'm not really going to fight you over it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




static libpq (and other libraries) overwritten on aix

2022-08-17 Thread Andres Freund
Hi,

I was hacking on making AIX work with the meson patchset last night when I
noticed this delightful bit:

gmake -C src/interfaces/libpq
...

rm -f libpq.a
ar crs libpq.a fe-auth-scram.o fe-connect.o fe-exec.o fe-lobj.o fe-misc.o 
fe-print.o fe-protocol3.o fe-secure.o fe-trace.o legacy-pqsignal.o 
libpq-events.o pqexpbuffer.o fe-auth.o
touch libpq.a

( echo '#! libpq.so.5'; gawk '/^[^#]/ {printf "%s\n",$1}' 
/home/andres/src/postgres/build-ac/../src/interfaces/libpq/exports.txt ) 
>libpq.exp
gcc -maix64 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Werror=vla -Wendif-labels 
-Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wformat-security 
-fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation 
-O2 -pthread -D_REENTRANT -D_THREAD_SAFE  -o libpq.so.5 libpq.a 
-Wl,-bE:libpq.exp -L../../../src/port -L../../../src/common -lpgcommon_shlib 
-lpgport_shlib -Wl,-bbigtoc  -Wl,-blibpath:'/usr/local/pgsql/lib:/usr/lib:/lib' 
 -Wl,-bnoentry -Wl,-H512 -Wl,-bM:SRE -lm

rm -f libpq.a
ar crs libpq.a libpq.so.5


we first create a static library libpq.a as normal, but then we overwrite it
in the special AIX way of packing up shared libraries, namely by archiving
the shared object into a static library.  That part is correct, it's
apparently the easiest way of getting applications to link to shared
libraries on AIX (I think the -Wl,-bM:SRE is relevant for ensuring it'll be
a dynamic link, rather than a static one).

This likely has been going on for approximately forever.

Two questions:
1) Do we continue building static libraries for libpq etc?
2) Do we care about static libraries not surviving on AIX? There could also be
   a race in the buildrules leading to sometimes static libs sometimes shared
   libs winning, I think.

Greetings,

Andres Freund




Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-17 Thread Andres Freund
Hi,

On 2022-08-17 08:25:06 -0400, Robert Haas wrote:
> Regarding the question of whether we need a cleanup lock on the new
> bucket I am not really seeing the advantage of going down that path.
> Simply fixing this code to take a cleanup lock instead of hoping that
> it always gets one by accident is low risk and should fix the observed
> problem. Getting rid of the cleanup lock will be more invasive and I'd
> like to see some evidence that it's a necessary step before we take
> the risk of breaking things.

Given that the cleanup locks in question are "taken" *after* re-initializing
the page, I'm doubtful that's a sane path forward. It seems quite likely to
mislead somebody to rely on it working as a cleanup lock in the future.

Greetings,

Andres Freund




Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-17 Thread Andres Freund
Hi,

On 2022-08-17 10:18:14 +0530, Amit Kapila wrote:
> > Looking at the non-recovery code makes me even more suspicious:
> >
> > /*
> >  * Physically allocate the new bucket's primary page.  We want to 
> > do this
> >  * before changing the metapage's mapping info, in case we can't 
> > get the
> >  * disk space.  Ideally, we don't need to check for cleanup lock on 
> > new
> >  * bucket as no other backend could find this bucket unless meta 
> > page is
> >  * updated.  However, it is good to be consistent with old bucket 
> > locking.
> >  */
> > buf_nblkno = _hash_getnewbuf(rel, start_nblkno, MAIN_FORKNUM);
> > if (!IsBufferCleanupOK(buf_nblkno))
> > {
> > _hash_relbuf(rel, buf_oblkno);
> > _hash_relbuf(rel, buf_nblkno);
> > goto fail;
> > }
> >
> >
> > _hash_getnewbuf() calls _hash_pageinit() which calls PageInit(), which
> > memset(0)s the whole page. What does it even mean to check whether you
> > effectively have a cleanup lock after you zeroed out the page?
> >
> > Reading the README and the comment above makes me wonder if this whole 
> > cleanup
> > lock business here is just cargo culting and could be dropped?
> >
> 
> I think it is okay to not acquire a clean-up lock on the new bucket
> page both in recovery and non-recovery paths. It is primarily required
> on the old bucket page to avoid concurrent scans/inserts. As mentioned
> in the comments and as far as my memory serves, it is mainly for keeping
> it consistent with old bucket locking.

It's not keeping it consistent with bucket locking to zero out a page before
getting a cleanup lock, hopefully at least. This code is just broken on
multiple fronts, and consistency isn't a defense.

Greetings,

Andres Freund




Re: cataloguing NOT NULL constraints

2022-08-17 Thread Alvaro Herrera
References to past discussions and patches:

https://postgr.es/m/cakoswnkn6hsyatuys8xzxzrcr-kl1okhs5-b9qd9bf1rad3...@mail.gmail.com
https://www.postgresql.org/message-id/flat/1343682669-sup-2...@alvh.no-ip.org
https://www.postgresql.org/message-id/20160109030002.GA671800@alvherre.pgsql

I started this time around from the newest of my patches in those
threads, but the implementation has changed considerably from what's
there.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




RFC: Moving specific attributes from pg_attribute column into attoptions

2022-08-17 Thread Nikita Malakhov
Hi hackers!

While working on Pluggable TOAST we extended the PG_ATTRIBUTE table with a
new column 'atttoaster'. But it is obvious that this column is related to
tables, columns and datatypes only, and is not needed for other attributes.
You can find full discussion on Pluggable TOAST here
https://www.postgresql.org/message-id/flat/224711f9-83b7-a307-b17f-4457ab73a...@sigaev.ru

We already had some thoughts on storing these, let's call them "optional",
attributes in 'attoptions' instead of extending the PG_ATTRIBUTE table, and
then came feedback from Andres Freund with a remark that we're enlarging the
largest catalog table. So we decided to propose moving these "optional"
attributes from being dedicated PG_ATTRIBUTE columns to being part of the
'attoptions' column of this table.
The most likely candidates to store in the attoptions column are 'atttoaster'
and 'attcompression', because they are related to datatypes and table columns.
Also, this change will allow setting options for custom Toasters, which makes
a lot of sense too; in particular an important [as we see it] 'force TOAST'
option, which is meant to force a given value to be TOASTed bypassing the
existing logic (where the decision depends on tuple and value size).

Also, we suggest that options stored in 'attoptions' column could be packed
as JSON values.

It seems to make a lot of sense to streamline the PG_ATTRIBUTE structure and
size by moving out attributes that are relevant only to specific types, etc.

We'd welcome any opinions, suggestions and advice!

-- 
Regards,
Nikita Malakhov
https://postgrespro.ru/


cataloguing NOT NULL constraints

2022-08-17 Thread Alvaro Herrera
I've been working on having NOT NULL constraints have pg_constraint
rows.

Everything is working now.  Some things are a bit weird, and I would
like opinions on them:

1. In my implementation, you can have more than one NOT NULL
   pg_constraint row for a column.  What should happen if the user does
   ALTER TABLE .. ALTER COLUMN .. DROP NOT NULL;
   ?  Currently it throws an error about the ambiguity (ie. which
   constraint to drop).
   Using ALTER TABLE DROP CONSTRAINT works fine, and the 'attnotnull'
   bit is lost when the last one such constraint goes away.

2. If a table has a primary key, and a table is created that inherits
   from it, then the child has its column(s) marked attnotnull but there
   is no pg_constraint row for that.  This is not okay.  But what should
   happen?

   1. a CHECK(col IS NOT NULL) constraint is created for each column
   2. a PRIMARY KEY () constraint is created

Note that I've chosen not to create CHECK(foo IS NOT NULL) pg_constraint
rows for columns in the primary key, unless an explicit NOT NULL
declaration is also given.  Adding them would be a very easy solution
to problem 2 above, but ISTM that such constraints would be redundant
and not very nice.

After gathering input on these things, I'll finish the patch and post it.
As far as I can tell, everything else is working (except the annoying
pg_dump tests, see below).

Thanks

Implementation notes:

In the current implementation I am using CHECK constraints, so these
constraints are contype='c', conkey={col} and the corresponding
expression.

pg_attribute.attnotnull is still there, and it is set true when at least
one validated "CHECK (col IS NOT NULL)" constraint or PRIMARY KEY
constraint exists for the column.

CHECK constraint names are no longer "tab_col_check" when the expression
is CHECK (foo IS NOT NULL).  The constraint is now going to be named
"tab_col_not_null"

If you say CREATE TABLE (a int NOT NULL), you'll get a CHECK constraint
printed by psql (this is a bit more noisy than previously, and it
changes a lot of regression test output):

55489 16devel 1776237=# create table tab (a int not null);
CREATE TABLE
55489 16devel 1776237=# \d tab
                Table "public.tab"
 Column │  Type   │ Collation │ Nullable │ Default 
────────┼─────────┼───────────┼──────────┼─────────
 a      │ integer │           │ not null │ 
Check constraints:
    "tab_a_not_null" CHECK (a IS NOT NULL)


pg_dump no longer prints NOT NULL in the table definition; rather, the
CHECK constraint is dumped as a separate table constraint (still within
the CREATE TABLE statement though).  This preserves any possible
constraint name, in case one was specified by the user at creation time.

In order to search for the correct constraint for each column for
various DDL actions, I just inspect each pg_constraint row for the table
and match conkey and the CHECK expression.  Some things would be easier
with a new pg_attribute column that carries a pg_constraint.oid of the
constraint for that column; however, that seems to be just catalog bloat
and is not normalized, so I decided not to do it.

Nice side-effect: if you add CHECK (foo IS NOT NULL) NOT VALID, and
later validate that constraint, the attnotnull bit becomes set.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




[PATCH] Clarify the comments about varlena header encoding

2022-08-17 Thread Aleksander Alekseev
Hi hackers,

I noticed that the comments regarding bit layouts for varlena headers
in postgres.h are somewhat misleading. For instance, when reading:

```
00xx 4-byte length word, aligned, uncompressed data (up to 1G)
```

... one can assume this is a 00xx byte followed by another 4 bytes
(which is wrong). Also one can read this as "aligned, uncompressed
data" (which again is wrong).

```
1000 1-byte length word, unaligned, TOAST pointer
```

This is misleading too. The comments above this line say that `struct
varatt_external` is a TOAST pointer. sizeof(varatt_external) = 16,
plus 1 byte equals 17, right? However the documentation [1] claims the
result should be 18:

"""
Allowing for the varlena header bytes, the total size of an on-disk
TOAST pointer datum is therefore 18 bytes regardless of the actual
size of the represented value.
"""

I did my best to get rid of any ambiguity. The patch is attached.

[1]: https://www.postgresql.org/docs/current/storage-toast.html

-- 
Best regards,
Aleksander Alekseev


v1-0001-Clarify-the-comments-about-varlena-header-encodin.patch
Description: Binary data


Re: pg_walinspect: ReadNextXLogRecord's first_record argument

2022-08-17 Thread Robert Haas
On Wed, Aug 17, 2022 at 12:29 PM Bharath Rupireddy
 wrote:
> PSA v2 patches.

These look OK to me.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: ICU for global collation

2022-08-17 Thread Julien Rouhaud
Hi,

On Mon, Aug 15, 2022 at 03:06:32PM +0300, Marina Polyakova wrote:
>
> 1.1) It looks like there's a bug in the function get_db_infos
> (src/bin/pg_upgrade/info.c), where the version of the old cluster is always
> checked:
>
> if (GET_MAJOR_VERSION(old_cluster.major_version) < 1500)
>   snprintf(query + strlen(query), sizeof(query) - strlen(query),
>"'c' AS datlocprovider, NULL AS daticulocale, ");
> else
>   snprintf(query + strlen(query), sizeof(query) - strlen(query),
>"datlocprovider, daticulocale, ");
>
> With the simple patch
>
> diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
> index 
> df374ce4b362b4c6c87fc1fd0e476e5d6d353d9e..53ea348e211d3ac38334292bc16cb814bc13bb87
> 100644
> --- a/src/bin/pg_upgrade/info.c
> +++ b/src/bin/pg_upgrade/info.c
> @@ -319,7 +319,7 @@ get_db_infos(ClusterInfo *cluster)
>
>   snprintf(query, sizeof(query),
>"SELECT d.oid, d.datname, d.encoding, d.datcollate, 
> d.datctype, ");
> - if (GET_MAJOR_VERSION(old_cluster.major_version) < 1500)
> + if (GET_MAJOR_VERSION(cluster->major_version) < 1500)
>   snprintf(query + strlen(query), sizeof(query) - strlen(query),
>"'c' AS datlocprovider, NULL AS daticulocale, 
> ");
>   else
>
> I got the expected error during the upgrade:
>
> locale providers for database "template1" do not match:  old "libc", new
> "icu"
> Failure, exiting

Good catch.  There are unfortunately not a lot of regression tests for
ICU-initialized clusters.  I'm wondering if the build-farm client could be
taught about the locale provider rather than assuming libc.  Clearly that
wouldn't have caught this issue, but it should still increase the coverage a
bit (I'm thinking of the recent problem with the abbreviated keys).

> 1.2) It looks like the mentioned assertion in dbcommands.c conflicts with the
> following lines earlier:
>
> if (dbiculocale == NULL)
>   dbiculocale = src_iculocale;
>
> The following patch works for me:
>
> diff --git a/src/backend/commands/dbcommands.c
> b/src/backend/commands/dbcommands.c
> index 
> b31a30550b025d48ba3cc250dc4c15f41f9a80be..17a2942341e528c01182fb6d4878580f2706bec9
> 100644
> --- a/src/backend/commands/dbcommands.c
> +++ b/src/backend/commands/dbcommands.c
> @@ -1048,6 +1048,8 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
>   
> (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>errmsg("ICU locale must be 
> specified")));
>   }
> + else
> + dbiculocale = NULL;
>
>   if (dblocprovider == COLLPROVIDER_ICU)
>   check_icu_locale(dbiculocale);

I think it would be better to do that in the various variable initialization.
Maybe switch the dblocprovider and dbiculocale initialization, and only
initialize dbiculocale if dblocprovider is COLLPROVIDER_ICU.
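
I.e. something along these lines (an untested sketch, assuming
src_locprovider holds the template's provider):

	if (dblocprovider == '\0')
		dblocprovider = src_locprovider;
	if (dblocprovider == COLLPROVIDER_ICU && dbiculocale == NULL)
		dbiculocale = src_iculocale;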

> 2) CREATE DATABASE does not always require the icu locale unlike initdb and
> createdb:
>
> $ createdb mydb --locale en_US.UTF-8 --template template0 --locale-provider
> icu
> createdb: error: database creation failed: ERROR:  ICU locale must be
> specified
>
> $ psql -c "CREATE DATABASE mydb LOCALE \"en_US.UTF-8\" TEMPLATE template0
> LOCALE_PROVIDER icu" postgres
> CREATE DATABASE
>
> I'm wondering if this is not a fully-supported feature (because createdb
> creates an SQL command with LC_COLLATE and LC_CTYPE options instead of
> LOCALE option) or is it a bug in CREATE DATABASE?.. From
> src/backend/commands/dbcommands.c:
>
> if (dblocprovider == COLLPROVIDER_ICU && !dbiculocale)
> {
>   if (dlocale && dlocale->arg)
>   dbiculocale = defGetString(dlocale);
> }

This discrepancy between createdb and CREATE DATABASE looks like an oversight,
as createdb indeed interprets --locale as:

if (locale)
{
if (lc_ctype)
pg_fatal("only one of --locale and --lc-ctype can be 
specified");
if (lc_collate)
pg_fatal("only one of --locale and --lc-collate can be 
specified");
lc_ctype = locale;
lc_collate = locale;
}

AFAIK the fallback in the CREATE DATABASE case is expected as POSIX locale
names should be accepted by icu, so this should work for createdb too.




Re: pg_walinspect: ReadNextXLogRecord's first_record argument

2022-08-17 Thread Bharath Rupireddy
On Wed, Aug 17, 2022 at 8:52 PM Robert Haas  wrote:
>
> On Wed, Aug 17, 2022 at 12:41 AM Bharath Rupireddy
>  wrote:
> > Agreed.
> >
> > Here's a patch (for V15 as well) fixing this bug, please review.
>
> Couldn't you simplify this further by removing the lsn argument from
> GetWALRecordInfo and using record->ReadRecPtr instead? Then
> InitXLogReaderState's second argument could be XLogRecPtr instead of
> XLogRecPtr *.

Done. XLogFindNextRecord() stores the first valid record in EndRecPtr
and the ReadRecPtr is set to InvalidXLogRecPtr by calling
XLogBeginRead(). And XLogNextRecord() sets ReadRecPtr which we can
use.
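
Schematically, the sequence relied on here (error handling omitted):

	first_valid = XLogFindNextRecord(xlogreader, start_lsn);
	/* internally does XLogBeginRead(), so ReadRecPtr is invalid and
	 * EndRecPtr now points at the first valid record */
	record = XLogNextRecord(xlogreader, &errormsg);
	/* sets xlogreader->ReadRecPtr to the record actually read */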

PSA v2 patches.

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/


v2-master-Use-correct-LSN-for-error-reporting-in-pg_walinsp.patch
Description: Binary data


v2-PG15-Use-correct-LSN-for-error-reporting-in-pg_walinsp.patch
Description: Binary data


Re: pg_walinspect: ReadNextXLogRecord's first_record argument

2022-08-17 Thread Robert Haas
On Wed, Aug 17, 2022 at 12:41 AM Bharath Rupireddy
 wrote:
> Agreed.
>
> Here's a patch (for V15 as well) fixing this bug, please review.

Couldn't you simplify this further by removing the lsn argument from
GetWALRecordInfo and using record->ReadRecPtr instead? Then
InitXLogReaderState's second argument could be XLogRecPtr instead of
XLogRecPtr *.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Generalize ereport_startup_progress infrastructure

2022-08-17 Thread Robert Haas
On Wed, Aug 17, 2022 at 4:30 AM Bharath Rupireddy
 wrote:
> As I explained upthread [1], I'd vote for a single GUC at the entire
> server level. If the users/customers request per-process or
> per-operation progress report GUCs, we can then consider it.

Well, I don't agree that either of the proposed new uses of this
infrastructure are the right way to solve the problems in question, so
worrying about how to name the GUCs when we have a bunch of uses of
this infrastructure seems to me to be premature. The proposed use in
the postmaster doesn't look very safe, so you either need to give up
on that or figure out a way to make it safe. The proposed use in the
checkpointer looks like it needs more design work, because it's not
clear whether or how it should interact with log_checkpoints. While I
agree that changing log_checkpoints into an integer value doesn't
necessarily make sense, having some kind of new checkpoint logging
that is completely unrelated to existing checkpoint logging doesn't
necessarily make sense to me either.

I do have some sympathy with the idea that if people care about
operations that unexpectedly run for a long time, they probably care
about all of them, and probably don't care about changing the timeout
or even the enable switch for each one individually. Honestly, it's
not very clear to me who would want to ever turn off the startup
progress stuff, or why they'd want to change the interval. I added a
GUC for it out of an abundance of caution, but I don't know why you'd
really want a different setting. Maybe there's some reason, but it's
not clear to me. At the same time, I don't think the overall picture
here is too clear yet. I'm reluctant to commit to a specific UI for a
feature whose scope we don't seem to know.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Propose a new function - list_is_empty

2022-08-17 Thread Tom Lane
I wrote:
> I'd drop the parens in these particular examples because they are
> inconsistent with the other parts of the same "if" condition.

After going through the patch I removed most but not all of the
newly-added parens on those grounds.  I also found a couple more
spots that could be converted.  Pushed with those changes.

regards, tom lane




RE: s390x builds on buildfarm

2022-08-17 Thread Vivian Kong
Thanks Andrew.  Mark, please let me know if I can help.

Regards,

Vivian Kong
Linux on IBM Z Open Source Ecosystem
IBM Canada Toronto Lab

From: Andrew Dunstan 
Date: Wednesday, August 10, 2022 at 9:56 AM
To: Vivian Kong , pgsql-hackers@lists.postgresql.org 
, mark.w...@enterprisedb.com 

Subject: [EXTERNAL] Re: s390x builds on buildfarm

On 2022-08-10 We 09:04, Vivian Kong wrote:
>
> Hi,
>
>
>
> Are builds being paused on s390x as it looks like the s390x builds
> were last run 15 days ago.  If so, wondering what is the reason for
> the pause and what is required to resume the builds?
> The OS the builds were running on seems to have reached end of life.
> Please let me know if we can help with getting them updated and resume
> the builds.
>
>
>
>

Mark, I think you run most or all of these.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: XLogBeginRead's header comment lies

2022-08-17 Thread Robert Haas
On Wed, Aug 17, 2022 at 6:53 AM Dilip Kumar  wrote:
> Thinking again, there is already code in XLogDecodeNextRecord() to
> error out if XLP_FIRST_IS_CONTRECORD is set, so probably we don't need
> to do anything else and the previous patch with modified assert should
> just work fine?

Yeah, that looks right to me. I'm inclined to commit your patch with
some changes to wording of the comments. I'm also inclined not to
back-patch, since we don't know that this breaks anything for existing
users of the xlogreader facility.

If anyone doesn't want this committed or does want it back-patched,
please speak up.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com




shadow variables - pg15 edition

2022-08-17 Thread Justin Pryzby
There's been no progress on this in the past discussions.

https://www.postgresql.org/message-id/flat/877k1psmpf.fsf%40mailbox.samurai.com
https://www.postgresql.org/message-id/flat/CAApHDvpqBR7u9yzW4yggjG%3DQfN%3DFZsc8Wo2ckokpQtif-%2BiQ2A%40mail.gmail.com#2d900bfe18fce17f97ec1f00800c8e27
https://www.postgresql.org/message-id/flat/MN2PR18MB2927F7B5F690065E1194B258E35D0%40MN2PR18MB2927.namprd18.prod.outlook.com

But an unfortunate consequence of not fixing the historic issues is that it
precludes the possibility that anyone could be expected to notice if they
introduce more instances of the same problem (as in the first half of these
patches).  Then the hole which has already been dug becomes deeper, further
increasing the burden of fixing the historic issues before being able to use
-Wshadow.

The first half of the patches fix shadow variables newly introduced in v15
(including one from a patch of my own); the rest fix the lowest-hanging fruit
of the "short list" from COPT=-Wshadow=compatible-local.

I can't see that any of these are bugs, but it seems like a good goal to move
towards allowing use of the -Wshadow* options to help avoid future errors, and
to improve cleanliness and readability (rather than letting it get even harder
to use -Wshadow).
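
To illustrate what the warning catches, here's a minimal, hypothetical
example (not from the patches) of what -Wshadow=compatible-local flags:

#include <stdio.h>

int
main(void)
{
	int			i = 42;

	for (int i = 0; i < 3; i++)	/* warning: declaration of 'i' shadows a
								 * previous local */
		printf("inner: %d\n", i);

	printf("outer: %d\n", i);	/* still 42, but the reader must check scopes */
	return 0;
}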

-- 
Justin
>From 0b05b375a87d89f5d88e87d11956cf2ac15ea00f Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 16 Aug 2022 19:38:57 -0500
Subject: [PATCH 01/17] avoid shadow vars: pg_dump.c: i_oid

backpatch to v15

commit d498e052b4b84ae21b3b68d5b3fda6ead65d1d4d
Author: Robert Haas 
Date:   Fri Jul 8 10:15:19 2022 -0400

Preserve relfilenode of pg_largeobject and its index across pg_upgrade.
---
 src/bin/pg_dump/pg_dump.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index da6605175a0..322947c5609 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -3144,7 +3144,6 @@ dumpDatabase(Archive *fout)
 		PQExpBuffer loHorizonQry = createPQExpBuffer();
 		int			i_relfrozenxid,
 	i_relfilenode,
-	i_oid,
 	i_relminmxid;
 
 		/*
-- 
2.17.1

>From a76bac21fe428cdd6241bff6827e08d9d71e1bdf Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 16 Aug 2022 15:55:13 -0500
Subject: [PATCH 02/17] avoid shadow vars: pg_dump.c: tbinfo

backpatch to v15

commit 9895961529ef8ff3fc12b39229f9a93e08bca7b7
Author: Tom Lane 
Date:   Mon Dec 6 13:07:31 2021 -0500

Avoid per-object queries in performance-critical paths in pg_dump.
---
 src/bin/pg_dump/pg_dump.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 322947c5609..5c196d66985 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -7080,21 +7080,21 @@ getConstraints(Archive *fout, TableInfo tblinfo[], int numTables)
 	appendPQExpBufferChar(tbloids, '{');
 	for (int i = 0; i < numTables; i++)
 	{
-		TableInfo  *tbinfo = &tblinfo[i];
+		TableInfo  *mytbinfo = &tblinfo[i];
 
 		/*
 		 * For partitioned tables, foreign keys have no triggers so they must
 		 * be included anyway in case some foreign keys are defined.
 		 */
-		if ((!tbinfo->hastriggers &&
-			 tbinfo->relkind != RELKIND_PARTITIONED_TABLE) ||
-			!(tbinfo->dobj.dump & DUMP_COMPONENT_DEFINITION))
+		if ((!mytbinfo->hastriggers &&
+			 mytbinfo->relkind != RELKIND_PARTITIONED_TABLE) ||
+			!(mytbinfo->dobj.dump & DUMP_COMPONENT_DEFINITION))
 			continue;
 
 		/* OK, we need info for this table */
 		if (tbloids->len > 1)	/* do we have more than the '{'? */
 			appendPQExpBufferChar(tbloids, ',');
-		appendPQExpBuffer(tbloids, "%u", tbinfo->dobj.catId.oid);
+		appendPQExpBuffer(tbloids, "%u", mytbinfo->dobj.catId.oid);
 	}
 	appendPQExpBufferChar(tbloids, '}');
 
-- 
2.17.1

>From f6a814fd50800942081250b05f8e6d143b8d8266 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 16 Aug 2022 16:22:52 -0500
Subject: [PATCH 03/17] avoid shadow vars: pg_dump.c: owning_tab

backpatch to v15

commit 344d62fb9a978a72cf8347f0369b9ee643fd0b31
Author: Peter Eisentraut 
Date:   Thu Apr 7 16:13:23 2022 +0200

Unlogged sequences
---
 src/bin/pg_dump/pg_dump.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 5c196d66985..ecf29f3c52a 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -16799,7 +16799,7 @@ dumpSequence(Archive *fout, const TableInfo *tbinfo)
 	 */
 	if (OidIsValid(tbinfo->owning_tab) && !tbinfo->is_identity_sequence)
 	{
-		TableInfo  *owning_tab = findTableByOid(tbinfo->owning_tab);
+		owning_tab = findTableByOid(tbinfo->owning_tab);
 
 		if (owning_tab == NULL)
 			pg_fatal("failed sanity check, parent table with OID %u of sequence with OID %u not found",
-- 
2.17.1

>From 1a979be65baab871754f86669c5f0327fad6cab5 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 17 Aug 2022 08:52:03 -0500
Subject: [PATCH 04/17] avoid shadow vars: tablesync.c: first

back

Re: POC: GROUP BY optimization

2022-08-17 Thread Tomas Vondra
On 8/2/22 13:14, David Rowley wrote:
> On Tue, 2 Aug 2022 at 22:21, Michael Paquier  wrote:
>>
>> On Fri, Jul 15, 2022 at 09:46:51PM +0200, Tomas Vondra wrote:
>>> I agree this is a mistake in db0d67db2 that makes the test useless.
>>
>> Tomas, please note that this is an open item assigned to you.  Are you
>> planning to do something with these regression tests by beta3?
> 
> There's still something to do there for PG15, but 1349d2790 (just gone
> in to master) made those two tests run as a parallel query again.
> 

Hi,

I've been looking at this, and as I speculated before it's due to the
sort costing changing a little bit. On PG14, the costing of the plans
looks like this:

 Gather  (cost=1869.39..2823.15 rows=8 width=52)

and with disabled parallelism, it's like this:

 Append  (cost=998.04..3000.64 rows=10 width=52)

so there's a (fairly small) difference in favor of the parallel plan. But
on PG15 it's the other way around - the selected non-parallel one is
costed like this:

 Append  (cost=779.41..2490.61 rows=10 width=52)

and by setting parallel_setup_cost=0 you get this:

 Gather  (cost=700.34..1531.76 rows=8 width=52)

So with the setup cost included it's ~2531, and it loses to the simple
plan. This happens because the patch changed sort costing - the same
sort on PG14 and PG15 looks like this:

 PG14: ->  Sort  (cost=998.04..1028.04 rows=12000 width=13)

 PG15: ->  Sort  (cost=779.41..809.41 rows=12000 width=13)

As mentioned, the commit tweaked sort costing - before it was pretty
much just

  comparison_cost * tuples * LOG2(tuples)

but the patch needs to cost different pathkey orderings, and consider
that maybe we don't need to compare all the keys (which the original
costing kind of assumes). That's the whole point of this optimization.
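
As a quick sanity check of that formula, assuming the default
cpu_operator_cost = 0.0025 (so comparison_cost = 2 * 0.0025 = 0.005):

  0.005 * 12000 * LOG2(12000) ≈ 813.04

and adding the 185.00 total cost of the seq scan feeding the sort (see
the attached plan) gives exactly the PG14 sort startup cost of 998.04
quoted above.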

The costing (compute_cpu_sort_cost) considers a couple other things,
like cost of the comparator function for the data type, width of the
values, groupings determined by preceding keys, and so on.

It might seem strange that a query with a single pathkey changes, but
that single pathkey is costed by the same new machinery, of course. In
principle we might have a special costing for this case, but I'd bet
that would result in pretty surprising inconsistencies when adding a
sort key (going from 1 to 2 keys).

So I don't think the current costing is wrong, but it certainly is more
complex. But the test does not test what it intended - I have three
ideas for how to make it work:

1) increase the number of rows in the table

2) increase cpu_operator_cost (for that one test?)

3) tweak the costing somehow, to increase the cost a bit

Both (1) and (2) work - I've tried doubling the number of rows or
setting the operator cost to 0.005, and that does the trick, but maybe a
smaller change would be enough.

I don't like (3) very much - changing the costing just to get the same
test behavior as on an older release does not seem very principled. Yes,
maybe it should be tweaked, but not because of a regression test.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- Without ORDER BY clause, to test Gather at top-most path
EXPLAIN (COSTS ON)
SELECT a, sum(b), array_agg(distinct c), count(*) FROM pagg_tab_ml GROUP BY a HAVING avg(b) < 3;
QUERY PLAN 
---
 Gather  (cost=1869.39..2823.15 rows=8 width=52)
   Workers Planned: 2
   ->  Parallel Append  (cost=869.39..1822.35 rows=4 width=52)
 ->  GroupAggregate  (cost=998.04..1178.25 rows=4 width=52)
   Group Key: pagg_tab_ml.a
   Filter: (avg(pagg_tab_ml.b) < '3'::numeric)
   ->  Sort  (cost=998.04..1028.04 rows=12000 width=13)
 Sort Key: pagg_tab_ml.a
 ->  Seq Scan on pagg_tab_ml_p1 pagg_tab_ml  (cost=0.00..185.00 rows=12000 width=13)
 ->  GroupAggregate  (cost=869.39..1019.56 rows=3 width=52)
   Group Key: pagg_tab_ml_5.a
   Filter: (avg(pagg_tab_ml_5.b) < '3'::numeric)
   ->  Sort  (cost=869.39..894.39 rows=1 width=13)
 Sort Key: pagg_tab_ml_5.a
 ->  Append  (cost=0.00..205.00 rows=1 width=13)
   ->  Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_5  (cost=0.00..108.00 rows=7000 width=13)
   ->  Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_6  (cost=0.00..47.00 rows=3000 width=13)
 ->  GroupAggregate  (cost=682.63..802.77 rows=3 width=52)
   Group Key: pagg_tab_ml_2.a
   Filter: (avg(pagg_tab_ml_2.b) < '3'::numeric)
   ->  Sort  (cost=682.63..702.63 rows=8000 width=13)
 Sort Key: pagg_tab_ml_2.a
 ->  Append  (cost=0.00..164.00 rows=8000 width=13)
   ->  Seq Scan on pagg_t

Re: Setting log_connection in connection string doesn't work

2022-08-17 Thread Bruce Momjian
On Wed, Aug 17, 2022 at 10:29:26AM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > This patch is from October of 2021.  I don't see any commitfest entry
> > for it.  Should it be applied?
> 
> I think we decided not to.  The original argument for having these
> be PGC_SU_BACKEND was to try to ensure that you got matching
> connection and disconnection log entries for any one session,
> and I don't see anything that supersedes that plan.

Okay, thanks for the feedback.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: Setting log_connection in connection string doesn't work

2022-08-17 Thread Tom Lane
Bruce Momjian  writes:
> This patch is from October of 2021.  I don't see any commitfest entry
> for it.  Should it be applied?

I think we decided not to.  The original argument for having these
be PGC_SU_BACKEND was to try to ensure that you got matching
connection and disconnection log entries for any one session,
and I don't see anything that supersedes that plan.

regards, tom lane




Re: Setting log_connection in connection string doesn't work

2022-08-17 Thread Bruce Momjian


This patch is from October of 2021.  I don't see any commitfest entry
for it.  Should it be applied?

---

On Wed, Oct 27, 2021 at 11:53:09AM +0900, Kyotaro Horiguchi wrote:
> At Wed, 27 Oct 2021 10:55:31 +0900, Michael Paquier  
> wrote in 
> > On Wed, Oct 27, 2021 at 10:24:05AM +0900, Kyotaro Horiguchi wrote:
> > > I don't know. The fact is that it's a superuser-backend variable that
> > > is silently ignored (but actually seems to be set in the session).
> > > Setting log_disconnections the same way works (of course the implication
> > > of this is far less significant than the log_connections case).
> > 
> > fe550b2 is the commit that has changed both those parameters to be
> > PGC_SU_BACKEND, with the commit log mentioning the case you are
> > describing.  That would be the area of this thread:
> > https://www.postgresql.org/message-id/20408.1404329...@sss.pgh.pa.us
> 
> Thanks for the pointer.  (I didn't remember of that thread..)
> 
> > As Tom and this thread are saying, there may be a use-case for
> > making log_connections more effective at startup so as superusers
> > could hide their logs at will.  However, honestly, I am not sure that
> > this is worth spending time improving this as the use-case looks
> > rather thin to me.  Perhaps you are right and we could just mark both
> 
> I tend to agree.
> 
> > of those GUCs as PGC_SIGHUP, making the whole easier to understand and
> > more consistent, though.  If we do that, the patch is wrong, as the
> > docs would also need a refresh.
> 
> Yeah, this is the full version of the patch.
> 
> regards.
> 
> -- 
> Kyotaro Horiguchi
> NTT Open Source Software Center

> >From 11a9612c2590f57f431c3918d5b62c08a5b29efb Mon Sep 17 00:00:00 2001
> From: Kyotaro Horiguchi 
> Date: Wed, 27 Oct 2021 11:39:02 +0900
> Subject: [PATCH] Change log_(dis)connections to PGC_SIGHUP
> 
> log_connections is not effective when it is given in connection
> options. Since no complaint has been heard about this behavior, the
> use-case looks rather thin.  Thus we change it to PGC_SIGHUP, rather
> than putting effort into making it effective for that use-case.
> log_disconnections does work in that usage, but we keep things
> consistent by treating it the same way as log_connections.
> ---
>  doc/src/sgml/config.sgml | 8 
>  src/backend/utils/misc/guc.c | 4 ++--
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
> index de77f14573..64b04a47d2 100644
> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -6800,8 +6800,8 @@ local0.*/var/log/postgresql
>  Causes each attempted connection to the server to be logged,
>  as well as successful completion of both client authentication (if
>  necessary) and authorization.
> -Only superusers can change this parameter at session start,
> -and it cannot be changed at all within a session.
> +This parameter can only be set in the <filename>postgresql.conf</filename>
> +file or on the server command line.
>  The default is off.
> 
>  
> @@ -6827,8 +6827,8 @@ local0.*/var/log/postgresql
>  Causes session terminations to be logged.  The log output
>  provides information similar to log_connections,
>  plus the duration of the session.
> -Only superusers can change this parameter at session start,
> -and it cannot be changed at all within a session.
> +This parameter can only be set in the <filename>postgresql.conf</filename>
> +file or on the server command line.
>  The default is off.
> 
>
> diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
> index e91d5a3cfd..57d810c80d 100644
> --- a/src/backend/utils/misc/guc.c
> +++ b/src/backend/utils/misc/guc.c
> @@ -1353,7 +1353,7 @@ static struct config_bool ConfigureNamesBool[] =
>   NULL, NULL, NULL
>   },
>   {
> - {"log_connections", PGC_SU_BACKEND, LOGGING_WHAT,
> + {"log_connections", PGC_SIGHUP, LOGGING_WHAT,
>   gettext_noop("Logs each successful connection."),
>   NULL
>   },
> @@ -1362,7 +1362,7 @@ static struct config_bool ConfigureNamesBool[] =
>   NULL, NULL, NULL
>   },
>   {
> - {"log_disconnections", PGC_SU_BACKEND, LOGGING_WHAT,
> + {"log_disconnections", PGC_SIGHUP, LOGGING_WHAT,
>   gettext_noop("Logs end of a session, including 
> duration."),
>   NULL
>   },
> -- 
> 2.27.0
> 


-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: tests and meson - test names and file locations

2022-08-17 Thread Peter Eisentraut

On 12.08.22 18:29, Andres Freund wrote:

I don't really understand which problem this solves and how.  Sure, the test
output is somewhat complex, but I know where it is and I've never found
myself wishing it to be somewhere else.

I'd like the buildfarm and CI to a) use parallelism to run tests (that's why the
BF is slow) and b) show the logfiles for exactly the failed tests ([1]). We can of
course iterate through the whole directory tree, somehow identify which log
files are for which test, and then select the log files for the failed
tests. But that's much easier to do when you have a uniform directory
hierarchy, where you can tell which tests have failed based on the filesystem
alone.


My initial experience with testing under meson is that it's quite 
fragile and confusing (unlike the building, which is quite robust and 
understandable).  Some of that is the fault of meson, some of that is 
our implementation.  Surely this can be improved over time, but my 
experience has been that it's not there yet.


The idea that we are going to move all the test output files somewhere 
else at the same time is not appealing to me.  The combination of 
fragile plus can't find the diagnostics is not a good one.


Now, this is my experience; others might have different ones.

Also, is there anything in these proposed changes that couldn't also be 
applied to the old build system?  We are going to be running them in 
parallel for some time.  It would be good if one doesn't have to learn 
two entirely different sets of testing interfaces.






Re: [RFC] building postgres with meson - v11

2022-08-17 Thread Peter Eisentraut

On 11.08.22 02:20, Andres Freund wrote:

Attached is a new version of the meson patchset. Plenty changes:


I have various bits of comments on this.

- There are various references to "pch" (pre-compiled headers).  Is
  there more discussion anywhere about this?  I don't know what this
  would entail or whether there are any drawbacks to be aware of.  The
  new *_pch.h files don't have any comments.  Maybe this should be a
  separate patch later.

- About relativize_shared_library_references: We have had several
  patches over the years for working around SIP stuff, and some of
  them did essentially this, but we decided not to go ahead with them.
  We could revisit that, but it should be a separate patch, not mixed
  in with this.

- postgresql-extension.pc: Similarly, this ought to be a separate
  patch.  If we want people to use this, we'll need it in the makefile
  build system anyway.

- -DFRONTEND is used somewhat differently from the makefiles.  For
   example, meson sets -DFRONTEND for pg_controldata, but the
   makefiles don't.  Conversely, the makefiles set -DFRONTEND for
   ecpglib, but meson does not.  This should be checked again to make
   sure it all matches up.

- Option name spelling should be made consistent about underscores
  versus hyphens.  Built-in meson options use underscores, so we
  should make the user-defined ones like that as well (some already
  do).  (wal-blocksize krb-srvnam system-tzdata tap-tests bsd-auth)

- I have found the variable name "cdata" for configuration_data() to
  be less than clear.  I see some GNOME projects do it that way, is
  that where it's from?  systemd uses "conf", maybe that's better.

- In the top-level meson.build, the "renaming" of the Windows system
  name

  host_system = host_machine.system() == 'windows' ? 'win32' : 
host_machine.system()
  build_system = build_machine.system() == 'windows' ? 'win32' : 
build_machine.system()


  seems unnecessary to me.  Why not stick with the provided names?

- The c99_test ought not to be needed if the c_std project option is
  used.  Was there a problem with that?

- Is there a way to split up the top-level meson.build somehow?  Maybe
  just throw some stuff into included files?  This might get out of
  hand at some point.

- The PG_SYSROOT detection gives different results.  On my system,
  configure produces

/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk,
  meson produces

/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk.
  src/template/darwin goes out of its way to get a version-specific
  result, so we need to carry that over somehow.  (The difference does
  result in differences in the built binaries.)


Then, some patches from me:

0001-Change-shared-library-installation-naming-on-macOS.patch

This changes the makefiles to make the shared library file naming on
macOS match what meson produces.  I don't know what the situation is
on other platforms.

0002-meson-Fix-installation-name-of-libpgfeutils.patch

This was presumably an accidental mistake.

0003-meson-Libraries-need-to-be-built-with-DSO_MAJOR_VERS.patch

This is needed to make NLS work for the libraries.

0004-meson-Add-darwin_versions-argument-for-libraries.patch

This is to make the output match what Makefile.shlib produces.

0005-meson-Fix-link-order-of-support-libraries.patch
0006-meson-Make-link-order-of-external-libraries-match-ma.patch
0007-WIP-meson-Make-link-order-of-object-files-match-make.patch

I have analyzed the produced binaries between both build systems to
make sure they match.  If we link the files and libraries in different
orders, that becomes difficult.  So this fixes this up a bit.  0005 is
needed for correctness in general, I think.  0006 is mostly cosmetic.
You probably wanted to make the library order alphabetical in the
meson files, which I'd support, but then we should change the
makefiles to match.  Similarly, 0007, which is clearly a bit messy at
the moment, but we should try to sort that out either in the old or
the new build files.


And finally some comments on your patches:

meson: prereq: Don't add HAVE_LDAP_H HAVE_WINLDAP_H to pg_config.h.

This can go ahead.

meson: prereq: fix warning compat_informix/rnull.pgc with msvc

-   $float f = 3.71;
+   $float f = (float) 3.71;

This could use float literals like

+   $float f = 3.71f;

From 054b24e9ae5fc859f13c05d8150ef1168a477ce4 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 17 Aug 2022 14:18:52 +0200
Subject: [PATCH 1/7] Change shared library installation naming on macOS

It is not customary to install a shared library with a minor version
number (libpq.5.16.dylib) on macOS.  We just need the file with the
major version number (libpq.5.dylib) and the one without version
number (libpq.dylib).  This also matches the installation layout used
by Meson.
---
 src/Makefile.shlib | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/

Re: Propose a new function - list_is_empty

2022-08-17 Thread Tom Lane
Junwang Zhao  writes:
> There are some places that add extra parentheses like here
>   while (list_length(sortclause) > list_length(previous) &&
> -list_length(new_elems) > 0)
> +(new_elems != NIL))

> Is it necessary to add those extra parentheses?

I'd drop the parens in these particular examples because they are
inconsistent with the other parts of the same "if" condition.
I concur with Daniel's point that parens can be useful as a visual
aid even when they aren't strictly necessary --- but I don't think
we should make future readers wonder why one half of the "if"
is parenthesized and the other isn't.
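
For illustration, the convention being settled on looks like this
(a hypothetical helper, not taken from the patch):

#include "postgres.h"
#include "nodes/pg_list.h"

static bool
have_pending_items(List *items)
{
	/* preferred: explicit comparison to NIL ... */
	return items != NIL;
	/* ... rather than the equivalent list_length(items) > 0 */
}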

regards, tom lane




Re: Amcheck verification of GiST and GIN

2022-08-17 Thread Andrey Borodin


> On 23 Jul 2022, at 14:40, Andrey Borodin  wrote:
> 
> Done. PFA attached patchset.
> 
> Best regards, Andrey Borodin.
> 

Here's v13. Changes:
1. Fixed passing through downlink in GIN index
2. Fixed GIN tests (one test case was not working)

Thanks to Vitaliy Kukharik for trying these patches.

Best regards, Andrey Borodin.



v13-0001-Refactor-amcheck-to-extract-common-locking-routi.patch
Description: Binary data


v13-0002-Add-gist_index_parent_check-function-to-verify-G.patch
Description: Binary data


v13-0003-Add-gin_index_parent_check-to-verify-GIN-index.patch
Description: Binary data


Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-17 Thread Robert Haas
On Tue, Aug 16, 2022 at 8:38 PM Andres Freund  wrote:
> I don't think we can defend against lwlock deadlocks where somebody doesn't
> follow the AM's deadlock avoidance strategy.

That's a good way of putting it. Tom seems to be postulating that
maybe someone can use random tools that exist to take buffer locks and
pins in arbitrary order, and if that is true then you can make any AM
deadlock. I think it isn't true, though, and I think if it were true
the right fix would be to remove the tools that are letting people do
that.

There's also zero evidence that this was ever intended as a deadlock
avoidance maneuver. I think that we are only hypothesizing that it was
intended that way because the code looks weird. But I think the email
discussion shows that I thought it was wrong at the time it was
committed, and just missed the fact that the final version of the
patch hadn't fixed it. And if it *were* a deadlock avoidance maneuver
it would still be pretty broken, because it would make the startup
process error out and the whole system go down.

Regarding the question of whether we need a cleanup lock on the new
bucket I am not really seeing the advantage of going down that path.
Simply fixing this code to take a cleanup lock instead of hoping that
it always gets one by accident is low risk and should fix the observed
problem. Getting rid of the cleanup lock will be more invasive and I'd
like to see some evidence that it's a necessary step before we take
the risk of breaking things.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Regarding availability of 32bit client drivers for postgresql 13/14

2022-08-17 Thread Bruce Momjian
On Wed, Aug 17, 2022 at 03:43:55AM +, Aravind Phaneendra wrote:
> Based on the above requirements and details, we have a few questions which
> require your support. 
> 
>  1. Can we get 32bit client binaries/libraries for postgresql 14 ?
>  2. We also found that the libraries can be built by using the postgresql 14
> source. Is it possible to build the 32bit client binaries/libraries from
> the source available ?
>  3. Is there an official support for 32bit client libraries/binaries built out
> of source for customers ?
>  4. Can the postgresql 10.12.1 client work with Postgresql 14 server ? Do you
> still support postgresql 10.12.1 client ?

The community produces the source code, and third parties like Debian,
Red Hat, EDB, and our own packagers build the binaries you are asking
about.  I think you need to contact wherever you are getting your
binaries and ask them about 32-bit support.  You can certainly build
32-bit binaries yourself if you wish.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: XLogBeginRead's header comment lies

2022-08-17 Thread Dilip Kumar
On Wed, Aug 17, 2022 at 11:31 AM Dilip Kumar  wrote:
>
> On Wed, Aug 17, 2022 at 11:18 AM Dilip Kumar  wrote:
> >
> > On Tue, Aug 16, 2022 at 11:28 PM Robert Haas  wrote:
> > >
> >
> > Yeah I think it makes sense to make it work as per the comment in
> > XLogBeginRead().  I think if we modify the Assert as per the comment
> > of XLogBeginRead() then the remaining code of the
> > XLogDecodeNextRecord() is capable enough to take care of skipping the
> > page header if we are pointing at the beginning of the block.
> >
> > See attached patch.
> >
>
> I think that is not sufficient: if there is a record continuing from
> the previous page and we are pointing to the start of the page, then
> the assertion alone does not catch it.  I think if the
> targetRecOff is zero then we should additionally read the header and
> verify that XLP_FIRST_IS_CONTRECORD is not set.

Thinking again, there is already code in XLogDecodeNextRecord() to
error out if XLP_FIRST_IS_CONTRECORD is set, so probably we don't need
to do anything else, and the previous patch with the modified assert
should just work fine?
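
For readers following along, the existing guard being referred to is
essentially the following (a paraphrased sketch, not the exact
xlogreader.c code):

	/*
	 * After skipping the page header at a page boundary, refuse to start
	 * decoding if the first "record" on the page is really the
	 * continuation of a record begun on the previous page.
	 */
	if ((((XLogPageHeader) state->readBuf)->xlp_info & XLP_FIRST_IS_CONTRECORD) &&
		targetRecOff == pageHeaderSize)
	{
		report_invalid_record(state, "contrecord is requested by %X/%X",
							  LSN_FORMAT_ARGS(RecPtr));
		goto err;
	}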

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Proposal: CREATE/ALTER DOMAIN ... STORAGE/COMPRESSION = ...

2022-08-17 Thread Aleksander Alekseev
Hi hackers,

Do you think it will be useful to specify STORAGE and/or COMPRESSION
for domains?

As an example, this would allow creating an alias for TEXT with the
EXTERNAL storage strategy. In other words, to do the same thing we do
with ALTER TABLE, but for types. This feature is arguably not something
most people are going to use, but it shouldn't be difficult to
implement and/or maintain either.

Thoughts?

-- 
Best regards,
Aleksander Alekseev




Re: Making Vars outer-join aware

2022-08-17 Thread Richard Guo
On Wed, Aug 17, 2022 at 4:57 AM Tom Lane  wrote:

> On further thought, it seems better to maintain the index array
> from the start, allowing complete replacement of the linear list
> searches.  We can add a separate bool flag to denote frozen-ness.


+1 for 0001 patch. Now we process plain Vars and PlaceHolderVars in a
more consistent way when building joinrel's tlist. And this change would
make it easier to build up phnullingrels for PHVs as we climb up the
join tree.

BTW, the comment just above the two calls to build_joinrel_tlist says:

 * Create a new tlist containing just the vars that need to be output from

Here by 'vars' it means both plain Vars and PlaceHolderVars, right? If
not we may need to adjust this comment to also include PlaceHolderVars.


0002 patch looks good to me. Glad we can get rid of create_new_ph flag.
A minor comment is that it seems we can get rid of phid inside
PlaceHolderInfo, since we do not do linear list searches any more. It
duplicates the phid inside PlaceHolderVar. Currently there are
two places referencing PlaceHolderInfo->phid, remove_rel_from_query and
find_placeholder_info. We can use PlaceHolderVar->phid instead in both
the two places.

Thanks
Richard


Re: Bug: When user-defined AM is used, the index path cannot be selected correctly

2022-08-17 Thread Quan Zongliang



On 2022/8/17 10:03, Tom Lane wrote:

Quan Zongliang  writes:

1. When using extended PGroonga
...
3. Neither ID = 'f' nor id= 't' can use the index correctly.


This works fine for btree indexes.  I think the actual problem
is that IsBooleanOpfamily only accepts the btree and hash
opclasses, and that's what needs to be improved.  Your proposed
patch fails to do that, which makes it just a crude hack that
solves some aspects of the issue (and probably breaks other
things).

It might work to change IsBooleanOpfamily so that it checks to
see whether BooleanEqualOperator is a member of the opclass.
That's basically what we need to know before we dare generate
substitute index clauses.  It's kind of an expensive test
though, and the existing coding assumes that IsBooleanOpfamily
is cheap ...

regards, tom lane


New patch attached.

It seems that partitions do not use AMs other than btree and hash.
So rewrite only indxpath.c and check whether the index's AM is a custom one.

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 7d176e7b00..30df5cb2f6 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -23,6 +23,7 @@
 #include "catalog/pg_operator.h"
 #include "catalog/pg_opfamily.h"
 #include "catalog/pg_type.h"
+#include "commands/defrem.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/supportnodes.h"
@@ -2295,7 +2296,7 @@ match_clause_to_indexcol(PlannerInfo *root,
 
/* First check for boolean-index cases. */
opfamily = index->opfamily[indexcol];
-   if (IsBooleanOpfamily(opfamily))
+   if (IsBooleanAmOpfamily(index->relam, opfamily))
{
iclause = match_boolean_index_clause(root, rinfo, indexcol, 
index);
if (iclause)
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index 8dc9ce01bb..5c4cc616d8 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -58,6 +58,12 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_opfamily_oid_index, 2755, OpfamilyOidIndexId, on pg
 #define IsBooleanOpfamily(opfamily) \
 	((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
 
+#define IsBooleanAmOpfamily(amid, opfamily) \
+	((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID || \
+	((amid) >= FirstNormalObjectId && \
+	OidIsValid(GetDefaultOpClass(BOOLOID, (amid)))) \
+	)
+
 #endif							/* EXPOSE_TO_CLIENT_CODE */
 
 #endif							/* PG_OPFAMILY_H */
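
For reference, the membership test Tom suggested upthread might look
roughly like the sketch below (a paraphrase of his idea, not tested
code); the op_in_opfamily() syscache lookup is what makes it more
expensive than the current macro:

#include "postgres.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_opfamily.h"
#include "utils/lsyscache.h"

static bool
is_boolean_opfamily(Oid opfamily)
{
	/* fast path for the built-in cases */
	if (opfamily == BOOL_BTREE_FAM_OID || opfamily == BOOL_HASH_FAM_OID)
		return true;

	/* syscache lookup: is "bool = bool" a member of this opfamily? */
	return op_in_opfamily(BooleanEqualOperator, opfamily);
}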


Re: Generalize ereport_startup_progress infrastructure

2022-08-17 Thread Bharath Rupireddy
On Wed, Aug 17, 2022 at 2:45 AM Nathan Bossart  wrote:
>
> On Wed, Aug 10, 2022 at 11:00:20AM -0400, Robert Haas wrote:
> > Maybe the checkpointer is a better candidate, but somehow I feel that
> > we can't consider this sort of thing separate from the existing
> > progress reporting that checkpointer already does. Perhaps we need to
> > think of changing or improving that in some way rather than adding
> > something wholly new alongside the existing system.
>
> I agree that the checkpointer has a good chance of being a better
> candidate.  Are you thinking of integrating this into log_checkpoints
> somehow?  Perhaps this parameter could optionally accept an interval for
> logging the progress of ongoing checkpoints.

Certainly the checkpointer is an immediate candidate. For instance, I
can think of using ereport_progress() in CheckPointSnapBuild() for
snapshot files processing, CheckPointLogicalRewriteHeap() for mapping
files processing, BufferSync() for checkpointing dirty buffers (?),
ProcessSyncRequests() for processing fsync() requests,
RemoveOldXlogFiles(), RemoveNonParentXlogFiles() (?). I personally have
seen cases where some of these checkpoint operations take a lot of
time in production environments, and better observability would help
a lot.
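
To make that concrete, a call site in CheckPointSnapBuild() could look
like the sketch below; ereport_progress() and its signature are
assumptions modeled on ereport_startup_progress(), not committed API,
and the real function's logic is elided:

void
CheckPointSnapBuild(void)
{
	DIR		   *snap_dir = AllocateDir("pg_logical/snapshots");
	struct dirent *snap_de;
	int			nprocessed = 0;

	while ((snap_de = ReadDir(snap_dir, "pg_logical/snapshots")) != NULL)
	{
		/* ... decide whether the snapshot file is stale, unlink it ... */
		nprocessed++;

		/* emitted only when the configured interval has elapsed */
		ereport_progress("snapshot files processed: %d, current file: %s",
						 nprocessed, snap_de->d_name);
	}
	FreeDir(snap_dir);
}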

However, I'm not sure if turning log_checkpoints to an integer type to
use for checkpoint progress reporting is a good idea here.

As I explained upthread [1], I'd vote for a single GUC at the entire
server level. If the users/customers request per-process or
per-operation progress report GUCs, we can then consider it.

Thoughts?

[1] 
https://www.postgresql.org/message-id/CALj2ACUJA73nCK_Li7v4_OOkRqwQBX14Fx2ALb7GDRwUTNGK-Q%40mail.gmail.com

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/




Re: Propose a new function - list_is_empty

2022-08-17 Thread Daniel Gustafsson
> On 17 Aug 2022, at 10:13, Junwang Zhao  wrote:
> 
> There are some places that add extra parentheses like here
> 
> ...
> 
> Is it necessary to add those extra parentheses?

It's not necessary unless needed for operator associativity, but also I don't
object to grouping with parentheses as a visual aid for the person reading the
code.  Going over the patch in more detail I might change my mind on some, but I
don't object to the practice in general.

--
Daniel Gustafsson   https://vmware.com/





Re: Propose a new function - list_is_empty

2022-08-17 Thread Junwang Zhao
There are some places that add extra parentheses like here

--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3097,7 +3097,7 @@ reorder_grouping_sets(List *groupingsets, List
*sortclause)
  GroupingSetData *gs = makeNode(GroupingSetData);

  while (list_length(sortclause) > list_length(previous) &&
-list_length(new_elems) > 0)
+(new_elems != NIL))
  {

and here,

--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3408,7 +3408,7 @@ estimate_num_groups_incremental(PlannerInfo
*root, List *groupExprs,
  * for normal cases with GROUP BY or DISTINCT, but it is possible for
  * corner cases with set operations.)
  */
- if (groupExprs == NIL || (pgset && list_length(*pgset) < 1))
+ if (groupExprs == NIL || (pgset && (*pgset == NIL)))
  return 1.0;

Is it necessary to add those extra parentheses?

On Wed, Aug 17, 2022 at 3:33 PM Daniel Gustafsson  wrote:
>
> > On 17 Aug 2022, at 03:09, Peter Smith  wrote:
> >
> > On Wed, Aug 17, 2022 at 6:34 AM Daniel Gustafsson  wrote:
> >>
> >>> On 16 Aug 2022, at 16:03, Tom Lane  wrote:
> >>
> >>> I agree though that while simplifying list_length() calls, I'd lean to 
> >>> using
> >>> explicit comparisons to NIL.
> >>
> >>
> >> Agreed, I prefer that too.
> >>
> >
> > Thanks for the feedback.
> >
> > PSA patch v3 which now uses explicit comparisons to NIL everywhere,
> > and also addresses the other review comments.
>
> From reading, this version of the patch looks good to me.
>
> --
> Daniel Gustafsson   https://vmware.com/
>
>
>


-- 
Regards
Junwang Zhao




Re: Generalize ereport_startup_progress infrastructure

2022-08-17 Thread Bharath Rupireddy
On Wed, Aug 10, 2022 at 6:21 PM Nitin Jadhav
 wrote:
>
> Given two options, option-1 is to use a single GUC across all kinds of
> long-running operations and option-2 is to use multiple GUCs (one for
> each kind of long-running operation), I go with option-1 because if a
> user is interested in seeing a log message after every 10s for startup
> operations (or any other long-running operations) then it is likely
> that they are interested in seeing other long-running operations after
> every 10s too. It does not make sense to use different intervals for
> each kind of long-running operation here. It also increases the number
> of GUCs, which makes things complex. So it is a good idea to use a
> single GUC here.

+1.

> But I am worried about the on/off switch as Robert
> mentioned.

Are you worried that users might want to switch off the progress
report messages at the process level - for instance, to log the
startup process's long-running operations but not, say, the
checkpointer's or postmaster's? IMO, a long-running operation, if it is
happening in any of the processes, is a concern for the users, and
having progress report log messages for them would help users debug
any issues and improve observability of the server as a whole. With a
single GUC, the server log might contain progress reports of all the
long-running operations (wherever we use this ereport_progress()) in
the entire server's lifecycle, which isn't bad IMO.

I'd still vote for a single GUC log_progress_report_interval without
worrying much about process-level enable/disable capability. However,
let's hear what other hackers think about this.
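
For concreteness, the single-GUC approach could be wired up along these
lines (a hypothetical guc.c entry; the name comes from this thread, while
the 10s default and the backing int variable are assumptions):

	{
		{"log_progress_report_interval", PGC_SIGHUP, LOGGING_WHEN,
			gettext_noop("Sets the time interval between progress reports of long-running operations."),
			gettext_noop("Zero disables progress reporting."),
			GUC_UNIT_MS
		},
		&log_progress_report_interval,
		10000, 0, INT_MAX,
		NULL, NULL, NULL
	},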

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/




Re: Propose a new function - list_is_empty

2022-08-17 Thread Daniel Gustafsson
> On 17 Aug 2022, at 03:09, Peter Smith  wrote:
> 
> On Wed, Aug 17, 2022 at 6:34 AM Daniel Gustafsson  wrote:
>> 
>>> On 16 Aug 2022, at 16:03, Tom Lane  wrote:
>> 
>>> I agree though that while simplifying list_length() calls, I'd lean to using
>>> explicit comparisons to NIL.
>> 
>> 
>> Agreed, I prefer that too.
>> 
> 
> Thanks for the feedback.
> 
> PSA patch v3 which now uses explicit comparisons to NIL everywhere,
> and also addresses the other review comments.

From reading, this version of the patch looks good to me.

--
Daniel Gustafsson   https://vmware.com/





Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-17 Thread Bharath Rupireddy
On Wed, Aug 17, 2022 at 2:02 AM Nathan Bossart  wrote:
>
> On Wed, Aug 10, 2022 at 03:28:25PM +0530, Bharath Rupireddy wrote:
> >   snprintf(path, sizeof(path), "pg_logical/mappings/%s", 
> > mapping_de->d_name);
> > - if (lstat(path, &statbuf) == 0 && !S_ISREG(statbuf.st_mode))
> > + if (get_dirent_type(path, mapping_de, false, LOG) != 
> > PGFILETYPE_REG)
> >   continue;
>
> Previously, failure to lstat() wouldn't lead to skipping the entry.  With
> this patch, a failure to determine the file type will cause the entry to be
> skipped.  This might be okay in some places (e.g., CheckPointSnapBuild())
> but not in others.  For example, in CheckPointLogicalRewriteHeap(), this
> could cause us to skip fsync-ing a file due to a get_dirent_type() failure,
> which seems bad.

Hm. I corrected it in the v16 patch, please review.

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/


v16-0001-Make-more-use-of-get_dirent_type-in-place-of-sta.patch
Description: Binary data


Re: Handle infinite recursion in logical replication setup

2022-08-17 Thread Peter Smith
On Wed, Aug 17, 2022 at 4:33 PM Amit Kapila  wrote:
>
> On Wed, Aug 17, 2022 at 8:48 AM houzj.f...@fujitsu.com
>  wrote:
> >
> > On Tuesday, August 2, 2022 8:00 PM Amit Kapila  
> > wrote:
> > > On Tue, Jul 26, 2022 at 9:07 AM Amit Kapila  
> > > wrote:
> >
> > Thanks for the summary.
> >
> > I think it's fine to make the user use the copy_data option more carefully 
> > to
> > prevent duplicate copies by reporting an ERROR.
> >
> > But I also have a similar concern to Sawada-san's, as it's possible for the user
> > to receive an ERROR in some unexpected cases.
> >
> > For example I want to build bi-directional setup between two nodes:
> >
> > Node A: TABLE test (has actual data)
> > Node B: TABLE test (empty)
> >
> > Step 1:
> > CREATE PUBLICATION on both Node A and B.
> >
> > Step 2:
> > CREATE SUBSCRIPTION on Node A with (copy_data = on)
> > -- this is fine as there is no data on Node B
> >
> > Step 3:
> > CREATE SUBSCRIPTION on Node B with (copy_data = on)
> > -- this should be fine as user needs to copy data from Node A to Node B,
> > -- but we still report an error for this case.
> >
> > It looks a bit strict to report an ERROR in this case and it seems not easy to
> > avoid this. So, personally, I think it might be better to document the correct
> > steps to build the bi-directional replication and probably also document the
> > steps to recover if the user accidentally did a duplicate initial copy, if not
> > documented yet.
> >
> > In addition, we could also LOG some additional information about the ORIGIN and
> > initial copy, which might help the user analyze the situation if needed.
> >
>
> But why LOG instead of WARNING? I feel in this case there is a chance
> of inconsistent data so a WARNING like "publication "pub1" could have
> data from multiple origins" can be given when the user has specified
> options: "copy_data = on, origin = NONE" while creating a
> subscription. We give a WARNING during subscription creation when the
> corresponding publication doesn't exist, eg.
>
> postgres=# create subscription sub1 connection 'dbname = postgres'
> publication pub1;
> WARNING:  publication "pub1" does not exist in the publisher
>
> Then, we can explain in docs how users can avoid data inconsistencies
> while setting up replication.
>

I was wondering if this copy/origin case really should be a NOTICE.

See table [1]. It says WARNING is meant for "warnings of likely
problems". But this is not exactly a "likely" problem - IIUC we really
don't know if there is even any problem at all; we only know there
is the *potential* for a problem, but the user has to then judge it
for themselves. Perhaps WARNING may be a bit overkill for this
situation - it might be unnecessarily scary to give false warnings.

OTOH, NOTICE [1] says it is for "information that might be helpful to
users" which seems more like what is needed here.

--
[1] 
https://www.postgresql.org/docs/current/runtime-config-logging.html#RUNTIME-CONFIG-SEVERITY-LEVELS

Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: Remove dummyret definition

2022-08-17 Thread Daniel Gustafsson
> On 17 Aug 2022, at 07:26, Peter Eisentraut 
>  wrote:
> 
> dummyret hasn't been used in a while (last use removed by 50d22de932, and 
> before that 84b6d5f359), and since we are now preferring inline functions 
> over complex macros, it's unlikely to be needed again.

+1, I can't see that making a comeback into the code.

--
Daniel Gustafsson   https://vmware.com/