Strip -mmacosx-version-min options from plperl build
When building on macOS against a Homebrew-provided Perl installation, I get these warnings during the build:

ld: warning: object file (SPI.o) was built for newer macOS version (12.4) than being linked (11.3)
ld: warning: object file (plperl.o) was built for newer macOS version (12.4) than being linked (11.3)
...

This is because the link command uses the option -mmacosx-version-min=11.3, which comes in from perl_embed_ldflags (perl -MExtUtils::Embed -e ldopts), but the compile commands don't use that option, which creates a situation that ld considers inconsistent.

I think an appropriate fix is to strip out the undesired option from perl_embed_ldflags. We already do that for other options. Proposed patch attached.

From e50c439f2fced8d1b3af9b3bc142f8d43a781c5a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Thu, 18 Aug 2022 08:24:03 +0200
Subject: [PATCH] Strip -mmacosx-version-min options from plperl build

---
 config/perl.m4 | 2 +-
 configure      | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/config/perl.m4 b/config/perl.m4
index c823fc8cf0..65f338bda7 100644
--- a/config/perl.m4
+++ b/config/perl.m4
@@ -100,7 +100,7 @@ if test "$PORTNAME" = "win32" ; then
 else
 	pgac_tmp1=`$PERL -MExtUtils::Embed -e ldopts`
 	pgac_tmp2=`$PERL -MConfig -e 'print $Config{ccdlflags}'`
-	perl_embed_ldflags=`echo X"$pgac_tmp1" | sed -e "s/^X//" -e "s%$pgac_tmp2%%" -e ["s/ -arch [-a-zA-Z0-9_]*//g"]`
+	perl_embed_ldflags=`echo X"$pgac_tmp1" | sed -e "s/^X//" -e "s%$pgac_tmp2%%" -e ["s/ -arch [-a-zA-Z0-9_]*//g"] -e ["s/ -mmacosx-version-min=[0-9.]*//g"]`
 fi
 AC_SUBST(perl_embed_ldflags)dnl
 if test -z "$perl_embed_ldflags" ; then
diff --git a/configure b/configure
index 176e0f9b00..8e0f648702 100755
--- a/configure
+++ b/configure
@@ -10479,7 +10479,7 @@ if test "$PORTNAME" = "win32" ; then
 else
 	pgac_tmp1=`$PERL -MExtUtils::Embed -e ldopts`
 	pgac_tmp2=`$PERL -MConfig -e 'print $Config{ccdlflags}'`
-	perl_embed_ldflags=`echo X"$pgac_tmp1" | sed -e "s/^X//" -e "s%$pgac_tmp2%%" -e "s/ -arch [-a-zA-Z0-9_]*//g"`
+	perl_embed_ldflags=`echo X"$pgac_tmp1" | sed -e "s/^X//" -e "s%$pgac_tmp2%%" -e "s/ -arch [-a-zA-Z0-9_]*//g" -e "s/ -mmacosx-version-min=[0-9.]*//g"`
 fi
 if test -z "$perl_embed_ldflags" ; then
 { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
-- 
2.37.1
Re: Perform streaming logical transactions by background workers and parallel apply
Hi Wang-san, FYI, I also checked the latest patch v23-0001 but found that the v21-0001/v23-0001 differences are minimal, so all my v21* review comments are still applicable for the patch v23-0001. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Compressed pluggable storage experiments
Hi all, this is a continuation of the above thread...

>> > 4. In order to use WAL-logging each page must start with a standard 24
>> > byte PageHeaderData even if it is needless for storage itself. Not a
>> > big deal though. Another (actually documented) WAL-related limitation
>> > is that only generic WAL can be used within extension. So unless
>> > inserts are made in bulks it's going to require a lot of disk space to
>> > accommodate logs and wide bandwidth for replication.
>>
>> Not sure what to suggest. Either you should ignore this problem, or
>> you should fix it.

I am working on an environment similar to the above extension (pg_cryogen, which experiments with the pluggable storage APIs) but don't have much knowledge of PostgreSQL's logical replication. Please suggest some approaches to support logical replication for a table with a custom access method that writes generic WAL records.

On Wed, 17 Aug 2022 at 19:04, Tomas Vondra wrote:
> On Fri, Oct 18, 2019 at 03:25:05AM -0700, Andres Freund wrote:
> >Hi,
> >
> >On 2019-10-17 12:47:47 -0300, Alvaro Herrera wrote:
> >> On 2019-Oct-10, Ildar Musin wrote:
> >>
> >> > 1. Unlike FDW API, in pluggable storage API there are no routines like
> >> > "begin modify table" and "end modify table" and there is no shared
> >> > state between insert/update/delete calls.
> >>
> >> Hmm. I think adding a begin/end to modifytable is a reasonable thing to
> >> do (it'd be a no-op for heap and zheap I guess).
> >
> >I'm fairly strongly against that. Adding two additional "virtual"
> >function calls for something that's rarely going to be used, seems like
> >adding too much overhead to me.
>
> That seems a bit strange to me. Sure - if there's an alternative way to
> achieve the desired behavior (clear way to finalize writes etc.), then
> cool, let's do that. But forcing people to use inconvenient workarounds
> seems like a bad thing to me - having a convenient and clear API is
> quite valuable, IMHO.
>
> Let's see if this actually has a measurable overhead first.
>
> >> > 2. It looks like I cannot implement custom storage options. E.g. for
> >> > compressed storage it makes sense to implement different compression
> >> > methods (lz4, zstd etc.) and corresponding options (like compression
> >> > level). But as i can see storage options (like fillfactor etc) are
> >> > hardcoded and are not extensible. Possible solution is to use GUCs
> >> > which would work but is not extremely convenient.
> >>
> >> Yeah, the reloptions module is undergoing some changes. I expect that
> >> there will be a way to extend reloptions from an extension, at the end
> >> of that set of patches.
> >
> >Cool.
>
> Yep.
>
> >> > 3. A bit surprising limitation that in order to use bitmap scan the
> >> > maximum number of tuples per page must not exceed 291 due to
> >> > MAX_TUPLES_PER_PAGE macro in tidbitmap.c which is calculated based on
> >> > 8kb page size. In case of 1mb page this restriction feels really
> >> > limiting.
> >>
> >> I suppose this is a hardcoded limit that needs to be fixed by patching
> >> core as we make table AM more pervasive.
> >
> >That's not unproblematic - a dynamic limit would make a number of
> >computations more expensive, and we already spend plenty CPU cycles
> >building the tid bitmap. And we'd waste plenty of memory just having all
> >that space for the worst case. ISTM that we "just" need to replace the
> >TID bitmap with some tree like structure.
>
> I think the zedstore has roughly the same problem, and Heikki mentioned
> some possible solutions to dealing with it in his pgconfeu talk (and it
> was discussed in the zedstore thread, I think).
>
> >> > 4. In order to use WAL-logging each page must start with a standard 24
> >> > byte PageHeaderData even if it is needless for storage itself. Not a
> >> > big deal though. Another (actually documented) WAL-related limitation
> >> > is that only generic WAL can be used within extension. So unless
> >> > inserts are made in bulks it's going to require a lot of disk space to
> >> > accommodate logs and wide bandwidth for replication.
> >>
> >> Not sure what to suggest. Either you should ignore this problem, or
> >> you should fix it.
> >
> >I think if it becomes a problem you should ask for an rmgr ID to use for
> >your extension, which we encode and then allow to set the relevant
> >rmgr callbacks for that rmgr id at startup. But you should obviously
> >first develop the WAL logging etc, and make sure it's beneficial over
> >generic wal logging for your case.
>
> AFAIK compressed/columnar engines generally implement two types of
> storage - write-optimized store (WOS) and read-optimized store (ROS),
> where the WOS is mostly just an uncompressed append-only buffer, and ROS
> is compressed etc. ISTM the WOS would benefit from a more elaborate WAL
> logging, but ROS should be mostly fine with the generic WAL logging.
>
> But yeah, we should test and measure how beneficial that actually is.
Re: Perform streaming logical transactions by background workers and parallel apply
Here are my review comments for patch v21-0001: Note - There are some "general" comments which will result in lots of smaller changes. The subsequent "detailed" review comments have some overlap with these general comments but I expect some will be missed so please search/replace to fix all code related to those general comments. == 1. GENERAL - main_worker_pid and replorigin_session_setup Quite a few of my subsequent review comments below are related to the somewhat tricky (IMO) change to the code for this area. Here is a summary of some things that can be done to clean/simplify this logic. 1a. Make the existing replorigin_session_setup function just be a wrapper that delegates to the other function passing the acquired_by as 0. This is because in every case but one (in the apply bg worker main) we are always passing 0, and IMO there is no need to spread the messy extra param to places that do not use it. 1b. 'main_worker_pid' is a confusing member name given the way it gets used - e.g. not even set when you actually *are* the main apply worker? You can still keep all the same logic, but just change the name to something more like 'apply_leader_pid' - then the code can make sense because the main apply workers have no "apply leader" but the apply background workers do. 1c. IMO it will be much better to use pid_t and InvalidPid for the type and the unset values of this member. 1d. The checks/Asserts for main_worker_pid are confusing to read. (e.g. Assert(worker->main_worker_pid != 0) means the worker is a apply background worker. IMO there should be convenient macros for these - then code can be readable again. e.g. #define isApplyMainWorker(worker) (worker->apply_leader_pid == InvalidPid) #define isApplyBgWorker(worker) (worker->apply_leader_pid != InvalidPid) == 2. GENERAL - ApplyBgworkerInfo I like that the struct ApplyBgworkerState was renamed to the more appropriate name ApplyBgworkerInfo. But now all the old variable names (e.g. 
'wstate') and parameters must be updated as well. Please search/replace them all in code and comments. e.g. ApplyBgworkerInfo *wstate should now be something like: ApplyBgworkerInfo *winfo; == 3. GENERAL - ApplyBgWorkerStatus --> ApplyBgworkerState IMO the enum should be changed to ApplyBgWorkerState because the values all represent the discrete state that the bgworker is at. See the top StackOverflow answer here [1] which is the same as the point I am trying to make with this comment. This is a simple mechanical exercise rename to fix the reliability but it will impact lots of variables, parameters, function names, and comments. Please search/replace to get them all. == 4. Commit message In addition, the patch extends the logical replication STREAM_ABORT message so that abort_time and abort_lsn can also be sent which can be used to update the replication origin in apply background worker when the streaming transaction is aborted. 4a. Should this para also mention something about the introduction of protocol version 4? 4b. Should this para also mention that these extensions are not strictly mandatory for the parallel streaming to still work? == 5. doc/src/sgml/catalogs.sgml - If true, the subscription will allow streaming of in-progress - transactions + Controls how to handle the streaming of in-progress transactions: + f = disallow streaming of in-progress transactions, + t = spill the changes of in-progress transactions to + disk and apply at once after the transaction is committed on the + publisher, + p = apply changes directly using a background + worker if available(same as 't' if no worker is available) Missing whitespace before '(' == 6. doc/src/sgml/logical-replication.sgml @@ -1334,7 +1344,8 @@ CONTEXT: processing remote data for replication origin "pg_16395" during "INSER subscription. A disabled subscription or a crashed subscription will have zero rows in this view. 
If the initial data synchronization of any table is in progress, there will be additional workers for the tables - being synchronized. + being synchronized. Moreover, if the streaming transaction is applied + parallelly, there will be additional workers. "applied parallelly" sounds a bit strange. SUGGESTION-1 Moreover, if the streaming transaction is applied in parallel, there will be additional workers. SUGGESTION-2 Moreover, if the streaming transaction is applied using 'parallel' mode, there will be additional workers. == 7. doc/src/sgml/protocol.sgml @@ -3106,6 +3106,11 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;" Version 3 is supported only for server version 15 and above, and it allows streaming of two-phase commits. + + Version 4 is supported only for server version 16 + and above, and it allows applying stream of large in-progress + transactions in parallel. +
Re: Cleaning up historical portability baggage
On Mon, Aug 15, 2022 at 5:53 PM Thomas Munro wrote:
> Remove configure probe for IPv6.
> Remove dead ifaddrs.c fallback code.
> Remove configure probe for net/if.h.
> Fix macro problem with gai_strerror on Windows.
> Remove configure probe for netinet/tcp.h.
> mstcpip.h is not missing on MinGW.

I pushed these except one, plus one more which turned out to be not needed after a bit of archeology. Here's a slightly better AF_INET6 one. I'm planning to push it tomorrow if there are no objections.

It does something a little more aggressive than the preceding stuff, because SUSv3 says that IPv6 is an "option". I don't see that as an issue: it also says that various other ubiquitous stuff we're using is optional. Of course, it would be absurd for a new socket implementation to appear today that can't talk to a decent chunk of the internet, and all we require here is the headers. That optionality was relevant for the transition period a couple of decades ago.

From f162a15a6d723f8c94d9daa6236149e1f39b0d9a Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Thu, 18 Aug 2022 11:55:10 +1200
Subject: [PATCH] Remove configure probe for sockaddr_in6 and require AF_INET6.

SUSv3 defines struct sockaddr_in6, and all targeted Unix systems have it. Windows has it in . Remove the configure probe, the macro and a small amount of dead code.

Also remove a mention of IPv6-less builds from the documentation, since there aren't any.

This is similar to commits f5580882 and 077bf2f2 for Unix sockets. Even though AF_INET6 is an "optional" component of SUSv3, there are no known modern operating systems without it, and it seems even less likely to be omitted from future systems than AF_UNIX.

Discussion: https://postgr.es/m/ca+hukgkernfhmvb_h0upremp4lpzgn06yr2_0tyikjzb-2e...@mail.gmail.com
---
 configure                           | 10 --
 configure.ac                        |  6 --
 doc/src/sgml/client-auth.sgml       |  2 --
 src/backend/libpq/auth.c            | 21 -
 src/backend/libpq/hba.c             |  5 -
 src/backend/libpq/ifaddr.c          | 18 +-
 src/backend/libpq/pqcomm.c          |  2 --
 src/backend/utils/adt/network.c     | 10 --
 src/backend/utils/adt/pgstatfuncs.c | 11 ++-
 src/bin/initdb/initdb.c             | 10 --
 src/include/pg_config.h.in          |  3 ---
 src/include/utils/inet.h            |  9 -
 src/interfaces/libpq/fe-connect.c   |  2 --
 src/port/inet_net_ntop.c            |  5 ++---
 src/tools/ifaddrs/test_ifaddrs.c    |  2 --
 src/tools/msvc/Solution.pm          |  1 -
 16 files changed, 9 insertions(+), 108 deletions(-)

diff --git a/configure b/configure
index b7fd6c5f4e..c275330114 100755
--- a/configure
+++ b/configure
@@ -16279,16 +16279,6 @@ cat >>confdefs.h <<_ACEOF
 _ACEOF


-ac_fn_c_check_type "$LINENO" "struct sockaddr_in6" "ac_cv_type_struct_sockaddr_in6" "$ac_includes_default
-#include
-"
-if test "x$ac_cv_type_struct_sockaddr_in6" = xyes; then :
-
-$as_echo "#define HAVE_IPV6 1" >>confdefs.h
-
-fi
-
-
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for PS_STRINGS" >&5
 $as_echo_n "checking for PS_STRINGS... " >&6; }
 if ${pgac_cv_var_PS_STRINGS+:} false; then :
diff --git a/configure.ac b/configure.ac
index e5740f4fb5..a97a48a508 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1801,12 +1801,6 @@ AC_CHECK_DECLS([pwritev], [], [AC_LIBOBJ(pwritev)], [#include ])

 # This is probably only present on macOS, but may as well check always
 AC_CHECK_DECLS(F_FULLFSYNC, [], [], [#include ])

-AC_CHECK_TYPE([struct sockaddr_in6],
-[AC_DEFINE(HAVE_IPV6, 1, [Define to 1 if you have support for IPv6.])],
-[],
-[$ac_includes_default
-#include ])
-
 AC_CACHE_CHECK([for PS_STRINGS], [pgac_cv_var_PS_STRINGS],
 [AC_LINK_IFELSE([AC_LANG_PROGRAM(
 [#include
diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index 433759928b..c6f1b70fd3 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -305,8 +305,6 @@ hostnogssenc database user
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index 1545ff9f16..71677b69d8 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -3014,13 +3014,8 @@ PerformRadiusTransaction(const char *server, const char *secret, const char *por
 	int			packetlength;
 	pgsocket	sock;

-#ifdef HAVE_IPV6
 	struct sockaddr_in6 localaddr;
 	struct sockaddr_in6 remoteaddr;
-#else
-	struct sockaddr_in localaddr;
-	struct sockaddr_in remoteaddr;
-#endif
 	struct addrinfo hint;
 	struct addrinfo *serveraddrs;
 	int			port;
@@ -3130,18 +3125,12 @@ PerformRadiusTransaction(const char *server, const char *secret, const char *por
 	}

 	memset(&localaddr, 0, sizeof(localaddr));
-#ifdef HAVE_IPV6
 	localaddr.sin6_family = serveraddrs[0].ai_family;
 	localaddr.sin6_addr = in6addr_any;
 	if (localaddr.sin6_family == AF_INET6)
 		addrsize = sizeof(struct sockaddr_in6);
 	else
 		ad
Re: shadow variables - pg15 edition
On Thu, Aug 18, 2022 at 03:17:33PM +1200, David Rowley wrote:
> I'm probably not the only committer to want to run a mile when they
> see someone posting 17 or 26 patches in an email. So maybe "bang for
> buck" is a better method for getting the ball rolling here. As you
> know, I was recently bitten by local shadows in af7d270dd, so I do
> believe in the cause.
>
> What do you think?

You already fixed the shadow var introduced in master/pg16, and I sent patches for the shadow vars added in pg15 (marked as such and presented as 001-008), so perhaps it's okay to start with that?

BTW, one of the remaining warnings seems to be another buglet, which I'll write about at a later date.

-- 
Justin
Re: static libpq (and other libraries) overwritten on aix
The AIX sections of Makefile.shlib misuse the terms "static" and "shared". Imagine s/static library/library ending in .a/ and s/shared library/library ending in .so/. That yields an accurate description of the makefile rules. On Wed, Aug 17, 2022 at 12:01:54PM -0700, Andres Freund wrote: > Two questions: > 1) Do we continue building static libraries for libpq etc? Essentially, we don't build static libpq today, and we should continue not building it. (The first-built libpq.a is static, but that file is an implementation detail of the makefile rules. The surviving libpq.a is a normal AIX shared library.) > 2) Do we care about static libraries not suriving on AIX? No. >There could also be >a race in the buildrules leading to sometimes static libs sometimes shared >libs winning, I think. Not since commit e8564ef, to my knowledge. Along the lines of Robert's comment, it could be a nice code beautification to use a different suffix for the short-lived .a file. Perhaps _so_inputs.a. I found this useful years ago: https://web.archive.org/web/20151003130212/http://seriousbirder.com/blogs/aix-shared-and-static-libraries-explained/
Re: [PATCH] Clarify the comments about varlena header encoding
On Thu, Aug 18, 2022 at 1:06 AM Aleksander Alekseev wrote: > > Hi hackers, > > I noticed that the comments regarding bit layouts for varlena headers > in postgres.h are somewhat misleading. For instance, when reading: I agree it's confusing, but I don't think this patch is the right direction. > ``` > 00xx 4-byte length word, aligned, uncompressed data (up to 1G) > ``` > > ... one can assume this is a 00xx byte followed by another 4 bytes > (which is wrong). Also one can read this as "aligned, uncompressed > data" (which again is wrong). - * 00xx 4-byte length word, aligned, uncompressed data (up to 1G) + * 00xx , uncompressed data (up to 1G) Maybe "00xx 4-byte length word (aligned)," is more clear about what is aligned. Also, adding all those xxx's obscures the point that we only need to examine one byte to figure out what to do next. > ``` > 1000 1-byte length word, unaligned, TOAST pointer > ``` > > This is misleading too. The comments above this line say that `struct > varatt_external` is a TOAST pointer. sizeof(varatt_external) = 16, > plus 1 byte equals 17, right? However the documentation [1] claims the > result should be 18: The patch has: + * In the third case the va_tag field (see varattrib_1b_e) is used to discern + * the specific type and length of the pointer datum. On disk the "xxx" bits + * currently always store sizeof(varatt_external) + 2. ...so not sure where 17 came from. - * 1000 1-byte length word, unaligned, TOAST pointer + * 1000 , TOAST pointer (struct varatt_external) This implies that the header is two bytes, which is not accurate. That next byte is a type tag: /* TOAST pointers are a subset of varattrib_1b with an identifying tag byte */ typedef struct { uint8 va_header; /* Always 0x80 or 0x01 */ uint8 va_tag; /* Type of datum */ char va_data[FLEXIBLE_ARRAY_MEMBER]; /* Type-specific data */ } varattrib_1b_e; ...and does not always represent the on-disk length: /* * Type tag for the various sorts of "TOAST pointer" datums. 
The peculiar * value for VARTAG_ONDISK comes from a requirement for on-disk compatibility * with a previous notion that the tag field was the pointer datum's length. */ typedef enum vartag_external { VARTAG_INDIRECT = 1, VARTAG_EXPANDED_RO = 2, VARTAG_EXPANDED_RW = 3, VARTAG_ONDISK = 18 } vartag_external; And I don't think the new comments referring to "third case", "first two cases", etc make it easier to follow. -- John Naylor EDB: http://www.enterprisedb.com
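Whichever wording the comment ends up with, the dispatch being discussed — examining only the first header byte — can be illustrated in a hedged sketch. The enum values and function below follow the big-endian bit patterns quoted from the postgres.h comment and are purely illustrative; the real VARATT_IS_* macros also handle the little-endian layout, where the flag bits sit at the other end of the byte.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative classification of a varlena datum by its first header
 * byte, using the big-endian bit patterns from the postgres.h comment.
 * These are NOT the actual postgres.h macro names.
 */
typedef enum
{
	VL_4B_UNCOMPRESSED,			/* 00xxxxxx: 4-byte length word, up to 1G */
	VL_4B_COMPRESSED,			/* 01xxxxxx: 4-byte length word, compressed */
	VL_1B_EXTERNAL,				/* 10000000: 1-byte header; the *next* byte
								 * is the va_tag type tag, e.g. VARTAG_ONDISK */
	VL_1B_SHORT					/* 1xxxxxxx: 1-byte header, short inline datum */
} VarlenaKind;

static VarlenaKind
classify_varlena(uint8_t first_byte)
{
	if ((first_byte & 0xC0) == 0x00)
		return VL_4B_UNCOMPRESSED;
	if ((first_byte & 0xC0) == 0x40)
		return VL_4B_COMPRESSED;
	if (first_byte == 0x80)
		return VL_1B_EXTERNAL;
	return VL_1B_SHORT;			/* total length lives in the low 7 bits */
}
```

This also makes John's point concrete: one byte is enough to decide which of the four cases applies, and only in the external case does a second byte (the type tag) need to be consulted.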
RE: Perform streaming logical transactions by background workers and parallel apply
On Wed, Aug 17, 2022 2:28 PM Wang, Wei/王 威 wrote: > > On Tues, August 16, 2022 15:33 PM I wrote: > > Attach the new patches. > > I found that cfbot has a failure. > After investigation, I think it is because the worker's exit state is not set > correctly. So I made some slight modifications. > > Attach the new patches. > Thanks for updating the patch. Here are some comments. 0003 patch == 1. src/backend/replication/logical/applybgworker.c + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), +errmsg("cannot replicate target relation \"%s.%s\" using " + "subscription parameter streaming=parallel", + rel->remoterel.nspname, rel->remoterel.relname), +errdetail("The unique column on subscriber is not the unique " + "column on publisher or there is at least one " + "non-immutable function."), +errhint("Please change to use subscription parameter " +"streaming=on."))); Should we use "%s" instead of "streaming=parallel" and "streaming=on"? 2. src/backend/replication/logical/applybgworker.c +* If any worker is handling the streaming transaction, this check needs to +* be performed not only using the apply background worker, but also in the +* main apply worker. This is because without these restrictions, main this check needs to be performed not only using the apply background worker, but also in the main apply worker. -> this check not only needs to be performed by apply background worker, but also by the main apply worker 3. src/backend/replication/logical/relation.c + if (ukey) + { + i = -1; + while ((i = bms_next_member(ukey, i)) >= 0) + { + attnum = AttrNumberGetAttrOffset(i + FirstLowInvalidHeapAttributeNumber); + + if (entry->attrmap->attnums[attnum] < 0 || + !bms_is_member(entry->attrmap->attnums[attnum], entry->remoterel.attunique)) + { + entry->parallel_apply = PARALLEL_APPLY_UNSAFE; + return; + } + } + + bms_free(ukey); It looks we need to call bms_free() before return, right? 4. 
src/backend/replication/logical/relation.c + /* We don't need info for dropped or generated attributes */ + if (att->attisdropped || att->attgenerated) + continue; Would it be better to change the comment to: We don't check dropped or generated attributes 5. src/test/subscription/t/032_streaming_apply.pl +$node_publisher->wait_for_catchup($appname); + +# Then we check the foreign key on partition table. +$node_publisher->wait_for_catchup($appname); Here, wait_for_catchup() is called twice, we can remove the second one. 6. src/backend/replication/logical/applybgworker.c + /* If any workers (or the postmaster) have died, we have failed. */ + if (status == APPLY_BGWORKER_EXIT) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), +errmsg("background worker %u failed to apply transaction %u", + entry->wstate->shared->worker_id, + entry->wstate->shared->stream_xid))); Should we change the error message to "apply background worker %u failed to apply transaction %u" ? To be consistent with the error message in apply_bgworker_wait_for(). 0004 patch == 1. I saw that the commit message says: If the subscriber exits with an error, this flag will be set true, and whenever the transaction is applied successfully, this flag is reset false. "subretry" is set to false if a transaction is applied successfully, it looks similar to what clear_subscription_skip_lsn() does, so maybe we should remove the following change in apply_handle_stream_abort()? Or only call set_subscription_retry() when rollbacking the toplevel transaction. @@ -1671,6 +1688,9 @@ apply_handle_stream_abort(StringInfo s) */ serialize_stream_abort(xid, subxid); } + + /* Reset the retry flag. */ + set_subscription_retry(false); } reset_apply_error_context_info(); 2. src/backend/replication/logical/worker.c + /* Reset subretry */ + values[Anum_pg_subscription_subretry - 1] = BoolGetDatum(retry); + replaces[Anum_pg_subscription_sub
Re: shadow variables - pg15 edition
Michael Paquier writes: > A lot of the changes proposed here update the code so as the same > variable gets used across more code paths by removing declarations, > but we have two variables defined because both are aimed to be used in > a different context (see AttachPartitionEnsureIndexes() in tablecmds.c > for example). > Wouldn't it be a saner approach in a lot of cases to rename the > shadowed variables (aka the ones getting removed in your patches) and > keep them local to the code paths where we use them? Yeah. I do not think a patch of this sort has any business changing the scopes of variables. That moves it out of "cosmetic cleanup" and into "hm, I wonder if this introduces any bugs". Most hackers are going to decide that they have better ways to spend their time than doing that level of analysis for a very noncritical patch. regards, tom lane
Re: MultiXact\SLRU buffers configuration
> On 17 Aug 2022, at 00:36, i.laza...@postgrespro.ru wrote:
>
>> Andrey Borodin wrote 2022-07-23 11:39:
>> Yura, thank you for your benchmarks!
>> We already knew that patch can save the day on pathological workloads,
>> now we have a proof of this.
>> Also there's the evidence that user can blindly increase size of SLRU
>> if they want (with the second patch). So there's no need for hard
>> explanations on how to tune the buffers size.
>
> Hi @Andrey.Borodin, with some considerations and performance checks from @Yura.Sokolov we simplified your approach as follows:
>
> 1. Preamble. We feel free to increase any SLRU, since there's no performance degradation at large buffer counts when using your SLRU buckets solution.
> 2. `slru_buffers_size_scale` is the only config param introduced for all SLRUs. It scales the SLRUs' upper cap by powers of 2.
> 3. All SLRU buffer counts are capped by both `NBuffers (shared_buffers)` and `slru_buffers_size_scale`.
> 4. The magic initial constants `NUM_*_BUFFERS << slru_buffers_size_scale` are applied for every SLRU.
> 5. All SLRU buffers are always sized as a power of 2, and their hash bucket size is always 8.
>
> There's an attached patch for your consideration. It gathers and simplifies both `v21-0001-Make-all-SLRU-buffer-sizes-configurable.patch` and `v21-0002-Divide-SLRU-buffers-into-8-associative-banks.patch` into a much simpler approach.

I like the idea of one knob instead of one per SLRU. Maybe we could even deduce a sane value from NBuffers? That would effectively lead to 0 knobs :)

Your patch has the prefix "v22-0006"; does that mean there are 5 previous steps in the patchset?

Thank you!

Best regards, Andrey Borodin.
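The sizing rule outlined in the mail (historical default shifted by slru_buffers_size_scale, capped, and kept a power of two so the 8-entry hash buckets divide it evenly) could look roughly like the sketch below. All identifiers here are illustrative, not the patch's actual names.

```c
#include <assert.h>

/*
 * Hedged sketch of the proposed SLRU sizing rule: shift the SLRU's
 * historical default buffer count left by slru_buffers_size_scale,
 * cap it by a limit derived from shared_buffers, then round down to
 * a power of two.  Names are illustrative only.
 */
static int
slru_buffer_count(int base_buffers, int size_scale, int cap)
{
	int			n = base_buffers << size_scale;

	if (n > cap)
		n = cap;

	/* round down to a power of two by clearing low set bits */
	while (n & (n - 1))
		n &= n - 1;

	return n;
}
```

For example, a base of 128 buffers with scale 2 gives 512, while scale 4 under a cap of 1000 is first clamped to 1000 and then rounded down to 512 so the power-of-two invariant holds.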
Re: Add last failed connection error message to pg_stat_wal_receiver
On Thu, Aug 04, 2022 at 03:27:11PM +0530, Bharath Rupireddy wrote:
> Good point. The walreceiver can exit for any reason. We can either 1)
> store all the error messages or 2) think of using sigsetjmp, but
> that only catches the ERROR kinds, leaving FATAL and PANIC messages.
> Option (1) is simple but there are problems - we may miss storing
> future error messages (good commenting and reviewing may help here) and
> all the error messages now need to be stored as strings, which is
> complex. Option (2) seems reasonable but we will miss FATAL and
> PANIC messages (we have many ERRORs, 2 FATALs, 3 PANICs). Maybe a
> combination of option (1) for FATALs and PANICs, and option (2) for
> ERRORs helps.
>
> Thoughts?

PANIC is not something you'd care about, as the system would go down and shared memory would be reset (right?) even if restart_after_crash is enabled. Perhaps it would help here to use something like a macro to catch and save the error, in a style similar to what's in hba.c for example, which is the closest example I can think of, even if on ERROR we don't really care about the error string anyway as there is nothing to report back to the SQL views used for the HBA/ident files. FATAL may prove to be tricky though, because I'd expect the error to be saved in shared memory in this case. This is particularly critical as this takes the WAL receiver process down, actually.

Anyway, outside the potential scope of the proposal, there are more things that I find strange with the code:
- Why isn't the string reset when the WAL receiver is starting up? Surely it is not OK to keep past state that does not reflect what is actually happening with a currently running receiver.
- pg_stat_wal_receiver (system view) reports no rows if pid is NULL, which would be the state stored in shared memory after a connection. This means that one would never be able to see last_conn_error except when calling the SQL function pg_stat_get_wal_receiver() directly. One could say that we should report a row for this view all the time, but this creates a compatibility breakage: existing applications assuming something like (one row <=> WAL receiver running) could break.

-- 
Michael
Re: shadow variables - pg15 edition
On Thu, 18 Aug 2022 at 02:54, Justin Pryzby wrote: > The first half of the patches fix shadow variables newly-introduced in v15 > (including one of my own patches), the rest are fixing the lowest hanging > fruit > of the "short list" from COPT=-Wshadow=compatible-local I wonder if it's better to fix the "big hitters" first. The idea there would be to try to reduce the number of these warnings as quickly and easily as possible. If we can get the numbers down fairly significantly without too much effort, then that should provide us with a bit more motivation to get rid of the remaining ones. Here are the warnings grouped by the name of the variable: $ make -s 2>&1 | grep "warning: declaration of" | grep -oP "‘([_a-zA-Z]{1}[_a-zA-Z0-9]*)’" | sort | uniq -c 2 ‘aclresult’ 3 ‘attnum’ 1 ‘cell’ 1 ‘cell__state’ 2 ‘cmp’ 2 ‘command’ 1 ‘constraintOid’ 1 ‘copyTuple’ 1 ‘data’ 1 ‘db’ 1 ‘_do_rethrow’ 1 ‘dpns’ 1 ‘econtext’ 1 ‘entry’ 36 ‘expected’ 1 ‘first’ 1 ‘found_whole_row’ 1 ‘host’ 20 ‘i’ 1 ‘iclause’ 1 ‘idxs’ 1 ‘i_oid’ 4 ‘isnull’ 1 ‘it’ 2 ‘item’ 1 ‘itemno’ 1 ‘j’ 1 ‘jtc’ 1 ‘k’ 1 ‘keyno’ 7 ‘l’ 13 ‘lc’ 4 ‘lc__state’ 1 ‘len’ 1 ‘_local_sigjmp_buf’ 1 ‘name’ 2 ‘now’ 1 ‘owning_tab’ 1 ‘page’ 1 ‘partitionId’ 2 ‘path’ 3 ‘proc’ 1 ‘proclock’ 1 ‘querytree_list’ 1 ‘range’ 1 ‘rel’ 1 ‘relation’ 1 ‘relid’ 1 ‘rightop’ 2 ‘rinfo’ 1 ‘_save_context_stack’ 1 ‘save_errno’ 1 ‘_save_exception_stack’ 1 ‘slot’ 1 ‘sqlca’ 9 ‘startelem’ 1 ‘stmt_list’ 2 ‘str’ 1 ‘subpath’ 1 ‘tbinfo’ 1 ‘ti’ 1 ‘transno’ 1 ‘ttype’ 1 ‘tuple’ 5 ‘val’ 1 ‘value2’ 1 ‘wco’ 1 ‘xid’ 1 ‘xlogfname’ The top 5 by count here account for about half of the warnings, so maybe is best to start with those? Likely the ones ending in __state will fix themselves when you fix the variable with the same name without that suffix. The attached patch targets fixing the "expected" variable. 
$ ./configure --prefix=/home/drowley/pg CFLAGS="-Wshadow=compatible-local" > /dev/null $ make clean -s $ make -j -s 2>&1 | grep "warning: declaration of" | wc -l 153 $ make clean -s $ patch -p1 < reduce_local_variable_shadow_warnings_in_regress.c.patch $ make -j -s 2>&1 | grep "warning: declaration of" | wc -l 117 So 36 fewer warnings with the attached. I'm probably not the only committer to want to run a mile when they see someone posting 17 or 26 patches in an email. So maybe "bang for buck" is a better method for getting the ball rolling here. As you know, I was recently bitten by local shadows in af7d270dd, so I do believe in the cause. What do you think? David diff --git a/src/test/regress/regress.c b/src/test/regress/regress.c index ba3532a51e..6d285255dd 100644 --- a/src/test/regress/regress.c +++ b/src/test/regress/regress.c @@ -56,22 +56,22 @@ #define EXPECT_EQ_U32(result_expr, expected_expr) \ do { \ - uint32 result = (result_expr); \ - uint32 expected = (expected_expr); \ - if (result != expected) \ + uint32 actual_result = (result_expr); \ + uint32 expected_result = (expected_expr); \ + if (actual_result != expected_result) \ elog(ERROR, \ "%s yielded %u, expected %s in file \"%s\" line %u", \ -#result_expr, result, #expected_expr, __FILE__, __LINE__); \ +#result_expr, actual_result, #expected_expr, __FILE__, __LINE__); \ } while (0) #define EXPECT_EQ_U64(result_expr, expected_expr) \ do { \ - uint64 result = (result_expr); \ - uint64 expected = (expected_expr); \ - if (result != expected) \ + uint64 actual_result = (result_expr); \ + uint64 expected_result = (expected_expr); \ + if (actual_result != expected_result) \ elog(ERROR, \ "%s yielded " UINT64_FORMAT ", expected %s in file \"%s\" line %u", \ -#result_expr, result, #expected_expr, __FILE__, __LINE__); \ +#result_expr, actual_result, #expected_expr, __FILE__, __LINE__); \ } while (0) #define LDELIM '('
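The `expected` rename in the diff above guards against a classic macro hazard as well as the shadow warning: if a caller ever passed a variable literally named `expected` as the second argument, the old `uint32 expected = (expected_expr);` line would expand to self-initialization, reading the macro's own uninitialized temporary instead of the caller's value. A minimal standalone sketch of the same pattern (simplified types and invented names, not the regress.c macros themselves):

```c
#include <assert.h>

/*
 * Simplified check macro: its temporaries use names unlikely to collide
 * with caller identifiers.  If the first temporary were named "expected",
 * a call such as CHECK_EQ(f(x), expected, &ok) would expand its
 * initializer to "int expected = (expected);" -- self-initialization.
 */
#define CHECK_EQ(result_expr, expected_expr, out_ok) \
	do { \
		int		actual_result_ = (result_expr); \
		int		expected_result_ = (expected_expr); \
		*(out_ok) = (actual_result_ == expected_result_); \
	} while (0)
```

With distinctly named temporaries, a caller-side `int expected` passes through the macro untouched.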
Re: Handle infinite recursion in logical replication setup
On Wed, Aug 17, 2022 at 12:34 PM Peter Smith wrote: > > On Wed, Aug 17, 2022 at 4:33 PM Amit Kapila wrote: > > > > On Wed, Aug 17, 2022 at 8:48 AM houzj.f...@fujitsu.com > > wrote: > > > > > > On Tuesday, August 2, 2022 8:00 PM Amit Kapila > > > wrote: > > > > On Tue, Jul 26, 2022 at 9:07 AM Amit Kapila > > > > wrote: > > > > > > Thanks for the summary. > > > > > > I think it's fine to make the user use the copy_data option more > > > carefully to > > > prevent duplicate copies by reporting an ERROR. > > > > > > But I also have similar concern with Sawada-san as it's possible for user > > > to > > > receive an ERROR in some unexpected cases. > > > > > > For example I want to build bi-directional setup between two nodes: > > > > > > Node A: TABLE test (has actual data) > > > Node B: TABLE test (empty) > > > > > > Step 1: > > > CREATE PUBLICATION on both Node A and B. > > > > > > Step 2: > > > CREATE SUBSCRIPTION on Node A with (copy_data = on) > > > -- this is fine as there is no data on Node B > > > > > > Step 3: > > > CREATE SUBSCRIPTION on Node B with (copy_data = on) > > > -- this should be fine as user needs to copy data from Node A to Node B, > > > -- but we still report an error for this case. > > > > > > It looks a bit strict to report an ERROR in this case and it seems not > > > easy to > > > avoid this. So, personally, I think it might be better to document the > > > correct > > > steps to build the bi-directional replication and probably also document > > > the > > > steps to recover if user accidentally did duplicate initial copy if not > > > documented yet. > > > > > > In addition, we could also LOG some additional information about the > > > ORIGIN and > > > initial copy which might help user to analyze if needed. > > > > > > > But why LOG instead of WARNING? 
I feel in this case there is a chance > > of inconsistent data so a WARNING like "publication "pub1" could have > > data from multiple origins" can be given when the user has specified > > options: "copy_data = on, origin = NONE" while creating a > > subscription. We give a WARNING during subscription creation when the > > corresponding publication doesn't exist, eg. > > > > postgres=# create subscription sub1 connection 'dbname = postgres' > > publication pub1; > > WARNING: publication "pub1" does not exist in the publisher > > > > Then, we can explain in docs how users can avoid data inconsistencies > > while setting up replication. > > > > I was wondering if this copy/origin case really should be a NOTICE. > We usually give NOTICE for some sort of additional implicit information, e.g., when we create a slot during CREATE SUBSCRIPTION command: "NOTICE: created replication slot "sub1" on publisher". IMO, this is likely to be a problem of data inconsistency so I think here we can choose between WARNING and LOG. I prefer WARNING but okay with LOG as well if others feel so. I think we can change this later as well if required. We do have an option to not do anything and just document it but I feel it is better to give user some indication of problem here because not everyone reads each update of documentation. Jonathan, Sawada-San, Hou-San, and others, what do you think is the best way to move forward here? -- With Regards, Amit Kapila.
Re: shadow variables - pg15 edition
On Thu, Aug 18, 2022 at 09:39:02AM +0900, Michael Paquier wrote: > On Thu, Aug 18, 2022 at 08:49:14AM +1000, Peter Smith wrote: > > I'd started looking at these [1] last year and spent a day trying to > > categorise them all in a spreadsheet (shadows a global, shadows a > > parameter, shadows a local var etc) but I became swamped by the > > volume, and then other work/life got in the way. > > > > +1 from me. > > A lot of the changes proposed here update the code so that the same > variable gets used across more code paths by removing declarations, > but we have two variables defined because both are aimed to be used in > a different context (see AttachPartitionEnsureIndexes() in tablecmds.c > for example). > Wouldn't it be a saner approach in a lot of cases to rename the > shadowed variables (aka the ones getting removed in your patches) and > keep them local to the code paths where we use them? The cases where I removed a declaration are ones where the variable either hasn't yet been assigned in the outer scope (so it's safe to use first in the inner scope, since its value is later overwritten in the outer scope). Or it's no longer used in the outer scope, so it's safe to re-use it in the inner scope (as in AttachPartitionEnsureIndexes). Since you think it's saner, I changed to rename them. In the case of "first", the var is used in two independent loops, the same way, and re-initialized. In the case of found_whole_row, the var is ignored, as the comments say, so it would be silly to declare more vars to be additionally ignored. -- Justin PS. I hadn't sent the other patches which rename the variables, having assumed that the discussion would be bikeshedded to death and derail without having fixed the lowest hanging fruits. I'm attaching those now to see what happens. 
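As a standalone illustration of the rename approach being discussed (an invented example, not code from the patch series): keeping the inner variable under its own name leaves both scopes independent and keeps -Wshadow=compatible-local quiet.

```c
#include <assert.h>

/*
 * Hypothetical helper: the inner loop originally declared "int len",
 * shadowing the outer running total of the same name -- exactly what
 * -Wshadow=compatible-local reports.  Renaming the inner variable keeps
 * it local to the code path that uses it, with no shadowing.
 */
static int
total_length(const char **items, int n)
{
	int			len = 0;		/* outer scope: running total */

	for (int i = 0; i < n; i++)
	{
		int			item_len = 0;	/* renamed; "int len" here would shadow */

		for (const char *p = items[i]; *p; p++)
			item_len++;
		len += item_len;
	}
	return len;
}
```

Compiling with `gcc -Wshadow=compatible-local` produces no warning for this version, while the shadowed variant would be flagged.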
>From 97768e5a439bef016e6ebd5221ed148f076c6e3f Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Tue, 16 Aug 2022 19:38:57 -0500 Subject: [PATCH 01/26] avoid shadow vars: pg_dump.c: i_oid backpatch to v15 commit d498e052b4b84ae21b3b68d5b3fda6ead65d1d4d Author: Robert Haas Date: Fri Jul 8 10:15:19 2022 -0400 Preserve relfilenode of pg_largeobject and its index across pg_upgrade. --- src/bin/pg_dump/pg_dump.c | 1 - 1 file changed, 1 deletion(-) diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index da6605175a0..322947c5609 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -3144,7 +3144,6 @@ dumpDatabase(Archive *fout) PQExpBuffer loHorizonQry = createPQExpBuffer(); int i_relfrozenxid, i_relfilenode, - i_oid, i_relminmxid; /* -- 2.17.1 >From ce729535c47d72db775ebcf1f185799c78615148 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Tue, 16 Aug 2022 15:55:13 -0500 Subject: [PATCH 02/26] avoid shadow vars: pg_dump.c: tbinfo backpatch to v15 commit 9895961529ef8ff3fc12b39229f9a93e08bca7b7 Author: Tom Lane Date: Mon Dec 6 13:07:31 2021 -0500 Avoid per-object queries in performance-critical paths in pg_dump. --- src/bin/pg_dump/pg_dump.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 322947c5609..5c196d66985 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -7080,21 +7080,21 @@ getConstraints(Archive *fout, TableInfo tblinfo[], int numTables) appendPQExpBufferChar(tbloids, '{'); for (int i = 0; i < numTables; i++) { - TableInfo *tbinfo = &tblinfo[i]; + TableInfo *mytbinfo = &tblinfo[i]; /* * For partitioned tables, foreign keys have no triggers so they must * be included anyway in case some foreign keys are defined. 
*/ - if ((!tbinfo->hastriggers && - tbinfo->relkind != RELKIND_PARTITIONED_TABLE) || - !(tbinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)) + if ((!mytbinfo->hastriggers && + mytbinfo->relkind != RELKIND_PARTITIONED_TABLE) || + !(mytbinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)) continue; /* OK, we need info for this table */ if (tbloids->len > 1) /* do we have more than the '{'? */ appendPQExpBufferChar(tbloids, ','); - appendPQExpBuffer(tbloids, "%u", tbinfo->dobj.catId.oid); + appendPQExpBuffer(tbloids, "%u", mytbinfo->dobj.catId.oid); } appendPQExpBufferChar(tbloids, '}'); -- 2.17.1 >From 478fa745d4ddc38fe15f54d7d396ebf7a106772b Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Tue, 16 Aug 2022 16:22:52 -0500 Subject: [PATCH 03/26] avoid shadow vars: pg_dump.c: owning_tab backpatch to v15 commit 344d62fb9a978a72cf8347f0369b9ee643fd0b31 Author: Peter Eisentraut Date: Thu Apr 7 16:13:23 2022 +0200 Unlogged sequences --- src/bin/pg_dump/pg_dump.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 5c196d66985..4b5d8df1e4e 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -16799,21 +16799,21 @@ dumpSequence(Archive *fout, const Ta
Re: fix typos
On Tue, Aug 16, 2022 at 8:48 AM John Naylor wrote: > > On Fri, Aug 12, 2022 at 8:55 PM Tom Lane wrote: > > > > John Naylor writes: > > > This is really a straw-man proposal, since I'm not volunteering to do > > > the work, or suggest anybody else should do the same. That being the > > > case, it seems we should just go ahead with Justin's patch for > > > consistency. Possibly we could also change the messages to say "ID"? > > > > I'd be content if we change the user-facing messages (and documentation > > if any) to say "ID" not "OID". > > The documentation has both, so it makes sense to standardize on "ID". > The messages all had oid/OID, which I changed in the attached. I think > I got all the places. > > I'm thinking it's not wrong enough to confuse people, but consistency > is good, so backpatch to v15 and no further. Does anyone want to make > a case otherwise? This is done. -- John Naylor EDB: http://www.enterprisedb.com
Re: POC: GROUP BY optimization
On Thu, 18 Aug 2022 at 02:46, Tomas Vondra wrote: > So I don't think the current costing is wrong, but it certainly is more > complex. But the test does not test what it intended - I have two ideas > how to make it work: > > 1) increase the number of rows in the table > > 2) increase cpu_operator_cost (for that one test?) > > 3) tweak the costing somehow, to increase the cost a bit Why not, 4) SET parallel_setup_cost = 0; there are plenty of other places we do just that so we get a parallel plan without having to generate enough cost to drown out the parallel worker startup cost. Here are a couple of patches to demo the idea. David diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out index 4b41ccf1aa..db36e3a150 100644 --- a/src/test/regress/expected/partition_aggregate.out +++ b/src/test/regress/expected/partition_aggregate.out @@ -942,6 +942,7 @@ INSERT INTO pagg_tab_ml SELECT i % 30, i % 10, to_char(i % 4, 'FM') FROM gen ANALYZE pagg_tab_ml; -- For Parallel Append SET max_parallel_workers_per_gather TO 2; +SET parallel_setup_cost = 0; -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY -- for level 1 only. For subpartitions, GROUP BY clause does not match with -- PARTITION KEY, but still we do not see a partial aggregation as array_agg() @@ -1025,6 +1026,7 @@ SELECT a, sum(b), array_agg(distinct c), count(*) FROM pagg_tab_ml GROUP BY a HA -> Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3 (25 rows) +RESET parallel_setup_cost; -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY -- for level 1 only. For subpartitions, GROUP BY clause does not match with -- PARTITION KEY, thus we will have a partial aggregation for them. 
diff --git a/src/test/regress/sql/partition_aggregate.sql b/src/test/regress/sql/partition_aggregate.sql index c17294b15b..ab070fee24 100644 --- a/src/test/regress/sql/partition_aggregate.sql +++ b/src/test/regress/sql/partition_aggregate.sql @@ -222,6 +222,7 @@ ANALYZE pagg_tab_ml; -- For Parallel Append SET max_parallel_workers_per_gather TO 2; +SET parallel_setup_cost = 0; -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY -- for level 1 only. For subpartitions, GROUP BY clause does not match with @@ -235,6 +236,8 @@ SELECT a, sum(b), array_agg(distinct c), count(*) FROM pagg_tab_ml GROUP BY a HA EXPLAIN (COSTS OFF) SELECT a, sum(b), array_agg(distinct c), count(*) FROM pagg_tab_ml GROUP BY a HAVING avg(b) < 3; +RESET parallel_setup_cost; + -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY -- for level 1 only. For subpartitions, GROUP BY clause does not match with -- PARTITION KEY, thus we will have a partial aggregation for them. diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out index a08a3825ff..0dc6d63347 100644 --- a/src/test/regress/expected/partition_aggregate.out +++ b/src/test/regress/expected/partition_aggregate.out @@ -942,40 +942,43 @@ INSERT INTO pagg_tab_ml SELECT i % 30, i % 10, to_char(i % 4, 'FM') FROM gen ANALYZE pagg_tab_ml; -- For Parallel Append SET max_parallel_workers_per_gather TO 2; +SET parallel_setup_cost = 0; -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY -- for level 1 only. For subpartitions, GROUP BY clause does not match with -- PARTITION KEY, but still we do not see a partial aggregation as array_agg() -- is not partial agg safe. 
EXPLAIN (COSTS OFF) SELECT a, sum(b), array_agg(distinct c), count(*) FROM pagg_tab_ml GROUP BY a HAVING avg(b) < 3 ORDER BY 1, 2, 3; - QUERY PLAN --- - Sort - Sort Key: pagg_tab_ml.a, (sum(pagg_tab_ml.b)), (array_agg(DISTINCT pagg_tab_ml.c)) - -> Append - -> GroupAggregate - Group Key: pagg_tab_ml.a - Filter: (avg(pagg_tab_ml.b) < '3'::numeric) - -> Sort - Sort Key: pagg_tab_ml.a - -> Seq Scan on pagg_tab_ml_p1 pagg_tab_ml - -> GroupAggregate - Group Key: pagg_tab_ml_2.a - Filter: (avg(pagg_tab_ml_2.b) < '3'::numeric) - -> Sort - Sort Key: pagg_tab_ml_2.a - -> Append - -> Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2 - -> Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3 - -> GroupAggregate - Group Key: pagg_tab_ml_5.a - Filter: (avg(pagg_tab_ml_5.b) < '3'::numeric) - -> Sort - Sort Key: pagg_tab_ml_5.a - -> Append - -> Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_5 -
Re: Assertion failure on PG15 with modified test_shm_mq test
Hi, On 2022-08-17 15:02:28 +0900, Michael Paquier wrote: > On Wed, Aug 17, 2022 at 11:15:28AM +0530, Pavan Deolasee wrote: > > I notice that pgstat_shutdown_hook() is registered as a before-shmem-exit > > callback. The callback is responsible for detaching from the pgstat shared > > memory segment. But looks like other parts of the system still expect it to > > be available during later stages of proc exit. > > > > It's not clear to me if pgstat shutdown should happen later or code that > > gets executed later in the cycle should not try to use pgstat. It's also > > entirely possible that my usage of SharedFileSet is completely wrong. If > > that's the case, please let me know the mistake in the usage. > > That's visibly an issue with shared memory and the stats. I have > added an open item. Andres? I don't think there's anything reasonably done about this for 15, as explained upthread. We need a big redesign of the shutdown sequence at some point, but ... Greetings, Andres Freund
Re: shadow variables - pg15 edition
On Thu, Aug 18, 2022 at 08:49:14AM +1000, Peter Smith wrote: > I'd started looking at these [1] last year and spent a day trying to > categorise them all in a spreadsheet (shadows a global, shadows a > parameter, shadows a local var etc) but I became swamped by the > volume, and then other work/life got in the way. > > +1 from me. A lot of the changes proposed here update the code so that the same variable gets used across more code paths by removing declarations, but we have two variables defined because both are aimed to be used in a different context (see AttachPartitionEnsureIndexes() in tablecmds.c for example). Wouldn't it be a saner approach in a lot of cases to rename the shadowed variables (aka the ones getting removed in your patches) and keep them local to the code paths where we use them? -- Michael
Re: Assertion failure on PG15 with modified test_shm_mq test
Hi, On 2022-08-17 11:15:28 +0530, Pavan Deolasee wrote: > I've a slightly modified version of test_shm_mq, that I changed to include > a shared fileset. The motivation to do that came because I hit an > assertion failure with PG15 while doing some development work on BDR and I > suspected it to be a PG15 bug. > I notice that pgstat_shutdown_hook() is registered as a before-shmem-exit > callback. The callback is responsible for detaching from the pgstat shared > memory segment. But looks like other parts of the system still expect it to > be available during later stages of proc exit. > It's not clear to me if pgstat shutdown should happen later or code that > gets executed later in the cycle should not try to use pgstat. It's also > entirely possible that my usage of SharedFileSet is completely wrong. If > that's the case, please let me know the mistake in the usage. I don't think we have the infrastructure for a nice solution to this at the moment - we need a fairly large overhaul of process initialization / shutdown to handle these interdependencies nicely. We can't move pgstat shutdown into on_dsm callback because that's too late to allocate *new* dsm segments, which we might need to do while flushing out pending stats. See https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=fa91d4c91f28f4819dc54f93adbd413a685e366a for a way to avoid the problem. Greetings, Andres Freund
Re: Add proper planner support for ORDER BY / DISTINCT aggregates
On Wed, 17 Aug 2022 at 13:57, Justin Pryzby wrote: > But in a few places, it removes the locally-computed group_pathkeys: > > - List *group_pathkeys = root->group_pathkeys; > I noticed because that creates a new shadow variable, which seems accidental. Thanks for the report. I've just pushed a fix for this that basically just removes the line you quoted. Really I should have been using the version of group_pathkeys that stripped off the pathkeys from the ORDER BY / DISTINCT aggregates that is calculated earlier in that function. In practice, there was no actual bug here as the wrong variable was only being used in the code path that was handling partial paths. We never create any partial paths when there are aggregates with ORDER BY / DISTINCT clauses, so in that code path, the two versions of the group_pathkeys variable would have always been set to the same thing. It makes sense just to get rid of the shadowed variable since the value of it will be the same anyway. David
Re: shared-memory based stats collector - v70
Hi, On 2022-08-17 15:46:42 -0400, Greg Stark wrote: > Isn't there also a local hash table used to find the entries to reduce > traffic on the shared hash table? Even if you don't take a snapshot > does it get entered there? There are definitely still parts of this > I'm working on a pretty vague understanding of :/ Yes, there is. But it's more about code that generates stats, rather than reporting functions. While there's backend local pending stats we need to have a refcount on the shared stats item so that the stats item can't be dropped and then revived when those local stats are flushed. Relevant comments from pgstat.c: * To avoid contention on the shared hashtable, each backend has a * backend-local hashtable (pgStatEntryRefHash) in front of the shared * hashtable, containing references (PgStat_EntryRef) to shared hashtable * entries. The shared hashtable only needs to be accessed when no prior * reference is found in the local hashtable. Besides pointing to the * shared hashtable entry (PgStatShared_HashEntry) PgStat_EntryRef also * contains a pointer to the shared statistics data, as a process-local * address, to reduce access costs. * * The names for structs stored in shared memory are prefixed with * PgStatShared instead of PgStat. Each stats entry in shared memory is * protected by a dedicated lwlock. * * Most stats updates are first accumulated locally in each process as pending * entries, then later flushed to shared memory (just after commit, or by * idle-timeout). This practically eliminates contention on individual stats * entries. For most kinds of variable-numbered pending stats data is stored * in PgStat_EntryRef->pending. All entries with pending data are in the * pgStatPending list. Pending statistics updates are flushed out by * pgstat_report_stat(). * pgstat_internal.h has more details about the refcount aspect: * Per-object statistics are stored in the "shared stats" hashtable. 
That * table's entries (PgStatShared_HashEntry) contain a pointer to the actual stats * data for the object (the size of the stats data varies depending on the * kind of stats). The table is keyed by PgStat_HashKey. * * Once a backend has a reference to a shared stats entry, it increments the * entry's refcount. Even after stats data is dropped (e.g., due to a DROP * TABLE), the entry itself can only be deleted once all references have been * released. Greetings, Andres Freund
Re: s390x builds on buildfarm
Hi, On 2022-08-10 13:04:40 +, Vivian Kong wrote: > Are builds being paused on s390x as it looks like the s390x builds were last > run 15 days ago. If so, wondering what is the reason for the pause and what > is required to resume the builds? The OS the builds were running on seems > to have reached end of life. Please let me know if we can help with getting > them updated and resume the builds. I realize the question below is likely not your department, but perhaps you could refer us to the right people? Does IBM provide any AIX instances to open source projects? We have access to some via the gcc compile farm, but they're a bit outdated, often very overloaded, and seem to have some other issues (system perl segfaulting etc). Greetings, Andres Freund
Re: TAP output format in pg_regress
Hi, On 2022-08-17 23:04:20 +0200, Daniel Gustafsson wrote: > Attached is a new version of the pg_regress TAP patch which - per reviewer > feedback - does away with dual output formats and just converts the existing > output to be TAP compliant. Cool. > while still being TAP compliant enough that running it with prove (with a tiny > Perl wrapper) works. Wonder if we could utilize that for making failures of 002_pg_upgrade.pl and 027_stream_regress.pl easier to understand, by reporting pg_regress' steps as part of the outer test. But that'd probably be at least somewhat hard, due to the embedded test count etc. > This version also strips away the verbose output which these days isn't > terribly useful and mainly consumes vertical space. Yay. > Another feature of the patch is to switch error logging to using the common > frontend logging rather than pg_regress rolling its own. The output from > pg_log_error is emitted verbatim by prove as it's on stderr. Sounds good. > I based the support on the original TAP specification and not the new v13 or > v14 since it seemed unclear how well those are supported in testrunners. If > v14 is adopted by testrunners there is some better functionality for > reporting > errors that we could use, but for now this version seems a safer option. Yep, that makes sense. 
> A normal "make check" with this patch applied now looks like this:
>
> +++ regress check in src/test/regress +++
> # running on port 61696 with PID 57910
> ok 1 - test_setup 326 ms
> ok 2 - tablespace 401 ms
> # parallel group (20 tests): varchar char oid pg_lsn txid int4 regproc money int2 uuid float4 text name boolean int8 enum float8 bit numeric rangetypes
> ok 3 - boolean 129 ms
> ok 4 - char 73 ms
> ok 5 - name 117 ms
> ok 6 - varchar 68 ms
> <...snip...>
> ok 210 - memoize 137 ms
> ok 211 - stats 851 ms
> # parallel group (2 tests): event_trigger oidjoins
> ok 212 - event_trigger 138 ms
> not ok 213 - oidjoins 190 ms
> ok 214 - fast_default 149 ms
> 1..214
> # 1 of 214 tests failed.
> # The differences that caused some tests to fail can be viewed in the
> # file "//src/test/regress/regression.diffs". A copy of the test summary that you see
> # above is saved in the file "//src/test/regress/regression.out".
I'm happy with that compared to our current output. > The ending comment isn't currently shown when executed via prove as it has to > go to STDERR for that to work, and I'm not sure that's something we want to do > in the general case. I don't expect running the pg_regress tests via prove is > going to be the common case. How does the meson TAP ingestion handle > stdout/stderr? It'll parse stdout for tap output and log stderr output separately. > There is a slight reduction in information, for example around tests run > serially vs in a parallel group. This could perhaps be handled by marking the > test name in some way like "tablespace (serial)" or prefixing with a symbol or > something. I can take a stab at that in case we think that level of detail > is > important to preserve. I think we could just indent the test "description" for tests in parallel groups: I.e. 
instead of

ok 1 - test_setup 326 ms
ok 2 - tablespace 401 ms
# parallel group (20 tests): varchar char oid pg_lsn txid int4 regproc money int2 uuid float4 text name boolean int8 enum float8 bit numeric rangetypes
ok 3 - boolean 129 ms
ok 4 - char 73 ms
ok 5 - name 117 ms
ok 6 - varchar 68 ms

do

ok 1 - test_setup 326 ms
ok 2 - tablespace 401 ms
# parallel group (20 tests): varchar char oid pg_lsn txid int4 regproc money int2 uuid float4 text name boolean int8 enum float8 bit numeric rangetypes
    ok 3 - boolean 129 ms
    ok 4 - char 73 ms
    ok 5 - name 117 ms
    ok 6 - varchar 68 ms

that'd make it sufficiently clear, and is a bit more similar to the current format? I wonder if we should indent the test name so that the number doesn't cause wrapping? And perhaps we can skip the - in that case? I.e. instead of:

ok 9 - name       117 ms
ok 10 - varchar    68 ms
..
ok 99 - something  60 ms
ok 100 - memoize  137 ms

something like:

ok   9 name       117 ms
ok  10 varchar     68 ms
...
ok  99 something   60 ms
ok 100 memoize    137 ms

with parallel tests:

ok   9 name
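For reference, the output shape this thread is converging on — TAP as in the original specification: one "ok"/"not ok" line per test, '#'-prefixed diagnostic lines, and a trailing "1..N" plan — is simple enough to emit with a few helpers. This is only an illustrative sketch, not pg_regress's actual implementation:

```c
#include <stdbool.h>
#include <stdio.h>

/* Running test count; the plan line reports it at the end. */
static int	tap_count = 0;

/* Diagnostics are '#'-prefixed, so TAP parsers treat them as comments. */
static void
tap_diag(const char *msg)
{
	printf("# %s\n", msg);
}

/* One numbered result line per test, e.g. "ok 3 - boolean ... 129 ms". */
static void
tap_result(bool ok, const char *name, int ms)
{
	printf("%s %d - %-24s %4d ms\n",
		   ok ? "ok" : "not ok", ++tap_count, name, ms);
}

/* Trailing plan line ("1..214" in the example output above). */
static void
tap_plan(void)
{
	printf("1..%d\n", tap_count);
}
```

Because the plan comes last, a runner such as prove can detect a test binary that died partway through: fewer result lines than the plan promises is a failure.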
Re: shadow variables - pg15 edition
On Thu, Aug 18, 2022 at 12:54 AM Justin Pryzby wrote: > > There's been no progress on this in the past discussions. > > https://www.postgresql.org/message-id/flat/877k1psmpf.fsf%40mailbox.samurai.com > https://www.postgresql.org/message-id/flat/CAApHDvpqBR7u9yzW4yggjG%3DQfN%3DFZsc8Wo2ckokpQtif-%2BiQ2A%40mail.gmail.com#2d900bfe18fce17f97ec1f00800c8e27 > https://www.postgresql.org/message-id/flat/MN2PR18MB2927F7B5F690065E1194B258E35D0%40MN2PR18MB2927.namprd18.prod.outlook.com > > But an unfortunate consequence of not fixing the historic issues is that it > precludes the possibility that anyone could be expected to notice if they > introduce more instances of the same problem (as in the first half of these > patches). Then the hole which has already been dug becomes deeper, further > increasing the burden of fixing the historic issues before being able to use > -Wshadow. > > The first half of the patches fix shadow variables newly-introduced in v15 > (including one of my own patches), the rest are fixing the lowest hanging > fruit > of the "short list" from COPT=-Wshadow=compatible-local > > I can't see that any of these are bugs, but it seems like a good goal to move > towards allowing use of the -Wshadow* options to help avoid future errors, as > well as cleanliness and readability (rather than allowing it to get harder to > use -Wshadow). > Hey, thanks for picking this up! I'd started looking at these [1] last year and spent a day trying to categorise them all in a spreadsheet (shadows a global, shadows a parameter, shadows a local var etc) but I became swamped by the volume, and then other work/life got in the way. +1 from me. -- [1] https://www.postgresql.org/message-id/flat/CAHut%2BPuv4LaQKVQSErtV_%3D3MezUdpipVOMt7tJ3fXHxt_YK-Zw%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: Propose a new function - list_is_empty
On Thu, Aug 18, 2022 at 1:14 AM Tom Lane wrote: > > I wrote: > > I'd drop the parens in these particular examples because they are > > inconsistent with the other parts of the same "if" condition. > > After going through the patch I removed most but not all of the > newly-added parens on those grounds. I also found a couple more > spots that could be converted. Pushed with those changes. > Thanks for pushing. -- Kind Regards, Peter Smith. Fujitsu Australia.
Re: Add support for DEFAULT specification in COPY FROM
On 2022-08-17 We 17:12, Israel Barth Rubio wrote: > > > Does that address your concerns? > > I am attaching the new patch, containing the above test in the regress > suite. Thanks, yes, that all looks sane. Please add this to the next CommitFest if you haven't already done so. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: [RFC] building postgres with meson - v11
Hi, On 2022-08-17 15:50:23 +0200, Peter Eisentraut wrote: > - There are various references to "pch" (pre-compiled headers). Is > there more discussion anywhere about this? I don't know what this > would entail or whether there are any drawbacks to be aware of. The > new *_pch.h files don't have any comments. Maybe this should be a > separate patch later. It's mainly to make windows builds a bit less slow. I've no objection to separating this out. > - About relativize_shared_library_references: We have had several > patches over the years for working around SIP stuff, and some of > them did essentially this, but we decided not to go ahead with them. > We could revisit that, but it should be a separate patch, not mixed > in with this. The prior approaches all had issues because they didn't support relative references IIRC (and thus broke being able to relocate the installation), which this does. I just found it very annoying to work on macs without this. And there were at least two "bug" reports of testers of the meson branch that were just due to SIP. I'm ok with splitting it out, but I also think it's a lower risk opportunity to test that this works. > - postgresql-extension.pc: Similarly, this ought to be a separate > patch. If we want people to use this, we'll need it in the makefile > build system anyway. Makes sense. I'd like to keep it in the same patch for a short while longer, to deduplicate some of the code, but then will split it out. > - -DFRONTEND is used somewhat differently from the makefiles. For > example, meson sets -DFRONTEND for pg_controldata, but the > makefiles don't. Conversely, the makefiles set -DFRONTEND for > ecpglib, but meson does not. This should be checked again to make > sure it all matches up. Yes, should sync that up. FWIW, meson does add -DFRONTEND for ecpglib. There were a few places that did add it twice, I'll push a cleanup of that in a bit. > - Option name spelling should be made consistent about underscores > versus hyphens. 
Built-in meson options use underscores, so we > should make the user-defined ones like that as well (some already > do). (wal-blocksize krb-srvnam system-tzdata tap-tests bsd-auth) No objection. > - I have found the variable name "cdata" for configuration_data() to > be less than clear. I see some GNOME projects to it that way, is > that where it's from? systemd uses "conf", maybe that's better. I don't know where it's from - I don't think I ever looked at gnome buildsystem stuff. It seems to be the obvious abbreviation for configuration_data()... I don't object to conf, but it's not a clear improvement to me. > - In the top-level meson.build, the "renaming" of the Windows system > name > > host_system = host_machine.system() == 'windows' ? 'win32' : > host_machine.system() > build_system = build_machine.system() == 'windows' ? 'win32' : > build_machine.system() > > seems unnecessary to me. Why not stick with the provided names? Because right now we also use it for things like choosing the "source" for pg_config_os.h (i.e. include/port/{darwin,linux,win32,..}.h). And it seemed easier to just have one variable name for all of it. > - The c99_test ought to be not needed if the c_std project option is > used. Was there a problem with that? We don't want to force -std=c99 when not necessary, I think. We sometimes use features from newer (and from gnu) language versions after testing availability, and if we hardcode the version those will either fail or elicit warnings. > - Is there a way to split up the top-level meson.build somehow? Maybe > just throw some stuff into included files? This might get out of > hand at some point. We can put stuff into config/meson.build or such. But I don't think it's clearly warranted at this point. > - The PG_SYSROOT detection gives different results. 
On my system, > configure produces > > /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk, > meson produces > > /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk. > src/template/darwin goes out of its way to get a version-specific > result, so we need to carry that over somehow. (The difference does > result in differences in the built binaries.) TBH, I don't really understand the SYSROOT stuff all that well, never having used a mac in anger (well, only in anger, but ...). What do you think about extracting the relevant portion of src/template/darwin into a dedicated shell script that gets called by both? > Then, some patches from me: > > 0001-Change-shared-library-installation-naming-on-macOS.patch > > This changes the makefiles to make the shared library file naming on > macOS match what meson produces. I don't know what the situation is > on other platforms. No opinion on the matter. Seems best to apply separately if we want to? > 0002-meson-Fix-insta
Re: Add support for DEFAULT specification in COPY FROM
Hello Andrew,

Thanks for reviewing this patch. It is worth noting that DEFAULT will only take place if explicitly specified, meaning there is no default value for the option DEFAULT. The usage of \D in the tests was only a suggestion. Also, the NULL marker will be an unquoted empty string by default in CSV mode. In any case I have manually tested it, and the behavior is compliant with what we see for NULL if it is defined to use \N, both in text and CSV modes.

- NULL as \N:

postgres=# CREATE TEMP TABLE copy_null (id integer primary key, value text);
CREATE TABLE
postgres=# copy copy_null from stdin with (format text, NULL '\N');
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1 \N
>> 2 \\N
>> 3 "\N"
>> \.
COPY 3
postgres=# TABLE copy_null ;
 id | value
----+-------
  1 |
  2 | \N
  3 | "N"
(3 rows)

postgres=# TRUNCATE copy_null ;
TRUNCATE TABLE
postgres=# copy copy_null from stdin with (format csv, NULL '\N');
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1,\N
>> 2,\\N
>> 3,"\N"
>> \.
COPY 3
postgres=# TABLE copy_null ;
 id | value
----+-------
  1 |
  2 | \\N
  3 | \N
(3 rows)

- DEFAULT as \D:

postgres=# CREATE TEMP TABLE copy_default (id integer primary key, value text default 'test');
CREATE TABLE
postgres=# copy copy_default from stdin with (format text, DEFAULT '\D');
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1 \D
>> 2 \\D
>> 3 "\D"
>> \.
COPY 3
postgres=# TABLE copy_default ;
 id | value
----+-------
  1 | test
  2 | \D
  3 | "D"
(3 rows)

postgres=# TRUNCATE copy_default ;
TRUNCATE TABLE
postgres=# copy copy_default from stdin with (format csv, DEFAULT '\D');
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1,\D
>> 2,\\D
>> 3,"\D"
>> \.
COPY 3
postgres=# TABLE copy_default ;
 id | value
----+-------
  1 | test
  2 | \\D
  3 | \D
(3 rows)

If you do not specify DEFAULT in COPY FROM, it will have no default value for that option. So, if you try to load \D in CSV mode, then it will load the literal value:

postgres=# CREATE TEMP TABLE copy (id integer primary key, value text default 'test');
CREATE TABLE
postgres=# copy copy from stdin with (format csv);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1,\D
>> 2,\\D
>> 3,"\D"
>> \.
COPY 3
postgres=# TABLE copy ;
 id | value
----+-------
  1 | \D
  2 | \\D
  3 | \D
(3 rows)

Does that address your concerns? I am attaching the new patch, containing the above test in the regress suite.

Best regards,
Israel.

On Tue, Aug 16, 2022 at 17:27, Andrew Dunstan wrote:
>
> On 2022-08-16 Tu 14:12, Israel Barth Rubio wrote:
> > Hello all,
> >
> > With the current implementation of COPY FROM in PostgreSQL we are able to
> > load the DEFAULT value/expression of a column if the column is absent in the
> > list of specified columns. We are not able to explicitly ask that PostgreSQL uses
> > the DEFAULT value/expression in a column that is being fetched from the input
> > file, though.
> >
> > This patch adds support for handling DEFAULT values in COPY FROM. It works
> > similarly to NULL in COPY FROM: whenever the marker that was set for DEFAULT
> > value/expression is read from the input stream, it will evaluate the DEFAULT
> > value/expression of the corresponding column.
> >
> > I'm currently working as a support engineer, and both me and some customers had
> > already faced a situation where we missed an implementation like this in COPY
> > FROM, and had to work around that by using an input file where the column which
> > has a DEFAULT value/expression was removed.
> >
> > That does not solve all issues though, as it might be the case that we just want a
> > DEFAULT value to take place if no other value was set for the column in the input
> > file, meaning we would like to have a column in the input file that sometimes assumes
> > the DEFAULT value/expression, and sometimes assumes an actual given value.
> >
> > The implementation was performed about one month ago and included all regression
> > tests regarding the changes that were introduced. It was just rebased on top of the
> > master branch before submitting this patch, and all tests are still succeeding.
> >
> > The implementation takes advantage of the logic that was already implemented to
> > handle DEFAULT values for missing columns in COPY FROM. I just modified it to
> > make the DEFAULT values/expressions available for all columns instead of only
> > for the ones that were missing in the specification. I had to change the variables
> > accordingly, so it would index the correct positions in the new array of DEFAULT
> > values/expressions
Re: TAP output format in pg_regress
Attached is a new version of the pg_regress TAP patch which - per reviewer feedback - does away with dual output formats and just converts the existing output to be TAP compliant. The idea was to keep it fairly human readable while still being TAP compliant enough that running it with prove (with a tiny Perl wrapper) works. This version also strips away the verbose output which these days isn't terribly useful and mainly consumes vertical space.

Another feature of the patch is to switch error logging to using the common frontend logging rather than pg_regress rolling its own. The output from pg_log_error is emitted verbatim by prove as it's on stderr.

I based the support on the original TAP specification and not the new v13 or v14 since it seemed unclear how well those are supported in testrunners. If v14 is adopted by testrunners there is some better functionality for reporting errors that we could use, but for now this version seems a safer option.

A normal "make check" with this patch applied now looks like this:

+++ regress check in src/test/regress +++
# running on port 61696 with PID 57910
ok 1 - test_setup 326 ms
ok 2 - tablespace 401 ms
# parallel group (20 tests): varchar char oid pg_lsn txid int4 regproc money int2 uuid float4 text name boolean int8 enum float8 bit numeric rangetypes
ok 3 - boolean 129 ms
ok 4 - char 73 ms
ok 5 - name 117 ms
ok 6 - varchar 68 ms
<...snip...>
ok 210 - memoize 137 ms
ok 211 - stats 851 ms
# parallel group (2 tests): event_trigger oidjoins
ok 212 - event_trigger 138 ms
not ok 213 - oidjoins 190 ms
ok 214 - fast_default 149 ms
1..214
# 1 of 214 tests failed.
# The differences that caused some tests to fail can be viewed in the
# file "//src/test/regress/regression.diffs". A copy of the test summary that you see
# above is saved in the file "//src/test/regress/regression.out".

The ending comment isn't currently shown when executed via prove as it has to go to STDERR for that to work, and I'm not sure that's something we want to do in the general case. I don't expect running the pg_regress tests via prove is going to be the common case. How does the meson TAP ingestion handle stdout/stderr?

And for the sake of completeness, even though we all know this by heart, a similar run from the current output format looks like:

+++ regress check in src/test/regress +++
== creating temporary instance ==
== initializing database system ==
== starting postmaster ==
running on port 61696 with PID 61955
== creating database "regression" ==
CREATE DATABASE
ALTER DATABASE
ALTER DATABASE
ALTER DATABASE
ALTER DATABASE
ALTER DATABASE
ALTER DATABASE
== running regression test queries ==
test test_setup ... ok 469 ms
test tablespace ... ok 415 ms
parallel group (20 tests): varchar char oid name int2 pg_lsn int4 txid text uuid regproc boolean money float4 int8 float8 bit enum numeric rangetypes
     boolean ... ok 105 ms
     char ... ok 64 ms
     name ... ok 70 ms
     varchar ... ok 55 ms
<...snip...>
     memoize ... ok 149 ms
     stats ... ok 873 ms
parallel group (2 tests): event_trigger oidjoins
     event_trigger ... FAILED 142 ms
     oidjoins ... FAILED 208 ms
test fast_default ... ok 172 ms
== shutting down postmaster ==

2 of 214 tests failed.

The differences that caused some tests to fail can be viewed in the file "//src/test/regress/regression.diffs". A copy of the test summary that you see above is saved in the file "//src/test/regress/regression.out".

There is a slight reduction in information, for example around tests run serially vs in a parallel group. This could perhaps be handled by marking the test name in some way like "tablespace (serial)" or prefixing with a symbol or something. I can take a stab at that in case we think that level of detail is important to preserve.
--
Daniel Gustafsson		https://vmware.com/

v6-0001-Make-pg_regress-output-format-TAP-compliant.patch
Re: RFC: Moving specific attributes from pg_attribute column into attoptions
Nikita Malakhov writes:
> We already had some thoughts on storing, let's call them "optional"
> attributes into 'attoptions' instead of extending the PG_ATTRIBUTE
> table, and here came feedback from Andres Freund with a remark that
> we're increasing the largest catalog table. So we decided to propose
> moving these "optional" attributes from being the PG_ATTRIBUTE column to
> be the part of 'attoptions' column of this table.

This smells very much like what was done eons ago to create the pg_attrdef catalog. I don't have any concrete comments to make, only to suggest that that's an instructive parallel case.

One thing that comes to mind immediately is whether this stuff could be unified with pg_attrdef instead of creating Yet Another catalog that has to be consulted on the way to getting any real work done. I think that pg_attrdef was originally separated to keep large default expressions from overrunning the maximum tuple size, a motivation that disappeared once we could TOAST system tables. However, nowadays it's still useful for it to be separate because it simplifies representation of dependencies of default expressions (pg_depend refers to OIDs of pg_attrdef entries for that). If we're thinking of moving anything that would need dependency management then it might need its own catalog, maybe?

On the whole I'm not convinced that what you suggest will be a net win. pg_attrdef wins to the extent that there are a lot of columns with no non-null default and hence no need for any pg_attrdef entry. But the minute you move something that most tables need, like attcompression, you'll just have another bloated catalog to deal with.

> Also, we suggest that options stored in 'attoptions' column could be packed
> as JSON values.

Please, no. Use of JSON in a SQL database pretty much always represents a failure to think hard enough about what you need to store. Sometimes it's not worth thinking all that hard; but I strenuously oppose applying that sort of standard in the system catalogs.

regards, tom lane
plpython causes assertions with python debug build
Hello, hackers.

I was investigating Valgrind issues with plpython. It turns out python itself doesn't play well with Valgrind in the default build. Therefore I built python with the valgrind related flags

  --with-valgrind --without-pymalloc

and added debug flags just to be sure

  --with-pydebug --with-assertions

This causes plpython's tests to fail on internal python assertions. Example backtrace (python version 3.7, postgresql master branch):

#8  0x7fbf02851662 in __GI___assert_fail "!PyErr_Occurred()" at assert.c:101
#9  0x7fbef9060d31 in _PyType_Lookup at Objects/typeobject.c:3117
#10 0x7fbef90461be in _PyObject_GenericGetAttrWithDict at Objects/object.c:1231
#11 0x7fbef9046707 in PyObject_GenericGetAttr at Objects/object.c:1309
#12 0x7fbef9043cdf in PyObject_GetAttr at Objects/object.c:913
#13 0x7fbef90458d9 in PyObject_GetAttrString at Objects/object.c:818
#14 0x7fbf02499636 in get_string_attr at plpy_elog.c:569
#15 0x7fbf02498ea5 in PLy_get_error_data at plpy_elog.c:420
#16 0x7fbf0249763b in PLy_elog_impl at plpy_elog.c:77

It looks like there are several places where the code tries to get attributes from error objects, and while the code is ready for attribute absence, it doesn't clear the AttributeError exception in that case.

The attached patch adds 3 calls to PyErr_Clear() in places where the code reacts to attribute absence. With this patch the tests pass.

There were similar findings before. Calls to PyErr_Clear close to, but not exactly at, the same places existed before but were removed in 7e3bb08038 "Fix access-to-already-freed-memory issue in plpython's error handling". Then one PyErr_Clear was added in 1d2f9de38d "Fix freshly-introduced PL/Python portability bug". But it looks like there's need for more.

PS. When python is compiled with `--with-valgrind --without-pymalloc`, Valgrind doesn't complain, so there are no memory related issues in plpython.
regards

--
Yura Sokolov
y.sokolov

diff --git a/src/pl/plpython/plpy_elog.c b/src/pl/plpython/plpy_elog.c
index 7c627eacfbf..aa1cf17b366 100644
--- a/src/pl/plpython/plpy_elog.c
+++ b/src/pl/plpython/plpy_elog.c
@@ -359,7 +359,10 @@ PLy_get_sqlerrcode(PyObject *exc, int *sqlerrcode)
 	sqlstate = PyObject_GetAttrString(exc, "sqlstate");
 	if (sqlstate == NULL)
+	{
+		PyErr_Clear();
 		return;
+	}
 
 	buffer = PLyUnicode_AsString(sqlstate);
 	if (strlen(buffer) == 5 &&
@@ -395,6 +398,7 @@ PLy_get_spi_error_data(PyObject *exc, int *sqlerrcode, char **detail,
 	}
 	else
 	{
+		PyErr_Clear();
 		/*
 		 * If there's no spidata, at least set the sqlerrcode. This can happen
 		 * if someone explicitly raises a SPI exception from Python code.
@@ -571,6 +575,10 @@ get_string_attr(PyObject *obj, char *attrname, char **str)
 	{
 		*str = pstrdup(PLyUnicode_AsString(val));
 	}
+	else if (val == NULL)
+	{
+		PyErr_Clear();
+	}
 	Py_XDECREF(val);
 }
Re: Making Vars outer-join aware
Richard Guo writes:
> BTW, the comment just above the two calls to build_joinrel_tlist says:
>     * Create a new tlist containing just the vars that need to be output from
> Here by 'vars' it means both plain Vars and PlaceHolderVars, right? If
> not we may need to adjust this comment to also include PlaceHolderVars.

I think it did intend just Vars because that's all that build_joinrel_tlist did; but we really should have updated it when we invented PlaceHolderVars, and even more so now that build_joinrel_tlist adds PHVs too. I changed the wording.

> A minor comment is that seems we can get rid of phid inside
> PlaceHolderInfo, since we do not do linear list searches any more. It's
> somewhat duplicative of the phid inside PlaceHolderVar. Currently there are
> two places referencing PlaceHolderInfo->phid, remove_rel_from_query and
> find_placeholder_info. We can use PlaceHolderVar->phid instead in both
> the two places.

Meh, I'm not excited about that. I don't think that the phid field is only there to make the search loops faster; it's the basic identity of the PlaceHolderInfo.

regards, tom lane
Re: cataloguing NOT NULL constraints
>
> I started this time around from the newest of my patches in those
> threads, but the implementation has changed considerably from what's
> there.

I don't know exactly what will be the scope of this process you're working on, but there is a gap with foreign key constraints too. It is possible to have wrong values in a FK constraint if you disable checking of it with session_replication_role or DISABLE TRIGGER ALL.

I know you can create that constraint with "not valid" and it'll be checked when turned on. But if I just forgot that ... So it would be good to have VALIDATE CONSTRAINT actually check, even if the constraint is already marked valid.

drop table if exists tb_pk cascade;
create table tb_pk(key integer not null primary key);
drop table if exists tb_fk cascade;
create table tb_fk(fk_key integer);
alter table tb_fk add constraint fk_pk foreign key (fk_key) references tb_pk (key);
insert into tb_pk values(1);
alter table tb_fk disable trigger all; --can be with session_replication_role too.
insert into tb_fk values(5); --wrong values on that table

Then, you could check

alter table tb_fk validate constraint fk_pk

or

alter table tb_fk validate all constraints
Re: static libpq (and other libraries) overwritten on aix
Hi,

On 2022-08-17 15:28:18 -0400, Robert Haas wrote:
> On Wed, Aug 17, 2022 at 3:02 PM Andres Freund wrote:
> > 2) Do we care about static libraries not surviving on AIX? There could also be
> >    a race in the buildrules leading to sometimes static libs sometimes shared
> >    libs winning, I think.
>
> Instead of overwriting the same file, can we not use different
> filenames for different things?

Not easily, as far as I understand. The way one customarily links to shared libraries on aix is to have an .a archive containing the shared library. That way -lpq picks up libpq.a, which then triggers the shared library to be referenced. E.g.

andres@gcc119:[/home/andres/src/postgres/build-ac]$ LIBPATH=$(pwd)/src/interfaces/libpq ldd src/bin/scripts/clusterdb
src/bin/scripts/clusterdb needs:
         /usr/lib/libc.a(shr_64.o)
         /usr/lib/libpthread.a(shr_xpg5_64.o)
         /usr/lib/libreadline.a(libreadline.so.6)
         /home/andres/src/postgres/build-ac/src/interfaces/libpq/libpq.a(libpq.so.5)
         /unix
         /usr/lib/libcrypt.a(shr_64.o)
         /usr/lib/libcurses.a(shr42_64.o)
         /usr/lib/libpthreads.a(shr_xpg5_64.o)

Note the .a(libpq.so.5) bit. Unfortunately that's exactly how one links to a static library as well. So we'd have to change the name used as -l$this between linking to a shared libpq and a static libpq.

Greetings,

Andres Freund
Re: shared-memory based stats collector - v70
On Tue, 16 Aug 2022 at 08:49, Drouvot, Bertrand wrote:
>
> +     if (p->key.kind != PGSTAT_KIND_RELATION)
> +         continue;

Hm. So presumably this needs to be extended. Either to let the caller decide which types of stats to return or to somehow return all the stats intermixed. In my monitoring code I did the latter because I didn't think going through the hash table repeatedly would be very efficient. But it's definitely a pretty awkward API since I need a switch statement that explicitly lists each case and casts the result.

> > 2) When I did the function attached above I tried to avoid returning
> > the whole set and make it possible to process them as they arrive.
> > Is it the way it has been done? (did not look at your function yet)

I did it with callbacks. It was quick and easy and convenient for my use case. But in general I don't really like callbacks and would think some kind of iterator style api would be nicer.

I am handling the stats entries as they turn up. I'm constructing the text output for each in a callback and buffering up the whole http response in a string buffer. I think that's ok, but if I wanted to avoid buffering it up and do network i/o then I would think the thing to do would be to build the list of entry keys, then loop over that list doing a hash lookup for each one, generating the response for each, and writing it out to the network. That way there wouldn't be anything locked, not even the hash table, while doing network i/o. It would mean a lot of traffic on the hash table though.

> > -- on that note I wonder if I've done something
> > wrong because I noted a few records with InvalidOid where I didn't
> > expect it.
>
> It looks like that InvalidOid for the dbid means that the entry is for a
> shared relation.

Ah yes. I had actually found that but forgotten it. There's also a database entry with dboid=InvalidOid which is apparently where background workers with no database attached report stats.
> I've in mind to add some filtering on the dbid (I think it could be
> useful for a monitoring tool with a persistent connection to one database
> but that wants to pull the stats database per database).
>
> I don't think a look up through the local cache will work if the
> entry/key is related to another database than the one the API is launched from.

Isn't there also a local hash table used to find the entries to reduce traffic on the shared hash table? Even if you don't take a snapshot does it get entered there? There are definitely still parts of this I'm working on a pretty vague understanding of :/

--
greg
Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
Hi,

On 2022-08-17 15:21:55 -0400, Robert Haas wrote:
> On Wed, Aug 17, 2022 at 2:45 PM Andres Freund wrote:
> > Given that the cleanup locks in question are "taken" *after* re-initializing
> > the page, I'm doubtful that's a sane path forward. It seems quite likely to
> > mislead somebody to rely on it working as a cleanup lock in the future.
>
> There's not a horde of people lining up to work on the hash index
> code, but if you feel like writing and testing the more invasive fix,
> I'm not really going to fight you over it.

My problem is that the code right now is an outright lie. At the absolute very least this code needs a big honking "we check if we have a cleanup lock here, but that's just for show, because WE ALREADY OVERWROTE THE WHOLE PAGE" comment.

Greetings,

Andres Freund
Re: static libpq (and other libraries) overwritten on aix
On Wed, Aug 17, 2022 at 3:02 PM Andres Freund wrote:
> 2) Do we care about static libraries not surviving on AIX? There could also be
>    a race in the buildrules leading to sometimes static libs sometimes shared
>    libs winning, I think.

Instead of overwriting the same file, can we not use different filenames for different things?

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
On Wed, Aug 17, 2022 at 2:45 PM Andres Freund wrote:
> On 2022-08-17 08:25:06 -0400, Robert Haas wrote:
> > Regarding the question of whether we need a cleanup lock on the new
> > bucket I am not really seeing the advantage of going down that path.
> > Simply fixing this code to take a cleanup lock instead of hoping that
> > it always gets one by accident is low risk and should fix the observed
> > problem. Getting rid of the cleanup lock will be more invasive and I'd
> > like to see some evidence that it's a necessary step before we take
> > the risk of breaking things.
>
> Given that the cleanup locks in question are "taken" *after* re-initializing
> the page, I'm doubtful that's a sane path forward. It seems quite likely to
> mislead somebody to rely on it working as a cleanup lock in the future.

There's not a horde of people lining up to work on the hash index code, but if you feel like writing and testing the more invasive fix, I'm not really going to fight you over it.

--
Robert Haas
EDB: http://www.enterprisedb.com
static libpq (and other libraries) overwritten on aix
Hi,

I was hacking on making AIX work with the meson patchset last night when I noticed this delightful bit:

gmake -C src/interfaces/libpq
...
rm -f libpq.a
ar crs libpq.a fe-auth-scram.o fe-connect.o fe-exec.o fe-lobj.o fe-misc.o fe-print.o fe-protocol3.o fe-secure.o fe-trace.o legacy-pqsignal.o libpq-events.o pqexpbuffer.o fe-auth.o
touch libpq.a
( echo '#! libpq.so.5'; gawk '/^[^#]/ {printf "%s\n",$1}' /home/andres/src/postgres/build-ac/../src/interfaces/libpq/exports.txt ) >libpq.exp
gcc -maix64 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -O2 -pthread -D_REENTRANT -D_THREAD_SAFE -o libpq.so.5 libpq.a -Wl,-bE:libpq.exp -L../../../src/port -L../../../src/common -lpgcommon_shlib -lpgport_shlib -Wl,-bbigtoc -Wl,-blibpath:'/usr/local/pgsql/lib:/usr/lib:/lib' -Wl,-bnoentry -Wl,-H512 -Wl,-bM:SRE -lm
rm -f libpq.a
ar crs libpq.a libpq.so.5

We first create a static library libpq.a as normal, but then we overwrite it with the special AIX way of packing up shared libraries, by packing them up in a static library. That part is correct, it's apparently the easiest way of getting applications to link to shared libraries on AIX (I think the -Wl,-bM:SRE is relevant for ensuring it'll be a dynamic link, rather than a static one). This likely has been going on for approximately forever.

Two questions:

1) Do we continue building static libraries for libpq etc?

2) Do we care about static libraries not surviving on AIX? There could also be a race in the buildrules leading to sometimes static libs sometimes shared libs winning, I think.

Greetings,

Andres Freund
Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
Hi,

On 2022-08-17 08:25:06 -0400, Robert Haas wrote:
> Regarding the question of whether we need a cleanup lock on the new
> bucket I am not really seeing the advantage of going down that path.
> Simply fixing this code to take a cleanup lock instead of hoping that
> it always gets one by accident is low risk and should fix the observed
> problem. Getting rid of the cleanup lock will be more invasive and I'd
> like to see some evidence that it's a necessary step before we take
> the risk of breaking things.

Given that the cleanup locks in question are "taken" *after* re-initializing the page, I'm doubtful that's a sane path forward. It seems quite likely to mislead somebody to rely on it working as a cleanup lock in the future.

Greetings,

Andres Freund
Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
Hi,

On 2022-08-17 10:18:14 +0530, Amit Kapila wrote:
> > Looking at the non-recovery code makes me even more suspicious:
> >
> > 	/*
> > 	 * Physically allocate the new bucket's primary page. We want to do this
> > 	 * before changing the metapage's mapping info, in case we can't get the
> > 	 * disk space. Ideally, we don't need to check for cleanup lock on new
> > 	 * bucket as no other backend could find this bucket unless meta page is
> > 	 * updated. However, it is good to be consistent with old bucket locking.
> > 	 */
> > 	buf_nblkno = _hash_getnewbuf(rel, start_nblkno, MAIN_FORKNUM);
> > 	if (!IsBufferCleanupOK(buf_nblkno))
> > 	{
> > 		_hash_relbuf(rel, buf_oblkno);
> > 		_hash_relbuf(rel, buf_nblkno);
> > 		goto fail;
> > 	}
> >
> > _hash_getnewbuf() calls _hash_pageinit() which calls PageInit(), which
> > memset(0)s the whole page. What does it even mean to check whether you
> > effectively have a cleanup lock after you zeroed out the page?
> >
> > Reading the README and the comment above makes me wonder if this whole cleanup
> > lock business here is just cargo culting and could be dropped?
>
> I think it is okay to not acquire a clean-up lock on the new bucket
> page both in recovery and non-recovery paths. It is primarily required
> on the old bucket page to avoid concurrent scans/inserts. As mentioned
> in the comments and as per my memory serves, it is mainly for keeping
> it consistent with old bucket locking.

It's not keeping it consistent with bucket locking to zero out a page before getting a cleanup lock, hopefully at least. This code is just broken on multiple fronts, and consistency isn't a defense.

Greetings,

Andres Freund
Re: cataloguing NOT NULL constraints
References to past discussions and patches:

https://postgr.es/m/cakoswnkn6hsyatuys8xzxzrcr-kl1okhs5-b9qd9bf1rad3...@mail.gmail.com
https://www.postgresql.org/message-id/flat/1343682669-sup-2...@alvh.no-ip.org
https://www.postgresql.org/message-id/20160109030002.GA671800@alvherre.pgsql

I started this time around from the newest of my patches in those threads, but the implementation has changed considerably from what's there.

--
Álvaro Herrera         48°01'N 7°57'E — https://www.EnterpriseDB.com/
RFC: Moving specific attributes from pg_attribute column into attoptions
Hi hackers!

While working on Pluggable TOAST we extended the PG_ATTRIBUTE table with a new column 'atttoaster'. But it is obvious that this column is related to tables, columns and datatypes only, and is not needed for other attributes. You can find the full discussion on Pluggable TOAST here: https://www.postgresql.org/message-id/flat/224711f9-83b7-a307-b17f-4457ab73a...@sigaev.ru

We already had some thoughts on storing, let's call them "optional", attributes in 'attoptions' instead of extending the PG_ATTRIBUTE table, and here came feedback from Andres Freund with a remark that we're increasing the largest catalog table. So we decided to propose moving these "optional" attributes from being a PG_ATTRIBUTE column to being part of the 'attoptions' column of this table.

The first candidates to store in the attoptions column are 'atttoaster' and 'attcompression', because they are related to datatypes and table columns. Also, this change will allow setting options for custom Toasters, which makes a lot of sense too, along with an important [as we see it] 'force TOAST' option, which is meant to force a given value to be TOASTed, bypassing the existing logic (where the choice depends on tuple and value size).

Also, we suggest that options stored in the 'attoptions' column could be packed as JSON values.

It seems to make a lot of sense to optimize PG_ATTRIBUTE structure and size with attributes related only to specific types, etc. We'd welcome any opinions, suggestions and advice!

--
Regards,
Nikita Malakhov
https://postgrespro.ru/
cataloguing NOT NULL constraints
I've been working on having NOT NULL constraints have pg_constraint rows. Everything is working now. Some things are a bit weird, and I would like opinions on them: 1. In my implementation, you can have more than one NOT NULL pg_constraint row for a column. What should happen if the user does ALTER TABLE .. ALTER COLUMN .. DROP NOT NULL; ? Currently it throws an error about the ambiguity (ie. which constraint to drop). Using ALTER TABLE DROP CONSTRAINT works fine, and the 'attnotnull' bit is lost when the last one such constraint goes away. 2. If a table has a primary key, and a table is created that inherits from it, then the child has its column(s) marked attnotnull but there is no pg_constraint row for that. This is not okay. But what should happen? 1. a CHECK(col IS NOT NULL) constraint is created for each column 2. a PRIMARY KEY () constraint is created Note that I've chosen not to create CHECK(foo IS NOT NULL) pg_constraint rows for columns in the primary key, unless an explicit NOT NULL declaration is also given. Adding them would be a very easily solution to problem 2 above, but ISTM that such constraints would be redundant and not very nice. After gathering input on these thing, I'll finish the patch and post it. As far as I can tell, everything else is working (except the annoying pg_dump tests, see below). Thanks Implementation notes: In the current implementation I am using CHECK constraints, so these constraints are contype='c', conkey={col} and the corresponding expression. pg_attribute.attnotnull is still there, and it is set true when at least one "CHECK (col IS NOT NULL)" constraint (and it's been validated) or PRIMARY KEY constraint exists for the column. CHECK constraint names are no longer "tab_col_check" when the expression is CHECK (foo IS NOT NULL). 
The constraint is now going to be named "tab_col_not_null". If you say CREATE TABLE (a int NOT NULL), you'll get a CHECK constraint printed by psql (this is a bit noisier than previously, and it changes a lot of regression test output):

55489 16devel 1776237=# create table tab (a int not null);
CREATE TABLE
55489 16devel 1776237=# \d tab
                   Tabla «public.tab»
 Columna │  Tipo   │ Ordenamiento │ Nulable  │ Por omisión
─────────┼─────────┼──────────────┼──────────┼─────────────
 a       │ integer │              │ not null │
Restricciones CHECK:
    "tab_a_not_null" CHECK (a IS NOT NULL)

pg_dump no longer prints NOT NULL in the table definition; rather, the CHECK constraint is dumped as a separate table constraint (still within the CREATE TABLE statement though). This preserves any possible constraint name, in case one was specified by the user at creation time.

In order to search for the correct constraint for each column for various DDL actions, I just inspect each pg_constraint row for the table and match conkey and the CHECK expression. Some things would be easier with a new pg_attribute column that carries the pg_constraint.oid of the constraint for that column; however, that seems to be just catalog bloat and is not normalized, so I decided not to do it.

Nice side-effect: if you add CHECK (foo IS NOT NULL) NOT VALID, and later validate that constraint, the attnotnull bit becomes set.

-- Álvaro Herrera — Breisgau, Deutschland — https://www.EnterpriseDB.com/
[PATCH] Clarify the comments about varlena header encoding
Hi hackers, I noticed that the comments regarding bit layouts for varlena headers in postgres.h are somewhat misleading. For instance, when reading:

```
00xx 4-byte length word, aligned, uncompressed data (up to 1G)
```

... one can assume this is a 00xx byte followed by another 4 bytes (which is wrong). Also one can read this as "aligned, uncompressed data" (which again is wrong).

```
1000 1-byte length word, unaligned, TOAST pointer
```

This is misleading too. The comments above this line say that `struct varatt_external` is a TOAST pointer. sizeof(varatt_external) = 16, plus 1 byte equals 17, right? However, the documentation [1] claims the result should be 18:

"""
Allowing for the varlena header bytes, the total size of an on-disk TOAST pointer datum is therefore 18 bytes regardless of the actual size of the represented value.
"""

I did my best to get rid of any ambiguity. The patch is attached.

[1]: https://www.postgresql.org/docs/current/storage-toast.html

-- Best regards, Aleksander Alekseev

v1-0001-Clarify-the-comments-about-varlena-header-encodin.patch Description: Binary data
Re: pg_walinspect: ReadNextXLogRecord's first_record argument
On Wed, Aug 17, 2022 at 12:29 PM Bharath Rupireddy wrote: > PSA v2 patches. These look OK to me. -- Robert Haas EDB: http://www.enterprisedb.com
Re: ICU for global collation
Hi, On Mon, Aug 15, 2022 at 03:06:32PM +0300, Marina Polyakova wrote: > > 1.1) It looks like there's a bug in the function get_db_infos > (src/bin/pg_upgrade/info.c), where the version of the old cluster is always > checked: > > if (GET_MAJOR_VERSION(old_cluster.major_version) < 1500) > snprintf(query + strlen(query), sizeof(query) - strlen(query), >"'c' AS datlocprovider, NULL AS daticulocale, "); > else > snprintf(query + strlen(query), sizeof(query) - strlen(query), >"datlocprovider, daticulocale, "); > > With the simple patch > > diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c > index > df374ce4b362b4c6c87fc1fd0e476e5d6d353d9e..53ea348e211d3ac38334292bc16cb814bc13bb87 > 100644 > --- a/src/bin/pg_upgrade/info.c > +++ b/src/bin/pg_upgrade/info.c > @@ -319,7 +319,7 @@ get_db_infos(ClusterInfo *cluster) > > snprintf(query, sizeof(query), >"SELECT d.oid, d.datname, d.encoding, d.datcollate, > d.datctype, "); > - if (GET_MAJOR_VERSION(old_cluster.major_version) < 1500) > + if (GET_MAJOR_VERSION(cluster->major_version) < 1500) > snprintf(query + strlen(query), sizeof(query) - strlen(query), >"'c' AS datlocprovider, NULL AS daticulocale, > "); > else > > I got the expected error during the upgrade: > > locale providers for database "template1" do not match: old "libc", new > "icu" > Failure, exiting Good catch. There's unfortunately not a lot of regression tests for ICU-initialized clusters. I'm wondering if the build-farm client could be taught about the locale provider rather than assuming libc. Clearly that wouldn't have caught this issue, but it should still increase the coverage a bit (I'm thinking of the recent problem with the abbreviated keys). 
> 1.2) It looks like the mentioned asserion in dbcommands.c conflicts with the > following lines earlier: > > if (dbiculocale == NULL) > dbiculocale = src_iculocale; > > The following patch works for me: > > diff --git a/src/backend/commands/dbcommands.c > b/src/backend/commands/dbcommands.c > index > b31a30550b025d48ba3cc250dc4c15f41f9a80be..17a2942341e528c01182fb6d4878580f2706bec9 > 100644 > --- a/src/backend/commands/dbcommands.c > +++ b/src/backend/commands/dbcommands.c > @@ -1048,6 +1048,8 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt) > > (errcode(ERRCODE_INVALID_PARAMETER_VALUE), >errmsg("ICU locale must be > specified"))); > } > + else > + dbiculocale = NULL; > > if (dblocprovider == COLLPROVIDER_ICU) > check_icu_locale(dbiculocale); I think it would be better to do that in the various variable initialization. Maybe switch the dblocprovider and dbiculocale initialization, and only initialize dbiculocale if dblocprovider is COLLPROVIDER_ICU. > 2) CREATE DATABASE does not always require the icu locale unlike initdb and > createdb: > > $ createdb mydb --locale en_US.UTF-8 --template template0 --locale-provider > icu > createdb: error: database creation failed: ERROR: ICU locale must be > specified > > $ psql -c "CREATE DATABASE mydb LOCALE \"en_US.UTF-8\" TEMPLATE template0 > LOCALE_PROVIDER icu" postgres > CREATE DATABASE > > I'm wondering if this is not a fully-supported feature (because createdb > creates an SQL command with LC_COLLATE and LC_CTYPE options instead of > LOCALE option) or is it a bug in CREATE DATABASE?.. 
From > src/backend/commands/dbcommands.c: > > if (dblocprovider == COLLPROVIDER_ICU && !dbiculocale) > { > if (dlocale && dlocale->arg) > dbiculocale = defGetString(dlocale); > } This discrepancy between createdb and CREATE DATABASE looks like an oversight, as createdb indeed interprets --locale as: if (locale) { if (lc_ctype) pg_fatal("only one of --locale and --lc-ctype can be specified"); if (lc_collate) pg_fatal("only one of --locale and --lc-collate can be specified"); lc_ctype = locale; lc_collate = locale; } AFAIK the fallback in the CREATE DATABASE case is expected as POSIX locale names should be accepted by icu, so this should work for createdb too.
Re: pg_walinspect: ReadNextXLogRecord's first_record argument
On Wed, Aug 17, 2022 at 8:52 PM Robert Haas wrote: > > On Wed, Aug 17, 2022 at 12:41 AM Bharath Rupireddy > wrote: > > Agreed. > > > > Here's a patch (for V15 as well) fixing this bug, please review. > > Couldn't you simplify this further by removing the lsn argument from > GetWALRecordInfo and using record->ReadRecPtr instead? Then > InitXLogReaderState's second argument could be XLogRecPtr instead of > XLogRecPtr *. Done. XLogFindNextRecord() stores the first valid record in EndRecPtr and the ReadRecPtr is set to InvalidXLogRecPtr by calling XLogBeginRead(). And XLogNextRecord() sets ReadRecPtr which we can use. PSA v2 patches. -- Bharath Rupireddy RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/ v2-master-Use-correct-LSN-for-error-reporting-in-pg_walinsp.patch Description: Binary data v2-PG15-Use-correct-LSN-for-error-reporting-in-pg_walinsp.patch Description: Binary data
Re: pg_walinspect: ReadNextXLogRecord's first_record argument
On Wed, Aug 17, 2022 at 12:41 AM Bharath Rupireddy wrote: > Agreed. > > Here's a patch (for V15 as well) fixing this bug, please review. Couldn't you simplify this further by removing the lsn argument from GetWALRecordInfo and using record->ReadRecPtr instead? Then InitXLogReaderState's second argument could be XLogRecPtr instead of XLogRecPtr *. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Generalize ereport_startup_progress infrastructure
On Wed, Aug 17, 2022 at 4:30 AM Bharath Rupireddy wrote: > As I explained upthread [1], I'd vote for a single GUC at the entire > server level. If the users/customers request per-process or > per-operation progress report GUCs, we can then consider it. Well, I don't agree that either of the proposed new uses of this infrastructure are the right way to solve the problems in question, so worrying about how to name the GUCs when we have a bunch of uses of this infrastructure seems to me to be premature. The proposed use in the postmaster doesn't look very safe, so you either need to give up on that or figure out a way to make it safe. The proposed use in the checkpointer looks like it needs more design work, because it's not clear whether or how it should interact with log_checkpoints. While I agree that changing log_checkpoints into an integer value doesn't necessarily make sense, having some kind of new checkpoint logging that is completely unrelated to existing checkpoint logging doesn't necessarily make sense to me either. I do have some sympathy with the idea that if people care about operations that unexpectedly run for a long time, they probably care about all of them, and probably don't care about changing the timeout or even the enable switch for each one individually. Honestly, it's not very clear to me who would want to ever turn off the startup progress stuff, or why they'd want to change the interval. I added a GUC for it out of an abundance of caution, but I don't know why you'd really want a different setting. Maybe there's some reason, but it's not clear to me. At the same time, I don't think the overall picture here is too clear yet. I'm reluctant to commit to a specific UI for a feature whose scope we don't seem to know. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Propose a new function - list_is_empty
I wrote: > I'd drop the parens in these particular examples because they are > inconsistent with the other parts of the same "if" condition. After going through the patch I removed most but not all of the newly-added parens on those grounds. I also found a couple more spots that could be converted. Pushed with those changes. regards, tom lane
RE: s390x builds on buildfarm
Thanks Andrew. Mark, please let me know if I can help. Regards, Vivian Kong Linux on IBM Z Open Source Ecosystem IBM Canada Toronto Lab From: Andrew Dunstan Date: Wednesday, August 10, 2022 at 9:56 AM To: Vivian Kong , pgsql-hackers@lists.postgresql.org , mark.w...@enterprisedb.com Subject: [EXTERNAL] Re: s390x builds on buildfarm On 2022-08-10 We 09:04, Vivian Kong wrote: > > Hi, > > > > Are builds being paused on s390x as it looks like the s390x builds > were last run 15 days ago. If so, wondering what is the reason for > the pause and what is required to resume the builds? > The OS the builds were running on seems to have reached end of life. > Please let me know if we can help with getting them updated and resume > the builds. > > > > Mark, I think you run most or all of these. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: XLogBeginRead's header comment lies
On Wed, Aug 17, 2022 at 6:53 AM Dilip Kumar wrote: > Thinking again, there is already a code in XLogDecodeNextRecord() to > error out if XLP_FIRST_IS_CONTRECORD is set so probably we don't need > to do anything else and the previous patch with modified assert should > just work fine? Yeah, that looks right to me. I'm inclined to commit your patch with some changes to wording of the comments. I'm also inclined not to back-patch, since we don't know that this breaks anything for existing users of the xlogreader facility. If anyone doesn't want this committed or does want it back-patched, please speak up. Thanks, -- Robert Haas EDB: http://www.enterprisedb.com
shadow variables - pg15 edition
There's been no progress on this in the past discussions. https://www.postgresql.org/message-id/flat/877k1psmpf.fsf%40mailbox.samurai.com https://www.postgresql.org/message-id/flat/CAApHDvpqBR7u9yzW4yggjG%3DQfN%3DFZsc8Wo2ckokpQtif-%2BiQ2A%40mail.gmail.com#2d900bfe18fce17f97ec1f00800c8e27 https://www.postgresql.org/message-id/flat/MN2PR18MB2927F7B5F690065E1194B258E35D0%40MN2PR18MB2927.namprd18.prod.outlook.com But an unfortunate consequence of not fixing the historic issues is that it precludes the possibility that anyone could be expected to notice if they introduce more instances of the same problem (as in the first half of these patches). Then the hole which has already been dug becomes deeper, further increasing the burden of fixing the historic issues before being able to use -Wshadow. The first half of the patches fix shadow variables newly-introduced in v15 (including one of my own patches), the rest are fixing the lowest hanging fruit of the "short list" from COPT=-Wshadow=compatible-local I can't see that any of these are bugs, but it seems like a good goal to move towards allowing use of the -Wshadow* options to help avoid future errors, as well as cleanliness and readability (rather than allowing it to get harder to use -Wshadow). -- Justin >From 0b05b375a87d89f5d88e87d11956cf2ac15ea00f Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Tue, 16 Aug 2022 19:38:57 -0500 Subject: [PATCH 01/17] avoid shadow vars: pg_dump.c: i_oid backpatch to v15 commit d498e052b4b84ae21b3b68d5b3fda6ead65d1d4d Author: Robert Haas Date: Fri Jul 8 10:15:19 2022 -0400 Preserve relfilenode of pg_largeobject and its index across pg_upgrade. 
--- src/bin/pg_dump/pg_dump.c | 1 - 1 file changed, 1 deletion(-) diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index da6605175a0..322947c5609 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -3144,7 +3144,6 @@ dumpDatabase(Archive *fout) PQExpBuffer loHorizonQry = createPQExpBuffer(); int i_relfrozenxid, i_relfilenode, - i_oid, i_relminmxid; /* -- 2.17.1 >From a76bac21fe428cdd6241bff6827e08d9d71e1bdf Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Tue, 16 Aug 2022 15:55:13 -0500 Subject: [PATCH 02/17] avoid shadow vars: pg_dump.c: tbinfo backpatch to v15 commit 9895961529ef8ff3fc12b39229f9a93e08bca7b7 Author: Tom Lane Date: Mon Dec 6 13:07:31 2021 -0500 Avoid per-object queries in performance-critical paths in pg_dump. --- src/bin/pg_dump/pg_dump.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 322947c5609..5c196d66985 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -7080,21 +7080,21 @@ getConstraints(Archive *fout, TableInfo tblinfo[], int numTables) appendPQExpBufferChar(tbloids, '{'); for (int i = 0; i < numTables; i++) { - TableInfo *tbinfo = &tblinfo[i]; + TableInfo *mytbinfo = &tblinfo[i]; /* * For partitioned tables, foreign keys have no triggers so they must * be included anyway in case some foreign keys are defined. */ - if ((!tbinfo->hastriggers && - tbinfo->relkind != RELKIND_PARTITIONED_TABLE) || - !(tbinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)) + if ((!mytbinfo->hastriggers && + mytbinfo->relkind != RELKIND_PARTITIONED_TABLE) || + !(mytbinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)) continue; /* OK, we need info for this table */ if (tbloids->len > 1) /* do we have more than the '{'? 
*/ appendPQExpBufferChar(tbloids, ','); - appendPQExpBuffer(tbloids, "%u", tbinfo->dobj.catId.oid); + appendPQExpBuffer(tbloids, "%u", mytbinfo->dobj.catId.oid); } appendPQExpBufferChar(tbloids, '}'); -- 2.17.1 >From f6a814fd50800942081250b05f8e6d143b8d8266 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Tue, 16 Aug 2022 16:22:52 -0500 Subject: [PATCH 03/17] avoid shadow vars: pg_dump.c: owning_tab backpatch to v15 commit 344d62fb9a978a72cf8347f0369b9ee643fd0b31 Author: Peter Eisentraut Date: Thu Apr 7 16:13:23 2022 +0200 Unlogged sequences --- src/bin/pg_dump/pg_dump.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 5c196d66985..ecf29f3c52a 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -16799,7 +16799,7 @@ dumpSequence(Archive *fout, const TableInfo *tbinfo) */ if (OidIsValid(tbinfo->owning_tab) && !tbinfo->is_identity_sequence) { - TableInfo *owning_tab = findTableByOid(tbinfo->owning_tab); + owning_tab = findTableByOid(tbinfo->owning_tab); if (owning_tab == NULL) pg_fatal("failed sanity check, parent table with OID %u of sequence with OID %u not found", -- 2.17.1 >From 1a979be65baab871754f86669c5f0327fad6cab5 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Wed, 17 Aug 2022 08:52:03 -0500 Subject: [PATCH 04/17] avoid shadow vars: tablesync.c: first back
Re: POC: GROUP BY optimization
On 8/2/22 13:14, David Rowley wrote:
> On Tue, 2 Aug 2022 at 22:21, Michael Paquier wrote:
>>
>> On Fri, Jul 15, 2022 at 09:46:51PM +0200, Tomas Vondra wrote:
>>> I agree this is a mistake in db0d67db2 that makes the test useless.
>>
>> Tomas, please note that this is an open item assigned to you. Are you planning to do something with these regression tests by beta3?
>
> There's still something to do there for PG15, but 1349d2790 (just gone in to master) made those two tests run as a parallel query again.

Hi, I've been looking at this, and as I speculated before it's due to the sort costing changing a little bit. On PG14, the costing of the plans looks like this:

Gather (cost=1869.39..2823.15 rows=8 width=52)

and with disabled parallelism, it's like this:

Append (cost=998.04..3000.64 rows=10 width=52)

so there's a (fairly small) difference in favor of the parallel plan. But on PG15 it's the other way around - the selected non-parallel one is costed like this:

Append (cost=779.41..2490.61 rows=10 width=52)

and by setting parallel_setup_cost=0 you get this:

Gather (cost=700.34..1531.76 rows=8 width=52)

So with the setup cost included it's ~2531, and it loses to the simple plan. This happens because the patch changed sort costing - the same sort on PG14 and PG15 looks like this:

PG14: -> Sort (cost=998.04..1028.04 rows=12000 width=13)
PG15: -> Sort (cost=779.41..809.41 rows=12000 width=13)

As mentioned, the commit tweaked sort costing - before it was pretty much just

comparison_cost * tuples * LOG2(tuples)

but the patch needs to cost different pathkey orderings, and consider that maybe we don't need to compare all the keys (which the original costing kind of assumes). That's the whole point of this optimization. The costing (compute_cpu_sort_cost) considers a couple of other things, like the cost of the comparator function for the data type, the width of the values, groupings determined by preceding keys, and so on.
It might seem strange that a query with a single pathkey changes, but that single pathkey is costed the same way, of course. In principle we might have a special costing for this case, but I'd bet that would result in pretty surprising inconsistencies when adding a sort key (going from 1 to 2 keys). So I don't think the current costing is wrong, but it certainly is more complex.

But the test does not test what it intended - I have a few ideas how to make it work:

1) increase the number of rows in the table
2) increase cpu_operator_cost (for that one test?)
3) tweak the costing somehow, to increase the cost a bit

Both (1) and (2) work - I've tried doubling the number of rows or setting the operator cost to 0.005, and that does the trick, but maybe a smaller change would be enough. I don't like (3) very much - changing the costing just to get the same test behavior as on older releases does not seem very principled. Yes, maybe it should be tweaked, but not because of a regression test.

regards

-- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

-- Without ORDER BY clause, to test Gather at top-most path
EXPLAIN (COSTS ON) SELECT a, sum(b), array_agg(distinct c), count(*) FROM pagg_tab_ml GROUP BY a HAVING avg(b) < 3;
 QUERY PLAN
---
 Gather  (cost=1869.39..2823.15 rows=8 width=52)
   Workers Planned: 2
   ->  Parallel Append  (cost=869.39..1822.35 rows=4 width=52)
         ->  GroupAggregate  (cost=998.04..1178.25 rows=4 width=52)
               Group Key: pagg_tab_ml.a
               Filter: (avg(pagg_tab_ml.b) < '3'::numeric)
               ->  Sort  (cost=998.04..1028.04 rows=12000 width=13)
                     Sort Key: pagg_tab_ml.a
                     ->  Seq Scan on pagg_tab_ml_p1 pagg_tab_ml  (cost=0.00..185.00 rows=12000 width=13)
         ->  GroupAggregate  (cost=869.39..1019.56 rows=3 width=52)
               Group Key: pagg_tab_ml_5.a
               Filter: (avg(pagg_tab_ml_5.b) < '3'::numeric)
               ->  Sort  (cost=869.39..894.39 rows=1 width=13)
                     Sort Key: pagg_tab_ml_5.a
                     ->  Append  (cost=0.00..205.00 rows=1 width=13)
                           ->  Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_5  (cost=0.00..108.00 rows=7000 width=13)
                           ->  Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_6  (cost=0.00..47.00 rows=3000 width=13)
         ->  GroupAggregate  (cost=682.63..802.77 rows=3 width=52)
               Group Key: pagg_tab_ml_2.a
               Filter: (avg(pagg_tab_ml_2.b) < '3'::numeric)
               ->  Sort  (cost=682.63..702.63 rows=8000 width=13)
                     Sort Key: pagg_tab_ml_2.a
                     ->  Append  (cost=0.00..164.00 rows=8000 width=13)
                           ->  Seq Scan on pagg_t
Re: Setting log_connection in connection string doesn't work
On Wed, Aug 17, 2022 at 10:29:26AM -0400, Tom Lane wrote: > Bruce Momjian writes: > > This patch is from October of 2021. I don't see any commitfest entry > > for it. Should it be applied? > > I think we decided not to. The original argument for having these > be PGC_SU_BACKEND was to try to ensure that you got matching > connection and disconnection log entries for any one session, > and I don't see anything that supersedes that plan. Okay, thanks for the feedback. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
Re: Setting log_connection in connection string doesn't work
Bruce Momjian writes: > This patch is from October of 2021. I don't see any commitfest entry > for it. Should it be applied? I think we decided not to. The original argument for having these be PGC_SU_BACKEND was to try to ensure that you got matching connection and disconnection log entries for any one session, and I don't see anything that supersedes that plan. regards, tom lane
Re: Setting log_connection in connection string doesn't work
This patch is from October of 2021. I don't see any commitfest entry for it. Should it be applied? --- On Wed, Oct 27, 2021 at 11:53:09AM +0900, Kyotaro Horiguchi wrote: > At Wed, 27 Oct 2021 10:55:31 +0900, Michael Paquier > wrote in > > On Wed, Oct 27, 2021 at 10:24:05AM +0900, Kyotaro Horiguchi wrote: > > > I don't know. The fact is that it's a superuser-backend variable that > > > is silently ignored (but acutally seems to be set in the session). > > > Setting log_disconnection the same way works (of course the impliction > > > of this is far less significant that the log_connection case). > > > > fe550b2 is the commit that has changed both those parameters to be > > PGC_SU_BACKEND, with the commit log mentioning the case you are > > describing. That would be the area of this thread: > > https://www.postgresql.org/message-id/20408.1404329...@sss.pgh.pa.us > > Thanks for the pointer. (I didn't remember of that thread..) > > > As Tom and this thread are saying, there may be a use-case for > > making log_connections more effective at startup so as superusers > > could hide their logs at will. However, honestly, I am not sure that > > this is worth spending time improving this as the use-case looks > > rather thin to me. Perhaps you are right and we could just mark both > > I tend to agree. > > > of those GUCs as PGC_SIGHUP, making the whole easier to understand and > > more consistent, though. If we do that, the patch is wrong, as the > > docs would also need a refresh. > > Yeah, this is the full version of the patch. > > regards. > > -- > Kyotaro Horiguchi > NTT Open Source Software Center > >From 11a9612c2590f57f431c3918d5b62c08a5b29efb Mon Sep 17 00:00:00 2001 > From: Kyotaro Horiguchi > Date: Wed, 27 Oct 2021 11:39:02 +0900 > Subject: [PATCH] Change log_(dis)connections to PGC_SIGHUP > > log_connections is not effective when it is given in connection > options. Since no complaint has been heard for this behavior the > use-case looks rather thin. 
Thus we change it to PGC_SIGHUP, rahther > than putting efforts to make it effective for the > use-case. log_disconnections is working with the usage but be > consistent by treating it the same way with log_connection. > --- > doc/src/sgml/config.sgml | 8 > src/backend/utils/misc/guc.c | 4 ++-- > 2 files changed, 6 insertions(+), 6 deletions(-) > > diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml > index de77f14573..64b04a47d2 100644 > --- a/doc/src/sgml/config.sgml > +++ b/doc/src/sgml/config.sgml > @@ -6800,8 +6800,8 @@ local0.*/var/log/postgresql > Causes each attempted connection to the server to be logged, > as well as successful completion of both client authentication (if > necessary) and authorization. > -Only superusers can change this parameter at session start, > -and it cannot be changed at all within a session. > +This parameter can only be set in the > postgresql.conf > +file or on the server command line. > The default is off. > > > @@ -6827,8 +6827,8 @@ local0.*/var/log/postgresql > Causes session terminations to be logged. The log output > provides information similar to log_connections, > plus the duration of the session. > -Only superusers can change this parameter at session start, > -and it cannot be changed at all within a session. > +This parameter can only be set in the > postgresql.conf > +file or on the server command line. > The default is off. 
> > > diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c > index e91d5a3cfd..57d810c80d 100644 > --- a/src/backend/utils/misc/guc.c > +++ b/src/backend/utils/misc/guc.c > @@ -1353,7 +1353,7 @@ static struct config_bool ConfigureNamesBool[] = > NULL, NULL, NULL > }, > { > - {"log_connections", PGC_SU_BACKEND, LOGGING_WHAT, > + {"log_connections", PGC_SIGHUP, LOGGING_WHAT, > gettext_noop("Logs each successful connection."), > NULL > }, > @@ -1362,7 +1362,7 @@ static struct config_bool ConfigureNamesBool[] = > NULL, NULL, NULL > }, > { > - {"log_disconnections", PGC_SU_BACKEND, LOGGING_WHAT, > + {"log_disconnections", PGC_SIGHUP, LOGGING_WHAT, > gettext_noop("Logs end of a session, including > duration."), > NULL > }, > -- > 2.27.0 > -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
Re: tests and meson - test names and file locations
On 12.08.22 18:29, Andres Freund wrote:
>> I don't really understand which problem this solves and how. Sure, the test output is somewhat complex, but I know where it is and I've never found myself wishing it to be somewhere else.
>
> I'd like the buildfarm and CI to a) use parallelism to run tests (that's why the BF is slow) and b) show the logfiles for exactly the failed test ([1]). We can of course iterate through the whole directory tree, somehow identify which log files are for which test, and then select the log files for the failed tests. But that's much easier to do when you have a uniform directory hierarchy, where you can test which tests have failed based on the filesystem alone.

My initial experience with testing under meson is that it's quite fragile and confusing (unlike the building, which is quite robust and understandable). Some of that is the fault of meson, some of that is our implementation. Surely this can be improved over time, but my experience has been that it's not there yet. The idea that we are going to move all the test output files somewhere else at the same time is not appealing to me. The combination of fragile plus can't find the diagnostics is not a good one. Now, this is my experience; others might have different ones.

Also, is there anything in these proposed changes that couldn't also be applied to the old build system? We are going to be running them in parallel for some time. It would be good if one doesn't have to learn two entirely different sets of testing interfaces.
Re: [RFC] building postgres with meson - v11
On 11.08.22 02:20, Andres Freund wrote: Attached is a new version of the meson patchset. Plenty changes: I have various bits of comments on this. - There are various references to "pch" (pre-compiled headers). Is there more discussion anywhere about this? I don't know what this would entail or whether there are any drawbacks to be aware of. The new *_pch.h files don't have any comments. Maybe this should be a separate patch later. - About relativize_shared_library_references: We have had several patches over the years for working around SIP stuff, and some of them did essentially this, but we decided not to go ahead with them. We could revisit that, but it should be a separate patch, not mixed in with this. - postgresql-extension.pc: Similarly, this ought to be a separate patch. If we want people to use this, we'll need it in the makefile build system anyway. - -DFRONTEND is used somewhat differently from the makefiles. For example, meson sets -DFRONTEND for pg_controldata, but the makefiles don't. Conversely, the makefiles set -DFRONTEND for ecpglib, but meson does not. This should be checked again to make sure it all matches up. - Option name spelling should be make consistent about underscores versus hyphens. Built-in meson options use underscores, so we should make the user-defined ones like that as well (some already do). (wal-blocksize krb-srvnam system-tzdata tap-tests bsd-auth) - I have found the variable name "cdata" for configuration_data() to be less than clear. I see some GNOME projects to it that way, is that where it's from? systemd uses "conf", maybe that's better. - In the top-level meson.build, the "renaming" of the Windows system name host_system = host_machine.system() == 'windows' ? 'win32' : host_machine.system() build_system = build_machine.system() == 'windows' ? 'win32' : build_machine.system() seems unnecessary to me. Why not stick with the provided names? - The c99_test ought to be not needed if the c_std project option is used. 
Was there a problem with that? - Is there a way to split up the top-level meson.build somehow? Maybe just throw some stuff into included files? This might get out of hand at some point. - The PG_SYSROOT detection gives different results. On my system, configure produces /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk, meson produces /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk. src/template/darwin goes out of its way to get a version-specific result, so we need to carry that over somehow. (The difference does result in differences in the built binaries.) Then, some patches from me: 0001-Change-shared-library-installation-naming-on-macOS.patch This changes the makefiles to make the shared library file naming on macOS match what meson produces. I don't know what the situation is on other platforms. 0002-meson-Fix-installation-name-of-libpgfeutils.patch This was presumably an accidental mistake. 0003-meson-Libraries-need-to-be-built-with-DSO_MAJOR_VERS.patch This is needed to make NLS work for the libraries. 0004-meson-Add-darwin_versions-argument-for-libraries.patch This is to make the output match what Makefile.shlib produces. 0005-meson-Fix-link-order-of-support-libraries.patch 0006-meson-Make-link-order-of-external-libraries-match-ma.patch 0007-WIP-meson-Make-link-order-of-object-files-match-make.patch I have analyzed the produced binaries between both build systems to make sure they match. If we link the files and libraries in different orders, that becomes difficult. So this fixes this up a bit. 0005 is needed for correctness in general, I think. 0006 is mostly cosmetic. You probably wanted to make the library order alphabetical in the meson files, which I'd support, but then we should change the makefiles to match. Similarly, 0007, which is clearly a bit messy at the moment, but we should try to sort that out either in the old or the new build files. 
And finally some comments on your patches:

meson: prereq: Don't add HAVE_LDAP_H HAVE_WINLDAP_H to pg_config.h.
  This can go ahead.

meson: prereq: fix warning compat_informix/rnull.pgc with msvc
  -	$float	f = 3.71;
  +	$float	f = (float) 3.71;
  This could use float literals like
  +	$float	f = 3.71f;

From 054b24e9ae5fc859f13c05d8150ef1168a477ce4 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Wed, 17 Aug 2022 14:18:52 +0200
Subject: [PATCH 1/7] Change shared library installation naming on macOS

It is not customary to install a shared library with a minor version number (libpq.5.16.dylib) on macOS.  We just need the file with the major version number (libpq.5.dylib) and the one without version number (libpq.dylib).  This also matches the installation layout used by Meson.

---
 src/Makefile.shlib | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/
Re: Propose a new function - list_is_empty
Junwang Zhao writes:
> There are some places that add extra parenthesis like here
>  while (list_length(sortclause) > list_length(previous) &&
> -	   list_length(new_elems) > 0)
> +	   (new_elems != NIL))
> Is it necessary to add that extra parenthesis?

I'd drop the parens in these particular examples because they are inconsistent with the other parts of the same "if" condition.  I concur with Daniel's point that parens can be useful as a visual aid even when they aren't strictly necessary --- but I don't think we should make future readers wonder why one half of the "if" is parenthesized and the other isn't.

			regards, tom lane
Re: Amcheck verification of GiST and GIN
> On 23 Jul 2022, at 14:40, Andrey Borodin wrote:
>
> Done. PFA attached patchset.
>
> Best regards, Andrey Borodin.

Here's v13.  Changes:
1. Fixed passing through downlink in GIN index
2. Fixed GIN tests (one test case was not working)

Thanks to Vitaliy Kukharik for trying these patches.

Best regards, Andrey Borodin.

v13-0001-Refactor-amcheck-to-extract-common-locking-routi.patch
v13-0002-Add-gist_index_parent_check-function-to-verify-G.patch
v13-0003-Add-gin_index_parent_check-to-verify-GIN-index.patch
Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
On Tue, Aug 16, 2022 at 8:38 PM Andres Freund wrote:
> I don't think we can defend against lwlock deadlocks where somebody doesn't
> follow the AM's deadlock avoidance strategy.

That's a good way of putting it.  Tom seems to be postulating that maybe someone can use random tools that exist to take buffer locks and pins in arbitrary order, and if that is true then you can make any AM deadlock.  I think it isn't true, though, and I think if it were true the right fix would be to remove the tools that are letting people do that.

There's also zero evidence that this was ever intended as a deadlock avoidance maneuver.  I think that we are only hypothesizing that it was intended that way because the code looks weird.  But I think the email discussion shows that I thought it was wrong at the time it was committed, and just missed the fact that the final version of the patch hadn't fixed it.  And if it *were* a deadlock avoidance maneuver, it would still be pretty broken, because it would make the startup process error out and the whole system go down.

Regarding the question of whether we need a cleanup lock on the new bucket, I am not really seeing the advantage of going down that path.  Simply fixing this code to take a cleanup lock, instead of hoping that it always gets one by accident, is low risk and should fix the observed problem.  Getting rid of the cleanup lock would be more invasive, and I'd like to see some evidence that it's a necessary step before we take the risk of breaking things.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
Re: Regarding availability of 32bit client drivers for postgresql 13/14
On Wed, Aug 17, 2022 at 03:43:55AM +, Aravind Phaneendra wrote:
> Based on the above requirements and details, we have few questions which
> require your support.
>
> 1. Can we get 32bit client binaries/libraries for postgresql 14 ?
> 2. We also found that the libraries can be built by using the postgresql 14
>    source. Is it possible to build the 32bit client binaries/libraries from
>    the source available ?
> 3. Is there an official support for 32bit client libraries/binaries built out
>    of source for customers ?
> 4. Can the postgresql 10.12.1 client work with Postgresql 14 server ? Do you
>    still support postgresql 10.12.1 client ?

The community produces the source code, and third parties like Debian, Red Hat, EDB, and our own packagers build the binaries you are asking about.  I think you need to contact wherever you are getting your binaries and ask them about 32-bit support.  You can certainly build 32-bit binaries yourself if you wish.

-- 
  Bruce Momjian         https://momjian.us
  EDB                   https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson
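On the "build it yourself" point: one common way to get 32-bit client libraries on a 64-bit Linux host is to force a 32-bit target at configure time.  The following is only a hedged sketch — the install prefix is arbitrary, and it assumes a multilib gcc plus 32-bit builds of every required dependency are already installed (package names vary by distribution, and that part is usually the hard bit):

```shell
# Hypothetical sketch, run from the top of the PostgreSQL 14 source tree.
# --without-readline/--without-zlib just minimize 32-bit dependencies.
./configure CC='gcc -m32' --prefix=/opt/pgsql-14-32bit \
            --without-readline --without-zlib

# Build and install only the client library and the headers.
make -C src/interfaces/libpq install
make -C src/include install
```

Whether binaries built this way are "officially supported" is then a question for whoever supports your deployment, as Bruce says above.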
Re: XLogBeginRead's header comment lies
On Wed, Aug 17, 2022 at 11:31 AM Dilip Kumar wrote:
>
> On Wed, Aug 17, 2022 at 11:18 AM Dilip Kumar wrote:
> >
> > On Tue, Aug 16, 2022 at 11:28 PM Robert Haas wrote:
> >
> > Yeah, I think it makes sense to make it work as per the comment in
> > XLogBeginRecord().  I think if we modify the Assert as per the comment
> > of XLogBeginRecord(), then the remaining code of XLogDecodeNextRecord()
> > is capable enough to take care of skipping the page header if we are
> > pointing at the beginning of the block.
> >
> > See attached patch.
>
> I think that is not sufficient: if there is a record continuing from
> the previous page and we are pointing to the start of the page, then
> this assertion is not sufficient.  I think if the targetRecOff is zero
> then we should additionally read the header and verify that
> XLP_FIRST_IS_CONTRECORD is not set.

Thinking again, there is already code in XLogDecodeNextRecord() to error out if XLP_FIRST_IS_CONTRECORD is set, so probably we don't need to do anything else, and the previous patch with the modified assert should just work fine?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Proposal: CREATE/ALTER DOMAIN ... STORAGE/COMPRESSION = ...
Hi hackers, Do you think it will be useful to specify STORAGE and/or COMPRESSION for domains? As an example, this will allow creating an alias for TEXT with EXTERNAL storage strategy. In other words, to do the same we do with ALTER TABLE, but for types. This feature is arguably not something most people are going to use, but it shouldn't be difficult to implement and/or maintain either. Thoughts? -- Best regards, Aleksander Alekseev
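For illustration, the proposal might look something like this — purely hypothetical syntax, mirroring the existing ALTER TABLE ... SET STORAGE/COMPRESSION forms; none of these domain clauses exist today:

```sql
-- Hypothetical: a TEXT alias stored out of line without compression
CREATE DOMAIN external_text AS text STORAGE EXTERNAL;
ALTER DOMAIN external_text SET COMPRESSION lz4;

-- Columns of the domain would then inherit the strategy, avoiding a
-- per-table  ALTER TABLE docs ALTER COLUMN body SET STORAGE EXTERNAL;
CREATE TABLE docs (body external_text);
```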
Re: Making Vars outer-join aware
On Wed, Aug 17, 2022 at 4:57 AM Tom Lane wrote:
> On further thought, it seems better to maintain the index array
> from the start, allowing complete replacement of the linear list
> searches.  We can add a separate bool flag to denote frozen-ness.

+1 for the 0001 patch.  Now we process plain Vars and PlaceHolderVars in a more consistent way when building the joinrel's tlist.  And this change would make it easier to build up phnullingrels for PHVs as we climb up the join tree.

BTW, the comment just above the two calls to build_joinrel_tlist says:

 * Create a new tlist containing just the vars that need to be output from

Here by 'vars' it means both plain Vars and PlaceHolderVars, right?  If not, we may need to adjust this comment to also include PlaceHolderVars.

The 0002 patch looks good to me.  Glad we can get rid of the create_new_ph flag.  A minor comment is that it seems we can also get rid of the phid inside PlaceHolderInfo, since we do not do linear list searches any more; it duplicates the phid inside PlaceHolderVar.  Currently there are two places referencing PlaceHolderInfo->phid, remove_rel_from_query and find_placeholder_info, and we can use PlaceHolderVar->phid instead in both of those places.

Thanks
Richard
Re: Bug: When user-defined AM is used, the index path cannot be selected correctly
On 2022/8/17 10:03, Tom Lane wrote:
> Quan Zongliang writes:
>> 1. When using extended PGroonga ... 3. Neither ID = 'f' nor id = 't' can
>> use the index correctly. This works fine for btree indexes.
>
> I think the actual problem is that IsBooleanOpfamily only accepts the
> btree and hash opclasses, and that's what needs to be improved.  Your
> proposed patch fails to do that, which makes it just a crude hack that
> solves some aspects of the issue (and probably breaks other things).
>
> It might work to change IsBooleanOpfamily so that it checks to see
> whether BooleanEqualOperator is a member of the opclass.  That's
> basically what we need to know before we dare generate substitute index
> clauses.  It's kind of an expensive test though, and the existing coding
> assumes that IsBooleanOpfamily is cheap ...
>
> 			regards, tom lane

New patch attached.

It seems that partitions do not use AMs other than btree and hash.  Rewrite only indxpath.c and check if it is a custom AM.

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 7d176e7b00..30df5cb2f6 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -23,6 +23,7 @@
 #include "catalog/pg_operator.h"
 #include "catalog/pg_opfamily.h"
 #include "catalog/pg_type.h"
+#include "commands/defrem.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/supportnodes.h"
@@ -2295,7 +2296,7 @@ match_clause_to_indexcol(PlannerInfo *root,
 	/* First check for boolean-index cases. */
 	opfamily = index->opfamily[indexcol];
-	if (IsBooleanOpfamily(opfamily))
+	if (IsBooleanAmOpfamily(index->relam, opfamily))
 	{
 		iclause = match_boolean_index_clause(root, rinfo, indexcol, index);
 		if (iclause)
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index 8dc9ce01bb..5c4cc616d8 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -58,6 +58,12 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_opfamily_oid_index, 2755, OpfamilyOidIndexId, on pg
 #define IsBooleanOpfamily(opfamily) \
 	((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
 
+#define IsBooleanAmOpfamily(amid, opfamily) \
+	((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID || \
+	 ((amid) >= FirstNormalObjectId && \
+	  OidIsValid(GetDefaultOpClass(BOOLOID, (amid)))))
+
 #endif							/* EXPOSE_TO_CLIENT_CODE */
 
 #endif							/* PG_OPFAMILY_H */
Re: Generalize ereport_startup_progress infrastructure
On Wed, Aug 17, 2022 at 2:45 AM Nathan Bossart wrote:
>
> On Wed, Aug 10, 2022 at 11:00:20AM -0400, Robert Haas wrote:
> > Maybe the checkpointer is a better candidate, but somehow I feel that
> > we can't consider this sort of thing separate from the existing
> > progress reporting that checkpointer already does. Perhaps we need to
> > think of changing or improving that in some way rather than adding
> > something wholly new alongside the existing system.
>
> I agree that the checkpointer has a good chance of being a better
> candidate.  Are you thinking of integrating this into log_checkpoints
> somehow?  Perhaps this parameter could optionally accept an interval for
> logging the progress of ongoing checkpoints.

Certainly the checkpointer is an immediate candidate.  For instance, I can think of using ereport_progress() in CheckPointSnapBuild() for snapshot files processing, CheckPointLogicalRewriteHeap() for mapping files processing, BufferSync() for checkpointing dirty buffers (?), ProcessSyncRequests() for processing fsync() requests, RemoveOldXlogFiles(), RemoveNonParentXlogFiles() (?).  I personally have seen cases where some of these checkpoint operations take a lot of time in production environments, and better observability would help a lot.

However, I'm not sure if turning log_checkpoints into an integer type to use for checkpoint progress reporting is a good idea here.  As I explained upthread [1], I'd vote for a single GUC at the entire server level.  If users/customers request per-process or per-operation progress report GUCs, we can then consider it.  Thoughts?

[1] https://www.postgresql.org/message-id/CALj2ACUJA73nCK_Li7v4_OOkRqwQBX14Fx2ALb7GDRwUTNGK-Q%40mail.gmail.com

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
Re: Propose a new function - list_is_empty
> On 17 Aug 2022, at 10:13, Junwang Zhao wrote:
>
> There are some places that add extra parenthesis like here
>
> ...
>
> Is it necessary to add that extra parenthesis?

It's not necessary unless needed for operator associativity, but also I don't object to grouping with parenthesis as a visual aid for the person reading the code.  Going over the patch in more detail I might change my mind on some, but I don't object to the practice in general.

-- 
Daniel Gustafsson		https://vmware.com/
Re: Propose a new function - list_is_empty
There are some places that add extra parenthesis like here

--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3097,7 +3097,7 @@ reorder_grouping_sets(List *groupingsets, List *sortclause)
 		GroupingSetData *gs = makeNode(GroupingSetData);
 
 		while (list_length(sortclause) > list_length(previous) &&
-			   list_length(new_elems) > 0)
+			   (new_elems != NIL))
 		{

and here,

--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3408,7 +3408,7 @@ estimate_num_groups_incremental(PlannerInfo *root, List *groupExprs,
 	 * for normal cases with GROUP BY or DISTINCT, but it is possible for
 	 * corner cases with set operations.)
 	 */
-	if (groupExprs == NIL || (pgset && list_length(*pgset) < 1))
+	if (groupExprs == NIL || (pgset && (*pgset == NIL)))
 		return 1.0;

Is it necessary to add that extra parenthesis?

On Wed, Aug 17, 2022 at 3:33 PM Daniel Gustafsson wrote:
>
> > On 17 Aug 2022, at 03:09, Peter Smith wrote:
> >
> > On Wed, Aug 17, 2022 at 6:34 AM Daniel Gustafsson wrote:
> >>
> >>> On 16 Aug 2022, at 16:03, Tom Lane wrote:
> >>
> >>> I agree though that while simplifying list_length() calls, I'd lean to using
> >>> explicit comparisons to NIL.
> >>
> >> Agreed, I prefer that too.
> >
> > Thanks for the feedback.
> >
> > PSA patch v3 which now uses explicit comparisons to NIL everywhere,
> > and also addresses the other review comments.
>
> From reading, this version of the patch looks good to me.
>
> --
> Daniel Gustafsson		https://vmware.com/

-- 
Regards
Junwang Zhao
Re: Generalize ereport_startup_progress infrastructure
On Wed, Aug 10, 2022 at 6:21 PM Nitin Jadhav wrote:
>
> Given two options, option-1 is to use a single GUC across all kinds of
> long running operations and option-2 is to use multiple GUCs (one for
> each kind of long running operation), I go with option-1 because if a
> user is interested to see a log message after every 10s for startup
> operations (or any other long running operations) then it is likely
> that he is interested to see other long running operations after every
> 10s only.  It does not make sense to use different intervals for each
> kind of long running operation here.  It also increases the number of
> GUCs which makes things complex.  So it is a good idea to use a single
> GUC here.

+1.

> But I am worried about the on/off switch as Robert mentioned.

Are you worried that users might want to switch off the progress report messages at the process level — for instance, they want to log the startup process's long running operations but not, say, the checkpointer's or postmaster's?  IMO, a long running operation, if it is happening in any of the processes, is a concern for the users, and having progress report log messages for them would help users debug any issues and improve observability of the server as a whole.

With a single GUC, the server log might contain progress reports of all the long running operations (wherever we use this ereport_progress()) in the entire server's lifecycle, which isn't bad IMO.  I'd still vote for a single GUC log_progress_report_interval without worrying much about process-level enable/disable capability.  However, let's hear what other hackers think about this.

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
Re: Propose a new function - list_is_empty
> On 17 Aug 2022, at 03:09, Peter Smith wrote:
>
> On Wed, Aug 17, 2022 at 6:34 AM Daniel Gustafsson wrote:
>>
>>> On 16 Aug 2022, at 16:03, Tom Lane wrote:
>>
>>> I agree though that while simplifying list_length() calls, I'd lean to using
>>> explicit comparisons to NIL.
>>
>> Agreed, I prefer that too.
>
> Thanks for the feedback.
>
> PSA patch v3 which now uses explicit comparisons to NIL everywhere,
> and also addresses the other review comments.

From reading, this version of the patch looks good to me.

-- 
Daniel Gustafsson		https://vmware.com/
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
On Wed, Aug 17, 2022 at 2:02 AM Nathan Bossart wrote:
>
> On Wed, Aug 10, 2022 at 03:28:25PM +0530, Bharath Rupireddy wrote:
> > 		snprintf(path, sizeof(path), "pg_logical/mappings/%s",
> > 				 mapping_de->d_name);
> > -		if (lstat(path, &statbuf) == 0 && !S_ISREG(statbuf.st_mode))
> > +		if (get_dirent_type(path, mapping_de, false, LOG) != PGFILETYPE_REG)
> > 			continue;
>
> Previously, failure to lstat() wouldn't lead to skipping the entry.  With
> this patch, a failure to determine the file type will cause the entry to be
> skipped.  This might be okay in some places (e.g., CheckPointSnapBuild())
> but not in others.  For example, in CheckPointLogicalRewriteHeap(), this
> could cause us to skip fsync-ing a file due to a get_dirent_type() failure,
> which seems bad.

Hm.  I corrected it in the v16 patch, please review.

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/

v16-0001-Make-more-use-of-get_dirent_type-in-place-of-sta.patch
Re: Handle infinite recursion in logical replication setup
On Wed, Aug 17, 2022 at 4:33 PM Amit Kapila wrote:
>
> On Wed, Aug 17, 2022 at 8:48 AM houzj.f...@fujitsu.com wrote:
> >
> > On Tuesday, August 2, 2022 8:00 PM Amit Kapila wrote:
> > > On Tue, Jul 26, 2022 at 9:07 AM Amit Kapila wrote:
> >
> > Thanks for the summary.
> >
> > I think it's fine to make the user use the copy_data option more carefully
> > to prevent duplicate copies by reporting an ERROR.
> >
> > But I also have a similar concern to Sawada-san's, as it's possible for the
> > user to receive an ERROR in some unexpected cases.
> >
> > For example, I want to build a bi-directional setup between two nodes:
> >
> > Node A: TABLE test (has actual data)
> > Node B: TABLE test (empty)
> >
> > Step 1:
> > CREATE PUBLICATION on both Node A and B.
> >
> > Step 2:
> > CREATE SUBSCRIPTION on Node A with (copy_data = on)
> > -- this is fine as there is no data on Node B
> >
> > Step 3:
> > CREATE SUBSCRIPTION on Node B with (copy_data = on)
> > -- this should be fine as user needs to copy data from Node A to Node B,
> > -- but we still report an error for this case.
> >
> > It looks a bit strict to report an ERROR in this case, and it seems not easy
> > to avoid this.  So, personally, I think it might be better to document the
> > correct steps to build bi-directional replication, and probably also document
> > the steps to recover if the user accidentally did a duplicate initial copy,
> > if not documented yet.
> >
> > In addition, we could also LOG some additional information about the ORIGIN
> > and initial copy which might help users to analyze if needed.
>
> But why LOG instead of WARNING?  I feel in this case there is a chance
> of inconsistent data, so a WARNING like "publication "pub1" could have
> data from multiple origins" can be given when the user has specified
> the options "copy_data = on, origin = NONE" while creating a
> subscription.  We give a WARNING during subscription creation when the
> corresponding publication doesn't exist, e.g.
> postgres=# create subscription sub1 connection 'dbname = postgres'
> publication pub1;
> WARNING:  publication "pub1" does not exist in the publisher
>
> Then, we can explain in docs how users can avoid data inconsistencies
> while setting up replication.

I was wondering if this copy/origin case really should be a NOTICE.  See table [1].  It says WARNING is meant for "warnings of likely problems".  But this is not exactly a "likely" problem --- IIUC we really don't know if there is even any problem at all; we only know there is the *potential* for a problem, and the user has to judge it for themselves.  Perhaps WARNING may be a bit of overkill for this situation --- it might be unnecessarily scary to give false warnings.  OTOH, NOTICE [1] says it is for "information that might be helpful to users", which seems more like what is needed here.

[1] https://www.postgresql.org/docs/current/runtime-config-logging.html#RUNTIME-CONFIG-SEVERITY-LEVELS

Kind Regards,
Peter Smith.
Fujitsu Australia.
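For reference, the three-step bi-directional setup quoted upthread, spelled out with hypothetical object names and connection strings (`copy_data = on` together with `origin = none` at step 3 is the combination that currently draws the ERROR/WARNING/NOTICE under discussion):

```sql
-- Step 1: a publication on each node
-- on Node A:
CREATE PUBLICATION pub_a FOR TABLE test;
-- on Node B:
CREATE PUBLICATION pub_b FOR TABLE test;

-- Step 2, on Node A (Node B is empty, so the initial copy is trivially safe):
CREATE SUBSCRIPTION sub_a CONNECTION 'host=nodeb dbname=postgres'
    PUBLICATION pub_b WITH (copy_data = on, origin = none);

-- Step 3, on Node B (needs Node A's data; this is the contested case):
CREATE SUBSCRIPTION sub_b CONNECTION 'host=nodea dbname=postgres'
    PUBLICATION pub_a WITH (copy_data = on, origin = none);
```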
Re: Remove dummyret definition
> On 17 Aug 2022, at 07:26, Peter Eisentraut wrote:
>
> dummyret hasn't been used in a while (last use removed by 50d22de932, and
> before that 84b6d5f359), and since we are now preferring inline functions
> over complex macros, it's unlikely to be needed again.

+1, I can't see that making a comeback into the code.

-- 
Daniel Gustafsson		https://vmware.com/