Re: [PoC] Improve dead tuple storage for lazy vacuum
On Thu, Dec 1, 2022 at 4:00 PM John Naylor wrote: > > > On Wed, Nov 30, 2022 at 11:09 PM Masahiko Sawada > wrote: > > > > I've investigated this issue and have a question about using atomic > > variables on palloc'ed memory. In non-parallel vacuum cases, > > radix_tree_control is allocated via aset.c. IIUC in 32-bit machines, > > the memory allocated by aset.c is 4-bytes aligned so these atomic > > variables are not always 8-bytes aligned. Is there any way to enforce > > 8-bytes aligned memory allocations in 32-bit machines? > > The bigger question in my mind is: Why is there an atomic variable in > backend-local memory? Because I use the same radix_tree and radix_tree_control structs for non-parallel and parallel vacuum. Therefore, radix_tree_control is allocated in DSM for parallel-vacuum cases or in backend-local memory for non-parallel vacuum cases. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
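For illustration only, one way to force 8-byte alignment on top of a 4-byte-aligned allocator is to over-allocate and round the pointer up. This is a sketch using TYPEALIGN from c.h, not something the patch does:

    /*
     * Hypothetical: pad the allocation, then align the pointer, so any
     * pg_atomic_uint64 members land on an 8-byte boundary even when the
     * allocator only guarantees 4-byte alignment.  The raw pointer would
     * have to be kept around if the memory is ever pfree'd explicitly.
     */
    char       *raw = palloc0(sizeof(radix_tree_control) + 8);
    radix_tree_control *ctl = (radix_tree_control *) TYPEALIGN(8, raw);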
Re: Bug in row_number() optimization
On Mon, Nov 28, 2022 at 5:59 PM Sergey Shinderuk wrote:
> Not quite sure that we don't need to do anything for the
> WINDOWAGG_PASSTHROUGH_STRICT case. Although, we won't return any more
> tuples for the current partition, we still call ExecProject with
> dangling pointers. Is it okay?

AFAIU once we go into WINDOWAGG_PASSTHROUGH_STRICT we will spool all the remaining tuples in the current partition without storing them, then move to the next partition if available and become WINDOWAGG_RUN again, or become WINDOWAGG_DONE if there are no further partitions. It seems we would not have a chance to see the dangling pointers.

> + if (!func_strict(opexpr->opfuncid))
> + return false;
>
> Should return true instead?

Yeah, you're right. That must be a thinko.

Thanks
Richard
Re: pgsql: Revoke PUBLIC CREATE from public schema, now owned by pg_databas
On Wed, Nov 30, 2022 at 05:35:01PM -0500, Tom Lane wrote: > Also, I'd like to structure things so that the first para covers what > you need to know in a clean v15+ installation, and details that only > apply in upgrade scenarios are in the second para. The upgrade scenario > is going to be interesting to fewer and fewer people over time, so let's > not clutter the lede with it. > > So maybe about like this? > > Constrain ordinary users to user-private schemas. To implement > this pattern, for every user needing to create non-temporary > objects, create a schema with the same name as that user. (Recall > that the default search path starts with $user, which resolves to > the user name. Therefore, if each user has a separate schema, they > access their own schemas by default.) Also ensure that no other > schemas have public CREATE privileges. This pattern is a secure > schema usage pattern unless an untrusted user is the database > owner or holds the CREATEROLE privilege, in which case no secure > schema usage pattern exists. This is free from the problem found in ddl-create-public-reorg-really.patch. However, the word "other" doesn't belong there. (The per-user schemas should not have public CREATE privilege.) I would also move that same sentence up front, like this: Constrain ordinary users to user-private schemas. To implement this pattern, first ensure that no schemas have public CREATE privileges. Then, for every user needing to create non-temporary objects, create a schema with the same name as that user. (Recall that the default search path starts with $user, which resolves to the user name. Therefore, if each user has a separate schema, they access their own schemas by default.) This pattern is a secure schema usage pattern unless an untrusted user is the database owner or holds the CREATEROLE privilege, in which case no secure schema usage pattern exists. With that, I think you have improved on the status quo. Thanks. > In PostgreSQL 15 and later, the default configuration supports > this usage pattern. In prior versions, or when using a database > that has been upgraded from a prior version, you will need to > remove the public CREATE privilege from the public schema (issue > REVOKE CREATE ON SCHEMA public FROM PUBLIC). Then consider > auditing the public schema for objects named like objects in > schema pg_catalog. > BTW, is "create a schema with the same name" sufficient detail? > You have to either make it owned by that user, or explicitly > grant CREATE permission on it. I'm not sure if that detail > belongs here, but it feels like maybe it does. Maybe. Failing to GRANT that will yield a clear error when the user starts work, so it's not critical to explain here.
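Spelled out as commands, the pattern described above amounts to something like this (a sketch with illustrative names; making each user own their schema avoids the need for a separate GRANT):

    -- Ensure no schema is publicly writable, then give each user their own.
    REVOKE CREATE ON SCHEMA public FROM PUBLIC;

    CREATE ROLE alice LOGIN;
    CREATE SCHEMA alice AUTHORIZATION alice;  -- alice owns it, so she can CREATE in it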
File API cleanup
Here are a couple of patches that clean up the internal File API and related things a bit: 0001-Update-types-in-File-API.patch Make the argument types of the File API match stdio better: - Change the data buffer to void *, from char *. - Change FileWrite() data buffer to const on top of that. - Change amounts to size_t, from int. In passing, change the FilePrefetch() amount argument from int to off_t, to match the underlying posix_fadvise(). 0002-Remove-unnecessary-casts.patch Some code carefully cast all data buffer arguments for BufFileWrite() and BufFileRead() to void *, even though the arguments are already void * (and AFAICT were never anything else). Remove this unnecessary clutter. (I had initially thought these casts were related to the first patch, but as I said the BufFile API never used char * arguments, so this turned out to be unrelated, but still weird.)From 0ca35b75e383272356ba49211913d8aead5ca10d Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Thu, 1 Dec 2022 08:36:09 +0100 Subject: [PATCH 1/2] Update types in File API Make the argument types of the File API match stdio better: - Change the data buffer to void *, from char *. - Change FileWrite() data buffer to const on top of that. - Change amounts to size_t, from int. In passing, change the FilePrefetch() amount argument from int to off_t, to match the underlying posix_fadvise(). --- src/backend/storage/file/fd.c | 8 src/include/storage/fd.h | 6 +++--- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c index 4151cafec547..f6c938202309 100644 --- a/src/backend/storage/file/fd.c +++ b/src/backend/storage/file/fd.c @@ -1980,7 +1980,7 @@ FileClose(File file) * to read into. */ int -FilePrefetch(File file, off_t offset, int amount, uint32 wait_event_info) +FilePrefetch(File file, off_t offset, off_t amount, uint32 wait_event_info) { #if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_WILLNEED) int returnCode; @@ -2031,7 +2031,7 @@ FileWriteback(File file, off_t offset, off_t nbytes, uint32 wait_event_info) } int -FileRead(File file, char *buffer, int amount, off_t offset, +FileRead(File file, void *buffer, size_t amount, off_t offset, uint32 wait_event_info) { int returnCode; @@ -2039,7 +2039,7 @@ FileRead(File file, char *buffer, int amount, off_t offset, Assert(FileIsValid(file)); - DO_DB(elog(LOG, "FileRead: %d (%s) " INT64_FORMAT " %d %p", + DO_DB(elog(LOG, "FileRead: %d (%s) " INT64_FORMAT " %zu %p", file, VfdCache[file].fileName, (int64) offset, amount, buffer)); @@ -2087,7 +2087,7 @@ FileRead(File file, char *buffer, int amount, off_t offset, } int -FileWrite(File file, char *buffer, int amount, off_t offset, +FileWrite(File file, const void *buffer, size_t amount, off_t offset, uint32 wait_event_info) { int returnCode; diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h index c0a212487d92..7144fc9f6050 100644 --- a/src/include/storage/fd.h +++ b/src/include/storage/fd.h @@ -102,9 +102,9 @@ extern File PathNameOpenFile(const char *fileName, int fileFlags); extern File PathNameOpenFilePerm(const char *fileName, int fileFlags, mode_t fileMode); extern File OpenTemporaryFile(bool interXact); extern void FileClose(File file); -extern int FilePrefetch(File file, off_t offset, int amount, uint32 wait_event_info); -extern int FileRead(File file, char *buffer, int amount, off_t offset, uint32 wait_event_info); -extern int FileWrite(File file, char *buffer, int amount, off_t offset, uint32 wait_event_info); +extern int FilePrefetch(File 
file, off_t offset, off_t amount, uint32 wait_event_info); +extern int FileRead(File file, void *buffer, size_t amount, off_t offset, uint32 wait_event_info); +extern int FileWrite(File file, const void *buffer, size_t amount, off_t offset, uint32 wait_event_info); extern int FileSync(File file, uint32 wait_event_info); extern off_t FileSize(File file); extern int FileTruncate(File file, off_t offset, uint32 wait_event_info); base-commit: 43351557d0d2b9c5e20298b5fee2849abef86aff -- 2.38.1 From 8d0530fbf9095bc16fa3465b69aef2c31c4c3c9b Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Thu, 1 Dec 2022 08:36:09 +0100 Subject: [PATCH 2/2] Remove unnecessary casts Some code carefully cast all data buffer arguments for BufFileWrite() and BufFileRead() to void *, even though the arguments are already void * (and AFAICT were never anything else). Remove this unnecessary clutter. --- src/backend/executor/nodeHashjoin.c | 8 src/backend/utils/sort/tuplestore.c | 12 ++-- 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/src/backend/e
Re: Perform streaming logical transactions by background workers and parallel apply
On Thu, Dec 1, 2022 at 11:44 AM Masahiko Sawada wrote: > > On Wed, Nov 30, 2022 at 7:54 PM Amit Kapila wrote: > > > > On Tue, Nov 29, 2022 at 10:18 AM houzj.f...@fujitsu.com > > wrote: > > > > > > Attach the new version patch which addressed all comments. > > > > > > > Some comments on v53-0002* > > > > 1. I think testing the scenario where the shm_mq buffer is full > > between the leader and parallel apply worker would require a large > > amount of data and then also there is no guarantee. How about having a > > developer GUC [1] force_apply_serialize which allows us to serialize > > the changes and only after commit the parallel apply worker would be > > allowed to apply it? > > +1 > > The code coverage report shows that we don't cover the partial > serialization codes. This GUC would improve the code coverage. > Shall we keep it as a boolean or an integer? Keeping it as an integer as suggested by Kuroda-San [1] would have an added advantage that we can easily test the cases where serialization would be triggered after sending some changes. [1] - https://www.postgresql.org/message-id/TYAPR01MB5866160DE81FA2D88B8F22DEF5159%40TYAPR01MB5866.jpnprd01.prod.outlook.com -- With Regards, Amit Kapila.
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Thu, Dec 1, 2022 at 3:03 PM Masahiko Sawada wrote: > > On Thu, Dec 1, 2022 at 4:00 PM John Naylor wrote: > > > > The bigger question in my mind is: Why is there an atomic variable in backend-local memory? > > Because I use the same radix_tree and radix_tree_control structs for > non-parallel and parallel vacuum. Therefore, radix_tree_control is > allocated in DSM for parallel-vacuum cases or in backend-local memory > for non-parallel vacuum cases. Ok, that could be yet another reason to compile local- and shared-memory functionality separately, but now I'm wondering why there are atomic variables at all, since there isn't yet any locking support. -- John Naylor EDB: http://www.enterprisedb.com
Re: File API cleanup
On Thu, Dec 1, 2022 at 1:55 PM Peter Eisentraut wrote: > > Here are a couple of patches that clean up the internal File API and > related things a bit: > > 0001-Update-types-in-File-API.patch > > Make the argument types of the File API match stdio better: > > - Change the data buffer to void *, from char *. > - Change FileWrite() data buffer to const on top of that. > - Change amounts to size_t, from int. > > In passing, change the FilePrefetch() amount argument from int to > off_t, to match the underlying posix_fadvise(). > > 0002-Remove-unnecessary-casts.patch > > Some code carefully cast all data buffer arguments for > BufFileWrite() and BufFileRead() to void *, even though the > arguments are already void * (and AFAICT were never anything else). > Remove this unnecessary clutter. > > (I had initially thought these casts were related to the first patch, > but as I said the BufFile API never used char * arguments, so this > turned out to be unrelated, but still weird.) Thanks. Please note that I've not looked at the patches attached. However, I'm here after reading the $subject - can we have a generic, single function file_exists() in fd.c/file_utils.c so that both backend and frontend code can use it? I see there are 3 uses and definitions of it in jit.c, dfmgr.c and pg_regress.c. This will reduce the code duplication. Thoughts? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
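To make the idea concrete, here is a minimal sketch of such a shared function, loosely modeled on the existing copies (not actual committed code; assumes the usual PostgreSQL headers for bool):

    #include <sys/stat.h>

    /*
     * Sketch only: report whether "name" exists and is not a directory.
     * The existing copies differ mainly in how they report unexpected
     * errno values (ereport in the backend vs. exiting in pg_regress),
     * which is what a shared version would need to reconcile.
     */
    bool
    file_exists(const char *name)
    {
        struct stat st;

        if (stat(name, &st) == 0)
            return !S_ISDIR(st.st_mode);

        return false;           /* ENOENT, ENOTDIR, etc. */
    }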
Re: Allow round() function to accept float and double precision
On Thu, 1 Dec 2022 at 02:58, David Rowley wrote: > > On Thu, 1 Dec 2022 at 15:41, David G. Johnston > wrote: > > I don't get the point of adding a function here (or at least one called > > round) - the type itself is inexact so, as you say, it is actually more of > > a type conversion with an ability to specify precision, which is exactly > > what you get today when you write 1.48373::numeric(20,3) - though it is a > > bit annoying having to specify an arbitrary precision. > > An additional problem with that which you might have missed is that > you'd need to know what to specify in the precision part of the > typemod. You might start getting errors one day if you don't select a > value large enough. That problem does not exist with round(). Having > to specify 131072 each time does not sound like a great solution, it's > not exactly a very memorable number. > I don't really see the point of such a function either. Casting to numeric(1000, n) will work fine in all cases AFAICS (1000 being the maximum allowed precision in a numeric typemod, and somewhat more memorable). Note that double precision numbers range in magnitude from something like 2.2e-308 to 1.8e308, so you won't ever get an error (except, I suppose, if you also chose "n" larger than 692 or so, but that would be silly, given the input). > > At present round does allow you to specify a negative position to round at > > positions to the left of the decimal point (this is undocumented though...) > > which the actual cast cannot do, but that seems like a marginal case. Note that, as of PG15, "n" can be negative in such typemods, if you want to round before the decimal point. The fact that passing a negative scale to round() isn't documented does seem like an oversight though... Regards, Dean
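For concreteness, the two spellings discussed above behave like this (a quick sketch; expected results in comments):

    -- Casting with the maximum typemod precision, as suggested above:
    SELECT 1.48373::float8::numeric(1000,3);   -- 1.484

    -- round() with a negative scale rounds to the left of the decimal point:
    SELECT round(1234.5678::numeric, -2);      -- 1200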
Re: [DOCS] Stats views and functions not in order?
On 29.11.22 08:29, Peter Smith wrote:

PSA v8* patches.

Here, patches 0001 and 0002 are unchanged, but 0003 has many changes per David's suggestion [1] to change all these views to blocks.

I don't understand what order 0001 is trying to achieve. I know we didn't necessarily want to go fully alphabetic, but if we're going to spend time on this, let's come up with a system that the next contributor who adds a view will be able to understand and follow.

As an aside, I find the mixing of pg_stat_* and pg_statio_* views visually distracting. It was easier to read before when they were in separate blocks. I think something like this would be manageable:

pg_stat_archiver
pg_stat_bgwriter
pg_stat_database
pg_stat_database_conflicts
pg_stat_replication_slots
pg_stat_slru
pg_stat_subscription_stats
pg_stat_wal

pg_stat_all_tables
pg_stat_sys_tables
pg_stat_user_tables
pg_stat_xact_all_tables
pg_stat_xact_sys_tables
pg_stat_xact_user_tables

pg_stat_all_indexes
pg_stat_sys_indexes
pg_stat_user_indexes

pg_stat_user_functions
pg_stat_xact_user_functions

pg_statio_all_tables
pg_statio_sys_tables
pg_statio_user_tables

pg_statio_all_indexes
pg_statio_sys_indexes
pg_statio_user_indexes

pg_statio_all_sequences
pg_statio_sys_sequences
pg_statio_user_sequences

In any case, the remaining patches are new and need further review, so I'll move this to the next CF.
Re: pg_upgrade allows itself to be run twice
On 01.11.22 14:07, Justin Pryzby wrote:

On Tue, Nov 01, 2022 at 01:54:35PM +0100, Peter Eisentraut wrote:

On 07.07.22 08:22, Justin Pryzby wrote:

This one comes from NextOID in the control data file after a fresh initdb, and GetNewObjectId() would enforce that in a postmaster environment to be FirstNormalObjectId when assigning the first user OID. Would you imply an extra step at the end of initdb to update the control data file of the new cluster to reflect FirstNormalObjectId?

I added a call to reset xlog, similar to what's in pg_upgrade. Unfortunately, I don't see an easy way to silence it.

I think it would be better to update the control file directly instead of going through pg_resetwal. (See src/include/common/controldata_utils.h for the required functions.) However, I don't know whether we need to add special provisions that guard against people using postgres --single in complicated ways. Many consider the single-user mode deprecated outside of initdb use.

Thanks for looking. I think the above is a "returned with feedback" at this point.

One other thing I noticed (by accident!) is that pg_upgrade doesn't prevent itself from trying to upgrade a cluster on top of itself:

| $ /usr/pgsql-15/bin/initdb -D pg15.dat -N
| $ /usr/pgsql-15/bin/pg_upgrade -D pg15.dat -d pg15.dat -b /usr/pgsql-15/bin
| Performing Upgrade
| --
| Analyzing all rows in the new cluster    ok
| Freezing all rows in the new cluster     ok
| Deleting files from new pg_xact          ok
                                           ^^^
| Copying old pg_xact to new server        *failure*
|
| Consult the last few lines of "pg15.dat/pg_upgrade_output.d/20221101T055916.486/log/pg_upgrade_utility.log" for
| command: cp -Rf "pg15.dat/pg_xact" "pg15.dat/pg_xact" >> "pg15.dat/pg_upgrade_output.d/20221101T055916.486/log/pg_upgrade_utility.log" 2>&1
| cp: cannot stat 'pg15.dat/pg_xact': No such file or directory

This may be of little concern since it's upgrading a version to itself, which only applies to developers.

I think this would be worth addressing nonetheless, for robustness. For comparison, "cp" and "mv" will error if you give source and destination that are the same file.
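The guard being suggested could look roughly like this (a sketch, not from any posted patch; a real patch would also have to cope with paths that differ textually but resolve to the same directory):

    /*
     * Hypothetical check: refuse to run if the old and new data
     * directories are the same file, the way cp/mv detect "same file".
     * (st_ino is not reliable on Windows, so a real patch would need a
     * different test there.)
     */
    struct stat st_old,
                st_new;

    if (stat(old_cluster.pgdata, &st_old) == 0 &&
        stat(new_cluster.pgdata, &st_new) == 0 &&
        st_old.st_dev == st_new.st_dev &&
        st_old.st_ino == st_new.st_ino)
        pg_fatal("old and new data directories must not be the same");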
Re: Improve performance of pg_strtointNN functions
On Thu, 1 Dec 2022 at 05:38, David Rowley wrote: > > On Thu, 1 Dec 2022 at 18:27, John Naylor wrote: > > I don't see why the non-decimal literal patch needs to be "immediately" > > faster? If doing this first leads to less code churn, that's another > > consideration, but you haven't made that argument. > > My view is that Peter wants to keep the code he's adding for the hex, > octal and binary parsing as similar to the existing code as possible. > I very much understand Peter's point of view on that. Consistency is > good. However, if we commit the hex literals patch first, people might > ask "why don't we use bit-wise operators to make the power-of-2 bases > faster?", which seems like a very legitimate question. I asked it, > anyway... On the other hand, if Peter adds the bit-wise operators > then the problem of code inconsistency remains. > > As an alternative to those 2 options, I'm proposing we commit this > first then the above dilemma disappears completely. > > If this was going to cause huge conflicts with Peter's patch then I > might think differently. I feel like it's a fairly trivial task to > rebase. > > If the consensus is that we should fix this afterwards, then I'm happy to > delay. > I feel like it should be done afterwards, so that any performance gains can be measured for all bases. Otherwise, we won't really know, or have any record of, how much faster this was for other bases, or be able to go back and test that. Regards, Dean
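For reference, the bit-wise variant being talked about for the power-of-2 bases looks roughly like this (a sketch; overflow checks and the lookup table are elided):

    /*
     * Sketch of hex parsing with shifts: for base 16, multiplying the
     * accumulator by the base is just a 4-bit left shift.  "hexlookup"
     * is assumed to map '0'-'9'/'a'-'f'/'A'-'F' to 0..15 and everything
     * else to -1.
     */
    uint64      acc = 0;
    const char *ptr = str;      /* str: the text after the "0x" prefix */

    while (*ptr)
    {
        int8        digit = hexlookup[(unsigned char) *ptr];

        if (digit < 0)
            break;              /* not a hex digit */
        acc = (acc << 4) | digit;
        ptr++;
    }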
initdb: Refactor PG_CMD_PUTS loops
Keeping the SQL commands that initdb runs in string arrays before feeding them to PG_CMD_PUTS() seems unnecessarily verbose and inflexible. In some cases, the array only has one member. In other cases, one might want to use PG_CMD_PRINTF() instead, to parametrize a command, but that would require breaking up the loop or using workarounds like replace_token(). This patch unwinds all that; it's much simpler that way.From e177a142a9a5412ff8aeb271330005ef518b32d1 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Thu, 1 Dec 2022 09:49:48 +0100 Subject: [PATCH] initdb: Refactor PG_CMD_PUTS loops Keeping the SQL commands that initdb runs in string arrays before feeding them to PG_CMD_PUTS() seems unnecessarily verbose and inflexible. In some cases, the array only has one member. In other cases, one might want to use PG_CMD_PRINTF() instead, to parametrize a command, but that would require breaking up the loop or using workarounds like replace_token(). Unwind all that; it's much simpler that way. --- src/bin/initdb/initdb.c | 379 ++-- 1 file changed, 170 insertions(+), 209 deletions(-) diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index f61a04305590..7c391aaf0b13 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -1350,18 +1350,11 @@ bootstrap_template1(void) static void setup_auth(FILE *cmdfd) { - const char *const *line; - static const char *const pg_authid_setup[] = { - /* -* The authid table shouldn't be readable except through views, to -* ensure passwords are not publicly visible. -*/ - "REVOKE ALL ON pg_authid FROM public;\n\n", - NULL - }; - - for (line = pg_authid_setup; *line != NULL; line++) - PG_CMD_PUTS(*line); + /* +* The authid table shouldn't be readable except through views, to +* ensure passwords are not publicly visible. +*/ + PG_CMD_PUTS("REVOKE ALL ON pg_authid FROM public;\n\n"); if (superuser_password) PG_CMD_PRINTF("ALTER USER \"%s\" WITH PASSWORD E'%s';\n\n", @@ -1433,18 +1426,11 @@ get_su_pwd(void) static void setup_depend(FILE *cmdfd) { - const char *const *line; - static const char *const pg_depend_setup[] = { - /* -* Advance the OID counter so that subsequently-created objects aren't -* pinned. -*/ - "SELECT pg_stop_making_pinned_objects();\n\n", - NULL - }; - - for (line = pg_depend_setup; *line != NULL; line++) - PG_CMD_PUTS(*line); + /* +* Advance the OID counter so that subsequently-created objects aren't +* pinned. 
+*/ + PG_CMD_PUTS("SELECT pg_stop_making_pinned_objects();\n\n"); } /* @@ -1530,147 +1516,138 @@ setup_collation(FILE *cmdfd) static void setup_privileges(FILE *cmdfd) { - char **line; - char **priv_lines; - static char *privileges_setup[] = { - "UPDATE pg_class " - " SET relacl = (SELECT array_agg(a.acl) FROM " - " (SELECT E'=r/\"$POSTGRES_SUPERUSERNAME\"' as acl " - " UNION SELECT unnest(pg_catalog.acldefault(" - "CASE WHEN relkind = " CppAsString2(RELKIND_SEQUENCE) " THEN 's' " - " ELSE 'r' END::\"char\"," CppAsString2(BOOTSTRAP_SUPERUSERID) "::oid))" - " ) as a) " - " WHERE relkind IN (" CppAsString2(RELKIND_RELATION) ", " - CppAsString2(RELKIND_VIEW) ", " CppAsString2(RELKIND_MATVIEW) ", " - CppAsString2(RELKIND_SEQUENCE) ")" - " AND relacl IS NULL;\n\n", - "GRANT USAGE ON SCHEMA pg_catalog, public TO PUBLIC;\n\n", - "REVOKE ALL ON pg_largeobject FROM PUBLIC;\n\n", - "INSERT INTO pg_init_privs " - " (objoid, classoid, objsubid, initprivs, privtype)" - "SELECT" - "oid," - "(SELECT oid FROM pg_class WHERE relname = 'pg_class')," - "0," - "relacl," - "'i'" - "FROM" - "pg_class" - "WHERE" - "relacl IS NOT NULL" - "AND relkind IN (" CppAsString2(RELKIND_RELATION) ", " - CppAsString2(RELKIND_VIEW) ", " CppAsString2(RELKIND_MATVIEW) ", " - CppAsString2(RELKIND_SEQUENCE) ");\n\n", - "INSERT INTO pg_init_privs " - " (objoid, classoid, objsubid, initprivs, privtype)" - "SELECT" - "pg_class.oid," - "(SELECT oid FROM pg_class WHERE relname = 'pg_class')," - "pg_attribute.attnum," - "pg_at
RE: Perform streaming logical transactions by background workers and parallel apply
On Thursday, December 1, 2022 3:58 PM Masahiko Sawada wrote: > > On Wed, Nov 30, 2022 at 10:51 PM houzj.f...@fujitsu.com > wrote: > > > > On Wednesday, November 30, 2022 9:41 PM houzj.f...@fujitsu.com > wrote: > > > > > > On Tuesday, November 29, 2022 8:34 PM Amit Kapila > > > > Review comments on v53-0001* > > > > > > Attach the new version patch set. > > > > Sorry, there were some mistakes in the previous patch set. > > Here is the correct V54 patch set. I also ran pgindent for the patch set. > > > > Thank you for updating the patches. Here are random review comments for > 0001 and 0002 patches. Thanks for the comments! > > ereport(ERROR, > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > errmsg("logical replication parallel apply worker exited > abnormally"), > errcontext("%s", edata.context))); and > > ereport(ERROR, > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > errmsg("logical replication parallel apply worker exited > because of subscription information change"))); > > I'm not sure ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE is appropriate > here. Given that parallel apply worker has already reported the error message > with the error code, I think we don't need to set the errorcode for the logs > from the leader process. > > Also, I'm not sure the term "exited abnormally" is appropriate since we use it > when the server crashes for example. I think ERRORs reported here don't mean > that in general. How about reporting "xxx worker exited due to error" ? > --- > if (am_parallel_apply_worker() && on_subinfo_change) { > /* > * If a parallel apply worker exits due to the subscription > * information change, we notify the leader apply worker so that the > * leader can report more meaningful message in time and restart the > * logical replication. > */ > pq_putmessage('X', NULL, 0); > } > > and > > ereport(ERROR, > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > errmsg("logical replication parallel apply worker exited > because of subscription information change"))); > > Do we really need an additional message in case of 'X'? When we call > apply_worker_clean_exit with on_subinfo_change = true, we have reported the > error message such as: > > ereport(LOG, > (errmsg("logical replication parallel apply worker for subscription > \"%s\" will stop because of a parameter change", > MySubscription->name))); > > I think that reporting a similar message from the leader might not be > meaningful for users. The intention is to let leader report more meaningful message if a worker exited due to subinfo change. Otherwise, the leader is likely to report an error like " lost connection ... to parallel apply worker" when trying to send data via shared memory if the worker exited. What do you think ? > --- > -if (options->proto.logical.streaming && > -PQserverVersion(conn->streamConn) >= 14) > -appendStringInfoString(&cmd, ", streaming 'on'"); > +if (options->proto.logical.streaming_str) > +appendStringInfo(&cmd, ", streaming '%s'", > + > options->proto.logical.streaming_str); > > and > > +/* > + * Assign the appropriate option value for streaming option > according to > + * the 'streaming' mode and the publisher's ability to > support that mode. 
> + */ > +if (server_version >= 16 && > +MySubscription->stream == SUBSTREAM_PARALLEL) > +{ > +options.proto.logical.streaming_str = pstrdup("parallel"); > +MyLogicalRepWorker->parallel_apply = true; > +} > +else if (server_version >= 14 && > + MySubscription->stream != SUBSTREAM_OFF) > +{ > +options.proto.logical.streaming_str = pstrdup("on"); > +MyLogicalRepWorker->parallel_apply = false; > +} > +else > +{ > +options.proto.logical.streaming_str = NULL; > +MyLogicalRepWorker->parallel_apply = false; > +} > > This change moves the code of adjustment of the streaming option based on > the publisher server version from libpqwalreceiver.c to worker.c. > On the other hand, the similar logic for other parameters such as "two_phase" > and "origin" are still done in libpqwalreceiver.c. How about passing > MySubscription->stream via WalRcvStreamOptions and constructing a > streaming option string in libpqrcv_startstreaming()? > In ApplyWorkerMain(), we just need to set > MyLogicalRepWorker->parallel_apply = true if (server_version >= 16 > && MySubscription->stream == SUBSTREAM_PARALLEL). We won't need > pstrdup for "parallel" and "on", and it's more consistent with other > parameters. Thanks for the suggestion. I thought
Re: ExecRTCheckPerms() and many prunable partitions
Hello, This didn't apply, so I rebased it on current master, excluding the one I already pushed. No further changes. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "No me acuerdo, pero no es cierto. No es cierto, y si fuera cierto, no me acuerdo." (Augusto Pinochet a una corte de justicia) >From c8a9bcd071ad88d5ae286bcb1a241b4fccd2449a Mon Sep 17 00:00:00 2001 From: amitlan Date: Mon, 28 Nov 2022 16:12:15 +0900 Subject: [PATCH v30 1/2] Add ri_RootToChildMap and ExecGetRootToChildMap() It's a AttrMap provided for converting "root" table column bitmapsets into their child relation counterpart, as a more generalized alternative to using ri_RootToPartitionMap.attrMap to do the same. More generalized in the sense that it can also be requested for regular inheritance child relations, whereas ri_RootToPartitionMap is currently only initialized in tuple-routing "partition" result relations. One of the differences between the two cases is that the regular inheritance child relations can have their own columns that are not present in the "root" table, so the map must be created in a way that ignores such columns. To that end, ExecGetRootToChildMap() passes true for the missing_ok argument of build_attrmap_by_name(), so that it puts 0 (InvalidAttr) in the map for the columns of a child table that are not present in the root table. root-table-to-child-table bitmapset conversions that would need ri_RootToChildMap (cannot be done with ri_RootToPartitionMap) are as of this commit unnecessary, but will become necessary in a subsequent commit that will remove the insertedCols et al bitmapset fields from RangeTblEntry node in favor of a new type of node that will only be created and added to the plan for root tables in a query and never for children. The child table bitmapsets will be created on-the-fly during execution if needed, by copying the root table bitmapset and converting with the aforementioned map as required. --- src/backend/executor/execUtils.c | 39 src/include/executor/executor.h | 2 ++ src/include/nodes/execnodes.h| 8 +++ 3 files changed, 49 insertions(+) diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c index 0e595ffa6e..e2b4272d90 100644 --- a/src/backend/executor/execUtils.c +++ b/src/backend/executor/execUtils.c @@ -1252,6 +1252,45 @@ ExecGetChildToRootMap(ResultRelInfo *resultRelInfo) return resultRelInfo->ri_ChildToRootMap; } +/* + * Return the map needed to convert "root" table column bitmapsets to the + * rowtype of an individual child table. A NULL result is valid and means + * that no conversion is needed. + */ +AttrMap * +ExecGetRootToChildMap(ResultRelInfo *resultRelInfo, + EState *estate) +{ + /* If we didn't already do so, compute the map for this child. */ + if (!resultRelInfo->ri_RootToChildMapValid) + { + ResultRelInfo *rootRelInfo = resultRelInfo->ri_RootResultRelInfo; + MemoryContext oldcontext = MemoryContextSwitchTo(estate->es_query_cxt); + + if (rootRelInfo) + { + /* + * Passing 'true' below means any columns present in the child + * table but not in the root parent are to be ignored; note that + * such a case is possible with traditional inheritance but never + * with partitioning. 
+ */ + resultRelInfo->ri_RootToChildMap = +build_attrmap_by_name_if_req(RelationGetDescr(rootRelInfo->ri_RelationDesc), + RelationGetDescr(resultRelInfo->ri_RelationDesc), + true); + } + else /* this isn't a child result rel */ + resultRelInfo->ri_RootToChildMap = NULL; + + resultRelInfo->ri_RootToChildMapValid = true; + + MemoryContextSwitchTo(oldcontext); + } + + return resultRelInfo->ri_RootToChildMap; +} + /* Return a bitmap representing columns being inserted */ Bitmapset * ExecGetInsertedCols(ResultRelInfo *relinfo, EState *estate) diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h index ed95ed1176..5c02a1521f 100644 --- a/src/include/executor/executor.h +++ b/src/include/executor/executor.h @@ -600,6 +600,8 @@ extern TupleTableSlot *ExecGetTriggerOldSlot(EState *estate, ResultRelInfo *relI extern TupleTableSlot *ExecGetTriggerNewSlot(EState *estate, ResultRelInfo *relInfo); extern TupleTableSlot *ExecGetReturningSlot(EState *estate, ResultRelInfo *relInfo); extern TupleConversionMap *ExecGetChildToRootMap(ResultRelInfo *resultRelInfo); +extern AttrMap *ExecGetRootToChildMap(ResultRelInfo *resultRelInfo, + EState *estate); extern Bitmapset *ExecGetInsertedCols(ResultRelInfo *relinfo, EState *estate); extern Bitmapset *ExecGetUpdatedCols(ResultRelInfo *relinfo, EState *estate); diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h index 18e572f171..313840fe32 100644 --- a/src/include/nodes/execnodes.h +++ b/src/include/nodes/execnodes.h @@ -563,6 +563,14 @@ typedef struct ResultRelInfo TupleConversionMap *ri_ChildToRootMap; boo
Re: pgsql: Revoke PUBLIC CREATE from public schema, now owned by pg_databas
On 2022-Dec-01, Noah Misch wrote: > This is free from the problem found in ddl-create-public-reorg-really.patch. > However, the word "other" doesn't belong there. (The per-user schemas should > not have public CREATE privilege.) I would also move that same sentence up > front, like this: > > Constrain ordinary users to user-private schemas. To implement this > pattern, first ensure that no schemas have public CREATE privileges. > Then, for every user needing to create non-temporary objects, create a > schema with the same name as that user. (Recall that the default search > path starts with $user, which resolves to the user name. Therefore, if > each user has a separate schema, they access their own schemas by > default.) This pattern is a secure schema usage pattern unless an > untrusted user is the database owner or holds the CREATEROLE privilege, in > which case no secure schema usage pattern exists. +1 LGTM -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: generic plans and "initial" pruning
On 2022-Dec-01, Amit Langote wrote:

> Hmm, how about keeping the [Merge]Append's parent relation's RT index
> in the PartitionPruneInfo and passing it down to
> ExecInitPartitionPruning() from ExecInit[Merge]Append() for
> cross-checking? Both Append and MergeAppend already have a
> 'apprelids' field that we can save a copy of in the
> PartitionPruneInfo. Tried that in the attached delta patch.

Ah yeah, that sounds about what I was thinking. I've merged that in and pushed to github, which had a strange pg_upgrade failure on Windows mentioning log files that were not captured by the CI tooling. So I pushed another one trying to grab those files, in case it wasn't a one-off failure. It's running now:
https://cirrus-ci.com/task/5857239638999040

If all goes well with this run, I'll get this 0001 pushed.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Investigación es lo que hago cuando no sé lo que estoy haciendo"
(Wernher von Braun)
Re: Bug in row_number() optimization
On 01.12.2022 11:18, Richard Guo wrote:
> On Mon, Nov 28, 2022 at 5:59 PM Sergey Shinderuk
> <s.shinde...@postgrespro.ru> wrote:
>> Not quite sure that we don't need to do anything for the
>> WINDOWAGG_PASSTHROUGH_STRICT case. Although, we won't return any more
>> tuples for the current partition, we still call ExecProject with
>> dangling pointers. Is it okay?
>
> AFAIU once we go into WINDOWAGG_PASSTHROUGH_STRICT we will spool all
> the remaining tuples in the current partition without storing them and
> then move to the next partition if available and become WINDOWAGG_RUN
> again or become WINDOWAGG_DONE if there are no further partitions. It
> seems we would not have a chance to see the dangling pointers.

Maybe I'm missing something, but the previous call to spool_tuples() might have read extra tuples (if the tuplestore spilled to disk), and after switching to WINDOWAGG_PASSTHROUGH_STRICT mode we nevertheless would loop through these extra tuples and call ExecProject, if only to increment winstate->currentpos.

--
Sergey Shinderuk
https://postgrespro.com/
Re: Perform streaming logical transactions by background workers and parallel apply
On Wed, Nov 30, 2022 at 4:23 PM Amit Kapila wrote: > > 2. > + /* > + * The stream lock is released when processing changes in a > + * streaming block, so the leader needs to acquire the lock here > + * before entering PARTIAL_SERIALIZE mode to ensure that the > + * parallel apply worker will wait for the leader to release the > + * stream lock. > + */ > + if (in_streamed_transaction && > + action != LOGICAL_REP_MSG_STREAM_STOP) > + { > + pa_lock_stream(winfo->shared->xid, AccessExclusiveLock); > > This comment is not completely correct because we can even acquire the > lock for the very streaming chunk. This check will work but doesn't > appear future-proof or at least not very easy to understand though I > don't have a better suggestion at this stage. Can we think of a better > check here? > One idea is that we acquire this lock every time and callers like stream_commit are responsible to release it. Also, we can handle the close of stream file in the respective callers. I think that will make this part of the patch easier to follow. Some other comments: = 1. The handling of buffile inside pa_stream_abort() looks bit ugly to me. I think you primarily required it because the buffile opened by parallel apply worker is in CurrentResourceOwner. Can we think of having a new resource owner to apply spooled messages? I think that will avoid the need to have a special purpose code to handle buffiles in parallel apply worker. 2. @@ -564,6 +571,7 @@ handle_streamed_transaction(LogicalRepMsgType action, StringInfo s) TransactionId current_xid; ParallelApplyWorkerInfo *winfo; TransApplyAction apply_action; + StringInfoData original_msg; apply_action = get_transaction_apply_action(stream_xid, &winfo); @@ -573,6 +581,8 @@ handle_streamed_transaction(LogicalRepMsgType action, StringInfo s) Assert(TransactionIdIsValid(stream_xid)); + original_msg = *s; + /* * We should have received XID of the subxact as the first part of the * message, so extract it. @@ -596,10 +606,14 @@ handle_streamed_transaction(LogicalRepMsgType action, StringInfo s) stream_write_change(action, s); return true; + case TRANS_LEADER_PARTIAL_SERIALIZE: case TRANS_LEADER_SEND_TO_PARALLEL: Assert(winfo); - pa_send_data(winfo, s->len, s->data); + if (apply_action == TRANS_LEADER_SEND_TO_PARALLEL) + pa_send_data(winfo, s->len, s->data); + else + stream_write_change(action, &original_msg); Please add the comment to specify the reason to remember the original string. 3. @@ -1797,8 +1907,8 @@ apply_spooled_messages(TransactionId xid, XLogRecPtr lsn) changes_filename(path, MyLogicalRepWorker->subid, xid); elog(DEBUG1, "replaying changes from file \"%s\"", path); - fd = BufFileOpenFileSet(MyLogicalRepWorker->stream_fileset, path, O_RDONLY, - false); + stream_fd = BufFileOpenFileSet(stream_fileset, path, O_RDONLY, false); + stream_xid = xid; Why do we need stream_xid here? I think we can avoid having global stream_fd if the comment #1 is feasible. 4. + * TRANS_LEADER_APPLY: + * The action means that we /The/This. Please make a similar change for other actions. 5. Apart from the above, please find a few changes to the comments for 0001 and 0002 patches in the attached patches. -- With Regards, Amit Kapila. changes_amit_v54_0001.patch Description: Binary data changes_amit_v54_0002.patch Description: Binary data
Re: generic plans and "initial" pruning
On Thu, Dec 1, 2022 at 8:21 PM Alvaro Herrera wrote: > On 2022-Dec-01, Amit Langote wrote: > > Hmm, how about keeping the [Merge]Append's parent relation's RT index > > in the PartitionPruneInfo and passing it down to > > ExecInitPartitionPruning() from ExecInit[Merge]Append() for > > cross-checking? Both Append and MergeAppend already have a > > 'apprelids' field that we can save a copy of in the > > PartitionPruneInfo. Tried that in the attached delta patch. > > Ah yeah, that sounds about what I was thinking. I've merged that in and > pushed to github, which had a strange pg_upgrade failure on Windows > mentioning log files that were not captured by the CI tooling. So I > pushed another one trying to grab those files, in case it wasn't an > one-off failure. It's running now: > https://cirrus-ci.com/task/5857239638999040 > > If all goes well with this run, I'll get this 0001 pushed. Thanks for pushing 0001. Rebased 0002 attached. -- Thanks, Amit Langote EDB: http://www.enterprisedb.com v25-0001-Optimize-AcquireExecutorLocks-by-locking-only-un.patch Description: Binary data
Optimizing Node Files Support
Hi,

I believe there is room for improvement in the generated node files. The attached patch reduces the size of the generated files by 27 kbytes, from 891 kbytes to 864 kbytes.

About the patch:

1. Avoid a useless assignment when from->field is NULL, since the new node is palloc0'd.
2. Avoid declaring a Size variable when it is unnecessary.
3. Optimize comparison functions like memcmp and strcmp with a short-cut comparison of the first element (see the sketch below).
4. Replace several per-field copy assignments like COPY_SCALAR_FIELD or COPY_LOCATION_FIELD with one memcpy call.
5. Avoid a useless assignment by ignoring the result of pg_strtok when it is unnecessary.

regards,
Ranier Vilela

optimize_gen_nodes_support.patch Description: Binary data
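To illustrate point 3, the short-cut looks roughly like this (hypothetical field names, not the patch's actual code):

    /*
     * Sketch of the short-cut in point 3: if the first elements differ
     * we can skip the memcmp entirely; otherwise memcmp decides.
     */
    if (a->words[0] != b->words[0] ||
        memcmp(a->words, b->words, nwords * sizeof(a->words[0])) != 0)
        return false;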
Re: Support logical replication of DDLs
I applied patch 0005. I think this modification is a bit overdone. This design skips all subcommands, which results in many DDL replication failures. For example:

```
CREATE TABLE datatype_table (id SERIAL);
```

The deparsed DDL is:

CREATE TABLE public.datatype_table (id pg_catalog.int4 STORAGE plain NOT NULL DEFAULT pg_catalog.nextval('public.datatype_table_id_seq'::pg_catalog.regclass))

The CREATE SEQUENCE subcommand will be skipped. Or:

```
CREATE SCHEMA element_test
  CREATE TABLE foo (id int)
  CREATE VIEW bar AS SELECT * FROM foo;
```

The deparsed DDL is CREATE SCHEMA element_test; its subcommands will be skipped. There may be other cases.

The initial CREATE TABLE ... LIKE statement is special: it derives ALTER TABLE subcommands for the columns, so just skipping those may be enough, instead of skipping the subcommands of all statements. After all, our design is to obtain the actual DDL information from the catalog instead of parsing the raw parse tree. This is why we cannot skip all subcommands.

Do you have any better ideas?

Regards,
Adger.
Re: Asynchronous execution support for Custom Scan
Thank you for your comment. I've removed the tabs.

> I can think of at least a few use cases where this customscan is helpful and
> not merely testing code.

IIUC, we can already use ctid in the WHERE clause on the latest PostgreSQL, can't we?

On Tue, Nov 22, 2022 at 18:07, Ronan Dunklau wrote:
>
> On Tuesday, September 6, 2022 at 11:29:55 CET, Etsuro Fujita wrote:
> > On Mon, Sep 5, 2022 at 10:32 PM Kazutaka Onishi wrote:
> > > I'm sorry for my error on your name...
> >
> > No problem.
> >
> > > > IIUC, it uses the proposed
> > > > APIs, but actually executes ctidscans *synchronously*, so it does not
> > > > improve performance. Right?
> > >
> > > Exactly.
> > > The actual CustomScan that supports asynchronous execution will
> > > start processing in CustomScanAsyncRequest,
> > > configure to detect completion via file descriptor in
> > > CustomScanAsyncConfigureWait,
> > > and receive the result in CustomScanAsyncNotify.
> >
> > Ok, thanks!
>
> Thanks for this patch, seems like a useful addition to the CustomScan API.
> Just to nitpick: there are extraneous tabs in createplan.c on a blank line.
>
> Sorry for the digression, but I know your ctidscan module had been proposed
> for inclusion in contrib a long time ago, and I wonder if the rationale for
> not including it could have changed. We still don't have tests which cover
> CustomScan, and I can think of at least a few use cases where this customscan
> is helpful and not merely testing code.
>
> One of those use cases is when performing migrations on a table, and one wants
> to update the whole table by filling a new column with a computed value. You
> obviously don't want to do it in a single transaction, so you end up batching
> updates using an index looking for null values. If you want to do this, it's
> much faster to update rows in a range of blocks, performing a first series of
> batches updating all such block ranges, and then finally update the ones we
> missed transactionally (inserted in a block we already processed while in the
> middle of the batch, or in new blocks resulting from a relation extension).
>
> Best regards,
>
> --
> Ronan Dunklau

v3-0001-allow-custon-scan-asynchronous-execution.patch Description: Binary data
Re: Collation version tracking for macOS
Jeff Davis writes:
> On Mon, 2022-11-28 at 19:36 -0800, Jeff Davis wrote:
>> On Mon, 2022-11-28 at 21:57 -0500, Robert Haas wrote:
>> > That is ... astonishingly bad.
>>
>> https://unicode-org.atlassian.net/browse/CLDR-16175
>
> Oops, reported in CLDR instead of ICU. Moved to:
>
> https://unicode-org.atlassian.net/browse/ICU-22215

Out of morbid curiosity I went source diving, and the culprit is this bit (which will also break if a version component ever goes above 999):

    /* write the decimal field value */
    field=versionArray[part];
    if(field>=100) {
        *versionString++=(char)('0'+field/100);
        field%=100;
    }
    if(field>=10) {
        *versionString++=(char)('0'+field/10);
        field%=10;
    }
    *versionString++=(char)('0'+field);

(https://sources.debian.org/src/icu/72.1-3/source/common/putil.cpp#L2308)

because apparently snprintf() is too hard?

- ilmari
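Something like the following would presumably have produced the same output without the digit ceiling (assuming the caller's buffer is sized for it):

    /* the equivalent of the hand-rolled digit emission, minus the breakage */
    versionString += sprintf(versionString, "%d", field);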
Re: Asynchronous execution support for Custom Scan
> IIUC, we already can use ctid in the where clause on the latest > PostgreSQL, can't we? Oh, sorry, I missed the TidRangeScan. My apologies for the noise. Best regards, -- Ronan Dunklau
Warning When Creating FOR EACH STATEMENT Trigger On Logical Replication Subscriber Side
Hi Hackers,

There is no error or warning when creating a FOR EACH STATEMENT trigger on the logical replication subscriber side, but the trigger does not do anything. Wouldn't a warning be helpful?

CREATE TRIGGER set_updated_time_trig
AFTER INSERT OR UPDATE OR DELETE ON test
FOR EACH STATEMENT EXECUTE FUNCTION set_updated_time();

Thanks
Re: pgsql: Revoke PUBLIC CREATE from public schema, now owned by pg_databas
Alvaro Herrera writes: > On 2022-Dec-01, Noah Misch wrote: >> This is free from the problem found in ddl-create-public-reorg-really.patch. >> However, the word "other" doesn't belong there. (The per-user schemas should >> not have public CREATE privilege.) I would also move that same sentence up >> front, like this: >> >> Constrain ordinary users to user-private schemas. To implement this >> pattern, first ensure that no schemas have public CREATE privileges. >> Then, for every user needing to create non-temporary objects, create a >> schema with the same name as that user. (Recall that the default search >> path starts with $user, which resolves to the user name. Therefore, if >> each user has a separate schema, they access their own schemas by >> default.) This pattern is a secure schema usage pattern unless an >> untrusted user is the database owner or holds the CREATEROLE privilege, in >> which case no secure schema usage pattern exists. > +1 LGTM Sounds good. I'll make it so in a bit. regards, tom lane
Re: [DOCS] Stats views and functions not in order?
On Thu, Dec 1, 2022 at 2:20 AM Peter Eisentraut <peter.eisentr...@enterprisedb.com> wrote:

> On 29.11.22 08:29, Peter Smith wrote:
> > PSA v8* patches.
> >
> > Here, patches 0001 and 0002 are unchanged, but 0003 has many changes
> > per David's suggestion [1] to change all these views to
> > blocks.
>
> I don't understand what order 0001 is trying to achieve.

The rule behind 0001 is:

All global object stats
All table object stats (stat > statio > xact; all > sys > user)
All index object stats
All sequence object stats
All function object stats

> As an aside, I find the mixing of pg_stat_* and pg_statio_* views
> visually distracting. It was easier to read before when they were in
> separate blocks.

I found that having the statio views at the end of each object-type block added a natural partitioning for tables and indexes that the existing order lacked, the absence of which made reading the table more "wall-of-text-ish", and thus more difficult to read, than necessary.

I'm not opposed to the following, though. The object-type-driven order just feels more useful, but I really cannot justify it beyond that. I'm not particularly enamored with the existing single large table but don't have a better structure to offer at this time.

> I think something like this would be manageable:
>
> pg_stat_archiver
> pg_stat_bgwriter
> pg_stat_database
> pg_stat_database_conflicts
> pg_stat_replication_slots
> pg_stat_slru
> pg_stat_subscription_stats
> pg_stat_wal

WAL being adjacent to archiver/bgwriter seemed reasonable, so I left that alone. Replication and Subscription being adjacent seemed reasonable, so I left that alone. Thus slru ended up last, with database* remaining as-is. At 8 items, with a group size average of 2, pure alphabetical is also reasonable.

David J.
Re: Allow round() function to accept float and double precision
Dean Rasheed writes:
> I don't really see the point of such a function either.
> Casting to numeric(1000, n) will work fine in all cases AFAICS (1000
> being the maximum allowed precision in a numeric typemod, and somewhat
> more memorable).

Right, but I think what the OP wants is to not have to think about whether the input is of exact or inexact type. That's easily soluble locally by making your own function:

create function round(float8, int) returns numeric as
$$select pg_catalog.round($1::pg_catalog.numeric, $2)$$
language sql strict immutable parallel safe;

but I'm not sure that the argument for it is strong enough to justify putting it into Postgres.

> The fact that passing a negative scale to round() isn't documented
> does seem like an oversight though...

Agreed, will do something about that.

regards, tom lane
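For illustration, with that wrapper installed both spellings then work the same way (a sketch; expected output in comments):

    -- assumes the round(float8, int) wrapper above has been created
    SELECT round(1.48373::float8, 2);   -- 1.48
    SELECT round(1.48373::numeric, 2);  -- 1.48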
Re: Add LZ4 compression in pg_dump
--- Original Message ---
On Thursday, December 1st, 2022 at 3:05 AM, Michael Paquier wrote:
>
> On Wed, Nov 30, 2022 at 05:11:44PM +, gkokola...@pm.me wrote:
> > Fair enough. The attached v11 does that. 0001 introduces compression
> > specification and is using it throughout. 0002 paves the way to the
> > new interface by homogenizing the use of cfp. 0003 introduces the new
> > API and stores the compression algorithm in the custom format header
> > instead of the compression level integer. Finally 0004 adds support for
> > LZ4.
>
> I have been looking at 0001, and.. Hmm. I am really wondering
> whether it would not be better to just nuke this warning into orbit.
> This stuff enforces non-compression even if -Z has been used to a
> non-default value. This has been moved to its current location by
> cae2bb1 as of this thread:
> https://www.postgresql.org/message-id/20160526.185551.242041780.horiguchi.kyotaro%40lab.ntt.co.jp
>
> However, this is only active if -Z is used when not building with
> zlib. At the end, it comes down to whether we want to prioritize the
> portability of pg_dump commands specifying a -Z/--compress across
> environments knowing that these may or may not be built with zlib,
> vs the amount of simplification/uniformity we would get across the
> binaries in the tree once we switch everything to use the compression
> specifications. Now that pg_basebackup and pg_receivewal are managed
> by compression specifications, and that we'd want more compression
> options for pg_dump, I would tend to do the latter and from now on
> complain if attempting to do a pg_dump -Z under --without-zlib with a
> compression level > 0. zlib is also widely available, and we don't
> document the fact that non-compression is enforced in this case,
> either. (Two TAP tests with the custom format had to be tweaked.)

Fair enough. Thank you for looking. However, I have a small comment on your new patch.

-	/* Custom and directory formats are compressed by default, others not */
-	if (compressLevel == -1)
-	{
-#ifdef HAVE_LIBZ
-		if (archiveFormat == archCustom || archiveFormat == archDirectory)
-			compressLevel = Z_DEFAULT_COMPRESSION;
-		else
-#endif
-			compressLevel = 0;
-	}

Nuking the warning from orbit and changing the behaviour around disabling the requested compression when the libraries are not present should not mean that we need to change the behaviour of default values for different formats. Please find v13 attached, which reinstates it.

Which in itself got me looking and wondering why the tests succeeded. The only existing test covering that path is `defaults_dir_format` in `002_pg_dump.pl`. However, as the test is currently written, it does not check whether the output was compressed; the restore command would succeed in either case. A simple `gzip -t -r` against the directory will not suffice to test it, because there exist files which are never compressed in this format (.toc). A little more involved test case would need to be written, yet before I embark on this journey, I would like to know if you would agree to reinstating the defaults for those formats.

> As per the patch, it is true that we do not need to bump the format of
> the dump archives, as we can still store only the compression level
> and guess the method from it. I have added some notes about that in
> ReadHead and WriteHead to not forget.

Agreed. A minor suggestion if you may.
 #ifndef HAVE_LIBZ
-	if (AH->compression != 0)
+	if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
 		pg_log_warning("archive is compressed, but this installation does not support compression -- no data will be available");
 #endif

It would seem more consistent to error out in this case. We do error in all other cases where the compression is not available.

>
> Most of the changes are really-straight forward, and it has resisted
> my tests, so I think that this is in a rather-commitable shape as-is.

Thank you.

Cheers,
//Georgios

> --
> Michael

From 16e10b38cc8eb6eb5b1ffc15365d7e6ce23eef0a Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos
Date: Thu, 1 Dec 2022 08:58:51 +
Subject: [PATCH v13] Teach pg_dump about compress_spec and use it throughout.

Align pg_dump with the rest of the binaries which use common compression. It is teaching pg_dump.c about the common compression definitions and interfaces. Then it propagates those throughout the code.
---
 doc/src/sgml/ref/pg_dump.sgml | 34 +--
 src/bin/pg_dump/compress_io.c | 107
 src/bin/pg_dump/compress_io.h | 20 ++--
 src/bin/pg_dump/pg_backup.h | 7 +-
 src/bin/pg_dump/pg_backup_archiver.c| 78 +-
 src/bin/pg_dump/pg_backup_archiver.h| 10 +-
 src/bin/pg_dump/pg_backup_custom.c | 6 +-
 src/bin/pg_dump/pg_back
Re: Documentation for building with meson
On 23.11.22 22:24, samay sharma wrote: Thank you. Attaching v7 addressing most of the points below. I have committed this, after some editing and making some structural changes. I moved the "Requirements" section back to the top level. It did not look appealing to have to maintain two copies of this that have almost no substantial difference (but for some reason were written with separate structure and wording). Also, I rearranged the Building with Meson section to use the same internal structure as the Building with Autoconf and Make section. This will make it easier to maintain going forward. For example if someone adds a new option, it will be easier to find the corresponding places in the lists where to add them. We will likely keep iterating on the contents for the next little while, but I'm glad we now have a structure in place that we should be able to live with.
Re: File API cleanup
On 01.12.22 09:55, Bharath Rupireddy wrote: can we have a generic, single function file_exists() in fd.c/file_utils.c so that both backend and frontend code can use it? I see there are 3 uses and definitions of it in jit.c, dfmgr.c and pg_regress.c. This will reduce the code duplication. Thoughts? Well, the first problem with that would be that all three of those implementations are slightly different. Maybe that is intentional, or maybe not, in which case a common implementation might be beneficial. (Another thing to consider is that checking whether a file exists is not often actually useful. If you want to use the file, you should just open it and then check for any errors. The cases above have special requirements, so there obviously are uses, but I'm not sure how many in the long run.)
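In code terms, the "just open it" advice amounts to something like this generic sketch (not taken from any of those files; assumes <fcntl.h> and <errno.h>):

    /*
     * Sketch: instead of probing for existence separately (which is racy),
     * try the open and let errno distinguish absence from real errors.
     */
    static int
    open_if_exists(const char *name)
    {
        int         fd = open(name, O_RDONLY, 0);

        /*
         * fd >= 0: the file exists and is now open for use.
         * fd < 0 with errno == ENOENT: the file does not exist.
         * fd < 0 otherwise: a real error the caller should report.
         */
        return fd;
    }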
pg_upgrade: Make testing different transfer modes easier
I wanted to test the different pg_upgrade transfer modes (--link, --clone), but that was not that easy, because there is more than one place in the test script you have to find and manually change. So I wrote a little patch to make that easier. It's still manual, but it's a start. (In principle, we could automatically run the tests with each supported mode in a loop, but that would be very slow.) While doing that, I also found it strange that the default transfer mode (referred to as "copy" internally) did not have any external representation, so it is awkward to refer to it in text, and obscure to see where it is used for example in those test scripts. So I added an option --copy, which effectively does nothing, but it's not uncommon to have options that select default behaviors explicitly. (I also thought about something like a "mode" option with an argument, but given that we already have --link and --clone, this seemed the most sensible.) Thoughts?From e01feabdfc0e2ea01242afd93261885c035e7942 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Thu, 1 Dec 2022 15:34:00 +0100 Subject: [PATCH 1/2] pg_upgrade: Add --copy option This option selects the default transfer mode. Having an explicit option is handy to make scripts and tests more explicit. It also makes it easier to talk about a "copy" mode rather than "the default mode" or something like that, since until now the default mode didn't have an externally visible name. --- doc/src/sgml/ref/pgupgrade.sgml | 10 ++ src/bin/pg_upgrade/option.c | 6 ++ 2 files changed, 16 insertions(+) diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml index 8f7a3025c368..7816b4c6859b 100644 --- a/doc/src/sgml/ref/pgupgrade.sgml +++ b/doc/src/sgml/ref/pgupgrade.sgml @@ -230,6 +230,16 @@ Options + + --copy + + +Copy files to the new cluster. This is the default. (See also +--link and --clone.) + + + + -? --help diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c index f441668c612a..f986129c2fb9 100644 --- a/src/bin/pg_upgrade/option.c +++ b/src/bin/pg_upgrade/option.c @@ -56,6 +56,7 @@ parseCommandLine(int argc, char *argv[]) {"socketdir", required_argument, NULL, 's'}, {"verbose", no_argument, NULL, 'v'}, {"clone", no_argument, NULL, 1}, + {"copy", no_argument, NULL, 2}, {NULL, 0, NULL, 0} }; @@ -194,6 +195,10 @@ parseCommandLine(int argc, char *argv[]) user_opts.transfer_mode = TRANSFER_MODE_CLONE; break; + case 2: + user_opts.transfer_mode = TRANSFER_MODE_COPY; + break; + default: fprintf(stderr, _("Try \"%s --help\" for more information.\n"), os_info.progname); @@ -283,6 +288,7 @@ usage(void) printf(_(" -v, --verbose enable verbose internal logging\n")); printf(_(" -V, --version display version information, then exit\n")); printf(_(" --clone clone instead of copying files to new cluster\n")); + printf(_(" --copycopy files to new cluster (default)\n")); printf(_(" -?, --helpshow this help, then exit\n")); printf(_("\n" "Before running pg_upgrade you must:\n" base-commit: ec386948948c1708c0c28c48ef08b9c4dd9d47cc -- 2.38.1 From b87a6bbb293deb94693774fa7b5c1e4918704f57 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Thu, 1 Dec 2022 15:36:12 +0100 Subject: [PATCH 2/2] pg_upgrade: Make testing different transfer modes easier It still requires a manual change in the test script, but now there is only one well-marked place to change. (Automatically running the pg_upgrade tests for all supported modes would be too slow.) 
--- src/bin/pg_upgrade/t/002_pg_upgrade.pl | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl index add6ea9c3437..365d81a8a380 100644 --- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl +++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl @@ -12,6 +12,9 @@ use PostgreSQL::Test::Utils; use Test::More; +# Can be changed (manually) to test the other modes. +my $mode = '--copy'; + # Generate a database with a name made of a range of ASCII characters. sub generate_db { @@ -256,6 +259,7 @@ sub filter_dump '-s', $newnode->host, '-p', $oldnode->port, '-P', $newnode->port, + $mode, '--check' ], 'run of pg_upgrade --check for new instance with incorr
Re: New docs chapter on Transaction Management and related changes
On Wed, Nov 30, 2022 at 12:31:55PM -0500, Tom Lane wrote: > Alvaro Herrera writes: > > I find it a bit shocking to have had it backpatched, even to 15 -- a > > whole chapter in the documentation? I don't see why it wouldn't be > > treated like any other "major feature" patch, which we only consider for > > the development branch. Also, this is a first cut -- presumably we'll > > want to copy-edit it before it becomes released material. > > I think that last point is fairly convincing. I've not read the > new material, but I didn't get further than the first line of > the new chapter file before noting a copy-and-paste error: > > --- /dev/null > +++ b/doc/src/sgml/xact.sgml > @@ -0,0 +1,205 @@ > + Fixed in master. > That doesn't leave me with a warm feeling that it's ready to ship. > I too vote for reverting it out of the released branches. Patch reverted in all back branches. I was hoping to get support for more aggressive backpatches of docs, but obviously failed. I should have been clearer about my intent to backpatch, and will have to consider these issues in future doc backpatches. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
Re: Temporary tables versus wraparound... again
On Sat, 5 Nov 2022 at 11:34, Tom Lane wrote:
>
> Greg Stark writes:
> > Simple Rebase
>
> I took a little bit of a look through these.
>
> * I find 0001 a bit scary, specifically that it's decided it's
> okay to apply extract_autovac_opts, pgstat_fetch_stat_tabentry_ext,
> and especially relation_needs_vacanalyze to another session's
> temp table. How safe is that really?

I can look a bit more closely, but none of them are doing anything with
the table itself, just the catalog entries, which afaik have always been
fair game for other sessions. So I'm not really clear what kind of
unsafeness you're asking about.

> * Don't see much point in renaming checkTempNamespaceStatus.
> That doesn't make it not an ABI break. If we want to back-patch
> this we'll have to do something different than what you did here.

Well, it's an ABI break, but at least it's an ABI break that gives a
build-time error or shared-library loading error, rather than one that
just crashes or writes to random memory at runtime.

> * In 0002, I don't especially approve of what you've done with
> the relminmxid calculation --- that seems to move it out of
> "pure bug fix" territory and into "hmm, I wonder if this
> creates new bugs" territory.

Hm. Ok, I can separate that into a separate patch. I admit I have a lot
of trouble remembering how multixactids work.

> Also, skipping that update
> for non-temp tables immediately falsifies ResetVacStats'
> claimed charter of "resetting to the same values used when
> creating tables". Surely GetOldestMultiXactId isn't *that*
> expensive, especially compared to the costs of file truncation.
> I think you should just do GetOldestMultiXactId straight up,
> and maybe submit a separate performance-improvement patch
> to make it do the other thing (in both places) for temp tables.

Hm. The feedback I got earlier was that it was quite expensive. That
said, I think the concern was about the temp tables, where the truncate
was happening on every transaction commit when they're used. For regular
truncates I'm not sure how important optimizing it is.

> * I wonder if this is the best place for ResetVacStats --- it
> doesn't seem to be very close to the code it needs to stay in
> sync with. If there's no better place, perhaps adding cross-
> reference comments in both directions would be advisable.

I'll look at that. I think relfrozenxid and relfrozenmxid are touched in
a lot of places, so it may be tilting at windmills to try to centralize
the code working with them at this point...

> * 0003 says it's running temp.sql by itself to avoid interference
> from other sessions, but sadly that cannot avoid interference
> from background autovacuum/autoanalyze. I seriously doubt this
> patch would survive contact with the buildfarm. Do we actually
> need a new test case? It's not like the code won't get exercised
> without it --- we have plenty of temp table truncations, surely.

No, I don't think we do. I kept it in a separate commit so it could be
dropped when committing. But just having truncate working isn't really
good enough either. An early version of the patch had a bug that meant
it didn't run at all, so truncate worked fine but relfrozenxid never got
reset.

In thinking about whether we could have a basic test that temp tables
are getting reset at all, it occurs to me that there's still a gap here:
you can have a session attached to a temp namespace that never actually
uses the temp tables. That would prevent autovacuum from dropping them
and still never reset their vacuum stats.
:( Offhand I think PreCommit_on_commit_actions() could occasionally truncate all ON COMMIT TRUNCATE tables even if they haven't been touched in this transaction. -- greg
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp wrote on 2022-12-01 05:23:
> Hi Mr. Pyhalov.

Hi. Attaching minor fixes. I haven't proof-read all the comments (but
perhaps they need attention from a native speaker). Tested it with
queries from https://github.com/swarm64/s64da-benchmark-toolkit; works as
expected.

--
Best regards,
Alexander Pyhalov,
Postgres Professional

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 35f2d102374..bd8a4acc112 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -3472,9 +3472,9 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context)
 	if ((aggform->aggtranstype != INTERNALOID) && (aggform->aggfinalfn == InvalidOid)) {
 		appendFunctionName(node->aggfnoid, context);
 	} else if(aggform->partialaggfn) {
-		appendFunctionName((Oid)(aggform->partialaggfn), context);
+		appendFunctionName(aggform->partialaggfn, context);
 	} else {
-		elog(ERROR, "there in no partialaggfn %u", node->aggfnoid);
+		elog(ERROR, "there is no partialaggfn %u", node->aggfnoid);
 	}
 	ReleaseSysCache(aggtup);
 }
@@ -3986,7 +3986,8 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel,
 }

 /*
- * Check that partial aggregate function of aggform exsits in remote
+ * Check that partial aggregate function, described by aggform,
+ * exists on remote server, described by fpinfo.
 */
 static bool
 partial_agg_compatible(Form_pg_aggregate aggform, PgFdwRelationInfo *fpinfo)
Re: [PATCH] Add native windows on arm64 support
On 05/11/2022 18:31, Andres Freund wrote: On 2022-11-03 11:06:46 +, Niyas Sait wrote: I've attached a new version of the patch which excludes the already merged ASLR changes and add small changes to handle latest changes in the build scripts. Note that we're planning to remove the custom windows build scripts before the next release, relying on the meson build instead. Thanks. I will add changes to add meson build support. This won't suffice with the meson build, since the relevant configure test also uses arm_acle.h: elif host_cpu == 'arm' or host_cpu == 'aarch64' prog = ''' #include int main(void) { unsigned int crc = 0; crc = __crc32cb(crc, 0); crc = __crc32ch(crc, 0); crc = __crc32cw(crc, 0); crc = __crc32cd(crc, 0); /* return computed value, to prevent the above being optimized away */ return crc == 0; } ''' if cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd without -march=armv8-a+crc', args: test_c_args) # Use ARM CRC Extension unconditionally cdata.set('USE_ARMV8_CRC32C', 1) have_optimized_crc = true elif cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd with -march=armv8-a+crc', args: test_c_args + ['-march=armv8-a+crc']) # Use ARM CRC Extension, with runtime check cflags_crc += '-march=armv8-a+crc' cdata.set('USE_ARMV8_CRC32C', false) cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1) have_optimized_crc = true endif endif The meson checking logic is used both for msvc and other compilers, so this will need to work with both. Yes, will handle that. -- Niyas
Re: [PATCH] Add native windows on arm64 support
On 07/11/2022 06:58, Michael Paquier wrote: #if defined(_WIN64) static __forceinline void spin_delay(void) { +#ifdef _M_ARM64 + /* +* arm64 way of hinting processor for spin loops optimisations +* ref: https://community.arm.com/support-forums/f/infrastructure-solutions-forum/48654/ssetoneon-faq + */ + __isb(_ARM64_BARRIER_SY); +#else _mm_pause(); +#endif } #else static __forceinline void I think we should just apply this, there seems very little risk of making anything worse, given the gating to _WIN64 && _M_ARM64. Seems so. Hmm, where does _ARM64_BARRIER_SY come from? Perhaps it would be better to have a comment referring to it from a different place than the forums of arm, like some actual docs? _ARM64_BARRIER_SY is defined in Microsoft Arm64 intrinsic documentation - https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170#BarrierRestrictions I couldn't find something more official for the sse2neon library part. -- Niyas
Re: Documentation for building with meson
Hi,

On 2022-12-01 15:58:39 +0100, Peter Eisentraut wrote:
> On 23.11.22 22:24, samay sharma wrote:
> > Thank you. Attaching v7 addressing most of the points below.
>
> I have committed this, after some editing and making some structural
> changes.

Thanks. I was working on that too, but somehow felt a bit stuck... I'll
see whether I can adapt my pending changes.

> I moved the "Requirements" section back to the top level. It did
> not look appealing to have to maintain two copies of this that have
> almost no substantial difference (but for some reason were written with
> separate structure and wording).

I don't think this is good. The whole "The following software packages
are required for building PostgreSQL" section is wrong now. The "They are
not required in the default configuration, but they are needed when
certain build options are enabled, as explained below:" section is
misleading as well. By the time we fix all of those we'll end up with a
different section again.

> Also, I rearranged the Building with Meson section to use the same
> internal structure as the Building with Autoconf and Make section. This
> will make it easier to maintain going forward. For example, if someone
> adds a new option, it will be easier to find the corresponding places
> in the lists to add it.

I don't know. The existing list order makes very little sense to me.
E.g., --enable-nls is before the rest in configure, presumably because it
sorts there alphabetically. But that's not the case for meson.

Copying "Anti-Features" as a structure element to the meson docs seems
bogus (also the name is bogus, but that's a pre-existing issue). There's
no difference between -Dreadline= and the other options in the
meson-options-features list. Nor do -Dspinlocks= and -Datomics= make
sense in the "anti features" section. That made some sense for autoconf
because of the --without- prefix, but that's not at play in meson. Their
placement in the "Developer Options" made a whole lot more sense.

I don't like the "Miscellaneous" bit containing minor stuff like
krb_srvnam alongside data-layout-changing options like blocksize,
segsize, and wal_blocksize. But it makes sense to change that for both at
the same time.

> We will likely keep iterating on the contents for the next little
> while, but I'm glad we now have a structure in place that we should be
> able to live with.

I agree that it's good to have something we can work from more
iteratively. But I don't think this is a structure that we can live with.
I'm not particularly happy about this level of structural change being
made without prior discussion.

Greetings,

Andres Freund
Re: Error-safe user functions
On 2022-11-21 Mo 00:26, Tom Lane wrote:
> Corey Huinker writes:
>> On Tue, Nov 15, 2022 at 11:36 AM Andrew Dunstan wrote:
>>> Nikita,
>>> just checking in, are you making progress on this? I think we really
>>> need to get this reviewed and committed ASAP if we are to have a chance
>>> to get the SQL/JSON stuff reworked to use it in time for release 16.
>> I'm making an attempt at this or something very similar to it. I don't yet
>> have a patch ready.
> Cool. We can't delay too much longer on this if we want to have
> a credible feature in v16. Although I want a minimal initial
> patch, there will still be a ton of incremental work to do after
> the core capability is reviewed and committed, so there's no
> time to lose.

OK, here's a set of minimal patches based on Nikita's earlier work and
also some work by my colleague Amul Sul. It tries to follow Tom's
original outline at [1], and do as little else as possible.

Patch 1 introduces the IOCallContext node. The caller should set the
no_error_throw flag and clear the error_found flag, which will be set by
a conforming IO function if an error is found. It also includes a string
field for an error message. I haven't used that; it's more there to
stimulate discussion. Robert suggested to me that maybe it should be an
ErrorData, but I'm not sure how we would use it.

Patch 2 introduces InputFunctionCallContext(), which is similar to
InputFunctionCall() but with an extra context parameter. Note that it's
OK for an input function to return a NULL to this function if an error is
found.

Patches 3 and 4 modify the bool_in() and int4in() functions respectively
to handle an IOCallContext appropriately if provided one in their
fcinfo.context.

Patch 5 introduces COPY FROM ... NULL_ON_ERROR which, in addition to
being useful in itself, provides a test of the previous patches.

I hope we can get fairly quick agreement so that work can begin on
adjusting at least those things needed for the SQL/JSON patches ASAP. Our
goal should be to adjust all the core input functions, but that's not
quite so urgent, and can be completed in parallel with the SQL/JSON work.

cheers

andrew

[1] https://www.postgresql.org/message-id/13351.1661965592%40sss.pgh.pa.us

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

From e84cf9461fe3898c0326e7c369c65fc4ef1761c6 Mon Sep 17 00:00:00 2001
From: Andrew Dunstan
Date: Wed, 30 Nov 2022 10:44:27 -0500
Subject: [PATCH 1/5] Add IOCallContext node

---
 src/include/nodes/primnodes.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 74f228d959..3e2ecd11b3 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1692,4 +1692,21 @@ typedef struct OnConflictExpr
 	List *exclRelTlist; /* tlist of the EXCLUDED pseudo relation */
 } OnConflictExpr;

+/*--
+ * IOCallContext - passed to an input function as FunctionCallInfo.context
+ *
+ * if the field no_error_throw is set then a conforming IO function will
+ * set the field error_found on error rather than calling ereport(ERROR ...)
+ *
+ */
+typedef struct IOCallContext
+{
+	pg_node_attr(nodetag_only) /* don't need copy etc.
support functions */ + + NodeTag type; + boolno_error_throw; /* set by caller to suppress errors */ + boolerror_found;/* cleared by caller, set by IO function */ + char *error_message; +} IOCallContext; + #endif /* PRIMNODES_H */ -- 2.34.1 From b6b277f5dd3837ab9ac2df819f4e67f39d790d1f Mon Sep 17 00:00:00 2001 From: Andrew Dunstan Date: Wed, 30 Nov 2022 10:46:03 -0500 Subject: [PATCH 2/5] Add InputFunctionCallContext() --- src/backend/utils/fmgr/fmgr.c | 50 +++ src/include/fmgr.h| 3 +++ 2 files changed, 53 insertions(+) diff --git a/src/backend/utils/fmgr/fmgr.c b/src/backend/utils/fmgr/fmgr.c index 3c210297aa..bd91ae1dac 100644 --- a/src/backend/utils/fmgr/fmgr.c +++ b/src/backend/utils/fmgr/fmgr.c @@ -24,6 +24,7 @@ #include "miscadmin.h" #include "nodes/makefuncs.h" #include "nodes/nodeFuncs.h" +#include "nodes/primnodes.h" #include "pgstat.h" #include "utils/acl.h" #include "utils/builtins.h" @@ -1548,6 +1549,55 @@ InputFunctionCall(FmgrInfo *flinfo, char *str, Oid typioparam, int32 typmod) return result; } +Datum +InputFunctionCallContext(FmgrInfo *flinfo, char *str, + Oid typioparam, int32 typmod, + void * ctxt) +{ + LOCAL_FCINFO(fcinfo, 3); + Datum result; + + if (str == NULL && flinfo->fn_strict) + return (Datum) 0; /* just return null result */ + + InitFunctionCallInfoData(*fcinfo, flinfo, 3, InvalidOid, NULL, NULL); + + fcinfo->args[0].value = CStringGetDatum(str); + fcinfo->args[0].isnull = false; + fcinfo->args[1].value = ObjectIdGetDatum(typioparam); + fcinfo->args[1].isnull = false; + fcinfo->args[2].value = Int32GetDatum(typmod); + fcinfo->args[2].isnull = false; + + fcinfo->context = ctxt; +
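[Editor's note: to make the proposed calling convention concrete, a
hypothetical caller-side sketch of how the patches above fit together.
Field and function names follow the patches; T_IOCallContext and the
surrounding COPY-style logic are assumptions for illustration only.]

	/* Hypothetical use of the proposed API, e.g. for COPY ... NULL_ON_ERROR */
	IOCallContext ioctx;
	Datum		value;

	ioctx.type = T_IOCallContext;	/* assumes the generated node tag */
	ioctx.no_error_throw = true;	/* ask a conforming input function not to throw */
	ioctx.error_found = false;		/* cleared by caller, set by the IO function */
	ioctx.error_message = NULL;

	value = InputFunctionCallContext(&in_func, field_str, typioparam, typmod,
									 (void *) &ioctx);
	if (ioctx.error_found)
		isnull = true;				/* substitute NULL instead of erroring out */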
Re: [PATCH] Add native windows on arm64 support
Hello, I've attached a new revision of the patch (v4) and includes following changes, 1. Add support for meson build system 2. Extend MSVC scripts to handle ARM64 platform. 3. Add arm64 definition of spin_delay function. 4. Exclude arm_acle.h import with MSVC compiler. V3->V4: Add support for meson build system V2->V3: Removed ASLR enablement and rebased on master. V1->V2: Rebased on top of master -- NiyasFrom 872f538f6261b36a32cbcecae82e99778b747799 Mon Sep 17 00:00:00 2001 From: Niyas Sait Date: Tue, 22 Feb 2022 13:07:24 + Subject: [PATCH v4] Enable postgres native build for windows-arm64 platform Following changes are included - Extend MSVC scripts to handle ARM64 platform - Add arm64 definition of spin_delay function - Exclude arm_acle.h import for MSVC - Add support for meson build --- meson.build | 33 +++- src/include/storage/s_lock.h | 10 +- src/port/pg_crc32c_armv8.c | 2 ++ src/tools/msvc/MSBuildProject.pm | 16 src/tools/msvc/Mkvcbuild.pm | 9 +++-- src/tools/msvc/Solution.pm | 11 --- src/tools/msvc/gendef.pl | 8 7 files changed, 62 insertions(+), 27 deletions(-) diff --git a/meson.build b/meson.build index 725e10d815..e354ad7650 100644 --- a/meson.build +++ b/meson.build @@ -1944,7 +1944,13 @@ int main(void) elif host_cpu == 'arm' or host_cpu == 'aarch64' - prog = ''' + if cc.get_id() == 'msvc' +cdata.set('USE_ARMV8_CRC32C', false) +cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1) +have_optimized_crc = true + else + +prog = ''' #include int main(void) @@ -1960,18 +1966,19 @@ int main(void) } ''' - if cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd without -march=armv8-a+crc', - args: test_c_args) -# Use ARM CRC Extension unconditionally -cdata.set('USE_ARMV8_CRC32C', 1) -have_optimized_crc = true - elif cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd with -march=armv8-a+crc', - args: test_c_args + ['-march=armv8-a+crc']) -# Use ARM CRC Extension, with runtime check -cflags_crc += '-march=armv8-a+crc' -cdata.set('USE_ARMV8_CRC32C', false) -cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1) -have_optimized_crc = true +if cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd without -march=armv8-a+crc', +args: test_c_args) + # Use ARM CRC Extension unconditionally + cdata.set('USE_ARMV8_CRC32C', 1) + have_optimized_crc = true +elif cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd with -march=armv8-a+crc', +args: test_c_args + ['-march=armv8-a+crc']) + # Use ARM CRC Extension, with runtime check + cflags_crc += '-march=armv8-a+crc' + cdata.set('USE_ARMV8_CRC32C', false) + cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1) + have_optimized_crc = true +endif endif endif diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h index 8b19ab160f..bf6a6dba35 100644 --- a/src/include/storage/s_lock.h +++ b/src/include/storage/s_lock.h @@ -708,13 +708,21 @@ typedef LONG slock_t; #define SPIN_DELAY() spin_delay() /* If using Visual C++ on Win64, inline assembly is unavailable. - * Use a _mm_pause intrinsic instead of rep nop. + * Use _mm_pause (x64) or __isb(arm64) intrinsic instead of rep nop. 
*/ #if defined(_WIN64) static __forceinline void spin_delay(void) { +#ifdef _M_ARM64 + /* +* arm64 way of hinting processor for spin loops optimisations +* ref: https://community.arm.com/support-forums/f/infrastructure-solutions-forum/48654/ssetoneon-faq + */ + __isb(_ARM64_BARRIER_SY); +#else _mm_pause(); +#endif } #else static __forceinline void diff --git a/src/port/pg_crc32c_armv8.c b/src/port/pg_crc32c_armv8.c index 9e301f96f6..981718752f 100644 --- a/src/port/pg_crc32c_armv8.c +++ b/src/port/pg_crc32c_armv8.c @@ -14,7 +14,9 @@ */ #include "c.h" +#ifndef _MSC_VER #include +#endif #include "port/pg_crc32c.h" diff --git a/src/tools/msvc/MSBuildProject.pm b/src/tools/msvc/MSBuildProject.pm index 58590fdac2..274ddc8860 100644 --- a/src/tools/msvc/MSBuildProject.pm +++ b/src/tools/msvc/MSBuildProject.pm @@ -310,10 +310,18 @@ sub WriteItemDefinitionGroup : ($self->{type} eq "dll" ? 'DynamicLibrary' : 'StaticLibrary'); my $libs = $self->GetAdditionalLinkerDependencies($cfgname, ';'); - my $targetmachine = - $self->{platform} eq 'Win32' ? 'MachineX86' : 'MachineX64'; - my $arch = - $self->{platform} eq 'Win32' ? 'x86' : 'x86_64'; + my $targetmachine; + my $arch; + if ($self->{platform} eq 'Win32') { + $targetmachine = 'MachineX86'; + $arch = 'x86'; + } elsif ($self->{platform} eq 'ARM64'){ + $targetmachine = 'MachineARM64'; + $arch = 'a
Re: explain_regress, explain(MACHINE), and default to explain(BUFFERS) (was: BUFFERS enabled by default in EXPLAIN (ANALYZE))
On Sat, Nov 05, 2022 at 10:43:07AM -0400, Tom Lane wrote:
> Justin Pryzby writes:
> > You set the patch to "waiting on author", which indicates that there's
> > no need for further input or review. But, I think that's precisely
> > what's needed - without input from more people, what could I do to
> > progress the patch? I don't think it's reasonable to put 001 first and
> > change thousands (actually, 1338) of regression results. If nobody
> > wants to discuss 001, then this patch series won't progress.
>
> Well ...
>
> 1. 0001 invents a new GUC but provides no documentation for it.
> That certainly isn't committable, and it's discouraging the
> discussion you seek because people have to read the whole patch
> in detail to understand what is being proposed.
>
> 2. The same complaint for 0004, which labors under the additional

I suggested that the discussion be limited to the early patches (004 is
an optional, possible idea that I had, but it's evidently still causing
confusion).

> 3. I'm not really on board with the entire approach. Making
> EXPLAIN work significantly differently for developers and test
> critters than it does for everyone else seems likely to promote
> confusion and hide bugs. I don't think getting rid of the need
> for filter functions in test scripts is worth that.

I'm open to suggestions how else to move towards the goal. Note that
there are a few other things with vaguely similar behavior:

HIDE_TABLEAM=on
HIDE_TOAST_COMPRESSION=on
compute_query_id = regress

Another possibility is to make a new "explain" format, like
explain(FORMAT REGRESS, ...), but I don't see how that would be less
objectionable than what I've written.

The patch record should probably be closed until someone proposes
another way to implement what's necessary to enable explain(BUFFERS) by
default.

--
Justin
Re: Error-safe user functions
Andrew Dunstan writes:
> OK, here's a set of minimal patches based on Nikita's earlier work and
> also some work by my colleague Amul Sul. It tries to follow Tom's
> original outline at [1], and do as little else as possible.

This is not really close at all to what I had in mind. The main
objection is that we shouldn't be passing back a "char *" error string
(though I observe that your 0003 and 0004 patches aren't even bothering
to do that much). I think we want to pass back a fully populated
ErrorData struct so that we can report everything the actual error would
have done (notably, the SQLSTATE). That means that elog.h/.c has to be
intimately involved in this. I liked Nikita's overall idea of
introducing an "ereturn" macro, with the idea that where we have, say,

    ereport(ERROR,
            (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
             errmsg("value \"%s\" is out of range for type %s",
                    s, "integer")));

we would write

    ereturn(context, ERROR,
            (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
             errmsg("value \"%s\" is out of range for type %s",
                    s, "integer")));
    return NULL;   // or whatever is appropriate

and the involvement with the contents of the context node would all be
confined to some new code in elog.c. That would help prevent the
#include-footprint-bloat that is otherwise going to ensue. (Maybe we
could assume that ereturn's elevel must be ERROR, and save a little
notation. I'm not very wedded to "ereturn" as the new macro name either,
though it's not awful.)

Also, as I said before, the absolute first priority has to be
documentation explaining what function authors are supposed to do
differently than before.

I'd be willing to have a go at this myself, except that Corey said he
was working on it, so I don't want to step on his toes.

			regards, tom lane
Re: New docs chapter on Transaction Management and related changes
On 2022-Dec-01, Bruce Momjian wrote: > Patch reverted in all back branches. I was hoping to get support for > more aggressive backpatches of docs, but obviously failed. I should > have been clearer about my intent to backpatch, and will have to > consider these issues in future doc backpatches. FWIW I am in favor of backpatching doc fixes (even if they're not completely trivial, such as 02d43ad6262d) and smallish additions of content (63a370938). But we've added a few terms to the glossary (606c38459, 3dddb2a82) and those weren't backpatched, which seems appropriate to me. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "The problem with the facetime model is not just that it's demoralizing, but that the people pretending to work interrupt the ones actually working." (Paul Graham)
Re: Error-safe user functions
On Thu, Dec 1, 2022 at 1:14 PM Tom Lane wrote: > The main objection is that we shouldn't be passing back a "char *" > error string (though I observe that your 0003 and 0004 patches aren't > even bothering to do that much). I think we want to pass back a > fully populated ErrorData struct so that we can report everything > the actual error would have done (notably, the SQLSTATE). +1. > That means that elog.h/.c has to be intimately involved in this. > I liked Nikita's > overall idea of introducing an "ereturn" macro, with the idea that > where we have, say, > > ereport(ERROR, > (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), > errmsg("value \"%s\" is out of range for type %s", > s, "integer"))); > > we would write > > ereturn(context, ERROR, > (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), > errmsg("value \"%s\" is out of range for type %s", > s, "integer"))); > return NULL; // or whatever is appropriate It sounds like you're imagining that ereturn doesn't return, which seems confusing. But I don't know that I'd like it better if it did. Magic return statements hidden inside macros seem not too fun. What I'd like to see is a macro that takes a pointer to an ErrorData and the rest of the arguments like ereport() and stuffs everything in there. And then you can pass that to ThrowErrorData() later if you like. That way it's visible when you're using the macro where you're putting the error. I think that would make the code more readable. > Also, as I said before, the absolute first priority has to be > documentation explaining what function authors are supposed to > do differently than before. +1. > I'd be willing to have a go at this myself, except that Corey > said he was working on it, so I don't want to step on his toes. Time is short, and I do not think Corey will be too sad if you decide to have a go at it. The chances of person A being able to write the code person B is imagining as well as person B could write it are not great, regardless of who A and B are. And I think the general consensus around here is that you're a better coder than most. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Questions regarding distinct operation implementation
On 25/11/22 11:00, Ankit Kumar Pandey wrote:
> On 25/11/22 02:14, David Rowley wrote:
>> On Fri, 25 Nov 2022 at 06:57, Ankit Kumar Pandey wrote:
>>> Please let me know any opinions on this.
>> I think if you're planning on working on this then step 1 would have
>> to be checking the SQL standard to see which set of rows it asks
>> implementations to consider for duplicate checks when deciding if the
>> transition should be performed or not. Having not looked, I don't know
>> if this is the entire partition or just the rows in the current frame.
>> Depending on what you want, an alternative today would be to run a
>> subquery to uniquify the rows the way you want and then do the window
>> function stuff.
>> David
> Thanks David, these are excellent pointers, I will look into the SQL
> standard first and so on.

Hi,

Looking further into it, I am a bit clearer about the expectations of
having DISTINCT in window aggregates (although I couldn't get my hands
on the SQL standard, as it is not in the public domain, DISTINCT in
window aggregates is supported by Oracle and I am using it as a
reference).

For a table (mytable):

id, name
1, A
1, A
10, B
3, A
1, A

select avg(distinct id) over (partition by name) from mytable

(in Oracle) yields:

2
2
2
2
10

From this, it is seen that DISTINCT is taken across all rows in the
partition.

I also thought of using a subquery approach:

select avg(id) over (partition by name)
from (select distinct(id), name from mytable)

but this obviously doesn't yield the right answer, because the result
should contain the same number of rows as the input. This implies we
need to find the partition first and then remove duplicates within the
partition.

Can we avoid any ordering/sort until the existing logic finds whether a
value is in the frame (so as to respect any ORDER BY clause given by the
user), and once it is determined that a tuple is in the frame, skip the
tuple if it is a duplicate?

If the aforementioned approach is right, the question is how do we check
whether a tuple is a duplicate? Should we create a lookup table (as
tuples coming to advance_windowaggregate can be in arbitrary order)? Or
would any other approach be better?

Any opinion on this will be appreciated.

--
Regards,
Ankit Kumar Pandey
Re: Temporary tables versus wraparound... again
Hi,

On 2022-12-01 11:13:01 -0500, Greg Stark wrote:
> On Sat, 5 Nov 2022 at 11:34, Tom Lane wrote:
> >
> > Greg Stark writes:
> > > Simple Rebase
> >
> > I took a little bit of a look through these.
> >
> > * I find 0001 a bit scary, specifically that it's decided it's
> > okay to apply extract_autovac_opts, pgstat_fetch_stat_tabentry_ext,
> > and especially relation_needs_vacanalyze to another session's
> > temp table. How safe is that really?
>
> I can look a bit more closely but none of them are doing anything
> with the table itself, just the catalog entries which afaik have
> always been fair game for other sessions. So I'm not really clear
> what kind of unsafeness you're asking about.

Is that actually true? Don't we skip some locking operations for
temporary tables, which then also means catalog modifications cannot
safely be done in other sessions?

> > Also, skipping that update
> > for non-temp tables immediately falsifies ResetVacStats'
> > claimed charter of "resetting to the same values used when
> > creating tables". Surely GetOldestMultiXactId isn't *that*
> > expensive, especially compared to the costs of file truncation.
> > I think you should just do GetOldestMultiXactId straight up,
> > and maybe submit a separate performance-improvement patch
> > to make it do the other thing (in both places) for temp tables.
>
> Hm. The feedback I got earlier was that it was quite expensive. That
> said, I think the concern was about the temp tables where the truncate
> was happening on every transaction commit when they're used. For
> regular truncates I'm not sure how important optimizing it is.

And it's not called just once, but once for each relation.

> > * I wonder if this is the best place for ResetVacStats --- it
> > doesn't seem to be very close to the code it needs to stay in
> > sync with. If there's no better place, perhaps adding cross-
> > reference comments in both directions would be advisable.
>
> I'll look at that. I think relfrozenxid and relfrozenmxid are touched
> in a lot of places so it may be tilting at windmills to try to
> centralize the code working with them at this point...

Not convinced. Yes, there are plenty of references to relfrozenxid, but
most of them don't modify it.

I find it problematic that ResetVacStats() bypasses tableam. Normal
vacuums etc. go through tableam, but you put a ResetVacStats() beside
each call to table_relation_nontransactional_truncate(). Seems like this
should just be in heapam_relation_nontransactional_truncate()?

Is it a good idea to use heap_inplace_update() in ResetVacStats()?

Greetings,

Andres Freund
Re: Allow round() function to accept float and double precision
On Thu, Dec 1, 2022 at 7:39 AM Tom Lane wrote:
> Dean Rasheed writes:
>> The fact that passing a negative scale to round() isn't documented
>> does seem like an oversight though...
>
> Agreed, will do something about that.

Thanks. I'm a bit surprised you left "Rounds v to s decimal places."
alone though. I feel like the prose should also make clear that
positions to the left of the decimal, which are not conventionally
considered decimal places, can be identified.

    Rounds v at s digits before or after the decimal place.

The examples will hopefully clear up any off-by-one concerns that
someone may have.

David J.
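[Editor's note: a worked illustration of what that wording implies, using
the usual half-away-from-zero rule that numeric rounding follows. The
formula is an editorial assumption, offered only as illustration.]

\[
\operatorname{round}(v, s) \;=\; \operatorname{sgn}(v)\,\bigl\lfloor\, |v|\cdot 10^{s} + \tfrac12 \,\bigr\rfloor \cdot 10^{-s},
\qquad
\operatorname{round}(1234.5,\,-2) \;=\; \lfloor 12.845 \rfloor \cdot 10^{2} \;=\; 1200,
\]

so s = -2 identifies the hundreds position, two digits before the decimal
place.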
Re: Error-safe user functions
Robert Haas writes: > It sounds like you're imagining that ereturn doesn't return, which > seems confusing. But I don't know that I'd like it better if it did. The spec I had in mind was that it would behave as ereport(ERROR) unless a suitable FuncErrorContext node is passed, in which case it'd store the error data into that node and return. This leaves the invoker with only the job of passing control back afterwards, if it gets control back. I'd be the first to agree that "ereturn" doesn't capture that detail very well, but I don't have a better name. (And I do like the fact that this name is the same length as "ereport", so that we won't end up with lots of reindentation to do.) > Magic return statements hidden inside macros seem not too fun. What > I'd like to see is a macro that takes a pointer to an ErrorData and > the rest of the arguments like ereport() and stuffs everything in > there. And then you can pass that to ThrowErrorData() later if you > like. That way it's visible when you're using the macro where you're > putting the error. I think that would make the code more readable. I think that'd just complicate the places that are having to report such errors --- of which there are likely to be hundreds by the time we are done. I will not accept a solution that requires more than the absolute minimum of additions to the error-reporting spots. regards, tom lane
Re: Allow round() function to accept float and double precision
On Thu, 1 Dec 2022 at 21:55, Dean Rasheed wrote:
> Casting to numeric(1000, n) will work fine in all cases AFAICS (1000
> being the maximum allowed precision in a numeric typemod, and somewhat
> more memorable).

I wasn't aware of the typemod limit. I don't really agree that it will
work fine in all cases though. If the numeric has more than 1000 digits
left of the decimal point then the method won't work at all.

# select length(('1' || repeat('0',2000))::numeric(1000,0)::text);
ERROR:  numeric field overflow
DETAIL:  A field with precision 1000, scale 0 must round to an absolute value less than 10^1000.

No issue with round() with the same number.

# select length(round(('1' || repeat('0',2000))::numeric,0)::text);
 length
--------
   2001

David
Re: Allow round() function to accept float and double precision
David Rowley writes: > I don't really agree that it will work fine in all cases though. If > the numeric has more than 1000 digits left of the decimal point then > the method won't work at all. But what we're talking about is starting from a float4 or float8 input, so it can't be more than ~308 digits. regards, tom lane
Re: Error-safe user functions
On Thu, Dec 1, 2022 at 2:41 PM Tom Lane wrote: > Robert Haas writes: > > It sounds like you're imagining that ereturn doesn't return, which > > seems confusing. But I don't know that I'd like it better if it did. > > The spec I had in mind was that it would behave as ereport(ERROR) > unless a suitable FuncErrorContext node is passed, in which case > it'd store the error data into that node and return. This leaves > the invoker with only the job of passing control back afterwards, > if it gets control back. I'd be the first to agree that "ereturn" > doesn't capture that detail very well, but I don't have a better name. > (And I do like the fact that this name is the same length as "ereport", > so that we won't end up with lots of reindentation to do.) I don't think it's sensible to make decisions about important syntax on the basis of byte-length. Reindenting is a one-time effort; code clarity will be with us forever. > > Magic return statements hidden inside macros seem not too fun. What > > I'd like to see is a macro that takes a pointer to an ErrorData and > > the rest of the arguments like ereport() and stuffs everything in > > there. And then you can pass that to ThrowErrorData() later if you > > like. That way it's visible when you're using the macro where you're > > putting the error. I think that would make the code more readable. > > I think that'd just complicate the places that are having to report > such errors --- of which there are likely to be hundreds by the time > we are done. OK, that's a fair point. -- Robert Haas EDB: http://www.enterprisedb.com
Re: meson PGXS compatibility
Hi, On 2022-12-01 08:43:19 +0100, Peter Eisentraut wrote: > On 13.10.22 07:23, Andres Freund wrote: > > > > 0005: meson: Add PGXS compatibility > > > > > > > > The actual meson PGXS compatibility. Plenty more replacements to > > > > do, but > > > > suffices to build common extensions on a few platforms. > > > > > > > > What level of completeness do we want to require here? > > > > > > I have tried this with a few extensions. Seems to work alright. I don't > > > think we need to overthink this. The way it's set up, if someone needs > > > additional variables set, they can easily be added. > > > > Yea, I am happy enough with it now that the bulk is out of src/meson.build > > and > > strip isn't set to an outright wrong value. > > How are you planning to proceed with this? I thought it was ready, but it > hasn't moved in a while. I basically was hoping for a review of the prerequisite patches I posted at [1], particularly 0003 (different approach than before, comparatively large). Here's an updated version of the patches. There was a stupid copy-paste bug in the prior version of the 0003 / export_dynamic patch. I'll push 0001, 0002 shortly. I don't think 0002 is the most elegant approach, but it's not awful. I'd still like some review for 0003, but will try to push it in a few days if that's not forthcoming. Greetings, Andres Freund [1] https://www.postgresql.org/message-id/20221013051648.ufz7ud2b5tioctyt%40awork3.anarazel.de >From 53b4063ff6230963ee60122d47b154f91990d097 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Wed, 5 Oct 2022 12:24:14 -0700 Subject: [PATCH v4 1/4] autoconf: Unify CFLAGS_SSE42 and CFLAGS_ARMV8_CRC32C Until now we emitted the cflags to build the CRC objects into architecture specific variables. That doesn't make a whole lot of sense to me - we're never going to target x86 and arm at the same time, so they don't need to be separate variables. It might be better to instead continue to have CFLAGS_SSE42 / CFLAGS_ARMV8_CRC32C be computed by PGAC_ARMV8_CRC32C_INTRINSICS / PGAC_SSE42_CRC32_INTRINSICS and then set CFLAGS_CRC based on those. But it seems unlikely that we'd need other sets of CRC specific flags for those two architectures at the same time. This simplifies the upcoming meson PGXS compatibility. Discussion: https://postgr.es/m/20221005200710.luvw5evhwf6cl...@awork3.anarazel.de --- configure.ac | 10 +- config/c-compiler.m4 | 8 src/port/Makefile | 16 configure | 19 +-- src/Makefile.global.in | 3 +-- 5 files changed, 27 insertions(+), 29 deletions(-) diff --git a/configure.ac b/configure.ac index f76b7ee31fc..6e7c8e09411 100644 --- a/configure.ac +++ b/configure.ac @@ -2091,12 +2091,11 @@ fi # # First check if the _mm_crc32_u8 and _mm_crc32_u64 intrinsics can be used # with the default compiler flags. If not, check if adding the -msse4.2 -# flag helps. CFLAGS_SSE42 is set to -msse4.2 if that's required. +# flag helps. CFLAGS_CRC is set to -msse4.2 if that's required. PGAC_SSE42_CRC32_INTRINSICS([]) if test x"$pgac_sse42_crc32_intrinsics" != x"yes"; then PGAC_SSE42_CRC32_INTRINSICS([-msse4.2]) fi -AC_SUBST(CFLAGS_SSE42) # Are we targeting a processor that supports SSE 4.2? gcc, clang and icc all # define __SSE4_2__ in that case. @@ -2110,12 +2109,13 @@ AC_COMPILE_IFELSE([AC_LANG_PROGRAM([], [ # # First check if __crc32c* intrinsics can be used with the default compiler # flags. If not, check if adding -march=armv8-a+crc flag helps. -# CFLAGS_ARMV8_CRC32C is set if the extra flag is required. +# CFLAGS_CRC is set if the extra flag is required. 
PGAC_ARMV8_CRC32C_INTRINSICS([]) if test x"$pgac_armv8_crc32c_intrinsics" != x"yes"; then PGAC_ARMV8_CRC32C_INTRINSICS([-march=armv8-a+crc]) fi -AC_SUBST(CFLAGS_ARMV8_CRC32C) + +AC_SUBST(CFLAGS_CRC) # Select CRC-32C implementation. # @@ -2144,7 +2144,7 @@ if test x"$USE_SLICING_BY_8_CRC32C" = x"" && test x"$USE_SSE42_CRC32C" = x"" && USE_SSE42_CRC32C_WITH_RUNTIME_CHECK=1 else # Use ARM CRC Extension if available. - if test x"$pgac_armv8_crc32c_intrinsics" = x"yes" && test x"$CFLAGS_ARMV8_CRC32C" = x""; then + if test x"$pgac_armv8_crc32c_intrinsics" = x"yes" && test x"$CFLAGS_CRC" = x""; then USE_ARMV8_CRC32C=1 else # ARM CRC Extension, with runtime check? diff --git a/config/c-compiler.m4 b/config/c-compiler.m4 index 000b075312e..eb8cc8ce170 100644 --- a/config/c-compiler.m4 +++ b/config/c-compiler.m4 @@ -597,7 +597,7 @@ fi])# PGAC_HAVE_GCC__ATOMIC_INT64_CAS # the other ones are, on x86-64 platforms) # # An optional compiler flag can be passed as argument (e.g. -msse4.2). If the -# intrinsics are supported, sets pgac_sse42_crc32_intrinsics, and CFLAGS_SSE42. +# intrinsics are supported, sets pgac_sse42_crc32_intrinsics, and CFLAGS_CRC. AC_DEFUN([PGAC_SSE42_CRC32_INTRINSICS], [define([Ac_cachevar], [AS_TR_SH([pgac_cv_sse42_crc32_intrins
Re: Error-safe user functions
Robert Haas writes: > On Thu, Dec 1, 2022 at 2:41 PM Tom Lane wrote: >> I'd be the first to agree that "ereturn" >> doesn't capture that detail very well, but I don't have a better name. >> (And I do like the fact that this name is the same length as "ereport", >> so that we won't end up with lots of reindentation to do.) > I don't think it's sensible to make decisions about important syntax > on the basis of byte-length. Reindenting is a one-time effort; code > clarity will be with us forever. Sure, but without a proposal for a better name, that's irrelevant. regards, tom lane
Re: Lazy JIT IR code generation to increase JIT speed with partitions
> On Thu, Jul 14, 2022 at 02:45:29PM +0200, David Geier wrote:
> On Mon, Jul 4, 2022 at 10:32 PM Andres Freund wrote:
> > On 2022-06-27 16:55:55 +0200, David Geier wrote:
> > > Indeed, the total JIT time increases the more modules are used. The
> > > reason for this to happen is that the inlining pass loads and
> > > deserializes all to-be-inlined modules (.bc files) from disk prior
> > > to inlining them via llvm::IRMover. There's already a cache for
> > > such modules in the code, but it is currently unused. This is
> > > because llvm::IRMover takes the module to be inlined as
> > > std::unique_ptr<Module>. The by-value argument requires the source
> > > module to be moved, which means it cannot be reused afterwards. The
> > > code is accounting for that by erasing the module from the cache
> > > after inlining it, which in turn requires reloading the module next
> > > time a reference to it is encountered.
> > >
> > > Instead of each time loading and deserializing all to-be-inlined
> > > modules from disk, they can reside in the cache and instead be
> > > cloned via llvm::CloneModule() before they get inlined. Key to
> > > calling llvm::CloneModule() is fully deserializing the module
> > > upfront, instead of loading the module lazily. That is why I
> > > changed the call from LLVMGetBitcodeModuleInContext2() (which
> > > lazily loads the module via llvm::getOwningLazyBitcodeModule()) to
> > > LLVMParseBitcodeInContext2() (which fully loads the module via
> > > llvm::parseBitcodeFile()). Beyond that it seems that prior to LLVM
> > > 13, cloning modules could fail with an assertion (not sure though
> > > if that would cause problems in a release build without
> > > assertions). Andres reported this problem back in the day here [1].
> > > In the meanwhile the issue got discussed in [2] and finally fixed
> > > for LLVM 13, see [3].
> >
> > Unfortunately that doesn't work right now - that's where I had
> > started. The problem is that IRMover renames types. Which, in the
> > case of cloned modules unfortunately means that types used in cloned
> > modules are also renamed in the "origin" module. Which then causes
> > problems down the line, because parts of the LLVM code match types by
> > type names.
> >
> > That can then have the effect of drastically decreasing code
> > generation quality over time, because e.g. inlining never manages to
> > find signatures compatible.

Hi,

Thanks for the patch, looks quite interesting!

First, to summarize things a bit: from what I understand there are two
suggestions on the table, one about caching modules when doing inlining,
the second about actual lazy jitting. Are the two tightly coupled
together, or could they be split into two separate patches? That would
make it a bit easier to review and test.

I was playing with the caching part of the patch (still have to think
about the lazy jitting), which generally seems to be a good idea. From
what I see, the main concern here is the chance that IRMover will rename
types, degrading the performance of the generated code in the long run.
I have to admit, I'm not fully sure the mentioned LLVM 13 fix [1]
completely eliminates those concerns; its description is formulated in a
rather noncommittal way ("don't know if this patch fixes ..., but it
does fix a few soundness issues that have crept in."). But I haven't
found any crashes or false asserts coming from the LLVM side (using
v13), running several rounds of regression tests with forced jitting, so
that's a point in favor of the fix.
I would be curious to learn how Andres was troubleshooting the type
renaming issues. Using LLVM 13 from packages, jitting the same query
twice and dumping the bitcode out showed some difference in types, a la
"%struct.ExprState.142" vs "%struct.ExprState.430" (from what I
understood, the whole thing is an identifier, including the number) with
the patch, but the very same changes happen on the main branch as well.
Of course, I was inspecting bitcode only for certain queries; that
doesn't rule out that some other examples actually feature type
renaming. In general, it would be interesting to know how to do some
sort of "smoke test" of the generated code, in case LLVM has fixed this
particular issue but it reappears in the future.

I did a few performance tests and got numbers similar to those posted in
the thread, with inlining time impressively reduced (~10 times) as well
as (suspiciously) optimization time (~1.5 times). The approach was a bit
different though: I ran the sequence of example queries from the thread
using pgbench and checked the jit counters from pgss.

A few small notes:

If caching of modules is safe for LLVM >= 13, I guess it should be
wrapped in a corresponding condition on LLVM_VERSION_MAJOR, right?

Why was the assert about hasExternalLinkage removed from
llvmjit_inline.cpp?

For so much discussion about such a small change, there is definitely
not enough commentary in the code about the dangers of cloning modules. A
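[Editor's note: for readers following along, a rough sketch of the
cache-and-clone scheme under discussion, using the LLVM C API calls named
above. This is not the patch's code; cache_lookup()/cache_insert() are
hypothetical stand-ins for the existing module cache, and memory-buffer
ownership details are elided.]

	#include <llvm-c/Core.h>
	#include <llvm-c/BitReader.h>

	/*
	 * Sketch: return a private copy of an inline module, keeping the fully
	 * deserialized original in a cache so it is parsed from disk only once.
	 */
	static LLVMModuleRef
	get_inline_module_copy(LLVMContextRef ctx, const char *path)
	{
		LLVMModuleRef cached = cache_lookup(path);	/* hypothetical */

		if (cached == NULL)
		{
			LLVMMemoryBufferRef buf;
			char	   *msg;

			if (LLVMCreateMemoryBufferWithContentsOfFile(path, &buf, &msg))
				return NULL;	/* could not read the .bc file */

			/*
			 * Fully deserialize (LLVMParseBitcodeInContext2) rather than
			 * lazily (LLVMGetBitcodeModuleInContext2): cloning requires a
			 * fully materialized module.
			 */
			if (LLVMParseBitcodeInContext2(ctx, buf, &cached))
				return NULL;

			cache_insert(path, cached);	/* hypothetical */
		}

		/* Hand a clone to the IRMover; the cached original stays reusable. */
		return LLVMCloneModule(cached);
	}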
Using WaitEventSet in the postmaster
Hi, Here's a work-in-progress patch that uses WaitEventSet for the main event loop in the postmaster, with a latch as the wakeup mechanism for "PM signals" (requests from backends to do things like start a background worker, etc). There are still raw signals that are part of the external interface (SIGHUP etc), but those handlers just set a flag and set the latch, instead of doing the state machine work. Some advantages I can think of: 1. Inherits various slightly more efficient modern kernel APIs for multiplexing. 2. Will automatically benefit from later improvements to WaitEventSet. 3. Looks much more like the rest of our code. 4. Requires no obscure signal programming knowledge to understand. 5. Removes the strange call stacks we have, where most of postgres is forked from inside a signal handler. 6. Might help with weirdness and bugs in some signal implementations (Cygwin, NetBSD?). 7. Removes the need to stat() PROMOTE_SIGNAL_FILE and LOGROTATE_SIGNAL_FILE whenever PM signals are sent, now that SIGUSR1 is less overloaded. 8. It's a small step towards removing the need to emulate signals on Windows. In order to avoid adding a new dependency on the contents of shared memory, I introduced SetLatchRobustly() that will always use the slow path kernel wakeup primitive, even in cases where SetLatch() would not. The idea here is that if one backend trashes shared memory, others backends can still wake the postmaster even though it may appear that the postmaster isn't waiting or the latch is already set. It would be possible to go further and have a robust wait mode that doesn't read is_set too. It was indecision here that stopped me proposing this sooner... One thing that might need more work is cleanup of the PM's WES in child processes. Also I noticed in passing that the Windows kernel event handles for latches are probably leaked on crash-reinit, but that was already true, this just adds one more. Also the way I re-add the latch every time through the event loop in case there was a crash-reinit is stupid, I'll tidy that up. This is something I extracted and rejuvenated from a larger set of patches I was hacking on a year or two ago to try to get rid of lots of users of raw signals. The recent thread about mamba's struggles and the possibility that latches might help reminded me to dust this part off, and potentially avoid some duplicated effort. I'm not saying this is free of bugs, but it's passing on CI and seems like enough to share and see what people think. (Some other ideas I thought about back then: we could invent WL_SIGNAL, and not need all those global flag variables or the handlers that set the latch. For eg kqueue it's trivial, and for ancient Unixes you could do a sort of home-made signalfd with a single generic signal handler that just does write(self_pipe, &siginfo, sizeof(siginfo)). But that starts to seems like refactoring for refactoring's sake; that's probably how I'd write a native kqueue program, but it's not even that obvious to me that we really should be pretending that Windows has signals, which put me off that idea. Perhaps in a post-fake-signals world we just have the postmaster's event loop directly consume commands from pg_ctl from a control pipe? Too many decisions at once, I gave that line of thinking up for now.) From 978aa358885312372f842cd47549bb04a78477ab Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Wed, 9 Nov 2022 22:59:58 +1300 Subject: [PATCH] Give the postmaster a WaitEventSet and a latch. 
Traditionally, the postmaster's architecture was quite unusual. It did its main work entirely inside signal handlers, which were only unblocked while waiting in select(). Switch to a more typical architecture, where signal handlers just set flags and use a latch to close races. Now the postmaster looks like all other PostgreSQL processes, multiplexing its event processing in epoll_wait()/kevent()/poll()/WaitForMultipleObjects() depending on the OS. Work in progress! --- src/backend/libpq/pqsignal.c| 40 src/backend/postmaster/postmaster.c | 336 ++-- src/backend/storage/ipc/latch.c | 54 - src/backend/storage/ipc/pmsignal.c | 26 ++- src/backend/utils/init/miscinit.c | 13 +- src/include/libpq/pqsignal.h| 3 - src/include/miscadmin.h | 1 + src/include/storage/latch.h | 9 +- src/include/storage/pmsignal.h | 3 + 9 files changed, 259 insertions(+), 226 deletions(-) diff --git a/src/backend/libpq/pqsignal.c b/src/backend/libpq/pqsignal.c index 1ab34c5214..718043a39d 100644 --- a/src/backend/libpq/pqsignal.c +++ b/src/backend/libpq/pqsignal.c @@ -97,43 +97,3 @@ pqinitmask(void) sigdelset(&StartupBlockSig, SIGALRM); #endif } - -/* - * Set up a postmaster signal handler for signal "signo" - * - * Returns the previous handler. - * - * This is used only in the postmaster, which has its own odd approach to - * signal handling. For signals with handlers, we b
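[Editor's note: a rough illustration of SetLatchRobustly() as described
above -- a sketch under the stated design, not the patch's actual code.
Unlike SetLatch(), it skips the "already set" and "anyone waiting?"
shortcuts, so the kernel wakeup is sent even if shared memory has been
trashed. Only the POSIX path is shown.]

	/*
	 * Sketch only: always take the slow kernel wakeup path.  SetLatch()
	 * normally returns early if is_set is already true or no one appears
	 * to be waiting; a corrupted shared latch could make those checks lie.
	 */
	void
	SetLatchRobustly(Latch *latch)
	{
		latch->is_set = true;

		pg_memory_barrier();

		/* Unconditionally signal the owner rather than trusting any state. */
		if (latch->owner_pid != 0)
			kill(latch->owner_pid, SIGURG);
	}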
Re: Error-safe user functions
On Thu, Dec 1, 2022 at 3:49 PM Tom Lane wrote: > > I don't think it's sensible to make decisions about important syntax > > on the basis of byte-length. Reindenting is a one-time effort; code > > clarity will be with us forever. > > Sure, but without a proposal for a better name, that's irrelevant. Sure, but you're far too clever not to be able to come up with something good without any help from me. io_error_return_or_throw()? store_or_report_io_error()? Or just store_io_error()? It sounds to me like we're crafting something that is specific to and can only be used with type input and output functions, so the name probably should reflect that rather than being something totally generic like ereturn() or error_stash() or whatever. If we were making this into a general-purpose way of sticking an error someplace, then a name like that would make sense and this would be an extension of the elog.c interface. But what you're proposing is a great deal more specialized than that. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Allow round() function to accept float and double precision
On Fri, 2 Dec 2022 at 09:02, Tom Lane wrote: > > David Rowley writes: > > I don't really agree that it will work fine in all cases though. If > > the numeric has more than 1000 digits left of the decimal point then > > the method won't work at all. > > But what we're talking about is starting from a float4 or float8 > input, so it can't be more than ~308 digits. I may have misunderstood. I thought David J was proposing this as a useful method for rounding numeric too. Re-reading what he wrote, I no longer think he was. David
Re: Questions regarding distinct operation implementation
On Fri, 2 Dec 2022 at 08:10, Ankit Kumar Pandey wrote:
> select avg(distinct id) over (partition by name) from mytable (in oracle db)
> yields:
> 2
> 2
> 2
> 2
> 10
>
> From this, it is seen distinct is taken across the all rows in the partition.

Due to the lack of an ORDER BY clause, all rows in the partition are in the window frame at once. The question is, what *should* happen if you add an ORDER BY.

Looking at the copy of the standard that I have, I see nothing explicitly mentioned about aggregates with DISTINCT used as window functions; however, I do see in the Window Function section:

"The window aggregate functions compute an aggregate (COUNT, SUM, AVG, etc.), the same as a group aggregate function, except that the computation aggregates over the window frame of a row rather than over a group of a grouped table. The hypothetical set functions are not permitted as window aggregate functions."

So you could deduce that the DISTINCT would also need to be applied over the frame too.

The question is, what do you want to make work? If you're not worried about supporting DISTINCT when there is an ORDER BY clause and the frame options are effectively ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, then it's going to be much easier to make work. You never need to worry about rows dropping out of visibility in the frame; simply, all rows in the partition are in the frame.

You do need to be careful as, if I remember correctly, we do support some non-standard things here. I believe the standard requires an ORDER BY when specifying frame options. I think we didn't see any technical reason to apply that limitation, so didn't. That means in Postgres, you can do things like:

select avg(id) over (partition by name ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING) from mytable;

but that's unlikely to work on most other databases without adding an ORDER BY. So if you are going to limit this to only being supported without an ORDER BY, then you'll need to ensure that the specified frame options don't cause your code to break. I'm unsure, but this might be a case of checking for FRAMEOPTION_NONDEFAULT unless it's FRAMEOPTION_START_UNBOUNDED_PRECEDING|FRAMEOPTION_END_UNBOUNDED_FOLLOWING. You'll need to study that a bit more than I just did, though.

One way to make that work might be to add code to eval_windowaggregates() around the call to advance_windowaggregate(). You can see the row being aggregated is set by:

winstate->tmpcontext->ecxt_outertuple = agg_row_slot;

What you'd need to do here is change the code so that you put all the rows to aggregate into a tuplesort, sort them by the distinct column, and then feed the sorted rows to advance_windowaggregate() instead. You'd need to add code similar to what is in process_ordered_aggregate_single() in nodeAgg.c to skip consecutive duplicate rows.

Just a word of warning on this: this is a hugely complex area of Postgres. If I were you, I'd make sure and spend quite a bit of time reading nodeWindowAgg.c and likely much of nodeAgg.c. Any changes we accept in that area are going to have to be very carefully done. Make sure you're comfortable with the code before doing too much. It would be very easy to end up with a giant mess if you try to do this without fully understanding the implications of your changes. Also, you'll need to show you've not regressed the performance of the existing features with the code you've added.

Good luck!

David
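To illustrate the frame-options check David alludes to, a rough sketch (mine, not from any patch; hasDistinct is a hypothetical flag, and the right home for such a check might well be the planner or ExecInitWindowAgg rather than the executor loop):

	/*
	 * Sketch: reject DISTINCT window aggregates unless the frame is
	 * effectively the whole partition.  FRAMEOPTION_* flags come from
	 * nodes/parsenodes.h; an explicitly specified ROWS BETWEEN UNBOUNDED
	 * PRECEDING AND UNBOUNDED FOLLOWING still passes this test.
	 */
	int			frameOptions = winstate->frameOptions;

	if (hasDistinct &&
		(frameOptions & FRAMEOPTION_NONDEFAULT) &&
		!((frameOptions & FRAMEOPTION_START_UNBOUNDED_PRECEDING) &&
		  (frameOptions & FRAMEOPTION_END_UNBOUNDED_FOLLOWING)))
		ereport(ERROR,
				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
				 errmsg("DISTINCT is not supported for window functions with these frame options")));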
Re: O(n) tasks cause lengthy startups and checkpoints
On Wed, Nov 30, 2022 at 05:27:10PM +0530, Bharath Rupireddy wrote: > On Wed, Nov 30, 2022 at 4:52 PM Bharath Rupireddy > wrote: >> Thanks for the patches. I spent some time on reviewing v17 patch set >> and here are my comments: Thanks for reviewing! >> 0001: >> 1. I think the custodian process needs documentation - it needs a >> definition in glossary.sgml and perhaps a dedicated page describing >> what tasks it takes care of. Good catch. I added this in v18. I stopped short of adding a dedicated page to describe the tasks because 1) there are no parameters for the custodian and 2) AFAICT none of its tasks are described in the docs today. >> 2. >> +LWLockReleaseAll(); >> +ConditionVariableCancelSleep(); >> +AbortBufferIO(); >> +UnlockBuffers(); >> +ReleaseAuxProcessResources(false); >> +AtEOXact_Buffers(false); >> +AtEOXact_SMgr(); >> +AtEOXact_Files(false); >> +AtEOXact_HashTables(false); >> Do we need all of these in the exit path? Isn't the stuff that >> ShutdownAuxiliaryProcess() does enough for the custodian process? >> AFAICS, the custodian process uses LWLocks (which the >> ShutdownAuxiliaryProcess() takes care of) and it doesn't access shared >> buffers and so on. >> Having said that, I'm fine to keep them for future use and all of >> those cleanup functions exit if nothing related occurs. Yeah, I don't think we need a few of these. In v18, I've kept the following: * LWLockReleaseAll() * ConditionVariableCancelSleep() * ReleaseAuxProcessResources(false) * AtEOXact_Files(false) >> 3. >> + * Advertise out latch that backends can use to wake us up while we're >> Typo - %s/out/our fixed >> 4. Is it a good idea to add log messages in the DoCustodianTasks() >> loop? Maybe at a debug level? The log message can say the current task >> the custodian is processing. And/Or setting the custodian's status on >> the ps display is also a good idea IMO. I'd like to pick these up in a new thread if/when this initial patch set is committed. The tasks already do some logging, and the checkpointer process doesn't update the ps display for these tasks today. >> 0002 and 0003: >> 1. >> +CHECKPOINT; >> +DO $$ >> I think we need to ensure that there are some snapshot files before >> the checkpoint. Otherwise, it may happen that the above test case >> exits without the custodian process doing anything. >> >> 2. I think the best way to test the custodian process code is by >> adding a TAP test module to see actually the custodian process kicks >> in. Perhaps, add elog(DEBUGX,...) messages to various custodian >> process functions and see if we see the logs in server logs. The test appears to reliably create snapshot and mapping files, so if the directories are empty at some point after the checkpoint at the end, we can be reasonably certain the custodian took action. I didn't add explicit checks that there are files in the directories before the checkpoint because a concurrent checkpoint could make such checks unreliable. >> 0004: >> I think the 0004 patch can be merged into 0001, 0002 and 0003 patches. >> Otherwise the patch LGTM. I'm keeping this one separate because I've received conflicting feedback about the idea. >> 1. I think we can trivially extend the custodian process to remove any >> future WAL files on the old timeline, something like the attached >> 0001-Move-removal-of-future-WAL-files-on-the-old-timeline.text file). >> While this offloads the recovery a bit, the server may archive such >> WAL files before the custodian removes them. 
We can do a bit more to >> stop the server from archiving such WAL files, but that needs more >> coding. I don't think we need to do all that now, perhaps, we can give >> it a try once the basic custodian stuff gets in. >> 2. Moving RemovePgTempFiles() to the custodian can bring up the server >> soon. The idea is that the postmaster just renames the temp >> directories and informs the custodian so that it can go delete such >> temp files and directories. I have personally seen cases where the >> server spent a good amount of time cleaning up temp files. We can park >> it for later. >> 3. Moving RemoveOldXlogFiles() to the custodian can make checkpoints faster. >> 4. PreallocXlogFiles() - if we ever have plans to make pre-allocation >> more aggressive (pre-allocate more than 1 WAL file), perhaps letting >> custodian do that is a good idea. Again, too many tasks for a single >> process. I definitely want to do #2. I have some patches for that upthread, but I removed them for now based on Simon's feedback. I intend to pick that up in a new thread. I haven't thought too much about the others yet. > Another comment: > IIUC, there's no custodian_delay GUC as we want to avoid unnecessary > wakeups for power savings (being discussed in the other thread). > However, can it happen that the custodian missed to capture SetLatch > wakeups by other backends? In o
Re: Allow round() function to accept float and double precision
On Thu, Dec 1, 2022 at 2:21 PM David Rowley wrote:
> On Fri, 2 Dec 2022 at 09:02, Tom Lane wrote:
> >
> > David Rowley writes:
> > > I don't really agree that it will work fine in all cases though. If
> > > the numeric has more than 1000 digits left of the decimal point then
> > > the method won't work at all.
> >
> > But what we're talking about is starting from a float4 or float8
> > input, so it can't be more than ~308 digits.
>
> I may have misunderstood. I thought David J was proposing this as a
> useful method for rounding numeric too. Re-reading what he wrote, I no
> longer think he was.

I was not; my response was that what is being asked for is basically a cast from float to numeric, and doing that via a "round()" function seems odd. And we can handle the desired rounding aspect of that process already via the existing numeric(1000, n) syntax.

David J.
Re: Questions regarding distinct operation implementation
On Thu, Dec 1, 2022 at 2:37 PM David Rowley wrote:
>
> The question is, what do you want to make work? If you're not worried
> about supporting DISTINCT when there is an ORDER BY clause and the
> frame options are effectively ROWS BETWEEN UNBOUNDED PRECEDING AND
> UNBOUNDED FOLLOWING, then it's going to be much easier to make work.
> You never need to worry about rows dropping out of visibility in the
> frame. Simply all rows in the partition are in the frame.

I would definitely want the ability to have the output ordered and distinct at the same time:

array_agg(distinct col) over (order by whatever)

Conceptually this seems like it can be trivially accomplished with a simple lookup table, the key being the distinct column(s) and the value being a counter - with the entry being removed when the counter goes to zero (decreases happening each time a row goes out of scope). The main concern, I suspect, isn't implementation ability; it is speed and memory consumption. I would expect the distinct output to be identical to the non-distinct output except with duplicates removed. Using array_agg as an example makes seeing the distinction quite easy.

Thinking over the above a bit more, is something like this possible?

array_agg(distinct col order by col) over (order by whatever)

i.e., can we add an order by within the aggregate to control its internal ordering separately from the ordering needed for the window framing?

David J.
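For illustration, the counting lookup table sketched above could be built on the backend's dynahash API roughly as follows. This is my sketch, not David J.'s code; note that hashing the raw Datum as done here is only valid for pass-by-value types, so a real implementation would have to use the distinct column's hash and equality support functions instead.

	#include "utils/hsearch.h"

	typedef struct DistinctEntry
	{
		Datum		key;			/* the DISTINCT column's value */
		int64		count;			/* number of in-frame rows with this value */
	} DistinctEntry;

	static HTAB *
	create_distinct_table(void)
	{
		HASHCTL		ctl;

		ctl.keysize = sizeof(Datum);
		ctl.entrysize = sizeof(DistinctEntry);
		return hash_create("window distinct counts", 256, &ctl,
						   HASH_ELEM | HASH_BLOBS);
	}

	/* A row carrying "val" enters the frame. */
	static void
	row_enters_frame(HTAB *tab, Datum val)
	{
		bool		found;
		DistinctEntry *e = hash_search(tab, &val, HASH_ENTER, &found);

		if (!found)
			e->count = 0;			/* first sighting: aggregate this value */
		e->count++;
	}

	/* A row carrying "val" leaves the frame. */
	static void
	row_leaves_frame(HTAB *tab, Datum val)
	{
		DistinctEntry *e = hash_search(tab, &val, HASH_FIND, NULL);

		if (e != NULL && --e->count == 0)
			hash_search(tab, &val, HASH_REMOVE, NULL);	/* no longer in frame */
	}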
Re: Error-safe user functions
Robert Haas writes: > It sounds to me like we're crafting something that is specific to and > can only be used with type input and output functions, so the name > probably should reflect that rather than being something totally > generic like ereturn() or error_stash() or whatever. My opinion is exactly the opposite. Don't we already have a need for error-safe type conversions, too, in the JSON stuff? Even if I couldn't point to a need-it-now requirement, I think we will eventually find a use for this with some other classes of functions. > If we were making > this into a general-purpose way of sticking an error someplace, then a > name like that would make sense and this would be an extension of the > elog.c interface. But what you're proposing is a great deal more > specialized than that. I'm proposing *exactly* an extension of the elog.c interface; so were you, a couple messages back. It's only specialized to I/O in the sense that our current need is for that. regards, tom lane
Re: Asynchronous execution support for Custom Scan
> > IIUC, we already can use ctid in the where clause on the latest
> > PostgreSQL, can't we?
> Oh, sorry, I missed the TidRangeScan. My apologies for the noise.

I made the ctidscan extension when we developed the CustomScan API towards v9.5 or v9.6, IIRC. It would make sense purely as an example of the CustomScan API (e.g., the PG-Strom code is too large and complicated to learn about this API from); however, it makes no sense from the standpoint of functionality.

Best regards,

On Thu, Dec 1, 2022 at 22:27, Ronan Dunklau wrote:
> > IIUC, we already can use ctid in the where clause on the latest
> > PostgreSQL, can't we?
> Oh, sorry, I missed the TidRangeScan. My apologies for the noise.
>
> Best regards,
>
> --
> Ronan Dunklau

--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei
Using AF_UNIX sockets always for tests on Windows
Hi,

Commit f5580882 established that all supported computers have AF_UNIX. One of the follow-up consequences that was left unfinished is that we could simplify our test harness code to make it the same on all platforms. Currently we have hundreds of lines of C and perl to use secure TCP connections instead for the benefit of defunct Windows versions. Here's a patch set for that.

These patches and some discussion of them were buried in the recent clean-up-after-recently-dropped-obsolete-systems thread[1], and I didn't want to lose track of them. I think they would need review and testing from a Windows-based hacker to make progress. The patches are:

1. Teach mkdtemp() to make a non-world-accessible directory. This is required to be able to make a socket that other processes can't connect to, to match the paranoia level used on Unix. This was written just by reading documentation, because I am not a Windows user, so I would be grateful for a second opinion and/or testing from a Windows hacker, which would involve testing with two different users. The idea is that Windows' mkdir() completely ignores the permissions (we can see in the mingw headers that it literally throws away the mode argument), so we shouldn't use that, but native CreateDirectory() when given a pointer to a SECURITY_ATTRIBUTES with lpSecurityDescriptor set to NULL should only allow the current user to access the object (directory). Does this really work, and would it be better to create some more explicit private-keep-out SECURITY_ATTRIBUTES, and how would that look? I'm fairly sure that filesystem permissions must be enough to stop another OS user from connecting, because it's clear from documentation that AF_UNIX works on Windows by opening the file and reading some magic "reparse" data from inside it, so if you can't see into the containing directory, you can't do it. This patch is the one the rest are standing on, because the tests should match Unix in their level of security.

2. Always use AF_UNIX for pg_regress. Remove a bunch of no-longer-needed sspi auth stuff. Remove comments that worried about signal handler safety (referring here to real Windows signals, not fake PostgreSQL signals that are a backend-only concept). By my reading of the Windows documentation and our code, there is no real concern here, so the remove_temp() stuff should be fine, as I have explained in a new comment. But I have not tested this signal safety claim, not being a Windows user. I added an assertion that should hold if I am right. If you run this on Windows and interrupt pg_regress with ^C, does it hold?

3. Use AF_UNIX for TAP tests too.

4. In passing, remove our documentation's claim that Linux's "abstract" AF_UNIX namespace is available on Windows. It does not work at all, according to all reports (IMHO it seems like an inherently insecure interface that other OSes would be unlikely to adopt).

Note that this thread is not about libpq, which differs from Unix by defaulting to host=localhost rather than AF_UNIX IIRC. That's a user-facing policy decision I'm not touching; this thread is just about cleaning up old test infrastructure of interest to hackers.

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKGJ3LHeP9w5Fgzdr4G8AnEtJ%3Dz%3Dp6hGDEm4qYGEUX5B6fQ%40mail.gmail.com

From 62b1cdbdc848f96eef02ed97f14be9c1f4557539 Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Wed, 7 Sep 2022 07:35:11 +1200
Subject: [PATCH 1/4] WIP: Make mkdtemp() more secure on Windows.
Our POSIX mkdtemp() implementation in src/port/mkdtemp.c would create directories with default permissions on Windows. Fix, using the native Windows API instead of mkdir().
---
 src/port/mkdtemp.c | 12
 1 file changed, 12 insertions(+)

diff --git a/src/port/mkdtemp.c b/src/port/mkdtemp.c
index 8809957dcd..8116317435 100644
--- a/src/port/mkdtemp.c
+++ b/src/port/mkdtemp.c
@@ -187,8 +187,20 @@ GETTEMP(char *path, int *doopen, int domkdir)
 	}
 	else if (domkdir)
 	{
+#ifdef WIN32
+		SECURITY_ATTRIBUTES sa = {
+			.nLength = sizeof(SECURITY_ATTRIBUTES),
+			.lpSecurityDescriptor = NULL,
+			.bInheritHandle = false
+		};
+
+		if (CreateDirectory(path, &sa))
+			return 1;
+		_dosmaperr(GetLastError());
+#else
 		if (mkdir(path, 0700) >= 0)
 			return 1;
+#endif
 		if (errno != EEXIST)
 			return 0;
 	}
--
2.38.1

From 388719a6fbb45896ff87794ed4bfdbc0f0aac329 Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Fri, 19 Aug 2022 11:28:38 +1200
Subject: [PATCH 2/4] WIP: Always use Unix-domain sockets in pg_regress on Windows.

Since we can now rely on Unix-domain sockets working on supported Windows versions (10+), we can remove a source of instability and a difference between Unix and Windows in pg_regress.

Previously, we thought the socket cleanup code was unsafe, so we made Unix-domain sockets an option with a "use-at-your-own-risk" note. On closer inspection, the concerns about signal handlers don't seem to apply here. (initdb.c h
Re: HOT chain validation in verify_heapam()
Hi,

On 2022-11-30 16:09:19 +0530, Himanshu Upadhyaya wrote:
> has been updated to handle cases of sub-transaction.

Thanks!

> +	/* Loop over offsets and validate the data in the predecessor array. */
> +	for (OffsetNumber currentoffnum = FirstOffsetNumber; currentoffnum <= maxoff;
> +		 currentoffnum = OffsetNumberNext(currentoffnum))
> +	{
> +		HeapTupleHeader pred_htup;
> +		HeapTupleHeader curr_htup;
> +		TransactionId pred_xmin;
> +		TransactionId curr_xmin;
> +		ItemId		pred_lp;
> +		ItemId		curr_lp;
> +		bool		pred_in_progress;
> +		XidCommitStatus xid_commit_status;
> +		XidBoundsViolation xid_status;
> +
> +		ctx.offnum = predecessor[currentoffnum];
> +		ctx.attnum = -1;
> +		curr_lp = PageGetItemId(ctx.page, currentoffnum);
> +		if (!lp_valid[currentoffnum] || ItemIdIsRedirected(curr_lp))
> +			continue;

I don't think we should do PageGetItemId(ctx.page, currentoffnum); if !lp_valid[currentoffnum].

> +		curr_htup = (HeapTupleHeader) PageGetItem(ctx.page, curr_lp);
> +		curr_xmin = HeapTupleHeaderGetXmin(curr_htup);
> +		xid_status = get_xid_status(curr_xmin, &ctx, &xid_commit_status);
> +		if (!(xid_status == XID_BOUNDS_OK || xid_status == XID_INVALID))
> +			continue;

Why can we even get here if the xid status isn't XID_BOUNDS_OK?

> +		if (ctx.offnum == 0)

For one, I think it'd be better to use InvalidOffsetNumber here. But more generally, storing the predecessor in ctx.offnum seems quite confusing.

> +		{
> +			/*
> +			 * No harm in overriding value of ctx.offnum as we will always
> +			 * continue if we are here.
> +			 */
> +			ctx.offnum = currentoffnum;
> +			if (TransactionIdIsInProgress(curr_xmin) || TransactionIdDidCommit(curr_xmin))

Is it actually ok to call TransactionIdDidCommit() here? There's a reason get_xid_status() exists after all. And we do have the xid status for xmin already, so this could just check xid_commit_status, no?

Greetings,

Andres Freund
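Something like the following is presumably what the last comment has in mind: a sketch that reuses verify_heapam.c's existing get_xid_status() result instead of hitting the transaction-status machinery a second time (the exact set of statuses to treat as "live" is my assumption):

	xid_status = get_xid_status(curr_xmin, &ctx, &xid_commit_status);
	if (xid_status == XID_BOUNDS_OK &&
		(xid_commit_status == XID_IN_PROGRESS ||
		 xid_commit_status == XID_IS_CURRENT_XID ||
		 xid_commit_status == XID_COMMITTED))
	{
		/* treat curr_xmin as live/committed, with no extra clog lookups */
	}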
Re: wake up logical workers after ALTER SUBSCRIPTION
On Tue, Nov 29, 2022 at 09:04:41PM -0800, Nathan Bossart wrote: > I don't mind fixing it! There are a couple more I'd like to track down > before posting another revision. Okay, here is a new version of the patch. This seems to clear up everything that I could find via the tests. Thanks to this effort, I discovered that we need to include wal_retrieve_retry_interval in our wait time calculations after failed tablesyncs (for the suppress-unnecessary-wakeups patch). I'll make that change and post that patch in a new thread. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 79db8837e5b3054fb6007326d230aef50f3cda87 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Mon, 21 Nov 2022 16:01:01 -0800 Subject: [PATCH v4 1/1] wake up logical workers after ALTER SUBSCRIPTION and when two_phase mode can be enabled --- src/backend/access/transam/xact.c| 3 ++ src/backend/catalog/pg_subscription.c| 7 src/backend/commands/alter.c | 7 src/backend/commands/subscriptioncmds.c | 4 +++ src/backend/replication/logical/worker.c | 46 src/include/replication/logicalworker.h | 3 ++ 6 files changed, 70 insertions(+) diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index 8086b857b9..dc00e66cfb 100644 --- a/src/backend/access/transam/xact.c +++ b/src/backend/access/transam/xact.c @@ -47,6 +47,7 @@ #include "pgstat.h" #include "replication/logical.h" #include "replication/logicallauncher.h" +#include "replication/logicalworker.h" #include "replication/origin.h" #include "replication/snapbuild.h" #include "replication/syncrep.h" @@ -2360,6 +2361,7 @@ CommitTransaction(void) AtEOXact_PgStat(true, is_parallel_worker); AtEOXact_Snapshot(true, false); AtEOXact_ApplyLauncher(true); + AtEOXact_LogicalRepWorkers(true); pgstat_report_xact_timestamp(0); CurrentResourceOwner = NULL; @@ -2860,6 +2862,7 @@ AbortTransaction(void) AtEOXact_HashTables(false); AtEOXact_PgStat(false, is_parallel_worker); AtEOXact_ApplyLauncher(false); + AtEOXact_LogicalRepWorkers(false); pgstat_report_xact_timestamp(0); } diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c index a506fc3ec8..d9f21b7dcf 100644 --- a/src/backend/catalog/pg_subscription.c +++ b/src/backend/catalog/pg_subscription.c @@ -25,6 +25,7 @@ #include "catalog/pg_type.h" #include "miscadmin.h" #include "nodes/makefuncs.h" +#include "replication/logicalworker.h" #include "storage/lmgr.h" #include "utils/array.h" #include "utils/builtins.h" @@ -320,6 +321,12 @@ UpdateSubscriptionRelState(Oid subid, Oid relid, char state, /* Update the catalog. */ CatalogTupleUpdate(rel, &tup->t_self, tup); + /* + * We might be ready to enable two_phase mode, which requires the workers + * to restart. Wake them up at commit time to check the status. + */ + LogicalRepWorkersWakeupAtCommit(subid); + /* Cleanup. */ table_close(rel, NoLock); } diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c index 10b6fe19a2..da14e91552 100644 --- a/src/backend/commands/alter.c +++ b/src/backend/commands/alter.c @@ -59,6 +59,7 @@ #include "commands/user.h" #include "miscadmin.h" #include "parser/parse_func.h" +#include "replication/logicalworker.h" #include "rewrite/rewriteDefine.h" #include "tcop/utility.h" #include "utils/builtins.h" @@ -410,6 +411,12 @@ ExecRenameStmt(RenameStmt *stmt) stmt->newname); table_close(catalog, RowExclusiveLock); +/* + * Wake up the logical replication workers to handle this + * change quickly. 
+ */ +LogicalRepWorkersWakeupAtCommit(address.objectId); + return address; } diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index d673557ea4..e29cdcbff9 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -34,6 +34,7 @@ #include "nodes/makefuncs.h" #include "pgstat.h" #include "replication/logicallauncher.h" +#include "replication/logicalworker.h" #include "replication/origin.h" #include "replication/slot.h" #include "replication/walreceiver.h" @@ -1358,6 +1359,9 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt, table_close(rel, RowExclusiveLock); + /* Wake up the logical replication workers to handle this change quickly. */ + LogicalRepWorkersWakeupAtCommit(subid); + ObjectAddressSet(myself, SubscriptionRelationId, subid); InvokeObjectPostAlterHook(SubscriptionRelationId, subid, 0); diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c index e48a3f589a..c39ca5a3cb 100644 --- a/src/backend/replication/logical/worker.c +++ b/src/backend/replication/logical/worker.c @@ -253,6 +253,8 @@ WalReceiverConn *LogRepWorkerWalRcvConn = NULL; Subscription *MySubscription = NULL; static bool MySubscriptionValid = false; +static List *on_commit_w
Re: Avoid LWLockWaitForVar() for currently held WAL insertion lock in WaitXLogInsertionsToFinish()
On 2022-11-25 16:54:19 +0530, Bharath Rupireddy wrote:
> On Fri, Nov 25, 2022 at 12:16 AM Andres Freund wrote:
> > I think we could improve this code more significantly by avoiding the call to
> > LWLockWaitForVar() for all locks that aren't acquired or don't have a
> > conflicting insertingAt, that'd require just a bit more work to handle
> > systems without tear-free 64bit writes/reads.
> >
> > The easiest way would probably be to just make insertingAt a 64bit atomic,
> > that transparently does the required work to make even non-atomic
> > read/writes tear free. Then we could trivially avoid the spinlock in
> > LWLockConflictsWithVar(), LWLockReleaseClearVar() and with just a bit more
> > work add a fastpath to LWLockUpdateVar(). We don't need to acquire the wait
> > list lock if there aren't any waiters.
> >
> > I'd bet that start to have visible effects in a workload with many small
> > records.
>
> Thanks Andres! I quickly came up with the attached patch. I also ran
> an insert test [1]; the results are below. I also attached the results
> graph. The cirrus-ci is happy with the patch -
> https://github.com/BRupireddy/postgres/tree/wal_insertion_lock_improvements_v1_2.
> Please let me know if the direction of the patch seems right.
>
> clients    HEAD      PATCHED
> 1          1354      1499
> 2          1451      1464
> 4          3069      3073
> 8          5712      5797
> 16         11331     11157
> 32         22020     22074
> 64         41742     42213
> 128        71300     76638
> 256        103652    118944
> 512        111250    161582
> 768        99544     161987
> 1024       96743     164161
> 2048       72711     156686
> 4096       54158     135713

Nice.

> From 293e789f9c1a63748147acd613c556961f1dc5c4 Mon Sep 17 00:00:00 2001
> From: Bharath Rupireddy
> Date: Fri, 25 Nov 2022 10:53:56 +
> Subject: [PATCH v1] WAL Insertion Lock Improvements
>
> ---
>  src/backend/access/transam/xlog.c | 8 +++--
>  src/backend/storage/lmgr/lwlock.c | 56 +--
>  src/include/storage/lwlock.h      | 7 ++--
>  3 files changed, 41 insertions(+), 30 deletions(-)
>
> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> index a31fbbff78..b3f758abb3 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -376,7 +376,7 @@ typedef struct XLogwrtResult
>  typedef struct
>  {
>  	LWLock		lock;
> -	XLogRecPtr	insertingAt;
> +	pg_atomic_uint64	insertingAt;
>  	XLogRecPtr	lastImportantAt;
>  } WALInsertLock;
>
> @@ -1482,6 +1482,10 @@ WaitXLogInsertionsToFinish(XLogRecPtr upto)
>  	{
>  		XLogRecPtr	insertingat = InvalidXLogRecPtr;
>
> +		/* Quickly check and continue if no one holds the lock. */
> +		if (!IsLWLockHeld(&WALInsertLocks[i].l.lock))
> +			continue;

I'm not sure this is quite right - don't we need a memory barrier? But I don't see a reason to not just leave this code as-is. I think this should be optimized entirely in lwlock.c.

I'd probably split the change to an atomic from other changes either way.

Greetings,

Andres Freund
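For reference, the spinlock-free read being suggested could look roughly like this in LWLockConflictsWithVar(), assuming insertingAt becomes a pg_atomic_uint64. This is a sketch that keeps the existing function's result conventions and glosses over any additional barrier requirements:

	static bool
	LWLockConflictsWithVar(LWLock *lock, pg_atomic_uint64 *valptr,
						   uint64 oldval, uint64 *newval, bool *result)
	{
		bool		mustwait;
		uint64		value;

		/* If nobody holds the lock exclusively, there is nothing to wait for. */
		mustwait = (pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE) != 0;
		if (!mustwait)
		{
			*result = true;
			return false;
		}
		*result = false;

		/*
		 * pg_atomic_read_u64() is tear-free even on platforms without native
		 * 64-bit atomics, so the wait-list lock previously taken just to
		 * read the value would no longer be needed here.
		 */
		value = pg_atomic_read_u64(valptr);
		if (value != oldval)
		{
			mustwait = false;
			*newval = value;
		}
		else
			mustwait = true;

		return mustwait;
	}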
Re: pg_upgrade: Make testing different transfer modes easier
At Thu, 1 Dec 2022 16:18:21 +0100, Peter Eisentraut wrote in
> I wanted to test the different pg_upgrade transfer modes (--link,
> --clone), but that was not that easy, because there is more than one
> place in the test script you have to find and manually change. So I
> wrote a little patch to make that easier. It's still manual, but it's
> a start. (In principle, we could automatically run the tests with
> each supported mode in a loop, but that would be very slow.)
>
> While doing that, I also found it strange that the default transfer
> mode (referred to as "copy" internally) did not have any external
> representation, so it is awkward to refer to it in text, and obscure
> to see where it is used for example in those test scripts. So I added
> an option --copy, which effectively does nothing, but it's not
> uncommon to have options that select default behaviors explicitly. (I

I don't have a clear idea of whether that is common or not, but I suppose many such commands allow choosing the default behavior via a configuration file or an environment variable, etc. Still, I don't mind the command having the effectively-nop option just for completeness.

> also thought about something like a "mode" option with an argument,
> but given that we already have --link and --clone, this seemed the
> most sensible.)
>
> Thoughts?

When I read up to the point about the --copy option, what came to my mind was a --mode= option. IMHO, if I were going to add an option to choose the copy behavior, I would add a --mode option instead, like pg_ctl does, as it implicitly signals that the suboptions are mutually exclusive.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Using AF_UNIX sockets always for tests on Windows
Thomas Munro writes: > Commit f5580882 established that all supported computers have AF_UNIX. > One of the follow-up consequences that was left unfinished is that we > could simplify our test harness code to make it the same on all > platforms. Currently we have hundreds of lines of C and perl to use > secure TCP connections instead for the benefit of defunct Windows > versions. Here's a patch set for that. If we remove that, won't we have a whole lot of code that's not tested at all on any platform, ie all the TCP-socket code? regards, tom lane
Re: Using WaitEventSet in the postmaster
Hi, On 2022-12-02 10:12:25 +1300, Thomas Munro wrote: > Here's a work-in-progress patch that uses WaitEventSet for the main > event loop in the postmaster Wee! > with a latch as the wakeup mechanism for "PM signals" (requests from > backends to do things like start a background worker, etc). Hm - is that directly related? ISTM that using a WES in the main loop, and changing pmsignal.c to a latch are somewhat separate things? Using a latch for pmsignal.c seems like a larger lift, because it means that all of latch.c needs to be robust against a corrupted struct Latch. > In order to avoid adding a new dependency on the contents of shared > memory, I introduced SetLatchRobustly() that will always use the slow > path kernel wakeup primitive, even in cases where SetLatch() would > not. The idea here is that if one backend trashes shared memory, > others backends can still wake the postmaster even though it may > appear that the postmaster isn't waiting or the latch is already set. Why is that a concern that needs to be addressed? ISTM that the important thing is that either a) the postmaster's latch can't be corrupted, because it's not shared with backends or b) struct Latch can be overwritten with random contents without causing additional problems in postmaster. I don't think b) is the case as the patch stands. Imagine some process overwriting pm_latch->owner_pid. That'd then break the SetLatch() in postmaster's signal handler, because it wouldn't realize that itself needs to be woken up, and we'd just signal some random process. It doesn't seem trivial (but not impossible either) to make SetLatch() robust against arbitrary corruption. So it seems easier to me to just put the latch in process local memory, and do a SetLatch() in postmaster's SIGUSR1 handler. Greetings, Andres Freund
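In code, the arrangement Andres describes might amount to no more than this (a sketch; the names are mine, and InitLatch() on the local latch at postmaster startup is assumed):

	static Latch pm_local_latch;	/* hypothetical: postmaster-private memory */

	static void
	postmaster_sigusr1_handler(SIGNAL_ARGS)
	{
		int			save_errno = errno;

		/*
		 * The PM-signal flags in shared memory may be garbage after a
		 * crash, but a latch in local memory cannot be corrupted by
		 * backends; the main loop re-checks the flags defensively after
		 * waking.
		 */
		SetLatch(&pm_local_latch);

		errno = save_errno;
	}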
Re: Using AF_UNIX sockets always for tests on Windows
Hi, On 2022-12-01 20:30:36 -0500, Tom Lane wrote: > Thomas Munro writes: > > Commit f5580882 established that all supported computers have AF_UNIX. > > One of the follow-up consequences that was left unfinished is that we > > could simplify our test harness code to make it the same on all > > platforms. Currently we have hundreds of lines of C and perl to use > > secure TCP connections instead for the benefit of defunct Windows > > versions. Here's a patch set for that. > > If we remove that, won't we have a whole lot of code that's not > tested at all on any platform, ie all the TCP-socket code? There's some coverage via the auth and ssl tests. But I agree it's an issue. But to me the fix for that seems to be to add a dedicated test for that, rather than relying on windows to test our socket code - that's quite a few separate code paths from the tcp support of other platforms. Greetings, Andres Freund
Re: Prefetch the next tuple's memory during seqscans
On Thu, 1 Dec 2022 at 18:18, John Naylor wrote:
> I then tested a Power8 machine (also kernel 3.10 gcc 4.8). Configure reports
> "checking for __builtin_prefetch... yes", but I don't think it does anything
> here, as the results are within noise level. A quick search didn't turn up
> anything informative on this platform, and I'm not motivated to dig deeper.
> In any case, it doesn't make things worse.

Thanks for testing the power8 hardware. Andres just let me test on some Apple M1 hardware (those cores are insanely fast!)

Using the table and running the script from [1], with trimmed-down output, I see:

Master @ edf12e7bbd
Testing a   -> 158.037 ms
Testing a2  -> 164.442 ms
Testing a3  -> 171.523 ms
Testing a4  -> 189.892 ms
Testing a5  -> 217.197 ms
Testing a6  -> 186.790 ms
Testing a7  -> 189.491 ms
Testing a8  -> 195.384 ms
Testing a9  -> 200.547 ms
Testing a10 -> 206.149 ms
Testing a11 -> 211.708 ms
Testing a12 -> 217.976 ms
Testing a13 -> 224.565 ms
Testing a14 -> 230.642 ms
Testing a15 -> 237.372 ms
Testing a16 -> 244.110 ms

(checking for __builtin_prefetch... yes)

Master + v2-0001 + v2-0005
Testing a   -> 157.477 ms
Testing a2  -> 163.720 ms
Testing a3  -> 171.159 ms
Testing a4  -> 186.837 ms
Testing a5  -> 205.220 ms
Testing a6  -> 184.585 ms
Testing a7  -> 189.879 ms
Testing a8  -> 195.650 ms
Testing a9  -> 201.220 ms
Testing a10 -> 207.162 ms
Testing a11 -> 213.255 ms
Testing a12 -> 219.313 ms
Testing a13 -> 225.763 ms
Testing a14 -> 237.337 ms
Testing a15 -> 239.440 ms
Testing a16 -> 245.740 ms

It does not seem like there's any improvement on this architecture. There is a very small increase from "a" to "a6", but a very small decrease in performance from "a7" to "a16". It's likely within the expected noise level.

David

[1] https://postgr.es/m/caaphdvqwexy_6jgmb39vr3oqxz_w6stafkq52hodvwaw-19...@mail.gmail.com
Re: Using AF_UNIX sockets always for tests on Windows
Andres Freund writes: > On 2022-12-01 20:30:36 -0500, Tom Lane wrote: >> If we remove that, won't we have a whole lot of code that's not >> tested at all on any platform, ie all the TCP-socket code? > There's some coverage via the auth and ssl tests. But I agree it's an > issue. But to me the fix for that seems to be to add a dedicated test for > that, rather than relying on windows to test our socket code - that's quite a > few separate code paths from the tcp support of other platforms. IMO that's not the best way forward, because you'll always have nagging questions about whether a single-purpose test covers everything that needs coverage. I think the best place to be in would be to be able to run the whole test suite using either TCP or UNIX sockets, on any platform (with stuff like the SSL test overriding the choice as needed). I doubt ripping out the existing Windows-only support for TCP-based testing is a good step in that direction. regards, tom lane
Re: Add LZ4 compression in pg_dump
On Thu, Dec 01, 2022 at 02:58:35PM +, gkokola...@pm.me wrote:
> Nuking the warning from orbit and changing the behaviour around disabling
> the requested compression when the libraries are not present, should not
> mean that we need to change the behaviour of default values for different
> formats. Please find v13 attached which reinstates it.

Gah, thanks! And this default behavior is documented as dependent on the compilation as well.

> Which in itself it got me looking and wondering why the tests succeeded.
> The only existing test covering that path is `defaults_dir_format` in
> `002_pg_dump.pl`. However as the test is currently written it does not
> check whether the output was compressed. The restore command would succeed
> in either case. A simple `gzip -t -r` against the directory will not
> suffice to test it, because there exist files which are never compressed
> in this format (.toc). A little bit more involved test case would need
> to be written, yet before I embark to this journey, I would like to know
> if you would agree to reinstate the defaults for those formats.

Off the top of my head, I seem to recall that -r is not that portable. And the toc format makes the files generated non-deterministic as these use OIDs...

[.. thinks ..]

We are going to need a new thing here, as compress_cmd cannot be directly used. What if we used only an array of glob()-able elements? Let's say "expected_contents" that could include a "dir_path/*.gz" conditional on $supports_gzip? glob() can only be calculated when the test is run, as the file names cannot be known beforehand :/

>> As per the patch, it is true that we do not need to bump the format of
>> the dump archives, as we can still store only the compression level
>> and guess the method from it. I have added some notes about that in
>> ReadHead and WriteHead to not forget.
>
> Agreed. A minor suggestion if you may.
>
> #ifndef HAVE_LIBZ
> -	if (AH->compression != 0)
> +	if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
> 		pg_log_warning("archive is compressed, but this installation does not support compression -- no data will be available");
> #endif
>
> It would seem a more consistent to error out in this case. We do error
> in all other cases where the compression is not available.

Makes sense. I have gone through the patch again, and applied it. Thanks!

--
Michael
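For what it's worth, the error-out behavior agreed to above would presumably be as simple as this, assuming pg_dump's existing pg_fatal() helper (the message wording is mine):

	#ifndef HAVE_LIBZ
		if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
			pg_fatal("archive is compressed, but this installation does not support compression");
	#endif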
Re: Using AF_UNIX sockets always for tests on Windows
Hi, On 2022-12-01 20:56:18 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2022-12-01 20:30:36 -0500, Tom Lane wrote: > >> If we remove that, won't we have a whole lot of code that's not > >> tested at all on any platform, ie all the TCP-socket code? > > > There's some coverage via the auth and ssl tests. But I agree it's an > > issue. But to me the fix for that seems to be to add a dedicated test for > > that, rather than relying on windows to test our socket code - that's quite > > a > > few separate code paths from the tcp support of other platforms. > > IMO that's not the best way forward, because you'll always have > nagging questions about whether a single-purpose test covers > everything that needs coverage. Still seems better than not having any coverage in our development environments... > I think the best place to be in would be to be able to run the whole test > suite using either TCP or UNIX sockets, on any platform (with stuff like the > SSL test overriding the choice as needed). I agree that that's useful. But it seems somewhat independent from the majority of the proposed changes. To be able to test force-tcp-everywhere we don't need e.g. code for setting sspi auth in pg_regress etc - it's afaik just needed so there's a secure way of running tests at all on windows. I think 0003 should be "trimmed" to only change the default for $use_unix_sockets on windows and to remove PG_TEST_USE_UNIX_SOCKETS. Whoever wants to, can then add a new environment variable to force tap tests to use tcp. Greetings, Andres Freund
Re: Using WaitEventSet in the postmaster
On Fri, Dec 2, 2022 at 2:40 PM Andres Freund wrote: > On 2022-12-02 10:12:25 +1300, Thomas Munro wrote: > > with a latch as the wakeup mechanism for "PM signals" (requests from > > backends to do things like start a background worker, etc). > > Hm - is that directly related? ISTM that using a WES in the main loop, and > changing pmsignal.c to a latch are somewhat separate things? Yeah, that's a good question. This comes from a larger patch set where my *goal* was to use latches everywhere possible for interprocess wakeups, but it does indeed make a lot of sense to do the postmaster WaitEventSet retrofit completely independently of that, and leaving the associated robustness problems for later proposals (the posted patch clearly fails to solve them). > I don't think b) is the case as the patch stands. Imagine some process > overwriting pm_latch->owner_pid. That'd then break the SetLatch() in > postmaster's signal handler, because it wouldn't realize that itself needs to > be woken up, and we'd just signal some random process. Right. At some point I had an idea about a non-shared table of latches where OS-specific things like pids and HANDLEs live, so only the maybe_waiting and is_set flags are in shared memory, and even those are ignored when accessing the latch in 'robust' mode (they're only optimisations after all). I didn't try it though. First you might have to switch to a model with a finite set of latches identified by index, or something like that. But I like your idea of separating that whole problem. > It doesn't seem trivial (but not impossible either) to make SetLatch() robust > against arbitrary corruption. So it seems easier to me to just put the latch > in process local memory, and do a SetLatch() in postmaster's SIGUSR1 handler. Alright, good idea, I'll do a v2 like that.
Missing MaterialPath support in reparameterize_path_by_child
Whilst fooling with my outer-join-aware-Vars patch, I tripped across a multi-way join query that failed with

ERROR: could not devise a query plan for the given query

when enable_partitionwise_join is on. I traced that to the fact that reparameterize_path_by_child() omits support for MaterialPath, so that if the only surviving path(s) for a child join include materialization steps, we'll fail outright to produce a plan for the parent join.

Unfortunately, I don't have an example that produces such a failure against HEAD. It seems certain to me that such cases exist, though, so I'd like to apply and back-patch the attached. I'm suspicious now that reparameterize_path() should be extended likewise, but I don't really have any hard evidence for that.

regards, tom lane

commit f74ca5d8611af3306eb96719d5d1910c9b6a8db5
Author: Tom Lane
Date: Thu Dec 1 21:04:49 2022 -0500

    Add missing MaterialPath case in reparameterize_path_by_child.

    Surprised we hadn't noticed this already.

diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index c77399ca92..224ba84107 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -4209,6 +4209,16 @@ do { \
 			}
 			break;

+		case T_MaterialPath:
+			{
+				MaterialPath *mpath;
+
+				FLAT_COPY_PATH(mpath, path, MaterialPath);
+				REPARAMETERIZE_CHILD_PATH(mpath->subpath);
+				new_path = (Path *) mpath;
+			}
+			break;
+
 		case T_MemoizePath:
 			{
 				MemoizePath *mpath;
Re: Warning When Creating FOR EACH STATEMENT Trigger On Logical Replication Subscriber Side
On Thu, Dec 1, 2022 at 7:29 PM Avi Weinberg wrote:
>
> There is no error or warning when creating FOR EACH STATEMENT trigger on
> Logical Replication subscriber side, but it is not doing anything. Shouldn't
> a warning be helpful?
>
> CREATE TRIGGER set_updated_time_trig
>     AFTER INSERT OR UPDATE OR DELETE ON test
>     FOR EACH STATEMENT EXECUTE FUNCTION set_updated_time();

I think we need to consider a few things for this. First, how will we decide whether a particular node is the subscriber side? One can create a subscription after creating the trigger. Also, it won't be straightforward to know whether the particular table is involved in the replication.

Second, statement triggers do get fired during the initial sync; see the docs [1] ("The logical replication apply process currently only fires row triggers, not statement triggers. The initial table synchronization, however, is implemented like a COPY command and thus fires both row and statement triggers for INSERT.").

So, considering these points, I don't know if it is worth showing a WARNING here.

[1] - https://www.postgresql.org/docs/devel/logical-replication-architecture.html

--
With Regards,
Amit Kapila.
Re: Failed Assert while pgstat_unlink_relation
Hi, On 2022-11-29 15:42:45 +0900, Kyotaro Horiguchi wrote: > At Mon, 28 Nov 2022 14:01:30 +0530, vignesh C wrote in > > Hi, > > > > While reviewing/testing one of the patches I found the following Assert: > > #0 0x55c6312ba639 in pgstat_unlink_relation (rel=0x7fb73bcbac58) > > at pgstat_relation.c:161 > > #1 0x55c6312bbb5a in pgstat_relation_delete_pending_cb > > (entry_ref=0x55c6335563a8) at pgstat_relation.c:839 > > #2 0x55c6312b72bc in pgstat_delete_pending_entry > > (entry_ref=0x55c6335563a8) at pgstat.c:1124 > > #3 0x55c6312be3f1 in pgstat_release_entry_ref (key=..., > > entry_ref=0x55c6335563a8, discard_pending=true) at pgstat_shmem.c:523 > > #4 0x55c6312bee9a in pgstat_drop_entry > > (kind=PGSTAT_KIND_RELATION, dboid=5, objoid=40960) at > > pgstat_shmem.c:867 > > #5 0x55c6312c034a in AtEOXact_PgStat_DroppedStats > > (xact_state=0x55c6334baac8, isCommit=false) at pgstat_xact.c:97 > > #6 0x55c6312c0240 in AtEOXact_PgStat (isCommit=false, > > parallel=false) at pgstat_xact.c:55 > > #7 0x55c630df8bee in AbortTransaction () at xact.c:2861 > > #8 0x55c630df94fd in AbortCurrentTransaction () at xact.c:3352 > > > > I could reproduce this issue with the following steps: > > create table t1(c1 int); > > BEGIN; > > CREATE TABLE t (c1 int); > > CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD SELECT * FROM t1; > > CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD SELECT * FROM t1; > > Good catch! > > AtEOXact_PgStat_DroppedStats() visits a relation that has been dropped > then wiped (when CLOBBER_FREED_MEMORY) by AtEOXact_RelationCache() > called just before. Since the relcache content directly pointed from > stats module is lost in this case, the stats side cannot defend itself > from this. > > Perhaps RelationDestroyRelation() need to do pgstat_drop_entry() or > AtEOXact_PgStat_DroppedStats() should be called before > AtEOXact_RelationCache(). the latter of which is simpler. I think we > need to test this case, too. This doesn't strike me as the right fix. What do you think about my patch at https://postgr.es/m/20221128210908.hyffmljjylj447nu%40awork3.anarazel.de , leaving the quibbles around error handling aside? Afaict it fixes the issue. Greetings, Andres Freund
Re: Failed Assert in pgstat_assoc_relation
Hi, On 2022-11-28 16:33:20 -0500, Tom Lane wrote: > Andres Freund writes: > > Something like the attached. Still needs a bit of polish, e.g. adding the > > test > > case from above. > > > I'm a bit uncomfortable adding a function call below > > * Perform swapping of the relcache entry contents. Within this > > * process the old entry is momentarily invalid, so there > > *must* be no > > * possibility of CHECK_FOR_INTERRUPTS within this sequence. Do > > it in > > * all-in-line code for safety. > > Ugh. I don't know what pgstat_unlink_relation does, but assuming > that it can never throw an error seems like a pretty bad idea, I don't think it'd be an issue - it just resets the pointer from a pgstat entry to the relcache entry. But you're right: > Can't that part be done outside the critical section? we can do that. See the attached. Do we have any cases of relcache entries changing their relkind? Greetings, Andres Freund diff --git i/src/backend/utils/cache/relcache.c w/src/backend/utils/cache/relcache.c index bd6cd4e47b5..e2521b51ad4 100644 --- i/src/backend/utils/cache/relcache.c +++ w/src/backend/utils/cache/relcache.c @@ -2626,6 +2626,7 @@ RelationClearRelation(Relation relation, bool rebuild) bool keep_rules; bool keep_policies; bool keep_partkey; + bool keep_pgstats; /* Build temporary entry, but don't link it into hashtable */ newrel = RelationBuildDesc(save_relid, false); @@ -2667,6 +2668,21 @@ RelationClearRelation(Relation relation, bool rebuild) /* partkey is immutable once set up, so we can always keep it */ keep_partkey = (relation->rd_partkey != NULL); + /* + * Keep stats pointers, except when the relkind changes + * (e.g. converting tables into views). Different kinds of relations + * might have different types of stats. + * + * If we don't want to keep the stats, unlink the stats and relcache + * entry (and do so before entering the "critical section" + * below). This is important because otherwise + * PgStat_TableStatus->relation would get out of sync with + * relation->pgstat_info. + */ + keep_pgstats = relation->rd_rel->relkind == newrel->rd_rel->relkind; + if (!keep_pgstats) + pgstat_unlink_relation(relation); + /* * Perform swapping of the relcache entry contents. Within this * process the old entry is momentarily invalid, so there *must* be no @@ -2720,9 +2736,14 @@ RelationClearRelation(Relation relation, bool rebuild) SWAPFIELD(RowSecurityDesc *, rd_rsdesc); /* toast OID override must be preserved */ SWAPFIELD(Oid, rd_toastoid); + /* pgstat_info / enabled must be preserved */ - SWAPFIELD(struct PgStat_TableStatus *, pgstat_info); - SWAPFIELD(bool, pgstat_enabled); + if (keep_pgstats) + { + SWAPFIELD(struct PgStat_TableStatus *, pgstat_info); + SWAPFIELD(bool, pgstat_enabled); + } + /* preserve old partition key if we have one */ if (keep_partkey) { diff --git i/src/test/regress/expected/create_view.out w/src/test/regress/expected/create_view.out index 17ca29ddbf7..4c108c4c40b 100644 --- i/src/test/regress/expected/create_view.out +++ w/src/test/regress/expected/create_view.out @@ -2202,6 +2202,29 @@ select pg_get_viewdef('tt26v', true); FROM ( VALUES (1,2,3)) v(x, y, z); (1 row) +-- Test that changing the relkind of a relcache entry doesn't cause +-- trouble. 
Prior instances of where it did: +-- caldanm2yxz+zotv7y5zbd5wkt8o0ld3yxikuu3dcycvxf7g...@mail.gmail.com +-- caldanm3oza-8wbps2jd1g5_gjrr-x3ywrjpek-mf5asrrvz...@mail.gmail.com +CREATE TABLE tt26(c int); +BEGIN; +CREATE TABLE tt27(c int); +SAVEPOINT q; +CREATE RULE "_RETURN" AS ON SELECT TO tt27 DO INSTEAD SELECT * FROM tt26; +SELECT * FROM tt27; + c +--- +(0 rows) + +ROLLBACK TO q; +CREATE RULE "_RETURN" AS ON SELECT TO tt27 DO INSTEAD SELECT * FROM tt26; +ROLLBACK; +BEGIN; +CREATE TABLE tt28(c int); +CREATE RULE "_RETURN" AS ON SELECT TO tt28 DO INSTEAD SELECT * FROM tt26; +CREATE RULE "_RETURN" AS ON SELECT TO tt28 DO INSTEAD SELECT * FROM tt26; +ERROR: "tt28" is already a view +ROLLBACK; -- clean up all the random objects we made above DROP SCHEMA temp_view_test CASCADE; NOTICE: drop cascades to 27 other objects @@ -2233,7 +2256,7 @@ drop cascades to view aliased_view_2 drop cascades to view aliased_view_3 drop cascades to view aliased_view_4 DROP SCHEMA testviewschm2 CASCADE; -NOTICE: drop cascades to 77 other objects +NOTICE: drop cascades to 78 other objects DETAIL: drop cascades to table t1 drop cascades to view temporal1 drop cascades to view temporal2 @@ -2311,3 +2334,4 @@ drop cascades to view tt23v drop cascades to view tt24v drop cascades to view tt25v drop cascades to view tt26v +drop cascades to table tt26 diff --git i/src/test/regress/sql/create_view.sql w/src/test/regress/sql/create_view.sql index 8838a40f7ab..f305632c6aa 100644 --- i/src/test/regress/sql/create_view.sql +++ w/src/test/regress/sql/create_view.sql @@ -813,6 +813
Re: [PATCH] Add native windows on arm64 support
On Thu, Dec 01, 2022 at 05:53:47PM +, Niyas Sait wrote:
> I've attached a new revision of the patch (v4) and includes following
> changes,
>
> 1. Add support for meson build system
> 2. Extend MSVC scripts to handle ARM64 platform.
> 3. Add arm64 definition of spin_delay function.
> 4. Exclude arm_acle.h import with MSVC compiler.
>
> V3->V4: Add support for meson build system
> V2->V3: Removed ASLR enablement and rebased on master.
> V1->V2: Rebased on top of master

Thanks for the updated version. I have been looking at it closely and it looks like it should be able to do the job (no arm64 machine for Windows here, sigh). I have one tiny comment about this part:

-	USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK => undef,
+	USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK => 1,

Shouldn't we only enable this flag when we are under aarch64? Similarly, I don't think that it is a good idea to switch on USE_SSE42_CRC32C_WITH_RUNTIME_CHECK all the time. We should only set it when building under x86 and x86_64.

I would also add your link [1] in s_lock.h.

[1]: https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics#BarrierRestrictions

--
Michael
Re: Failed Assert in pgstat_assoc_relation
Andres Freund writes: > Do we have any cases of relcache entries changing their relkind? Just the table-to-view hack. I'm not aware that there are any other cases, and it seems hard to credit that there ever will be any. I think we could get rid of table-to-view in HEAD, and use your patch only in v15. regards, tom lane
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
On Mon, Nov 21, 2022 at 12:21:09PM +0300, Aleksander Alekseev wrote:
> After merging 1489b1ce [1] the patchset needed a rebase. PFA v47.
>
> [1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1489b1ce

The CF bot is showing some failures here. You may want to double-check.

--
Michael
Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths
On Fri, Nov 04, 2022 at 09:52:39AM +0900, Ian Lawrence Barwick wrote:
> CommitFest 2022-11 is currently underway, so if you are interested
> in moving this patch forward, now would be a good time to update it.

No replies after 4 weeks, so I have marked this entry as returned with feedback. I am still wondering what would be the best thing to do here..

--
Michael
Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert
On Mon, Nov 07, 2022 at 05:04:14PM -0800, Jacob Champion wrote:
> Done. sslrootcert=system now prevents you from explicitly setting a
> weaker sslmode, to try to cement it as a Do What I Mean sort of
> feature. If you need something weird then you can still jump through
> the hoops by setting sslrootcert to a real file, same as today.
>
> The macOS/OpenSSL 3.0.0 failure is still unfixed.

Err, could you look at that? I am switching the patch as waiting on author.

--
Michael
Re: Failed Assert in pgstat_assoc_relation
Hi, On 2022-12-02 00:08:20 -0500, Tom Lane wrote: > Andres Freund writes: > > Do we have any cases of relcache entries changing their relkind? > > Just the table-to-view hack. I'm not aware that there are any other > cases, and it seems hard to credit that there ever will be any. I can see some halfway credible scenarios. E.g. converting a view to a matview, or a table into a partition. I kind of wonder if it's worth keeping the change, just in case we do - it's not that easy to hit... > I think we could get rid of table-to-view in HEAD, and use your patch > only in v15. WFM. I'll push it to 15 tomorrow. Greetings, Andres Freund
Re: [PATCH] Add native windows on arm64 support
On Fri, Dec 2, 2022 at 12:20 AM Niyas Sait wrote:
>
> On 07/11/2022 06:58, Michael Paquier wrote:
> > Seems so. Hmm, where does _ARM64_BARRIER_SY come from? Perhaps it
> > would be better to have a comment referring to it from a different
> > place than the forums of arm, like some actual docs?
>
> _ARM64_BARRIER_SY is defined in Microsoft Arm64 intrinsic documentation -
> https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170#BarrierRestrictions

In particular, at the bottom of that section: "For the __isb intrinsic, the only restriction that is currently valid is _ARM64_BARRIER_SY; all other values are reserved by the architecture."

This corresponds to https://developer.arm.com/documentation/ddi0596/2021-06/Base-Instructions/ISB--Instruction-Synchronization-Barrier- which says "SY Full system barrier operation, encoded as CRm = 0b. Can be omitted."

> I couldn't find something more official for the sse2neon library part.

Not quite sure what this is referring to, but it seems we can just point to the __aarch64__ section in the same file, which uses the same instruction:

spin_delay(void)
{
	__asm__ __volatile__(
		" isb; \n");
}

...and which already explains the choice with a comment.

About v4:

+ * Use _mm_pause (x64) or __isb (arm64) intrinsic instead of rep nop.

Need a space here after __isb.

+  if cc.get_id() == 'msvc'
+    cdata.set('USE_ARMV8_CRC32C', false)
+    cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1)
+    have_optimized_crc = true
+  else

That seems like a heavy-handed way to force it. Could we just use the same gating in the test program that the patch puts in the code of interest? Namely:

+#ifndef _MSC_VER
 #include <arm_acle.h>
+#endif

--
John Naylor
EDB: http://www.enterprisedb.com
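Presumably the MSVC side of the patch boils down to something like this, mirroring the GCC version quoted above (a sketch; the exact gating macros are an assumption on my part):

	#if defined(_MSC_VER) && defined(_M_ARM64)
	#include <intrin.h>				/* for __isb() and _ARM64_BARRIER_SY */

	static __forceinline void
	spin_delay(void)
	{
		/* SY is the only barrier type MSVC currently accepts for __isb() */
		__isb(_ARM64_BARRIER_SY);
	}
	#endif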
Re: Failed Assert in pgstat_assoc_relation
Andres Freund writes: > On 2022-12-02 00:08:20 -0500, Tom Lane wrote: >> Just the table-to-view hack. I'm not aware that there are any other >> cases, and it seems hard to credit that there ever will be any. > I can see some halfway credible scenarios. E.g. converting a view to a > matview, or a table into a partition. I kind of wonder if it's worth keeping > the change, just in case we do - it's not that easy to hit... I'd suggest putting in an assertion that the relkind isn't changing, instead. When and if somebody makes a credible feature patch that'd require relaxing that, we can see what to do. (There's a couple of places in rewriteHandler.c that could perhaps be simplified, too.) regards, tom lane
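To make that concrete, a minimal sketch of the suggested assertion; the placement is hypothetical, e.g. wherever the relcache swaps a rebuilt entry into place:

/* Sketch: a rebuilt relcache entry must not change its relkind. */
Assert(newrel->rd_rel->relkind == relation->rd_rel->relkind);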
Re: Introduce a new view for checkpointer related stats
On Wed, Nov 30, 2022 at 5:15 PM Bharath Rupireddy wrote: > > I don't have a strong opinion about changing column names. However, if > we were to change it, I prefer to use names that > PgStat_CheckpointerStats has. BTW, that's what > PgStat_BgWriterStats/pg_stat_bgwriter and > PgStat_ArchiverStats/pg_stat_archiver uses. After thinking about this a while, I convinced myself to change the column names to be a bit more meaningful. I still think having checkpoints in the column names is needed because it also has other backend related columns. I'm attaching the v4 patch for further review. CREATE VIEW pg_stat_checkpointer AS SELECT pg_stat_get_timed_checkpoints() AS timed_checkpoints, pg_stat_get_requested_checkpoints() AS requested_checkpoints, pg_stat_get_checkpoint_write_time() AS checkpoint_write_time, pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time, pg_stat_get_buf_written_checkpoints() AS buffers_written_checkpoints, pg_stat_get_buf_written_backend() AS buffers_written_backend, pg_stat_get_buf_fsync_backend() AS buffers_fsync_backend, pg_stat_get_checkpointer_stat_reset_time() AS stats_reset; -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com From 8280223b2b8fde888dd8540328336421bbf6138c Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Fri, 2 Dec 2022 04:58:46 + Subject: [PATCH v4] Introduce a new view for checkpointer related stats pg_stat_bgwriter view currently reports checkpointer stats as well. It is that way because historically checkpointer was part of bgwriter until the commits 806a2ae and bf405ba, that went into PG 9.2, separated them out. It is time for us to separate checkpointer stats to its own view called pg_stat_checkpointer. Bump catalog version. --- doc/src/sgml/monitoring.sgml | 110 +- src/backend/catalog/system_views.sql | 18 +-- .../utils/activity/pgstat_checkpointer.c | 1 + src/backend/utils/adt/pgstatfuncs.c | 19 +-- src/include/catalog/pg_proc.dat | 22 ++-- src/include/pgstat.h | 1 + src/test/recovery/t/029_stats_restart.pl | 6 +- src/test/regress/expected/rules.out | 17 +-- src/test/regress/expected/stats.out | 21 +++- src/test/regress/sql/stats.sql| 12 +- 10 files changed, 156 insertions(+), 71 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 11a8ebe5ec..1279b47d27 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -448,6 +448,15 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser + + pg_stat_checkpointerpg_stat_checkpointer + One row only, showing statistics about the + checkpointer process's activity. See + + pg_stat_checkpointer for details. + + + pg_stat_walpg_stat_wal One row only, showing statistics about WAL activity. See @@ -3633,7 +3642,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i The pg_stat_bgwriter view will always have a - single row, containing global data for the cluster. + single row, containing data about the bgwriter of the cluster. 
@@ -3653,97 +3662,138 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i - checkpoints_timed bigint + buffers_clean bigint - Number of scheduled checkpoints that have been performed + Number of buffers written by the background writer - checkpoints_req bigint + maxwritten_clean bigint - Number of requested checkpoints that have been performed + Number of times the background writer stopped a cleaning + scan because it had written too many buffers - checkpoint_write_time double precision + buffers_alloc bigint - Total amount of time that has been spent in the portion of - checkpoint processing where files are written to disk, in milliseconds + Number of buffers allocated - checkpoint_sync_time double precision + stats_reset timestamp with time zone - Total amount of time that has been spent in the portion of - checkpoint processing where files are synchronized to disk, in - milliseconds + Time at which these statistics were last reset + + + + + + + + pg_stat_checkpointer + + + pg_stat_checkpointer + + + + The pg_stat_checkpointer view will always have a + single row, containing data about the checkpointer process of the cluster. + + + pg_stat_checkpointer View + + - bu
Re: Failed Assert in pgstat_assoc_relation
Hi, On December 1, 2022 9:48:48 PM PST, Tom Lane wrote: >Andres Freund writes: >> On 2022-12-02 00:08:20 -0500, Tom Lane wrote: >>> Just the table-to-view hack. I'm not aware that there are any other >>> cases, and it seems hard to credit that there ever will be any. > >> I can see some halfway credible scenarios. E.g. converting a view to a >> matview, or a table into a partition. I kind of wonder if it's worth keeping >> the change, just in case we do - it's not that easy to hit... > >I'd suggest putting in an assertion that the relkind isn't changing, >instead. Sounds like a plan. Will you do that when you remove the table-to-view hack? Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: Failed Assert in pgstat_assoc_relation
Andres Freund writes: > On December 1, 2022 9:48:48 PM PST, Tom Lane wrote: >> I'd suggest putting in an assertion that the relkind isn't changing, >> instead. > Sounds like a plan. Will you do that when you remove the table-to-view hack? I'd suggest committing it concurrently with the v15 fix, instead, so that there's a cross-reference to what some future hacker might need to install if they remove the assertion. I guess that means that the table-to-view removal has to go in first. I should be able to take care of that tomorrow, or if you're in a hurry I don't mind if you commit it for me. regards, tom lane
Re: O(n) tasks cause lengthy startups and checkpoints
On Fri, Dec 2, 2022 at 3:10 AM Nathan Bossart wrote: > > >> 4. Is it a good idea to add log messages in the DoCustodianTasks() > >> loop? Maybe at a debug level? The log message can say the current task > >> the custodian is processing. And/Or setting the custodian's status on > >> the ps display is also a good idea IMO. > > I'd like to pick these up in a new thread if/when this initial patch set is > committed. The tasks already do some logging, and the checkpointer process > doesn't update the ps display for these tasks today. It'll be good to have some kind of dedicated monitoring for the custodian process, as it can do a "good" amount of work at times and users will want a way to know what it is currently doing: logs at debug level, progress reporting via an ereport_startup_progress()-style mechanism, the ps display, a pg_stat_custodian view or a special function that reports some details, or some other mechanism. In any case, I agree to park this for later. > >> 0002 and 0003: > >> 1. > >> +CHECKPOINT; > >> +DO $$ > >> I think we need to ensure that there are some snapshot files before > >> the checkpoint. Otherwise, it may happen that the above test case > >> exits without the custodian process doing anything. > >> > >> 2. I think the best way to test the custodian process code is by > >> adding a TAP test module to see actually the custodian process kicks > >> in. Perhaps, add elog(DEBUGX,...) messages to various custodian > >> process functions and see if we see the logs in server logs. > > The test appears to reliably create snapshot and mapping files, so if the > directories are empty at some point after the checkpoint at the end, we can > be reasonably certain the custodian took action. I didn't add explicit > checks that there are files in the directories before the checkpoint > because a concurrent checkpoint could make such checks unreliable. I think you're right. I added SQL checks to verify that the snapshot and mapping file counts are > 0, see [1], and the cirrus-ci runs are happy too - https://github.com/BRupireddy/postgres/tree/custodian_review_2. I think we can consider adding these count > 0 checks to the tests. > >> 0004: > >> I think the 0004 patch can be merged into 0001, 0002 and 0003 patches. > >> Otherwise the patch LGTM. > > I'm keeping this one separate because I've received conflicting feedback > about the idea. If we classify the custodian as a process doing non-critical tasks that have nothing to do with regular server functioning, then processing ShutdownRequestPending early looks okay. However, delaying these non-critical tasks, such as file removals that reclaim disk space, might impact the server overall, especially when it's approaching 100% disk usage and we want the custodian to do its job fully before we shut down the server. On the other hand, if we delay processing shutdown requests, that can also impact the server overall (it might delay restarts, failovers etc.), because at times there can be a lot of tasks with a good amount of work pending in the custodian's task queue. Having said that, I'm okay with processing ShutdownRequestPending as early as possible; however, should we also add CHECK_FOR_INTERRUPTS() alongside ShutdownRequestPending? Also, I think it's enough to have the ShutdownRequestPending check just in DoCustodianTasks(void)'s main loop, and we can let RemoveOldSerializedSnapshots() and RemoveOldLogicalRewriteMappings() do their jobs to the fullest as they do today.
While thinking about this, one thing that really struck me is what happens if we let the custodian exit, say after processing ShutdownRequestPending immediately or after a restart, leaving other tasks queued? The custodian will never get to work on those tasks unless the requestors (say, checkpointer or some other process) request it to do so after the restart. Maybe we don't need to worry about it; maybe we do. Maybe it's overkill to save the custodian's task state to disk so that it can come up and do the leftover tasks upon restart. > > Another comment: > > IIUC, there's no custodian_delay GUC as we want to avoid unnecessary > > wakeups for power savings (being discussed in the other thread). > > However, can it happen that the custodian missed to capture SetLatch > > wakeups by other backends? In other words, can the custodian process > > be sleeping when there's work to do? > > I'm not aware of any way this could happen, but if there is one, I think we > should treat it as a bug instead of relying on the custodian process to > periodically wake up and check for work to do. One possible scenario is that the requestor adds its task details to the queue and sets the latch while the custodian is in the midst of processing a task; the custodian can miss noticing that SetLatch() right away. However, the requester is still guaranteed that the added task will be processed once the current task completes. Beyond that, I don't know of other ways the custodian could miss a SetLatch(). [1] diff --git a/contrib/test_decoding/expected/rewrite.out b/contrib/test_decoding/expected/rewrite
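To make the latch reasoning concrete, here is a hedged sketch of the main-loop shape under discussion; CustodianRunNextTask() and the wait event name are hypothetical stand-ins, not the patch's actual code:

for (;;)
{
	/* Honor a pending shutdown request between tasks, not mid-task. */
	if (ShutdownRequestPending)
		proc_exit(0);

	/* Drain the queue; the hypothetical helper runs one task per call. */
	while (CustodianRunNextTask())
		CHECK_FOR_INTERRUPTS();

	/*
	 * A SetLatch() that arrived while we were busy above is not lost: the
	 * latch stays set, so this WaitLatch() returns immediately and the
	 * queue is re-checked after the reset.
	 */
	(void) WaitLatch(MyLatch,
					 WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
					 -1L,
					 WAIT_EVENT_CUSTODIAN_MAIN);
	ResetLatch(MyLatch);
}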
Re: initdb: Refactor PG_CMD_PUTS loops
On Thu, Dec 1, 2022 at 5:02 PM Peter Eisentraut < peter.eisentr...@enterprisedb.com> wrote: > > Keeping the SQL commands that initdb runs in string arrays before > feeding them to PG_CMD_PUTS() seems unnecessarily verbose and > inflexible. In some cases, the array only has one member. In other > cases, one might want to use PG_CMD_PRINTF() instead, to parametrize a > command, but that would require breaking up the loop or using > workarounds like replace_token(). This patch unwinds all that; it's > much simpler that way. +1, I can't think of a reason to keep the current coding -- John Naylor EDB: http://www.enterprisedb.com
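As a rough before/after illustration of what is being removed (a sketch, not the actual initdb.c hunks; the SQL strings and the "comment" variable are made up):

/* Before: a single-command string array fed through a loop. */
static const char *const sql[] = {
	"CREATE VIEW foo AS SELECT 1;",
	NULL
};
const char *const *line;

for (line = sql; *line != NULL; line++)
	PG_CMD_PUTS(*line);

/* After: issue the command directly ... */
PG_CMD_PUTS("CREATE VIEW foo AS SELECT 1;");

/* ... and parametrize in place rather than via replace_token(). */
PG_CMD_PRINTF("COMMENT ON DATABASE template0 IS '%s';\n", comment);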
Re: Time delayed LR (WAS Re: logical replication restrictions)
On Tue, Nov 15, 2022 at 12:33 PM Amit Kapila wrote: > > On Mon, Nov 14, 2022 at 6:52 PM Hayato Kuroda (Fujitsu) > wrote: > > > > Dear Amit, > > > > > > > It seems to me Kuroda-San has proposed this change [1] to fix the test > > > > > but it is not clear to me why such a change is required. Why can't > > > > > CHECK_FOR_INTERRUPTS() after waiting, followed by the existing below > > > > > code [2] in LogicalRepApplyLoop() sufficient to handle parameter > > > > > updates? > > > > (I forgot to say, this change was not proposed by me. I said that there > > should be > > modified. I thought workers should wake up after the transaction was > > committed.) > > > > > So, why only honor the 'disable' option of the subscription? For > > > example, one can change 'min_apply_delay' and it seems > > > recoveryApplyDelay() honors a similar change in the recovery > > > parameter. Is there a way to set the latch of the worker process, so > > > that it can recheck if anything is changed? > > > > I have not considered about it, but seems reasonable. We may be able to > > do maybe_reread_subscription() if subscription parameters are changed > > and latch is set. > > > > One more thing I would like you to consider is the point raised by me > related to this patch's interaction with the parallel apply feature as > mentioned in the email [1]. I am not sure the idea proposed in that > email [1] is a good one because delaying after applying commit may not > be good as we want to delay the apply of the transaction(s) on > subscribers by this feature. I feel this needs more thought. > I have thought a bit more about this and we have the following options to choose the delay point from. (a) apply delay just before committing a transaction. As mentioned in comments in the patch, this can lead to bloat and to locks being held for a long time. (b) apply delay before starting to apply changes for a transaction, but here the problem is which time to consider. In some cases, like for streaming transactions, we don't receive the commit/prepare xact time in the start message. (c) use (b), but with the previous transaction's commit time. (d) apply delay after committing a transaction by using the xact's commit time. At this stage, among the above, I feel either (c) or (d) is worth considering. Now, the difference between (c) and (d) is that, if after the commit the next xact's data is already delayed by more than min_apply_delay, then we don't need to kick in the new apply-delay logic. The other thing to consider is whether we need to process any keepalive messages during the delay, because otherwise the walsender may think that the subscriber is not available and time out. This may not be a problem for synchronous replication, but otherwise it could be. Thoughts? -- With Regards, Amit Kapila.
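To sketch what (d) could look like in the apply worker, assuming commit_time is the upstream commit timestamp from the protocol and min_apply_delay is the subscription option in milliseconds (modeled loosely on recoveryApplyDelay(), not taken from the patch):

TimestampTz delayUntil;
long		msecs;

delayUntil = TimestampTzPlusMilliseconds(commit_time, min_apply_delay);

while ((msecs = TimestampDifferenceMilliseconds(GetCurrentTimestamp(),
												delayUntil)) > 0)
{
	/*
	 * Wake early on the latch so parameter changes and shutdown are
	 * still honored during the delay.
	 */
	(void) WaitLatch(MyLatch,
					 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
					 msecs,
					 WAIT_EVENT_RECOVERY_APPLY_DELAY);
	ResetLatch(MyLatch);
	CHECK_FOR_INTERRUPTS();
}

Keepalive processing, as raised above, would also have to happen somewhere inside this loop.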
Re: pg_dump: Remove "blob" terminology
On 30.11.22 09:07, Daniel Gustafsson wrote: The commit message contains a typo: functinos fixed * called for both BLOB and TABLE data; it is the responsibility of - * the format to manage each kind of data using StartBlob/StartData. + * the format to manage each kind of data using StartLO/StartData. Should BLOB be changed to BLOBS here (and in similar comments) to make it clearer that it refers to the archive entry and the concept of a binary large object in general? I changed this one and went through it again to tidy it up a bit more. (There are both "BLOB" and "BLOBS" archive entries, so both forms still exist in the code now.) Theres an additional mention in src/test/modules/test_pg_dump/t/001_base.pl: # Tests which are considered 'full' dumps by pg_dump, but there. # are flags used to exclude specific items (ACLs, blobs, etc). fixed I also put back the old long options forms in the documentation and help but marked them deprecated. From e32d43952a4f2a054f25883c1136177cd1ad1fc3 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Fri, 2 Dec 2022 07:55:52 +0100 Subject: [PATCH v2] pg_dump: Remove "blob" terminology For historical reasons, pg_dump refers to large objects as "BLOBs". This term is not used anywhere else in PostgreSQL, and it also means something different in the SQL standard and other SQL systems. This patch renames internal functions, code comments, documentation, etc. to use the "large object" or "LO" terminology instead. There is no functionality change, so the archive format still uses the name "BLOB" for the archive entry. Additional long command-line options are added with the new naming. Discussion: https://www.postgresql.org/message-id/flat/868a381f-4650-9460-1726-1ffd39a270b4%40enterprisedb.com --- doc/src/sgml/ref/pg_dump.sgml | 16 +- src/bin/pg_dump/pg_backup.h | 12 +- src/bin/pg_dump/pg_backup_archiver.c| 78 +++ src/bin/pg_dump/pg_backup_archiver.h| 34 +-- src/bin/pg_dump/pg_backup_custom.c | 66 +++--- src/bin/pg_dump/pg_backup_db.c | 4 +- src/bin/pg_dump/pg_backup_directory.c | 112 +- src/bin/pg_dump/pg_backup_null.c| 50 ++--- src/bin/pg_dump/pg_backup_tar.c | 68 +++--- src/bin/pg_dump/pg_dump.c | 216 ++-- src/bin/pg_dump/pg_dump.h | 8 +- src/bin/pg_dump/pg_dump_sort.c | 16 +- src/bin/pg_dump/t/002_pg_dump.pl| 46 ++--- src/test/modules/test_pg_dump/t/001_base.pl | 2 +- 14 files changed, 367 insertions(+), 361 deletions(-) diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml index 363d1327e2..2c938cd7e1 100644 --- a/doc/src/sgml/ref/pg_dump.sgml +++ b/doc/src/sgml/ref/pg_dump.sgml @@ -132,7 +132,8 @@ Options -b - --blobs + --large-objects + --blobs (deprecated) Include large objects in the dump. This is the default behavior @@ -140,7 +141,7 @@ Options --schema-only is specified. The -b switch is therefore only useful to add large objects to dumps where a specific schema or table has been requested. Note that -blobs are considered data and therefore will be included when +large objects are considered data and therefore will be included when --data-only is used, but not when --schema-only is. @@ -149,7 +150,8 @@ Options -B - --no-blobs + --no-large-objects + --no-blobs (deprecated) Exclude large objects in the dump. @@ -323,7 +325,7 @@ Options Output a directory-format archive suitable for input into pg_restore. 
This will create a directory - with one file for each table and blob being dumped, plus a + with one file for each table and large object being dumped, plus a so-called Table of Contents file describing the dumped objects in a machine-readable format that pg_restore can read. A directory format archive can be manipulated with @@ -434,9 +436,9 @@ Options - Non-schema objects such as blobs are not dumped when -n is - specified. You can add blobs back to the dump with the - --blobs switch. + Non-schema objects such as large objects are not dumped when -n is + specified. You can add large objects back to the dump with the + --large-objects switch. diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h index bc6b6594af..aba780ef4b 100644 --- a/src/bin/pg_dump/pg_backup.h +++ b/src/bin/pg_dump/pg_backup.h @@ -53,9 +53,9 @@ typedef enum _archiveMode typedef enum _teSection { - SECTION_NONE = 1, /* COMMENTs, ACLs, etc; can be anywhere */ + SECTION_NONE = 1, /* com
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures
On Thu, Nov 17, 2022 at 11:53:23AM +0530, Bharath Rupireddy wrote: > Please do, if you feel so. Thanks for your review. I don't really mind the addition of the LSN when operating on a given record where we are reading a location, like in the five error paths for the header validation or the three ones in ReadRecord(). Now this one looks confusing: + XLogSegNoOffsetToRecPtr(openLogSegNo, startoffset, + wal_segment_size, lsn); ereport(PANIC, (errcode_for_file_access(), errmsg("could not write to log file %s " - "at offset %u, length %zu: %m", - xlogfname, startoffset, nleft))); + "at offset %u, LSN %X/%X, length %zu: %m", + xlogfname, startoffset, + LSN_FORMAT_ARGS(lsn), nleft))); This does not always refer to an exact LSN of a record, as we may be in the middle of a write, so I would leave it as-is. Similarly, wouldn't the addition of wre_lsn be confusing? The offset alone seems like enough to me when referring to the middle of a page in WALReadRaiseError(). -- Michael
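For readers following along, the LSN added in the quoted hunk is pure segment arithmetic; XLogSegNoOffsetToRecPtr() boils down to roughly this (a sketch of the macro's effect, not new code):

XLogRecPtr	lsn;

/* LSN = segment number times segment size, plus the byte offset. */
lsn = (XLogRecPtr) openLogSegNo * wal_segment_size + startoffset;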
Re: Introduce a new view for checkpointer related stats
On Thu, Dec 1, 2022 at 9:50 PM Bharath Rupireddy < bharath.rupireddyforpostg...@gmail.com> wrote: > On Wed, Nov 30, 2022 at 5:15 PM Bharath Rupireddy > wrote: > > > > I don't have a strong opinion about changing column names. However, if > > we were to change it, I prefer to use names that > > PgStat_CheckpointerStats has. BTW, that's what > > PgStat_BgWriterStats/pg_stat_bgwriter and > > PgStat_ArchiverStats/pg_stat_archiver uses. > > After thinking about this a while, I convinced myself to change the > column names to be a bit more meaningful. I still think having > checkpoints in the column names is needed because it also has other > backend related columns. I'm attaching the v4 patch for further > review. > CREATE VIEW pg_stat_checkpointer AS > SELECT > pg_stat_get_timed_checkpoints() AS timed_checkpoints, > pg_stat_get_requested_checkpoints() AS requested_checkpoints, > pg_stat_get_checkpoint_write_time() AS checkpoint_write_time, > pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time, > pg_stat_get_buf_written_checkpoints() AS > buffers_written_checkpoints, > pg_stat_get_buf_written_backend() AS buffers_written_backend, > pg_stat_get_buf_fsync_backend() AS buffers_fsync_backend, > pg_stat_get_checkpointer_stat_reset_time() AS stats_reset; > IMO, “buffers_written_checkpoints” is confusing. What do you think? > -- > Bharath Rupireddy > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com >
Re: Missing MaterialPath support in reparameterize_path_by_child
On Fri, Dec 2, 2022 at 10:55 AM Tom Lane wrote: > I traced that to the fact that reparameterize_path_by_child() > omits support for MaterialPath, so that if the only surviving > path(s) for a child join include materialization steps, we'll > fail outright to produce a plan for the parent join. Yeah, that's true. It's weird we neglect MaterialPath here. > Unfortunately, I don't have an example that produces such a > failure against HEAD. It seems certain to me that such cases > exist, though, so I'd like to apply and back-patch the attached. I tried on HEAD and got one, which leverages a sampled rel to generate the MaterialPath and a lateral reference to make it the only available path. SET enable_partitionwise_join to true; CREATE TABLE prt (a int, b int) PARTITION BY RANGE(a); CREATE TABLE prt_p1 PARTITION OF prt FOR VALUES FROM (0) TO (10); CREATE EXTENSION tsm_system_time; explain (costs off) select * from prt t1 left join lateral (select t1.a as t1a, t2.a as t2a from prt t2 TABLESAMPLE system_time (10)) ss on ss.t1a = ss.t2a; ERROR: could not devise a query plan for the given query Thanks Richard
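Presumably the fix is the missing switch arm in reparameterize_path_by_child(), along the lines of the other single-subpath cases there; a sketch using the macros local to that function (the attached patch may differ in detail):

case T_MaterialPath:
	{
		MaterialPath *mpath;

		/* Copy the node, then recurse into its only child path. */
		FLAT_COPY_PATH(mpath, path, MaterialPath);
		REPARAMETERIZE_CHILD_PATH(mpath->subpath);
		new_path = (Path *) mpath;
	}
	break;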
Re: Introduce a new view for checkpointer related stats
On Fri, Dec 2, 2022 at 12:54 PM sirisha chamarthi wrote: > > On Thu, Dec 1, 2022 at 9:50 PM Bharath Rupireddy > wrote: >> >> On Wed, Nov 30, 2022 at 5:15 PM Bharath Rupireddy >> wrote: >> > >> > I don't have a strong opinion about changing column names. However, if >> > we were to change it, I prefer to use names that >> > PgStat_CheckpointerStats has. BTW, that's what >> > PgStat_BgWriterStats/pg_stat_bgwriter and >> > PgStat_ArchiverStats/pg_stat_archiver uses. >> >> After thinking about this a while, I convinced myself to change the >> column names to be a bit more meaningful. I still think having >> checkpoints in the column names is needed because it also has other >> backend related columns. I'm attaching the v4 patch for further >> review. >> CREATE VIEW pg_stat_checkpointer AS >> SELECT >> pg_stat_get_timed_checkpoints() AS timed_checkpoints, >> pg_stat_get_requested_checkpoints() AS requested_checkpoints, >> pg_stat_get_checkpoint_write_time() AS checkpoint_write_time, >> pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time, >> pg_stat_get_buf_written_checkpoints() AS buffers_written_checkpoints, >> pg_stat_get_buf_written_backend() AS buffers_written_backend, >> pg_stat_get_buf_fsync_backend() AS buffers_fsync_backend, >> pg_stat_get_checkpointer_stat_reset_time() AS stats_reset; > > > IMO, “buffers_written_checkpoints” is confusing. What do you think? Thanks. We can be "more and more" meaningful by naming buffers_written_by_checkpoints, buffers_written_by_backend, buffers_fsync_by_backend. However, I don't think that's a good idea here as names get too long. Having said that, I'll leave it to the committer's discretion. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
refactor ExecGrant_*() functions
Continuing the ideas in [0], this patch refactors the ExecGrant_Foo() functions and replaces many of them by a common function that is driven by the ObjectProperty table. It would be nice to do more here, for example ExecGrant_Language(), which has additional non-generalizable checks, but I think this is already a good start. For example, the work being discussed on privileges on publications [1] would be able to take good advantage of this. [0]: https://www.postgresql.org/message-id/95c30f96-4060-2f48-98b5-a4392d3b6...@enterprisedb.com [1]: https://www.postgresql.org/message-id/flat/20330.1652105397@antos
From 200879e5edfc1ce93b7af3cbfafc1f618626cbe9 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Fri, 2 Dec 2022 08:16:53 +0100 Subject: [PATCH] Refactor ExecGrant_*() functions Instead of half a dozen of mostly-duplicate ExecGrant_Foo() functions, write one common function ExecGrant_generic() that can handle most of them. We already have all the information we need, such as which system catalog corresponds to which catalog table and which column is the ACL column. --- src/backend/catalog/aclchk.c | 662 ++- 1 file changed, 34 insertions(+), 628 deletions(-) diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c index 8d84a7b056..612ef1a666 100644 --- a/src/backend/catalog/aclchk.c +++ b/src/backend/catalog/aclchk.c @@ -106,14 +106,9 @@ bool binary_upgrade_record_init_privs = false; static void ExecGrantStmt_oids(InternalGrant *istmt); static void ExecGrant_Relation(InternalGrant *istmt); -static void ExecGrant_Database(InternalGrant *istmt); -static void ExecGrant_Fdw(InternalGrant *istmt); -static void ExecGrant_ForeignServer(InternalGrant *istmt); -static void ExecGrant_Function(InternalGrant *istmt); +static void ExecGrant_generic(InternalGrant *istmt, Oid classid, AclMode default_privs); static void ExecGrant_Language(InternalGrant *istmt); static void ExecGrant_Largeobject(InternalGrant *istmt); -static void ExecGrant_Namespace(InternalGrant *istmt); -static void ExecGrant_Tablespace(InternalGrant *istmt); static void ExecGrant_Type(InternalGrant *istmt); static void ExecGrant_Parameter(InternalGrant *istmt); @@ -602,22 +597,22 @@ ExecGrantStmt_oids(InternalGrant *istmt) ExecGrant_Relation(istmt); break; case OBJECT_DATABASE: - ExecGrant_Database(istmt); + ExecGrant_generic(istmt, DatabaseRelationId, ACL_ALL_RIGHTS_DATABASE); break; case OBJECT_DOMAIN: case OBJECT_TYPE: ExecGrant_Type(istmt); break; case OBJECT_FDW: - ExecGrant_Fdw(istmt); + ExecGrant_generic(istmt, ForeignDataWrapperRelationId, ACL_ALL_RIGHTS_FDW); break; case OBJECT_FOREIGN_SERVER: - ExecGrant_ForeignServer(istmt); + ExecGrant_generic(istmt, ForeignServerRelationId, ACL_ALL_RIGHTS_FOREIGN_SERVER); break; case OBJECT_FUNCTION: case OBJECT_PROCEDURE: case OBJECT_ROUTINE: - ExecGrant_Function(istmt); + ExecGrant_generic(istmt, ProcedureRelationId, ACL_ALL_RIGHTS_FUNCTION); break; case OBJECT_LANGUAGE: ExecGrant_Language(istmt); @@ -626,10 +621,10 @@ ExecGrantStmt_oids(InternalGrant *istmt) ExecGrant_Largeobject(istmt); break; case OBJECT_SCHEMA: - ExecGrant_Namespace(istmt); + ExecGrant_generic(istmt, NamespaceRelationId, ACL_ALL_RIGHTS_SCHEMA); break; case OBJECT_TABLESPACE: - ExecGrant_Tablespace(istmt); + ExecGrant_generic(istmt, TableSpaceRelationId, ACL_ALL_RIGHTS_TABLESPACE); break; case OBJECT_PARAMETER_ACL: ExecGrant_Parameter(istmt); @@ -2132,380 +2127,22 @@ ExecGrant_Relation(InternalGrant *istmt) } static void -ExecGrant_Database(InternalGrant *istmt) -{ - Relationrelation; - ListCell
*cell; - - if (istmt->all_privs && istmt->privileges == ACL_NO_RIGHTS) - istmt->privileges = ACL_ALL_RIGHTS_DATABASE; - - relation = table_open(DatabaseRelationId, RowExclusiveLock); - - foreach(cell, istmt->objects) - { - Oid datId = lfirst_oid(cell); - Form_pg_database pg_database_tuple; - Datum aclDatum; - boolisNull; - AclMode avail_goptions; - AclMode this_p
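For what it's worth, the "driven by the ObjectProperty table" part presumably means ExecGrant_generic() can look up the per-catalog details through the existing helpers in objectaddress.c rather than hard-coding them per object type; roughly (a sketch, not the patch's exact code):

/* Derive catalog metadata generically from the classid. */
int			cacheid = get_object_catcache_oid(classid);
AttrNumber	aclattnum = get_object_attnum_acl(classid);
AttrNumber	ownerattnum = get_object_attnum_owner(classid);
AttrNumber	nameattnum = get_object_attnum_name(classid);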