Re: Add pg_get_acl() function get the ACL for a database object
On Wed, Jun 19, 2024, at 16:23, Isaac Morland wrote:
> I have no idea how often this would be useful, but I wonder if it could
> work to have overloaded single-parameter versions for each of
> regprocedure (pg_proc.proacl), regclass (pg_class.relacl), …. To call,
> just cast the OID to the appropriate reg* type.
>
> For example: To get the ACL for table 'example_table', call pg_get_acl
> ('example_table'::regclass)

+1

New patch attached. I've added overloaded versions for regclass and regproc so far:

\df pg_get_acl
                                List of functions
   Schema   |    Name    | Result data type |  Argument data types   | Type
------------+------------+------------------+------------------------+------
 pg_catalog | pg_get_acl | aclitem[]        | classid oid, objid oid | func
 pg_catalog | pg_get_acl | aclitem[]        | objid regclass         | func
 pg_catalog | pg_get_acl | aclitem[]        | objid regproc          | func
(3 rows)

/Joel

v3-0001-Add-pg_get_acl.patch (attachment: binary data)
pg_combinebackup --clone doesn't work
The pg_combinebackup --clone option currently doesn't work at all. Even on
systems where it should be supported, you'll always get a "file cloning not
supported on this platform" error. The reason is this checking code in
pg_combinebackup.c:

#if (defined(HAVE_COPYFILE) && defined(COPYFILE_CLONE_FORCE)) || \
	(defined(__linux__) && defined(FICLONE))
	if (opt.dry_run)
		pg_log_debug("would use cloning to copy files");
	else
		pg_log_debug("will use cloning to copy files");
#else
	pg_fatal("file cloning not supported on this platform");
#endif

The problem is that this file does not include the appropriate OS header
files that would define COPYFILE_CLONE_FORCE or FICLONE, respectively. The
same problem also exists in copy_file.c. (That one has the right header
file for macOS but still not for Linux.)

This should be pretty easy to fix up, and we should think about some ways
to refactor this to avoid repeating all these OS-specific things a few
times. (The code was copied from pg_upgrade originally.)

But in the short term, how about some test coverage? You can exercise the
different pg_combinebackup copy modes like this:

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 83f385a4870..7e8dd024c82 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -848,7 +848,7 @@ sub init_from_backup
 	}
 
 	local %ENV = $self->_get_env();
-	my @combineargs = ('pg_combinebackup', '-d');
+	my @combineargs = ('pg_combinebackup', '-d', '--clone');
 	if (exists $params{tablespace_map})
 	{
 		while (my ($olddir, $newdir) = each %{ $params{tablespace_map} })

We could do something like what we have for pg_upgrade, where we can use
the environment variable PG_TEST_PG_UPGRADE_MODE to test the different
copy modes. We could turn this into a global option. (This might also be
useful for future work to use file cloning elsewhere, like in CREATE
DATABASE?)

Also, I think it would be useful for consistency if pg_combinebackup had a
--copy option to select the default mode, like pg_upgrade does.
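For reference, the missing includes could take roughly the following shape,
mirroring the conditional includes that pg_upgrade's file.c already uses for
the same cloning code; this is only an illustration of the idea, not the
eventual patch:

#ifdef HAVE_COPYFILE
#include <copyfile.h>			/* defines COPYFILE_CLONE_FORCE on macOS */
#endif
#ifdef __linux__
#include <sys/ioctl.h>
#include <linux/fs.h>			/* defines FICLONE */
#endif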
Re: use CREATE DATABASE STRATEGY = FILE_COPY in pg_upgrade
On Wed, Jun 19, 2024 at 08:37:17AM -0500, Nathan Bossart wrote:
> My understanding is that you basically have to restart the upgrade from
> scratch if that happens. I suppose there could be a problem if you try to
> use the half-upgraded cluster after a crash, but I imagine you have a good
> chance of encountering other problems if you do that, too. So I don't
> think we care...

It's never been assumed that it would be safe to redo a pg_upgrade after a
crash on a cluster initdb'd for the upgrade, so I don't think we need to
care about that either. One failure I suspect would quickly be faced is
OIDs getting reused again, as these are currently kept consistent.
--
Michael
Re: Failures in constraints regression test, "read only 0 of 8192 bytes"
On Wed, Jun 19, 2024 at 11:00:00PM +0300, Alexander Lakhin wrote:
> Starting from af0e7deb4, the following script:
> [...]
> triggers a segfault:
> 2024-06-19 19:22:49.009 UTC [1607210:6] LOG: server process (PID
> 1607671) was terminated by signal 11: Segmentation fault

Open item added for this issue.
--
Michael
Re: suspicious valgrind reports about radixtree/tidstore on arm64
On Thu, Jun 20, 2024 at 10:11:37AM +0900, Masahiko Sawada wrote:
> I agree with radixtree-fix-proposed.patch. Even if we zero more fields
> in the node it would not add noticeable overheads.

This needs to be tracked as an open item, so I have added one now.
--
Michael
Re: State of pg_createsubscriber
On Thu, Jun 20, 2024 at 2:35 AM Euler Taveira wrote:
>
> I will open new threads soon if you don't.
>

Okay, thanks. I'll wait for you to start new threads and then we can
discuss the respective problems in those threads.

--
With Regards,
Amit Kapila.
Re: Pgoutput not capturing the generated columns
Hi, here are my review comments for v8-0001. == Commit message. 1. It seems like the patch name was accidentally omitted, so it became a mess when it defaulted to the 1st paragraph of the commit message. == contrib/test_decoding/test_decoding.c 2. + data->include_generated_columns = true; I previously posted a comment [1, #4] that this should default to false; IMO it is unintuitive for the test_decoding to have an *opposite* default behaviour compared to CREATE SUBSCRIPTION. == doc/src/sgml/ref/create_subscription.sgml NITPICK - remove the inconsistent blank line in SGML == src/backend/commands/subscriptioncmds.c 3. +#define SUBOPT_include_generated_columns 0x0001 I previously posted a comment [1, #6] that this should be UPPERCASE, but it is not yet fixed. == src/bin/psql/describe.c NITPICK - move and reword the bogus comment ~ 4. + if (pset.sversion >= 17) + appendPQExpBuffer(&buf, + ", subincludegencols AS \"%s\"\n", + gettext_noop("include_generated_columns")); 4a. For consistency with every other parameter, that column title should be written in words "Include generated columns" (not "include_generated_columns"). ~ 4b. IMO this new column belongs with the other subscription parameter columns (e.g. put it ahead of the "Conninfo" column). == src/test/subscription/t/011_generated.pl NITPICK - fixed a comment 5. IMO, it would be better for readability if all the matching CREATE TABLE for publisher and subscriber are kept together, instead of the current code which is creating all publisher tables and then creating all subscriber tables. ~~~ 6. +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2"); +is( $result, qq(4|8 +5|10), 'confirm generated columns ARE replicated when the subscriber-side column is not generated'); + ... +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3"); +is( $result, qq(4|24 +5|25), 'confirm generated columns are NOT replicated when the subscriber-side column is also generated'); + 6a. These SELECT all need ORDER BY to protect against the SELECT * returning rows in some unexpected order. ~ 6b. IMO there should be more comments here to explain how you can tell the column was NOT replicated. E.g. it is because the result value of 'b' is the subscriber-side computed value (which you made deliberately different to the publisher-side computed value). == 99. Please also refer to the attached nitpicks top-up patch for minor cosmetic stuff. == [1] https://www.postgresql.org/message-id/CAHv8RjLeZtTeXpFdoY6xCPO41HtuOPMSSZgshVdb%2BV%3Dp2YHL8Q%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml index e8779dc..ee27a58 100644 --- a/doc/src/sgml/ref/create_subscription.sgml +++ b/doc/src/sgml/ref/create_subscription.sgml @@ -437,7 +437,6 @@ CREATE SUBSCRIPTION subscription_namefalse. 
- If the subscriber-side column is also a generated column then this option has no effect; the subscriber column will be filled as normal with the diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c index 75a52ce..663015d 100644 --- a/src/bin/psql/describe.c +++ b/src/bin/psql/describe.c @@ -6609,12 +6609,12 @@ describeSubscriptions(const char *pattern, bool verbose) appendPQExpBuffer(&buf, ", subskiplsn AS \"%s\"\n", gettext_noop("Skip LSN")); + + /* include_generated_columns is only supported in v18 and higher */ if (pset.sversion >= 17) appendPQExpBuffer(&buf, ", subincludegencols AS \"%s\"\n", gettext_noop("include_generated_columns")); - - // include_generated_columns } /* Only display subscriptions in current database. */ diff --git a/src/test/subscription/t/011_generated.pl b/src/test/subscription/t/011_generated.pl index 92b3dbf..cbd5015 100644 --- a/src/test/subscription/t/011_generated.pl +++ b/src/test/subscription/t/011_generated.pl @@ -24,7 +24,7 @@ $node_publisher->safe_psql('postgres', "CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED)" ); -# publisher-side tab2 has generated col 'b' but subscriber-side tab2 has NON-gnerated col 'b'. +# publisher-side tab2 has generated col 'b' but subscriber-side tab2 has NON-generated col 'b'. $node_publisher->safe_psql('postgres', "CREATE TABLE tab2 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED)" );
datfrozenxid > relfrozenxid w/ crash before XLOG_HEAP_INPLACE
https://postgr.es/m/20240512232923.aa.nmi...@google.com wrote: > Separable, nontrivial things not fixed in the attached patch stack: > - Trouble is possible, I bet, if the system crashes between the inplace-update > memcpy() and XLogInsert(). See the new XXX comment below the memcpy(). That comment: /*-- * XXX A crash here can allow datfrozenxid() to get ahead of relfrozenxid: * * ["D" is a VACUUM (ONLY_DATABASE_STATS)] * ["R" is a VACUUM tbl] * D: vac_update_datfrozenid() -> systable_beginscan(pg_class) * D: systable_getnext() returns pg_class tuple of tbl * R: memcpy() into pg_class tuple of tbl * D: raise pg_database.datfrozenxid, XLogInsert(), finish * [crash] * [recovery restores datfrozenxid w/o relfrozenxid] */ > Might solve this by inplace update setting DELAY_CHKPT, writing WAL, and > finally issuing memcpy() into the buffer. That fix worked. Along with that, I'm attaching a not-for-commit patch with a test case and one with the fix rebased on that test case. Apply on top of the v2 patch stack from https://postgr.es/m/20240617235854.f8.nmi...@google.com. This gets key testing from 027_stream_regress.pl; when I commented out some memcpy lines of the heapam.c change, that test caught it. This resolves the last inplace update defect known to me. Thanks, nm Author: Noah Misch Commit: Noah Misch WAL-log inplace update before revealing it to other sessions. A buffer lock won't stop a reader having already checked tuple visibility. If a vac_update_datfrozenid() and then a crash happened during inplace update of a relfrozenxid value, datfrozenxid could overtake relfrozenxid. Back-patch to v12 (all supported versions). Reviewed by FIXME. Discussion: https://postgr.es/m/FIXME diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock index fb06ff2..aec8dcc 100644 --- a/src/backend/access/heap/README.tuplock +++ b/src/backend/access/heap/README.tuplock @@ -201,6 +201,4 @@ Inplace updates create an exception to the rule that tuple data won't change under a reader holding a pin. A reader of a heap_fetch() result tuple may witness a torn read. Current inplace-updated fields are aligned and are no wider than four bytes, and current readers don't need consistency across -fields. Hence, they get by with just fetching each field once. XXX such a -caller may also read a value that has not reached WAL; see -heap_inplace_update_finish(). +fields. Hence, they get by with just fetching each field once. diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index d7e417f..2a5fea5 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -6305,6 +6305,8 @@ heap_inplace_update_finish(void *state, HeapTuple tuple) Relationrelation = scan->heap_rel; uint32 oldlen; uint32 newlen; + char *dst; + char *src; int nmsgs = 0; SharedInvalidationMessage *invalMessages = NULL; boolRelcacheInitFileInval = false; @@ -6315,6 +6317,9 @@ heap_inplace_update_finish(void *state, HeapTuple tuple) if (oldlen != newlen || htup->t_hoff != tuple->t_data->t_hoff) elog(ERROR, "wrong tuple length"); + dst = (char *) htup + htup->t_hoff; + src = (char *) tuple->t_data + tuple->t_data->t_hoff; + /* * Construct shared cache inval if necessary. 
Note that because we only * pass the new version of the tuple, this mustn't be used for any @@ -6338,15 +6343,15 @@ heap_inplace_update_finish(void *state, HeapTuple tuple) */ PreInplace_Inval(); - /* NO EREPORT(ERROR) from here till changes are logged */ - START_CRIT_SECTION(); - - memcpy((char *) htup + htup->t_hoff, - (char *) tuple->t_data + tuple->t_data->t_hoff, - newlen); - /*-- -* XXX A crash here can allow datfrozenxid() to get ahead of relfrozenxid: +* NO EREPORT(ERROR) from here till changes are complete +* +* Our buffer lock won't stop a reader having already pinned and checked +* visibility for this tuple. Hence, we write WAL first, then mutate the +* buffer. Like in MarkBufferDirtyHint() or RecordTransactionCommit(), +* checkpoint delay makes that acceptable. With the usual order of +* changes, a crash after memcpy() and before XLogInsert() could allow +* datfrozenxid to overtake relfrozenxid: * * ["D" is a VACUUM (ONLY_DATABASE_STATS)] * ["R" is a VACUUM tbl] @@ -6356,14 +6361,28 @@ heap_inplace_update_finish(void *state, HeapTuple tuple) * D: raise pg_database.datfrozenxid, XLogInsert(), finish * [crash] * [r
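Since the diff above is hard to read in this form, here is a condensed
sketch of the ordering the fix establishes: WAL first, then the in-place
buffer change, with checkpointing delayed across the pair. This is a
paraphrase of the description rather than the patch's exact code, and the
use of DELAY_CHKPT_START is an assumption based on the comparison with
MarkBufferDirtyHint() and RecordTransactionCommit():

	/* condensed sketch only; WAL buffer/data registration elided */
	START_CRIT_SECTION();
	MyProc->delayChkptFlags |= DELAY_CHKPT_START;	/* assumed flag; keeps a checkpoint
							 * from landing between steps 1 and 2 */

	recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_INPLACE);	/* 1. log the change */

	memcpy(dst, src, newlen);	/* 2. only now expose it to readers of the buffer */
	MarkBufferDirty(buffer);
	PageSetLSN(BufferGetPage(buffer), recptr);

	MyProc->delayChkptFlags &= ~DELAY_CHKPT_START;
	END_CRIT_SECTION();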
Re: suspicious valgrind reports about radixtree/tidstore on arm64
Hi, On Thu, Jun 20, 2024 at 7:54 AM Tom Lane wrote: > > I wrote: > > I've reproduced what looks like about the same thing on > > my Pi 4 using Fedora 38: just run "make installcheck-parallel" > > under valgrind, and kaboom. Definitely needs investigation. > > The problem appears to be that RT_ALLOC_NODE doesn't bother to > initialize the chunks[] array when making a RT_NODE_16 node. > If we fill fewer than RT_FANOUT_16_MAX of the chunks[] entries, > then when RT_NODE_16_SEARCH_EQ applies vector operations that > read the entire array, it's operating partially on uninitialized > data. Now, that's actually OK because of the "mask off invalid > entries" step, but aarch64 valgrind complains anyway. > > I hypothesize that the reason we're not seeing equivalent failures > on x86_64 is one of > > 1. x86_64 valgrind is stupider than aarch64, and fails to track that > the contents of the SIMD registers are only partially defined. > > 2. x86_64 valgrind is smarter than aarch64, and is able to see > that the "mask off invalid entries" step removes all the > potentially-uninitialized bits. > > The first attached patch, "radixtree-fix-minimal.patch", is enough > to stop the aarch64 valgrind failure for me. However, I think > that the coding here is pretty penny-wise and pound-foolish, > and that what we really ought to do is the second patch, > "radixtree-fix-proposed.patch". I do not believe that asking > memset to zero the three-byte RT_NODE structure produces code > that's either shorter or faster than having it zero 8 bytes > (as for RT_NODE_4) or having it do that and then separately > zero some more stuff (as for the larger node types). Leaving > RT_NODE_4's chunks[] array uninitialized is going to bite us > someday, too, even if it doesn't right now. So I think we > ought to just zero the whole fixed-size part of the nodes, > which is what radixtree-fix-proposed.patch does. I agree with radixtree-fix-proposed.patch. Even if we zero more fields in the node it would not add noticeable overheads. > > (The RT_NODE_48 case could be optimized a bit if we cared > to swap the order of its slot_idxs[] and isset[] arrays; > then the initial zeroing could just go up to slot_idxs[]. > I don't know if there's any reason why the current order > is preferable.) IIUC there is no particular reason for the current order in RT_NODE_48. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: DOCS: Generated table columns are skipped by logical replication
On Wed, Jun 19, 2024 at 2:21 PM Amit Kapila wrote:
> ...
> Thanks for sharing examples. Your proposed patch as-is looks good to
> me. We should back-patch this unless you or someone else thinks
> otherwise.

Hi Amit.

I modified the patch text slightly according to Peter E's suggestion [1].

I also tested the above examples against all older PostgreSQL versions 12,
13, 14, 15, 16, 17. The logical replication behaviour of skipping
generated columns is the same for all of them. Note that CREATE
PUBLICATION column lists did not exist until PG15, so a modified patch is
needed for the versions before that.

~

The attached "HEAD" patch is appropriate for HEAD, PG17, PG16, PG15
The attached "PG14" patch is appropriate for PG14, PG13, PG12

==

[1] https://www.postgresql.org/message-id/2b291af9-929f-49ab-b378-5cbc029d348f%40eisentraut.org

Kind Regards,
Peter Smith.
Fujitsu Australia

v1-0001-DOCS-logical-replication-of-generated-columns-PG14 (attachment: binary data)
v1-0001-DOCS-logical-replication-of-generated-columns-HEAD (attachment: binary data)
Re: Pluggable cumulative statistics
On Thu, Jun 13, 2024 at 04:59:50PM +0900, Michael Paquier wrote: > - How should the persistence of the custom stats be achieved? > Callbacks to give custom stats kinds a way to write/read their data, > push everything into a single file, or support both? > - Should this do like custom RMGRs and assign to each stats kinds ID > that are set in stone rather than dynamic ones? These two questions have been itching me in terms of how it would work for extension developers, after noticing that custom RMGRs are used more than I thought: https://wiki.postgresql.org/wiki/CustomWALResourceManagers The result is proving to be nicer, shorter by 300 lines in total and much simpler when it comes to think about the way stats are flushed because it is possible to achieve the same result as the first patch set without manipulating any of the code paths doing the read and write of the pgstats file. In terms of implementation, pgstat.c's KindInfo data is divided into two parts, for efficiency: - The exiting in-core stats with designated initializers, renamed as built-in stats kinds. - The custom stats kinds are saved in TopMemoryContext, and can only be registered with shared_preload_libraries. The patch reserves a set of 128 harcoded slots for all the custom kinds making the lookups for the KindInfos quite cheap. Upon registration, a custom stats kind needs to assign a unique ID, with uniqueness on the names and IDs checked at registration. The backend code does ID -> information lookups in the hotter paths, meaning that the code only checks if an ID is built-in or custom, then redirects to the correct array where the information is stored. There is one code path that does a name -> information lookup for the undocumented SQL function pg_stat_have_stats() used in the tests, which is a bit less efficient now, but that does not strike me as an issue. modules/injection_points/ works as previously as a template to show how to use these APIs, with tests for the whole. With that in mind, the patch set is more pleasant to the eye, and the attached v2 consists of: - 0001 and 0002 are some cleanups, same as previously to prepare for the backend-side APIs. - 0003 adds the backend support to plug-in custom stats. - 0004 includes documentation. - 0005 is an example of how to use them, with a TAP test providing coverage. Note that the patch I've proposed to make stats persistent at checkpoint so as we don't discard everything after a crash is able to work with the custom stats proposed on this thread: https://commitfest.postgresql.org/48/5047/ -- Michael From 4c065f73cb744d1735c01e6f276d658853810f2e Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Thu, 13 Jun 2024 13:25:05 +0900 Subject: [PATCH v2 1/5] Switch PgStat_Kind from enum to uint32 A follow-up patch is planned to make this counter extensible, and keeping a trace of the kind behind a type is useful in the internal routines used by pgstats. While on it, switch pgstat_is_kind_valid() to use PgStat_Kind, to be more consistent with its callers. 
--- src/include/pgstat.h| 35 ++--- src/backend/utils/activity/pgstat.c | 6 ++--- 2 files changed, 20 insertions(+), 21 deletions(-) diff --git a/src/include/pgstat.h b/src/include/pgstat.h index 2136239710..2d30fadaf1 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -32,26 +32,25 @@ #define PG_STAT_TMP_DIR "pg_stat_tmp" /* The types of statistics entries */ -typedef enum PgStat_Kind -{ - /* use 0 for INVALID, to catch zero-initialized data */ - PGSTAT_KIND_INVALID = 0, +#define PgStat_Kind uint32 - /* stats for variable-numbered objects */ - PGSTAT_KIND_DATABASE, /* database-wide statistics */ - PGSTAT_KIND_RELATION, /* per-table statistics */ - PGSTAT_KIND_FUNCTION, /* per-function statistics */ - PGSTAT_KIND_REPLSLOT, /* per-slot statistics */ - PGSTAT_KIND_SUBSCRIPTION, /* per-subscription statistics */ +/* use 0 for INVALID, to catch zero-initialized data */ +#define PGSTAT_KIND_INVALID 0 - /* stats for fixed-numbered objects */ - PGSTAT_KIND_ARCHIVER, - PGSTAT_KIND_BGWRITER, - PGSTAT_KIND_CHECKPOINTER, - PGSTAT_KIND_IO, - PGSTAT_KIND_SLRU, - PGSTAT_KIND_WAL, -} PgStat_Kind; +/* stats for variable-numbered objects */ +#define PGSTAT_KIND_DATABASE 1 /* database-wide statistics */ +#define PGSTAT_KIND_RELATION 2 /* per-table statistics */ +#define PGSTAT_KIND_FUNCTION 3 /* per-function statistics */ +#define PGSTAT_KIND_REPLSLOT 4 /* per-slot statistics */ +#define PGSTAT_KIND_SUBSCRIPTION 5 /* per-subscription statistics */ + +/* stats for fixed-numbered objects */ +#define PGSTAT_KIND_ARCHIVER 6 +#define PGSTAT_KIND_BGWRITER 7 +#define PGSTAT_KIND_CHECKPOINTER 8 +#define PGSTAT_KIND_IO 9 +#define PGSTAT_KIND_SLRU 10 +#define PGSTAT_KIND_WAL 11 #define PGSTAT_KIND_FIRST_VALID PGSTAT_KIND_DATABASE #define PGSTAT_KIND_LAST PGSTAT_KIND_WAL diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c index dcc2ad8d95..d558cc14
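To illustrate the "built-in array plus reserved custom slots" lookup
described above, here is a small self-contained sketch; every identifier in
it is a stand-in chosen for illustration, not the naming actually used in
the patch set:

#include <stddef.h>

#define DEMO_KIND_BUILTIN_LAST	11		/* e.g. the WAL kind in the 0001 patch */
#define DEMO_KIND_CUSTOM_MIN	128		/* assumed start of the custom ID range */
#define DEMO_KIND_CUSTOM_SLOTS	128		/* the patch reserves 128 hardcoded slots */

typedef struct DemoKindInfo
{
	const char *name;			/* the real KindInfo also carries callbacks, sizes, etc. */
} DemoKindInfo;

/* built-in kinds: fixed array, filled with designated initializers in core */
static const DemoKindInfo demo_builtin_kinds[DEMO_KIND_BUILTIN_LAST + 1];

/* custom kinds: registered from shared_preload_libraries, NULL when unused */
static const DemoKindInfo *demo_custom_kinds[DEMO_KIND_CUSTOM_SLOTS];

static const DemoKindInfo *
demo_get_kind_info(unsigned int kind)
{
	if (kind >= 1 && kind <= DEMO_KIND_BUILTIN_LAST)
		return &demo_builtin_kinds[kind];	/* hot path: direct index, no hash lookup */
	if (kind >= DEMO_KIND_CUSTOM_MIN &&
		kind < DEMO_KIND_CUSTOM_MIN + DEMO_KIND_CUSTOM_SLOTS)
		return demo_custom_kinds[kind - DEMO_KIND_CUSTOM_MIN];
	return NULL;				/* unknown or unregistered kind */
}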
Re: suspicious valgrind reports about radixtree/tidstore on arm64
I wrote: > I hypothesize that the reason we're not seeing equivalent failures > on x86_64 is one of > 1. x86_64 valgrind is stupider than aarch64, and fails to track that > the contents of the SIMD registers are only partially defined. > 2. x86_64 valgrind is smarter than aarch64, and is able to see > that the "mask off invalid entries" step removes all the > potentially-uninitialized bits. Hah: it's the second case. If I patch radixtree.h as attached, then x86_64 valgrind complains about ==00:00:00:32.759 247596== Conditional jump or move depends on uninitialised value(s) ==00:00:00:32.759 247596==at 0x52F668: local_ts_node_16_search_eq (radixtree.h:1018) showing that it knows that the result of vector8_highbit_mask is only partly defined. Kind of odd though that aarch64 valgrind is getting the hard part right and not the easy(?) part. regards, tom lane diff --git a/src/include/lib/radixtree.h b/src/include/lib/radixtree.h index 338e1d741d..267ec6de03 100644 --- a/src/include/lib/radixtree.h +++ b/src/include/lib/radixtree.h @@ -1015,12 +1015,15 @@ RT_NODE_16_SEARCH_EQ(RT_NODE_16 * node, uint8 chunk) /* convert comparison to a bitfield */ bitfield = vector8_highbit_mask(cmp1) | (vector8_highbit_mask(cmp2) << sizeof(Vector8)); - /* mask off invalid entries */ - bitfield &= ((UINT64CONST(1) << count) - 1); - - /* convert bitfield to index by counting trailing zeros */ if (bitfield) - slot_simd = &node->children[pg_rightmost_one_pos32(bitfield)]; + { + /* mask off invalid entries */ + bitfield &= ((UINT64CONST(1) << count) - 1); + + /* convert bitfield to index by counting trailing zeros */ + if (bitfield) + slot_simd = &node->children[pg_rightmost_one_pos32(bitfield)]; + } Assert(slot_simd == slot); return slot_simd;
Re: suspicious valgrind reports about radixtree/tidstore on arm64
I wrote:
> I hypothesize that the reason we're not seeing equivalent failures
> on x86_64 is one of
> 1. x86_64 valgrind is stupider than aarch64, and fails to track that
> the contents of the SIMD registers are only partially defined.
> 2. x86_64 valgrind is smarter than aarch64, and is able to see
> that the "mask off invalid entries" step removes all the
> potentially-uninitialized bits.

Side note: it struck me that this could also be a valgrind version skew
issue. But the machine I'm seeing the failure on is running
valgrind-3.22.0-1.fc38.aarch64, which is the same upstream version as
valgrind-3.22.0-2.el8.x86_64, where I don't see it. So apparently not.
(There is a 3.23.0 out recently, but its release notes don't mention
anything that seems related.)

			regards, tom lane
Re: suspicious valgrind reports about radixtree/tidstore on arm64
I wrote: > I've reproduced what looks like about the same thing on > my Pi 4 using Fedora 38: just run "make installcheck-parallel" > under valgrind, and kaboom. Definitely needs investigation. The problem appears to be that RT_ALLOC_NODE doesn't bother to initialize the chunks[] array when making a RT_NODE_16 node. If we fill fewer than RT_FANOUT_16_MAX of the chunks[] entries, then when RT_NODE_16_SEARCH_EQ applies vector operations that read the entire array, it's operating partially on uninitialized data. Now, that's actually OK because of the "mask off invalid entries" step, but aarch64 valgrind complains anyway. I hypothesize that the reason we're not seeing equivalent failures on x86_64 is one of 1. x86_64 valgrind is stupider than aarch64, and fails to track that the contents of the SIMD registers are only partially defined. 2. x86_64 valgrind is smarter than aarch64, and is able to see that the "mask off invalid entries" step removes all the potentially-uninitialized bits. The first attached patch, "radixtree-fix-minimal.patch", is enough to stop the aarch64 valgrind failure for me. However, I think that the coding here is pretty penny-wise and pound-foolish, and that what we really ought to do is the second patch, "radixtree-fix-proposed.patch". I do not believe that asking memset to zero the three-byte RT_NODE structure produces code that's either shorter or faster than having it zero 8 bytes (as for RT_NODE_4) or having it do that and then separately zero some more stuff (as for the larger node types). Leaving RT_NODE_4's chunks[] array uninitialized is going to bite us someday, too, even if it doesn't right now. So I think we ought to just zero the whole fixed-size part of the nodes, which is what radixtree-fix-proposed.patch does. (The RT_NODE_48 case could be optimized a bit if we cared to swap the order of its slot_idxs[] and isset[] arrays; then the initial zeroing could just go up to slot_idxs[]. I don't know if there's any reason why the current order is preferable.) regards, tom lane diff --git a/src/include/lib/radixtree.h b/src/include/lib/radixtree.h index 338e1d741d..e21f7be3f9 100644 --- a/src/include/lib/radixtree.h +++ b/src/include/lib/radixtree.h @@ -849,8 +849,14 @@ RT_ALLOC_NODE(RT_RADIX_TREE * tree, const uint8 kind, const RT_SIZE_CLASS size_c switch (kind) { case RT_NODE_KIND_4: - case RT_NODE_KIND_16: break; + case RT_NODE_KIND_16: + { +RT_NODE_16 *n16 = (RT_NODE_16 *) node; + +memset(n16->chunks, 0, sizeof(n16->chunks)); +break; + } case RT_NODE_KIND_48: { RT_NODE_48 *n48 = (RT_NODE_48 *) node; diff --git a/src/include/lib/radixtree.h b/src/include/lib/radixtree.h index 338e1d741d..4510f7c4cd 100644 --- a/src/include/lib/radixtree.h +++ b/src/include/lib/radixtree.h @@ -845,27 +845,25 @@ RT_ALLOC_NODE(RT_RADIX_TREE * tree, const uint8 kind, const RT_SIZE_CLASS size_c /* initialize contents */ - memset(node, 0, sizeof(RT_NODE)); switch (kind) { case RT_NODE_KIND_4: + memset(node, 0, offsetof(RT_NODE_4, children)); + break; case RT_NODE_KIND_16: + memset(node, 0, offsetof(RT_NODE_16, children)); break; case RT_NODE_KIND_48: { RT_NODE_48 *n48 = (RT_NODE_48 *) node; -memset(n48->isset, 0, sizeof(n48->isset)); +memset(n48, 0, offsetof(RT_NODE_48, children)); memset(n48->slot_idxs, RT_INVALID_SLOT_IDX, sizeof(n48->slot_idxs)); break; } case RT_NODE_KIND_256: - { -RT_NODE_256 *n256 = (RT_NODE_256 *) node; - -memset(n256->isset, 0, sizeof(n256->isset)); -break; - } + memset(node, 0, offsetof(RT_NODE_256, children)); + break; default: pg_unreachable(); }
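For anyone trying to picture what valgrind objects to here, a scalar
analogue of the SIMD situation (a standalone toy, not radixtree code): the
compare loop reads every array slot, including uninitialized ones, and only
afterwards masks the invalid bits away.

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	int		   *chunks = malloc(4 * sizeof(int));	/* like an RT_NODE_16 chunks[] */
	unsigned int bits = 0;
	int			count = 2;		/* only the first two entries are valid */

	chunks[0] = 1;
	chunks[1] = 2;				/* chunks[2] and chunks[3] stay uninitialized */

	for (int i = 0; i < 4; i++)
		bits |= (unsigned int) (chunks[i] == 7) << i;	/* reads all four slots */

	bits &= (1u << count) - 1;	/* "mask off invalid entries" */

	if (bits)					/* whether this branch gets flagged depends on how
								 * precisely memcheck tracks definedness through the
								 * mask, which is the x86_64/aarch64 difference
								 * being discussed */
		printf("found\n");

	free(chunks);
	return 0;
}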
Re: IPC::Run accepts bug reports
On Tue, Jun 18, 2024 at 08:07:27PM -0700, Andres Freund wrote: > > > > 1) Sometimes hangs hard on windows if started processes have not been > > > > shut > > > >down before script exits. > It reliably reproduces if I comment out > the lines below > # explicitly shut down psql instances gracefully - to avoid hangs > # or worse on windows > in 021_row_visibility.pl > > The logfile ends in > Warning: unable to close filehandle GEN25 properly: Bad file descriptor > during global destruction. > Warning: unable to close filehandle GEN20 properly: Bad file descriptor > during global destruction. > > > Even if I cancel the test, I can't rerun it because due to a leftover psql > a) a new temp install can't be made (could be solved by rm -rf) > b) the test's logfile can't be removed (couldn't even rename the directory) > > The psql instance needs to be found and terminated first. Thanks for that recipe. I've put that in my queue to fix. On Tue, Jun 18, 2024 at 12:00:13PM -0700, Andres Freund wrote: > On 2024-06-18 10:10:17 -0700, Noah Misch wrote: > > On Mon, Jun 17, 2024 at 11:11:17AM -0700, Andres Freund wrote: > > > 2) If a subprocess dies in an inopportune moment, IPC::Run dies with "ack > > >Broken pipe:" (in _do_filters()). There's plenty reports of this on the > > >list, and I've hit this several times personally. It seems to be timing > > >dependent, I've encountered it after seemingly irrelevant ordering > > > changes. > > > > > >I suspect I could create a reproducer with a bit of time. > > > > I've seen that one. If the harness has data to write to a child, the child > > exiting before the write is one way to reach that. Perhaps before exec(), > > IPC::Run should do a non-blocking write from each pending IO. That way, > > small > > writes never experience the timing-dependent behavior. > > I think the question is rather, why is ipc run choosing to die in this > situation and can that be fixed? With default signal handling, the process would die to SIGPIPE. Since PostgreSQL::Test ignores SIGPIPE, this happens instead. The IPC::Run source tree has no discussion of ignoring SIGPIPE, so I bet it didn't get a conscious decision. Perhaps it can do better.
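A minimal standalone demonstration of the mechanism described here (not
IPC::Run or test-harness code): with SIGPIPE ignored, as PostgreSQL::Test
does, writing to a pipe whose reader is gone fails with EPIPE instead of
killing the writer, and it is then up to the caller to decide whether that
is fatal.

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	int			pipefd[2];

	signal(SIGPIPE, SIG_IGN);	/* comment this out and the write terminates us instead */

	if (pipe(pipefd) != 0)
		return 1;
	close(pipefd[0]);			/* no reader left, like a child that already exited */

	if (write(pipefd[1], "x", 1) < 0)
		printf("write failed: %s\n", strerror(errno));	/* expect EPIPE ("Broken pipe") */

	return 0;
}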
Re: Remove dependence on integer wrapping
On Thu, Jun 13, 2024 at 10:56 PM Joseph Koshakow wrote: On Thu, Jun 13, 2024 at 10:48 PM Joseph Koshakow wrote: >I've attached >v4-0002-Handle-overflow-in-money-arithmetic.patch which adds some >overflow checks and tests. I didn't address the float multiplication >because I didn't see any helper methods in int.h. I did some some >useful helpers in float.h, but they raise an error directly instead >of returning a bool. Would those be appropriate for use with the >money type? If not I can refactor out the inner parts into a new method >that returns a bool. >v4-0001-Remove-dependence-on-integer-wrapping.patch is unchanged, I >just incremented the version number. I added overflow handling for float arithmetic to the `money` type. v6-0002-Handle-overflow-in-money-arithmetic.patch is ready for review. v6-0001-Remove-dependence-on-integer-wrapping.patch is unchanged, I just incremented the version number. Thanks, Joe Koshakow From 6eec604618ee76227ee33fcddcc121d9915ff0ab Mon Sep 17 00:00:00 2001 From: Joseph Koshakow Date: Sat, 8 Jun 2024 22:16:46 -0400 Subject: [PATCH 1/2] Remove dependence on integer wrapping This commit updates various parts of the code to no longer rely on integer wrapping for correctness. Not all compilers support -fwrapv, so it's best not to rely on it. --- src/backend/utils/adt/cash.c | 7 +- src/backend/utils/adt/numeric.c| 5 +- src/backend/utils/adt/numutils.c | 34 --- src/backend/utils/adt/timestamp.c | 28 +- src/include/common/int.h | 105 + src/interfaces/ecpg/pgtypeslib/timestamp.c | 11 +-- src/test/regress/expected/timestamp.out| 13 +++ src/test/regress/expected/timestamptz.out | 13 +++ src/test/regress/sql/timestamp.sql | 4 + src/test/regress/sql/timestamptz.sql | 4 + 10 files changed, 169 insertions(+), 55 deletions(-) diff --git a/src/backend/utils/adt/cash.c b/src/backend/utils/adt/cash.c index 32fbad2f57..f6f095a57b 100644 --- a/src/backend/utils/adt/cash.c +++ b/src/backend/utils/adt/cash.c @@ -352,8 +352,11 @@ cash_out(PG_FUNCTION_ARGS) if (value < 0) { - /* make the amount positive for digit-reconstruction loop */ - value = -value; + /* + * make the amount positive for digit-reconstruction loop, we can + * leave INT64_MIN unchanged + */ + pg_neg_s64_overflow(value, &value); /* set up formatting data */ signsymbol = (*lconvert->negative_sign != '\0') ? lconvert->negative_sign : "-"; sign_posn = lconvert->n_sign_posn; diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c index 5510a203b0..4ea2d9b0b4 100644 --- a/src/backend/utils/adt/numeric.c +++ b/src/backend/utils/adt/numeric.c @@ -8110,15 +8110,14 @@ int64_to_numericvar(int64 val, NumericVar *var) /* int64 can require at most 19 decimal digits; add one for safety */ alloc_var(var, 20 / DEC_DIGITS); + uval = pg_abs_s64(val); if (val < 0) { var->sign = NUMERIC_NEG; - uval = -val; } else { var->sign = NUMERIC_POS; - uval = val; } var->dscale = 0; if (val == 0) @@ -11222,7 +11221,7 @@ power_var_int(const NumericVar *base, int exp, int exp_dscale, * Now we can proceed with the multiplications. 
*/ neg = (exp < 0); - mask = abs(exp); + mask = pg_abs_s32(exp); init_var(&base_prod); set_var_from_var(base, &base_prod); diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c index adc1e8a4cb..a3d7d6bf01 100644 --- a/src/backend/utils/adt/numutils.c +++ b/src/backend/utils/adt/numutils.c @@ -18,6 +18,7 @@ #include #include +#include "common/int.h" #include "port/pg_bitutils.h" #include "utils/builtins.h" @@ -131,6 +132,7 @@ pg_strtoint16_safe(const char *s, Node *escontext) uint16 tmp = 0; bool neg = false; unsigned char digit; + int16 result; /* * The majority of cases are likely to be base-10 digits without any @@ -190,10 +192,9 @@ pg_strtoint16_safe(const char *s, Node *escontext) if (neg) { - /* check the negative equivalent will fit without overflowing */ - if (unlikely(tmp > (uint16) (-(PG_INT16_MIN + 1)) + 1)) + if (pg_neg_u16_overflow(tmp, &result)) goto out_of_range; - return -((int16) tmp); + return result; } if (unlikely(tmp > PG_INT16_MAX)) @@ -333,10 +334,9 @@ slow: if (neg) { - /* check the negative equivalent will fit without overflowing */ - if (tmp > (uint16) (-(PG_INT16_MIN + 1)) + 1) + if (pg_neg_u16_overflow(tmp, &result)) goto out_of_range; - return -((int16) tmp); + return result; } if (tmp > PG_INT16_MAX) @@ -393,6 +393,7 @@ pg_strtoint32_safe(const char *s, Node *escontext) uint32 tmp = 0; bool neg = false; unsigned char digit; + int32 result; /* * The majority of cases are likely to be base-10 digits without any @@ -452,10 +453,9 @@ pg_strtoint32_safe(const char *s, Node *escontext) if
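For readers unfamiliar with the helpers referenced above (pg_neg_s64_overflow
and friends), this is roughly the flavor of check involved. The helper below
is a renamed sketch assuming a GCC/Clang-style overflow builtin; it is not
the exact code in src/include/common/int.h, which also provides fallbacks for
compilers without the builtins.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static inline bool
demo_neg_s64_overflow(int64_t a, int64_t *result)
{
	/* only INT64_MIN overflows on negation; the builtin reports that case */
	return __builtin_sub_overflow((int64_t) 0, a, result);
}

int
main(void)
{
	int64_t		r;
	bool		ovf;

	ovf = demo_neg_s64_overflow(42, &r);
	printf("negate 42: overflow=%d result=%lld\n", ovf, (long long) r);

	ovf = demo_neg_s64_overflow(INT64_MIN, &r);
	printf("negate INT64_MIN: overflow=%d\n", ovf);

	return 0;
}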
Re: fix pg_upgrade comment
On Tue, Jun 18, 2024 at 10:20:06PM +0200, Daniel Gustafsson wrote:
> Nice catch, +1 for committing.

Committed.

-- 
nathan
Re: suspicious valgrind reports about radixtree/tidstore on arm64
Tomas Vondra writes: > On 6/19/24 17:11, Tom Lane wrote: >> What's the test scenario that triggers this? > I haven't investigated that yet, I just ran "make check". I've reproduced what looks like about the same thing on my Pi 4 using Fedora 38: just run "make installcheck-parallel" under valgrind, and kaboom. Definitely needs investigation. regards, tom lane ==00:00:04:08.988 16548== Memcheck, a memory error detector ==00:00:04:08.988 16548== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al. ==00:00:04:08.988 16548== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info ==00:00:04:08.988 16548== Command: postgres -F ==00:00:04:08.988 16548== Parent PID: 16278 ==00:00:04:08.988 16548== ==00:00:04:13.280 16548== Conditional jump or move depends on uninitialised value(s) ==00:00:04:13.280 16548==at 0x4AD2CC: local_ts_node_16_search_eq (radixtree.h:1025) ==00:00:04:13.280 16548==by 0x4AD2CC: local_ts_node_search (radixtree.h:1057) ==00:00:04:13.280 16548==by 0x4AF173: local_ts_get_slot_recursive (radixtree.h:1667) ==00:00:04:13.280 16548==by 0x4AF173: local_ts_set (radixtree.h:1744) ==00:00:04:13.280 16548==by 0x4AF173: TidStoreSetBlockOffsets (tidstore.c:427) ==00:00:04:13.280 16548==by 0x4FCEEF: dead_items_add (vacuumlazy.c:2892) ==00:00:04:13.280 16548==by 0x4FE5EF: lazy_scan_prune (vacuumlazy.c:1500) ==00:00:04:13.280 16548==by 0x4FE5EF: lazy_scan_heap (vacuumlazy.c:975) ==00:00:04:13.280 16548==by 0x4FE5EF: heap_vacuum_rel (vacuumlazy.c:497) ==00:00:04:13.280 16548==by 0x681527: table_relation_vacuum (tableam.h:1720) ==00:00:04:13.280 16548==by 0x681527: vacuum_rel (vacuum.c:2210) ==00:00:04:13.281 16548==by 0x682957: vacuum (vacuum.c:622) ==00:00:04:13.281 16548==by 0x7B7E77: autovacuum_do_vac_analyze (autovacuum.c:3100) ==00:00:04:13.281 16548==by 0x7B7E77: do_autovacuum (autovacuum.c:2417) ==00:00:04:13.281 16548==by 0x7B830B: AutoVacWorkerMain (autovacuum.c:1569) ==00:00:04:13.281 16548==by 0x7BB937: postmaster_child_launch (launch_backend.c:265) ==00:00:04:13.281 16548==by 0x7BD64F: StartChildProcess (postmaster.c:3928) ==00:00:04:13.281 16548==by 0x7BF9FB: StartAutovacuumWorker (postmaster.c:3997) ==00:00:04:13.281 16548==by 0x7BF9FB: process_pm_pmsignal (postmaster.c:3809) ==00:00:04:13.281 16548==by 0x7BF9FB: ServerLoop.isra.0 (postmaster.c:1667) ==00:00:04:13.281 16548==by 0x7C0EBF: PostmasterMain (postmaster.c:1372) ==00:00:04:13.281 16548== ==00:00:04:13.673 16548== Conditional jump or move depends on uninitialised value(s) ==00:00:04:13.673 16548==at 0x4AD2CC: local_ts_node_16_search_eq (radixtree.h:1025) ==00:00:04:13.673 16548==by 0x4AD2CC: local_ts_node_search (radixtree.h:1057) ==00:00:04:13.673 16548==by 0x4AFD4F: local_ts_find (radixtree.h:) ==00:00:04:13.673 16548==by 0x4AFD4F: TidStoreIsMember (tidstore.c:443) ==00:00:04:13.673 16548==by 0x5110D7: btvacuumpage (nbtree.c:1235) ==00:00:04:13.673 16548==by 0x51171B: btvacuumscan (nbtree.c:1023) ==00:00:04:13.673 16548==by 0x51185B: btbulkdelete (nbtree.c:824) ==00:00:04:13.673 16548==by 0x683D3B: vac_bulkdel_one_index (vacuum.c:2498) ==00:00:04:13.673 16548==by 0x4FDA9B: lazy_vacuum_one_index (vacuumlazy.c:2443) ==00:00:04:13.673 16548==by 0x4FDA9B: lazy_vacuum_all_indexes (vacuumlazy.c:2026) ==00:00:04:13.673 16548==by 0x4FDA9B: lazy_vacuum (vacuumlazy.c:1944) ==00:00:04:13.673 16548==by 0x4FE7F3: lazy_scan_heap (vacuumlazy.c:1050) ==00:00:04:13.674 16548==by 0x4FE7F3: heap_vacuum_rel (vacuumlazy.c:497) ==00:00:04:13.674 16548==by 0x681527: table_relation_vacuum (tableam.h:1720) 
==00:00:04:13.674 16548==by 0x681527: vacuum_rel (vacuum.c:2210) ==00:00:04:13.674 16548==by 0x682957: vacuum (vacuum.c:622) ==00:00:04:13.674 16548==by 0x7B7E77: autovacuum_do_vac_analyze (autovacuum.c:3100) ==00:00:04:13.674 16548==by 0x7B7E77: do_autovacuum (autovacuum.c:2417) ==00:00:04:13.674 16548==by 0x7B830B: AutoVacWorkerMain (autovacuum.c:1569) ==00:00:04:13.674 16548== ==00:00:04:13.681 16548== Conditional jump or move depends on uninitialised value(s) ==00:00:04:13.681 16548==at 0x4AD2CC: local_ts_node_16_search_eq (radixtree.h:1025) ==00:00:04:13.681 16548==by 0x4AD2CC: local_ts_node_search (radixtree.h:1057) ==00:00:04:13.681 16548==by 0x4AFD4F: local_ts_find (radixtree.h:) ==00:00:04:13.681 16548==by 0x4AFD4F: TidStoreIsMember (tidstore.c:443) ==00:00:04:13.681 16548==by 0x51126B: btreevacuumposting (nbtree.c:1405) ==00:00:04:13.681 16548==by 0x51126B: btvacuumpage (nbtree.c:1249) ==00:00:04:13.681 16548==by 0x51171B: btvacuumscan (nbtree.c:1023) ==00:00:04:13.681 16548==by 0x51185B: btbulkdelete (nbtree.c:824) ==00:00:04:13.681 16548==by 0x683D3B: vac_bulkdel_one_index (vacuum.c:2498) ==00:00:04:13.681 16548==by 0x4FDA9B: lazy_vacuum_one_index (vacuumlazy.c:2443)
Re: State of pg_createsubscriber
On Wed, Jun 19, 2024, at 12:51 AM, Amit Kapila wrote:
> Ideally, invalidated slots shouldn't create any problems but it is
> better that we discuss this also as a separate problem in new thread.

Ok.

> > Do you have any other scenarios in mind?
>
> No, so we have three issues to discuss (a) some unwarranted messages
> in --dry-run mode; (b) whether to remove pre-existing subscriptions
> during conversion; (c) whether to remove pre-existing replication
> slots.
>
> Would you like to start three new threads for each of these or would
> you like Kuroda-San or me to start some or all?

I will open new threads soon if you don't.

--
Euler Taveira
EDB   https://www.enterprisedb.com/
Re: [HACKERS] make async slave to wait for lsn to be replayed
On Fri, Jun 14, 2024 at 3:46 PM Alexander Korotkov wrote: > On Wed, Jun 12, 2024 at 11:36 AM Kartyshov Ivan > wrote: > > > > Hi, Alexander, Here, I made some improvements according to your > > discussion with Heikki. > > > > On 2024-04-11 18:09, Alexander Korotkov wrote: > > > On Thu, Apr 11, 2024 at 1:46 AM Heikki Linnakangas > > > wrote: > > >> In a nutshell, it's possible for the loop in WaitForLSN to exit > > >> without > > >> cleaning up the process from the heap. I was able to hit that by > > >> adding > > >> a delay after the addLSNWaiter() call: > > >> > > >> > TRAP: failed Assert("!procInfo->inHeap"), File: > > >> > "../src/backend/commands/waitlsn.c", Line: 114, PID: 1936152 > > >> > postgres: heikki postgres [local] > > >> > CALL(ExceptionalCondition+0xab)[0x55da1f68787b] > > >> > postgres: heikki postgres [local] CALL(+0x331ec8)[0x55da1f204ec8] > > >> > postgres: heikki postgres [local] > > >> > CALL(WaitForLSN+0x139)[0x55da1f2052cc] > > >> > postgres: heikki postgres [local] > > >> > CALL(pg_wal_replay_wait+0x18b)[0x55da1f2056e5] > > >> > postgres: heikki postgres [local] > > >> > CALL(ExecuteCallStmt+0x46e)[0x55da1f18031a] > > >> > postgres: heikki postgres [local] > > >> > CALL(standard_ProcessUtility+0x8cf)[0x55da1f4b26c9] > > >> > > >> I think there's a similar race condition if the timeout is reached at > > >> the same time that the startup process wakes up the process. > > > > > > Thank you for catching this. I think WaitForLSN() just needs to call > > > deleteLSNWaiter() unconditionally after exit from the loop. > > > > Fix and add injection point test on this race condition. > > > > > On Thu, Apr 11, 2024 at 1:46 AM Heikki Linnakangas > > > wrote: > > >> The docs could use some-copy-editing, but just to point out one issue: > > >> > > >> > There are also procedures to control the progress of recovery. > > >> > > >> That's copy-pasted from an earlier sentence at the table that lists > > >> functions like pg_promote(), pg_wal_replay_pause(), and > > >> pg_is_wal_replay_paused(). The pg_wal_replay_wait() doesn't control > > >> the > > >> progress of recovery like those functions do, it only causes the > > >> calling > > >> backend to wait. > > > > Fix documentation and add extra tests on multi-standby replication > > and cascade replication. > > Thank you for the revised patch. > > I see couple of items which are not addressed in this revision. > * As Heikki pointed, that it's currently not possible in one round > trip to call call pg_wal_replay_wait() and do other job. The attached > patch addresses this. It milds the requirement. Now, it's not > necessary to be in atomic context. It's only necessary to have no > active snapshot. This new requirement works for me so far. I > appreciate a feedback on this. > * As Alvaro pointed, multiple waiters case isn't covered by the test > suite. That leads to no coverage of some code paths. The attached > patch addresses this by adding a test case with multiple waiters. > > The rest looks good to me. Oh, I forgot some notes about 044_wal_replay_wait_injection_test.pl. 1. It's not clear why this test needs node_standby2 at all. It seems useless. 2. The target LSN is set to pg_current_wal_insert_lsn() + 1. This location seems to be unachievable in this test. So, it's not clear what race condition this test could potentially detect. 3. I think it would make sense to check for the race condition reported by Heikki. That is to insert the injection point at the beginning of WaitLSNSetLatches(). Links. 1. 
https://www.postgresql.org/message-id/flat/CAPpHfdvGRssjqwX1%2Bidm5Tu-eWsTcx6DthB2LhGqA1tZ29jJaw%40mail.gmail.com#557900e860457a9e24256c93a2ad4920 -- Regards, Alexander Korotkov Supabase
Re: Failures in constraints regression test, "read only 0 of 8192 bytes"
Hello Heikki, 11.03.2024 10:09, Heikki Linnakangas wrote: On 10/03/2024 22:59, Thomas Munro wrote: On Mon, Mar 11, 2024 at 9:30 AM Heikki Linnakangas wrote: Barring objections, I'll commit the attached. +1 Pushed, thanks! Please look at a new anomaly, that I've discovered in master. Starting from af0e7deb4, the following script: numjobs=80 for ((i=1;i<=50;i++)); do echo "I $i" for ((j=1;j<=numjobs;j++)); do createdb db$j; done for ((j=1;j<=numjobs;j++)); do echo " VACUUM FULL pg_class; REINDEX INDEX pg_class_oid_index; " | psql -d db$j >/dev/null 2>&1 & echo " CREATE TABLE tbl1 (t text); DROP TABLE tbl1; " | psql -d db$j >/dev/null 2>&1 & done wait grep 'was terminated' server.log && break; for ((j=1;j<=numjobs;j++)); do dropdb db$j; done done triggers a segfault: 2024-06-19 19:22:49.009 UTC [1607210:6] LOG: server process (PID 1607671) was terminated by signal 11: Segmentation fault with the following stack trace: Core was generated by `postgres: law db50 [local] CREATE TABLE '. Program terminated with signal SIGSEGV, Segmentation fault. warning: Section `.reg-xstate/1607671' in core file too small. #0 0x55d04cb8232e in RelationReloadNailed (relation=0x7f7d0a1b1fd8) at relcache.c:2415 2415 relp = (Form_pg_class) GETSTRUCT(pg_class_tuple); (gdb) bt #0 0x55d04cb8232e in RelationReloadNailed (relation=0x7f7d0a1b1fd8) at relcache.c:2415 #1 0x55d04cb8278c in RelationClearRelation (relation=0x7f7d0a1b1fd8, rebuild=true) at relcache.c:2560 #2 0x55d04cb834e9 in RelationCacheInvalidate (debug_discard=false) at relcache.c:3048 #3 0x55d04cb72e3c in InvalidateSystemCachesExtended (debug_discard=false) at inval.c:680 #4 0x55d04cb73190 in InvalidateSystemCaches () at inval.c:794 #5 0x55d04c9754ad in ReceiveSharedInvalidMessages ( invalFunction=0x55d04cb72eee , resetFunction=0x55d04cb7317e ) at sinval.c:105 #6 0x55d04cb731b4 in AcceptInvalidationMessages () at inval.c:808 #7 0x55d04c97d2ed in LockRelationOid (relid=2662, lockmode=1) at lmgr.c:136 #8 0x55d04c404b1f in relation_open (relationId=2662, lockmode=1) at relation.c:55 #9 0x55d04c47941e in index_open (relationId=2662, lockmode=1) at indexam.c:137 #10 0x55d04c4787c2 in systable_beginscan (heapRelation=0x7f7d0a1b1fd8, indexId=2662, indexOK=true, snapshot=0x0, nkeys=1, key=0x7ffd456a8570) at genam.c:396 #11 0x55d04cb7e93d in ScanPgRelation (targetRelId=3466, indexOK=true, force_non_historic=false) at relcache.c:381 #12 0x55d04cb7fe15 in RelationBuildDesc (targetRelId=3466, insertIt=true) at relcache.c:1093 #13 0x55d04cb81c93 in RelationIdGetRelation (relationId=3466) at relcache.c:2108 #14 0x55d04c404b29 in relation_open (relationId=3466, lockmode=1) at relation.c:58 #15 0x55d04cb720a6 in BuildEventTriggerCache () at evtcache.c:129 #16 0x55d04cb71f6a in EventCacheLookup (event=EVT_SQLDrop) at evtcache.c:68 #17 0x55d04c61c037 in trackDroppedObjectsNeeded () at event_trigger.c:1342 #18 0x55d04c61bf02 in EventTriggerBeginCompleteQuery () at event_trigger.c:1284 #19 0x55d04c9ac744 in ProcessUtilitySlow (pstate=0x55d04d04aca0, pstmt=0x55d04d021540, queryString=0x55d04d020830 "CREATE TABLE tbl1 (t text);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x55d04d021800, qc=0x7ffd456a8fb0) at utility.c:1107 #20 0x55d04c9ac64d in standard_ProcessUtility (pstmt=0x55d04d021540, queryString=0x55d04d020830 "CREATE TABLE tbl1 (t text);", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x55d04d021800, qc=0x7ffd456a8fb0) at utility.c:1067 #21 0x55d04c9ab54d in ProcessUtility (pstmt=0x55d04d021540, 
queryString=0x55d04d020830 "CREATE TABLE tbl1 (t text);", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x55d04d021800, qc=0x7ffd456a8fb0) at utility.c:523 #22 0x55d04c9a9dc6 in PortalRunUtility (portal=0x55d04d0c1020, pstmt=0x55d04d021540, isTopLevel=true, setHoldSnapshot=false, dest=0x55d04d021800, qc=0x7ffd456a8fb0) at pquery.c:1158 #23 0x55d04c9aa03d in PortalRunMulti (portal=0x55d04d0c1020, isTopLevel=true, setHoldSnapshot=false, dest=0x55d04d021800, altdest=0x55d04d021800, qc=0x7ffd456a8fb0) at pquery.c:1315 #24 0x55d04c9a9487 in PortalRun (portal=0x55d04d0c1020, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x55d04d021800, altdest=0x55d04d021800, qc=0x7ffd456a8fb0) at pquery.c:791 #25 0x55d04c9a2150 in exec_simple_query (query_string=0x55d04d020830 "CREATE TABLE tbl1 (t text);") at postgres.c:1273 #26 0x55d04c9a71f5 in PostgresMain (dbname=0x55d04d05dbf0 "db50", username=0x55d04d05dbd8 "law") at postgres.c:4675 #27 0x55d04c8bd8c8 in BackendRun (port=0x55d04d0540c0) at postmaster.c:4475 #28 0x55d04c8bcedd in BackendStartup (port=0x55d04d0540c0) at postmaster.c:4151 #29 0x55
Re: Missing docs for new enable_group_by_reordering GUC
On Wed, Jun 19, 2024 at 6:02 PM Pavel Borisov wrote: > On Wed, 19 Jun 2024 at 05:27, Alexander Korotkov wrote: >> >> On Tue, Jun 18, 2024 at 4:14 PM Pavel Borisov wrote: >> >> Controls if the query planner will produce a plan which will provide >> >> GROUP BY keys presorted in the order of keys of a >> >> child node of the plan, such as an index scan. When disabled, the query >> >> planner will produce a plan with GROUP BY keys only >> >> reordered to match >> >> the ORDER BY clause, if any. When enabled, the planner >> >> will try to produce a more efficient plan. The default value is on. >> > A correction of myself: presorted -> sorted, reordered ->sorted >> >> Thank you for your review. I think all of this make sense. Please, >> check the revised patch attached. > > To me patch v3 looks good. Ok, thank you. I'm going to push this if no objections. -- Regards, Alexander Korotkov Supabase
Re: remove check hooks for GUCs that contribute to MaxBackends
On Wed, Jun 19, 2024 at 03:14:16PM -0400, Tom Lane wrote:
> Nathan Bossart writes:
>> The attached patch removes these hooks and enhances the error message to
>> look like this:
>
>> FATAL: too many backends configured
>> DETAIL: "max_connections" (262100) plus "autovacuum_max_workers" (3)
>> plus "max_worker_processes" (8) plus "max_wal_senders" (1) must be less
>> than 262142.
>
> BTW, I suggest writing it as "too many server processes configured",
> or perhaps "too many server processes required". "Backend" is too
> much of an insider term.

Will do, thanks for reviewing.

-- 
nathan
Re: remove check hooks for GUCs that contribute to MaxBackends
Nathan Bossart writes: > The attached patch removes these hooks and enhances the error message to > look like this: > FATAL: too many backends configured > DETAIL: "max_connections" (262100) plus "autovacuum_max_workers" (3) > plus "max_worker_processes" (8) plus "max_wal_senders" (1) must be less > than 262142. BTW, I suggest writing it as "too many server processes configured", or perhaps "too many server processes required". "Backend" is too much of an insider term. regards, tom lane
Re: remove check hooks for GUCs that contribute to MaxBackends
Nathan Bossart writes:
> While working on an idea from another thread [0], I noticed that each of
> max_connections, max_worker_process, max_autovacuum_workers, and
> max_wal_senders have a check hook that verifies the sum of those GUCs does
> not exceed a certain value. Then, in InitializeMaxBackends(), we do the
> same check once more. Not only do the check hooks seem redundant, but I
> think they might sometimes be inaccurate since some values might not yet be
> initialized.

Yeah, these per-variable checks are inherently bogus. If we can get rid of
them and make the net user experience actually better, that's a win-win.
It seems easier to do for these because they can't change after server
start, so there can be one well-defined time to apply the consistency
check.

IIRC, we have some similar issues in other hooks for variables that aren't
PGC_POSTMASTER, so it's harder to see how we might get rid of their
cross-checks. That doesn't make them less bogus though.

			regards, tom lane
Re: Extension security improvement: Add support for extensions with an owned schema
On Wed, 19 Jun 2024 at 19:55, Tom Lane wrote: > > Jelte Fennema-Nio writes: > > On Wed, 19 Jun 2024 at 17:28, Robert Haas wrote: > >> I have a feeling that this might be pretty annoying to implement, and > >> if that is true, then never mind. > > > Based on a quick look it's not trivial, but also not super bad. > > Basically it seems like in src/backend/catalog/namespace.c, every time > > we loop over activeSearchPath and CurrentExtensionObject is set, then > > we should skip any item that's not stored in pg_catalog, unless > > there's a DEPENDENCY_EXTENSION pg_depend entry for the item (and that > > pg_depend entry references the extension or the requires list). > > We could change the lookup rules that apply during execution of > an extension script, but we already restrict search_path at that > time so I'm not sure how much further this'd move the goalposts. The point I tried to make in my first email is that this restricted search_path you mention, is not very helpful at preventing privilege escalations. Since it's often possible for a non-superuser to create functions in one of the schemas in this search_path, e.g. by having the non-superuser first create the schema & create some functions in it, and then asking the DBA/control plane to create the extension in that schema. My patch tries to address that problem by creating the schema of the extension during extension creation, and failing if it already exists. Thus implicitly ensuring that a non-superuser cannot mess with the schema. The proposal from Robert tries to instead address by changing the lookup rules during execution of an extension script to be more strict than they would be outside of it (i.e. even if a function/operator matches search_path it might still not be picked). > The *real* problem IMO is that if you create a PL function or > (old-style) SQL function within an extension, execution of that > function is not similarly protected. That's definitely a big problem too, and that's the problem that [1] tries to fix. But first the lookup in extension scripts would need to be made secure, because it doesn't seem very useful (security wise) to use the same lookup mechanism in functions as we do in extension scripts, if the lookup in extension scripts is not secure in the first place. I think the method of making the lookup secure in my patch would transfer over well, because it adds a way for a safe search_path to exist, so all that's needed is for the PL function to use that search_path. Robbert's proposal would be more difficult I think. When executing a PL function from an extension we'd need to use the same changed lookup rules that we'd use during the extension script of that extension. I think that should be possible, but it's definitely more involved. [1]: https://www.postgresql.org/message-id/flat/CAE9k0P%253DFcZ%253Dao3ZpEq29BveF%252B%253D27KBcRT2HFowJxoNCv02dHLA%2540mail.gmail.com
remove check hooks for GUCs that contribute to MaxBackends
While working on an idea from another thread [0], I noticed that each of max_connections, max_worker_process, max_autovacuum_workers, and max_wal_senders have a check hook that verifies the sum of those GUCs does not exceed a certain value. Then, in InitializeMaxBackends(), we do the same check once more. Not only do the check hooks seem redundant, but I think they might sometimes be inaccurate since some values might not yet be initialized. Furthermore, the error message is not exactly the most descriptive: $ pg_ctl -D . start -o "-c max_connections=262100 -c max_wal_senders=1" FATAL: invalid value for parameter "max_wal_senders": 1 The attached patch removes these hooks and enhances the error message to look like this: FATAL: too many backends configured DETAIL: "max_connections" (262100) plus "autovacuum_max_workers" (3) plus "max_worker_processes" (8) plus "max_wal_senders" (1) must be less than 262142. The downside of this change is that server startup progresses a little further before it fails, but that might not be too concerning given this _should_ be a relatively rare occurrence. Thoughts? [0] https://postgr.es/m/20240618213331.ef2spg3nasksisbi%40awork3.anarazel.de -- nathan >From 2ab34581879d2122f03be0e3f9d0d15edb501a7c Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Wed, 19 Jun 2024 13:39:18 -0500 Subject: [PATCH v1 1/1] remove check hooks for GUCs that contribute to MaxBackends --- src/backend/utils/init/postinit.c | 57 - src/backend/utils/misc/guc_tables.c | 8 ++-- src/include/utils/guc_hooks.h | 6 --- 3 files changed, 11 insertions(+), 60 deletions(-) diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c index 0805398e24..8a629982c4 100644 --- a/src/backend/utils/init/postinit.c +++ b/src/backend/utils/init/postinit.c @@ -580,57 +580,14 @@ InitializeMaxBackends(void) MaxBackends = MaxConnections + autovacuum_max_workers + 1 + max_worker_processes + max_wal_senders; - /* internal error because the values were all checked previously */ if (MaxBackends > MAX_BACKENDS) - elog(ERROR, "too many backends configured"); -} - -/* - * GUC check_hook for max_connections - */ -bool -check_max_connections(int *newval, void **extra, GucSource source) -{ - if (*newval + autovacuum_max_workers + 1 + - max_worker_processes + max_wal_senders > MAX_BACKENDS) - return false; - return true; -} - -/* - * GUC check_hook for autovacuum_max_workers - */ -bool -check_autovacuum_max_workers(int *newval, void **extra, GucSource source) -{ - if (MaxConnections + *newval + 1 + - max_worker_processes + max_wal_senders > MAX_BACKENDS) - return false; - return true; -} - -/* - * GUC check_hook for max_worker_processes - */ -bool -check_max_worker_processes(int *newval, void **extra, GucSource source) -{ - if (MaxConnections + autovacuum_max_workers + 1 + - *newval + max_wal_senders > MAX_BACKENDS) - return false; - return true; -} - -/* - * GUC check_hook for max_wal_senders - */ -bool -check_max_wal_senders(int *newval, void **extra, GucSource source) -{ - if (MaxConnections + autovacuum_max_workers + 1 + - max_worker_processes + *newval > MAX_BACKENDS) - return false; - return true; + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), +errmsg("too many backends configured"), +errdetail("\"max_connections\" (%d) plus \"autovacuum_max_workers\" (%d) plus \"max_worker_processes\" (%d) plus \"max_wal_senders\" (%d) must be less than %d.", + MaxConnections, autovacuum_max_workers, + max_worker_processes, max_wal_senders, + MAX_BACKENDS - 1))); } /* diff --git 
a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c index 46c258be28..07b575894d 100644 --- a/src/backend/utils/misc/guc_tables.c +++ b/src/backend/utils/misc/guc_tables.c @@ -2207,7 +2207,7 @@ struct config_int ConfigureNamesInt[] = }, &MaxConnections, 100, 1, MAX_BACKENDS, - check_max_connections, NULL, NULL + NULL, NULL, NULL }, { @@ -2923,7 +2923,7 @@ struct config_int ConfigureNamesInt[] = }, &max_wal_senders, 10, 0, MAX_BACKENDS, - check_max_wal_senders, NULL, NULL + NULL, NULL, NULL }, { @@ -3153,7 +3153,7 @@ struct config_int ConfigureNamesInt[] = }, &max_worker_processes, 8, 0, MAX_BACKENDS,
Re: BitmapHeapScan streaming read user and prelim refactoring
Thanks for taking a look at my patches, Álvaro! On Wed, Jun 19, 2024 at 12:51 PM Alvaro Herrera wrote: > > On 2024-Jun-14, Melanie Plageman wrote: > > > Subject: [PATCH v21 12/20] Update variable names in bitmap scan descriptors > > > > The previous commit which added BitmapTableScanDesc and > > BitmapHeapScanDesc used the existing member names from TableScanDescData > > and HeapScanDescData for diff clarity. This commit renames the members > > -- in many cases by removing the rs_ prefix which is not relevant or > > needed here. > > *Cough* Why? It makes grepping for struct members useless. I'd rather > keep these prefixes, as they allow easier code exploration. (Sometimes > when I need to search for uses of some field with a name that's too > common, I add a prefix to the name and let the compiler guide me to > them. But that's a waste of time ...) If we want to make it possible to use no tools and only manually grep for struct members, that means we can never reuse struct member names. Across a project of our size, that seems like a very serious restriction. Adding prefixes in struct members makes it harder to read code -- both because it makes the names longer and because people are more prone to abbreviate the meaningful parts of the struct member name to make the whole name shorter. While I understand we as a project want to make it possible to hack on Postgres without an IDE or a set of vim plugins a mile long, I also think we have to make some compromises for readability. Most commonly used text editors have LSP (language server protocol) support and should allow for meaningful identification of the usages of a struct member even if it has the same name as a member of another struct. That being said, I'm not unreasonable. If we have decided we can not reuse struct member names, I will change my patch. - Melanie
Re: Report runtimes in pg_upgrade verbose mode
> On 19 Jun 2024, at 17:09, Nathan Bossart wrote: > I've been using 'ts -i' as Peter suggested Oh nice, I had forgotten about that one, thanks for the reminder! -- Daniel Gustafsson
Re: BitmapHeapScan streaming read user and prelim refactoring
On Wed, Jun 19, 2024 at 12:38 PM Tomas Vondra wrote: > > > > On 6/19/24 17:55, Melanie Plageman wrote: > > On Tue, Jun 18, 2024 at 6:02 PM Tomas Vondra > > wrote: > > > From your v22b-0017-review.patch > > > > diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h > > index 036ef29e7d5..9c711ce0eb0 100644 > > --- a/src/include/access/relscan.h > > +++ b/src/include/access/relscan.h > > @@ -52,6 +52,13 @@ typedef struct TableScanDescData > > } TableScanDescData; > > typedef struct TableScanDescData *TableScanDesc; > > > > +/* > > + * XXX I don't understand why we should have this special node if we > > + * don't have special nodes for other scan types. > > > > In this case, up until the final commit (using the read stream > > interface), there are six fields required by bitmap heap scan that are > > not needed by any other user of HeapScanDescData. There are also > > several members of HeapScanDescData that are not needed by bitmap heap > > scans and all of the setup in initscan() for those fields is not > > required for bitmap heap scans. > > > > Also, because the BitmapHeapScanDesc needs information like the > > ParallelBitmapHeapState and prefetch_maximum (for use in prefetching), > > the scan_begin() callback would have to take those as parameters. I > > thought adding so much bitmap table scan-specific information to the > > generic table scan callbacks was a bad idea. > > > > Once we add the read stream API code, the number of fields required > > for bitmap heap scan that are in the scan descriptor goes down to > > three. So, perhaps we could justify having that many bitmap heap > > scan-specific fields in the HeapScanDescData. > > > > Though, I actually think we could start moving toward having > > specialized scan descriptors if the requirements for that scan type > > are appreciably different. I can't think of new code that would be > > added to the HeapScanDescData that would have to be duplicated over to > > specialized scan descriptors. > > > > With the final read stream state, I can see the argument for bloating > > the HeapScanDescData with three extra members and avoiding making new > > scan descriptors. But, for the intermediate patches which have all of > > the bitmap prefetch members pushed down into the HeapScanDescData, I > > think it is really not okay. Six members only for bitmap heap scans > > and two bitmap-specific members to begin_scan() seems bad. > > > > What I thought we plan to do is commit the refactoring patches > > sometime after the branch for 18 is cut and leave the final read > > stream patch uncommitted so we can do performance testing on it. If > > you think it is okay to have the six member bloated HeapScanDescData > > in master until we push the read stream code, I am okay with removing > > the BitmapTableScanDesc and BitmapHeapScanDesc. > > > > I admit I don't have a very good idea what the ideal / desired state > look like. My comment is motivated solely by the feeling that it seems > strange to have one struct serving most scan types, and then a special > struct for one particular scan type ... I see what you are saying. We could make BitmapTableScanDesc inherit from TableScanDescData which would be similar to what we do with other things like the executor scan nodes themselves. We would waste space and LOC with initializing the unneeded members, but it might seem less weird. Whether we want the specialized scan descriptors at all is probably the bigger question, though. 
The top level BitmapTableScanDesc is motivated by wanting fewer bitmap table scan specific members passed to scan_begin(). And the BitmapHeapScanDesc is motivated by this plus wanting to avoid bloating the HeapScanDescData. If you look at at HEAD~1 (with my patches applied) and think you would be okay with 1) the contents of the BitmapHeapScanDesc being in the HeapScanDescData and 2) the extra bitmap table scan-specific parameters in scan_begin_bm() being passed to scan_begin() then I will remove the specialized scan descriptors. The final state (with the read stream) will still have three bitmap heap scan-specific members in the HeapScanDescData. Would it help if I do a version like this so you can see what it is like? > > + * XXX Also, maybe this should do the naming convention with Data at > > + * the end (mostly for consistency). > > + */ > > typedef struct BitmapTableScanDesc > > { > > Relationrs_rd;/* heap relation descriptor */ > > > > I really want to move away from these Data typedefs. I find them so > > confusing as a developer, but it's hard to justify ripping out the > > existing ones because of code churn. If we add new scan descriptors, I > > had really hoped to start using a different pattern. > > > > Perhaps, I understand that. I'm not a huge fan of Data structs myself, > but I'm not sure it's a great idea to do both things in the same area of > code. That's guaranteed to be confusing for everyone ... > > If we want to move away f
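[Editorial aside] To make the "inherit from TableScanDescData" variant discussed above concrete, here is a rough sketch of the shape it might take. This is illustrative only, not code from the patch series; the bitmap-specific member names are made up, and it assumes the usual PostgreSQL headers (access/relscan.h for TableScanDescData):

/*
 * Hypothetical sketch only.  The generic descriptor is embedded as the
 * first member, the same trick the executor uses with ScanState, so that
 * generic tableam code can keep treating a pointer to this as a
 * TableScanDesc.
 */
typedef struct BitmapTableScanDesc
{
	TableScanDescData base;		/* must come first: generic scan state */

	/* bitmap-specific state; field names below are purely illustrative */
	struct ParallelBitmapHeapState *pstate;
	int			prefetch_maximum;
	bool		recheck;
} BitmapTableScanDesc;

The cost of this layout is that scan_begin() still initializes the generic fields that bitmap scans do not use, which is the "waste space and LOC" trade-off mentioned above.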
Re: meson vs Cygwin
On 2024-02-13 Tu 7:00 AM, Marco Atzeri wrote: On 22/11/2023 13:02, Andrew Dunstan wrote: I've been trying to get meson working with Cygwin. On a fresh install (Cygwin 3.4.9, gcc 11.4.0, meson 1.0.2, ninja 1.11.1) I get a bunch of errors like this: ERROR: incompatible library "/home/andrew/bf/buildroot/HEAD/pgsql.build/tmp_install/home/andrew/bf/buildroot/HEAD/inst/lib/postgresql/plperl.dll": missing magic block Similar things happen if I try to build with python. I'm not getting the same on a configure/make build. Not sure what would be different. cheers andrew Hi Andrew, sorry for jumping on this request so late how are you configuring the build ? Sorry for not replying in turn :-( I just got this error again. All I did was: meson setup build . meson compile -C build meson test -C build I don't get the error if I build using ./configure --with-perl --with-python make world-bin make check-world cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Maybe don't process multi xmax in FreezeMultiXactId() if it is already marked as invalid?
On Wed, Jun 19, 2024 at 1:39 PM Alvaro Herrera wrote: > FWIW I don't think HEAP_XMAX_INVALID as purely a hint. > HEAP_XMAX_COMMITTED is a hint, for sure, as is HEAP_XMIN_COMMITTED on > its own; but as far as I recall, the INVALID flags must persist once > set. Seems we disagree on some pretty fundamental things in this area, then. > Consider the HEAP_XMIN_COMMITTED+ HEAP_XMIN_INVALID combination, > which we use to represent HEAP_XMIN_FROZEN; if that didn't persist, we'd > have a pretty serious data corruption issue, because we don't reset the > Xmin field when freezing the tuple. That's definitely true, but that's a case where the setting happens during a WAL-logged operation. It doesn't involve HEAP_XMAX_INVALID at all. FWIW I also don't think that even HEAP_XMIN_INVALID should be considered anything more than a hint when it appears on its own. Only HEAP_XMIN_FROZEN (by which I mean HEAP_XMIN_COMMITTED + HEAP_XMIN_INVALID) are non-hint xact status infomask bits. It's slightly annoying that we have this HEAP_XMIN_FROZEN case where a hint bit isn't just a hint, but that's just a historical detail. And I suppose that the same thing can be said of HEAP_XMAX_IS_MULTI itself (we have no other way of distinguishing a Multi from an Xid, so clearly that also has to be treated as a persistent non-hint by everybody). > So if we fail to keep the flag, the > tuple is no longer frozen. (My point here is that some infomask bits > are hints, but not all them are only hints.) So XMAX_INVALID gives > certainty that the Xmax value must not be read. "Certainty" seems far too strong here. > That is to say, I think > there are (or there were) situations in which we set the bit but don't > bother to reset the actual Xmax field. I'm sure that that's true, but that doesn't seem at all equivalent to what you said about XMAX_INVALID "giving certainty" about the tuple. > We should never try to read the > Xmax flag if the bit is set. But that's exactly what FreezeMultiXactId does. It doesn't pay attention to XMAX_INVALID (only to !MultiXactIdIsValid()). Yura is apparently arguing that FreezeMultiXactId should notice XMAX_INVALID and then tell its caller to "FRM_INVALIDATE_XMAX". That does seem like a valid optimization. But if we were to do that then we'd likely do it in a way that still resulted in ordinary processing of the multi (it would not work by immediately setting "FRM_INVALIDATE_XMAX" in the !MultiXactIdIsValid() path). That approach to the optimization makes the most sense, because we'd likely want to preserve the existing FreezeMultiXactId sanity checks. > I think the problem being investigated in this thread is that > HEAP_XMAX_IS_MULTI is being treated as persistent, that is, it can only > be set if the xmax is not invalid, but apparently that's not always the > case (or we wouldn't be having this conversation). A multixact/HEAP_XMAX_IS_MULTI xmax doesn't start out as invalid. Treating HEAP_XMAX_IS_MULTI as persistent doesn't mean that we should treat XMAX_INVALID as consistent. In particular, XMAX_INVALID isn't equivalent to !MultiXactIdIsValid() (you can make a similar statement about xmax XIDs). -- Peter Geoghegan
Re: Extension security improvement: Add support for extensions with an owned schema
Jelte Fennema-Nio writes: > On Wed, 19 Jun 2024 at 17:28, Robert Haas wrote: >> I have a feeling that this might be pretty annoying to implement, and >> if that is true, then never mind. > Based on a quick look it's not trivial, but also not super bad. > Basically it seems like in src/backend/catalog/namespace.c, every time > we loop over activeSearchPath and CurrentExtensionObject is set, then > we should skip any item that's not stored in pg_catalog, unless > there's a DEPENDENCY_EXTENSION pg_depend entry for the item (and that > pg_depend entry references the extension or the requires list). We could change the lookup rules that apply during execution of an extension script, but we already restrict search_path at that time so I'm not sure how much further this'd move the goalposts. The *real* problem IMO is that if you create a PL function or (old-style) SQL function within an extension, execution of that function is not similarly protected. regards, tom lane
Re: Extension security improvement: Add support for extensions with an owned schema
On Wed, 19 Jun 2024 at 17:28, Robert Haas wrote: > But I wonder if there might also be another possible approach: could > we, somehow, prevent object references in extension scripts from > resolving to anything other than the system catalogs and the contents > of that extension? This indeed does sound like the behaviour that pretty much every existing extension wants to have. One small addition/clarification that I would make to your definition: fully qualified references to other objects should still be allowed. I do think, even if we have this, there would be other good reasons to use "owned schemas" for extension authors. At least the following two: 1. To have a safe search_path that can be used in SET search_path on a function (see also [1]). 2. To make it easy for extension authors to avoid conflicts with other extensions/UDFs. > Perhaps with a control file setting to specify a > list of trusted extensions which we're also allowed to reference? I think we could simply use the already existing "requires" field from the control file. i.e. you're allowed to reference only your own extension > I have a feeling that this might be pretty annoying to implement, and > if that is true, then never mind. Based on a quick look it's not trivial, but also not super bad. Basically it seems like in src/backend/catalog/namespace.c, every time we loop over activeSearchPath and CurrentExtensionObject is set, then we should skip any item that's not stored in pg_catalog, unless there's a DEPENDENCY_EXTENSION pg_depend entry for the item (and that pg_depend entry references the extension or the requires list). There's quite a few loops over activeSearchPath in namespace.c, but they all seem pretty similar. So while a bunch of code would need to be changed, the changes could probably be well encapsulated in a function. [1]: https://www.postgresql.org/message-id/flat/00d8f046156e355ec0eb49585408bafc8012e4a5.camel%40j-davis.com#3ad7a8073d5ef50cfe44e305c38d
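[Editorial aside] For illustration only, the filter described above might look roughly like the following. The two helper functions are hypothetical stand-ins for a pg_depend DEPENDENCY_EXTENSION lookup and the "requires"-list check; they are not existing APIs:

/*
 * Hypothetical sketch of the lookup filter, not a patch.  Intended to be
 * applied inside the namespace.c loops over activeSearchPath while an
 * extension script is running (CurrentExtensionObject is set).
 */
static bool
object_visible_in_extension_script(Oid namespaceId, Oid objectId)
{
	/* objects in pg_catalog are always allowed */
	if (namespaceId == PG_CATALOG_NAMESPACE)
		return true;

	/* otherwise the object must belong to the extension being scripted ... */
	if (object_is_in_extension(objectId, CurrentExtensionObject))
		return true;

	/* ... or to one of the extensions named in its "requires" list */
	return object_is_in_required_extension(objectId);
}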
Docs: Order of json aggregate functions
The order of json related aggregate functions in the docs is currently like this: [...] json_agg json_objectagg json_object_agg json_object_agg_strict json_object_agg_unique json_arrayagg json_object_agg_unique_strict max min range_agg range_intersect_agg json_agg_strict [...] json_arrayagg and json_agg_strict are out of place. Attached patch puts them in the right spot. This is the same down to v16. Best, WolfgangFrom ad857a824d893a3e421c6c577c1215f71c1ebfe3 Mon Sep 17 00:00:00 2001 From: Wolfgang Walther Date: Wed, 19 Jun 2024 19:40:49 +0200 Subject: [PATCH v1] Fix order of json aggregate functions in docs --- doc/src/sgml/func.sgml | 96 +- 1 file changed, 48 insertions(+), 48 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 2609269610b..c3b342d832f 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -21790,6 +21790,54 @@ SELECT NULLIF(value, '(none)') ... No + + + + json_agg_strict + +json_agg_strict ( anyelement ) +json + + + + jsonb_agg_strict + +jsonb_agg_strict ( anyelement ) +jsonb + + +Collects all the input values, skipping nulls, into a JSON array. +Values are converted to JSON as per to_json +or to_jsonb. + + No + + + + +json_arrayagg +json_arrayagg ( + value_expression + ORDER BY sort_expression + { NULL | ABSENT } ON NULL + RETURNING data_type FORMAT JSON ENCODING UTF8 ) + + +Behaves in the same way as json_array +but as an aggregate function so it only takes one +value_expression parameter. +If ABSENT ON NULL is specified, any NULL +values are omitted. +If ORDER BY is specified, the elements will +appear in the array in that order rather than in the input order. + + +SELECT json_arrayagg(v) FROM (VALUES(2),(1)) t(v) +[2, 1] + + No + + json_objectagg @@ -21900,31 +21948,6 @@ SELECT NULLIF(value, '(none)') ... No - - -json_arrayagg -json_arrayagg ( - value_expression - ORDER BY sort_expression - { NULL | ABSENT } ON NULL - RETURNING data_type FORMAT JSON ENCODING UTF8 ) - - -Behaves in the same way as json_array -but as an aggregate function so it only takes one -value_expression parameter. -If ABSENT ON NULL is specified, any NULL -values are omitted. -If ORDER BY is specified, the elements will -appear in the array in that order rather than in the input order. - - -SELECT json_arrayagg(v) FROM (VALUES(2),(1)) t(v) -[2, 1] - - No - - @@ -22033,29 +22056,6 @@ SELECT NULLIF(value, '(none)') ... No - - - - json_agg_strict - -json_agg_strict ( anyelement ) -json - - - - jsonb_agg_strict - -jsonb_agg_strict ( anyelement ) -jsonb - - -Collects all the input values, skipping nulls, into a JSON array. -Values are converted to JSON as per to_json -or to_jsonb. - - No - - -- 2.45.1
Re: Maybe don't process multi xmax in FreezeMultiXactId() if it is already marked as invalid?
On 2024-Jun-19, Peter Geoghegan wrote: > On Wed, Jun 19, 2024 at 1:00 PM Yura Sokolov wrote: > > So it is quite different code paths, and one could not be used > > to decline or justify other. > > The point is that we shouldn't need to rely on what is formally a > hint. It might be useful to use the hint to decide whether or not > freezing now actually makes sense, but that isn't the same thing as > relying on the hint (we could make the same decision for a number of > different reasons). FWIW I don't think HEAP_XMAX_INVALID as purely a hint. HEAP_XMAX_COMMITTED is a hint, for sure, as is HEAP_XMIN_COMMITTED on its own; but as far as I recall, the INVALID flags must persist once set. Consider the HEAP_XMIN_COMMITTED+ HEAP_XMIN_INVALID combination, which we use to represent HEAP_XMIN_FROZEN; if that didn't persist, we'd have a pretty serious data corruption issue, because we don't reset the Xmin field when freezing the tuple. So if we fail to keep the flag, the tuple is no longer frozen. (My point here is that some infomask bits are hints, but not all them are only hints.) So XMAX_INVALID gives certainty that the Xmax value must not be read. That is to say, I think there are (or there were) situations in which we set the bit but don't bother to reset the actual Xmax field. We should never try to read the Xmax flag if the bit is set. I think the problem being investigated in this thread is that HEAP_XMAX_IS_MULTI is being treated as persistent, that is, it can only be set if the xmax is not invalid, but apparently that's not always the case (or we wouldn't be having this conversation). -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
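[Editorial aside] For readers following the frozen-bit argument above, the combination in question is spelled out in src/include/access/htup_details.h roughly as follows (quoted from memory; check the header for the authoritative definitions):

/* t_infomask bits relevant to the discussion above */
#define HEAP_XMIN_COMMITTED		0x0100	/* t_xmin committed */
#define HEAP_XMIN_INVALID		0x0200	/* t_xmin invalid/aborted */
#define HEAP_XMIN_FROZEN		(HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID)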
Re: Maybe don't process multi xmax in FreezeMultiXactId() if it is already marked as invalid?
On Wed, Jun 19, 2024 at 1:00 PM Yura Sokolov wrote: > So it is quite different code paths, and one could not be used > to decline or justify other. The point is that we shouldn't need to rely on what is formally a hint. It might be useful to use the hint to decide whether or not freezing now actually makes sense, but that isn't the same thing as relying on the hint (we could make the same decision for a number of different reasons). > More over, certainly test on HEAP_XMAX_INVALID could be used > there in heap_prepare_freeze_tuple to set > freeze_xmax = true; > Why didn't you do it? You might as well ask me why I didn't do any number of other things. I actually wrote a patch that made FreezeMultiXactId() more aggressive about this sort of thing (setting HEAP_XMAX_INVALID) that targeted Postgres 16. That worked by noticing that every member XID was at least before OldestXmin, even when the MXID itself was >= OldestMxact. That didn't go anywhere, even though it was a perfectly valid optimization. It's true that FreezeMultiXactId() optimizations like this are possible. So what? > It is not a bug per se. > But: > - it is missed opportunity for optimization, > - it is inconsistency in data handling. > Inconsistency leads to bugs when people attempt to modify code. In what sense is there an inconsistency? I think that you must mean that we need to do the same thing for the !MultiXactIdIsValid() case and the already-HEAP_XMAX_INVALID case. But I don't think that that's any meaningful kind of inconsistency. It's *also* what we do with plain XIDs. If anything, the problem is that we're *too* consistent (ideally we *would* treat MultiXacts differently). > Yes, we changed completely different place mistakenly relying on > consistent reaction on this "hint", and that leads to bug in our > patch. Oooops! > > HEAP_XMAX_INVALID is just a hint. > > > > WTF is "just a hint"? > I thought, hint is "yep, you can ignore it. But we already did some job > and stored its result as this hint. And if you don't ignore this hint, > then you can skip doing the job we did already". > > So every time we ignore hint, we miss opportunity for optimization. > Why the hell we shouldn't optimize when we safely can? This is the first email that anybody has used the word "optimization". We've been discussing this as if it was a bug. You introduced the topic of optimization 3 seconds ago. Remember? > If we couldn't rely on hint, then hint is completely meaningless. We don't actually trust the hints in any way. We always run checks inside heap_pre_freeze_checks(), rather than assuming that the hints are accurate. -- Peter Geoghegan
Re: Don't process multi xmax in FreezeMultiXactId() if it is already marked as invalid.
On 2024-Jun-14, Anton A. Melnikov wrote:

> Hello!
>
> The src/backend/access/heap/README.tuplock says about HEAP_XMAX_INVALID bit
> that "Any tuple with this bit set does not have a valid value stored in XMAX."
>
> Found that FreezeMultiXactId() tries to process such an invalid multi xmax
> and may looks for an update xid in the pg_multixact for it.
>
> Maybe not do this work in FreezeMultiXactId() and exit immediately if the
> bit HEAP_XMAX_INVALID was already set?
>
> For instance, like that:
>
> master
> @@ -6215,6 +6215,15 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
> 	/* We should only be called in Multis */
> 	Assert(t_infomask & HEAP_XMAX_IS_MULTI);
> +	/* Xmax is already marked as invalid */
> +	if (MultiXactIdIsValid(multi) &&
> +		(t_infomask & HEAP_XMAX_INVALID))

Hmm, but why are we calling FreezeMultiXactId at all if the
HEAP_XMAX_INVALID bit is set?  We shouldn't do that.  I think the fix
should appear in heap_prepare_freeze_tuple() to skip work completely if
HEAP_XMAX_INVALID is set.  Then in FreezeMultiXactId you could simply
Assert() that the given tuple does not have HEAP_XMAX_INVALID set.

-- 
Álvaro Herrera               Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"It takes less than 2 seconds to get to 78% complete; that's a good sign.
A few seconds later it's at 90%, but it seems to have stuck there.  Did
somebody make percentages logarithmic while I wasn't looking?"
                http://smylers.hates-software.com/2005/09/08/1995c749.html
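[Editorial aside] Rendered as code, the arrangement suggested above would look roughly like this. These are illustrative fragments only, not a complete or tested patch, and the surrounding variable bookkeeping is elided:

/* In heap_prepare_freeze_tuple(), before any xmax processing: */
if (tuple->t_infomask & HEAP_XMAX_INVALID)
{
	/* xmax already known invalid: no pg_multixact lookup needed */
}
else if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
{
	/* ... existing FreezeMultiXactId() path ... */
}

/* And in FreezeMultiXactId() itself: */
Assert(t_infomask & HEAP_XMAX_IS_MULTI);
Assert(!(t_infomask & HEAP_XMAX_INVALID));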
Re: generic plans and "initial" pruning
I had occasion to run the same benchmark you described in the initial
email in this thread.  To do so I applied patch series v49 on top of
07cb29737a4e, which is just one that happened to have the same date as
v49.  I then used a script like this (against a server having
plan_cache_mode=force_generic_plan)

for numparts in 0 1 2 4 8 16 32 48 64 80 81 96 127 128 160 200 256 257 288 300 384 512 1024 1536 2048; do
  pgbench testdb -i --partitions=$numparts 2>/dev/null
  echo -ne "$numparts\t"
  pgbench -n testdb -S -T30 -Mprepared | grep "^tps" | sed -e 's/^tps = \([0-9.]*\) .*/\1/'
done

and did the same with the commit mentioned above (that is, unpatched).
I got this table as result

 partitions │   patched    │  07cb29737a
────────────┼──────────────┼──────────────
          0 │ 65632.090431 │ 68967.712741
          1 │ 68096.641831 │ 65356.587223
          2 │ 59456.507575 │ 60884.679464
          4 │    62097.426 │ 59698.747104
          8 │ 58044.311175 │ 57817.104562
         16 │ 59741.926563 │ 52549.916262
         32 │ 59261.693449 │ 44815.317215
         48 │ 59047.125629 │ 38362.123652
         64 │ 59748.738797 │ 34051.158525
         80 │ 59276.839183 │ 32026.135076
         81 │ 62318.572932 │ 30418.122933
         96 │ 59678.857163 │ 28478.113651
        127 │ 58761.960028 │ 24272.303742
        128 │ 59934.268306 │ 24275.214593
        160 │ 56688.790899 │ 21119.043564
        200 │ 56323.188599 │ 18111.212849
        256 │  55915.22466 │ 14753.953709
        257 │ 57810.530461 │ 15093.497575
        288 │ 56874.780092 │ 13873.332162
        300 │ 57222.056549 │ 13463.768946
        384 │  54073.77295 │ 11183.558339
        512 │ 37503.766847 │   8114.32532
       1024 │ 42746.866448 │   4468.41359
       1536 │  39500.58411 │  3049.984599
       2048 │ 36988.519486 │  2269.362006

where already at 16 partitions we can see that things are going
downhill with the unpatched code.  (However, what happens when the
table is not partitioned looks a bit funny.)

I hope we can get this new executor code in 18.

-- 
Álvaro Herrera               Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"La primera ley de las demostraciones en vivo es: no trate de usar el sistema.
Escriba un guión que no toque nada para no causar daños." (Jakob Nielsen)
Re: Maybe don't process multi xmax in FreezeMultiXactId() if it is already marked as invalid?
18.06.2024 18:47, Peter Geoghegan пишет: On Tue, Jun 18, 2024 at 10:29 AM Maxim Orlov wrote: Maybe, I'm too bold, but looks like a kinda bug to me. At least, I don't understand why we do not check the HEAP_XMAX_INVALID flag. My guess is nobody noticed, that MultiXactIdIsValid call does not check the mentioned flag in the "first" condition, but it's all my speculation. A related code path was changed in commit 02d647bbf0. That change made the similar xmax handling that covers XIDs (not MXIDs) *stop* doing what you're now proposing to do in the Multi path. I don't agree commit 02d647bbf0 is similar to suggested change. Commit 02d647bbf0 fixed decision to set freeze_xmax = false; xmax_already_frozen = true; Suggested change is for decision to set *flags |= FRM_INVALIDATE_XMAX; pagefrz->freeze_required = true; Which leads to freeze_xmax = true; So it is quite different code paths, and one could not be used to decline or justify other. More over, certainly test on HEAP_XMAX_INVALID could be used there in heap_prepare_freeze_tuple to set freeze_xmax = true; Why didn't you do it? Why do you think this is a bug? It is not a bug per se. But: - it is missed opportunity for optimization, - it is inconsistency in data handling. Inconsistency leads to bugs when people attempt to modify code. Yes, we changed completely different place mistakenly relying on consistent reaction on this "hint", and that leads to bug in our patch. Does anyone know if there are reasons to deliberately ignore the HEAP_XMAX INVALID flag? Or this is just an unfortunate oversight. HEAP_XMAX_INVALID is just a hint. WTF is "just a hint"? I thought, hint is "yep, you can ignore it. But we already did some job and stored its result as this hint. And if you don't ignore this hint, then you can skip doing the job we did already". So every time we ignore hint, we miss opportunity for optimization. Why the hell we shouldn't optimize when we safely can? If we couldn't rely on hint, then hint is completely meaningless. have a nice day Yura Sokolov
Re: BitmapHeapScan streaming read user and prelim refactoring
On 2024-Jun-14, Melanie Plageman wrote: > Subject: [PATCH v21 12/20] Update variable names in bitmap scan descriptors > > The previous commit which added BitmapTableScanDesc and > BitmapHeapScanDesc used the existing member names from TableScanDescData > and HeapScanDescData for diff clarity. This commit renames the members > -- in many cases by removing the rs_ prefix which is not relevant or > needed here. *Cough* Why? It makes grepping for struct members useless. I'd rather keep these prefixes, as they allow easier code exploration. (Sometimes when I need to search for uses of some field with a name that's too common, I add a prefix to the name and let the compiler guide me to them. But that's a waste of time ...) Thanks, -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "Cuando mañana llegue pelearemos segun lo que mañana exija" (Mowgli)
Re: BitmapHeapScan streaming read user and prelim refactoring
On 6/19/24 17:55, Melanie Plageman wrote: > On Tue, Jun 18, 2024 at 6:02 PM Tomas Vondra > wrote: >> >> I went through v22 to remind myself of what the patches do and do some >> basic review - I have some simple questions / comments for now, nothing >> major. I've kept the comments in separate 'review' patches, it does not >> seem worth copying here. > > Thanks so much for the review! > > I've implemented your feedback in attached v23 except for what I > mention below. I have not gone through each patch in the new set very > carefully after making the changes because I think we should resolve > the question of adding the new table scan descriptor before I do that. > A change there will require a big rebase. Then I can go through each > patch very carefully. > > From your v22b-0005-review.patch: > > src/backend/executor/nodeBitmapHeapscan.c | 14 ++ > src/include/access/tableam.h | 2 ++ > 2 files changed, 16 insertions(+) > > diff --git a/src/backend/executor/nodeBitmapHeapscan.c > b/src/backend/executor/nodeBitmapHeapscan.c > index e8b4a754434..6d7ef9ced19 100644 > --- a/src/backend/executor/nodeBitmapHeapscan.c > +++ b/src/backend/executor/nodeBitmapHeapscan.c > @@ -270,6 +270,20 @@ new_page: > > BitmapAdjustPrefetchIterator(node); > > +/* > + * XXX I'm a bit unsure if this needs to be handled using > goto. Wouldn't > + * it be simpler / easier to understand to have two nested loops? > + * > + * while (true) > + *if (!table_scan_bitmap_next_block(...)) { break; } > + *while (table_scan_bitmap_next_tuple(...)) { > + *... process tuples ... > + *} > + * > + * But I haven't tried implementing this. > + */ > if (!table_scan_bitmap_next_block(scan, &node->blockno, > &node->recheck, >&node->lossy_pages, > &node->exact_pages)) > break; > > We need to call table_scan_bimtap_next_block() the first time we call > BitmapHeapNext() on each scan but all subsequent invocations of > BitmapHeapNext() must call table_scan_bitmap_next_tuple() first each > -- because we return from BitmapHeapNext() to yield a tuple even when > there are more tuples on the page. I tried refactoring this a few > different ways and personally found the goto most clear. > OK, I haven't tried refactoring this myself, so you're probably right. > From your v22b-0010-review.patch: > > @@ -557,6 +559,8 @@ ExecReScanBitmapHeapScan(BitmapHeapScanState *node) > table_rescan(node->ss.ss_currentScanDesc, NULL); > > /* release bitmaps and buffers if any */ > +/* XXX seems it should not be right after the comment, also shouldn't > + * we still reset the prefetch_iterator field to NULL? */ > tbm_end_iterate(&node->prefetch_iterator); > if (node->tbm) > tbm_free(node->tbm); > > prefetch_iterator is a TBMIterator which is stored in the struct (as > opposed to having a pointer to it stored in the struct). > tbm_end_iterate() sets the actual private and shared iterator pointers > to NULL. > Ah, right. > From your v22b-0017-review.patch > > diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h > index 036ef29e7d5..9c711ce0eb0 100644 > --- a/src/include/access/relscan.h > +++ b/src/include/access/relscan.h > @@ -52,6 +52,13 @@ typedef struct TableScanDescData > } TableScanDescData; > typedef struct TableScanDescData *TableScanDesc; > > +/* > + * XXX I don't understand why we should have this special node if we > + * don't have special nodes for other scan types. 
> > In this case, up until the final commit (using the read stream > interface), there are six fields required by bitmap heap scan that are > not needed by any other user of HeapScanDescData. There are also > several members of HeapScanDescData that are not needed by bitmap heap > scans and all of the setup in initscan() for those fields is not > required for bitmap heap scans. > > Also, because the BitmapHeapScanDesc needs information like the > ParallelBitmapHeapState and prefetch_maximum (for use in prefetching), > the scan_begin() callback would have to take those as parameters. I > thought adding so much bitmap table scan-specific information to the > generic table scan callbacks was a bad idea. > > Once we add the read stream API code, the number of fields required > for bitmap heap scan that are in the scan descriptor goes down to > three. So, perhaps we could justify having that many bitmap heap > scan-specific fields in the HeapScanDescData. > > Though, I actually think we could start moving toward having > specialized scan descriptors if the requirements for that scan type > are appreciably different. I can't think of new code that would be > added to the HeapScanDescData that would have to be duplicated over to > specialized scan descriptors. > > With the final read stream s
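[Editorial aside] For readers following the goto discussion quoted earlier in this message, the control flow being defended is roughly the following. This is a heavily simplified paraphrase of BitmapHeapNext(), not the actual code; it only shows why the first call must fetch a block while every later call must first keep returning tuples from the current page:

/* Heavily simplified paraphrase; details and error handling elided. */
if (tbm == NULL)
{
	/* first call for this scan: build the bitmap, then get a page */
	goto new_page;
}

for (;;)
{
	while (table_scan_bitmap_next_tuple(scan, slot))
	{
		/* ... recheck quals if needed ... */
		return slot;		/* yield one tuple; the next call resumes here */
	}

new_page:
	if (!table_scan_bitmap_next_block(scan, &blockno, &recheck,
									  &lossy_pages, &exact_pages))
		break;				/* bitmap exhausted */
}

return NULL;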
Re: What is a typical precision of gettimeofday()?
Peter Eisentraut writes: > AFAICT, pg_test_timing doesn't use gettimeofday(), so this doesn't > really address the original question. It's not exactly hard to make it do so (see attached). I tried this on several different machines, and my conclusion is that gettimeofday() reports full microsecond precision on any platform anybody is likely to be running PG on today. Even my one surviving pet dinosaur, NetBSD 10 on PowerPC Mac (mamba), shows results like this: $ ./pg_test_timing Testing timing overhead for 3 seconds. Per loop time including overhead: 901.41 ns Histogram of timing durations: < us % of total count 1 10.46074 348148 2 89.514952979181 4 0.00574191 8 0.00430143 16 0.00691230 32 0.00376125 64 0.00012 4 128 0.00303101 256 0.00027 9 512 0.9 3 1024 0.9 3 I also modified pg_test_timing to measure nanoseconds not microseconds (second patch attached), and got this: $ ./pg_test_timing Testing timing overhead for 3 seconds. Per loop time including overhead: 805.50 ns Histogram of timing durations: < ns % of total count 1 19.84234 739008 2 0.0 0 4 0.0 0 8 0.0 0 16 0.0 0 32 0.0 0 64 0.0 0 128 0.0 0 256 0.0 0 512 0.0 0 1024 80.140132984739 2048 0.00078 29 4096 0.00658245 8192 0.00290108 16384 0.00252 94 32768 0.00250 93 65536 0.00016 6 131072 0.00185 69 262144 0.8 3 524288 0.8 3 1048576 0.8 3 confirming that when the result changes it generally does so by 1usec. Applying just the second patch, I find that clock_gettime on this old hardware seems to be limited to 1us resolution, but on my more modern machines (mac M1, x86_64) it can tick at 40ns or less. Even a raspberry pi 4 shows $ ./pg_test_timing Testing timing overhead for 3 seconds. Per loop time including overhead: 69.12 ns Histogram of timing durations: < ns % of total count 1 0.0 0 2 0.0 0 4 0.0 0 8 0.0 0 16 0.0 0 32 0.0 0 64 37.59583 16317040 128 62.38568 27076131 256 0.01674 7265 512 0.2 8 1024 0.0 0 2048 0.0 0 4096 0.00153662 8192 0.00019 83 16384 0.1 3 32768 0.1 5 suggesting that the clock_gettime resolution is better than 64 ns. So I concur with Hannu that it's time to adjust pg_test_timing to resolve nanoseconds not microseconds. I gather he's created a patch that does more than mine below, so I'll wait for that. regards, tom lane diff --git a/src/include/portability/instr_time.h b/src/include/portability/instr_time.h index a6fc1922f2..5509d23d2f 100644 --- a/src/include/portability/instr_time.h +++ b/src/include/portability/instr_time.h @@ -84,7 +84,7 @@ typedef struct instr_time /* Use clock_gettime() */ -#include +#include /* * The best clockid to use according to the POSIX spec is CLOCK_MONOTONIC, @@ -111,10 +111,10 @@ static inline instr_time pg_clock_gettime_ns(void) { instr_time now; - struct timespec tmp; + struct timeval tmp; - clock_gettime(PG_INSTR_CLOCK, &tmp); - now.ticks = tmp.tv_sec * NS_PER_S + tmp.tv_nsec; + gettimeofday(&tmp, NULL); + now.ticks = tmp.tv_sec * NS_PER_S + tmp.tv_usec * 1000; return now; } diff --git a/src/bin/pg_test_timing/pg_test_timing.c b/src/bin/pg_test_timing/pg_test_timing.c index c29d6f8762..ea2b565b14 100644 --- a/src/bin/pg_test_timing/pg_test_timing.c +++ b/src/bin/pg_test_timing/pg_test_timing.c @@ -133,7 +133,7 @@ test_timing(unsigned int duration) total_time = duration > 0 ? 
duration * INT64CONST(100) : 0; INSTR_TIME_SET_CURRENT(start_time); - cur = INSTR_TIME_GET_MICROSEC(start_time); + cur = INSTR_TIME_GET_NANOSEC(start_time); while (time_elapsed < total_time) { @@ -142,7 +142,7 @@ test_timing(unsigned int duration) prev = cur; INSTR_TIME_SET_CURRENT(temp); - cur = INSTR_TIME_GET_MICROSEC(temp); + cur = INSTR_TIME_GET_NANOSEC(temp); diff = cur - prev; /* Did time go backwards? */ @@ -183,7 +183,7 @@ output(uint64 loop_count) { int64 max_bit = 31, i; - char *header1 = _("< us"); + char *header1 = _("< ns"); char *header2 = /* xgettext:no-c-format */ _("% of total"); char *header3 = _("count"); int len1 = strlen(header1);
Re: Pgoutput not capturing the generated columns
On 19.06.24 13:22, Shubham Khanna wrote: All the comments are handled. The attached Patch contains all the suggested changes. Please also take a look at the proposed patch for virtual generated columns [0] and consider how that would affect your patch. I think your feature can only replicate *stored* generated columns. So perhaps the documentation and terminology in your patch should reflect that. [0]: https://www.postgresql.org/message-id/flat/a368248e-69e4-40be-9c07-6c3b5880b...@eisentraut.org
Re: SQL/JSON query functions context_item doc entry and type requirement
On Wed, Jun 19, 2024 at 8:29 AM jian he wrote: > On Mon, Jun 17, 2024 at 9:05 PM Chapman Flack wrote: > > > > Hi, > > > > On 06/17/24 02:43, Amit Langote wrote: > > > context_item expression can be a value of > > > any type that can be cast to jsonb. This includes types > > > such as char, text, bpchar, > > > character varying, and bytea (with > > > ENCODING UTF8), as well as any domains over these types. > > > > Reading this message in conjunction with [0] makes me think that we are > > really talking about a function that takes a first parameter of type > jsonb, > > and behaves exactly that way (so any cast required is applied by the > system > > ahead of the call). Under those conditions, this seems like an unusual > > sentence to add in the docs, at least until we have also documented that > > tan's argument can be of any type that can be cast to double precision. > > > > I guess it would be fine to add an unusual sentence to the docs. > > imagine a function: array_avg(anyarray) returns anyelement. > array_avg calculate an array's elements's avg. like > array('{1,2,3}'::int[]) returns 2. > but array_avg won't make sense if the input argument is a date array. > so mentioning in the doc: array_avg can accept anyarray, but anyarray > cannot date array. > seems ok. > There is existing wording for this: "The expression can be of any JSON type, any character string type, or bytea in UTF8 encoding." If you add this sentence to the paragraph the link that already exists, which simply points the reader to this sentence, becomes redundant and should be removed. As for table 9.16.3 - it is unwieldy already. Lets try and make the core syntax shorter, not longer. We already have precedence in the subsequent json_table section - give each major clause item a name then below the table define the syntax and meaning for those names. Unlike in that section - which probably should be modified too - context_item should have its own description line. David J.
Re: Better error message when --single is not the first arg to postgres executable
On Wed, Jun 19, 2024 at 05:34:52PM +0200, Peter Eisentraut wrote: > I'm not really sure all this here is worth solving. I think requiring > things like --single or --boot to be first seems ok, and the alternatives > just make things more complicated. Yeah, I'm fine with doing something more like what Greg originally proposed at this point. -- nathan
RE: AIX support
Thanks Hikki, for going through the changes. > +/* Commenting for XLC > + * "IBM XL C/C++ for AIX, V12.1" miscompiles, for 32-bit, some inline > + * expansions of ginCompareItemPointers() "long long" arithmetic. To take > + * advantage of inlining, build a 64-bit PostgreSQL. > +#if defined(__ILP32__) && defined(__IBMC__) > +#define PG_FORCE_DISABLE_INLINE > +#endif > + */ I can remove these unwanted comments. I have to analyze the changes for the rest of your comment and will get back to you. Warm regards, Sriram. From: Heikki Linnakangas Date: Wednesday, 19 June 2024 at 8:45 PM To: Srirama Kucherlapati , Laurenz Albe , Bruce Momjian , Heikki Linnakangas Cc: Peter Eisentraut , Alvaro Herrera , pgsql-hack...@postgresql.org , Noah Misch , Michael Paquier , Andres Freund , Tom Lane , Thomas Munro , tvk1...@gmail.com , postgres-ibm-...@wwpdl.vnet.ibm.com Subject: [EXTERNAL] Re: AIX support On 19/06/2024 17:55, Srirama Kucherlapati wrote: > +/* Commenting for XLC > + * "IBM XL C/C++ for AIX, V12.1" miscompiles, for 32-bit, some inline > + * expansions of ginCompareItemPointers() "long long" arithmetic. To take > + * advantage of inlining, build a 64-bit PostgreSQL. > +#if defined(__ILP32__) && defined(__IBMC__) > +#define PG_FORCE_DISABLE_INLINE > +#endif > + */ This seems irrelevant. > + * Ordinarily, we'd code the branches here using GNU-style local symbols, > that > + * is "1f" referencing "1:" and so on. But some people run gcc on AIX with > + * IBM's assembler as backend, and IBM's assembler doesn't do local symbols. > + * So hand-code the branch offsets; fortunately, all PPC instructions are > + * exactly 4 bytes each, so it's not too hard to count. Could you use GCC assembler to avoid this? > @@ -662,6 +666,21 @@ tas(volatile slock_t *lock) > > #if !defined(HAS_TEST_AND_SET) /* We didn't trigger above, let's try > here */ > > +#if defined(_AIX)/* AIX */ > +/* > + * AIX (POWER) > + */ > +#define HAS_TEST_AND_SET > + > +#include > + > +typedef int slock_t; > + > +#define TAS(lock)_check_lock((slock_t *) (lock), 0, 1) > +#define S_UNLOCK(lock) _clear_lock((slock_t *) (lock), 0) > +#endif/* _AIX */ > + > + > /* These are in sunstudio_(sparc|x86).s */ > > #if defined(__SUNPRO_C) && (defined(__i386) || defined(__x86_64__) || > defined(__sparc__) || defined(__sparc)) What CPI/compiler/OS configuration is this for, exactly? Could we rely on GCC-provided __sync_lock_test_and_set() builtin function instead? > +# Allow platforms with buggy compilers to force restrict to not be > +# used by setting $FORCE_DISABLE_RESTRICT=yes in the relevant > +# template. Surely we don't need that anymore? Or is the compiler still buggy? Do you still care about 32-bit binaries on AIX? If not, let's make that the default in configure or a check for it, and remove the instructions on building 32-bit binaries from the docs. Please try hard to remove any changes from the diff that are not absolutely necessary. - Heikki
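[Editorial aside] To make the __sync_lock_test_and_set() suggestion in the quoted review concrete, a spinlock definition along these lines could replace the _check_lock()/_clear_lock() based one. This mirrors the generic GCC-builtin pattern; it is untested on AIX and shown only as a sketch:

/*
 * Sketch only: an AIX/gcc slock_t based on GCC atomic builtins rather than
 * the AIX _check_lock()/_clear_lock() library calls.  Whether it behaves
 * and performs acceptably on POWER would need verification.
 */
#define HAS_TEST_AND_SET

typedef int slock_t;

/* returns the previous value: 0 means we acquired the lock */
#define TAS(lock)		__sync_lock_test_and_set((lock), 1)
#define S_UNLOCK(lock)	__sync_lock_release(lock)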
Re: DOCS: Generated table columns are skipped by logical replication
On 18.06.24 08:40, Peter Smith wrote: For now, I have provided just a simple patch for the "Generated Columns" section [3]. Perhaps it is enough. Makes sense. + Generated columns are skipped for logical replication, and cannot be + specified in a CREATE PUBLICATION column-list. Maybe remove the comma, and change "column-list" to "column list".
Re: suspicious valgrind reports about radixtree/tidstore on arm64
On 6/19/24 17:11, Tom Lane wrote: > Tomas Vondra writes: >> I tried running master under valgrind on 64-bit ARM (rpi5 running debian >> 12.5), and I got some suspicous reports, all related to the radixtree >> code used by tidstore. > > What's the test scenario that triggers this? > I haven't investigated that yet, I just ran "make check". regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: What is a typical precision of gettimeofday()?
On 18.06.24 07:47, Andrey M. Borodin wrote:
>> On 19 Mar 2024, at 13:28, Peter Eisentraut wrote:
>> I feel that we don't actually have any information about this
>> portability concern.  Does anyone know what precision we can expect
>> from gettimeofday()?  Can we expect the full microsecond precision
>> usually?
>
> At PGConf.dev Hannu Krossing drew attention to the pg_test_timing
> module.  I’ve tried this module (slightly modified to measure
> nanoseconds) on some systems, and everywhere I found ~100ns resolution
> (95% of ticks fall into 64ns and 128ns buckets).

AFAICT, pg_test_timing doesn't use gettimeofday(), so this doesn't
really address the original question.
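[Editorial aside] As a quick way to probe the gettimeofday() part of the question directly, a standalone throwaway program along these lines (not part of pg_test_timing) reports the smallest nonzero step the call ever returns, which bounds its effective resolution:

/*
 * Standalone probe, not PostgreSQL code: measure the smallest nonzero
 * difference between consecutive gettimeofday() readings.
 */
#include <stdio.h>
#include <sys/time.h>

int
main(void)
{
	struct timeval prev, cur;
	long long	best = -1;

	gettimeofday(&prev, NULL);
	for (int i = 0; i < 1000000; i++)
	{
		long long	diff;

		gettimeofday(&cur, NULL);
		diff = (long long) (cur.tv_sec - prev.tv_sec) * 1000000 +
			(cur.tv_usec - prev.tv_usec);
		if (diff > 0 && (best < 0 || diff < best))
			best = diff;
		prev = cur;
	}
	printf("smallest nonzero gettimeofday() step: %lld us\n", best);
	return 0;
}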
Re: Better error message when --single is not the first arg to postgres executable
On 19.06.24 16:04, Nathan Bossart wrote: What about just inlining --version and --help e.g. else if (strcmp(argv[i], "--version") == 0 || strcmp(argv[i], "-V") == 0) { fputs(PG_BACKEND_VERSIONSTR, stdout); exit(0); } I'm fine with being more persnickety about the other options; they are much rarer and not unixy. That seems like it should work. I'm not sure I agree that's the least surprising behavior (e.g., what exactly is the user trying to tell us with commands like "postgres --version --help --describe-config"?), but I also don't feel too strongly about it. There is sort of an existing convention that --help and --version behave like this, meaning they act immediately and exit without considering other arguments. I'm not really sure all this here is worth solving. I think requiring things like --single or --boot to be first seems ok, and the alternatives just make things more complicated.
Re: SQL/JSON query functions context_item doc entry and type requirement
On Mon, Jun 17, 2024 at 9:05 PM Chapman Flack wrote: > > Hi, > > On 06/17/24 02:43, Amit Langote wrote: > > context_item expression can be a value of > > any type that can be cast to jsonb. This includes types > > such as char, text, bpchar, > > character varying, and bytea (with > > ENCODING UTF8), as well as any domains over these types. > > Reading this message in conjunction with [0] makes me think that we are > really talking about a function that takes a first parameter of type jsonb, > and behaves exactly that way (so any cast required is applied by the system > ahead of the call). Under those conditions, this seems like an unusual > sentence to add in the docs, at least until we have also documented that > tan's argument can be of any type that can be cast to double precision. > I guess it would be fine to add an unusual sentence to the docs. imagine a function: array_avg(anyarray) returns anyelement. array_avg calculate an array's elements's avg. like array('{1,2,3}'::int[]) returns 2. but array_avg won't make sense if the input argument is a date array. so mentioning in the doc: array_avg can accept anyarray, but anyarray cannot date array. seems ok. > On the other hand, if the behavior of the functions were to be changed > (perhaps using prosupport rewriting as suggested in [1]?) so that it was > not purely describable as a function accepting exactly jsonb with a > possible system-applied cast in front, then in that case such an added > explanation in the docs might be very fitting. > prosupport won't work, I think. because json_exists, json_value, json_query, json_table don't have pg_proc entries. These are more like expressions.
Re: Extension security improvement: Add support for extensions with an owned schema
On Sat, Jun 1, 2024 at 8:08 PM Jelte Fennema-Nio wrote: > Writing the sql migration scripts that are run by CREATE EXTENSION and > ALTER EXTENSION UPDATE are security minefields for extension authors. > One big reason for this is that search_path is set to the schema of the > extension while running these scripts, and thus if a user with lower > privileges can create functions or operators in that schema they can do > all kinds of search_path confusion attacks if not every function and > operator that is used in the script is schema qualified. While doing > such schema qualification is possible, it relies on the author to never > make a mistake in any of the sql files. And sadly humans have a tendency > to make mistakes. I agree that this is a problem. I also think that the patch might be a reasonable solution (but I haven't reviewed it). But I wonder if there might also be another possible approach: could we, somehow, prevent object references in extension scripts from resolving to anything other than the system catalogs and the contents of that extension? Perhaps with a control file setting to specify a list of trusted extensions which we're also allowed to reference? I have a feeling that this might be pretty annoying to implement, and if that is true, then never mind. But if it isn't that annoying to implement, it would make a lot of unsafe extensions safe by default, without the extension author needing to take any action. Which could be pretty cool. It would also make it possible for extensions to safely share a schema, if desired. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Extension security improvement: Add support for extensions with an owned schema
On Wed, Jun 19, 2024 at 8:19 AM Jelte Fennema-Nio wrote: > Because part of it would > only be relevant once we support upgrading from PG18. So for now the > upgrade_code I haven't actually run. > Does it apply against v16? If so, branch off there, apply it, then upgrade from the v16 branch to master. David J.
Re: Extension security improvement: Add support for extensions with an owned schema
Attached is an updated version of this patch that fixes a few issues that CI reported (autoconf, compiler warnings and broken docs). I also think I changed the pg_upgrade to do the correct thing, but I'm not sure how to test this (even manually). Because part of it would only be relevant once we support upgrading from PG18. So for now the upgrade_code I haven't actually run. From 37b6fa45bf877bcc15ce76d7e342199b7ca76d50 Mon Sep 17 00:00:00 2001 From: Jelte Fennema-Nio Date: Fri, 31 May 2024 02:04:31 -0700 Subject: [PATCH v2] Add support for extensions with an owned schema Writing the sql migration scripts that are run by CREATE EXTENSION and ALTER EXTENSION UPDATE are security minefields for extension authors. One big reason for this is that search_path is set to the schema of the extension while running these scripts, and thus if a user with lower privileges can create functions or operators in that schema they can do all kinds of search_path confusion attacks if not every function and operator that is used in the script is schema qualified. While doing such schema qualification is possible, it relies on the author to never make a mistake in any of the sql files. And sadly humans have a tendency to make mistakes. This patch adds a new "owned_schema" option to the extension control file that can be set to true to indicate that this extension wants to own the schema in which it is installed. What that means is that the schema should not exist before creating the extension, and will be created during extension creation. This thus gives the extension author an easy way to use a safe search_path, while still allowing all objects to be grouped together in a schema. The implementation also has the pleasant side effect that the schema will be automatically dropped when the extension is dropped. --- doc/src/sgml/extend.sgml | 13 ++ doc/src/sgml/ref/create_extension.sgml| 3 +- src/backend/commands/extension.c | 141 +- src/backend/utils/adt/pg_upgrade_support.c| 20 ++- src/bin/pg_dump/pg_dump.c | 15 +- src/bin/pg_dump/pg_dump.h | 1 + src/include/catalog/pg_extension.h| 1 + src/include/catalog/pg_proc.dat | 2 +- src/include/commands/extension.h | 4 +- src/test/modules/test_extensions/Makefile | 7 +- .../expected/test_extensions.out | 50 +++ src/test/modules/test_extensions/meson.build | 4 + .../test_extensions/sql/test_extensions.sql | 27 .../test_ext_owned_schema--1.0.sql| 2 + .../test_ext_owned_schema.control | 5 + ...test_ext_owned_schema_relocatable--1.0.sql | 2 + .../test_ext_owned_schema_relocatable.control | 4 + 17 files changed, 248 insertions(+), 53 deletions(-) create mode 100644 src/test/modules/test_extensions/test_ext_owned_schema--1.0.sql create mode 100644 src/test/modules/test_extensions/test_ext_owned_schema.control create mode 100644 src/test/modules/test_extensions/test_ext_owned_schema_relocatable--1.0.sql create mode 100644 src/test/modules/test_extensions/test_ext_owned_schema_relocatable.control diff --git a/doc/src/sgml/extend.sgml b/doc/src/sgml/extend.sgml index 218940ee5ce..36dc692abef 100644 --- a/doc/src/sgml/extend.sgml +++ b/doc/src/sgml/extend.sgml @@ -809,6 +809,19 @@ RETURNS anycompatible AS ... + + owned_schema (boolean) + + +An extension is owned_schema if it requires a +new dedicated schema for its objects. Such a requirement can make +security concerns related to search_path injection +much easier to reason about. The default is false, +i.e., the extension can be installed into an existing schema. 
+ + + + schema (string) diff --git a/doc/src/sgml/ref/create_extension.sgml b/doc/src/sgml/ref/create_extension.sgml index ca2b80d669c..6e767c7bfca 100644 --- a/doc/src/sgml/ref/create_extension.sgml +++ b/doc/src/sgml/ref/create_extension.sgml @@ -102,7 +102,8 @@ CREATE EXTENSION [ IF NOT EXISTS ] extension_name The name of the schema in which to install the extension's objects, given that the extension allows its contents to be -relocated. The named schema must already exist. +relocated. The named schema must already exist if the extension's +control file does not specify owned_schema. If not specified, and the extension's control file does not specify a schema either, the current default object creation schema is used. diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c index 1643c8c69a0..c9586ad62a1 100644 --- a/src/backend/commands/extension.c +++ b/src/backend/commands/extension.c @@ -83,6 +83,8 @@ typedef struct ExtensionControlFile * MODULE_PATHNAME */ char *comment; /* comment, if any */ cha
Re: suspicious valgrind reports about radixtree/tidstore on arm64
Tomas Vondra writes: > I tried running master under valgrind on 64-bit ARM (rpi5 running debian > 12.5), and I got some suspicous reports, all related to the radixtree > code used by tidstore. What's the test scenario that triggers this? regards, tom lane
Re: Report runtimes in pg_upgrade verbose mode
On Wed, Jun 19, 2024 at 04:50:59PM +0200, Daniel Gustafsson wrote: > When doing performance hacking on pg_upgrade it's often important to see > individual runtimes to isolate changes. I've written versions of the attached > patch numerous times, and I wouldn't be surprised if others have done the > same. Indeed: https://postgr.es/m/flat/20230727235134.GA3658499%40nathanxps13 > Is there any interest in adding something like the attached to pg_upgrade? > The > patch needs some cleaning and tidying up but I wanted to to gauge interest > before investing time. I've added it to verbose mode mainly since it's not > really all that informative for regular users I think. I've been using 'ts -i' as Peter suggested [0], and that has worked decently well. One other thing that I've noticed is that some potentially long-running tasks don't have corresponding reports. For example, the initial get_db_rel_and_slot_infos() on the old cluster doesn't report anything, but that is often one of the most time-consuming steps. [0] https://postgr.es/m/32d24bcf-9ac4-b10e-4aa2-da6975312eb2%40eisentraut.org -- nathan
Re: Missing docs for new enable_group_by_reordering GUC
Hi, Alexander! On Wed, 19 Jun 2024 at 05:27, Alexander Korotkov wrote: > On Tue, Jun 18, 2024 at 4:14 PM Pavel Borisov > wrote: > >> Controls if the query planner will produce a plan which will provide > GROUP BY keys presorted in the order of keys of a child > node of the plan, such as an index scan. When disabled, the query planner > will produce a plan with GROUP BY keys only reordered to > match > >> the ORDER BY clause, if any. When enabled, the > planner will try to produce a more efficient plan. The default value is on. > > A correction of myself: presorted -> sorted, reordered ->sorted > > Thank you for your review. I think all of this make sense. Please, > check the revised patch attached. > To me patch v3 looks good. Regards, Pavel Borisov Supabase
RE: AIX support
Hi Team, Please find the attached patch, which restores AIX support with gcc alone. We have removed the changes related to XLC on AIX. We are also continuing to work on an XLC and IBM-clang (openXLC) specific patch. Once the above patch is approved, we can submit a subsequent patch to support the XLC/IBM-clang changes. Kindly let us know your inputs/feedback. Warm regards, Sriram. 0001-AIX-support-revert-the-changes-from-0b16bb8776bb8-v2.patch Description: 0001-AIX-support-revert-the-changes-from-0b16bb8776bb8-v2.patch
Report runtimes in pg_upgrade verbose mode
When doing performance hacking on pg_upgrade it's often important to see individual runtimes to isolate changes. I've written versions of the attached patch numerous times, and I wouldn't be surprised if others have done the same. Is there any interest in adding something like the attached to pg_upgrade? The patch needs some cleaning and tidying up but I wanted to gauge interest before investing time. I've added it to verbose mode mainly since it's not really all that informative for regular users, I think. -- Daniel Gustafsson 0001-Report-runtime-for-steps-in-pg_upgrade-verbose-mode.patch Description: Binary data
suspicious valgrind reports about radixtree/tidstore on arm64
Hi, I tried running master under valgrind on 64-bit ARM (rpi5 running debian 12.5), and I got some suspicous reports, all related to the radixtree code used by tidstore. I'm used to valgrind on arm sometimes reporting harmless issues, but this seems like it might be an actual issue. I'm attaching a snippet with a couple example reports. I can provide the complete report, but AFAIK it's all just repetitions of these cases. If needed, I can probably provide access to the rpi5 machine. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company==103190== Conditional jump or move depends on uninitialised value(s) ==103190==at 0x21FAC4: local_ts_node_16_search_eq (radixtree.h:1022) ==103190==by 0x21FC0F: local_ts_node_search (radixtree.h:1057) ==103190==by 0x220E23: local_ts_get_slot_recursive (radixtree.h:1667) ==103190==by 0x221043: local_ts_set (radixtree.h:1744) ==103190==by 0x2253A7: TidStoreSetBlockOffsets (tidstore.c:427) ==103190==by 0x2912EF: dead_items_add (vacuumlazy.c:2892) ==103190==by 0x28F193: lazy_scan_prune (vacuumlazy.c:1500) ==103190==by 0x28E767: lazy_scan_heap (vacuumlazy.c:975) ==103190==by 0x28D8C7: heap_vacuum_rel (vacuumlazy.c:497) ==103190==by 0x4C0D2B: table_relation_vacuum (tableam.h:1720) ==103190==by 0x4C458F: vacuum_rel (vacuum.c:2210) ==103190==by 0x4C1F8B: vacuum (vacuum.c:622) ==103190==by 0x6A2877: autovacuum_do_vac_analyze (autovacuum.c:3100) ==103190==by 0x6A144F: do_autovacuum (autovacuum.c:2417) ==103190==by 0x69FEE7: AutoVacWorkerMain (autovacuum.c:1569) ==103190==by 0x6A739B: postmaster_child_launch (launch_backend.c:265) ==103190==by 0x6AE9F3: StartChildProcess (postmaster.c:3928) ==103190==by 0x6AEB63: StartAutovacuumWorker (postmaster.c:3997) ==103190==by 0x6AE7DF: process_pm_pmsignal (postmaster.c:3809) ==103190==by 0x6AA937: ServerLoop (postmaster.c:1667) ==103190== Uninitialised value was created by a heap allocation ==103190==at 0x9DAE30: MemoryContextAlloc (mcxt.c:1201) ==103190==by 0x21F7D3: local_ts_alloc_node (radixtree.h:839) ==103190==by 0x2208A3: local_ts_grow_node_4 (radixtree.h:1484) ==103190==by 0x220ADF: local_ts_node_insert (radixtree.h:1547) ==103190==by 0x220E4F: local_ts_get_slot_recursive (radixtree.h:1675) ==103190==by 0x221043: local_ts_set (radixtree.h:1744) ==103190==by 0x2253A7: TidStoreSetBlockOffsets (tidstore.c:427) ==103190==by 0x2912EF: dead_items_add (vacuumlazy.c:2892) ==103190==by 0x28F193: lazy_scan_prune (vacuumlazy.c:1500) ==103190==by 0x28E767: lazy_scan_heap (vacuumlazy.c:975) ==103190==by 0x28D8C7: heap_vacuum_rel (vacuumlazy.c:497) ==103190==by 0x4C0D2B: table_relation_vacuum (tableam.h:1720) ==103190==by 0x4C458F: vacuum_rel (vacuum.c:2210) ==103190==by 0x4C1F8B: vacuum (vacuum.c:622) ==103190==by 0x6A2877: autovacuum_do_vac_analyze (autovacuum.c:3100) ==103190==by 0x6A144F: do_autovacuum (autovacuum.c:2417) ==103190==by 0x69FEE7: AutoVacWorkerMain (autovacuum.c:1569) ==103190==by 0x6A739B: postmaster_child_launch (launch_backend.c:265) ==103190==by 0x6AE9F3: StartChildProcess (postmaster.c:3928) ==103190==by 0x6AEB63: StartAutovacuumWorker (postmaster.c:3997) ==103190== { Memcheck:Cond fun:local_ts_node_16_search_eq fun:local_ts_node_search fun:local_ts_get_slot_recursive fun:local_ts_set fun:TidStoreSetBlockOffsets fun:dead_items_add fun:lazy_scan_prune fun:lazy_scan_heap fun:heap_vacuum_rel fun:table_relation_vacuum fun:vacuum_rel fun:vacuum fun:autovacuum_do_vac_analyze fun:do_autovacuum fun:AutoVacWorkerMain fun:postmaster_child_launch fun:StartChildProcess 
fun:StartAutovacuumWorker fun:process_pm_pmsignal fun:ServerLoop } ==103190== Conditional jump or move depends on uninitialised value(s) ==103190==at 0x21F18C: pg_rightmost_one_pos32 (pg_bitutils.h:114) ==103190==by 0x21FACF: local_ts_node_16_search_eq (radixtree.h:1023) ==103190==by 0x21FC0F: local_ts_node_search (radixtree.h:1057) ==103190==by 0x21FD57: local_ts_find (radixtree.h:) ==103190==by 0x225413: TidStoreIsMember (tidstore.c:443) ==103190==by 0x4C4DBF: vac_tid_reaped (vacuum.c:2545) ==103190==by 0x2A86D7: btvacuumpage (nbtree.c:1235) ==103190==by 0x2A829B: btvacuumscan (nbtree.c:1023) ==103190==by 0x2A7F1B: btbulkdelete (nbtree.c:824) ==103190==by 0x296C73: index_bulk_delete (indexam.c:758) ==103190==by 0x4C4C63: vac_bulkdel_one_index (vacuum.c:2498) ==103190==by 0x290957: lazy_vacuum_one_index (vacuumlazy.c:2443) ==103190==by 0x28FD7B: lazy_vacuum_all_indexes (vacuumlazy.c:2026) ==103190==by 0x28FB7B: lazy_vacuum (vacuumlazy.c:1944) ==103190==by 0x28E93F: lazy_scan_heap (vacuumlazy.c:1050) ==103190==by 0x28D8C7: heap_vacuum_rel (vacuumlazy.c:497) ==103190==by 0x4C0D2B: table_relation_vacuum (tableam.h:1720) ==103190==
Re: Inconsistency between try_mergejoin_path and create_mergejoin_plan
Richard Guo wrote 2024-06-19 16:30: On Wed, Jun 19, 2024 at 12:49 PM Tom Lane wrote: Richard Guo writes: > It seems to me that the new operator is somewhat artificial, since it is > designed to support a mergejoin but lacks a valid commutator. So before > we proceed to discuss the fix, I'd like to know whether this is a valid > issue that needs fixing. I do not think we should add a great deal of complexity or extra planner cycles to deal with this; but if it can be fixed at low cost, we should. I think we can simply verify the validity of commutators for clauses in the form "inner op outer" when selecting mergejoin/hash clauses. If a clause lacks a commutator, we should consider it unusable for the given pair of outer and inner rels. Please see the attached patch. This seems to be working for my test cases. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Add pg_get_acl() function get the ACL for a database object
On Wed, 19 Jun 2024 at 07:35, Joel Jacobson wrote: > Hello hackers, > > Currently, obtaining the Access Control List (ACL) for a database object > requires querying specific pg_catalog tables directly, where the user > needs to know the name of the ACL column for the object. > I have no idea how often this would be useful, but I wonder if it could work to have overloaded single-parameter versions for each of regprocedure (pg_proc.proacl), regclass (pg_class.relacl), …. To call, just cast the OID to the appropriate reg* type. For example: To get the ACL for table 'example_table', call pg_get_acl ('example_table'::regclass)
Re: Add pg_get_acl() function get the ACL for a database object
On Wed, Jun 19, 2024, at 15:51, Ranier Vilela wrote: > Regarding the patch, could it be written in the following style? Thanks for the nice improvement. New version attached. Best, Joel v2-0001-Add-pg_get_acl.patch Description: Binary data
Re: Avoid orphaned objects dependencies, take 3
Hi, On Mon, Jun 17, 2024 at 05:57:05PM +, Bertrand Drouvot wrote: > Hi, > > On Mon, Jun 17, 2024 at 12:24:46PM -0400, Robert Haas wrote: > > So to try to sum up here: I'm not sure I agree with this design. But I > > also feel like the design is not as clear and consistently implemented > > as it could be. So even if I just ignored the question of whether it's > > the right design, it feels like we're a ways from having something > > potentially committable here, because of issues like the ones I > > mentioned in the last paragraph. > > > > Agree. I'll now move on with the "XXX Do we need a lock for > RelationRelationId?" > comments that I put in v10 (see [1]) and study all the cases around them. A. I went through all of them, did some testing for all, and reached the conclusion that we must be in one of the two following cases, either of which would already prevent the relation from being dropped: 1. The relation is already locked (it could be an existing relation or a relation that we are creating). 2. The relation is protected indirectly (i.e., an index protected by a lock on its table, a table protected by a lock on a function that depends on the table, ...). So we don't need to add a lock if this is a RelationRelationId object class in the cases above. As a consequence, I replaced the "XXX"-related comments that were in v10 with another set of comments in v11 (attached) like "No need to call LockRelationOid() (through LockNotPinnedObject())". The reason is to make it clear in the code and also to ease the review. B. I explained in [1] (while sharing v10) that the object locking is now outside of the dependency code except for (and I explained why): recordDependencyOnExpr() recordDependencyOnSingleRelExpr() makeConfigurationDependencies() So I also did some testing, on the RelationRelationId case, for those and reached the same conclusion as the one shared above. For A. and B. the testing has been done by adding an "ereport(WARNING.." at those places when a RelationRelationId is involved. Then I ran "make check" and went through the failed tests (the outputs were not the expected ones due to the extra "WARNING"), reproduced them with gdb, and checked for the lock on the relation producing the "WARNING". All of those were linked to 1. or 2. Note that adding an assertion on an existing lock would not work for the cases described in 2. So, I'm now confident that we must be in 1. or 2., but it's also possible that I've missed some cases (though I think the way the testing has been done is not that weak). To sum up, I did not see any cases that did not lead to 1. or 2., so I think it's safe not to add an extra lock for the RelationRelationId case. If, for any reason, there are still cases that are outside 1. or 2., then they may lead to orphaned dependencies linked to the RelationRelationId class. I think it's fine to take that "risk" given that a. it would not be worse than what we have currently, and b. I did not see any of those in our fleet currently (while I have seen a non-negligible number outside of the RelationRelationId case). > Once done, I think that it will be easier to 1. remove ambiguity, 2. document and > 3. do the "right" thing regarding the RelationRelationId object class. > Please find attached v11, where I added more detailed comments in the commit message and also in the code (I also removed the useless check on AuthMemRelationId).
[1]: https://www.postgresql.org/message-id/ZnAVEBhlGvpDDVOD%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From 50e11432960e0ad5d940d2e7d9557fc4770d8262 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Fri, 29 Mar 2024 15:43:26 + Subject: [PATCH v11] Avoid orphaned objects dependencies It's currently possible to create orphaned objects dependencies, for example: Scenario 1: session 1: begin; drop schema schem; session 2: create a function in the schema schem session 1: commit; With the above, the function created in session 2 would be linked to a non existing schema. Scenario 2: session 1: begin; create a function in the schema schem session 2: drop schema schem; session 1: commit; With the above, the function created in session 1 would be linked to a non existing schema. To avoid those scenarios, a new lock (that conflicts with a lock taken by DROP) has been put in place before the dependencies are being recorded. With this in place, the drop schema in scenario 2 would be locked. Also, after the new lock attempt, the patch checks that the object still exists: with this in place session 2 in scenario 1 would be locked and would report an error once session 1 committs (that would not be the case should session 1 abort the transaction). If the object is dropped before the new lock attempt is triggered then the patch would also report an error (but with less details). The patch takes into account any type of objects except: - the ones that
Re: Better error message when --single is not the first arg to postgres executable
On Tue, Jun 18, 2024 at 09:42:32PM -0400, Greg Sabino Mullane wrote: > If I am reading your patch correctly, we have lost the behavior of least > surprise in which the first "meta" argument overrides all others: > > $ bin/postgres --version --boot --extrastuff > postgres (PostgreSQL) 16.2 Right, with the patch we fail if there are multiple such options specified: $ postgres --version --help FATAL: multiple server modes set DETAIL: Only one of --check, --boot, --describe-config, --single, --help/-?, --version/-V, -C may be set. > What about just inlining --version and --help e.g. > > else if (strcmp(argv[i], "--version") == 0 || strcmp(argv[i], "-V") == 0) > { > fputs(PG_BACKEND_VERSIONSTR, stdout); > exit(0); > } > > I'm fine with being more persnickety about the other options; they are much > rarer and not unixy. That seems like it should work. I'm not sure I agree that's the least surprising behavior (e.g., what exactly is the user trying to tell us with commands like "postgres --version --help --describe-config"?), but I also don't feel too strongly about it. -- nathan
Re: Add pg_get_acl() function get the ACL for a database object
Em qua., 19 de jun. de 2024 às 10:28, Ranier Vilela escreveu: > Em qua., 19 de jun. de 2024 às 10:26, Joel Jacobson > escreveu: > >> Hi Ranier, >> >> Thanks for looking at this. >> >> I've double-checked the patch I sent, and it works fine. >> >> I think I know the cause of your problem: >> >> Since this is a catalog change, you need to run `make clean`, to ensure >> the catalog is rebuilt, >> followed by the usual `make && make install`. >> >> You also need to run `initdb` to create a new database cluster, with the >> new catalog version. >> >> Let me know if you need more specific instructions. >> > Sorry, sorry but I'm on Windows -> meson. > > Double checked with: > ninja clean > ninja > ninja install > Sorry for the noise, now pg_get_acl is shown in the regress test. Regarding the patch, could it be written in the following style? Datum pg_get_acl(PG_FUNCTION_ARGS) { Oid classId = PG_GETARG_OID(0); Oid objectId = PG_GETARG_OID(1); Oid catalogId; AttrNumber Anum_oid; AttrNumber Anum_acl; /* for "pinned" items in pg_depend, return null */ if (!OidIsValid(classId) && !OidIsValid(objectId)) PG_RETURN_NULL(); catalogId = (classId == LargeObjectRelationId) ? LargeObjectMetadataRelationId : classId; Anum_oid = get_object_attnum_oid(catalogId); Anum_acl = get_object_attnum_acl(catalogId); if (Anum_acl != InvalidAttrNumber) { Relation rel; HeapTuple tup; Datum datum; bool isnull; rel = table_open(catalogId, AccessShareLock); tup = get_catalog_object_by_oid(rel, Anum_oid, objectId); if (!HeapTupleIsValid(tup)) elog(ERROR, "cache lookup failed for object %u of catalog \"%s\"", objectId, RelationGetRelationName(rel)); datum = heap_getattr(tup, Anum_acl, RelationGetDescr(rel), &isnull); table_close(rel, AccessShareLock); if (!isnull) PG_RETURN_DATUM(datum); } PG_RETURN_NULL(); } best regards, Ranier Vilela
Re: Meson far from ready on Windows
Hi On Tue, 18 Jun 2024 at 17:08, Andres Freund wrote: > > None of the dependencies include cmake files for distribution on Windows, > > so there are no additional files to tell meson to search for. The same > > applies to pkgconfig files, which is why the EDB team had to manually > craft > > them. > > Many of them do include at least cmake files on windows if you build them > though? > The only one that does is libxml2 as far as I can see. And that doesn't seem to work even if I use --cmake-prefix-path= as you suggested: C:\Users\dpage\git\postgresql>meson setup --auto-features=disabled --wipe -Dlibxml=enabled --cmake-prefix-path=C:\build64\lib\cmake\libxml2-2.11.8 build-libxml The Meson build system Version: 1.4.0 Source dir: C:\Users\dpage\git\postgresql Build dir: C:\Users\dpage\git\postgresql\build-libxml Build type: native build Project name: postgresql Project version: 17beta1 C compiler for the host machine: cl (msvc 19.39.33523 "Microsoft (R) C/C++ Optimizing Compiler Version 19.39.33523 for x64") C linker for the host machine: link link 14.39.33523.0 Host machine cpu family: x86_64 Host machine cpu: x86_64 Run-time dependency threads found: YES Library ws2_32 found: YES Library secur32 found: YES Program perl found: YES (C:\Strawberry\perl\bin\perl.EXE) Program python found: YES (C:\Python312\python.EXE) Program win_flex found: YES 2.6.4 2.6.4 (C:\ProgramData\chocolatey\bin\win_flex.EXE) Program win_bison found: YES 3.7.4 3.7.4 (C:\ProgramData\chocolatey\bin\win_bison.EXE) Program sed found: YES (C:\ProgramData\chocolatey\bin\sed.EXE) Program prove found: YES (C:\Strawberry\perl\bin\prove.BAT) Program tar found: YES (C:\Windows\system32\tar.EXE) Program gzip found: YES (C:\ProgramData\chocolatey\bin\gzip.EXE) Program lz4 found: NO Program openssl found: YES (C:\build64\bin\openssl.EXE) Program zstd found: NO Program dtrace skipped: feature dtrace disabled Program config/missing found: YES (sh C:\Users\dpage\git\postgresql\config/missing) Program cp found: YES (C:\Program Files (x86)\GnuWin32\bin\cp.EXE) Program xmllint found: YES (C:\build64\bin\xmllint.EXE) Program xsltproc found: YES (C:\build64\bin\xsltproc.EXE) Program wget found: YES (C:\ProgramData\chocolatey\bin\wget.EXE) Program C:\Python312\Scripts\meson found: YES (C:\Python312\Scripts\meson.exe) Check usable header "bsd_auth.h" skipped: feature bsd_auth disabled Check usable header "dns_sd.h" skipped: feature bonjour disabled Compiler for language cpp skipped: feature llvm disabled Found pkg-config: YES (C:\ProgramData\chocolatey\bin\pkg-config.EXE) 0.28 Found CMake: C:\Program Files\Microsoft Visual Studio\2022\Community\Common7\IDE\CommonExtensions\Microsoft\CMake\CMake\bin\cmake.EXE (3.28.0) Run-time dependency libxml-2.0 found: NO (tried pkgconfig and cmake) meson.build:796:11: ERROR: Dependency "libxml-2.0" not found, tried pkgconfig and cmake A full log can be found at C:\Users\dpage\git\postgresql\build-libxml\meson-logs\meson-log.txt > > > Btw, I've been working with Bilal to add a few of the dependencies to the > CI > images so we can test those automatically. Using vcpkg. We got that nearly > working, but he's on vacation this week... That does ensure both cmake and > .pc files are generated, fwiw. > > Currently builds gettext, icu, libxml2, libxslt, lz4, openssl, pkgconf, > python3, tcl, zlib, zstd. 
That appears to be using Mingw/Msys, which is quite different from a VC++ build, in part because it's a full environment with its own package manager and packages that people have put a lot of effort into making work as they do on unix. > I'm *NOT* sure that vcpkg is the way to go, fwiw. It does seem > advantageous to > use one of the toolkits thats commonly built for building dependencies on > windows, which seems to mean vcpkg or conan. > I don't think requiring or expecting vcpkg or conan is reasonable at all, for a number of reasons: - Neither supports all the dependencies at present. - There are real supply chain verification concerns for vendors. - That would be a huge change from what we've required in the last 19 years, with no deprecation notices or warnings for packagers etc. > > And that's why we really need to be able to locate headers and libraries > > easily by passing paths to meson, as we can't rely on pkgconfig, cmake, > or > > things being in some standard directory on Windows. > > Except that that often causes hard to diagnose breakages, because that > doesn't > allow including the necessary compiler/linker flags [2]. It's a bad > model, we shouldn't > perpetuate it. If we want to forever make windows a complicated annoying > stepchild, that's the way to go. > That is a good point, though I suspect it wouldn't solve your second example of the Kerberos libraries, as you'll get both 32 and 64 bit libs if you follow their standard process for building on Windows so you still need to have code to pick the right ones. > > FWIW, at least libzstd, libxml [3], lz4, zlib can generate cmake dependen
Re: Proposal for Updating CRC32C with AVX-512 Algorithm.
On Tue, Jun 18, 2024 at 02:00:34PM -0400, Bruce Momjian wrote: > While the license we are concerned about does not have this clause, it > does have: > >2. Redistributions in binary form must reproduce the above > copyright notice, this list of conditions and the following > disclaimer in the documentation and/or other materials provided > with the distribution. > > I assume that must also include the name of the copyright holder. > > I think that means we need to mention The Regents of the University of > California in our copyright notice, which we do. However several > non-Regents of the University of California copyright holder licenses > exist in our source tree, and accepting this AVX-512 patch would add > another one. Specifically, I see existing entries for: > > Aaron D. Gifford > Board of Trustees of the University of Illinois > David Burren > Eric P. Allman > Jens Schweikhardt > Marko Kreen > Sun Microsystems, Inc. > WIDE Project > > Now, some of these are these names plus Berkeley, and some are just the > names above. In summary, either we are doing something wrong in how we list copyrights in our documentation, or we don't need to make any changes for this Intel patch. Our license is at: https://www.postgresql.org/about/licence/ The Intel copyright in the source code is: * Copyright 2017 The Chromium Authors * Copyright (c) 2024, Intel(r) Corporation * * Use of this source code is governed by a BSD-style license that can be * found in the Chromium source repository LICENSE file. * https://chromium.googlesource.com/chromium/src/+/refs/heads/main/LICENSE and the URL contents are: // Copyright 2015 The Chromium Authors // // Redistribution and use in source and binary forms, with or without // modification, are permitted provided that the following conditions are // met: // //* Redistributions of source code must retain the above copyright // notice, this list of conditions and the following disclaimer. //* Redistributions in binary form must reproduce the above // copyright notice, this list of conditions and the following disclaimer // in the documentation and/or other materials provided with the // distribution. //* Neither the name of Google LLC nor the names of its // contributors may be used to endorse or promote products derived from // this software without specific prior written permission. // // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR // A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT // OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, // SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT // LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, // DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY // THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT // (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE // OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. Google LLC is added to clause three, and I assume Intel is also covered by this because it is considered "the names of its contributors", maybe? It would be good to know exactly what, if any, changes the Intel lawyers want us to make to our license if we accept this patch. There are also different versions of clause three in our source tree. 
The Postgres license only lists the University of California in our equivalent of clause three, meaning that there are three-clause BSD licenses in our source tree that reference entities that we don't reference in the Postgres license. Oddly, the Postgres license doesn't even disclaim warranties for the PostgreSQL Global Development Group, only for Berkeley. An even bigger issue is that we are distributing 3-clause BSD licensed software under the Postgres license, which is not the 3-clause BSD license. I think we were functioning under the assumption that the licenses are compatible, and so can be combined, which is true, but I don't think we can assume the individual licenses can be covered by our one license, can we? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: use CREATE DATABASE STRATEGY = FILE_COPY in pg_upgrade
On Wed, 19 Jun 2024 at 15:17, Robert Haas wrote: > > On Fri, Jun 14, 2024 at 5:29 PM Nathan Bossart > wrote: > > On Wed, Jun 12, 2024 at 09:41:01AM -0500, Nathan Bossart wrote: > > > Actually, I think you are right that we need a manual checkpoint, except I > > > think we need it to be after prepare_new_globals(). set_frozenxids() > > > connects to each database (including template0) and updates a bunch of > > > pg_class rows, and we probably want those on disk before we start copying > > > the files to create all the user's databases. > > > > Here is an updated patch. > > OK, I have a (probably) stupid question. The comment says: > > + * In binary upgrade mode, we can skip this checkpoint because neither of > + * these problems applies: we don't ever replay the WAL generated during > + * pg_upgrade, and we don't concurrently modify template0 (not to mention > + * that trying to take a backup during pg_upgrade is pointless). > > But what happens if the system crashes during pg_upgrade? Does this > patch make things worse than they are today? And should we care? As Nathan just said, AFAIK we don't have a way to resume progress from a crashed pg_upgrade, which implies you currently have to start over with a fresh dbinit. This patch wouldn't change that for the worse, it'd only add one more process that depends on that behaviour. Maybe in the future that'll change if pg_upgrade and pg_dump are made smart enough to resume their restore progress across forceful disconnects, but for now this patch seems like a nice performance improvement. Kind regards, Matthias van de Meent Neon (https://neon.tech)
Re: use CREATE DATABASE STRATEGY = FILE_COPY in pg_upgrade
On Wed, Jun 19, 2024 at 09:17:00AM -0400, Robert Haas wrote: > OK, I have a (probably) stupid question. The comment says: > > + * In binary upgrade mode, we can skip this checkpoint because neither of > + * these problems applies: we don't ever replay the WAL generated during > + * pg_upgrade, and we don't concurrently modify template0 (not to mention > + * that trying to take a backup during pg_upgrade is pointless). > > But what happens if the system crashes during pg_upgrade? Does this > patch make things worse than they are today? And should we care? My understanding is that you basically have to restart the upgrade from scratch if that happens. I suppose there could be a problem if you try to use the half-upgraded cluster after a crash, but I imagine you have a good chance of encountering other problems if you do that, too. So I don't think we care... -- nathan
Re: Avoid orphaned objects dependencies, take 3
Hi Robert, On Wed, Jun 19, 2024 at 5:50 PM Robert Haas wrote: > > On Wed, Jun 19, 2024 at 7:49 AM Ashutosh Sharma wrote: > > If the dependency is more, this can hit max_locks_per_transaction > > limit very fast. > > Your experiment doesn't support this conclusion. Very few users would > have 15 separate user-defined types in the same table, and even if > they did, and dropped the table, using 23 locks is no big deal. By > default, max_locks_per_transaction is 64, so the user would need to > have more like 45 separate user-defined types in the same table in > order to use more than 64 locks. So, yes, it is possible that if every > backend in the system were simultaneously trying to drop a table and > all of those tables had an average of at least 45 or so user-defined > types, all different from each other, you might run out of lock table > space. > > But probably nobody will ever do that in real life, and if they did, > they could just raise max_locks_per_transaction. > > When posting about potential problems like this, it is a good idea to > first do a careful thought experiment to assess how realistic the > problem is. I would consider an issue like this serious if there were > a realistic scenario under which a small number of backends could > exhaust the lock table for the whole system, but I think you can see > that this isn't the case here. Even your original scenario is more > extreme than what most people are likely to hit in real life, and it > only uses 23 locks. > I agree that based on the experiment I shared (which is somewhat unrealistic), this doesn't seem to have any significant implications. However, I was concerned that it could potentially increase the usage of max_locks_per_transaction, which is why I wanted to mention it here. Nonetheless, my experiment did not reveal any serious issues related to this. Sorry for the noise. -- With Regards, Ashutosh Sharma.
Re: ON ERROR in json_query and the like
On Mon, Jun 17, 2024 at 9:07 PM Markus Winand wrote: > > > I think it affects the following feature IDs: > > - T821, Basic SQL/JSON query operators > For JSON_VALUE, JSON_TABLE and JSON_EXISTS > - T828, JSON_QUERY > > Also, how hard would it be to add the functions that accept > character strings? Is there, besides the effort, any thing else > against it? I’m asking because I believe once released it might > never be changed — for backward compatibility. > we have ExecEvalCoerceViaIOSafe, so it's doable. I tried, but other things break. so it's not super easy i think. because of eval_const_expressions_mutator, postgres will constantly evaluate simple const expressions to simplify some expressions. `SELECT JSON_QUERY('a', '$');` postgres will try to do the cast coercion from text 'a' to jsonb. but it will fail, but it's hard to send the cast failed information to later code, in ExecInterpExpr. in ExecEvalCoerceViaIOSafe, if we cast coercion failed, then this function value is zero, isnull is set to true. `SELECT JSON_QUERY('a', '$');` will be equivalent to `SELECT JSON_QUERY(NULL, '$');` so making one of the next 2 examples to return jsonb 1 would be hard to implement. SELECT JSON_QUERY('a', '$' default '1' on empty); SELECT JSON_QUERY('a', '$' default '1' on error); -- If the standard says the context_item can be text string (cannot cast to json successfully). future version we make it happen, then it should be fine? because it's like the previous version we are not fully compliant with standard, now the new version is in full compliance with standard.
Re: Inconsistency between try_mergejoin_path and create_mergejoin_plan
On Wed, Jun 19, 2024 at 12:49 PM Tom Lane wrote: > Richard Guo writes: > > It seems to me that the new operator is somewhat artificial, since it is > > designed to support a mergejoin but lacks a valid commutator. So before > > we proceed to discuss the fix, I'd like to know whether this is a valid > > issue that needs fixing. > I do not think we should add a great deal of complexity or extra > planner cycles to deal with this; but if it can be fixed at low > cost, we should. I think we can simply verify the validity of commutators for clauses in the form "inner op outer" when selecting mergejoin/hash clauses. If a clause lacks a commutator, we should consider it unusable for the given pair of outer and inner rels. Please see the attached patch. Thanks Richard v2-0001-Check-the-validity-of-commutators-for-merge-hash-clauses.patch Description: Binary data
Re: Add pg_get_acl() function get the ACL for a database object
On Wed, Jun 19, 2024, at 10:26, Joel Jacobson wrote: > Hi Ranier, > > Thanks for looking at this. > > I've double-checked the patch I sent, and it works fine. > > I think I know the cause of your problem: > > Since this is a catalog change, you need to run `make clean`, to ensure > the catalog is rebuilt, > followed by the usual `make && make install`. > > You also need to run `initdb` to create a new database cluster, with the > new catalog version. > > Let me know if you need more specific instructions. > Sorry, sorry but I'm on Windows -> meson. Double checked with: ninja clean ninja ninja install best regards, Ranier Vilela
Re: Add pg_get_acl() function get the ACL for a database object
Hi Ranier, Thanks for looking at this. I've double-checked the patch I sent, and it works fine. I think I know the cause of your problem: Since this is a catalog change, you need to run `make clean`, to ensure the catalog is rebuilt, followed by the usual `make && make install`. You also need to run `initdb` to create a new database cluster, with the new catalog version. Let me know if you need more specific instructions. Best, Joel On Wed, Jun 19, 2024, at 14:59, Ranier Vilela wrote: > Em qua., 19 de jun. de 2024 às 08:35, Joel Jacobson > escreveu: >> Hello hackers, >> >> Currently, obtaining the Access Control List (ACL) for a database object >> requires querying specific pg_catalog tables directly, where the user >> needs to know the name of the ACL column for the object. >> >> Consider: >> >> ``` >> CREATE USER test_user; >> CREATE USER test_owner; >> CREATE SCHEMA test_schema AUTHORIZATION test_owner; >> SET ROLE TO test_owner; >> CREATE TABLE test_schema.test_table (); >> GRANT SELECT ON TABLE test_schema.test_table TO test_user; >> ``` >> >> To get the ACL we can do: >> >> ``` >> SELECT relacl FROM pg_class WHERE oid = >> 'test_schema.test_table'::regclass::oid; >> >> relacl >> - >> {test_owner=arwdDxtm/test_owner,test_user=r/test_owner} >> ``` >> >> Attached patch adds a new SQL-callable functoin `pg_get_acl()`, so we can do: >> >> ``` >> SELECT pg_get_acl('pg_class'::regclass, >> 'test_schema.test_table'::regclass::oid); >>pg_get_acl >> - >> {test_owner=arwdDxtm/test_owner,test_user=r/test_owner} >> ``` >> >> The original idea for this function came from Alvaro Herrera, >> in this related discussion: >> https://postgr.es/m/261def6b-226e-4238-b7eb-ff240cb9c...@www.fastmail.com >> >> On Thu, Mar 25, 2021, at 16:16, Alvaro Herrera wrote: >> > On 2021-Mar-25, Joel Jacobson wrote: >> > >> >> pg_shdepend doesn't contain the aclitem info though, >> >> so it won't work for pg_permissions if we want to expose >> >> privilege_type, is_grantable and grantor. >> > >> > Ah, of course -- the only way to obtain the acl columns is by going >> > through the catalogs individually, so it won't be possible. I think >> > this could be fixed with some very simple, quick function pg_get_acl() >> > that takes a catalog OID and object OID and returns the ACL; then >> > use aclexplode() to obtain all those details. >> >> The pg_get_acl() function has been implemented by following >> the guidance from Alvaro in the related dicussion: >> >> On Fri, Mar 26, 2021, at 13:43, Alvaro Herrera wrote: >> > AFAICS the way to do it is like AlterObjectOwner_internal obtains data >> > -- first do get_catalog_object_by_oid (gives you the HeapTuple that >> > represents the object), then >> > heap_getattr( ..., get_object_attnum_acl(), ..), and there you have the >> > ACL which you can "explode" (or maybe just return as-is). >> > >> > AFAICS if you do this, it's just one cache lookups per object, or >> > one indexscan for the cases with no by-OID syscache. It should be much >> > cheaper than the UNION ALL query. And you use pg_shdepend to guide >> > this, so you only do it for the objects that you already know are >> > interesting. >> >> Many thanks Alvaro for the very helpful instructions. >> >> This function would then allow users to e.g. create a view to show the >> privileges >> for all database objects, like the pg_privileges system view suggested in the >> related discussion. >> >> Tests and docs are added. 
> Hi, > For some reason, the function pg_get_acl, does not exist in generated > fmgrtab.c > > So, when install postgres, the function does not work. > > postgres=# SELECT pg_get_acl('pg_class'::regclass, > 'atest2'::regclass::oid); > ERROR: function pg_get_acl(regclass, oid) does not exist > LINE 1: SELECT pg_get_acl('pg_class'::regclass, 'atest2'::regclass::... >^ > HINT: No function matches the given name and argument types. You might > need to add explicit type casts. > > best regards, > Ranier Vilela -- Kind regards, Joel
Re: use CREATE DATABASE STRATEGY = FILE_COPY in pg_upgrade
On Fri, Jun 14, 2024 at 5:29 PM Nathan Bossart wrote: > On Wed, Jun 12, 2024 at 09:41:01AM -0500, Nathan Bossart wrote: > > Actually, I think you are right that we need a manual checkpoint, except I > > think we need it to be after prepare_new_globals(). set_frozenxids() > > connects to each database (including template0) and updates a bunch of > > pg_class rows, and we probably want those on disk before we start copying > > the files to create all the user's databases. > > Here is an updated patch. OK, I have a (probably) stupid question. The comment says: + * In binary upgrade mode, we can skip this checkpoint because neither of + * these problems applies: we don't ever replay the WAL generated during + * pg_upgrade, and we don't concurrently modify template0 (not to mention + * that trying to take a backup during pg_upgrade is pointless). But what happens if the system crashes during pg_upgrade? Does this patch make things worse than they are today? And should we care? -- Robert Haas EDB: http://www.enterprisedb.com
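For context, the createdb strategy under discussion corresponds to the SQL form sketched below; the database name is hypothetical.

-- FILE_COPY copies the template's files directly and relies on checkpoints
-- rather than WAL-logging every copied block (WAL_LOG, the default strategy since v15).
CREATE DATABASE upgraded_db TEMPLATE template0 STRATEGY = FILE_COPY;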
Re: use CREATE DATABASE STRATEGY = FILE_COPY in pg_upgrade
On Fri, 14 Jun 2024 at 23:29, Nathan Bossart wrote: > > On Wed, Jun 12, 2024 at 09:41:01AM -0500, Nathan Bossart wrote: > > Actually, I think you are right that we need a manual checkpoint, except I > > think we need it to be after prepare_new_globals(). set_frozenxids() > > connects to each database (including template0) and updates a bunch of > > pg_class rows, and we probably want those on disk before we start copying > > the files to create all the user's databases. Good catch, I hadn't thought of that. > Here is an updated patch. LGTM. Kind regards, Matthias van de Meent Neon (https://neon.tech)
Re: Add pg_get_acl() function get the ACL for a database object
Em qua., 19 de jun. de 2024 às 08:35, Joel Jacobson escreveu: > Hello hackers, > > Currently, obtaining the Access Control List (ACL) for a database object > requires querying specific pg_catalog tables directly, where the user > needs to know the name of the ACL column for the object. > > Consider: > > ``` > CREATE USER test_user; > CREATE USER test_owner; > CREATE SCHEMA test_schema AUTHORIZATION test_owner; > SET ROLE TO test_owner; > CREATE TABLE test_schema.test_table (); > GRANT SELECT ON TABLE test_schema.test_table TO test_user; > ``` > > To get the ACL we can do: > > ``` > SELECT relacl FROM pg_class WHERE oid = > 'test_schema.test_table'::regclass::oid; > > relacl > - > {test_owner=arwdDxtm/test_owner,test_user=r/test_owner} > ``` > > Attached patch adds a new SQL-callable functoin `pg_get_acl()`, so we can > do: > > ``` > SELECT pg_get_acl('pg_class'::regclass, > 'test_schema.test_table'::regclass::oid); >pg_get_acl > - > {test_owner=arwdDxtm/test_owner,test_user=r/test_owner} > ``` > > The original idea for this function came from Alvaro Herrera, > in this related discussion: > https://postgr.es/m/261def6b-226e-4238-b7eb-ff240cb9c...@www.fastmail.com > > On Thu, Mar 25, 2021, at 16:16, Alvaro Herrera wrote: > > On 2021-Mar-25, Joel Jacobson wrote: > > > >> pg_shdepend doesn't contain the aclitem info though, > >> so it won't work for pg_permissions if we want to expose > >> privilege_type, is_grantable and grantor. > > > > Ah, of course -- the only way to obtain the acl columns is by going > > through the catalogs individually, so it won't be possible. I think > > this could be fixed with some very simple, quick function pg_get_acl() > > that takes a catalog OID and object OID and returns the ACL; then > > use aclexplode() to obtain all those details. > > The pg_get_acl() function has been implemented by following > the guidance from Alvaro in the related dicussion: > > On Fri, Mar 26, 2021, at 13:43, Alvaro Herrera wrote: > > AFAICS the way to do it is like AlterObjectOwner_internal obtains data > > -- first do get_catalog_object_by_oid (gives you the HeapTuple that > > represents the object), then > > heap_getattr( ..., get_object_attnum_acl(), ..), and there you have the > > ACL which you can "explode" (or maybe just return as-is). > > > > AFAICS if you do this, it's just one cache lookups per object, or > > one indexscan for the cases with no by-OID syscache. It should be much > > cheaper than the UNION ALL query. And you use pg_shdepend to guide > > this, so you only do it for the objects that you already know are > > interesting. > > Many thanks Alvaro for the very helpful instructions. > > This function would then allow users to e.g. create a view to show the > privileges > for all database objects, like the pg_privileges system view suggested in > the > related discussion. > > Tests and docs are added. > Hi, For some reason, the function pg_get_acl, does not exist in generated fmgrtab.c So, when install postgres, the function does not work. postgres=# SELECT pg_get_acl('pg_class'::regclass, 'atest2'::regclass::oid); ERROR: function pg_get_acl(regclass, oid) does not exist LINE 1: SELECT pg_get_acl('pg_class'::regclass, 'atest2'::regclass::... ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. best regards, Ranier Vilela
Re: Avoid orphaned objects dependencies, take 3
On Wed, Jun 19, 2024 at 7:49 AM Ashutosh Sharma wrote: > If the dependency is more, this can hit max_locks_per_transaction > limit very fast. Your experiment doesn't support this conclusion. Very few users would have 15 separate user-defined types in the same table, and even if they did, and dropped the table, using 23 locks is no big deal. By default, max_locks_per_transaction is 64, so the user would need to have more like 45 separate user-defined types in the same table in order to use more than 64 locks. So, yes, it is possible that if every backend in the system were simultaneously trying to drop a table and all of those tables had an average of at least 45 or so user-defined types, all different from each other, you might run out of lock table space. But probably nobody will ever do that in real life, and if they did, they could just raise max_locks_per_transaction. When posting about potential problems like this, it is a good idea to first do a careful thought experiment to assess how realistic the problem is. I would consider an issue like this serious if there were a realistic scenario under which a small number of backends could exhaust the lock table for the whole system, but I think you can see that this isn't the case here. Even your original scenario is more extreme than what most people are likely to hit in real life, and it only uses 23 locks. -- Robert Haas EDB: http://www.enterprisedb.com
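As a concrete footnote to the last point, raising the limit is a single configuration change (a sketch; it takes effect only after a server restart):

-- The shared lock table holds roughly
-- max_locks_per_transaction * (max_connections + max_prepared_transactions) entries.
ALTER SYSTEM SET max_locks_per_transaction = 128;   -- default is 64; requires a restart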
Re: State of pg_createsubscriber
On Tue, Jun 18, 2024 at 11:55 PM Amit Kapila wrote: > We can close the open item pointing to this thread. Done, and for the record I also asked for the thread to be split, back on May 22. IMHO, we shouldn't add open items pointing to general complaints like the one that started this thread. Open items need to be specific and actionable. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Pgoutput not capturing the generated columns
On Mon, Jun 17, 2024 at 1:57 PM Peter Smith wrote: > > Hi, here are my review comments for patch v7-0001. > > == > 1. GENERAL - \dRs+ > > Shouldn't the new SUBSCRIPTION parameter be exposed via "describe" > (e.g. \dRs+ mysub) the same as the other boolean parameters? > > == > Commit message > > 2. > When 'include_generated_columns' is false then the PUBLICATION > col-list will ignore any generated cols even when they are present in > a PUBLICATION col-list > > ~ > > Maybe you don't need to mention "PUBLICATION col-list" twice. > > SUGGESTION > When 'include_generated_columns' is false, generated columns are not > replicated, even when present in a PUBLICATION col-list. > > ~~~ > > 2. > CREATE SUBSCRIPTION test1 connection 'dbname=postgres host=localhost port= > 'publication pub1; > > ~ > > 2a. > (I've questioned this one in previous reviews) > > What exactly is the purpose of this statement in the commit message? > Was this supposed to demonstrate the usage of the > 'include_generated_columns' parameter? > > ~ > > 2b. > /publication/ PUBLICATION/ > > > ~~~ > > 3. > If the subscriber-side column is also a generated column then > thisoption has no effect; the replicated data will be ignored and the > subscriber column will be filled as normal with the subscriber-side > computed or default data. > > ~ > > Missing space: /thisoption/this option/ > > == > .../expected/decoding_into_rel.out > > 4. > +-- When 'include-generated-columns' is not set > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); > +data > +- > + BEGIN > + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2 > + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4 > + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6 > + COMMIT > +(5 rows) > > Why is the default value here equivalent to > 'include-generated-columns' = '1' here instead of '0'? The default for > the CREATE SUBSCRIPTION parameter 'include_generated_columns' is > false, and IMO it seems confusing for these 2 defaults to be > different. Here I think it should default to '0' *regardless* of what > the previous functionality might have done -- e.g. this is a "test > decoder" so the parameter should behave sensibly. > > == > .../test_decoding/sql/decoding_into_rel.sql > > NITPICK - wrong comments. > > == > doc/src/sgml/protocol.sgml > > 5. > + > + include_generated_columns > + > + > +Boolean option to enable generated columns. This option controls > +whether generated columns should be included in the string > +representation of tuples during logical decoding in PostgreSQL. > +The default is false. > + > + > + > + > > Does the protocol version need to be bumped to support this new option > and should that be mentioned on this page similar to how all other > version values are mentioned? I already did the Backward Compatibility test earlier and decided that protocol bump is not needed. > doc/src/sgml/ref/create_subscription.sgml > > NITPICK - some missing words/sentence. > NITPICK - some missing tags. > NITPICK - remove duplicated sentence. > NITPICK - add another . > > == > src/backend/commands/subscriptioncmds.c > > 6. > #define SUBOPT_ORIGIN 0x8000 > +#define SUBOPT_include_generated_columns 0x0001 > > Please use UPPERCASE for consistency with other macros. > > == > .../libpqwalreceiver/libpqwalreceiver.c > > 7. 
> + if (options->proto.logical.include_generated_columns && > + PQserverVersion(conn->streamConn) >= 17) > + appendStringInfoString(&cmd, ", include_generated_columns 'on'"); > + > > IMO it makes more sense to say 'true' here instead of 'on'. It seems > like this was just cut/paste from the above code (where 'on' was > sensible). > > == > src/include/catalog/pg_subscription.h > > 8. > @@ -98,6 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) > BKI_SHARED_RELATION BKI_ROW > * slots) in the upstream database are enabled > * to be synchronized to the standbys. */ > > + bool subincludegencol; /* True if generated columns must be published */ > + > > Not fixed as claimed. This field name ought to be plural. > > /subincludegencol/subincludegencols/ > > ~~~ > > 9. > char*origin; /* Only publish data originating from the > * specified origin */ > + bool includegencol; /* publish generated column data */ > } Subscription; > > Not fixed as claimed. This field name ought to be plural. > > /includegencol/includegencols/ > > == > src/test/subscription/t/031_column_list.pl > > 10. > +$node_publisher->safe_psql('postgres', > + "CREATE TABLE tab2 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a > * 2) STORED)" > +); > + > +$node_publisher->safe_psql('postgres', > + "CREATE TABLE tab3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a > + 10) STORED)" > +); >
Re: Avoid orphaned objects dependencies, take 3
Hi, If the dependency is more, this can hit max_locks_per_transaction limit very fast. Won't it? I just tried this little experiment with and without patch. 1) created some UDTs (I have just chosen some random number, 15) do $$ declare i int := 1; type_name text; begin while i <= 15 loop type_name := format('ct_%s', i); -- check if the type already exists if not exists ( select 1 from pg_type where typname = type_name ) then execute format('create type %I as (f1 INT, f2 TEXT);', type_name); end if; i := i + 1; end loop; end $$; 2) started a transaction and tried creating a table that uses all udts created above: begin; create table dep_tab(a ct_1, b ct_2, c ct_3, d ct_4, e ct_5, f ct_6, g ct_7, h ct_8, i ct_9, j ct_10, k ct_11, l ct_12, m ct_13, n ct_14, o ct_15); 3) checked the pg_locks entries inside the transaction both with and without patch: -- with patch: select count(*) from pg_locks; count --- 23 (1 row) -- without patch: select count(*) from pg_locks; count --- 7 (1 row) With patch, it increased by 3 times. Won't that create a problem if many concurrent sessions engage in similar activity? -- With Regards, Ashutosh Sharma.
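A slightly more detailed variant of step 3 (just a sketch) breaks the pg_locks entries down by lock type and mode, which makes it easier to see where the extra entries come from:

-- Run inside the same transaction as step 2, with and without the patch.
SELECT locktype, mode, count(*)
FROM pg_locks
GROUP BY locktype, mode
ORDER BY count(*) DESC;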
Re: Pgoutput not capturing the generated columns
> Few comments: > 1) Here tab1 and tab2 are exactly the same tables, just check if the > table tab1 itself can be used for your tests. > @@ -24,20 +24,50 @@ $node_publisher->safe_psql('postgres', > "CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS > AS (a * 2) STORED)" > ); > +$node_publisher->safe_psql('postgres', > + "CREATE TABLE tab2 (a int PRIMARY KEY, b int GENERATED ALWAYS > AS (a * 2) STORED)" > +); On the subscription side the tables have different descriptions, so we need to have different tables on the publisher side. > 2) We can document that the include_generate_columns option cannot be > altered. > > 3) You can mention that include-generated-columns is true by default > and generated column data will be selected > +-- When 'include-generated-columns' is not set > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); > +data > +- > + BEGIN > + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2 > + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4 > + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6 > + COMMIT > +(5 rows) > > 4) The comment seems to be wrong here, the comment says b will not be > replicated but b is being selected: > -- When 'include-generated-columns' = '1' the generated column 'b' > values will not be replicated > INSERT INTO gencoltable (a) VALUES (1), (2), (3); > SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1', > 'include-generated-columns', '1'); > data > - > BEGIN > table public.gencoltable: INSERT: a[integer]:1 b[integer]:2 > table public.gencoltable: INSERT: a[integer]:2 b[integer]:4 > table public.gencoltable: INSERT: a[integer]:3 b[integer]:6 > COMMIT > (5 rows) > > 5) Similarly here too the comment seems to be wrong, the comment says > b will not replicated but b is not being selected: > INSERT INTO gencoltable (a) VALUES (4), (5), (6); > -- When 'include-generated-columns' = '0' the generated column 'b' > values will be replicated > SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1', > 'include-generated-columns', '0'); > data > > BEGIN > table public.gencoltable: INSERT: a[integer]:4 > table public.gencoltable: INSERT: a[integer]:5 > table public.gencoltable: INSERT: a[integer]:6 > COMMIT > (5 rows) > > 6) SUBOPT_include_generated_columns change it to SUBOPT_GENERATED to > keep the name consistent: > --- a/src/backend/commands/subscriptioncmds.c > +++ b/src/backend/commands/subscriptioncmds.c > @@ -72,6 +72,7 @@ > #define SUBOPT_FAILOVER0x2000 > #define SUBOPT_LSN 0x4000 > #define SUBOPT_ORIGIN 0x8000 > +#define SUBOPT_include_generated_columns 0x0001 > > 7) The comment style seems to be inconsistent, both of them can start > in lower case > +-- check include-generated-columns option with generated column > +CREATE TABLE gencoltable (a int, b int GENERATED ALWAYS AS (a * 2) STORED); > +INSERT INTO gencoltable (a) VALUES (1), (2), (3); > +-- When 'include-generated-columns' is not set > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); > +data > +- > + BEGIN > + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2 > + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4 > + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6 > + COMMIT > +(5 rows) > + > +-- When 'include-generated-columns' = '1' 
the generated column 'b' > values will not be replicated > > 8) This could be changed to remove the insert statements by using > pg_logical_slot_peek_changes: > -- When 'include-generated-columns' is not set > SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); > -- When 'include-generated-columns' = '1' the generated column 'b' > values will not be replicated > INSERT INTO gencoltable (a) VALUES (1), (2), (3); > SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1', > 'include-generated-columns', '1'); > INSERT INTO gencoltable (a) VALUES (4), (5), (6); > -- When 'include-generated-columns' = '0' the generated column 'b' > values will be replicated > SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1', > 'include-generated-columns
Add pg_get_acl() function get the ACL for a database object
Hello hackers, Currently, obtaining the Access Control List (ACL) for a database object requires querying specific pg_catalog tables directly, where the user needs to know the name of the ACL column for the object. Consider: ``` CREATE USER test_user; CREATE USER test_owner; CREATE SCHEMA test_schema AUTHORIZATION test_owner; SET ROLE TO test_owner; CREATE TABLE test_schema.test_table (); GRANT SELECT ON TABLE test_schema.test_table TO test_user; ``` To get the ACL we can do: ``` SELECT relacl FROM pg_class WHERE oid = 'test_schema.test_table'::regclass::oid; relacl - {test_owner=arwdDxtm/test_owner,test_user=r/test_owner} ``` The attached patch adds a new SQL-callable function `pg_get_acl()`, so we can do: ``` SELECT pg_get_acl('pg_class'::regclass, 'test_schema.test_table'::regclass::oid); pg_get_acl - {test_owner=arwdDxtm/test_owner,test_user=r/test_owner} ``` The original idea for this function came from Alvaro Herrera, in this related discussion: https://postgr.es/m/261def6b-226e-4238-b7eb-ff240cb9c...@www.fastmail.com On Thu, Mar 25, 2021, at 16:16, Alvaro Herrera wrote: > On 2021-Mar-25, Joel Jacobson wrote: > >> pg_shdepend doesn't contain the aclitem info though, >> so it won't work for pg_permissions if we want to expose >> privilege_type, is_grantable and grantor. > > Ah, of course -- the only way to obtain the acl columns is by going > through the catalogs individually, so it won't be possible. I think > this could be fixed with some very simple, quick function pg_get_acl() > that takes a catalog OID and object OID and returns the ACL; then > use aclexplode() to obtain all those details. The pg_get_acl() function has been implemented by following the guidance from Alvaro in the related discussion: On Fri, Mar 26, 2021, at 13:43, Alvaro Herrera wrote: > AFAICS the way to do it is like AlterObjectOwner_internal obtains data > -- first do get_catalog_object_by_oid (gives you the HeapTuple that > represents the object), then > heap_getattr( ..., get_object_attnum_acl(), ..), and there you have the > ACL which you can "explode" (or maybe just return as-is). > > AFAICS if you do this, it's just one cache lookups per object, or > one indexscan for the cases with no by-OID syscache. It should be much > cheaper than the UNION ALL query. And you use pg_shdepend to guide > this, so you only do it for the objects that you already know are > interesting. Many thanks Alvaro for the very helpful instructions. This function would then allow users to e.g. create a view to show the privileges for all database objects, like the pg_privileges system view suggested in the related discussion. Tests and docs are added. Best regards, Joel Jakobsson 0001-Add-pg_get_acl.patch Description: Binary data
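As a usage sketch building on the example above and on Alvaro's aclexplode() suggestion (it assumes the attached patch is applied), the returned aclitem[] can be expanded into one row per grant:

-- Expand the aclitem[] returned by the proposed pg_get_acl() into individual privileges.
SELECT grantor::regrole, grantee::regrole, privilege_type, is_grantable
FROM aclexplode(pg_get_acl('pg_class'::regclass,
                           'test_schema.test_table'::regclass::oid));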
Re: Pgoutput not capturing the generated columns
> Hi Shubham, thanks for providing a patch. > I have some comments for v6-0001. > > 1. create_subscription.sgml > There is repetition of the same line. > > + > + Specifies whether the generated columns present in the tables > + associated with the subscription should be replicated. If the > + subscriber-side column is also a generated column then this option > + has no effect; the replicated data will be ignored and the > subscriber > + column will be filled as normal with the subscriber-side computed > or > + default data. > + false. > + > + > + > + This parameter can only be set true if > copy_data is > + set to false. If the subscriber-side > column is also a > + generated column then this option has no effect; the > replicated data will > + be ignored and the subscriber column will be filled as > normal with the > + subscriber-side computed or default data. > + > > == > 2. subscriptioncmds.c > > 2a. The macro name should be in uppercase. We can use a short name > like 'SUBOPT_INCLUDE_GEN_COL'. Thought? > +#define SUBOPT_include_generated_columns 0x0001 > > 2b.Update macro name accordingly > + if (IsSet(supported_opts, SUBOPT_include_generated_columns)) > + opts->include_generated_columns = false; > > 2c. Update macro name accordingly > + else if (IsSet(supported_opts, SUBOPT_include_generated_columns) && > + strcmp(defel->defname, "include_generated_columns") == 0) > + { > + if (IsSet(opts->specified_opts, SUBOPT_include_generated_columns)) > + errorConflictingDefElem(defel, pstate); > + > + opts->specified_opts |= SUBOPT_include_generated_columns; > + opts->include_generated_columns = defGetBoolean(defel); > + } > > 2d. Update macro name accordingly > + SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER | SUBOPT_ORIGIN | > + SUBOPT_include_generated_columns); > > > == > > 3. decoding_into_rel.out > > 3a. In comment, I think it should be "When 'include-generated-columns' > = '1' the generated column 'b' values will be replicated" > +-- When 'include-generated-columns' = '1' the generated column 'b' > values will not be replicated > +INSERT INTO gencoltable (a) VALUES (1), (2), (3); > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1', > 'include-generated-columns', '1'); > +data > +- > + BEGIN > + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2 > + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4 > + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6 > + COMMIT > > 3b. In comment, I think it should be "When 'include-generated-columns' > = '1' the generated column 'b' values will not be replicated" > +-- When 'include-generated-columns' = '0' the generated column 'b' > values will be replicated > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1', > 'include-generated-columns', '0'); > + data > + > + BEGIN > + table public.gencoltable: INSERT: a[integer]:4 > + table public.gencoltable: INSERT: a[integer]:5 > + table public.gencoltable: INSERT: a[integer]:6 > + COMMIT > +(5 rows) > > = > > 4. Here names for both the tests are the same. I think we should use > different names. 
> > +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2"); > +is( $result, qq(4|8 > +5|10), 'generated columns replicated to non-generated column on subscriber'); > + > +$node_publisher->safe_psql('postgres', "INSERT INTO tab3 VALUES (4), (5)"); > + > +$node_publisher->wait_for_catchup('sub3'); > + > +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3"); > +is( $result, qq(4|24 > +5|25), 'generated columns replicated to non-generated column on subscriber'); All the comments are handled. The attached Patch contains all the suggested changes. Thanks and Regards, Shubham Khanna. v8-0001-Currently-generated-column-values-are-not-replica.patch Description: Binary data
Re: What is a typical precision of gettimeofday()?
Here is the output of nanosecond-precision pg_test_timing on an M1 MacBook Air:

/work/postgres/src/bin/pg_test_timing % ./pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 21.54 ns

Histogram of timing durations:
    <= ns   % of total   running %      count
        0      49.1655     49.1655   68481688
        1      0.          49.1655          0
        3      0.          49.1655          0
        7      0.          49.1655          0
       15      0.          49.1655          0
       31      0.          49.1655          0
       63      50.6890     99.8545   70603742
      127      0.1432      99.9976     199411
      255      0.0015      99.9991       2065
      511      0.0001      99.9992         98
     1023      0.0001      99.9993        140
     2047      0.0002      99.9995        284
     4095      0.          99.9995         50
     8191      0.          99.9996         65
    16383      0.0002      99.9997        240
    32767      0.0001      99.9998        128
    65535      0.0001      99.             97
   131071      0.          99.             58
   262143      0.         100.             44
   524287      0.         100.             22
  1048575      0.         100.              7
  2097151      0.         100.              2

First 128 exact nanoseconds:
        0      49.1655     49.1655   68481688
       41      16.8964     66.0619   23534708
       42      33.7926     99.8545   47069034
       83      0.0835      99.9380     116362
       84      0.0419      99.9799      58349
      125      0.0177      99.9976      24700

As you can see, the 40 ns internal tick gets somehow blurred into a not-quite-40-ns timing step.

On Linux / ARM Ampere, where __builtin_readcyclecounter() works (it compiles but crashes on Mac OS M1; I have not yet tested on Linux M1), the tick is exactly 40 ns, and I'd expect it to be the same on M1.

On Tue, Jun 18, 2024 at 5:08 PM Hannu Krosing wrote:
>
> I plan to send patch to pg_test_timing in a day or two
>
> the underlying time precision on modern linux seems to be
>
> 2 ns for some Intel CPUs
> 10 ns for Zen4
> 40 ns for ARM (Ampere)
>
> ---
> Hannu
>
> On Tue, Jun 18, 2024 at 7:48 AM Andrey M. Borodin wrote:
>>
>> > On 19 Mar 2024, at 13:28, Peter Eisentraut wrote:
>> >
>> > I feel that we don't actually have any information about this portability
>> > concern. Does anyone know what precision we can expect from
>> > gettimeofday()? Can we expect the full microsecond precision usually?
>>
>> At PGConf.dev Hannu Krossing draw attention to pg_test_timing module. I’ve
>> tried this module (slightly modified to measure nanoseconds) on some systems,
>> and everywhere I found ~100ns resolution (95% of ticks fall into 64ns and
>> 128ns buckets).
>>
>> I’ll add cc Hannu, and also pg_test_timing module authors Ants and Greg.
>> Maybe they can add some context.
>>
>> Best regards, Andrey Borodin.
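As a side note for anyone who wants to reproduce the ballpark numbers without the patched pg_test_timing, a quick standalone probe of the clock granularity can be written with plain clock_gettime(). This is only a sketch, not the modified tool, and the iteration count is arbitrary.

```c
/*
 * Standalone sketch: estimate the timer tick by sampling CLOCK_MONOTONIC
 * back-to-back and recording the smallest non-zero delta seen.
 */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

static inline uint64_t
now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t) ts.tv_sec * 1000000000ull + (uint64_t) ts.tv_nsec;
}

int
main(void)
{
	uint64_t	min_step = UINT64_MAX;
	uint64_t	prev = now_ns();

	for (long i = 0; i < 10 * 1000 * 1000; i++)
	{
		uint64_t	cur = now_ns();

		if (cur > prev && cur - prev < min_step)
			min_step = cur - prev;
		prev = cur;
	}

	printf("smallest observed non-zero step: %llu ns\n",
		   (unsigned long long) min_step);
	return 0;
}
```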
Re: Proposal: Document ABI Compatibility
On 18.06.24 00:40, David E. Wheeler wrote: On Jun 12, 2024, at 11:57, Peter Geoghegan wrote: That having been said, it would be useful if there was a community web resource for this -- something akin to coverage.postgresql.org, but with differential ABI breakage reports. You can see an example report here: https://postgr.es/m/CAH2-Wzm-W6hSn71sUkz0Rem=qdeu7tnfmc7_jg2djrlfef_...@mail.gmail.com Theoretically anybody can do this themselves. In practice they don't. So something as simple as providing automated reports about ABI changes might well move the needle here. What would be required to make such a thing? Maybe it’d make a good Summer of Code project. The above thread contains a lengthy discussion about what one could do.
Re: Proposal: Document ABI Compatibility
On 18.06.24 00:37, David E. Wheeler wrote:
> ABI Policy
> ==
>
> The PostgreSQL core team maintains two application binary interface (ABI) guarantees: one for major releases and one for minor releases.
>
> Major Releases
> --
>
> Applications that use the PostgreSQL APIs

This is probably a bit confusing. This might as well mean client application code against libpq. Better something like "server plugin code that uses the PostgreSQL server APIs".

> must be compiled for each major release supported by the application. The inclusion of `PG_MODULE_MAGIC` ensures that code compiled for one major version will be rejected by other major versions.

ok so far

> Furthermore, new releases may make API changes that require code changes. Use the `PG_VERSION_NUM` constant to adjust code in a backwards compatible way:
>
> ``` c
> #if PG_VERSION_NUM >= 16
> #include "varatt.h"
> #endif
> ```

But now we're talking about API. That might be subject of another document or another section in this one, but it seems confusing to mix this with the ABI discussion.

> PostgreSQL avoids unnecessary API changes in major releases, but usually ships a few necessary API changes, including deprecation, renaming, and argument variation.

Obviously, as a practical matter, there won't be random pointless changes. But I wouldn't go as far as writing anything down about how these APIs are developed.

> In such cases the incompatible changes will be listed in the Release Notes.

I don't think anyone is signing up to do that.

> Minor Releases
> --
>
> PostgreSQL makes an effort to avoid both API and ABI breaks in minor releases. In general, an application compiled against any minor release will work with any other minor release, past or future.
>
> When a change *is* required, PostgreSQL will choose the least invasive way possible, for example by squeezing a new field into padding space or appending it to the end of a struct. This sort of change should not impact dependent applications unless they use `sizeof(the struct)` on a struct with an appended field, or create their own instances of such structs --- patterns best avoided.
>
> In rare cases, however, even such non-invasive changes may be impractical or impossible. In such an event, the change will be documented in the Release Notes, and details on the issue will also be posted to [TBD; mail list? Blog post? News item?].

I think one major problem besides actively avoiding or managing such minor-version ABI breaks is some automated detection. Otherwise, this just means "we try, but who knows".
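To make the sizeof()/appended-field caveat above concrete, here is a tiny illustration with a made-up struct (not a real PostgreSQL one) of why an extension that allocates server structs itself can break when a minor release appends a field. The overflow is deliberate and is the bug being illustrated.

```c
/*
 * Illustration only: an extension built against the older header allocates
 * the smaller struct; server code built against the newer definition then
 * writes past that allocation.  WidgetState* are invented names.
 */
#include <stdlib.h>

/* Definition the extension was compiled against (minor release N): */
typedef struct WidgetStateOld
{
	int			a;
	int			b;
} WidgetStateOld;

/* Definition the server was compiled against (minor release N+1): */
typedef struct WidgetStateNew
{
	int			a;
	int			b;
	int			c;				/* appended in the minor release */
} WidgetStateNew;

int
main(void)
{
	/* Extension-style allocation, sized from its older compile-time view: */
	void	   *p = malloc(sizeof(WidgetStateOld));

	/* Server-side code sees the newer layout: */
	WidgetStateNew *w = (WidgetStateNew *) p;

	w->c = 42;					/* writes past the allocation: undefined behavior */

	free(p);
	return 0;
}
```

Structs that extensions only receive as pointers from the server are unaffected by this; the hazard is limited to code that allocates or embeds the struct itself, which is why the draft calls those patterns "best avoided".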
Re: Conflict Detection and Resolution
On Wed, Jun 19, 2024 at 2:36 PM shveta malik wrote: > > On Wed, Jun 19, 2024 at 1:52 PM Dilip Kumar wrote: > > > > On Tue, Jun 18, 2024 at 3:29 PM shveta malik wrote: > > > On Tue, Jun 18, 2024 at 11:34 AM Dilip Kumar > > > wrote: > > > > > > I tried to work out a few scenarios with this, where the apply worker > > > will wait until its local clock hits 'remote_commit_tts - max_skew > > > permitted'. Please have a look. > > > > > > Let's say, we have a GUC to configure max_clock_skew permitted. > > > Resolver is last_update_wins in both cases. > > > > > > 1) Case 1: max_clock_skew set to 0 i.e. no tolerance for clock skew. > > > > > > Remote Update with commit_timestamp = 10.20AM. > > > Local clock (which is say 5 min behind) shows = 10.15AM. > > > > > > When remote update arrives at local node, we see that skew is greater > > > than max_clock_skew and thus apply worker waits till local clock hits > > > 'remote's commit_tts - max_clock_skew' i.e. till 10.20 AM. Once the > > > local clock hits 10.20 AM, the worker applies the remote change with > > > commit_tts of 10.20AM. In the meantime (during wait period of apply > > > worker)) if some local update on same row has happened at say 10.18am, > > > that will applied first, which will be later overwritten by above > > > remote change of 10.20AM as remote-change's timestamp appear more > > > latest, even though it has happened earlier than local change. Oops lot of mistakes in the usage of change-1 and change-2, sorry about that. > > For the sake of simplicity let's call the change that happened at > > 10:20 AM change-1 and the change that happened at 10:15 as change-2 > > and assume we are talking about the synchronous commit only. > > Do you mean "the change that happened at 10:18 as change-2" Right > > > > I think now from an application perspective the change-1 wouldn't have > > caused the change-2 because we delayed applying change-2 on the local > > node > > Do you mean "we delayed applying change-1 on the local node." Right > >which would have delayed the confirmation of the change-1 to the > > application that means we have got the change-2 on the local node > > without the confirmation of change-1 hence change-2 has no causal > > dependency on the change-1. So it's fine that we perform change-1 > > before change-2 > > Do you mean "So it's fine that we perform change-2 before change-1" Right > >and the timestamp will also show the same at any other > > node if they receive these 2 changes. > > > > The goal is to ensure that if we define the order where change-2 > > happens before change-1, this same order should be visible on all > > other nodes. This will hold true because the commit timestamp of > > change-2 is earlier than that of change-1. > > Considering the above corrections as base, I agree with this. +1 > > > 2) Case 2: max_clock_skew is set to 2min. > > > > > > Remote Update with commit_timestamp=10.20AM > > > Local clock (which is say 5 min behind) = 10.15AM. > > > > > > Now apply worker will notice skew greater than 2min and thus will wait > > > till local clock hits 'remote's commit_tts - max_clock_skew' i.e. > > > 10.18 and will apply the change with commit_tts of 10.20 ( as we > > > always save the origin's commit timestamp into local commit_tts, see > > > RecordTransactionCommit->TransactionTreeSetCommitTsData). Now lets say > > > another local update is triggered at 10.19am, it will be applied > > > locally but it will be ignored on remote node. 
On the remote node , > > > the existing change with a timestamp of 10.20 am will win resulting in > > > data divergence. > > > > Let's call the 10:20 AM change as a change-1 and the change that > > happened at 10:19 as change-2 > > > > IIUC, although we apply the change-1 at 10:18 AM the commit_ts of that > > commit_ts of that change is 10:20, and the same will be visible to all > > other nodes. So in conflict resolution still the change-1 happened > > after the change-2 because change-2's commit_ts is 10:19 AM. Now > > there could be a problem with the causal order because we applied the > > change-1 at 10:18 AM so the application might have gotten confirmation > > at 10:18 AM and the change-2 of the local node may be triggered as a > > result of confirmation of the change-1 that means now change-2 has a > > causal dependency on the change-1 but commit_ts shows change-2 > > happened before the change-1 on all the nodes. > > > > So, is this acceptable? I think yes because the user has configured a > > maximum clock skew of 2 minutes, which means the detected order might > > not always align with the causal order for transactions occurring > > within that time frame. > > Agree. I had the same thoughts, and wanted to confirm my understanding. Okay -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Conflict Detection and Resolution
On Wed, Jun 19, 2024 at 12:03 PM Amit Kapila wrote: > > I doubt that it would be that simple. The application will have to > intervene and tell one of the employees that their reservation has failed. > It looks natural that the first one to reserve the room should get the > reservation, but implementing that is more complex than resolving a > conflict in the database. In fact, mostly it will be handled outside > database. > > > > Sure, the application needs some handling but I have tried to explain > with a simple way that comes to my mind and how it can be realized > with db involved. This is a known conflict detection method but note > that I am not insisting to have "earliest_timestamp_wins". Even, if we > want this we can have a separate discussion on this and add it later. > > It will be good to add a minimal set of conflict resolution strategies to begin with, while designing the feature for extensibility. I imagine the first version might just detect the conflict and throw error or do nothing. That's already two simple conflict resolution strategies with minimal efforts. We can add more complicated ones incrementally. > > > > The inconsistency will arise irrespective of conflict resolution method. > On a single system effects of whichever transaction runs last will be > visible entirely. But in the example above the node where T1, T2, and T3 > (from *different*) origins) are applied, we might end up with a situation > where some changes from T1 are applied whereas some changes from T3 are > applied. > > > > I still think it will lead to the same result if all three T1, T2, T3 > happen on the same node in the same order as you mentioned. Say, we > have a pre-existing table with rows r1, r2, r3, r4. Now, if we use the > order of transactions to be applied on the same node based on t2 < t1 > < t3. First T2 will be applied, so for now, r1 is a pre-existing > version and r2 is from T2. Next, when T1 is performed, both r1 and r2 > are from T1. Lastly, when T3 is applied, r1 will be from T3 and r2 > will be from T1. This is what you mentioned will happen after conflict > resolution in the above example. > > You are right. It won't affect the consistency. The contents of transaction on each node might vary after application depending upon the changes that conflict resolver makes; but the end result will be the same. -- Best Wishes, Ashutosh Bapat
Re: tiny step toward threading: reduce dependence on setlocale()
On 15.06.24 01:35, Jeff Davis wrote:
> On Thu, 2024-06-06 at 11:37 -0700, Jeff Davis wrote:
>> I think this patch series is a nice cleanup, as well, making libc more
>> like the other providers and not dependent on global state.
>
> New rebased series attached with additional cleanup. Now that
> pg_locale_t is never NULL, we can simplify the way the collation cache
> works, eliminating ~100 lines.

Overall, this is great. I have wished for a simplification like this for a long time. It is the logical continuation of relying on locale_t stuff rather than process-global settings.

* v2-0001-Make-database-default-collation-internal-to-pg_lo.patch

+void
+pg_init_database_collation()

The function argument should be (void).

The prefix pg_ makes it sound like it's a user-facing function of some sort. Is there a reason for it?

Maybe add a small comment before the function.

* v2-0002-Make-database-collation-pg_locale_t-always-non-NU.patch

There is quite a bit of code duplication from pg_newlocale_from_collation(). Can we do this better?

* v2-0003-ts_locale.c-do-not-use-NULL-to-mean-the-database-.patch

The "TODO" markers should be left, because what they refer to is that these functions just use the default collation rather than something passed in from the expression machinery. This is not addressed by this change. (Obviously, the comments could have been clearer about this.)

* v2-0004-Remove-support-for-null-pg_locale_t.patch

I found a few more places that should be adjusted in a similar way.

- In src/backend/regex/regc_pg_locale.c, the whole business with pg_regex_locale, in particular in pg_set_regex_collation().
- In src/backend/utils/adt/formatting.c, various places such as str_tolower().
- In src/backend/utils/adt/pg_locale.c, wchar2char() and char2wchar(). (Also, for wchar2char() there is one caller that explicitly passes NULL.)

The changes at the call sites of pg_locale_deterministic() are unfortunate, because they kind of go into the opposite direction: They add checks for NULL locales instead of removing them. (For a minute I was thinking I was reading your patch backwards.) We should think of a way to write this clearer.

Looking for example at hashtext() after 0001..0004 applied, it is

    if (!lc_collate_is_c(collid))
        mylocale = pg_newlocale_from_collation(collid);

    if (!mylocale || pg_locale_deterministic(mylocale))
    {

But then after your 0006 patch, lc_locale_is_c() internally also calls pg_newlocale_from_collation(), so this code just becomes redundant. Better might be if pg_locale_deterministic() itself checked if collate is C, and then hashtext() would just need to write:

    mylocale = pg_newlocale_from_collation(collid);

    if (pg_locale_deterministic(mylocale))
    {

The patch sequencing might be a bit tricky here. Maybe it's ok if patch 0004 stays as is in this respect if 0006 were to fix it back.

* v2-0005-Avoid-setlocale-in-lc_collate_is_c-and-lc_ctype_i.patch

Nothing uses this, AFAICT, so why?

Also, this creates more duplication between pg_init_database_collation() and pg_newlocale_from_collation(), as mentioned at patch 0002.

* v2-0006-Simplify-collation-cache.patch

ok
Re: Conflict Detection and Resolution
On Wed, Jun 19, 2024 at 1:52 PM Dilip Kumar wrote: > > On Tue, Jun 18, 2024 at 3:29 PM shveta malik wrote: > > On Tue, Jun 18, 2024 at 11:34 AM Dilip Kumar wrote: > > > > I tried to work out a few scenarios with this, where the apply worker > > will wait until its local clock hits 'remote_commit_tts - max_skew > > permitted'. Please have a look. > > > > Let's say, we have a GUC to configure max_clock_skew permitted. > > Resolver is last_update_wins in both cases. > > > > 1) Case 1: max_clock_skew set to 0 i.e. no tolerance for clock skew. > > > > Remote Update with commit_timestamp = 10.20AM. > > Local clock (which is say 5 min behind) shows = 10.15AM. > > > > When remote update arrives at local node, we see that skew is greater > > than max_clock_skew and thus apply worker waits till local clock hits > > 'remote's commit_tts - max_clock_skew' i.e. till 10.20 AM. Once the > > local clock hits 10.20 AM, the worker applies the remote change with > > commit_tts of 10.20AM. In the meantime (during wait period of apply > > worker)) if some local update on same row has happened at say 10.18am, > > that will applied first, which will be later overwritten by above > > remote change of 10.20AM as remote-change's timestamp appear more > > latest, even though it has happened earlier than local change. > > For the sake of simplicity let's call the change that happened at > 10:20 AM change-1 and the change that happened at 10:15 as change-2 > and assume we are talking about the synchronous commit only. Do you mean "the change that happened at 10:18 as change-2" > > I think now from an application perspective the change-1 wouldn't have > caused the change-2 because we delayed applying change-2 on the local > node Do you mean "we delayed applying change-1 on the local node." >which would have delayed the confirmation of the change-1 to the > application that means we have got the change-2 on the local node > without the confirmation of change-1 hence change-2 has no causal > dependency on the change-1. So it's fine that we perform change-1 > before change-2 Do you mean "So it's fine that we perform change-2 before change-1" >and the timestamp will also show the same at any other > node if they receive these 2 changes. > > The goal is to ensure that if we define the order where change-2 > happens before change-1, this same order should be visible on all > other nodes. This will hold true because the commit timestamp of > change-2 is earlier than that of change-1. Considering the above corrections as base, I agree with this. > > 2) Case 2: max_clock_skew is set to 2min. > > > > Remote Update with commit_timestamp=10.20AM > > Local clock (which is say 5 min behind) = 10.15AM. > > > > Now apply worker will notice skew greater than 2min and thus will wait > > till local clock hits 'remote's commit_tts - max_clock_skew' i.e. > > 10.18 and will apply the change with commit_tts of 10.20 ( as we > > always save the origin's commit timestamp into local commit_tts, see > > RecordTransactionCommit->TransactionTreeSetCommitTsData). Now lets say > > another local update is triggered at 10.19am, it will be applied > > locally but it will be ignored on remote node. On the remote node , > > the existing change with a timestamp of 10.20 am will win resulting in > > data divergence. 
> > Let's call the 10:20 AM change as a change-1 and the change that > happened at 10:19 as change-2 > > IIUC, although we apply the change-1 at 10:18 AM the commit_ts of that > commit_ts of that change is 10:20, and the same will be visible to all > other nodes. So in conflict resolution still the change-1 happened > after the change-2 because change-2's commit_ts is 10:19 AM. Now > there could be a problem with the causal order because we applied the > change-1 at 10:18 AM so the application might have gotten confirmation > at 10:18 AM and the change-2 of the local node may be triggered as a > result of confirmation of the change-1 that means now change-2 has a > causal dependency on the change-1 but commit_ts shows change-2 > happened before the change-1 on all the nodes. > > So, is this acceptable? I think yes because the user has configured a > maximum clock skew of 2 minutes, which means the detected order might > not always align with the causal order for transactions occurring > within that time frame. Agree. I had the same thoughts, and wanted to confirm my understanding. >Generally, the ideal configuration for > max_clock_skew should be in multiple of the network round trip time. > Assuming this configuration, we wouldn’t encounter this problem > because for change-2 to be caused by change-1, the client would need > to get confirmation of change-1 and then trigger change-2, which would > take at least 2-3 network round trips. thanks Shveta
Re: [PATCH] Add CANONICAL option to xmlserialize
On 09.02.24 14:19, Jim Jones wrote: > v9 attached with rebase due to changes done to primnodes.h in 615f5f6 > v10 attached with rebase due to changes in primnodes, parsenodes.h, and gram.y -- Jim From fbd98149d50fe19b886b30ed49b9d553a18f30b4 Mon Sep 17 00:00:00 2001 From: Jim Jones Date: Wed, 19 Jun 2024 10:22:10 +0200 Subject: [PATCH v10] Add CANONICAL output format to xmlserialize This patch introduces the CANONICAL option to xmlserialize, which serializes xml documents in their canonical form - as described in the W3C Canonical XML Version 1.1 specification. This option can be used with the additional parameter WITH [NO] COMMENTS to keep or remove xml comments from the canonical xml output. In case no parameter is provided, WITH COMMENTS will be used as default. This feature is based on the function xmlC14NDocDumpMemory from the C14N module of libxml2. This patch also includes regression tests and documentation. --- doc/src/sgml/datatype.sgml| 41 +++- src/backend/executor/execExprInterp.c | 2 +- src/backend/parser/gram.y | 21 +- src/backend/parser/parse_expr.c | 2 +- src/backend/utils/adt/xml.c | 272 -- src/include/nodes/parsenodes.h| 1 + src/include/nodes/primnodes.h | 10 + src/include/parser/kwlist.h | 1 + src/include/utils/xml.h | 2 +- src/test/regress/expected/xml.out | 114 +++ src/test/regress/expected/xml_1.out | 108 ++ src/test/regress/expected/xml_2.out | 114 +++ src/test/regress/sql/xml.sql | 63 ++ src/tools/pgindent/typedefs.list | 1 + 14 files changed, 642 insertions(+), 110 deletions(-) diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml index 6646820d6a..7c28d34866 100644 --- a/doc/src/sgml/datatype.sgml +++ b/doc/src/sgml/datatype.sgml @@ -4483,7 +4483,7 @@ xml 'bar' xml, uses the function xmlserialize:xmlserialize -XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ [ NO ] INDENT ] ) +XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ { [ NO ] INDENT ] | CANONICAL [ WITH [NO] COMMENTS ]}) type can be character, character varying, or @@ -4500,6 +4500,45 @@ XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS + +The option CANONICAL converts a given +XML document to its https://www.w3.org/TR/xml-c14n11/#Terminology";>canonical form +based on the https://www.w3.org/TR/xml-c14n11/";>W3C Canonical XML 1.1 Specification. +It is basically designed to provide applications the ability to compare xml documents or test if they +have been changed. The optional parameters WITH COMMENTS (which is the default) or +WITH NO COMMENTS, respectively, keep or remove XML comments from the given document. 
+ + + + Example: + + + When a character string value is cast to or from type xml without going through XMLPARSE or diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c index 852186312c..f14d7464ef 100644 --- a/src/backend/executor/execExprInterp.c +++ b/src/backend/executor/execExprInterp.c @@ -4053,7 +4053,7 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op) *op->resvalue = PointerGetDatum(xmltotext_with_options(DatumGetXmlP(value), xexpr->xmloption, - xexpr->indent)); + xexpr->format)); *op->resnull = false; } break; diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 4d582950b7..ff1caa21f2 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -619,12 +619,13 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); %type xml_root_version opt_xml_root_standalone %type xmlexists_argument %type document_or_content -%type xml_indent_option xml_whitespace_option +%type xml_whitespace_option %type xmltable_column_list xmltable_column_option_list %type xmltable_column_el %type xmltable_column_option_el %type xml_namespace_list %type xml_namespace_el +%type opt_xml_serialize_format %type func_application func_expr_common_subexpr %type func_expr func_expr_windowless @@ -712,7 +713,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); BACKWARD BEFORE BEGIN_P BETWEEN BIGINT BINARY BIT BOOLEAN_P BOTH BREADTH BY - CACHE CALL CALLED CASCADE CASCADED CASE CAST CATALOG_P CHAIN CHAR_P + CACHE CALL CALLED CANONICAL CASCADE CASCADED CASE CAST CATALOG_P CHAIN CHAR_P CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE CLUSTER COALESCE COLLATE COLLATION COLUMN COLUMNS COMMENT COMMENTS COMMIT COMMITTED COMPRESSION CONCURRENTLY CONDITIONAL CONFIGURATION CONFLICT @@ -15965,14 +15966,14 @@ func_expr_common_subexpr: $$ = makeXmlExpr(IS_XMLROOT, NULL, NIL, list_make3($3, $5, $6), @1); } - | XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename xml_indent_option ')
Re: replace strtok()
At Tue, 18 Jun 2024 09:18:28 +0200, Peter Eisentraut wrote in > Under the topic of getting rid of thread-unsafe functions in the > backend [0], here is a patch series to deal with strtok(). > > Of course, strtok() is famously not thread-safe and can be replaced by > strtok_r(). But it also has the wrong semantics in some cases, > because it considers adjacent delimiters to be one delimiter. So if > you parse > > SCRAM-SHA-256$:$: > > with strtok(), then > > SCRAM-SHA-256$$::$$:: > > parses just the same. In many cases, this is arguably wrong and could > hide mistakes. > > So I'm suggesting to use strsep() in those places. strsep() is > nonstandard but widely available. > > There are a few places where strtok() has the right semantics, such as > parsing tokens separated by whitespace. For those, I'm using > strtok_r(). I agree with the distinction. > A reviewer job here would be to check whether I made that distinction > correctly in each case. 0001 and 0002 look correct to me regarding that distinction. They applied correctly to the master HEAD and all tests passed on Linux. > On the portability side, I'm including a port/ replacement for > strsep() and some workaround to get strtok_r() for Windows. I have > included these here as separate patches for clarity. 0003 looks fine and successfully built and seems working on an MSVC build. About 0004, Cygwin seems to have its own strtok_r, but I haven't checked how that fact affects the build. > [0]: > https://www.postgresql.org/message-id/856e5ec3-879f-42ee-8258-8bcc6ec9b...@eisentraut.org regards. -- Kyotaro Horiguchi NTT Open Source Software Center
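To spell out the semantic difference being discussed, here is a small standalone comparison of strtok_r() and strsep() on input with adjacent delimiters. strsep() is non-standard but widely available, which is what the port/ replacement in 0003 covers; the input string below is just a stand-in for the SCRAM verifier shape mentioned above.

```c
/*
 * Demonstrates why strtok()/strtok_r() cannot see empty fields: adjacent
 * delimiters are collapsed into one, while strsep() reports an empty
 * string for each empty field.
 */
#include <stdio.h>
#include <string.h>

int
main(void)
{
	char		buf1[] = "SCRAM-SHA-256$$::$$::";
	char		buf2[] = "SCRAM-SHA-256$$::$$::";
	char	   *saveptr;
	char	   *tok;
	char	   *p = buf2;

	printf("strtok_r:\n");
	for (tok = strtok_r(buf1, "$:", &saveptr); tok != NULL;
		 tok = strtok_r(NULL, "$:", &saveptr))
		printf("  [%s]\n", tok);	/* only one token; empty fields are lost */

	printf("strsep:\n");
	while ((tok = strsep(&p, "$:")) != NULL)
		printf("  [%s]\n", tok);	/* empty fields show up as [] */

	return 0;
}
```

So with strsep() a malformed value with missing fields parses into a different (and detectably wrong) number of fields, instead of silently looking like the well-formed one.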
Re: Conflict Detection and Resolution
On Tue, Jun 18, 2024 at 3:29 PM shveta malik wrote: > On Tue, Jun 18, 2024 at 11:34 AM Dilip Kumar wrote: > > I tried to work out a few scenarios with this, where the apply worker > will wait until its local clock hits 'remote_commit_tts - max_skew > permitted'. Please have a look. > > Let's say, we have a GUC to configure max_clock_skew permitted. > Resolver is last_update_wins in both cases. > > 1) Case 1: max_clock_skew set to 0 i.e. no tolerance for clock skew. > > Remote Update with commit_timestamp = 10.20AM. > Local clock (which is say 5 min behind) shows = 10.15AM. > > When remote update arrives at local node, we see that skew is greater > than max_clock_skew and thus apply worker waits till local clock hits > 'remote's commit_tts - max_clock_skew' i.e. till 10.20 AM. Once the > local clock hits 10.20 AM, the worker applies the remote change with > commit_tts of 10.20AM. In the meantime (during wait period of apply > worker)) if some local update on same row has happened at say 10.18am, > that will applied first, which will be later overwritten by above > remote change of 10.20AM as remote-change's timestamp appear more > latest, even though it has happened earlier than local change. For the sake of simplicity let's call the change that happened at 10:20 AM change-1 and the change that happened at 10:15 as change-2 and assume we are talking about the synchronous commit only. I think now from an application perspective the change-1 wouldn't have caused the change-2 because we delayed applying change-2 on the local node which would have delayed the confirmation of the change-1 to the application that means we have got the change-2 on the local node without the confirmation of change-1 hence change-2 has no causal dependency on the change-1. So it's fine that we perform change-1 before change-2 and the timestamp will also show the same at any other node if they receive these 2 changes. The goal is to ensure that if we define the order where change-2 happens before change-1, this same order should be visible on all other nodes. This will hold true because the commit timestamp of change-2 is earlier than that of change-1. > 2) Case 2: max_clock_skew is set to 2min. > > Remote Update with commit_timestamp=10.20AM > Local clock (which is say 5 min behind) = 10.15AM. > > Now apply worker will notice skew greater than 2min and thus will wait > till local clock hits 'remote's commit_tts - max_clock_skew' i.e. > 10.18 and will apply the change with commit_tts of 10.20 ( as we > always save the origin's commit timestamp into local commit_tts, see > RecordTransactionCommit->TransactionTreeSetCommitTsData). Now lets say > another local update is triggered at 10.19am, it will be applied > locally but it will be ignored on remote node. On the remote node , > the existing change with a timestamp of 10.20 am will win resulting in > data divergence. Let's call the 10:20 AM change as a change-1 and the change that happened at 10:19 as change-2 IIUC, although we apply the change-1 at 10:18 AM the commit_ts of that commit_ts of that change is 10:20, and the same will be visible to all other nodes. So in conflict resolution still the change-1 happened after the change-2 because change-2's commit_ts is 10:19 AM. 
Now there could be a problem with the causal order because we applied the change-1 at 10:18 AM so the application might have gotten confirmation at 10:18 AM and the change-2 of the local node may be triggered as a result of confirmation of the change-1 that means now change-2 has a causal dependency on the change-1 but commit_ts shows change-2 happened before the change-1 on all the nodes. So, is this acceptable? I think yes because the user has configured a maximum clock skew of 2 minutes, which means the detected order might not always align with the causal order for transactions occurring within that time frame. Generally, the ideal configuration for max_clock_skew should be in multiple of the network round trip time. Assuming this configuration, we wouldn’t encounter this problem because for change-2 to be caused by change-1, the client would need to get confirmation of change-1 and then trigger change-2, which would take at least 2-3 network round trips. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
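For readers trying to follow the two cases above, here is a rough C sketch of the apply-side wait being described. The function and variable names (wait_for_local_clock, max_clock_skew_ms) are invented for illustration and are not from any posted patch; a real implementation would wait on the worker's latch rather than sleeping in a loop.

```c
/*
 * Sketch: if the remote commit timestamp is further ahead of the local
 * clock than the configured skew allowance, delay applying the change
 * until the local clock reaches (remote_commit_ts - max_clock_skew).
 */
#include "postgres.h"

#include "miscadmin.h"
#include "utils/timestamp.h"

/* hypothetical GUC, in milliseconds */
static int	max_clock_skew_ms = 0;

static void
wait_for_local_clock(TimestampTz remote_commit_ts)
{
	TimestampTz apply_after = TimestampTzPlusMilliseconds(remote_commit_ts,
														  -max_clock_skew_ms);

	while (GetCurrentTimestamp() < apply_after)
	{
		CHECK_FOR_INTERRUPTS();
		pg_usleep(1000L);		/* 1 ms; a latch wait would be used in practice */
	}

	/*
	 * Proceed to apply the remote change.  Its original commit_ts is what
	 * gets recorded locally, so last_update_wins resolution sees the same
	 * ordering on every node.
	 */
}
```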
Re: Speed up collation cache
On 15.06.24 01:46, Jeff Davis wrote:
> * instead of a hardwired set of special collation IDs, have a
>   single-element "last collation ID" to check before doing the hash lookup?

I'd imagine that method could be very effective.
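For illustration, a single-element cache in front of the existing hash lookup could look roughly like the sketch below. The struct and the collation_cache_hash_lookup() name are placeholders for whatever the real pg_locale.c lookup is, not the actual code.

```c
/*
 * Sketch of the "last collation ID" idea: remember the most recently used
 * collation and check it before falling back to the hash lookup.
 */
#include "postgres.h"

#include "utils/pg_locale.h"

typedef struct collation_cache_entry
{
	Oid			collid;
	pg_locale_t locale;
} collation_cache_entry;

/* placeholder for the existing hash-based lookup */
extern collation_cache_entry *collation_cache_hash_lookup(Oid collid);

static collation_cache_entry *last_entry = NULL;

static collation_cache_entry *
lookup_collation(Oid collid)
{
	collation_cache_entry *entry;

	/* Fast path: same collation as the previous call. */
	if (last_entry != NULL && last_entry->collid == collid)
		return last_entry;

	/* Slow path: existing hash (or other) lookup. */
	entry = collation_cache_hash_lookup(collid);

	last_entry = entry;
	return entry;
}
```

Since most queries hammer a single collation (typically the database default), the fast path should hit almost always, which is presumably what makes this attractive compared to special-casing a fixed set of collation OIDs.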
Re: 001_rep_changes.pl fails due to publisher stuck on shutdown
FYI - I applied this latest patch and re-ran the original failing test script 10 times (i.e. 10 x 100 test iterations; it took 4+ hours). There were zero failures observed in my environment.

==
Kind Regards,
Peter Smith.
Fujitsu Australia