Re: Improve logging when using Huge Pages
On Fri, Jun 23, 2023 at 01:57:51PM +0900, Michael Paquier wrote:
> I could not resist adding two checks in the TAP tests to make sure
> that we don't report unknown. Perhaps that's not necessary, but that
> would provide coverage in a broader way, and these are cheap.
>
> I have run an indentation pass, while on it.
>
> Note to self: check that manually on Windows.

I have spent a few hours on that, running more tests with -DEXEC_BACKEND,
including Windows and macOS, and applied it.
--
Michael
Re: Assert while autovacuum was executing
On Fri, Jun 30, 2023 at 8:26 AM Amit Kapila wrote:
>
> On Wed, Jun 28, 2023 at 7:26 AM Zhijie Hou (Fujitsu) wrote:
> >
> > Thanks for the verification.
>
> Unless someone has any further comments or suggestions, I'll push this
> next week sometime.

Pushed, but I forgot to run the indent step, which led to a BF failure [1].
I'll take care of it.

[1] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=koel&dt=2023-07-06%2005%3A19%3A03

--
With Regards,
Amit Kapila.
Re: test_extensions: fix inconsistency between meson.build and Makefile
On Thu, 2023-07-06 at 11:41 +0900, Michael Paquier wrote:
> Why is the addition of --encoding necessary for test_extensions? Its
> Makefile has a NO_LOCALE, but it has no ENCODING set.

I think that was an oversight -- as you point out, the Makefile doesn't
set ENCODING, so the meson.build does not need to, either.

Regards,
Jeff Davis
Re: Add index scan progress to pg_stat_progress_vacuum
On Thu, Jul 6, 2023 at 11:15 AM Michael Paquier wrote:
>
> On Thu, Jul 06, 2023 at 11:07:14AM +0900, Masahiko Sawada wrote:
> > Thank you for updating the patch. It looks good to me too. I've
> > updated the commit message.
>
> Thanks. I was planning to review this patch today and/or tomorrow now
> that my stack of things to do is getting slightly lower (registered my
> name as committer as well a few weeks ago, to not forget).
>
> One thing I was planning to do is to move the new message processing
> API for the incremental updates into its own commit for clarity, as
> that's a separate concept from the actual feature, useful on its own.

+1. I had the same idea. Please find the attached patches.

> Anyway, would you prefer taking care of it?

I can take care of it if you're okay.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachments:
  v31-0002-Report-index-vacuum-progress.patch
  v31-0001-Add-new-parallel-message-type-to-progress-report.patch
Re: MERGE ... RETURNING
On Sat, Jul 1, 2023 at 4:08 AM Dean Rasheed wrote:
>
> On Mon, 13 Mar 2023 at 13:36, Dean Rasheed wrote:
> >
> > And another rebase.
>
> I ran out of cycles to pursue the MERGE patches in v16, but hopefully
> I can make more progress in v17.
>
> Looking at this one with fresh eyes, it looks mostly in good shape.

+1

Most of the review was done with the v6 of the patch, minus the doc build
step. The additional changes in v7 of the patch were eyeballed, and tested
with `make check`.

> To recap, this adds support for the RETURNING clause in MERGE, together
> with new support functions pg_merge_action() and
> pg_merge_when_clause() that can be used in the RETURNING list of MERGE

nit: s/can be used in/can be used only in/

> to retrieve the kind of action (INSERT/UPDATE/DELETE), and the index
> of the WHEN clause executed for each row merged. In addition,
> RETURNING support allows MERGE to be used as the source query in COPY
> TO and WITH queries.
>
> One minor annoyance is that, due to the way that pg_merge_action() and
> pg_merge_when_clause() require access to the MergeActionState, they
> only work if they appear directly in the RETURNING list. They can't,
> for example, appear in a subquery in the RETURNING list, and I don't
> see an easy way round that limitation.

I believe that's a serious limitation, and would be a blocker for the
feature.

> Attached is an updated patch with some cosmetic updates, plus updated
> ruleutils support.

With each iteration of your patch, it is becoming increasingly clear that
this will be a documentation-heavy patch :-)

I think the name of the function pg_merge_when_clause() can be improved.
How about pg_merge_when_clause_ordinal()?

In the doc page of MERGE, you've moved the 'with_query' from the bottom of
the Parameters section to the top of that section. Any reason for this?
Since the Parameters section is quite large, for a moment I thought the
patch was _adding_ the description of 'with_query'.

+/*
+ * Merge support functions should only be called directly from a MERGE
+ * command, and need access to the parent ModifyTableState. The parser
+ * should have checked that such functions only appear in the RETURNING
+ * list of a MERGE, so this should never fail.
+ */
+if (IsMergeSupportFunction(funcid))
+{
+if (!state->parent ||
+!IsA(state->parent, ModifyTableState) ||
+((ModifyTableState *) state->parent)->operation != CMD_MERGE)
+elog(ERROR, "merge support function called in non-merge context");

As the comment says, this is an unexpected condition, and should have been
caught and reported by the parser. So I think it'd be better to use an
Assert() here. Moreover, there's an ERROR already being raised by the
pg_merge_action() and pg_merge_when_clause() functions when they detect
that the function context is not appropriate.

I found it very innovative to allow these functions to be called only in a
certain part of a certain SQL command. I don't think there's a precedent
for this in Postgres; I'd be glad to learn if there are other functions
that Postgres exposes this way.

-EXPR_KIND_RETURNING,/* RETURNING */
+EXPR_KIND_RETURNING,/* RETURNING in INSERT/UPDATE/DELETE */
+EXPR_KIND_MERGE_RETURNING, /* RETURNING in MERGE */

Having to invent a whole new ParseExprKind enum feels like a bit of
overkill. My only objection is that this is exactly like
EXPR_KIND_RETURNING, hence EXPR_KIND_MERGE_RETURNING needs to be dealt
with in exactly as many places. But I don't have any alternative
proposals.
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
+/* Is this a merge support function? (Requires fmgroids.h) */
+#define IsMergeSupportFunction(funcid) \
+((funcid) == F_PG_MERGE_ACTION || \
+ (funcid) == F_PG_MERGE_WHEN_CLAUSE)

If it doesn't cause recursion or other complications, I think we should
simply include fmgroids.h in pg_proc.h. I understand that not all .c
files/consumers that include pg_proc.h may need fmgroids.h, but
#include'ing it will eliminate the need to keep the "Requires..." note
above, and avoid confusion, too.

--- a/src/test/regress/expected/merge.out
+++ b/src/test/regress/expected/merge.out
-WHEN MATCHED AND tid > 2 THEN
+WHEN MATCHED AND tid >= 2 THEN

This change can be treated as a bug fix :-)

+-- COPY (MERGE ... RETURNING) TO ...
+BEGIN;
+COPY (
+MERGE INTO sq_target t
+USING v
+ON tid = sid
+WHEN MATCHED AND tid > 2 THEN

For consistency, the check should be tid >= 2, like you've fixed in the
command referenced above.

+BEGIN;
+COPY (
+MERGE INTO sq_target t
+USING v
+ON tid = sid
+WHEN MATCHED AND tid > 2 THEN
+UPDATE SET balance = t.balance + delta
+WHEN NOT MATCHED THEN
+INSERT (balance, tid) VALUES (balance + delta, sid)
+WHEN MATCHED AND tid < 2 THEN
+DELETE
+RETURNING pg_merge_action(), t.*
+) TO stdout;
+DELETE 1 100
+ROLLBACK;

I expected the
Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
On Wed, Jul 5, 2023 at 1:48 AM Melih Mutlu wrote:
>
> Hayato Kuroda (Fujitsu) wrote on Tue, 4 Jul 2023 at 08:42:
> > > > But in the later patch the tablesync worker tries to reuse the slot
> > > > during the synchronization, so in this case the application_name
> > > > should be same as slotname.
> > >
> > > Fair enough. I am slightly afraid that if we can't show the benefits
> > > with later patches then we may need to drop them, but at this stage I
> > > feel we need to investigate why those are not helping.
> >
> > Agreed. Now I'm planning to do performance testing independently. We
> > can discuss based on that or Melih's one.
>
> Here I attached what I use for performance testing of this patch.
>
> I only benchmarked the patch set with reusing connections very roughly
> so far. But it seems like it improves quite significantly. For example,
> it took 611 ms to sync 100 empty tables, where it was 1782 ms without
> reusing connections.
> The first 3 patches from the set actually bring a good amount of
> improvement, but I'm not sure about the later patches yet.
>

I suggest then we should focus first on those 3, get them committed, and
then look at the remaining ones.

> Amit Kapila wrote on Mon, 3 Jul 2023 at 08:59:
> > On thinking about this, I think the primary benefit we were expecting
> > was from saving network round trips for slot drop/create, but now that
> > we anyway need an extra round trip to establish a snapshot, such a
> > benefit was not visible. This is just a theory so we should validate
> > it. Another idea, as discussed before [1], could be to try copying
> > multiple tables in a single transaction. Now, keeping a transaction
> > open for a longer time could have side-effects on the publisher node.
> > So, we probably need to ensure that we don't perform multiple large
> > syncs and, even for smaller tables (and later sequences), perform it
> > only for some threshold number of tables which we can figure out by
> > some tests. Also, the other safety-check could be that anytime we need
> > to perform streaming (sync with apply worker), we won't copy more
> > tables in the same transaction.
> >
> > Thoughts?
>
> Yeah, maybe going to the publisher for creating a slot or only a
> snapshot does not really make enough difference. I was hoping that
> creating only a snapshot from an existing replication slot would help
> the performance. I guess I was either wrong or am missing something in
> the implementation.
>
> The tricky bit with keeping a long transaction to copy multiple tables
> is deciding how many tables one transaction can copy.
>

Yeah, I was thinking that we should not allow copying more than some
threshold amount of data in one transaction. After every copy, we will
check the size of the table and add it to the previously copied table
size in the same transaction. Once the size crosses a certain threshold,
we will end the transaction. This may not be a very good scheme, but if
it helps then it would be much simpler than the create-only-snapshot
approach.

--
With Regards,
Amit Kapila.
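For illustration, the threshold-based batching scheme described above could
look roughly like the sketch below. This is not actual tablesync code;
copy_batch_threshold, copy_one_table() and get_table_size() are invented
names for this sketch, and error handling is omitted.

    /*
     * Sketch only: copy tables in batches, ending the transaction once the
     * accumulated copied size crosses a threshold, so no single transaction
     * is kept open on the publisher for too long.
     */
    ListCell   *lc;
    Size        copied_in_xact = 0;

    StartTransactionCommand();
    foreach(lc, tables_to_sync)
    {
        Relation    rel = (Relation) lfirst(lc);

        copy_one_table(rel);                    /* COPY this table's data */
        copied_in_xact += get_table_size(rel);  /* size we just copied */

        if (copied_in_xact >= copy_batch_threshold)
        {
            CommitTransactionCommand();         /* end the batch */
            StartTransactionCommand();          /* start the next one */
            copied_in_xact = 0;
        }
    }
    CommitTransactionCommand();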
Re: Should we remove db_user_namespace?
On Thu, Jul 06, 2023 at 08:21:18AM +0900, Michael Paquier wrote: > Removing the GUC from this table is kind of annoying. Cannot this be > handled like default_with_oids or ssl_renegotiation_limit to avoid any > kind of issues with the reload of dump files and the kind? Ah, good catch. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From ba8f57f2e15bcf9c147c25496f5ea7dba211fefb Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Fri, 30 Jun 2023 12:46:08 -0700 Subject: [PATCH v4 1/1] remove db_user_namespace --- doc/src/sgml/client-auth.sgml | 5 -- doc/src/sgml/config.sgml | 52 --- src/backend/commands/variable.c | 15 ++ src/backend/libpq/auth.c | 5 -- src/backend/libpq/hba.c | 12 - src/backend/postmaster/postmaster.c | 19 --- src/backend/utils/misc/guc_tables.c | 16 -- src/backend/utils/misc/postgresql.conf.sample | 1 - src/include/libpq/pqcomm.h| 2 - src/include/utils/guc_hooks.h | 1 + .../unsafe_tests/expected/guc_privs.out | 4 ++ .../modules/unsafe_tests/sql/guc_privs.sql| 3 ++ 12 files changed, 35 insertions(+), 100 deletions(-) diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml index 204d09df67..6c95f0df1e 100644 --- a/doc/src/sgml/client-auth.sgml +++ b/doc/src/sgml/client-auth.sgml @@ -1253,11 +1253,6 @@ omicron bryanh guest1 attacks. - - The md5 method cannot be used with - the feature. - - To ease transition from the md5 method to the newer SCRAM method, if md5 is specified as a method diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 6262cb7bb2..e6cea8ddfc 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1188,58 +1188,6 @@ include_dir 'conf.d' - - - db_user_namespace (boolean) - - db_user_namespace configuration parameter - - - - -This parameter enables per-database user names. It is off by default. -This parameter can only be set in the postgresql.conf -file or on the server command line. - - - -If this is on, you should create users as username@dbname. -When username is passed by a connecting client, -@ and the database name are appended to the user -name and that database-specific user name is looked up by the -server. Note that when you create users with names containing -@ within the SQL environment, you will need to -quote the user name. - - - -With this parameter enabled, you can still create ordinary global -users. Simply append @ when specifying the user -name in the client, e.g., joe@. The @ -will be stripped off before the user name is looked up by the -server. - - - -db_user_namespace causes the client's and -server's user name representation to differ. -Authentication checks are always done with the server's user name -so authentication methods must be configured for the -server's user name, not the client's. Because -md5 uses the user name as salt on both the -client and server, md5 cannot be used with -db_user_namespace. - - - - - This feature is intended as a temporary measure until a - complete solution is found. At that time, this option will - be removed. 
- - - - diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c index f0f2e07655..b6a2fa2512 100644 --- a/src/backend/commands/variable.c +++ b/src/backend/commands/variable.c @@ -1157,6 +1157,21 @@ check_bonjour(bool *newval, void **extra, GucSource source) return true; } +bool +check_db_user_namespace(bool *newval, void **extra, GucSource source) +{ + if (*newval) + { + /* check the GUC's definition for an explanation */ + GUC_check_errcode(ERRCODE_FEATURE_NOT_SUPPORTED); + GUC_check_errmsg("db_user_namespace is not supported"); + + return false; + } + + return true; +} + bool check_default_with_oids(bool *newval, void **extra, GucSource source) { diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c index a98b934a8e..65d452f099 100644 --- a/src/backend/libpq/auth.c +++ b/src/backend/libpq/auth.c @@ -873,11 +873,6 @@ CheckMD5Auth(Port *port, char *shadow_pass, const char **logdetail) char *passwd; int result; - if (Db_user_namespace) - ereport(FATAL, -(errcode(ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION), - errmsg("MD5 authentication is not supported when \"db_user_namespace\" is enabled"))); - /* include the salt to use for computing the response */ if (!pg_strong_random(md5Salt, 4))
Re: contrib/pg_freespacemap first check input argument, then relation_open.
Hi,

On Thu, Jul 06, 2023 at 10:14:46AM +0800, jian he wrote:
>
> In:
> https://git.postgresql.org/cgit/postgresql.git/tree/contrib/pg_freespacemap/pg_freespacemap.c
>
>     rel = relation_open(relid, AccessShareLock);
>
>     if (blkno < 0 || blkno > MaxBlockNumber)
>         ereport(ERROR,
>                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>                  errmsg("invalid block number")));
>
> should it first check input arguments, then relation_open?

It would probably be a slightly better approach but wouldn't really change
much in practice, so I'm not sure it's worth changing now.

> Does ereport automatically unlock the relation?

Yes, locks, lwlocks, memory contexts and everything else is properly
cleaned / released in case of error.
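For illustration, the reordering being discussed would look roughly like
this (a sketch based on the snippet quoted above, not a proposed commit):
validate the user-supplied block number first, then open the relation.

    /* check the user-supplied block number before opening the relation */
    if (blkno < 0 || blkno > MaxBlockNumber)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("invalid block number")));

    rel = relation_open(relid, AccessShareLock);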
Re: test_extensions: fix inconsistency between meson.build and Makefile
On Sat, Jun 17, 2023 at 07:40:18AM -0700, Gurjeet Singh wrote:
> So attached is an updated patch that makes the order consistent across
> all 3 occurrences.

There is no need to update unaccent since 44e73a4.

--- a/src/test/modules/test_extensions/meson.build
+++ b/src/test/modules/test_extensions/meson.build
@@ -47,5 +47,6 @@ tests += {
 'test_extensions',
 'test_extdepend',
 ],
+'regress_args': ['--no-locale', '--encoding=UTF8'],

Why is the addition of --encoding necessary for test_extensions? Its
Makefile has a NO_LOCALE, but it has no ENCODING set.
--
Michael
Re: Add index scan progress to pg_stat_progress_vacuum
On Thu, Jul 06, 2023 at 11:07:14AM +0900, Masahiko Sawada wrote:
> Thank you for updating the patch. It looks good to me too. I've
> updated the commit message.

Thanks. I was planning to review this patch today and/or tomorrow now that
my stack of things to do is getting slightly lower (registered my name as
committer as well a few weeks ago, to not forget).

One thing I was planning to do is to move the new message processing API
for the incremental updates into its own commit for clarity, as that's a
separate concept from the actual feature, useful on its own.

Anyway, would you prefer taking care of it?
--
Michael
contrib/pg_freespacemap first check input argument, then relation_open.
Hi.

In:
https://git.postgresql.org/cgit/postgresql.git/tree/contrib/pg_freespacemap/pg_freespacemap.c

    rel = relation_open(relid, AccessShareLock);

    if (blkno < 0 || blkno > MaxBlockNumber)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("invalid block number")));

Should it first check input arguments, then relation_open?

Does ereport automatically unlock the relation?
Re: Add index scan progress to pg_stat_progress_vacuum
On Wed, Apr 12, 2023 at 9:22 PM Imseih (AWS), Sami wrote: > > > This should be OK (also checked the code paths where the reports are > > added). Note that the patch needed a few adjustments for its > > indentation. > > Thanks for the formatting corrections! This looks good to me. Thank you for updating the patch. It looks good to me too. I've updated the commit message. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com From f92929f21d4052cfbc3b068ca5b8be954741da3d Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Wed, 12 Apr 2023 13:45:59 +0900 Subject: [PATCH v30] Report index vacuum progress. This commit adds two columns: indexes_total and indexes_processed, to pg_stat_progress_vacuum system view to show the index vacuum progress. These numbers are reported in the "vacuuming indexes" and "cleaning up indexes" phases. It also adds a new type of parallel message, 'P'. Which is used to convey the progress updates made by parallel workers to the leader process. Therefore, the parallel workers' progress updates are reflected in an asynchronous manner. Currently it supports only incremental progress reporting but it's possible to allow for supporting of other backend process APIs in the future. Authors: Sami Imseih Reviewed by: Masahiko Sawada, Michael Paquier, Nathan Bossart, Andres Freund Discussion: https://www.postgresql.org/message-id/flat/5478DFCD-2333-401A-B2F0-0D186AB09228@amazon.com --- doc/src/sgml/monitoring.sgml | 23 ++ src/backend/access/heap/vacuumlazy.c | 70 --- src/backend/access/transam/parallel.c | 18 + src/backend/catalog/system_views.sql | 3 +- src/backend/commands/vacuumparallel.c | 9 ++- src/backend/utils/activity/backend_progress.c | 32 + src/include/commands/progress.h | 2 + src/include/utils/backend_progress.h | 1 + src/test/regress/expected/rules.out | 4 +- 9 files changed, 150 insertions(+), 12 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 506aeaa879..588b720f57 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -6110,6 +6110,29 @@ FROM pg_stat_get_backend_idset() AS backendid; Number of dead tuples collected since the last index vacuum cycle. + + + + indexes_total bigint + + + Total number of indexes that will be vacuumed or cleaned up. This + number is reported at the beginning of the + vacuuming indexes phase or the + cleaning up indexes phase. + + + + + + indexes_processed bigint + + + Number of indexes processed. This counter only advances when the + phase is vacuuming indexes or + cleaning up indexes. 
+ + diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index 4eb953f904..6a41ee635d 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -2319,6 +2319,17 @@ lazy_vacuum_all_indexes(LVRelState *vacrel) { bool allindexes = true; double old_live_tuples = vacrel->rel->rd_rel->reltuples; + const int progress_start_index[] = { + PROGRESS_VACUUM_PHASE, + PROGRESS_VACUUM_INDEXES_TOTAL + }; + const int progress_end_index[] = { + PROGRESS_VACUUM_INDEXES_TOTAL, + PROGRESS_VACUUM_INDEXES_PROCESSED, + PROGRESS_VACUUM_NUM_INDEX_VACUUMS + }; + int64 progress_start_val[2]; + int64 progress_end_val[3]; Assert(vacrel->nindexes > 0); Assert(vacrel->do_index_vacuuming); @@ -2331,9 +2342,13 @@ lazy_vacuum_all_indexes(LVRelState *vacrel) return false; } - /* Report that we are now vacuuming indexes */ - pgstat_progress_update_param(PROGRESS_VACUUM_PHASE, - PROGRESS_VACUUM_PHASE_VACUUM_INDEX); + /* + * Report that we are now vacuuming indexes and the number of indexes to + * vacuum. + */ + progress_start_val[0] = PROGRESS_VACUUM_PHASE_VACUUM_INDEX; + progress_start_val[1] = vacrel->nindexes; + pgstat_progress_update_multi_param(2, progress_start_index, progress_start_val); if (!ParallelVacuumIsActive(vacrel)) { @@ -2346,6 +2361,10 @@ lazy_vacuum_all_indexes(LVRelState *vacrel) old_live_tuples, vacrel); + /* Report the number of indexes vacuumed */ + pgstat_progress_update_param(PROGRESS_VACUUM_INDEXES_PROCESSED, + idx + 1); + if (lazy_check_wraparound_failsafe(vacrel)) { /* Wraparound emergency -- end current index scan */ @@ -2380,14 +2399,17 @@ lazy_vacuum_all_indexes(LVRelState *vacrel) Assert(allindexes || VacuumFailsafeActive); /* - * Increase and report the number of index scans. + * Increase and report the number of index scans. Also, we reset + * PROGRESS_VACUUM_INDEXES_TOTAL and PROGRESS_VACUUM_INDEXES_PROCESSED. * * We deliberately include the case where we started a round of bulk * deletes that we
Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.
On Thu, Jul 6, 2023 at 7:58 AM Peter Smith wrote:
>
> On Wed, Jul 5, 2023 at 4:32 PM Masahiko Sawada wrote:
> >
> > On Wed, Jul 5, 2023 at 2:46 PM Amit Kapila wrote:
> > >
> > > On Wed, Jul 5, 2023 at 9:01 AM Peter Smith wrote:
> > > >
> > > > Hi. Here are some review comments for this patch.
> > > >
> > > > +1 for the patch idea.
> > > >
> > > > --
> > > >
> > > > I wasn't sure about the code comment adjustments suggested by Amit [1]:
> > > > "Accordingly, the comments atop build_replindex_scan_key(),
> > > > FindUsableIndexForReplicaIdentityFull(), IsIndexOnlyOnExpression()
> > > > should also be adjusted."
> >
> > As for IsIndexOnlyOnExpression(), what part do you think we need to
> > adjust? It says:
> >
> > /*
> >  * Returns true if the given index consists only of expressions such as:
> >  * CREATE INDEX idx ON table(foo(col));
> >  *
> >  * Returns false even if there is one column reference:
> >  * CREATE INDEX idx ON table(foo(col), col_2);
> >  */
> >
> > and it seems to me that the function doesn't check if the leftmost
> > index column is a non-expression.
> >
>
> TBH, this IsIndexOnlyOnExpression() function seemed redundant to me;
> otherwise, there can be some indexes that are firstly considered
> "usable" but then fail the subsequent leftmost check. It does not
> seem right.

I see your point. IsIndexUsableForReplicaIdentityFull(), the sole user of
IsIndexOnlyOnExpression(), is also called by RelationFindReplTupleByIndex()
in an assertion build. I thought this is the reason why we have a separate
IsIndexOnlyOnExpression() (and IsIndexUsableForReplicaIdentityFull()). But
this assertion doesn't check if the leftmost index column exists on the
remote relation. What are we doing this check for? If it's not necessary,
we can remove this assertion and merge both IsIndexOnlyOnExpression() and
IsIndexUsableForReplicaIdentityFull() into
FindUsableIndexForReplicaIdentityFull().

> > > > doc/src/sgml/logical-replication.sgml
> > > >
> > > > 1.
> > > > the key. When replica identity FULL is specified,
> > > > indexes can be used on the subscriber side for searching the rows. Candidate
> > > > indexes must be btree, non-partial, and have at least one column reference
> > > > - (i.e. cannot consist of only expressions). These restrictions
> > > > - on the non-unique index properties adhere to some of the restrictions that
> > > > - are enforced for primary keys. If there are no such suitable indexes,
> > > > + at the leftmost column indexes (i.e. cannot consist of only expressions). These
> > > > + restrictions on the non-unique index properties adhere to some of the restrictions
> > > > + that are enforced for primary keys. If there are no such suitable indexes,
> > > > the search on the subscriber side can be very inefficient, therefore
> > > > replica identity FULL should only be used as a
> > > > fallback if no other solution is possible. If a replica identity other
> > > >
> > > > Isn't this using the word "indexes" with different meanings in the
> > > > same sentence? e.g. IIUC "leftmost column indexes" is referring to the
> > > > ordinal number of the index fields.
> >
> > That was my mistake, it should be "at the leftmost column".
>
> IIUC the subscriber-side table can have more columns than the
> publisher-side table, so just describing in the docs that the leftmost
> INDEX field must be a column may not be quite enough; it also needs to
> say that the column has to exist on the publisher table, doesn't it?

Right.
I've updated the patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment: doc_update_v2.patch
Re: Add PQsendSyncMessage() to libpq
Hello, On 05/07/2023 21:45, Daniel Gustafsson wrote: Please rebase and send an updated version. Here it is (including the warning fix). Thanks, Anton KirilovFrom 3c2e064a151568830e4d9e4bf238739b458350b4 Mon Sep 17 00:00:00 2001 From: Anton Kirilov Date: Wed, 22 Mar 2023 20:39:57 + Subject: [PATCH v4] Add PQpipelinePutSync() to libpq This new function is equivalent to PQpipelineSync(), except that it lets the caller choose whether anything is flushed to the server. Its purpose is to reduce the system call overhead of pipeline mode. --- doc/src/sgml/libpq.sgml | 52 --- src/interfaces/libpq/exports.txt | 1 + src/interfaces/libpq/fe-exec.c| 30 --- src/interfaces/libpq/libpq-fe.h | 6 +++ .../modules/libpq_pipeline/libpq_pipeline.c | 37 + .../traces/multi_pipelines.trace | 11 6 files changed, 121 insertions(+), 16 deletions(-) diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index b6f5aba1b1..0bc88b1569 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -3547,8 +3547,9 @@ ExecStatusType PQresultStatus(const PGresult *res); The PGresult represents a -synchronization point in pipeline mode, requested by -. +synchronization point in pipeline mode, requested by either + or +. This status occurs only when pipeline mode has been selected. @@ -5122,7 +5123,8 @@ int PQsendClosePortal(PGconn *conn, const char *portalName); , , , - , or + , + , or call, and returns it. A null pointer is returned when the command is complete and there @@ -5490,7 +5492,8 @@ int PQflush(PGconn *conn); or its prepared-query sibling . These requests are queued on the client-side until flushed to the server; - this occurs when is used to + this occurs when (or optionally + ) is used to establish a synchronization point in the pipeline, or when is called. The functions , @@ -5506,8 +5509,9 @@ int PQflush(PGconn *conn); client sends them. The server will begin executing the commands in the pipeline immediately, not waiting for the end of the pipeline. Note that results are buffered on the server side; the server flushes - that buffer when a synchronization point is established with - PQpipelineSync, or when + that buffer when a synchronization point is established with either + PQpipelineSync or + PQpipelinePutSync, or when PQsendFlushRequest is called. If any statement encounters an error, the server aborts the current transaction and does not execute any subsequent command in the queue @@ -5564,7 +5568,8 @@ int PQflush(PGconn *conn); PGresult types PGRES_PIPELINE_SYNC and PGRES_PIPELINE_ABORTED. PGRES_PIPELINE_SYNC is reported exactly once for each - PQpipelineSync at the corresponding point + PQpipelineSync or + PQpipelinePutSync at the corresponding point in the pipeline. PGRES_PIPELINE_ABORTED is emitted in place of a normal query result for the first error and all subsequent results @@ -5602,7 +5607,8 @@ int PQflush(PGconn *conn); PQresultStatus will report a PGRES_PIPELINE_ABORTED result for each remaining queued operation in an aborted pipeline. The result for - PQpipelineSync is reported as + PQpipelineSync or + PQpipelinePutSync is reported as PGRES_PIPELINE_SYNC to signal the end of the aborted pipeline and resumption of normal result processing. @@ -5834,6 +5840,36 @@ int PQsendFlushRequest(PGconn *conn); + + + PQpipelinePutSyncPQpipelinePutSync + + + + Marks a synchronization point in a pipeline by sending a + sync message. + This serves as the delimiter of an implicit transaction and + an error recovery point; see . 
+ + +int PQpipelinePutSync(PGconn *conn, int flags); + + + + Returns 0 for success. Returns -1 if the connection is not in + pipeline mode or sending a + sync message + failed. Returns 1 if it was unable to send all the data in + the send queue yet (this case can only occur if the connection + is nonblocking and output data is flushed to the server). + The flags argument is a bitwise OR of + several flags. PG_PIPELINEPUTSYNC_FLUSH + specifies that any queued output data is flushed to the + server. Otherwise, nothing is flushed automatically; + use PQflush if necessary. + + + diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt index 850734ac96..18e7b31539 100644 --- a/src/interfaces/libpq/exports.txt +++ b/src/interfaces/libpq/exports.txt @@ -191,3 +191,4
Re: Autogenerate some wait events code and documentation
On Wed, Jul 05, 2023 at 02:59:39PM -0700, Andres Freund wrote:
> Rebasing a patch over this I was a bit confused because I got a bunch of
> "unable to parse wait_event_names.txt" errors. Took me a while to figure
> out that that was just because I didn't include a trailing . in the
> description. Perhaps that could be turned into a more meaningful error?
>
> die "unable to parse wait_event_names.txt"
>     unless $line =~ /^(\w+)\t+(\w+)\t+("\w+")\t+("\w.*\.")$/;

Agreed that we could at least add the $line in the error message
generated, at least, to help with debugging.

> I also do wonder if we should invest in generating the lwlock names as
> well. Except for a few abbreviations, the second column is always the
> camel-cased version of what follows WAIT_EVENT_. Feels pretty tedious to
> write that out.

And you mean getting rid of lwlocknames.txt? The impact on dtrace or other
similar tools is uncertain to me because we have free numbering on this
list, and stuff like GetLWLockIdentifier() relies on the input ID from
lwlocknames.txt.

> Perhaps we should just change the case of the upper-cased names (DSM, SSL,
> WAL, ...) to follow the other names?

So you mean renaming the existing events like WalSenderWaitForWAL to
WalSenderWaitForWal? The impact on existing monitoring queries is not zero
because any changes would be silent, and that's the part that worried me
the most, even if it can remove one column in the txt file.
--
Michael
Re: pg_upgrade and cross-library upgrades
On Wed, Jul 05, 2023 at 07:03:56AM -0400, Tom Lane wrote:
> Alvaro Herrera writes:
> > 002_pg_upgrade.pl can test for presence or absence of pgcrypto by
> > grepping pg_config --configure for --with-ssl or --with-openssl. If the
> > old cluster has it but the new doesn't, we must drop the
> > contrib_regression_pgcrypto database.
>
> Hmm, but you'd also need code to handle meson builds, no?

Yes. It is worth noting that pg_config (or its SQL function) would report
the switches for ./configure in its CONFIGURE field, but, err.. we report
nothing under meson. That's a problem.

> Could it be easier to look for the pgcrypto library in the new
> installation?

If all the contrib/ modules are handled, we'd need mapping rules for
everything listed in contrib/Makefile.

> Not entirely sure this is worth the effort.

I am not sure either. Anyway, the buildfarm code does similar things
already, see around $bad_module in TestUpgradeXversion.pm, for instance.
So this kind of workaround exists already. It seems to me that we should
try to pull that out of the buildfarm code and have it in the core module
instead, as a routine that would be used by the in-core TAP tests of
pg_upgrade and the buildfarm code.
--
Michael
Re: Consider \v to the list of whitespace characters in the parser
On Tue, Jul 04, 2023 at 09:28:21AM +0900, Michael Paquier wrote:
> Yeah, thanks.

I have looked again at that this morning, and did not notice any missing
spots, so applied. Let's see how it goes.
--
Michael
Re: Should we remove db_user_namespace?
On Wed, Jul 05, 2023 at 02:29:27PM -0700, Nathan Bossart wrote:
> },
> -{
> -  {"db_user_namespace", PGC_SIGHUP, CONN_AUTH_AUTH,
> -    gettext_noop("Enables per-database user names."),
> -    NULL
> -  },
> -  &Db_user_namespace,
> -  false,
> -  NULL, NULL, NULL
> -},
> {

Removing the GUC from this table is kind of annoying. Cannot this be
handled like default_with_oids or ssl_renegotiation_limit to avoid any
kind of issues with the reload of dump files and the kind?
--
Michael
Re: Wrong syntax in feature description
On Wed, Jul 5, 2023 at 5:37 PM Daniel Gustafsson wrote:
>
> > On 4 Jun 2023, at 18:48, Amit Kapila wrote:
> >
> > On Fri, Jun 2, 2023 at 7:01 PM Peter Smith wrote:
> >>
> >> Hi, I noticed a feature description [1] referring to a command example:
> >>
> >> CREATE PUBLICATION ... FOR ALL TABLES IN SCHEMA
> >>
> >> ~~
> >>
> >> AFAIK that should say "FOR TABLES IN SCHEMA" (without the "ALL", see [2])
> >
> > Right, this should be changed.
>
> Agreed, so I've fixed this in the featurematrix on the site. I will mark
> this CF entry as committed even though there is nothing to commit (the
> featurematrix is stored in the postgresql.org django instance) since
> there was a change performed.
>
> Thanks for the report!

Thanks for (not) pushing ;-)

--
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.
On Wed, Jul 5, 2023 at 4:32 PM Masahiko Sawada wrote:
>
> On Wed, Jul 5, 2023 at 2:46 PM Amit Kapila wrote:
> >
> > On Wed, Jul 5, 2023 at 9:01 AM Peter Smith wrote:
> > >
> > > Hi. Here are some review comments for this patch.
> > >
> > > +1 for the patch idea.
> > >
> > > --
> > >
> > > I wasn't sure about the code comment adjustments suggested by Amit [1]:
> > > "Accordingly, the comments atop build_replindex_scan_key(),
> > > FindUsableIndexForReplicaIdentityFull(), IsIndexOnlyOnExpression()
> > > should also be adjusted."
>
> As for IsIndexOnlyOnExpression(), what part do you think we need to
> adjust? It says:
>
> /*
>  * Returns true if the given index consists only of expressions such as:
>  * CREATE INDEX idx ON table(foo(col));
>  *
>  * Returns false even if there is one column reference:
>  * CREATE INDEX idx ON table(foo(col), col_2);
>  */
>
> and it seems to me that the function doesn't check if the leftmost
> index column is a non-expression.
>

TBH, this IsIndexOnlyOnExpression() function seemed redundant to me;
otherwise, there can be some indexes that are firstly considered "usable"
but then fail the subsequent leftmost check. It does not seem right.

> > > doc/src/sgml/logical-replication.sgml
> > >
> > > 1.
> > > the key. When replica identity FULL is specified,
> > > indexes can be used on the subscriber side for searching the rows. Candidate
> > > indexes must be btree, non-partial, and have at least one column reference
> > > - (i.e. cannot consist of only expressions). These restrictions
> > > - on the non-unique index properties adhere to some of the restrictions that
> > > - are enforced for primary keys. If there are no such suitable indexes,
> > > + at the leftmost column indexes (i.e. cannot consist of only expressions). These
> > > + restrictions on the non-unique index properties adhere to some of the restrictions
> > > + that are enforced for primary keys. If there are no such suitable indexes,
> > > the search on the subscriber side can be very inefficient, therefore
> > > replica identity FULL should only be used as a
> > > fallback if no other solution is possible. If a replica identity other
> > >
> > > Isn't this using the word "indexes" with different meanings in the
> > > same sentence? e.g. IIUC "leftmost column indexes" is referring to the
> > > ordinal number of the index fields.
>
> That was my mistake, it should be "at the leftmost column".

IIUC the subscriber-side table can have more columns than the
publisher-side table, so just describing in the docs that the leftmost
INDEX field must be a column may not be quite enough; it also needs to say
that the column has to exist on the publisher table, doesn't it?

Also, after you document this 'leftmost field restriction', that already
implies there *must* be a non-expression in the INDEX. So I thought we can
just omit the "(i.e. cannot consist of only expressions)" part.

Anyway, I will wait to see the wording of the updated patch before
commenting further.

--
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Thu, Jul 6, 2023 at 9:00 AM Jacob Champion wrote:
> My goal is to maintain compatibility with existing PQconnectPoll()
> applications, where the only way we get to communicate with the client
> is through the PQsocket() for the connection. Ideally, you shouldn't
> have to completely rewrite your application loop just to make use of
> OAuth. (I assume a requirement like that would be a major roadblock to
> committing this -- and if that's not a correct assumption, then I
> guess my job gets a lot easier?)

Ah, right, I get it. I guess there are a couple of ways to do it if we
give up the goal of no-code-change-for-the-client:

1. A generalised PQsocket(), so that a client can call something like:

   int PQpollset(const PGConn *conn, struct pollfd fds[], int fds_size,
                 int *nfds, int *timeout_ms);

   That way, libpq could tell you about which events it would like to
   wait for on which fds, and when it would like you to call it back due
   to timeout, and you can either pass that information directly to
   poll() or WSAPoll() or some equivalent interface (we don't care, we
   just gave you the info you need), or combine it in obvious ways with
   whatever else you want to multiplex with in your client program.

2. Convert those events into new libpq events like 'I want you to call me
   back in 100ms' and 'call me back when socket #42 has data', and let
   clients handle that by managing their own poll set etc. (This is
   something I've speculated about to support more efficient postgres_fdw
   shard query multiplexing; gotta figure out how to get multiple
   connections' events into one WaitEventSet...)

I guess there is a practical middle ground where client code on systems
that have epoll/kqueue can use OAUTHBEARER without any code change, and
the feature is available on other systems too but you'll have to change
your client code to use one of those interfaces or else you get an error
'coz we just can't do it. Or, more likely in the first version, you just
can't do it at all... Doesn't seem that bad to me.

BTW I will happily do the epoll->kqueue port work if necessary.
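For illustration only, a client driving its wait loop with the hypothetical
PQpollset() from option 1 might look roughly like this. PQpollset() does
not exist in libpq today, and handle_error() is a made-up placeholder:

    struct pollfd   fds[8];
    int             nfds = 0;
    int             timeout_ms = -1;

    /* ask libpq which descriptors and timeout it currently needs */
    if (PQpollset(conn, fds, 8, &nfds, &timeout_ms) != 0)
        handle_error();

    /* the application could append its own descriptors to fds here */

    if (poll(fds, nfds, timeout_ms) < 0)
        handle_error();

    /* ... then let libpq make progress again, e.g. via PQconnectPoll(conn) ... */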
Re: pgsql: Clean up role created in new subscription test.
> On 16 May 2023, at 11:17, Daniel Gustafsson wrote:
> Parked in the July CF for now.

Rebased to fix a trivial conflict highlighted by the CFBot.

--
Daniel Gustafsson

Attachment: v3-0001-pg_regress-Add-database-verification-test.patch
Re: Autogenerate some wait events code and documentation
Hi,

On 2023-07-05 10:57:19 +0900, Michael Paquier wrote:
> With all that in place, VPATH builds, the CI, meson, configure/make
> and the various cleanup targets were working fine, so I have applied
> it. Now let's see what the buildfarm tells.
>
> The final --stat number is like that:
> 22 files changed, 757 insertions(+), 2111 deletions(-)

That's pretty nice!

Rebasing a patch over this I was a bit confused because I got a bunch of
"unable to parse wait_event_names.txt" errors. Took me a while to figure
out that that was just because I didn't include a trailing . in the
description. Perhaps that could be turned into a more meaningful error?

    die "unable to parse wait_event_names.txt"
        unless $line =~ /^(\w+)\t+(\w+)\t+("\w+")\t+("\w.*\.")$/;

It's not helped by the fact that the regex in the error actually doesn't
match any lines, because it's not operating on the input but on

    push(@lines, $section_name . "\t" . $_);

I also do wonder if we should invest in generating the lwlock names as
well. Except for a few abbreviations, the second column is always the
camel-cased version of what follows WAIT_EVENT_. Feels pretty tedious to
write that out.

Perhaps we should just change the case of the upper-cased names (DSM, SSL,
WAL, ...) to follow the other names?

Greetings,

Andres Freund
[BUG] Fix DETACH with FK pointing to a partitioned table fails
Hi,

(patch proposal below)

Consider a table with a FK pointing to a partitioned table.

  CREATE TABLE p ( id bigint PRIMARY KEY ) PARTITION BY list (id);
  CREATE TABLE p_1 PARTITION OF p FOR VALUES IN (1);

  CREATE TABLE r_1 (
    id   bigint PRIMARY KEY,
    p_id bigint NOT NULL,
    FOREIGN KEY (p_id) REFERENCES p (id)
  );

Now, attach this table "refg_1" as partition of another one having the
same FK:

  CREATE TABLE r (
    id   bigint PRIMARY KEY,
    p_id bigint NOT NULL,
    FOREIGN KEY (p_id) REFERENCES p (id)
  ) PARTITION BY list (id);

  ALTER TABLE r ATTACH PARTITION r_1 FOR VALUES IN (1);

The old sub-FKs (below, 18289) created in this table to enforce the action
triggers on referenced partitions are not deleted when the table becomes a
partition. Because of this, we have additional and useless triggers on the
referenced partitions, and we can not DETACH this partition on the
referencing side anymore:

  => ALTER TABLE r DETACH PARTITION r_1;
  ERROR:  could not find ON INSERT check triggers of foreign key constraint 18289

  => SELECT c.oid, conparentid, conrelid::regclass, confrelid::regclass, t.tgfoid::regproc
     FROM pg_constraint c
     JOIN pg_trigger t ON t.tgconstraint = c.oid
     WHERE confrelid::regclass = 'p_1'::regclass;

    oid  │ conparentid │ conrelid │ confrelid │         tgfoid
  ───────┼─────────────┼──────────┼───────────┼────────────────────────
   18289 │       18286 │ r_1      │ p_1       │ "RI_FKey_noaction_del"
   18289 │       18286 │ r_1      │ p_1       │ "RI_FKey_noaction_upd"
   18302 │       18299 │ r        │ p_1       │ "RI_FKey_noaction_del"
   18302 │       18299 │ r        │ p_1       │ "RI_FKey_noaction_upd"
  (4 rows)

The legitimate constraint and triggers here are 18302. The old sub-FK
18289, having 18286 as parent, should have gone during the ATTACH
PARTITION.

Please find in attachment a patch dropping the old "sub-FKs" during the
ATTACH PARTITION command and adding a regression test about it. At the
very least, it helps understanding the problem and sketches a possible
solution.

Regards,

From 5fc7997b9f9a17ee5a31f059c18e6c01fd716c04 Mon Sep 17 00:00:00 2001
From: Jehan-Guillaume de Rorthais
Date: Wed, 5 Jul 2023 19:19:40 +0200
Subject: [PATCH v1] Remove useless parted-FK constraints when attaching a
 partition

When a FK is referencing a partitioned table, this FK is parenting as many
other "parted-FK" constraints as the referencing side has partitions. Each
of these sub-constraints enforces only the UPDATE/DELETE triggers on each
referenced partition, but not the check triggers on the referencing
partition, as this is already handled by the parent constraint. These
parted-FKs are half-baked on purpose.

However, when attaching such a standalone table to a partitioned table
having the same FK definition, all these sub-constraints were not removed.
This leaves a useless action trigger on each referenced partition, the
legitimate one already checking against the root of the referencing
partition.

Moreover, when the partition is later detached, DetachPartitionFinalize()
looks for all existing FKs on the relation and tries to detach them from
the parent triggers and constraints. But because these parted-FKs are
half-baked, calling GetForeignKeyCheckTriggers() to later detach the check
triggers raises an ERROR:

  ERROR: could not find ON INSERT check triggers of foreign key constraint N.
--- src/backend/commands/tablecmds.c | 57 +-- src/test/regress/expected/foreign_key.out | 22 + src/test/regress/sql/foreign_key.sql | 27 +++ 3 files changed, 103 insertions(+), 3 deletions(-) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index fce5e6f220..7c1aa5d395 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -10566,9 +10566,11 @@ tryAttachPartitionForeignKey(ForeignKeyCacheInfo *fk, Form_pg_constraint parentConstr; HeapTuple partcontup; Form_pg_constraint partConstr; - ScanKeyData key; + ScanKeyData key[3]; SysScanDesc scan; HeapTuple trigtup; + HeapTuple contup; + Relation pg_constraint; Oid insertTriggerOid, updateTriggerOid; @@ -10631,12 +10633,12 @@ tryAttachPartitionForeignKey(ForeignKeyCacheInfo *fk, * in the partition. We identify them because they have our constraint * OID, as well as being on the referenced rel. */ - ScanKeyInit(&key, + ScanKeyInit(key, Anum_pg_trigger_tgconstraint, BTEqualStrategyNumber, F_OIDEQ, ObjectIdGetDatum(fk->conoid)); scan = systable_beginscan(trigrel, TriggerConstraintIndexId, true, - NULL, 1, &key); + NULL, 1, key); while ((trigtup = systable_getnext(scan)) != NULL) { Form_pg_trigger trgform = (Form_pg_trigger) GETSTRUCT(trigtup); @@ -10684,6 +10686,55 @@ tryAttachPartitionForeignKey(ForeignKeyCacheInfo *fk, TriggerSetParentTrigger(trigrel, updateTriggerOid, parentUpdTrigger,
Re: Should we remove db_user_namespace?
On Mon, Jul 03, 2023 at 04:20:39PM +0900, Michael Paquier wrote: > I am on the side of +1'ing for the removal. Here is a rebased version of the patch. So far no one has responded to the pgsql-general thread [0], and no one here has argued for keeping this parameter. I'm planning to bump the pgsql-general thread next week to give folks one more opportunity to object. [0] https://postgr.es/m/20230630215608.GD2941194%40nathanxps13 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 3d46751ec7fa55d2ab776a9cb47533fe77ab0739 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Fri, 30 Jun 2023 12:46:08 -0700 Subject: [PATCH v3 1/1] remove db_user_namespace --- doc/src/sgml/client-auth.sgml | 5 -- doc/src/sgml/config.sgml | 52 --- src/backend/libpq/auth.c | 5 -- src/backend/libpq/hba.c | 12 - src/backend/postmaster/postmaster.c | 19 --- src/backend/utils/misc/guc_tables.c | 9 src/backend/utils/misc/postgresql.conf.sample | 1 - src/include/libpq/pqcomm.h| 2 - 8 files changed, 105 deletions(-) diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml index 204d09df67..6c95f0df1e 100644 --- a/doc/src/sgml/client-auth.sgml +++ b/doc/src/sgml/client-auth.sgml @@ -1253,11 +1253,6 @@ omicron bryanh guest1 attacks. - - The md5 method cannot be used with - the feature. - - To ease transition from the md5 method to the newer SCRAM method, if md5 is specified as a method diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 6262cb7bb2..e6cea8ddfc 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1188,58 +1188,6 @@ include_dir 'conf.d' - - - db_user_namespace (boolean) - - db_user_namespace configuration parameter - - - - -This parameter enables per-database user names. It is off by default. -This parameter can only be set in the postgresql.conf -file or on the server command line. - - - -If this is on, you should create users as username@dbname. -When username is passed by a connecting client, -@ and the database name are appended to the user -name and that database-specific user name is looked up by the -server. Note that when you create users with names containing -@ within the SQL environment, you will need to -quote the user name. - - - -With this parameter enabled, you can still create ordinary global -users. Simply append @ when specifying the user -name in the client, e.g., joe@. The @ -will be stripped off before the user name is looked up by the -server. - - - -db_user_namespace causes the client's and -server's user name representation to differ. -Authentication checks are always done with the server's user name -so authentication methods must be configured for the -server's user name, not the client's. Because -md5 uses the user name as salt on both the -client and server, md5 cannot be used with -db_user_namespace. - - - - - This feature is intended as a temporary measure until a - complete solution is found. At that time, this option will - be removed. 
- - - - diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c index a98b934a8e..65d452f099 100644 --- a/src/backend/libpq/auth.c +++ b/src/backend/libpq/auth.c @@ -873,11 +873,6 @@ CheckMD5Auth(Port *port, char *shadow_pass, const char **logdetail) char *passwd; int result; - if (Db_user_namespace) - ereport(FATAL, -(errcode(ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION), - errmsg("MD5 authentication is not supported when \"db_user_namespace\" is enabled"))); - /* include the salt to use for computing the response */ if (!pg_strong_random(md5Salt, 4)) { diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c index f89f138f3c..5d4ddbb04d 100644 --- a/src/backend/libpq/hba.c +++ b/src/backend/libpq/hba.c @@ -1741,19 +1741,7 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel) else if (strcmp(token->string, "reject") == 0) parsedline->auth_method = uaReject; else if (strcmp(token->string, "md5") == 0) - { - if (Db_user_namespace) - { - ereport(elevel, - (errcode(ERRCODE_CONFIG_FILE_ERROR), - errmsg("MD5 authentication is not supported when \"db_user_namespace\" is enabled"), - errcontext("line %d of configuration file \"%s\"", -line_num, file_name))); - *err_msg = "MD5 authentication is not supported when \"db_user_namespace\" is enabled"; - return NULL; - } parsedli
Re: Fix last unitialized memory warning
On Mon Jul 3, 2023 at 1:19 AM CDT, Peter Eisentraut wrote:
> On 07.06.23 16:31, Tristan Partin wrote:
> > This patch is really not necessary from a functional point of view. It
> > is only necessary if we want to silence a compiler warning.
> >
> > Tested on `gcc (GCC) 13.1.1 20230511 (Red Hat 13.1.1-2)`.
> >
> > After silencing this warning, all I am left with (given my build
> > configuration) is:
> >
> > [1667/2280] Compiling C object src/pl/plpgsql/src/plpgsql.so.p/pl_exec.c.o
> > In file included from ../src/include/access/htup_details.h:22,
> >                  from ../src/pl/plpgsql/src/pl_exec.c:21:
> > In function ‘assign_simple_var’,
> >     inlined from ‘exec_set_found’ at ../src/pl/plpgsql/src/pl_exec.c:8349:2:
> > ../src/include/varatt.h:230:36: warning: array subscript 0 is outside array bounds of ‘char[0]’ [-Warray-bounds=]
> >   230 | (((varattrib_1b_e *) (PTR))->va_tag)
> >       |    ^
> > ../src/include/varatt.h:94:12: note: in definition of macro ‘VARTAG_IS_EXPANDED’
> >    94 | (((tag) & ~1) == VARTAG_EXPANDED_RO)
> >       |    ^~~
> > ../src/include/varatt.h:284:57: note: in expansion of macro ‘VARTAG_1B_E’
> >   284 | #define VARTAG_EXTERNAL(PTR) VARTAG_1B_E(PTR)
> >       |    ^~~
> > ../src/include/varatt.h:301:57: note: in expansion of macro ‘VARTAG_EXTERNAL’
> >   301 | (VARATT_IS_EXTERNAL(PTR) && !VARTAG_IS_EXPANDED(VARTAG_EXTERNAL(PTR)))
> >       |    ^~~
> > ../src/pl/plpgsql/src/pl_exec.c:8537:17: note: in expansion of macro ‘VARATT_IS_EXTERNAL_NON_EXPANDED’
> >  8537 | VARATT_IS_EXTERNAL_NON_EXPANDED(DatumGetPointer(newvalue)))
> >       |    ^~~
> >
> > From my perspective, this warning definitely seems like a false
> > positive, but I don't know the code well enough to say that for certain.
>
> I cannot reproduce this warning with gcc-13. Are you using any
> non-standard optimization options? Could you give your full configure
> and build commands and the OS?

Thanks for following up. My system is Fedora 38. I can confirm this is
still happening on master.

    $ gcc --version
    gcc (GCC) 13.1.1 20230614 (Red Hat 13.1.1-4)
    Copyright (C) 2023 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

    $ meson setup build --buildtype=release
    $ ninja -C build

--
Tristan Partin
Neon (https://neon.tech)
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Fri, Jun 30, 2023 at 9:29 PM Thomas Munro wrote:
>
> On Sat, May 20, 2023 at 10:01 AM Jacob Champion wrote:
> > - The client implementation is currently epoll-/Linux-specific. I think
> > kqueue shouldn't be too much trouble for the BSDs, but it's even more
> > code to maintain.
>
> I guess you also need a fallback that uses plain old POSIX poll()?

The use of the epoll API here is to combine several sockets into one, not
to actually call epoll_wait() itself. kqueue descriptors should let us do
the same, IIUC.

> I see you're not just using epoll but also timerfd. Could that be
> converted to plain old timeout bookkeeping? That should be enough to
> get every other Unix and *possibly* also Windows to work with the same
> code path.

I might be misunderstanding your suggestion, but I think our internal
bookkeeping is orthogonal to that. The use of timerfd here allows us to
forward libcurl's timeout requirements up to the top-level PQsocket(). As
an example, libcurl is free to tell us to call it again in ten
milliseconds, and we have to make sure a nonblocking client calls us again
after that elapses; otherwise they might hang waiting for data that's not
coming.

> > - Unless someone is aware of some amazing Winsock magic, I'm pretty sure
> > the multiplexed-socket approach is dead in the water on Windows. I think
> > the strategy there probably has to be a background thread plus a fake
> > "self-pipe" (loopback socket) for polling... which may be controversial?
>
> I am not a Windows user or hacker, but there are certainly several
> ways to multiplex sockets. First there is the WSAEventSelect() +
> WaitForMultipleObjects() approach that latch.c uses.

I don't think that strategy plays well with select() clients, though -- it
requires a handle array, and we've just got the one socket. My goal is to
maintain compatibility with existing PQconnectPoll() applications, where
the only way we get to communicate with the client is through the
PQsocket() for the connection. Ideally, you shouldn't have to completely
rewrite your application loop just to make use of OAuth. (I assume a
requirement like that would be a major roadblock to committing this -- and
if that's not a correct assumption, then I guess my job gets a lot easier?)

> It's a shame to write modern code using select(), but you can find
> lots of shouting all over the internet about WSAPoll()'s defects, most
> famously the cURL guys[1] whose blog is widely cited, so people still
> do it.

Right -- that's basically the root of my concern. I can't guarantee that
existing Windows clients out there are all using WaitForMultipleObjects().
From what I can tell, whatever we hand up through PQsocket() has to be
fully Winsock-/select-compatible.

> Another thing people complain about is the lack of socketpair() or
> similar in winsock which means you unfortunately can't easily make
> anonymous select/poll-compatible local sockets, but that doesn't seem
> to be needed here.

For the background-thread implementation, it probably would be. I've been
looking at libevent (BSD-licensed) and its socketpair hack for Windows...

Thanks!
--Jacob
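As a rough illustration of the multiplexing technique described above (not
the patch's actual code): the interesting descriptors, for example a
libcurl socket and a timerfd carrying libcurl's requested timeout, are
registered in a single epoll set, and that one descriptor is what
PQsocket() would then expose. The function name make_altsock and the fixed
10 ms timeout are assumptions made for this sketch, and error handling is
omitted.

    #include <sys/epoll.h>
    #include <sys/timerfd.h>
    #include <time.h>

    static int
    make_altsock(int curl_sock)
    {
        struct epoll_event  ev = {0};
        struct itimerspec   timeout = {0};
        int     epfd = epoll_create1(0);
        int     tfd = timerfd_create(CLOCK_MONOTONIC, 0);

        /* wake the caller when libcurl's socket becomes readable */
        ev.events = EPOLLIN;
        ev.data.fd = curl_sock;
        epoll_ctl(epfd, EPOLL_CTL_ADD, curl_sock, &ev);

        /* ... and when libcurl's requested timeout (say, 10 ms) expires */
        timeout.it_value.tv_nsec = 10 * 1000000L;
        timerfd_settime(tfd, 0, &timeout, NULL);
        ev.data.fd = tfd;
        epoll_ctl(epfd, EPOLL_CTL_ADD, tfd, &ev);

        return epfd;    /* a single fd the client can wait on via PQsocket() */
    }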
Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG
Someday I will learn... Attached is the v2. -- Tristan Partin Neon (https://neon.tech) From 5688bc2b2c6331f437a72b6a429199c5416ccd76 Mon Sep 17 00:00:00 2001 From: Tristan Partin Date: Fri, 30 Jun 2023 09:31:04 -0500 Subject: [PATCH v2 1/3] Skip checking for uselocale on Windows Windows doesn't have uselocale, so skip it. --- meson.build | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/meson.build b/meson.build index fbec997947..df76f10822 100644 --- a/meson.build +++ b/meson.build @@ -2439,7 +2439,7 @@ func_checks = [ ['strsignal'], ['sync_file_range'], ['syncfs'], - ['uselocale'], + ['uselocale', {'skip': host_system == 'windows'}], ['wcstombs_l'], ] -- Tristan Partin Neon (https://neon.tech) From d52ab7851e9638e36cd99859014d7aa31154e600 Mon Sep 17 00:00:00 2001 From: Tristan Partin Date: Fri, 30 Jun 2023 11:13:29 -0500 Subject: [PATCH v2 2/3] Add locale_is_c function In some places throughout the codebase, there are string comparisons for locales matching C or POSIX. Encapsulate this logic into a single function and use it. --- src/backend/utils/adt/pg_locale.c | 37 ++- src/backend/utils/init/postinit.c | 4 +--- src/backend/utils/mb/mbutils.c| 5 ++--- src/include/utils/pg_locale.h | 1 + 4 files changed, 25 insertions(+), 22 deletions(-) diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c index 514d4a9eeb..b455ef3aff 100644 --- a/src/backend/utils/adt/pg_locale.c +++ b/src/backend/utils/adt/pg_locale.c @@ -154,6 +154,21 @@ static void icu_set_collation_attributes(UCollator *collator, const char *loc, UErrorCode *status); #endif +/* + * Check if a locale is the C locale + * + * POSIX is an alias for C. Passing ingore_case as true will use + * pg_strcasecmp() instead of strcmp(). + */ +bool +locale_is_c(const char *locale, bool ignore_case) +{ + if (!ignore_case) + return strcmp(locale, "C") == 0 || strcmp(locale, "POSIX") == 0; + + return pg_strcasecmp(locale, "C") == 0 || pg_strcasecmp(locale, "POSIX") == 0; +} + /* * pg_perm_setlocale * @@ -1239,10 +1254,8 @@ lookup_collation_cache(Oid collation, bool set_flags) datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collctype); collctype = TextDatumGetCString(datum); - cache_entry->collate_is_c = ((strcmp(collcollate, "C") == 0) || - (strcmp(collcollate, "POSIX") == 0)); - cache_entry->ctype_is_c = ((strcmp(collctype, "C") == 0) || - (strcmp(collctype, "POSIX") == 0)); + cache_entry->collate_is_c = locale_is_c(collcollate, false); + cache_entry->ctype_is_c = locale_is_c(collctype, false); } else { @@ -1290,12 +1303,8 @@ lc_collate_is_c(Oid collation) if (!localeptr) elog(ERROR, "invalid LC_COLLATE setting"); - if (strcmp(localeptr, "C") == 0) - result = true; - else if (strcmp(localeptr, "POSIX") == 0) - result = true; - else - result = false; + result = locale_is_c(localeptr, false); + return (bool) result; } @@ -1343,12 +1352,8 @@ lc_ctype_is_c(Oid collation) if (!localeptr) elog(ERROR, "invalid LC_CTYPE setting"); - if (strcmp(localeptr, "C") == 0) - result = true; - else if (strcmp(localeptr, "POSIX") == 0) - result = true; - else - result = false; + result = locale_is_c(localeptr, false); + return (bool) result; } diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c index 0f9b92b32e..a92a0c438f 100644 --- a/src/backend/utils/init/postinit.c +++ b/src/backend/utils/init/postinit.c @@ -419,9 +419,7 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect " which is not recognized by setlocale().", ctype), errhint("Recreate 
the database with another locale or install the missing locale."))); - if (strcmp(ctype, "C") == 0 || - strcmp(ctype, "POSIX") == 0) - database_ctype_is_c = true; + database_ctype_is_c = locale_is_c(ctype, false); if (dbform->datlocprovider == COLLPROVIDER_ICU) { diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c index 67a1ab2ab2..997156515c 100644 --- a/src/backend/utils/mb/mbutils.c +++ b/src/backend/utils/mb/mbutils.c @@ -39,6 +39,7 @@ #include "mb/pg_wchar.h" #include "utils/builtins.h" #include "utils/memutils.h" +#include "utils/pg_locale.h" #include "utils/syscache.h" #include "varatt.h" @@ -1237,9 +1238,7 @@ pg_bind_textdomain_codeset(const char *domainname) int new_msgenc; #ifndef WIN32 - const char *ctype = setlocale(LC_CTYPE, NULL); - - if (pg_strcasecmp(ctype, "C") == 0 || pg_strcasecmp(ctype, "POSIX") == 0) + if (locale_is_c(locale_ctype, true)) #endif if (encoding != PG_SQL_ASCII && raw_pg_bind_textdomain_codeset(domainname, encoding)) diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h index e2a7243542..872bef798a 100644 --- a/src/include/utils/pg_locale.h +++ b/src/include/utils/pg_locale.h @@ -54,6 +54,7 @@ extern PGDL
Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG
On Mon Jul 3, 2023 at 9:42 AM CDT, Tristan Partin wrote: > Thanks for your patience. Attached is a patch that should cover all the > problematic use cases of setlocale(). There are some setlocale() calls in > tests, initdb, and ecpg left. I plan to get to ecpglib before the final > version of this patch after I abstract over Windows not having > uselocale(). I think leaving initdb and tests as is would be fine, but I > am also happy to just permanently purge setlocale() from the codebase > if people see value in that. We could also poison[0] setlocale() at that > point. > > [0]: https://gcc.gnu.org/onlinedocs/cpp/Pragmas.html Here is a v2 with best effort Windows support. My patch currently assumes that you either have uselocale() or are Windows. I dropped the environment variable hacks, but could bring them back if we didn't like this requirement. I tried to add an email[0] to discuss this with hackers, but failed to add the CC. Let's discuss here instead given my complete inability to manage mailing lists :). [0]: https://www.postgresql.org/message-id/CTUJ604ZWHI1.3PFZK152XCWLX%40gonk -- Tristan Partin Neon (https://neon.tech)
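For illustration, a minimal sketch of the poisoning idea mentioned above (GCC and Clang both honor this pragma; this is not part of the attached patches):
```
/*
 * Illustrative only: once every setlocale() call site is gone, a common
 * header could poison the identifier so that any new use of setlocale()
 * fails at compile time.
 */
#ifdef __GNUC__
#pragma GCC poison setlocale
#endif
```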
Re: Add PQsendSyncMessage() to libpq
> On 21 May 2023, at 19:17, Anton Kirilov wrote: > .. here is an updated version of the patch This hunk here: - if (PQflush(conn) < 0) + const int ret = flags & PG_PIPELINEPUTSYNC_FLUSH ? PQflush(conn) : 0; + + if (ret < 0) ..is causing this compiler warning: fe-exec.c: In function ‘PQpipelinePutSync’: fe-exec.c:3203:2: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement] 3203 | const int ret = flags & PG_PIPELINEPUTSYNC_FLUSH ? PQflush(conn) : 0; | ^ cc1: all warnings being treated as errors Also, the patch no longer applies. Please rebase and send an updated version. -- Daniel Gustafsson
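For illustration, a minimal sketch of how the hunk could be made C90-clean (PG_PIPELINEPUTSYNC_FLUSH and the conditional flush come from the proposed patch; the rest of PQpipelinePutSync() is elided here):
```
/*
 * Illustrative fragment only: the warning goes away if the declaration is
 * hoisted to the top of the enclosing block and the conditional flush
 * becomes a plain assignment.
 */
int			ret = 0;

/* ... earlier statements of the function ... */

if (flags & PG_PIPELINEPUTSYNC_FLUSH)
	ret = PQflush(conn);

if (ret < 0)
{
	/* same error handling as the original "if (PQflush(conn) < 0)" branch */
}
```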
Re: Including a sample Table Access Method with core code
> > > Included Mark Dilger directly to this mail as he mentioned he has a > > Perl script that makes a functional copy of heap AM that can be > > compiled as installed as custom AM. > > Similar discussion has happened in 640c198 related to the creation of > dummy_index_am, where the argument is that such a module needs to > provide value in testing some of the core internals. dummy_index_am > did so for reloptions on indexes because there was not much coverage > for that part of the system. > > > @mark - maybe you can create 3 boilerplate Table AMs for the above > > named `mem_am`, `overlay_am` and `py3_am` and we could put them > > somewhere for interested parties to play with ? > > Not sure if that's worth counting, but I also have a table AM template > stored in my plugin repo: > https://github.com/michaelpq/pg_plugins/tree/main/blackhole_am > And based on your `blackhole_am` I've sent a patch [1] to add a `dummy_table_am` for testing purposes. Regards, [1] https://www.postgresql.org/message-id/CAFcNs+pcU2ib=jvjnznbod+m2tho+vd77c_yzj2rsgr0tp3...@mail.gmail.com -- Fabrízio de Royes Mello
Re: [PATCH] Honor PG_TEST_NOCLEAN for tempdirs
On Sun, Jul 2, 2023 at 6:17 PM Michael Paquier wrote: > Adjusted that as well, on top of an extra comment. Thanks all! --Jacob
Re: Disabling Heap-Only Tuples
On Wed, 2023-07-05 at 12:02 +0100, Thom Brown wrote: > On Wed, 5 Jul 2023 at 11:57, Matthias van de Meent > wrote: > > On Wed, 5 Jul 2023 at 12:45, Thom Brown wrote: > > > Heap-Only Tuple (HOT) updates are a significant performance > > > enhancement, as they prevent unnecessary page writes. However, HOT > > > comes with a caveat: it means that if we have lots of available space > > > earlier on in the relation, it can only be used for new tuples or in > > > cases where there's insufficient space on a page for an UPDATE to use > > > HOT. > > > > > > Considering these trade-offs, I'd like to propose an option to allow > > > superusers to disable HOT on tables. The intent is to trade some > > > performance benefits for the ability to reduce the size of a table > > > without the typical locking associated with it. > > > > Interesting use case, but I think that disabling HOT would be missing > > the forest for the trees. I think that a feature that disables > > block-local updates for pages > some offset would be a better solution > > to your issue: Normal updates also prefer the new tuple to be stored > > in the same pages as the old tuple if at all possible, so disabling > > HOT wouldn't solve the issue of tuples residing in the tail of your > > table - at least not while there is still empty space in those pages. > > Hmm... I see your point. It's when an UPDATE isn't going to land on > the same page that it relocates to the earlier available page. So I > guess I'm after whatever mechanism would allow that to happen reliably > and predictably. > > So $subject should really be "Allow forcing UPDATEs off the same page". I've been thinking about the same thing - an option that changes the update strategy to always use the lowest block with enough free space. That would allow to consolidate bloated tables with no down time. Yours, Laurenz Albe
Re: Allow specifying a dbname in pg_basebackup connection string
On Wed, 5 Jul 2023 at 20:09, Thom Brown wrote: > Okay, understood. In that case, please remember to write changes to > the pg_basebackup docs page explaining how the dbname value is ignored I updated the wording in the docs for pg_basebackup and pg_receivewal. They now clarify that Postgres itself doesn't care if there's a database name in the connection string, but that a proxy might. v2-0001-Allow-specifying-a-dbname-in-pg_basebackup-connec.patch Description: Binary data
Re: brininsert optimization opportunity
On Wed, Jul 5, 2023 at 3:16 AM Tomas Vondra wrote: > > I'll try this out and introduce a couple of new index AM callbacks. I > > think it's best to do it before releasing the locks - otherwise it > > might be weird > > to manipulate buffers of an index relation, without having some sort of > > lock on > > it. I'll think about it some more. > > > > I don't understand why would this need more than just a callback to > release the cache. We wouldn't. I thought that it would be slightly cleaner and slightly more performant if we moved the (if !state) branches out of the XXXinsert() functions. But I guess, let's minimize the changes here. One cleanup callback is enough. > > PS: It should be possible to make GIN and GiST use the new index AM APIs > > as well. > > > > Why should GIN/GiST use the new API? I think it's perfectly sensible to > only require the "cleanup callback" when just pfree() is not enough. Yeah no need. Attached v3 of the patch w/ a single index AM callback. Regards, Soumyadeep (VMware) From 563b905da5c2602113044689e37f8385cbfbde64 Mon Sep 17 00:00:00 2001 From: Soumyadeep Chakraborty Date: Wed, 5 Jul 2023 10:59:11 -0700 Subject: [PATCH v3 1/1] Reuse revmap and brin desc in brininsert brininsert() used to have code that performed per-tuple initialization of the revmap. That had some overhead. To mitigate, we introduce a new AM callback to clean up state at the end of all inserts, and perform the revmap cleanup there. --- contrib/bloom/blutils.c | 1 + doc/src/sgml/indexam.sgml| 14 +- src/backend/access/brin/brin.c | 74 +++- src/backend/access/gin/ginutil.c | 1 + src/backend/access/gist/gist.c | 1 + src/backend/access/hash/hash.c | 1 + src/backend/access/index/indexam.c | 15 ++ src/backend/access/nbtree/nbtree.c | 1 + src/backend/access/spgist/spgutils.c | 1 + src/backend/executor/execIndexing.c | 5 ++ src/include/access/amapi.h | 3 ++ src/include/access/brin_internal.h | 1 + src/include/access/genam.h | 2 + 13 files changed, 108 insertions(+), 12 deletions(-) diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c index d935ed8fbdf..9720f69de09 100644 --- a/contrib/bloom/blutils.c +++ b/contrib/bloom/blutils.c @@ -131,6 +131,7 @@ blhandler(PG_FUNCTION_ARGS) amroutine->ambuild = blbuild; amroutine->ambuildempty = blbuildempty; amroutine->aminsert = blinsert; + amroutine->aminsertcleanup = NULL; amroutine->ambulkdelete = blbulkdelete; amroutine->amvacuumcleanup = blvacuumcleanup; amroutine->amcanreturn = NULL; diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml index e813e2b620a..17f7d6597f1 100644 --- a/doc/src/sgml/indexam.sgml +++ b/doc/src/sgml/indexam.sgml @@ -139,6 +139,7 @@ typedef struct IndexAmRoutine ambuild_function ambuild; ambuildempty_function ambuildempty; aminsert_function aminsert; +aminsertcleanup_function aminsertcleanup; ambulkdelete_function ambulkdelete; amvacuumcleanup_function amvacuumcleanup; amcanreturn_function amcanreturn; /* can be NULL */ @@ -355,11 +356,22 @@ aminsert (Relation indexRelation, within an SQL statement, it can allocate space in indexInfo->ii_Context and store a pointer to the data in indexInfo->ii_AmCache (which will be NULL - initially). + initially). If the data cached contains structures that can be simply pfree'd, + they will implicitly be pfree'd. But, if it contains more complex data, such + as Buffers or TupleDescs, additional cleanup is necessary. This additional + cleanup can be performed in indexinsertcleanup. 
+bool +aminsertcleanup (IndexInfo *indexInfo); + + Clean up state that was maintained across successive inserts in + indexInfo->ii_AmCache. This is useful if the data + contained is complex - like Buffers or TupleDescs which need additional + cleanup, unlike simple structures that will be implicitly pfree'd. + IndexBulkDeleteResult * ambulkdelete (IndexVacuumInfo *info, IndexBulkDeleteResult *stats, diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c index 3c6a956eaa3..2da59f2ab03 100644 --- a/src/backend/access/brin/brin.c +++ b/src/backend/access/brin/brin.c @@ -58,6 +58,17 @@ typedef struct BrinBuildState BrinMemTuple *bs_dtuple; } BrinBuildState; +/* + * We use a BrinInsertState to capture running state spanning multiple + * brininsert invocations, within the same command. + */ +typedef struct BrinInsertState +{ + BrinRevmap *bs_rmAccess; + BrinDesc *bs_desc; + BlockNumber bs_pages_per_range; +} BrinInsertState; + /* * Struct used as "opaque" during index scans */ @@ -72,6 +83,7 @@ typedef struct BrinOpaque static BrinBuildState *initialize_brin_buildstate(Relation idxRel, BrinRevmap *revmap, BlockNumber pagesPerRange); +static BrinInsertState *initialize_brin_insertstate(Relation idxRel, IndexInfo *indexInfo);
Re: Allow specifying a dbname in pg_basebackup connection string
On Wed, 5 Jul 2023 at 16:50, Jelte Fennema wrote: > > On Wed, 5 Jul 2023 at 16:01, Euler Taveira wrote: > > One of the PgBouncer's missions is to be a transparent proxy. > > > > Sometimes you cannot reach out the database directly due to a security > > policy. > > Indeed the transparent proxy use case is where replication through > pgbouncer makes sense. There's quite some reasons to set up PgBouncer > like such a proxy apart from security policies. Some others that come > to mind are: > - load balancer layer of pgbouncers > - transparent failovers > - transparent database moves > > And in all of those cases its nice for a user to use a single > connection string/hostname. Instead of having to think: Oh yeah, for > backups, I need to use this other one. Okay, understood. In that case, please remember to write changes to the pg_basebackup docs page explaining how the dbname value is ignored under normal usage. Thom
Re: Disabling Heap-Only Tuples
On Wed, 5 Jul 2023 at 18:05, Matthias van de Meent wrote: > > On Wed, 5 Jul 2023 at 14:39, Thom Brown wrote: > > > > On Wed, 5 Jul 2023 at 13:12, Matthias van de Meent > > wrote: > > > > > > On Wed, 5 Jul 2023 at 13:03, Thom Brown wrote: > > > > > > > > On Wed, 5 Jul 2023 at 11:57, Matthias van de Meent > > > > wrote: > > > > > > > > > > On Wed, 5 Jul 2023 at 12:45, Thom Brown wrote: > > > > > > Heap-Only Tuple (HOT) updates are a significant performance > > > > > > enhancement, as they prevent unnecessary page writes. However, HOT > > > > > > comes with a caveat: it means that if we have lots of available > > > > > > space > > > > > > earlier on in the relation, it can only be used for new tuples or in > > > > > > cases where there's insufficient space on a page for an UPDATE to > > > > > > use > > > > > > HOT. > > > > > > > > > > > > This mechanism limits our options for condensing tables, forcing us > > > > > > to > > > > > > resort to methods like running VACUUM FULL/CLUSTER or using external > > > > > > tools like pg_repack. These either require exclusive locks (which > > > > > > will > > > > > > be a deal-breaker on large tables on a production system), or > > > > > > there's > > > > > > risks involved. Of course we can always flood pages with new > > > > > > versions > > > > > > of a row until it's forced onto an early page, but that shouldn't be > > > > > > necessary. > > > > > > > > > > > > Considering these trade-offs, I'd like to propose an option to allow > > > > > > superusers to disable HOT on tables. The intent is to trade some > > > > > > performance benefits for the ability to reduce the size of a table > > > > > > without the typical locking associated with it. > > > > > > > > > > Interesting use case, but I think that disabling HOT would be missing > > > > > the forest for the trees. I think that a feature that disables > > > > > block-local updates for pages > some offset would be a better solution > > > > > to your issue: Normal updates also prefer the new tuple to be stored > > > > > in the same pages as the old tuple if at all possible, so disabling > > > > > HOT wouldn't solve the issue of tuples residing in the tail of your > > > > > table - at least not while there is still empty space in those pages. > > > > > > > > Hmm... I see your point. It's when an UPDATE isn't going to land on > > > > the same page that it relocates to the earlier available page. So I > > > > guess I'm after whatever mechanism would allow that to happen reliably > > > > and predictably. > > > > > > > > So $subject should really be "Allow forcing UPDATEs off the same page". > > > > > > You'd probably want to do that only for a certain range of the table - > > > for a table with 1GB of data and 3GB of bloat there is no good reason > > > to force page-crossing updates in the first 1GB of the table - all > > > tuples of the table will eventually reside there, so why would you > > > take a performance penalty and move the tuples from inside that range > > > to inside that same range? > > > > I'm thinking more of a case of: > > > > > > > > UPDATE bigtable > > SET primary key = primary key > > WHERE ctid IN ( > > SELECT ctid > > FROM bigtable > > ORDER BY ctid DESC > > LIMIT 10); > > So what were you thinking of? A session GUC? A table option? Both. > The benefit of a table option is that it is retained across sessions > and thus allows tables that get enough updates to eventually get to a > cleaner state. 
The main downside of such a table option is that it > requires a temporary table-level lock to update the parameter. Yes, but the maintenance window to make such a change would be extremely brief. > The benefit of a session GUC is that you can set it without impacting > other sessions, but the downside is that you need to do the > maintenance in that session, and risk that cascading updates to other > tables (e.g. through AFTER UPDATE triggers) are also impacted by this > non-local update GUC. > > > Something else to note: Indexes would suffer some (large?) amount of > > bloat in this process, as you would be updating a lot of tuples > > without the HOT optimization, thus increasing the work to be done by > > VACUUM. > > > This may result in more bloat in indexes than what you get back from > > shrinking the table. > > > > This could be the case, but I guess indexes are expendable to an > > extent, unlike tables. > > I don't think that's accurate - index rebuilds are quite expensive. > But, that's besides the point of this thread. > > Somewhat related: did you consider using pg_repack instead of this > potential feature? pg_repack isn't exactly innocuous, and can potentially leave the database in an unrecoverable state. Plus, if disk space is an issue, it doesn't help. Thom
Re: logicalrep_message_type throws an error
On 2023-Jul-05, Alvaro Herrera wrote: > On 2023-Jul-05, Euler Taveira wrote: > > > Isn't this numerical value already exposed in the error message (X = 88)? > > In this example, it is: > > > > ERROR: invalid logical replication message type "X" > > CONTEXT: processing remote data for replication origin "pg_16638" during > > message type "??? (88)" in transaction 796, finished at 0/1626698 > > > > IMO it could be confusing if we provide two representations of the same > > data (X > > and 88). Since we already provide "X" I don't think we need "88". > > The CONTEXT message could theoretically appear in other error throws, > not just in "invalid logical replication message". So the duplicity is > not really a problem. Ah, but you're thinking that if the message type is invalid, then it will have been rejected in the "invalid logical replication message" stage, so no invalid message type will be reported. I guess there's a point to that argument as well. However, I don't see that having the numerical ASCII value there causes any harm, even if the char value is already exposed in the other message. (I'm sure you'll agree that this is quite a minor issue.) I doubt that each of these two prints of the enum value showing different formats is confusing. Yes, the enum is defined in terms of char literals, but if an actually invalid message shows up with an uint32 value outside the printable ASCII range, the printout might be ugly or useless. > > Another option, is to remove "X" from apply_dispatch() and add "??? > > (88)" to logicalrep_message_type(). Now *this* would be an actively bad idea IMO. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
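To illustrate that last point, a hedged sketch (not necessarily the actual code) of a fallback that stays readable even for values outside the printable range:
```
static const char *
logicalrep_message_type(LogicalRepMsgType action)
{
	static char err_unknown[20];

	switch (action)
	{
		case LOGICAL_REP_MSG_BEGIN:
			return "BEGIN";
		/* ... one case per known message type ... */
		default:
			/*
			 * Render the numeric value rather than the raw byte, so that a
			 * value outside the printable ASCII range still produces a
			 * readable string such as "??? (123)".
			 */
			snprintf(err_unknown, sizeof(err_unknown), "??? (%d)", action);
			return err_unknown;
	}
}
```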
Re: Disabling Heap-Only Tuples
On Wed, 5 Jul 2023 at 14:39, Thom Brown wrote: > > On Wed, 5 Jul 2023 at 13:12, Matthias van de Meent > wrote: > > > > On Wed, 5 Jul 2023 at 13:03, Thom Brown wrote: > > > > > > On Wed, 5 Jul 2023 at 11:57, Matthias van de Meent > > > wrote: > > > > > > > > On Wed, 5 Jul 2023 at 12:45, Thom Brown wrote: > > > > > Heap-Only Tuple (HOT) updates are a significant performance > > > > > enhancement, as they prevent unnecessary page writes. However, HOT > > > > > comes with a caveat: it means that if we have lots of available space > > > > > earlier on in the relation, it can only be used for new tuples or in > > > > > cases where there's insufficient space on a page for an UPDATE to use > > > > > HOT. > > > > > > > > > > This mechanism limits our options for condensing tables, forcing us to > > > > > resort to methods like running VACUUM FULL/CLUSTER or using external > > > > > tools like pg_repack. These either require exclusive locks (which will > > > > > be a deal-breaker on large tables on a production system), or there's > > > > > risks involved. Of course we can always flood pages with new versions > > > > > of a row until it's forced onto an early page, but that shouldn't be > > > > > necessary. > > > > > > > > > > Considering these trade-offs, I'd like to propose an option to allow > > > > > superusers to disable HOT on tables. The intent is to trade some > > > > > performance benefits for the ability to reduce the size of a table > > > > > without the typical locking associated with it. > > > > > > > > Interesting use case, but I think that disabling HOT would be missing > > > > the forest for the trees. I think that a feature that disables > > > > block-local updates for pages > some offset would be a better solution > > > > to your issue: Normal updates also prefer the new tuple to be stored > > > > in the same pages as the old tuple if at all possible, so disabling > > > > HOT wouldn't solve the issue of tuples residing in the tail of your > > > > table - at least not while there is still empty space in those pages. > > > > > > Hmm... I see your point. It's when an UPDATE isn't going to land on > > > the same page that it relocates to the earlier available page. So I > > > guess I'm after whatever mechanism would allow that to happen reliably > > > and predictably. > > > > > > So $subject should really be "Allow forcing UPDATEs off the same page". > > > > You'd probably want to do that only for a certain range of the table - > > for a table with 1GB of data and 3GB of bloat there is no good reason > > to force page-crossing updates in the first 1GB of the table - all > > tuples of the table will eventually reside there, so why would you > > take a performance penalty and move the tuples from inside that range > > to inside that same range? > > I'm thinking more of a case of: > > > > UPDATE bigtable > SET primary key = primary key > WHERE ctid IN ( > SELECT ctid > FROM bigtable > ORDER BY ctid DESC > LIMIT 10); So what were you thinking of? A session GUC? A table option? The benefit of a table option is that it is retained across sessions and thus allows tables that get enough updates to eventually get to a cleaner state. The main downside of such a table option is that it requires a temporary table-level lock to update the parameter. The benefit of a session GUC is that you can set it without impacting other sessions, but the downside is that you need to do the maintenance in that session, and risk that cascading updates to other tables (e.g. 
through AFTER UPDATE triggers) are also impacted by this non-local update GUC. > > Something else to note: Indexes would suffer some (large?) amount of > > bloat in this process, as you would be updating a lot of tuples > > without the HOT optimization, thus increasing the work to be done by > > VACUUM. > > This may result in more bloat in indexes than what you get back from > > shrinking the table. > > This could be the case, but I guess indexes are expendable to an > extent, unlike tables. I don't think that's accurate - index rebuilds are quite expensive. But, that's besides the point of this thread. Somewhat related: did you consider using pg_repack instead of this potential feature? Kind regards, Matthias van de Meent Neon (https://neon.tech)
Re: logicalrep_message_type throws an error
On 2023-Jul-05, Euler Taveira wrote: > Isn't this numerical value already exposed in the error message (X = 88)? > In this example, it is: > > ERROR: invalid logical replication message type "X" > CONTEXT: processing remote data for replication origin "pg_16638" during > message type "??? (88)" in transaction 796, finished at 0/1626698 > > IMO it could be confusing if we provide two representations of the same data > (X > and 88). Since we already provide "X" I don't think we need "88". Another > option, is to remove "X" from apply_dispatch() and add "??? (88)" to > logicalrep_message_type(). > > Opinions? The CONTEXT message could theoretically appear in other error throws, not just in "invalid logical replication message". So the duplicity is not really a problem. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Now I have my system running, not a byte was off the shelf; It rarely breaks and when it does I fix the code myself. It's stable, clean and elegant, and lightning fast as well, And it doesn't cost a nickel, so Bill Gates can go to hell."
Re: [PATCH] Add GitLab CI to PostgreSQL
On Wed, 5 Jul 2023 at 15:22, Andrew Dunstan wrote: > > > On 2023-07-04 Tu 19:44, Newhouse, Robin wrote: > > > Hello! > > > > I propose the attached patch to be applied on the 'master' branch > > of PostgreSQL to add GitLab CI automation alongside Cirrus CI in the > > PostgreSQL repository. Can you configure GitLab to use a .gitlab-ci.yml file that is not in the root directory? > > It is not intended to be a replacement for Cirrus CI, but simply suggestion > > for the > > PostgreSQL project to host centrally a Gitlab CI definition for those who > > prefer to use > > it while developing/testing PostgreSQL. > > > > The intent is to facilitate collaboration among GitLab users, promote > > standardization > > and consistency, and ultimately, improve testing and code quality. > > > This seems very RedHat-centric, which I'm not sure is a good idea. Also, > shouldn't at least some of these recipes call dnf and dnf-builddep instead of > yum and yum-build-dep? I don't think it's bad to add an automated test suite for redhat-based images? > If we're going to do this then arguably we should also be supporting GitHub > Actions and who knows what other CI frameworks. There is a case for us > special casing Cirrus CI because it's used for the cfbot. I think there's a good case for _not_ using Cirrus CI, namely that the license may be prohibitive, self-management of the build system (storage of artifacts, UI, database) is missing for Cirrus CI, and it also seems to be unable to run automated CI on repositories that aren't hosted on Github. I think these are good arguments for adding a GitLab CI configuration. Unless the cfbot is entirely under management of the PostgreSQL project (which it doesn't appear to be, as far as I know the URL is still cfbot.cputube.org indicating some amount of external control) the only special casing for Cirrus CI within the project seems to be "people have experience with this tool", which is good, but not exclusive to Cirrus CI - clearly there are also people here who have experience with (or are interested in) maintaining GitLab CI configurations. Kind regards, Matthias van de Meent Neon (https://neon.tech)
Re: Allow specifying a dbname in pg_basebackup connection string
On Wed, 5 Jul 2023 at 16:01, Euler Taveira wrote: > One of the PgBouncer's missions is to be a transparent proxy. > > Sometimes you cannot reach out the database directly due to a security policy. Indeed the transparent proxy use case is where replication through pgbouncer makes sense. There's quite some reasons to set up PgBouncer like such a proxy apart from security policies. Some others that come to mind are: - load balancer layer of pgbouncers - transparent failovers - transparent database moves And in all of those cases its nice for a user to use a single connection string/hostname. Instead of having to think: Oh yeah, for backups, I need to use this other one.
Re: logical decoding and replication of sequences, take 2
And the last patch 0008. @@ -1180,6 +1194,13 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt, ... snip ... +if (IsSet(opts.specified_opts, SUBOPT_SEQUENCES)) +{ +values[Anum_pg_subscription_subsequences - 1] = +BoolGetDatum(opts.sequences); +replaces[Anum_pg_subscription_subsequences - 1] = true; +} + The list of allowed options set a few lines above this code does not contain "sequences". Is this option missing there or this code is unnecessary? If we intend to add "sequence" at a later time after a subscription is created, will the sequences be synced after ALTER SUBSCRIPTION? +/* + * ignore sequences when not requested + * + * XXX Maybe we should differentiate between "callbacks not defined" or + * "subscriber disabled sequence replication" and "subscriber does not + * know about sequence replication" (e.g. old subscriber version). + * + * For the first two it'd be fine to bail out here, but for the last it It's not clear which two you are talking about. Maybe that's because the paragraph above is ambiguious. It is in the form of A or B and C; so not clear which cases we are differentiating between: (A, B, C), ((A or B) and C) or (A or (B and C)) or something else. + * might be better to continue and error out only when the sequence + * would be replicated (e.g. as part of the publication). We don't know + * that here, unfortunately. Please see comments on changes to pgoutput_startup() below. We may want to change the paragraph accordingly. @@ -298,6 +298,20 @@ StartupDecodingContext(List *output_plugin_options, */ ctx->reorder->update_progress_txn = update_progress_txn_cb_wrapper; +/* + * To support logical decoding of sequences, we require the sequence + * callback. We decide it here, but only check it later in the wrappers. + * + * XXX Isn't it wrong to define only one of those callbacks? Say we + * only define the stream_sequence_cb() - that may get strange results + * depending on what gets streamed. Either none or both? I don't think the current condition is correct; it will consider sequence changes to be streamed even when sequence_cb is not defined and actually not send those. sequence_cb is needed to send sequence changes irrespective of whether transaction streaming is supported. But stream_sequence_cb is required if other stream callbacks are available. Something like if (ctx->callbacks.sequence_cb) { if (ctx->streaming) { if ctx->callbacks.stream_sequence_cb == NULL) ctx->sequences = false; else ctx->sequences = true; } else ctx->sequences = true; } else ctx->sequences = false; + * + * XXX Shouldn't sequence be defined at slot creation time, similar + * to two_phase? Probably not. I don't know why two_phase is defined at the slot creation time, so can't comment on this. But looks like something we need to answer before committing the patches. +/* + * We allow decoding of sequences when the option is given at the streaming + * start, provided the plugin supports all the callbacks for two-phase. s/two-phase/sequences/ + * + * XXX Similar behavior to the two-phase block below. I think we need to describe sequence specific behaviour instead of pointing to the two-phase. two-phase is part of in replication slot's on disk specification but sequence is not. Given that it's XXX, I think you are planning to do that. + * + * XXX Shouldn't this error out if the callbacks are not defined? Isn't this already being done in pgoutput_startup()? Should we remove this XXX. 
+/* + * Here, we just check whether the sequences decoding option is passed + * by plugin and decide whether to enable it at later point of time. It + * remains enabled if the previous start-up has done so. But we only + * allow the option to be passed in with sufficient version of the + * protocol, and when the output plugin supports it. + */ +if (!data->sequences) +ctx->sequences_opt_given = false; +else if (data->protocol_version < LOGICALREP_PROTO_SEQUENCES_VERSION_NUM) +ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("requested proto_version=%d does not support sequences, need %d or higher", +data->protocol_version, LOGICALREP_PROTO_SEQUENCES_VERSION_NUM))); +else if (!ctx->sequences) +ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("sequences requested, but not supported by output plugin"))); If a given output plugin doesn't implement the callbacks but subscription specifies sequences, the code will throw an error whether or not publica
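For reference, the callback-availability condition sketched earlier in this review could be written as follows (illustrative only; ctx->sequences and the callback names follow the patch under discussion):
```
if (ctx->callbacks.sequence_cb == NULL)
	ctx->sequences = false;		/* plugin cannot decode sequences at all */
else if (ctx->streaming && ctx->callbacks.stream_sequence_cb == NULL)
	ctx->sequences = false;		/* streaming enabled, but no streaming callback */
else
	ctx->sequences = true;
```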
Re: logical decoding and replication of sequences, take 2
0005, 0006 and 0007 are all related to the initial sequence sync. [3] resulted in 0007 and I think we need it. That leaves 0005 and 0006 to be reviewed in this response. I followed the discussion starting [1] till [2]. The second one mentions the interlock mechanism which has been implemented in 0005 and 0006. While I don't have an objection to allowing LOCKing a sequence using the LOCK command, I am not sure whether it will actually work or is even needed. The problem described in [1] seems to be the same as the problem described in [2]. In both cases we see the sequence moving backwards during CATCHUP. At the end of catchup the sequence is in the right state in both the cases. [2] actually deems this behaviour OK. I also agree that the behaviour is ok. I am confused whether we have solved anything using interlocking and it's really needed. I see that the idea of using an LSN to decide whether or not to apply a change to sequence started in [4]. In [5] Tomas proposed to use page LSN. Looking at [6], it actually seems like a good idea. In [7] Tomas agreed that LSN won't be sufficient. But I don't understand why. There are three LSNs in the picture - restart LSN of sync slot, confirmed_flush LSN of sync slot and page LSN of the sequence page from where we read the initial state of the sequence. I think they can be used with the following rules: 1. The publisher will not send any changes with LSN less than confirmed_flush so we are good there. 2. Any non-transactional changes that happened between confirmed_flush and page LSN should be discarded while syncing. They are already visible to SELECT. 3. Any transactional changes with commit LSN between confirmed_flush and page LSN should be discarded while syncing. They are already visible to SELECT. 4. A DDL acquires a lock on sequence. Thus no other change to that sequence can have an LSN between the LSN of the change made by DDL and the commit LSN of that transaction. Only DDL changes to sequence are transactional. Hence any transactional changes with commit LSN beyond page LSN would not have been seen by the SELECT otherwise SELECT would see the page LSN committed by that transaction. so they need to be applied while syncing. 5. Any non-transactional changes beyond page LSN should be applied. They are not seen by SELECT. Am I missing something? I don't have an idea how to get page LSN via a SQL query (while also fetching data on that page). That may or may not be a challenge. [1] https://www.postgresql.org/message-id/c2799362-9098-c7bf-c315-4d7975acafa3%40enterprisedb.com [2] https://www.postgresql.org/message-id/2d4bee7b-31be-8b36-2847-a21a5d56e04f%40enterprisedb.com [3] https://www.postgresql.org/message-id/f5a9d63d-a6fe-59a9-d1ed-38f6a5582c13%40enterprisedb.com [4] https://www.postgresql.org/message-id/CAA4eK1KUYrXFq25xyjBKU1UDh7Dkzw74RXN1d3UAYhd4NzDcsg%40mail.gmail.com [5] https://www.postgresql.org/message-id/CAA4eK1LiA8nV_ZT7gNHShgtFVpoiOvwoxNsmP_fryP%3DPsYPvmA%40mail.gmail.com [6] https://www.postgresql.org/docs/current/storage-page-layout.html -- Best Wishes, Ashutosh Bapat
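To make rules 1-5 above concrete, here is a purely illustrative sketch; the function name and its use are hypothetical and not part of the patch:
```
/*
 * Decide whether a decoded sequence change still has to be applied while
 * syncing. For transactional changes the caller passes the commit LSN of
 * the containing transaction; for non-transactional changes, the LSN of
 * the change itself.
 */
static bool
should_apply_sequence_change(XLogRecPtr change_lsn,
							 XLogRecPtr confirmed_flush,
							 XLogRecPtr page_lsn)
{
	/* Rule 1: the publisher never sends changes below confirmed_flush. */
	Assert(change_lsn >= confirmed_flush);

	/*
	 * Rules 2 and 3: anything at or before the page LSN of the initial
	 * copy was already visible to the SELECT, so discard it.
	 */
	if (change_lsn <= page_lsn)
		return false;

	/* Rules 4 and 5: changes beyond the page LSN were not yet visible. */
	return true;
}
```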
Re: pg_column_toast_chunk_id: a function to get a chunk ID of a TOASTed value
Hi! I like the idea of having a standard function which shows a TOAST value ID for a row. I've used my own to handle TOAST errors. Just, maybe, a more correct name would be "...value_id", because you actually retrieve the valueid field from the TOAST pointer, and the chunk ID consists of valueid + chunk_seq. -- Regards, Nikita Malakhov Postgres Professional The Russian Postgres Company https://postgrespro.ru/
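For context, a hedged sketch of what such a function might look like internally; this is not necessarily the submitted patch, it merely reads va_valueid out of the on-disk TOAST pointer:
```
Datum
pg_column_toast_chunk_id(PG_FUNCTION_ARGS)
{
	struct varlena *attr;
	struct varatt_external toast_pointer;

	/* Look at the raw datum without detoasting it. */
	attr = (struct varlena *) DatumGetPointer(PG_GETARG_DATUM(0));

	/* Only out-of-line, on-disk values carry a value OID. */
	if (!VARATT_IS_EXTERNAL_ONDISK(attr))
		PG_RETURN_NULL();

	VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);

	PG_RETURN_OID(toast_pointer.va_valueid);
}
```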
Re: Doc limitation update proposal: include out-of-line OID usage per TOAST-ed columns
On Tue, Jun 13, 2023 at 10:20 AM John Naylor wrote: Hi John, v3 is attached for review. > > > >- > >+ see note below on TOAST > > Maybe: > "further limited by the number of TOAST-ed values; see note below" Fixed. > > I've wrongly put it, I've meant that pg_largeobject also consume OID > > and as such are subject to 32TB limit. > No, OID has nothing to do with the table size limit, they have to do with the max number of LOs in a DB. Clearly I needed more coffee back then... > Also, perhaps the LO entries should be split into a separate patch. Since they are a special case and documented elsewhere, it's not clear their limits fit well here. Maybe they could. Well, but those are *limits* of the engine and they seem to be pretty widely chosen especially in migration scenarios (because they are the only ones allowed to store over 1GB). I think we should warn the dangers of using pg_largeobjects. > > + > > + For every TOAST-ed column (that is for field values wider than TOAST_TUPLE_TARGET > > + [2040 bytes by default]), due to internal PostgreSQL implementation of using one > > + shared global OID counter - today you cannot have more than 4,294,967,296 out-of-line > > + values in a single table. > > + > > + > > + > "column" != "field value". (..)"Today" is irrelevant. Needs polish. Fixed. > Also the shared counter is the cause of the slowdown, but not the reason for the numeric limit. Isn't it both? typedef Oid is unsigned int = 2^32, and according to GetNewOidWithIndex() logic if we exhaust the whole OID space it will hang indefinitely which has the same semantics as "being impossible"/permanent hang (?) > + out-of-line value (The search for free OIDs won't stop until it finds the free OID). > Still too much detail, and not very illuminating. If it *did* stop, how does that make it any less of a problem? OK I see your point - so it's removed. As for the question: well, maybe we could document that one day in known-performance-cliffs.sgml (or via Wiki) instead of limits.sgml. > + Therefore, the OID shortages will eventually cause slowdowns to the > + INSERTs/UPDATE/COPY statements. > Maybe this whole sentence is better as "This will eventually cause slowdowns for INSERT, UPDATE, and COPY statements." Yes, it flows much better that way. > > > + Please note that that the limit of 2^32 > > > + out-of-line TOAST values applies to the sum of both visible and invisible tuples. > > > > > > We didn't feel the need to mention this for normal tuples... > > > > Right, but this somewhat points reader to the queue-like scenario > > mentioned by Nikita. > That seems to be in response to you mentioning "especially to steer people away from designing very wide non-partitioned tables". In any case, I'm now thinking that everything in this sentence and after doesn't belong here. We don't need to tell people to vacuum, and we don't need to tell them about partitioning as a workaround -- it's a workaround for the table size limit, too, but we are just documenting the limits here. OK, I've removed the visible/invisible fragments and workaround techniques. -J. v3-0001-doc-Add-some-OID-TOAST-related-limitations-to-the.patch Description: Binary data
Re: Parallel CREATE INDEX for BRIN indexes
On Wed, 5 Jul 2023 at 00:08, Tomas Vondra wrote: > > > > On 7/4/23 23:53, Matthias van de Meent wrote: > > On Thu, 8 Jun 2023 at 14:55, Tomas Vondra > > wrote: > >> > >> Hi, > >> > >> Here's a WIP patch allowing parallel CREATE INDEX for BRIN indexes. The > >> infrastructure (starting workers etc.) is "inspired" by the BTREE code > >> (i.e. copied from that and massaged a bit to call brin stuff). > > > > Nice work. > > > >> In both cases _brin_end_parallel then reads the summaries from worker > >> files, and adds them into the index. In 0001 this is fairly simple, > >> although we could do one more improvement and sort the ranges by range > >> start to make the index nicer (and possibly a bit more efficient). This > >> should be simple, because the per-worker results are already sorted like > >> that (so a merge sort in _brin_end_parallel would be enough). > > > > I see that you manually built the passing and sorting of tuples > > between workers, but can't we use the parallel tuplesort > > infrastructure for that? It already has similar features in place and > > improves code commonality. > > > > Maybe. I wasn't that familiar with what parallel tuplesort can and can't > do, and the little I knew I managed to forget since I wrote this patch. > Which similar features do you have in mind? I was referring to the feature that is "emitting a single sorted run of tuples at the leader backend based on data gathered in parallel worker backends". It manages the sort state, on-disk runs etc. so that you don't have to manage that yourself. Adding a new storage format for what is effectively a logical tape (logtape.{c,h}) and manually merging it seems like a lot of changes if that functionality is readily available, standardized and optimized in sortsupport; and adds an additional place to manually go through for disk-related changes like TDE. Kind regards, Matthias van de Meent Neon (https://neon.tech/)
Re: Removing unneeded self joins
Hi, During the significant code revision in v.41 I lost some replacement operations. Here is the fix and extra tests to check this in the future. Also, Tom added the JoinDomain structure five months ago, and I added code to replace relids for that place too. One more thing, I found out that we didn't replace SJs, defined by baserestrictinfos if no one self-join clause have existed for the join. Now, it is fixed, and the test has been added. To understand changes readily, see the delta file in the attachment. -- regards, Andrey Lepikhov Postgres Professional From 8f5a432f6fbbcad1fd2937f33af09e9328690b6b Mon Sep 17 00:00:00 2001 From: "Andrey V. Lepikhov" Date: Tue, 4 Jul 2023 16:07:50 +0700 Subject: [PATCH] Add lost arrangements of relids and varnos. Add the test to check it. Add one more cleaning procedure on JoinDomain relids which was introduced recently with commit 3bef56e. Fix the corner case when we haven't removed SJ if the selfjoinquals list was empty. --- src/backend/optimizer/plan/analyzejoins.c | 15 ++- src/test/regress/expected/join.out| 26 --- src/test/regress/expected/updatable_views.out | 17 +--- src/test/regress/sql/join.sql | 9 +++ 4 files changed, 53 insertions(+), 14 deletions(-) diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c index a93e4ce05c..15234b7a3b 100644 --- a/src/backend/optimizer/plan/analyzejoins.c +++ b/src/backend/optimizer/plan/analyzejoins.c @@ -424,6 +424,8 @@ remove_rel_from_query_common(PlannerInfo *root, RelOptInfo *rel, sjinf->commute_above_r = replace_relid(sjinf->commute_above_r, ojrelid, subst); sjinf->commute_below_l = replace_relid(sjinf->commute_below_l, ojrelid, subst); sjinf->commute_below_r = replace_relid(sjinf->commute_below_r, ojrelid, subst); + + replace_varno((Node *) sjinf->semi_rhs_exprs, relid, subst); } /* @@ -465,6 +467,8 @@ remove_rel_from_query_common(PlannerInfo *root, RelOptInfo *rel, /* ph_needed might or might not become empty */ phv->phrels = replace_relid(phv->phrels, relid, subst); phv->phrels = replace_relid(phv->phrels, ojrelid, subst); + phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst); + phinfo->ph_var->phrels = replace_relid(phinfo->ph_var->phrels, relid, subst); Assert(!bms_is_empty(phv->phrels)); Assert(phv->phnullingrels == NULL); /* no need to adjust */ } @@ -1545,6 +1549,7 @@ update_eclass(EquivalenceClass *ec, int from, int to) } em->em_relids = replace_relid(em->em_relids, from, to); + em->em_jdomain->jd_relids = replace_relid(em->em_jdomain->jd_relids, from, to); /* We only process inner joins */ replace_varno((Node *) em->em_expr, from, to); @@ -2101,7 +2106,7 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids) */ restrictlist = generate_join_implied_equalities(root, joinrelids, inner->relids, - outer, 0); + outer, NULL); /* * Process restrictlist to seperate the self join quals out of @@ -2111,6 +2116,14 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids) split_selfjoin_quals(root, restrictlist, &selfjoinquals, &otherjoinquals, inner->relid, outer->relid); + /* +* To enable SJE for the only degenerate case without any self join +* clauses at all, add baserestrictinfo to this list. +* Degenerate case works only if both sides have the same clause. So +* doesn't matter which side to add. +*/ + selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo); + /* * Determine if the inner table can duplicate outer rows. 
We must * bypass the unique rel cache here since we're possibly using a diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out index b1f43f6ff8..027c356bcc 100644 --- a/src/test/regress/expected/join.out +++ b/src/test/regress/expected/join.out @@ -5807,11 +5807,13 @@ ex
Re: logicalrep_message_type throws an error
On Wed, Jul 5, 2023 at 7:07 PM Euler Taveira wrote: > > On Wed, Jul 5, 2023, at 7:56 AM, Alvaro Herrera wrote: > > On 2023-Jul-05, Amit Kapila wrote: > > > I think after returning "???" from logicalrep_message_type(), the > > above is possible when we get the error: "invalid logical replication > > message type "X"" from apply_dispatch(), right? If so, then what about > > the case when we forgot to handle some message in > > logicalrep_message_type() but handled it in apply_dispatch()? apply_dispatch() has a default case in switch() so it can theoretically forget to handle some message type. I think we should avoid default case in that function to catch missing message type in that function. But if logicalrep_message_type() doesn't use default case, it won't forget to handle a known message type. So what you are suggesting is not possible. It might happen that the upstream may send an unknown message type that both apply_dispatch() and logicalrep_message_type() can not handle. > ERROR: invalid logical replication message type "X" > CONTEXT: processing remote data for replication origin "pg_16638" during > message type "??? (88)" in transaction 796, finished at 0/1626698 > > IMO it could be confusing if we provide two representations of the same data > (X > and 88). Since we already provide "X" I don't think we need "88". Another > option, is to remove "X" from apply_dispatch() and add "??? (88)" to > logicalrep_message_type(). I think we don't need message type to be mentioned in the context for an error about invalid message type. I think what needs to be done is to set apply_error_callback_arg.command = 0 before calling ereport in the default case of apply_dispatch(). apply_error_callback() will just return without providing a context. If we need a context and have all the other necessary fields, we can improve apply_error_callback() to provide context when apply_error_callback_arg.command = 0 . -- Best Wishes, Ashutosh Bapat
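A sketch of that suggestion, as fragments only (apply_error_callback_arg and the error text follow src/backend/replication/logical/worker.c; this is not a concrete patch):
```
/* In apply_dispatch(), the default arm of the switch: */
default:
	/*
	 * We cannot even name this message type, so suppress the CONTEXT
	 * line instead of printing a placeholder for it.
	 */
	apply_error_callback_arg.command = 0;
	ereport(ERROR,
			(errcode(ERRCODE_PROTOCOL_VIOLATION),
			 errmsg("invalid logical replication message type \"%c\"", action)));

/* And apply_error_callback() then simply bails out early: */
static void
apply_error_callback(void *arg)
{
	if (apply_error_callback_arg.command == 0)
		return;					/* nothing useful to report */

	/* ... existing context-building code ... */
}
```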
Re: Prevent psql \watch from running queries that return no rows
Thanks for the feedback! On Wed, Jul 5, 2023 at 5:51 AM Daniel Gustafsson wrote: > > The comment on ExecQueryAndProcessResults() needs to be updated with an > explanation of what this parameter is. > I added a comment in the place where min_rows is used, but not sure what you mean by adding it to the main comment at the top of the function? None of the other args are explained there, even the non-intuitive ones (e.g. svpt_gone_p) - return cancel_pressed ? 0 : success ? 1 : -1; > + return (cancel_pressed || return_early) ? 0 : success ? 1 : -1; > > I think this is getting tangled up enough that it should be replaced with > separate if() statements for the various cases. > Would like to hear others weigh in, I think it's still only three states plus a default, so I'm not convinced it warrants multiple statements yet. :) + HELP0(" \\watch [[i=]SEC] [c=N] [m=ROW]\n"); > + HELP0(" execute query every SEC seconds, > up to N times\n"); > + HELP0(" stop if less than ROW minimum > rows are rerturned\n"); > > "less than ROW minimum rows" reads a bit awkward IMO, how about calling it > [m=MIN] and describe as "stop if less than MIN rows are returned"? Also, > there > is a typo: s/rerturned/returned/. > Great idea: changed and will attach a new patch Cheers, Greg psql_watch_exit_on_zero_rows_v3.patch Description: Binary data
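For comparison, the spelled-out variant Daniel suggests would presumably look something like this (illustrative only; variable names as in the patch):
```
if (cancel_pressed || return_early)
	return 0;
if (success)
	return 1;
return -1;
```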
Re: Allow specifying a dbname in pg_basebackup connection string
On Wed, Jul 5, 2023, at 9:43 AM, Thom Brown wrote: > I guess my immediate question is, should backups be taken through > PgBouncer? It seems beyond PgBouncer's remit. One of the PgBouncer's missions is to be a transparent proxy. Sometimes you cannot reach out the database directly due to a security policy. I've heard this backup question a few times. IMO if dbname doesn't matter for reaching the server directly, I don't see a problem relaxing this restriction to support this use case. We just need to document that dbname will be ignored if specified. Other connection poolers might also benefit from it. -- Euler Taveira EDB https://www.enterprisedb.com/
Re: pg_decode_message vs skip_empty_xacts and xact_wrote_changes
On Wed, Jul 5, 2023 at 2:29 PM Amit Kapila wrote: > > On Wed, Jul 5, 2023 at 2:28 PM Amit Kapila wrote: > > > > On Mon, Jul 3, 2023 at 4:49 PM vignesh C wrote: > > > > > > +1 for the first version patch, I also felt the first version is > > > easily understandable. > > > > > > > Okay, please find the slightly updated version (changed a comment and > > commit message). Unless there are more comments, I'll push this in a > > day or two. > > > > oops, forgot to attach the patch. I still think that we need to do something so that a new call to pg_output_begin() automatically takes care of the conditions under which it should be called. Otherwise, we will introduce a similar problem in some other place in future. -- Best Wishes, Ashutosh Bapat
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
Hi, > Reviewing this now. I think it's almost ready to be committed. > > There's another big effort going on to move SLRUs to the regular buffer > cache (https://commitfest.postgresql.org/43/3514/). I wonder how moving > to 64 bit page numbers will affect that. BlockNumber is still 32 bits, > after all. Somehow I didn't pay too much attention to this effort, thanks. I will familiarize myself with the patch. Intuitively I don't think that the patchse should block each other. > This patch makes the filenames always 12 characters wide (for SLRUs that > opt-in to the new naming). That's actually not enough for the full range > that a 64 bit page number can represent. Should we make it 16 characters > now, to avoid repeating the same mistake we made earlier? Or make it > more configurable, on a per-SLRU basis. One reason I don't want to just > make it 16 characters is that WAL segment filenames are also 16 hex > characters, which could cause confusion. Good catch. I propose the following solution: ``` SlruFileName(SlruCtl ctl, char *path, int64 segno) { if (ctl->long_segment_names) -return snprintf(path, MAXPGPATH, "%s/%012llX", ctl->Dir, +/* + * We could use 16 characters here but the disadvantage would be that + * the SLRU segments will be hard to distinguish from WAL segments. + * + * For this reason we use 15 characters. It is enough but also means + * that in the future we can't decrease SLRU_PAGES_PER_SEGMENT easily. + */ +return snprintf(path, MAXPGPATH, "%s/%015llX", ctl->Dir, (long long) segno); else return snprintf(path, MAXPGPATH, "%s/%04X", (ctl)->Dir, ``` PFE the corrected patchset v58. -- Best regards, Aleksander Alekseev v58-0003-Make-use-FullTransactionId-in-2PC-filenames.patch Description: Binary data v58-0001-Index-SLRUs-by-64-bit-integers-rather-than-by-32.patch Description: Binary data v58-0002-Use-larger-segment-file-names-for-pg_notify.patch Description: Binary data
Re: logicalrep_message_type throws an error
On Wed, Jul 5, 2023, at 7:56 AM, Alvaro Herrera wrote: > On 2023-Jul-05, Amit Kapila wrote: > > > I think after returning "???" from logicalrep_message_type(), the > > above is possible when we get the error: "invalid logical replication > > message type "X"" from apply_dispatch(), right? If so, then what about > > the case when we forgot to handle some message in > > logicalrep_message_type() but handled it in apply_dispatch()? Isn't it > > better to return the 'action' from the function > > logicalrep_message_type() for unknown type? That way the information > > could be a bit better and we can easily catch the code bug as well. > > Are you suggesting that logicalrep_message_type should include the > numerical value of 'action' in the ??? message? Something like this: > > ERROR: invalid logical replication message type "X" > CONTEXT: processing remote data for replication origin "pg_16638" during > message type "??? (123)" in transaction 796, finished at 0/16266F8 Isn't this numerical value already exposed in the error message (X = 88)? In this example, it is: ERROR: invalid logical replication message type "X" CONTEXT: processing remote data for replication origin "pg_16638" during message type "??? (88)" in transaction 796, finished at 0/1626698 IMO it could be confusing if we provide two representations of the same data (X and 88). Since we already provide "X" I don't think we need "88". Another option, is to remove "X" from apply_dispatch() and add "??? (88)" to logicalrep_message_type(). Opinions? -- Euler Taveira EDB https://www.enterprisedb.com/
Re: [PATCH] Add GitLab CI to PostgreSQL
On 2023-07-04 Tu 19:44, Newhouse, Robin wrote: Hello! I propose the attached patch to be applied on the 'master' branch of PostgreSQL to add GitLab CI automation alongside Cirrus CI in the PostgreSQL repository. It is not intended to be a replacement for Cirrus CI, but simply suggestion for the PostgreSQL project to host centrally a Gitlab CI definition for those who prefer to use it while developing/testing PostgreSQL. The intent is to facilitate collaboration among GitLab users, promote standardization and consistency, and ultimately, improve testing and code quality. This seems very RedHat-centric, which I'm not sure is a good idea. Also, shouldn't at least some of these recipes call dnf and dnf-builddep instead of yum and yum-build-dep? If we're going to do this then arguably we should also be supporting GitHub Actions and who knows what other CI frameworks. There is a case for us special casing Cirrus CI because it's used for the cfbot. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: pg_basebackup check vs Windows file path limits
On 2023-07-04 Tu 16:54, Daniel Gustafsson wrote: On 4 Jul 2023, at 20:19, Andrew Dunstan wrote: But sadly we're kinda back where we started. fairywren is failing on REL_16_STABLE. Before the changes the failure occurred because the test script was unable to create the file with a path > 255. Now that we have a way to create the file the test for pg_basebackup to reject files with names > 100 fails, I presume because the server can't actually see the file. At this stage I'm thinking the best thing would be to skip the test altogether on windows if the path is longer than 255. That does sound like a fairly large hammer for a nail small enough that we should be able to fix it, but I don't have any other good ideas off the cuff. Not sure it's such a big hammer. Here's a patch. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl index e0009c8531..ee0d03512c 100644 --- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl +++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl @@ -311,20 +311,22 @@ $node->command_fails( '-T with invalid format fails'); # Tar format doesn't support filenames longer than 100 bytes. -# Create the test file via a short name directory so it doesn't blow the -# Windows path limit. -my $lftmp = PostgreSQL::Test::Utils::tempdir_short; -dir_symlink "$pgdata", "$lftmp/pgdata"; -my $superlongname = "superlongname_" . ("x" x 100); -my $superlongpath = "$lftmp/pgdata/$superlongname"; +SKIP: +{ + my $superlongname = "superlongname_" . ("x" x 100); + my $superlongpath = "$pgdata/$superlongname"; -open my $file, '>', "$superlongpath" - or die "unable to create file $superlongpath"; -close $file; -$node->command_fails( - [ @pg_basebackup_defs, '-D', "$tempdir/tarbackup_l1", '-Ft' ], - 'pg_basebackup tar with long name fails'); -unlink "$superlongpath"; + skip "File path too long", 1 + if $windows_os && length($superlongpath) > 255; + + open my $file, '>', "$superlongpath" + or die "unable to create file $superlongpath"; + close $file; + $node->command_fails( + [ @pg_basebackup_defs, '-D', "$tempdir/tarbackup_l1", '-Ft' ], + 'pg_basebackup tar with long name fails'); + unlink "$superlongpath"; +} # The following tests are for symlinks.
Re: logicalrep_message_type throws an error
On Wed, Jul 5, 2023 at 4:26 PM Alvaro Herrera wrote: > > On 2023-Jul-05, Amit Kapila wrote: > > > I think after returning "???" from logicalrep_message_type(), the > > above is possible when we get the error: "invalid logical replication > > message type "X"" from apply_dispatch(), right? If so, then what about > > the case when we forgot to handle some message in > > logicalrep_message_type() but handled it in apply_dispatch()? Isn't it > > better to return the 'action' from the function > > logicalrep_message_type() for unknown type? That way the information > > could be a bit better and we can easily catch the code bug as well. > > Are you suggesting that logicalrep_message_type should include the > numerical value of 'action' in the ??? message? Something like this: > > ERROR: invalid logical replication message type "X" > CONTEXT: processing remote data for replication origin "pg_16638" during > message type "??? (123)" in transaction 796, finished at 0/16266F8 > Yes, something like that would be better. -- With Regards, Amit Kapila.
Re: Allow specifying a dbname in pg_basebackup connection string
On Mon, 3 Jul 2023 at 13:23, Jelte Fennema wrote: > > Normally it doesn't really matter which dbname is used in the connection > string that pg_basebackup and other physical replication CLI tools use. > The reason being, that physical replication does not work at the > database level, but instead at the server level. So you will always get > the data for all databases. > > However, when there's a proxy, such as PgBouncer, in between the client > and the server, then it might very well matter. Because this proxy might > want to route the connection to a different server depending on the > dbname parameter in the startup packet. > > This patch changes the creation of the connection string key value > pairs, so that the following command will actually include > dbname=postgres in the startup packet to the server: > > pg_basebackup --dbname 'dbname=postgres port=6432' -D dump I guess my immediate question is, should backups be taken through PgBouncer? It seems beyond PgBouncer's remit. Thom
Re: Disabling Heap-Only Tuples
On Wed, 5 Jul 2023 at 13:12, Matthias van de Meent wrote: > > On Wed, 5 Jul 2023 at 13:03, Thom Brown wrote: > > > > On Wed, 5 Jul 2023 at 11:57, Matthias van de Meent > > wrote: > > > > > > On Wed, 5 Jul 2023 at 12:45, Thom Brown wrote: > > > > Heap-Only Tuple (HOT) updates are a significant performance > > > > enhancement, as they prevent unnecessary page writes. However, HOT > > > > comes with a caveat: it means that if we have lots of available space > > > > earlier on in the relation, it can only be used for new tuples or in > > > > cases where there's insufficient space on a page for an UPDATE to use > > > > HOT. > > > > > > > > This mechanism limits our options for condensing tables, forcing us to > > > > resort to methods like running VACUUM FULL/CLUSTER or using external > > > > tools like pg_repack. These either require exclusive locks (which will > > > > be a deal-breaker on large tables on a production system), or there's > > > > risks involved. Of course we can always flood pages with new versions > > > > of a row until it's forced onto an early page, but that shouldn't be > > > > necessary. > > > > > > > > Considering these trade-offs, I'd like to propose an option to allow > > > > superusers to disable HOT on tables. The intent is to trade some > > > > performance benefits for the ability to reduce the size of a table > > > > without the typical locking associated with it. > > > > > > Interesting use case, but I think that disabling HOT would be missing > > > the forest for the trees. I think that a feature that disables > > > block-local updates for pages > some offset would be a better solution > > > to your issue: Normal updates also prefer the new tuple to be stored > > > in the same pages as the old tuple if at all possible, so disabling > > > HOT wouldn't solve the issue of tuples residing in the tail of your > > > table - at least not while there is still empty space in those pages. > > > > Hmm... I see your point. It's when an UPDATE isn't going to land on > > the same page that it relocates to the earlier available page. So I > > guess I'm after whatever mechanism would allow that to happen reliably > > and predictably. > > > > So $subject should really be "Allow forcing UPDATEs off the same page". > > You'd probably want to do that only for a certain range of the table - > for a table with 1GB of data and 3GB of bloat there is no good reason > to force page-crossing updates in the first 1GB of the table - all > tuples of the table will eventually reside there, so why would you > take a performance penalty and move the tuples from inside that range > to inside that same range? I'm thinking more of a case of: UPDATE bigtable SET primary key = primary key WHERE ctid IN ( SELECT ctid FROM bigtable ORDER BY ctid DESC LIMIT 10); > Something else to note: Indexes would suffer some (large?) amount of > bloat in this process, as you would be updating a lot of tuples > without the HOT optimization, thus increasing the work to be done by > VACUUM. > This may result in more bloat in indexes than what you get back from > shrinking the table. This could be the case, but I guess indexes are expendable to an extent, unlike tables. Thom
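To make the pattern above concrete (table and column names are invented for illustration; note that, as discussed, such an update only moves a row off its page when the page has no free space left for the new version):

    UPDATE bigtable
       SET id = id              -- no-op change, just to create a new row version
     WHERE ctid IN (SELECT ctid
                      FROM bigtable
                     ORDER BY ctid DESC   -- target rows at the physical tail
                     LIMIT 100000);

    VACUUM bigtable;            -- let the vacated tail pages be truncated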
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
On 09/03/2023 17:21, Aleksander Alekseev wrote: v57-0001-Index-SLRUs-by-64-bit-integers-rather-than-by-32.patch Reviewing this now. I think it's almost ready to be committed. There's another big effort going on to move SLRUs to the regular buffer cache (https://commitfest.postgresql.org/43/3514/). I wonder how moving to 64 bit page numbers will affect that. BlockNumber is still 32 bits, after all. +/* + * An internal function used by SlruScanDirectory(). + * + * Returns true if a file with a name of a given length may be a correct + * SLRU segment. + */ +static inline bool +SlruCorrectSegmentFilenameLength(SlruCtl ctl, size_t len) +{ + if (ctl->long_segment_names) + return (len == 12); + else + + /* +* XXX Should we still consider 5 and 6 to be a correct length? It +* looks like it was previously allowed but now SlruFileName() can't +* return such a name. +*/ + return (len == 4 || len == 5 || len == 6); +} Huh, I didn't realize that 5 and 6 character SLRU segment names are possible. But indeed they are. Commit 638cf09e76d allowed 5-character lengths: commit 638cf09e76d70dd83d8123e7079be6c0532102d2 Author: Alvaro Herrera Date: Thu Jan 2 18:17:29 2014 -0300 Handle 5-char filenames in SlruScanDirectory Original users of slru.c were all producing 4-digit filenames, so that was all that that code was prepared to handle. Changes to multixact.c in the course of commit 0ac5ad5134f made pg_multixact/members create 5-digit filenames once a certain threshold was reached, which SlruScanDirectory wasn't prepared to deal with; in particular, 5-digit-name files were not removed during truncation. Change that routine to make it aware of those files, and have it process them just like any others. Right now, some pg_multixact/members directories will contain a mixture of 4-char and 5-char filenames. A future commit is expected fix things so that each slru.c user declares the correct maximum width for the files it produces, to avoid such unsightly mixtures. Noticed while investigating bug #8673 reported by Serge Negodyuck. And later commit 73c986adde5, which introduced commit_ts, allowed 6 characters. With 8192 block size, pg_commit_ts segments are indeed 5 chars wide, and with 512 block size, 6 chars are needed. This patch makes the filenames always 12 characters wide (for SLRUs that opt-in to the new naming). That's actually not enough for the full range that a 64 bit page number can represent. Should we make it 16 characters now, to avoid repeating the same mistake we made earlier? Or make it more configurable, on a per-SLRU basis. One reason I don't want to just make it 16 characters is that WAL segment filenames are also 16 hex characters, which could cause confusion. -- Heikki Linnakangas Neon (https://neon.tech)
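To put numbers on the width question: 12 hex digits cover segment numbers below 2^48, so file names would quietly grow wider once a segment number exceeds that, whereas 16 digits (the full int64 range) would look just like WAL segment file names. A trivial standalone illustration (not the patch's code):

    #include <stdio.h>
    #include <stdint.h>

    int
    main(void)
    {
        uint64_t small = UINT64_C(0xABCDEF);   /* fits in 12 hex digits */
        uint64_t big = UINT64_C(1) << 52;      /* needs 14 hex digits */

        printf("%012llX\n", (unsigned long long) small);  /* 000000ABCDEF */
        printf("%012llX\n", (unsigned long long) big);    /* 10000000000000, wider than 12 */
        printf("%016llX\n", (unsigned long long) big);    /* 0010000000000000 */
        return 0;
    }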
Re: Extension Enhancement: Buffer Invalidation in pg_buffercache
+1 for the idea. It's going to be more useful to test and understand the buffer management of PostgreSQL and it can be used to explicitly free up the buffers if there are any such requirements. I had a quick look over the patch. Following are the comments. First, The TryInvalidateBuffer() tries to flush the buffer if it is dirty and then tries to invalidate it if it meets the requirement. Instead of directly doing this can we provide an option to the caller to mention whether to invalidate the dirty buffers or not. For example, TryInvalidateBuffer(Buffer bufnum, bool force), if the force is set to FALSE, then ignore invalidating dirty buffers. Otherwise, flush the dirty buffer and try to invalidate. Second, In TryInvalidateBuffer(), it first checks if the reference count is greater than zero and then checks for dirty buffers. Will there be a scenario where the buffer is dirty and its reference count is zero? Can you please provide more information on this or adjust the code accordingly. > +/* > +Try Invalidating a buffer using bufnum. > +If the buffer is invalid, the function returns false. > +The function checks for dirty buffer and flushes the dirty buffer before > invalidating. > +If the buffer is still dirty it returns false. > +*/ > +bool The star(*) and space are missing here. Please refer to the style of function comments and change accordingly. Thanks & Regards, Nitin Jadhav On Fri, Jun 30, 2023 at 4:17 PM Palak Chaturvedi wrote: > > I hope this email finds you well. I am excited to share that I have > extended the functionality of the `pg_buffercache` extension by > implementing buffer invalidation capability, as requested by some > PostgreSQL contributors for improved testing scenarios. > > This marks my first time submitting a patch to pgsql-hackers, and I am > eager to receive your expert feedback on the changes made. Your > insights are invaluable, and any review or comments you provide will > be greatly appreciated. > > The primary objective of this enhancement is to enable explicit buffer > invalidation within the `pg_buffercache` extension. By doing so, we > can simulate scenarios where buffers are invalidated and observe the > resulting behavior in PostgreSQL. > > As part of this patch, a new function or mechanism has been introduced > to facilitate buffer invalidation. I would like to hear your thoughts > on whether this approach provides a good user interface for this > functionality. Additionally, I seek your evaluation of the buffer > locking protocol employed in the extension to ensure its correctness > and efficiency. > > Please note that I plan to add comprehensive documentation once the > details of this enhancement are agreed upon. This documentation will > serve as a valuable resource for users and contributors alike. I > believe that your expertise will help uncover any potential issues and > opportunities for further improvement. > > I have attached the patch file to this email for your convenience. > Your valuable time and consideration in reviewing this extension are > sincerely appreciated. > > Thank you for your continued support and guidance. I am looking > forward to your feedback and collaboration in enhancing the PostgreSQL > ecosystem. > > The working of the extension: > > 1. Creating the extension pg_buffercache and then call select query on > a table and note the buffer to be cleared. 
> pgbench=# create extension pg_buffercache; > CREATE EXTENSION > pgbench=# select count(*) from pgbench_accounts; > count > > 10 > (1 row) > > pgbench=# SELECT * > FROM pg_buffercache > WHERE relfilenode = pg_relation_filenode('pgbench_accounts'::regclass); > bufferid | relfilenode | reltablespace | reldatabase | relforknumber > | relblocknumber | isdirty | usagecount | pinning_backends > --+-+---+-+---++-++-- > 233 | 16397 | 1663 | 16384 | 0 > | 0 | f | 1 |0 > 234 | 16397 | 1663 | 16384 | 0 > | 1 | f | 1 |0 > 235 | 16397 | 1663 | 16384 | 0 > | 2 | f | 1 |0 > 236 | 16397 | 1663 | 16384 | 0 > | 3 | f | 1 |0 > 237 | 16397 | 1663 | 16384 | 0 > | 4 | f | 1 |0 > > > 2. Clearing a single buffer by entering the bufferid. > pgbench=# SELECT count(*) > FROM pg_buffercache > WHERE relfilenode = pg_relation_filenode('pgbench_accounts'::regclass); > count > --- > 1660 > (1 row) > > pgbench=# select pg_buffercache_invalidate(233); > pg_buffercache_invalidate > --- > t > (1 row) > > pgbench=# SELECT count(*) > FROM pg_buffercache
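To make the force-parameter suggestion above concrete, usage might look like this (the second argument is purely hypothetical at this point; the submitted function only takes a buffer number):

    -- leave dirty buffers alone
    SELECT pg_buffercache_invalidate(bufferid, false)
      FROM pg_buffercache
     WHERE relfilenode = pg_relation_filenode('pgbench_accounts'::regclass);

    -- flush dirty buffers first, then invalidate them as well
    SELECT pg_buffercache_invalidate(bufferid, true)
      FROM pg_buffercache
     WHERE relfilenode = pg_relation_filenode('pgbench_accounts'::regclass);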
Re: Disabling Heap-Only Tuples
On Wed, 5 Jul 2023 at 13:03, Thom Brown wrote: > > On Wed, 5 Jul 2023 at 11:57, Matthias van de Meent > wrote: > > > > On Wed, 5 Jul 2023 at 12:45, Thom Brown wrote: > > > Heap-Only Tuple (HOT) updates are a significant performance > > > enhancement, as they prevent unnecessary page writes. However, HOT > > > comes with a caveat: it means that if we have lots of available space > > > earlier on in the relation, it can only be used for new tuples or in > > > cases where there's insufficient space on a page for an UPDATE to use > > > HOT. > > > > > > This mechanism limits our options for condensing tables, forcing us to > > > resort to methods like running VACUUM FULL/CLUSTER or using external > > > tools like pg_repack. These either require exclusive locks (which will > > > be a deal-breaker on large tables on a production system), or there's > > > risks involved. Of course we can always flood pages with new versions > > > of a row until it's forced onto an early page, but that shouldn't be > > > necessary. > > > > > > Considering these trade-offs, I'd like to propose an option to allow > > > superusers to disable HOT on tables. The intent is to trade some > > > performance benefits for the ability to reduce the size of a table > > > without the typical locking associated with it. > > > > Interesting use case, but I think that disabling HOT would be missing > > the forest for the trees. I think that a feature that disables > > block-local updates for pages > some offset would be a better solution > > to your issue: Normal updates also prefer the new tuple to be stored > > in the same pages as the old tuple if at all possible, so disabling > > HOT wouldn't solve the issue of tuples residing in the tail of your > > table - at least not while there is still empty space in those pages. > > Hmm... I see your point. It's when an UPDATE isn't going to land on > the same page that it relocates to the earlier available page. So I > guess I'm after whatever mechanism would allow that to happen reliably > and predictably. > > So $subject should really be "Allow forcing UPDATEs off the same page". You'd probably want to do that only for a certain range of the table - for a table with 1GB of data and 3GB of bloat there is no good reason to force page-crossing updates in the first 1GB of the table - all tuples of the table will eventually reside there, so why would you take a performance penalty and move the tuples from inside that range to inside that same range? Something else to note: Indexes would suffer some (large?) amount of bloat in this process, as you would be updating a lot of tuples without the HOT optimization, thus increasing the work to be done by VACUUM. This may result in more bloat in indexes than what you get back from shrinking the table. Kind regards, Matthias van de Meent Neon (https://neon.tech/)
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Tue, Jul 4, 2023 at 12:49 PM Masahiko Sawada wrote: > > As you mentioned, the 1-byte value is embedded into 8 byte so 7 bytes > are unused, but we use less memory since we use less slab contexts and > save fragmentations. Thanks for testing. This tree is sparse enough that most of the space is taken up by small inner nodes, and not by leaves. So, it's encouraging to see a small space savings even here. > I've also tested some large value cases (e.g. the value is 80-bytes) > and got a similar result. Interesting. With a separate allocation per value the overhead would be 8 bytes, or 10% here. It's plausible that savings elsewhere can hide that, globally. > Regarding the codes, there are many todo and fixme comments so it > seems to me that your recent work is still in-progress. What is the > current status? Can I start reviewing the code or should I wait for a > while until your recent work completes? Well, it's going to be a bit of a mess until I can demonstrate it working (and working well) with bitmap heap scan. Fixing that now is just going to create conflicts. I do have a couple small older patches laying around that were quick experiments -- I think at least some of them should give a performance boost in loading speed, but haven't had time to test. Would you like to take a look? -- John Naylor EDB: http://www.enterprisedb.com
Re: Parallel CREATE INDEX for BRIN indexes
On 7/5/23 10:44, Matthias van de Meent wrote: > On Wed, 5 Jul 2023 at 00:08, Tomas Vondra > wrote: >> >> >> >> On 7/4/23 23:53, Matthias van de Meent wrote: >>> On Thu, 8 Jun 2023 at 14:55, Tomas Vondra >>> wrote: Hi, Here's a WIP patch allowing parallel CREATE INDEX for BRIN indexes. The infrastructure (starting workers etc.) is "inspired" by the BTREE code (i.e. copied from that and massaged a bit to call brin stuff). >>> >>> Nice work. >>> In both cases _brin_end_parallel then reads the summaries from worker files, and adds them into the index. In 0001 this is fairly simple, although we could do one more improvement and sort the ranges by range start to make the index nicer (and possibly a bit more efficient). This should be simple, because the per-worker results are already sorted like that (so a merge sort in _brin_end_parallel would be enough). >>> >>> I see that you manually built the passing and sorting of tuples >>> between workers, but can't we use the parallel tuplesort >>> infrastructure for that? It already has similar features in place and >>> improves code commonality. >>> >> >> Maybe. I wasn't that familiar with what parallel tuplesort can and can't >> do, and the little I knew I managed to forget since I wrote this patch. >> Which similar features do you have in mind? >> >> The workers are producing the results in "start_block" order, so if they >> pass that to the leader, it probably can do the usual merge sort. >> For 0002 it's a bit more complicated, because with a single parallel scan brinbuildCallbackParallel can't decide if a range is assigned to a different worker or empty. And we want to generate summaries for empty ranges in the index. We could either skip such range during index build, and then add empty summaries in _brin_end_parallel (if needed), or add them and then merge them using "union". I just realized there's a third option to do this - we could just do regular parallel scan (with no particular regard to pagesPerRange), and then do "union" when merging results from workers. It doesn't require the sequence of TID scans, and the union would also handle the empty ranges. The per-worker results might be much larger, though, because each worker might produce up to the "full" BRIN index. >>> >>> Would it be too much effort to add a 'min_chunk_size' argument to >>> table_beginscan_parallel (or ParallelTableScanDesc) that defines the >>> minimum granularity of block ranges to be assigned to each process? I >>> think that would be the most elegant solution that would require >>> relatively little effort: table_block_parallelscan_nextpage already >>> does parallel management of multiple chunk sizes, and I think this >>> modification would fit quite well in that code. >>> >> >> I'm confused. Isn't that pretty much exactly what 0002 does? I mean, >> that passes pagesPerRange to table_parallelscan_initialize(), so that >> each pagesPerRange is assigned to a single worker. > > Huh, I overlooked that one... Sorry for that. > >> The trouble I described above is that the scan returns tuples, and the >> consumer has no idea what was the chunk size or how many other workers >> are there. Imagine you get a tuple from block 1, and then a tuple from >> block 1000. Does that mean that the blocks in between are empty or that >> they were processed by some other worker? 
> > If the unit of work for parallel table scans is the index's > pages_per_range, then I think we can just fill in expected-but-missing > ranges as 'empty' in the parallel leader during index loading, like > the first of the two solutions you proposed. > Right, I think that's the right solution. Or rather the only solution, because the other idea (generating the empty ranges in workers) relies on the workers knowing when to generate that. But I don't think the workers have the necessary information. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
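As a standalone toy sketch of that merge step (not actual PostgreSQL code; all names are invented): the leader walks the per-worker summaries in range-start order and emits an empty summary for every range no worker produced.

    #include <stdio.h>
    #include <stdint.h>

    static void
    merge_ranges(const uint32_t *worker_starts, int n,
                 uint32_t pages_per_range, uint32_t table_nblocks)
    {
        uint32_t expected = 0;

        for (int i = 0; i < n; i++)
        {
            /* ranges skipped by all workers become empty summaries */
            while (expected < worker_starts[i])
            {
                printf("range %u: empty\n", expected);
                expected += pages_per_range;
            }
            printf("range %u: summary from a worker\n", worker_starts[i]);
            expected = worker_starts[i] + pages_per_range;
        }
        /* trailing ranges at the end of the table */
        while (expected < table_nblocks)
        {
            printf("range %u: empty\n", expected);
            expected += pages_per_range;
        }
    }

    int
    main(void)
    {
        /* workers produced summaries for ranges starting at these blocks */
        const uint32_t starts[] = {0, 128, 512};

        merge_ranges(starts, 3, 128, 1024);
        return 0;
    }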
Re: pg_upgrade and cross-library upgrades
Alvaro Herrera writes: > 002_pg_upgrade.pl can test for presence or absence of pgcrypto by > grepping pg_config --configure for --with-ssl or --with-openssl. If the > old cluster has it but the new doesn't, we must drop the > contrib_regression_pgcrypto database. Hmm, but you'd also need code to handle meson builds no? Could it be easier to look for the pgcrypto library in the new installation? Not entirely sure this is worth the effort. regards, tom lane
Re: Disabling Heap-Only Tuples
On Wed, 5 Jul 2023 at 11:57, Matthias van de Meent wrote: > > On Wed, 5 Jul 2023 at 12:45, Thom Brown wrote: > > Heap-Only Tuple (HOT) updates are a significant performance > > enhancement, as they prevent unnecessary page writes. However, HOT > > comes with a caveat: it means that if we have lots of available space > > earlier on in the relation, it can only be used for new tuples or in > > cases where there's insufficient space on a page for an UPDATE to use > > HOT. > > > > This mechanism limits our options for condensing tables, forcing us to > > resort to methods like running VACUUM FULL/CLUSTER or using external > > tools like pg_repack. These either require exclusive locks (which will > > be a deal-breaker on large tables on a production system), or there's > > risks involved. Of course we can always flood pages with new versions > > of a row until it's forced onto an early page, but that shouldn't be > > necessary. > > > > Considering these trade-offs, I'd like to propose an option to allow > > superusers to disable HOT on tables. The intent is to trade some > > performance benefits for the ability to reduce the size of a table > > without the typical locking associated with it. > > Interesting use case, but I think that disabling HOT would be missing > the forest for the trees. I think that a feature that disables > block-local updates for pages > some offset would be a better solution > to your issue: Normal updates also prefer the new tuple to be stored > in the same pages as the old tuple if at all possible, so disabling > HOT wouldn't solve the issue of tuples residing in the tail of your > table - at least not while there is still empty space in those pages. Hmm... I see your point. It's when an UPDATE isn't going to land on the same page that it relocates to the earlier available page. So I guess I'm after whatever mechanism would allow that to happen reliably and predictably. So $subject should really be "Allow forcing UPDATEs off the same page". Thom
Re: Disabling Heap-Only Tuples
On Wed, 5 Jul 2023 at 12:45, Thom Brown wrote: > Heap-Only Tuple (HOT) updates are a significant performance > enhancement, as they prevent unnecessary page writes. However, HOT > comes with a caveat: it means that if we have lots of available space > earlier on in the relation, it can only be used for new tuples or in > cases where there's insufficient space on a page for an UPDATE to use > HOT. > > This mechanism limits our options for condensing tables, forcing us to > resort to methods like running VACUUM FULL/CLUSTER or using external > tools like pg_repack. These either require exclusive locks (which will > be a deal-breaker on large tables on a production system), or there's > risks involved. Of course we can always flood pages with new versions > of a row until it's forced onto an early page, but that shouldn't be > necessary. > > Considering these trade-offs, I'd like to propose an option to allow > superusers to disable HOT on tables. The intent is to trade some > performance benefits for the ability to reduce the size of a table > without the typical locking associated with it. Interesting use case, but I think that disabling HOT would be missing the forest for the trees. I think that a feature that disables block-local updates for pages > some offset would be a better solution to your issue: Normal updates also prefer the new tuple to be stored in the same pages as the old tuple if at all possible, so disabling HOT wouldn't solve the issue of tuples residing in the tail of your table - at least not while there is still empty space in those pages. Kind regards, Matthias van de Meent Neon (https://neon.tech/)
Re: logicalrep_message_type throws an error
On 2023-Jul-05, Amit Kapila wrote: > I think after returning "???" from logicalrep_message_type(), the > above is possible when we get the error: "invalid logical replication > message type "X"" from apply_dispatch(), right? If so, then what about > the case when we forgot to handle some message in > logicalrep_message_type() but handled it in apply_dispatch()? Isn't it > better to return the 'action' from the function > logicalrep_message_type() for unknown type? That way the information > could be a bit better and we can easily catch the code bug as well. Are you suggesting that logicalrep_message_type should include the numerical value of 'action' in the ??? message? Something like this: ERROR: invalid logical replication message type "X" CONTEXT: processing remote data for replication origin "pg_16638" during message type "??? (123)" in transaction 796, finished at 0/16266F8 I don't see why not -- seems easy enough, and might help somebody. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ Tom: There seems to be something broken here. Teodor: I'm in sackcloth and ashes... Fixed. http://postgr.es/m/482d1632.8010...@sigaev.ru
Disabling Heap-Only Tuples
Hi, Heap-Only Tuple (HOT) updates are a significant performance enhancement, as they prevent unnecessary page writes. However, HOT comes with a caveat: it means that if we have lots of available space earlier on in the relation, it can only be used for new tuples or in cases where there's insufficient space on a page for an UPDATE to use HOT. This mechanism limits our options for condensing tables, forcing us to resort to methods like running VACUUM FULL/CLUSTER or using external tools like pg_repack. These either require exclusive locks (which will be a deal-breaker on large tables on a production system), or there's risks involved. Of course we can always flood pages with new versions of a row until it's forced onto an early page, but that shouldn't be necessary. Considering these trade-offs, I'd like to propose an option to allow superusers to disable HOT on tables. The intent is to trade some performance benefits for the ability to reduce the size of a table without the typical locking associated with it. This feature could be used to shrink tables in one of two ways: temporarily disabling HOT until DML operations have compacted the data into a smaller area, or performing a mass update on later rows to relocate them to an earlier location, probably in stages. Of course, this would need to be used in conjunction with a VACUUM operation. Admittedly this isn't ideal, and it would be better if we had an operation that could do this (e.g. VACUUM COMPACT ), or an option that causes some operations to avoid HOT when it detects an amount of free space over a threshold, but in lieu of those, I thought this would at least allow users to help themselves when running into disk space issues. Thoughts? Thom
Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.
On Wed, Jul 5, 2023 at 12:02 PM Masahiko Sawada wrote: > > On Wed, Jul 5, 2023 at 2:46 PM Amit Kapila wrote: > > > > On Wed, Jul 5, 2023 at 9:01 AM Peter Smith wrote: > > > > > > Hi. Here are some review comments for this patch. > > > > > > +1 for the patch idea. > > > > > > -- > > > > > > I wasn't sure about the code comment adjustments suggested by Amit [1]: > > > "Accordingly, the comments atop build_replindex_scan_key(), > > > FindUsableIndexForReplicaIdentityFull(), IsIndexOnlyOnExpression() > > > should also be adjusted." > > As for IsIndexOnlyOnExpression(), what part do you think we need to > adjust? It says: > > /* > * Returns true if the given index consists only of expressions such as: > * CREATE INDEX idx ON table(foo(col)); > * > * Returns false even if there is one column reference: > * CREATE INDEX idx ON table(foo(col), col_2); > */ > > and it seems to me that the function doesn't check if the leftmost > index column is a non-expression. > Right, so, we can leave this as is. -- With Regards, Amit Kapila.
Re: Missing llvm_leave_fatal_on_oom() call
On 04/07/2023 19:33, Daniel Gustafsson wrote: On 21 Feb 2023, at 15:50, Heikki Linnakangas wrote: llvm_release_context() calls llvm_enter_fatal_on_oom(), but it never calls llvm_leave_fatal_on_oom(). Isn't that a clear leak? Not sure how much of a leak it is since IIUC LLVM just stores a function pointer to our error handler, but I can't see a reason not clean it up here. The attached fix LGTM and passes make check with jit_above_cost set to zero. Pushed to all live branches, thanks for the review! -- Heikki Linnakangas Neon (https://neon.tech)
Re: brininsert optimization opportunity
On 7/5/23 02:35, Soumyadeep Chakraborty wrote: > On Tue, Jul 4, 2023 at 2:54 PM Tomas Vondra > wrote: > >> I'm not sure if memory context callbacks are the right way to rely on >> for this purpose. The primary purpose of memory contexts is to track >> memory, so using them for this seems a bit weird. > > Yeah, this just kept getting dirtier and dirtier. > >> There are cases that do something similar, like expandendrecord.c where >> we track refcounted tuple slot, but IMHO there's a big difference >> between tracking one slot allocated right there, and unknown number of >> buffers allocated much later. > > Yeah, the following code in ER_mc_callbackis is there I think to prevent > double > freeing the tupdesc (since it might be freed in > ResourceOwnerReleaseInternal()) > (The part about: /* Ditto for tupdesc references */). > > if (tupdesc->tdrefcount > 0) > { > if (--tupdesc->tdrefcount == 0) > FreeTupleDesc(tupdesc); > } > Plus the above code doesn't try anything with Resource owner stuff, whereas > releasing a buffer means: > ReleaseBuffer() -> UnpinBuffer() -> > ResourceOwnerForgetBuffer(CurrentResourceOwner, b); > Well, my understanding is the expandedrecord.c code increments the refcount exactly to prevent the resource owner to release the slot too early. My assumption is we'd need to do something similar for the revmap buffers by calling IncrBufferRefCount or something. But that's going to be messy, because the buffers are read elsewhere. >> The fact that even with the extra context is still doesn't handle query >> cancellations is another argument against that approach (I wonder how >> expandedrecord.c handles that, but I haven't checked). >> >>> >>> Maybe there is a better way of doing our cleanup? I'm not sure. Would >>> love your input! >>> >>> The other alternative for all this is to introduce new AM callbacks for >>> insert_begin and insert_end. That might be a tougher sell? >>> >> >> That's the approach I wanted to suggest, more or less - to do the >> cleanup from ExecCloseIndices() before index_close(). I wonder if it's >> even correct to do that later, once we release the locks etc. > > I'll try this out and introduce a couple of new index AM callbacks. I > think it's best to do it before releasing the locks - otherwise it > might be weird > to manipulate buffers of an index relation, without having some sort of lock > on > it. I'll think about it some more. > I don't understand why would this need more than just a callback to release the cache. >> I don't think ii_AmCache was intended for stuff like this - GIN and GiST >> only use it to cache stuff that can be just pfree-d, but for buffers >> that's no enough. It's not surprising we need to improve this. > > Hmmm, yes, although the docs state: > "If the index AM wishes to cache data across successive index insertions > within > an SQL statement, it can allocate space in indexInfo->ii_Context and > store a pointer > to the data in indexInfo->ii_AmCache (which will be NULL initially)." > they don't mention anything about buffer usage. Well we will fix it! > > PS: It should be possible to make GIN and GiST use the new index AM APIs > as well. > Why should GIN/GiST use the new API? I think it's perfectly sensible to only require the "cleanup callback" when just pfree() is not enough. >> FWIW while debugging this (breakpoint on MemoryContextDelete) I was >> rather annoyed the COPY keeps dropping and recreating the two BRIN >> contexts - brininsert cxt / brin dtuple. 
I wonder if we could keep and >> reuse those too, but I don't know how much it'd help. >> > > Interesting, I will investigate that. > >>> Now, to finally answer your question about the speedup without >>> generate_series(). We do see an even higher speedup! >>> >>> seq 1 2 > /tmp/data.csv >>> \timing >>> DROP TABLE heap; >>> CREATE TABLE heap(i int); >>> CREATE INDEX ON heap USING brin(i) WITH (pages_per_range=1); >>> COPY heap FROM '/tmp/data.csv'; >>> >>> -- 3 runs (master 29cf61ade3f245aa40f427a1d6345287ef77e622) >>> COPY 2 >>> Time: 205072.444 ms (03:25.072) >>> Time: 215380.369 ms (03:35.380) >>> Time: 203492.347 ms (03:23.492) >>> >>> -- 3 runs (branch v2) >>> >>> COPY 2 >>> Time: 135052.752 ms (02:15.053) >>> Time: 135093.131 ms (02:15.093) >>> Time: 138737.048 ms (02:18.737) >>> >> >> That's nice, but it still doesn't say how much of that is reading the >> data. If you do just copy into a table without any indexes, how long >> does it take? > > So, I loaded the same heap table without any indexes and at the same > scale. I got: > > postgres=# COPY heap FROM '/tmp/data.csv'; > COPY 2 > Time: 116161.545 ms (01:56.162) > Time: 114182.745 ms (01:54.183) > Time: 114975.368 ms (01:54.975) > OK, so the baseline is 115 seconds. With the current code, an index means +100 seconds. With the optimization, it's just +20. Nice. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreS
Re: Prevent psql \watch from running queries that return no rows
> On 4 Jun 2023, at 20:55, Greg Sabino Mullane wrote: > > On Sat, Jun 3, 2023 at 5:58 PM Michael Paquier wrote: > > Wouldn't something like a target_rows be more flexible? You could use > this parameter with a target number of rows to expect, zero being one > choice in that. > > Thank you! That does feel better to me. Please see attached a new v2 patch > that uses > a min_rows=X syntax (defaults to 0). Also added some help.c changes. This is a feature I've wanted on several occasions so definitely a +1 on this suggestion. I've yet to test it out and do a full review, but a few comments from skimming the patch: - bool is_watch, + bool is_watch, int min_rows, The comment on ExecQueryAndProcessResults() needs to be updated with an explanation of what this parameter is. - return cancel_pressed ? 0 : success ? 1 : -1; + return (cancel_pressed || return_early) ? 0 : success ? 1 : -1; I think this is getting tangled up enough that it should be replaced with separate if() statements for the various cases. - HELP0(" \\watch [[i=]SEC] [c=N] execute query every SEC seconds, up to N times\n"); + HELP0(" \\watch [[i=]SEC] [c=N] [m=ROW]\n"); + HELP0(" execute query every SEC seconds, up to N times\n"); + HELP0(" stop if less than ROW minimum rows are rerturned\n"); "less than ROW minimum rows" reads a bit awkward IMO, how about calling it [m=MIN] and describe as "stop if less than MIN rows are returned"? Also, there is a typo: s/rerturned/returned/. -- Daniel Gustafsson
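For example, with the syntax under review, the following would re-run the query every two seconds and stop as soon as it returns no rows (i.e. once no other backend is active):

    SELECT pid, query
      FROM pg_stat_activity
     WHERE state = 'active' AND pid <> pg_backend_pid();
    \watch i=2 c=100 m=1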
Re: Speed up transaction completion faster after many relations are accessed in a transaction
On 10/02/2023 04:51, David Rowley wrote: I've attached another set of patches. I do need to spend longer looking at this. I'm mainly attaching these as CI seems to be highlighting a problem that I'm unable to recreate locally and I wanted to see if the attached fixes it. I like this patch's approach. index 296dc82d2ee..edb8b6026e5 100644 --- a/src/backend/commands/discard.c +++ b/src/backend/commands/discard.c @@ -71,7 +71,7 @@ DiscardAll(bool isTopLevel) Async_UnlistenAll(); - LockReleaseAll(USER_LOCKMETHOD, true); + LockReleaseSession(USER_LOCKMETHOD); ResetPlanCache(); This assumes that there are no transaction-level advisory locks. I think that's OK. It took me a while to convince myself of that, though. I think we need a high level comment somewhere that explains what assumptions we make on which locks can be held in session mode and which in transaction mode. @@ -3224,14 +3206,6 @@ PostPrepare_Locks(TransactionId xid) Assert(lock->nGranted <= lock->nRequested); Assert((proclock->holdMask & ~lock->grantMask) == 0); - /* Ignore it if nothing to release (must be a session lock) */ - if (proclock->releaseMask == 0) - continue; - - /* Else we should be releasing all locks */ - if (proclock->releaseMask != proclock->holdMask) - elog(PANIC, "we seem to have dropped a bit somewhere"); - /* * We cannot simply modify proclock->tag.myProc to reassign * ownership of the lock, because that's part of the hash key and This looks wrong. If you prepare a transaction that is holding any session locks, we will now transfer them to the prepared transaction. And its locallock entry will be out of sync. To fix, I think we could keep around the hash table that CheckForSessionAndXactLocks() builds, and use that here. -- Heikki Linnakangas Neon (https://neon.tech)
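To spell out the hazard with a concrete (illustrative) sequence: a session-level advisory lock acquired before PREPARE must remain with the session rather than being transferred to the prepared transaction.

    SELECT pg_advisory_lock(12345);     -- session-level lock
    BEGIN;
    -- ... some two-phase work ...
    PREPARE TRANSACTION 'demo';         -- requires max_prepared_transactions > 0
    -- the advisory lock must still be held by this session, not by 'demo'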
CHECK Constraint Deferrable
Hi, Currently, there is no support for CHECK constraint DEFERRABLE in a create table statement. SQL standard specifies that CHECK constraint can be defined as DEFERRABLE. The attached patch is having implementation for CHECK constraint Deferrable as below: ‘postgres[579453]=#’CREATE TABLE t1 (i int CHECK(i<>0) DEFERRABLE, t text); CREATE TABLE ‘postgres[579453]=#’\d t1 Table "public.t1" Column | Type | Collation | Nullable | Default +-+---+--+- i | integer | | | t | text| | | Check constraints: "t1_i_check" CHECK (i <> 0) DEFERRABLE Now we can have a deferrable CHECK constraint, and we can defer the constraint validation: ‘postgres[579453]=#’BEGIN; BEGIN ‘postgres[579453]=#*’SET CONSTRAINTS t1_i_check DEFERRED; SET CONSTRAINTS ‘postgres[579453]=#*’INSERT INTO t1 VALUES (0, 'one'); -- should succeed INSERT 0 1 ‘postgres[579453]=#*’UPDATE t1 SET i = 1 WHERE t = 'one'; UPDATE 1 ‘postgres[579453]=#*’COMMIT; -- should succeed COMMIT Attaching the initial patch, I will improve it with documentation in my next version of the patch. thoughts? -- Regards, Himanshu Upadhyaya EnterpriseDB: http://www.enterprisedb.com From 1eb72b14a3a6914e893854508a071ae835d23245 Mon Sep 17 00:00:00 2001 From: Himanshu Upadhyaya Date: Wed, 5 Jul 2023 14:54:37 +0530 Subject: [PATCH v1] Implementation of "CHECK Constraint" to make it Deferrable. --- src/backend/access/common/tupdesc.c | 2 + src/backend/catalog/heap.c| 50 -- src/backend/commands/constraint.c | 116 ++ src/backend/commands/copyfrom.c | 10 +- src/backend/commands/tablecmds.c | 8 ++ src/backend/commands/trigger.c| 43 ++-- src/backend/executor/execMain.c | 33 +- src/backend/executor/execReplication.c| 10 +- src/backend/executor/nodeModifyTable.c| 29 +++--- src/backend/parser/gram.y | 2 +- src/backend/parser/parse_utilcmd.c| 9 +- src/backend/utils/cache/relcache.c| 2 + src/include/access/tupdesc.h | 2 + src/include/catalog/heap.h| 2 + src/include/catalog/pg_proc.dat | 5 + src/include/commands/trigger.h| 2 + src/include/executor/executor.h | 42 +++- src/test/regress/expected/constraints.out | 98 ++ src/test/regress/sql/constraints.sql | 99 ++ 19 files changed, 518 insertions(+), 46 deletions(-) diff --git a/src/backend/access/common/tupdesc.c b/src/backend/access/common/tupdesc.c index 7c5c390503..098cb27932 100644 --- a/src/backend/access/common/tupdesc.c +++ b/src/backend/access/common/tupdesc.c @@ -204,6 +204,8 @@ CreateTupleDescCopyConstr(TupleDesc tupdesc) cpy->check[i].ccbin = pstrdup(constr->check[i].ccbin); cpy->check[i].ccvalid = constr->check[i].ccvalid; cpy->check[i].ccnoinherit = constr->check[i].ccnoinherit; +cpy->check[i].ccdeferrable = constr->check[i].ccdeferrable; +cpy->check[i].ccdeferred = constr->check[i].ccdeferred; } } diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c index 2a0d82aedd..8a1e8e266f 100644 --- a/src/backend/catalog/heap.c +++ b/src/backend/catalog/heap.c @@ -52,17 +52,20 @@ #include "catalog/pg_statistic.h" #include "catalog/pg_subscription_rel.h" #include "catalog/pg_tablespace.h" +#include "catalog/pg_trigger.h" #include "catalog/pg_type.h" #include "catalog/storage.h" #include "commands/tablecmds.h" #include "commands/typecmds.h" #include "miscadmin.h" +#include "nodes/makefuncs.h" #include "nodes/nodeFuncs.h" #include "optimizer/optimizer.h" #include "parser/parse_coerce.h" #include "parser/parse_collate.h" #include "parser/parse_expr.h" #include "parser/parse_relation.h" +#include "parser/parser.h" #include "parser/parsetree.h" #include "partitioning/partdesc.h" #include "pgstat.h" @@ -102,7 +105,8 @@ 
static ObjectAddress AddNewRelationType(const char *typeName, static void RelationRemoveInheritance(Oid relid); static Oid StoreRelCheck(Relation rel, const char *ccname, Node *expr, bool is_validated, bool is_local, int inhcount, - bool is_no_inherit, bool is_internal); + bool is_no_inherit, bool is_internal, + bool is_deferrable, bool initdeferred); static void StoreConstraints(Relation rel, List *cooked_constraints, bool is_internal); static bool MergeWithExistingConstraint(Relation rel, const char *ccname, Node *expr, @@ -2063,13 +2067,15 @@ SetAttrMissing(Oid relid, char *attname, char *value) static Oid StoreRelCheck(Relation rel, const char *ccname, Node *expr, bool is_validated, bool is_local, int inhcount, - bool is_no_inherit, bool is_internal) + bool is_no_inherit, bool is_internal, + bool is_deferrable, bool initdeferred) { char *ccbin; List *varList; int keycount
Re: pg_waldump: add test for coverage
On 29.06.23 21:16, Tristen Raab wrote: I've reviewed your latest v3 patches on Ubuntu 23.04. Both patches apply correctly and all the tests run and pass as they should. Execution time was normal for me, I didn't notice any significant latency when compared to other tests. The only other feedback I can provide would be to add test coverage to some of the other options that aren't currently covered (ie. --bkp-details, --end, --follow, --path, etc.) for completeness. Other than that, this looks like a great patch. Committed. I added a test for the --quiet option. --end and --path are covered. The only options not covered now are -b, --bkp-details output detailed information about backup blocks -f, --follow keep retrying after reaching end of WAL -t, --timeline=TLI timeline from which to read WAL records -x, --xid=XID only show records with transaction ID XID --follow is a bit tricky to test because you need to leave pg_waldump running in the background for a while, or something like that. --timeline and --xid can be tested but would need some work on the underlying test data (such as creating more than one timeline). I don't know much about --bkp-details, so I don't have a good idea how to test it. So I'll leave those as projects for the future.
Re: pg_decode_message vs skip_empty_xacts and xact_wrote_changes
On Wed, Jul 5, 2023 at 2:28 PM Amit Kapila wrote: > > On Mon, Jul 3, 2023 at 4:49 PM vignesh C wrote: > > > > +1 for the first version patch, I also felt the first version is > > easily understandable. > > > > Okay, please find the slightly updated version (changed a comment and > commit message). Unless there are more comments, I'll push this in a > day or two. > oops, forgot to attach the patch. -- With Regards, Amit Kapila. v3-0001-Add-BEGIN-COMMIT-for-transactional-messages-durin.patch Description: Binary data
Re: pg_decode_message vs skip_empty_xacts and xact_wrote_changes
On Mon, Jul 3, 2023 at 4:49 PM vignesh C wrote: > > +1 for the first version patch, I also felt the first version is > easily understandable. > Okay, please find the slightly updated version (changed a comment and commit message). Unless there are more comments, I'll push this in a day or two. -- With Regards, Amit Kapila.
Re: pg_upgrade and cross-library upgrades
On 2023-Jul-05, Michael Paquier wrote: > The same thing as HEAD could be done on its back-branches by removing > --with-openssl and bring more consistency, but pg_upgrade has never > been good at handling upgrades with different library requirements. > Something I am wondering is if AdjustUpgrade.pm could gain more > knowledge in this area, though I am unsure how much could be achieved > as this module has only object-level knowledge. Hmm, let's explore the AdjustUpgrade.pm path a bit more: 002_pg_upgrade.pl can test for presence or absence of pgcrypto by grepping pg_config --configure for --with-ssl or --with-openssl. If the old cluster has it but the new doesn't, we must drop the contrib_regression_pgcrypto database. I think we would need a new function in AdjustUpgrade.pm (or an API change to adjust_database_contents) so that we can add the DROP DATABASE command conditionally. This seems easily extended to contrib/xml2 also. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "La rebeldía es la virtud original del hombre" (Arthur Schopenhauer)
Re: Parallel CREATE INDEX for BRIN indexes
On Wed, 5 Jul 2023 at 00:08, Tomas Vondra wrote: > > > > On 7/4/23 23:53, Matthias van de Meent wrote: > > On Thu, 8 Jun 2023 at 14:55, Tomas Vondra > > wrote: > >> > >> Hi, > >> > >> Here's a WIP patch allowing parallel CREATE INDEX for BRIN indexes. The > >> infrastructure (starting workers etc.) is "inspired" by the BTREE code > >> (i.e. copied from that and massaged a bit to call brin stuff). > > > > Nice work. > > > >> In both cases _brin_end_parallel then reads the summaries from worker > >> files, and adds them into the index. In 0001 this is fairly simple, > >> although we could do one more improvement and sort the ranges by range > >> start to make the index nicer (and possibly a bit more efficient). This > >> should be simple, because the per-worker results are already sorted like > >> that (so a merge sort in _brin_end_parallel would be enough). > > > > I see that you manually built the passing and sorting of tuples > > between workers, but can't we use the parallel tuplesort > > infrastructure for that? It already has similar features in place and > > improves code commonality. > > > > Maybe. I wasn't that familiar with what parallel tuplesort can and can't > do, and the little I knew I managed to forget since I wrote this patch. > Which similar features do you have in mind? > > The workers are producing the results in "start_block" order, so if they > pass that to the leader, it probably can do the usual merge sort. > > >> For 0002 it's a bit more complicated, because with a single parallel > >> scan brinbuildCallbackParallel can't decide if a range is assigned to a > >> different worker or empty. And we want to generate summaries for empty > >> ranges in the index. We could either skip such range during index build, > >> and then add empty summaries in _brin_end_parallel (if needed), or add > >> them and then merge them using "union". > >> > >> > >> I just realized there's a third option to do this - we could just do > >> regular parallel scan (with no particular regard to pagesPerRange), and > >> then do "union" when merging results from workers. It doesn't require > >> the sequence of TID scans, and the union would also handle the empty > >> ranges. The per-worker results might be much larger, though, because > >> each worker might produce up to the "full" BRIN index. > > > > Would it be too much effort to add a 'min_chunk_size' argument to > > table_beginscan_parallel (or ParallelTableScanDesc) that defines the > > minimum granularity of block ranges to be assigned to each process? I > > think that would be the most elegant solution that would require > > relatively little effort: table_block_parallelscan_nextpage already > > does parallel management of multiple chunk sizes, and I think this > > modification would fit quite well in that code. > > > > I'm confused. Isn't that pretty much exactly what 0002 does? I mean, > that passes pagesPerRange to table_parallelscan_initialize(), so that > each pagesPerRange is assigned to a single worker. Huh, I overlooked that one... Sorry for that. > The trouble I described above is that the scan returns tuples, and the > consumer has no idea what was the chunk size or how many other workers > are there. Imagine you get a tuple from block 1, and then a tuple from > block 1000. Does that mean that the blocks in between are empty or that > they were processed by some other worker? 
If the unit of work for parallel table scans is the index's pages_per_range, then I think we can just fill in expected-but-missing ranges as 'empty' in the parallel leader during index loading, like the first of the two solutions you proposed. Kind regards, Matthias van de Meent Neon (https://neon.tech/)
Re: Wrong syntax in feature description
> On 4 Jun 2023, at 18:48, Amit Kapila wrote: > > On Fri, Jun 2, 2023 at 7:01 PM Peter Smith wrote: >> >> Hi, I noticed a feature description [1] referring to a command example: >> >> CREATE PUBLICATION ... FOR ALL TABLES IN SCHEMA >> >> ~~ >> >> AFAIK that should say "FOR TABLES IN SCHEMA" (without the "ALL", see [2]) > > Right, this should be changed. Agreed, so I've fixed this in the featurematrix on the site. I will mark this CF entry as committed even though there is nothing to commit (the featurematrix is stored in the postgresql.org django instance) since there was a change performed. Thanks for the report! -- Daniel Gustafsson
Re: evtcache: EventTriggerCache vs Event Trigger Cache
> On 8 May 2023, at 10:39, Daniel Gustafsson wrote: >> On 4 May 2023, at 14:18, Daniel Gustafsson wrote: >>> On 4 May 2023, at 14:09, Tom Lane wrote: >>> How about naming >>> the hash "EventTriggerCacheHash" or so? >> >> I think the level is the indicator here, but I have no strong opinions, >> EventTriggerCacheHash is fine by me. > > The attached trivial diff does that, parking this in the next CF. Pushed, thanks! -- Daniel Gustafsson
Re: [PATCH] Add loongarch native checksum implementation.
On Wed, Jul 5, 2023 at 9:16 AM YANG Xudong wrote: > > Is there any other comment? It's only been a few weeks since the last patch, and this is not an urgent bugfix, so there is no reason to ping the thread. Feature freeze will likely be in April of next year. Also, please don't top-post (which means: quoting an entire message, with new text at the top) -- it clutters our archives. Before I look at this again: Are there any objections to another CRC implementation for the reason of having no buildfarm member? -- John Naylor EDB: http://www.enterprisedb.com
Re: On /*----- comments
On 04/07/2023 21:36, Tom Lane wrote: Heikki Linnakangas writes: Pushed a patch to remove the end-guard from the example in the pgindent README. And fixed the bogus end-guard in walsender.c. I don't see any actual push? Forgot it after all. Pushed now, thanks. -- Heikki Linnakangas Neon (https://neon.tech)