Re: Doc limitation update proposal: include out-of-line OID usage per TOAST-ed columns
On Wed, Apr 26, 2023 at 4:48 AM David Rowley wrote:
>
> On Sun, 23 Apr 2023, 3:42 am Gurjeet Singh, wrote:
>>
>> I anticipate that edits to Appendix K Postgres Limits will prompt
>> improving the note in there about the maximum column limit. That note
>> is too wordy, and sometimes confusing, especially for the audience
>> that it's written for: newcomers to the Postgres ecosystem.
>
> I doubt it, but feel free to submit a patch yourself which improves it
> without losing the information which the paragraph is trying to convey.

I could not think of a way to reduce the wordiness without losing information. But since this page is usually consulted by those who are new to Postgres, usually sent here by a search engine, I believe the page can be improved for that audience without losing much in terms of accuracy.

I agree the information provided in the paragraph about max-columns is pertinent. But since the limits section is most often consulted by people migrating from other database systems (hence the claim that they're new to the Postgres ecosystem), I imagine the terminology used there may cause confusion for the reader. So my suggestion is to make that paragraph, and perhaps even that page, use fewer hacker/internals terms.

Technically, there may be a difference between table vs. relation, row vs. tuple, and column vs. field. But using those terms seemingly interchangeably on that page does not help the reader. The page neither describes the terms nor links to their definitions, so a reader is left with more questions than before. For example:

> rows per table: limited by the number of tuples that can fit onto
> 4,294,967,295 pages

A newcomer might ask: what's a tuple in this context, and how is it similar to, or different from, a row?

Please see attached the proposed patch, which attempts to make that language newcomer-friendly. The patch adds one link for TOAST, and replaces Postgres-specific terms with generic ones.
PS: I've retained line boundaries, so that `git diff --color-words doc/src/sgml/limits.sgml` would make it easy to see the changes. Best regards, Gurjeet http://Gurje.et limits-generic-terms.diff Description: Binary data
Re: Fix pg_stat_reset_single_table_counters function
On 2023-08-21 13:34, Michael Paquier wrote: On Mon, Aug 21, 2023 at 11:33:28AM +0900, Kyotaro Horiguchi wrote: I'm not sure about others, but I would avoid using the name "current_database" for the variable. Agreed. Looking at the surroundings, we have for example "datname" in the collation tests so I have just used that, and applied the patch down to 15. Thanks! Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Re: Fix pg_stat_reset_single_table_counters function
On Mon, Aug 21, 2023 at 11:33:28AM +0900, Kyotaro Horiguchi wrote: > I'm not sure about others, but I would avoid using the name > "current_database" for the variable. Agreed. Looking at the surroundings, we have for example "datname" in the collation tests so I have just used that, and applied the patch down to 15. -- Michael signature.asc Description: PGP signature
Re: Fix pg_stat_reset_single_table_counters function
On 2023-08-21 11:33, Kyotaro Horiguchi wrote: On 2023/08/15 14:13, Masahiro Ikeda wrote: On 2023-08-15 11:48, Masahiko Sawada wrote: +COMMENT ON DATABASE :current_database IS 'This is a test comment'; -- insert or update in 'pg_shdescription' I think the current_database should be quoted (see other examples where using current_database(), e.g. collate.linux.utf8.sql). Also it would be better to reset the comment after the test. I'm not sure about others, but I would avoid using the name "current_database" for the variable. I would just use "database" or "db" instead. Thanks! I agree with your comment. I fixed it in the v5 patch. Regards, -- Masahiro Ikeda NTT DATA CORPORATION

From 5980c041e33fd180e43868cbdc878ac0cf93112e Mon Sep 17 00:00:00 2001 From: Masahiro Ikeda Date: Mon, 21 Aug 2023 13:24:27 +0900 Subject: [PATCH] Fix pg_stat_reset_single_table_counters function. This commit revives the feature to reset statistics for a single relation shared across all databases in the cluster to zero, which was implemented by the following commit. * Enhance pg_stat_reset_single_table_counters function(e04267844) The following commit accidentally deleted the feature. * pgstat: store statistics in shared memory(5891c7a8e) Need to backpatch from 15. 
Reported-by: Mitsuru Hinata --- src/backend/utils/adt/pgstatfuncs.c | 9 +-- src/test/regress/expected/stats.out | 42 + src/test/regress/sql/stats.sql | 26 ++ 3 files changed, 75 insertions(+), 2 deletions(-) diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c index 2a4c8ef87f..2b9742ad21 100644 --- a/src/backend/utils/adt/pgstatfuncs.c +++ b/src/backend/utils/adt/pgstatfuncs.c @@ -17,6 +17,7 @@ #include "access/htup_details.h" #include "access/xlog.h" #include "access/xlogprefetcher.h" +#include "catalog/catalog.h" #include "catalog/pg_authid.h" #include "catalog/pg_type.h" #include "common/ip.h" @@ -1776,13 +1777,17 @@ pg_stat_reset_shared(PG_FUNCTION_ARGS) PG_RETURN_VOID(); } -/* Reset a single counter in the current database */ +/* + * Reset a statistics for a single object, which may be of current + * database or shared across all databases in the cluster. + */ Datum pg_stat_reset_single_table_counters(PG_FUNCTION_ARGS) { Oid taboid = PG_GETARG_OID(0); + Oid dboid = (IsSharedRelation(taboid) ? InvalidOid : MyDatabaseId); - pgstat_reset(PGSTAT_KIND_RELATION, MyDatabaseId, taboid); + pgstat_reset(PGSTAT_KIND_RELATION, dboid, taboid); PG_RETURN_VOID(); } diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out index 319164a5e9..9985579e18 100644 --- a/src/test/regress/expected/stats.out +++ b/src/test/regress/expected/stats.out @@ -764,6 +764,48 @@ FROM pg_stat_all_tables WHERE relid = 'test_last_scan'::regclass; 2 | t |3 | t (1 row) +- +-- Test to reset stats for a table shared across all databases (ex. 
pg_shdescription) +- +-- store the old comment to reset +SELECT shobj_description(d.oid, 'pg_database') as description_before +FROM pg_database d WHERE datname = current_database() \gset +-- update the stats in pg_shdescription +BEGIN; +SELECT current_database() as db \gset +COMMENT ON DATABASE :"db" IS 'This is a test comment'; +SELECT pg_stat_force_next_flush(); + pg_stat_force_next_flush +-- + +(1 row) + +COMMIT; +-- check to reset the stats +SELECT n_tup_ins + n_tup_upd > 0 FROM pg_stat_all_tables WHERE relid = 'pg_shdescription'::regclass; + ?column? +-- + t +(1 row) + +SELECT pg_stat_reset_single_table_counters('pg_shdescription'::regclass); + pg_stat_reset_single_table_counters +- + +(1 row) + +SELECT n_tup_ins + n_tup_upd FROM pg_stat_all_tables WHERE relid = 'pg_shdescription'::regclass; + ?column? +-- +0 +(1 row) + +-- cleanup the comment +\if :{?description_before} + COMMENT ON DATABASE :"db" IS :'description_before'; +\else + COMMENT ON DATABASE :"db" IS NULL; +\endif - -- Test that various stats views are being properly populated - diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql index 9a16df1c49..5393b18faa 100644 --- a/src/test/regress/sql/stats.sql +++ b/src/test/regress/sql/stats.sql @@ -376,6 +376,32 @@ COMMIT; SELECT seq_scan, :'test_last_seq' = last_seq_scan AS seq_ok, idx_scan, :'test_last_idx' < last_idx_scan AS idx_ok FROM pg_stat_all_tables WHERE relid = 'test_last_scan'::regclass; +- +-- Test to reset stats for a table shared across all databases (ex. pg_shdescription) +- + +-- store the old comment to reset +SELECT shobj_description(d.oid, 'pg_database') as description_before +FROM pg_database d WHERE datname = current_database() \gset + +-- update the stats in pg_shdescription +BEGIN; +SELECT current_database() as db \gset +COMMENT ON DATABASE :"db" IS 'This is a test comment'; +SELECT pg_stat_force_next_flush(); +COMMI
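For readers following along, the core of the fix in this thread is the choice of database OID used to key the statistics entry: relations shared across all databases are keyed under InvalidOid rather than the current database's OID. A toy sketch of that dispatch (illustrative Python, not the C code; the SHARED_RELATIONS set is a hand-picked subset for this example):

```python
INVALID_OID = 0

# Illustrative subset of catalogs that are shared across databases.
SHARED_RELATIONS = {"pg_shdescription", "pg_database", "pg_authid"}

def stats_key(relname, my_database_oid):
    """Mimic the fixed key choice in pg_stat_reset_single_table_counters:
    shared catalogs are keyed under InvalidOid, everything else under the
    current database's OID."""
    dboid = INVALID_OID if relname in SHARED_RELATIONS else my_database_oid
    return (dboid, relname)
```

Before the fix, the reset always used the current database's OID, so entries for shared relations (keyed under InvalidOid) were never found and never reset.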
Re: Make --help output fit within 80 columns per line
On 2023-07-05 10:47, torikoshia wrote: Hi, As discussed in [1], outputs of --help for some commands fits into 80 columns per line, while others do not. Since it seems preferable to have consistent line break policy and some people use 80-column terminal, wouldn't it be better to make all commands in 80 columns per line? Attached patch which does this for src/bin commands. If this is the way to go, I'll do same things for contrib commands. [1] https://www.postgresql.org/message-id/3fe4af5a0a81fc6a2ec01cb484c0a487%40oss.nttdata.com Thanks for making the patches! I have some comments on the v1 patch. (1) Why don't you add a test for the purpose? It could be overkill... I thought the following function would be the best place. diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm index 617caa022f..1bdb81ac56 100644 --- a/src/test/perl/PostgreSQL/Test/Utils.pm +++ b/src/test/perl/PostgreSQL/Test/Utils.pm @@ -843,6 +843,10 @@ sub program_help_ok ok($result, "$cmd --help exit code 0"); isnt($stdout, '', "$cmd --help goes to stdout"); is($stderr, '', "$cmd --help nothing to stderr"); + foreach my $line (split /\n/, $stdout) + { + ok(length($line) <= 80, "$cmd --help output fit within 80 columns per line"); + } return; } (2) Is there any reason that only src/bin commands are targeted? I found that we also need to fix vacuumlo with the above test. I think it's better to fix it because it's a contrib module. $ vacuumlo --help | while IFS='' read line; do echo $((`echo $line | wc -m` - 1)) $line; done | sort -n -r | head -n 2 84 -n, --dry-run don't remove large objects, just show what would be done 74 -l, --limit=LIMIT commit after removing each LIMIT large objects (3) Is to delete '/mnt/server' intended? I thought it better to leave it as is since archive_cleanup_command example uses the absolute path. 
- " pg_archivecleanup /mnt/server/archiverdir 00010010.0020.backup\n")); + " pg_archivecleanup archiverdir 00010010.0020.backup\n")); I will confirm that the --help texts are not changed and only the line breaks are changed. But currently the above change breaks that. (4) I found that some binaries, for example ecpg, are not tested with program_help_ok(). Is it better to add tests in the patch? BTW, I checked the difference with the following commands # files include "--help" $ find -name "*.c" | xargs -I {} sh -c 'if [ `grep -e --help {} | wc -l` -gt 0 ]; then echo {}; fi' # programs which are tested with program_help_ok $ find -name "*.pl" | xargs -I {} sh -c 'grep -e program_help_ok {}' Regards, -- Masahiro Ikeda NTT DATA CORPORATION
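The 80-column rule being proposed for program_help_ok is easy to prototype outside the Perl test suite as well. Here is an illustrative Python sketch of the same line-length check (the helper name is made up for this example):

```python
def lines_over_limit(help_text, limit=80):
    """Return the lines of a --help output that are wider than `limit` columns."""
    return [line for line in help_text.splitlines() if len(line) > limit]

# An 84-column line (like the vacuumlo --dry-run line quoted above) is flagged,
# while a line of exactly 80 columns passes.
sample = "x" * 80 + "\n" + "y" * 84
over = lines_over_limit(sample)  # -> ["y" * 84]
```

The Perl version in the proposed diff applies exactly this predicate to each line of `$stdout` from `$cmd --help`.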
Support run-time partition pruning for hash join
If we have a hash join with an Append node on the outer side, something like Hash Join Hash Cond: (pt.a = t.a) -> Append -> Seq Scan on pt_p1 pt_1 -> Seq Scan on pt_p2 pt_2 -> Seq Scan on pt_p3 pt_3 -> Hash -> Seq Scan on t We can actually prune those subnodes of the Append that cannot possibly contain any matching tuples from the other side of the join. To do that, when building the Hash table, for each row from the inner side we can compute the minimum set of subnodes that can possibly match the join condition. Once we have built the Hash table and start to execute the Append node, we will already know which subnodes survive and can thus skip the others. This kind of partition pruning can be extended to happen across multiple join levels. For instance, Hash Join Hash Cond: (pt.a = t2.a) -> Hash Join Hash Cond: (pt.a = t1.a) -> Append -> Seq Scan on pt_p1 pt_1 -> Seq Scan on pt_p2 pt_2 -> Seq Scan on pt_p3 pt_3 -> Hash -> Seq Scan on t1 -> Hash -> Seq Scan on t2 We can compute the matching subnodes of the Append when building Hash table for 't1' according to the join condition 'pt.a = t1.a', and when building Hash table for 't2' according to join condition 'pt.a = t2.a', and the final surviving subnodes would be their intersection. Greenplum [1] has implemented this kind of partition pruning as 'Partition Selector'. Attached is a patch that refactors Greenplum's implementation to make it work on PostgreSQL master. Here are some details about the patch. During planning: 1. When creating a hash join plan in create_hashjoin_plan() we first collect information required to build PartitionPruneInfos at this join, which includes the join's RestrictInfos and the join's inner relids, and put this information in a stack. 2. 
When we call create_append_plan() for an appendrel, for each of the joins we check whether join partition pruning can take place for this appendrel, based on the information collected at that join, and if so build a PartitionPruneInfo and add it to the stack entry. 3. After finishing the outer side of the hash join, we should have built all the PartitionPruneInfos that can be used to perform join partition pruning at this join. So we pop out the stack entry to get the PartitionPruneInfos and add them to the Hash node. During execution: When building the hash table for a hash join, we perform the partition pruning for each row according to each of the JoinPartitionPruneStates at this join, and store each result in a special executor parameter to make it available to Append nodes. When executing an Append node, we can directly use the pre-computed pruning results to skip subnodes that cannot contain any matching rows. Here is a query that shows the effect of the join partition pruning. 
CREATE TABLE pt (a int, b int, c varchar) PARTITION BY RANGE(a); CREATE TABLE pt_p1 PARTITION OF pt FOR VALUES FROM (0) TO (250); CREATE TABLE pt_p2 PARTITION OF pt FOR VALUES FROM (250) TO (500); CREATE TABLE pt_p3 PARTITION OF pt FOR VALUES FROM (500) TO (600); INSERT INTO pt SELECT i, i % 25, to_char(i, 'FM') FROM generate_series(0, 599) i WHERE i % 2 = 0; CREATE TABLE t1 (a int, b int); INSERT INTO t1 values (10, 10); CREATE TABLE t2 (a int, b int); INSERT INTO t2 values (300, 300); ANALYZE pt, t1, t2; SET enable_nestloop TO off; explain (analyze, costs off, summary off, timing off) select * from pt join t1 on pt.a = t1.a right join t2 on pt.a = t2.a; QUERY PLAN Hash Right Join (actual rows=1 loops=1) Hash Cond: (pt.a = t2.a) -> Hash Join (actual rows=0 loops=1) Hash Cond: (pt.a = t1.a) -> Append (actual rows=0 loops=1) -> Seq Scan on pt_p1 pt_1 (never executed) -> Seq Scan on pt_p2 pt_2 (never executed) -> Seq Scan on pt_p3 pt_3 (never executed) -> Hash (actual rows=1 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on t1 (actual rows=1 loops=1) -> Hash (actual rows=1 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on t2 (actual rows=1 loops=1) (14 rows) There are several points that need more consideration. 1. All the join partition pruning decisions are made in createplan.c where the best path tree has been decided. This is not great. Maybe it's better to make it happen when we build up the path tree, so that we can take the partition pruning into consideration when estimating the costs. 2. In order to make the join partition pruning take effect, the patch hacks the empty-outer optimization in ExecHashJoinImpl(). Not sure if this is a good practice. 3. This patch does not support parallel has
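The mechanism described in this proposal can be illustrated with a toy model (plain Python, not the C implementation): represent the Append's partitions by index, have each hash-build side report the set of partitions its rows could join with, and intersect those sets across join levels.

```python
import bisect

# Range-partition bounds of pt from the example: [0,250), [250,500), [500,600)
BOUNDS = [0, 250, 500, 600]

def partition_for(a):
    """Return the index of the partition that could hold pt rows with pt.a = a,
    or None when the value falls outside every partition."""
    if a < BOUNDS[0] or a >= BOUNDS[-1]:
        return None
    return bisect.bisect_right(BOUNDS, a) - 1

def matching_partitions(inner_rows):
    """While 'building the hash table', record which partitions of the outer
    Append could possibly contain join matches for the inner-side rows."""
    return {p for p in map(partition_for, inner_rows) if p is not None}

# t1 contains only a=10 (partition 0); t2 contains only a=300 (partition 1).
# The per-join sets are intersected across join levels.
survivors = matching_partitions([10]) & matching_partitions([300])
```

Here `survivors` comes out empty, which is the toy-model analogue of why every partition scan in the EXPLAIN output above shows "(never executed)".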
Re: [PoC] pg_upgrade: allow to upgrade publisher node
On Sun, Aug 20, 2023 at 6:49 PM Masahiko Sawada wrote: > > On Thu, Aug 17, 2023 at 10:31 PM Amit Kapila wrote: > > > > > > > > Sorry I was not clear. I meant the logical replication slots that are > > > *not* used by logical replication, i.e., are created manually and used > > > by third party tools that periodically consume decoded changes. As we > > > discussed before, these slots will never be able to pass that > > > confirmed_flush_lsn check. > > > > > > > I think normally one would have a background process to periodically > > consume changes. Won't one can use the walsender infrastructure for > > their plugins to consume changes probably by using replication > > protocol? > > Not sure. > I think one can use Streaming Replication Protocol to achieve it [1]. > > Also, I feel it is the plugin author's responsibility to > > consume changes or advance slot to the required position before > > shutdown. > > How does the plugin author ensure that the slot consumes all WAL > records including shutdown_checkpoint before shutdown? > By using "Streaming Replication Protocol" so that walsender can take care of it. If not, I think users should drop such slots before the upgrade because anyway, they won't be usable after the upgrade. > > > > > After some thoughts, one thing we might > > > need to consider is that in practice, the upgrade project is performed > > > during the maintenance window and has a backup plan that revert the > > > upgrade process, in case something bad happens. If we require the > > > users to drop such logical replication slots, they cannot resume to > > > use the old cluster in that case, since they would need to create new > > > slots, missing some changes. > > > > > > > Can't one keep the backup before removing slots? > > Yes, but restoring the backup could take time. > > > > > > Other checks in pg_upgrade seem to be > > > compatibility checks that would eventually be required for the upgrade > > > anyway. Do we need to consider this case? 
For example, we do that > > > confirmed_flush_lsn check for only the slots with pgoutput plugin. > > > > > > > I think one is allowed to use pgoutput plugin even for manually > > created slots. So, such a check may not work. > > Right, but I thought it's a very rare case. > Okay, but not sure that we can ignore it. > Since the slot's confirmed_flush_lsn check is not a compatibility > check unlike the existing check, I wonder if we can make it optional. > There are arguments both ways. Initially, the patch proposed to make them optional by having an option like --include-logical-replication-slots but Jonathan raised a point that it will be more work for users and should be the default. Then we also discussed having an option like --exclude-logical-replication-slots but as we don't have any other similar option, it doesn't seem natural to add such an option. Also, I am afraid, if there is no user of such an option, it won't be worth it. BTW, how would you like to see it as an optional (via --include or via --exclude switch)? Personally, I am okay to make it optional if we have a broader consensus. My preference would be to have an --exclude kind of option. How about first getting the main patch reviewed and committed, then based on consensus, we can decide whether to make it optional and if so, what is the preferred way? [1] - https://www.postgresql.org/docs/current/protocol-replication.html -- With Regards, Amit Kapila.
Re: Extract numeric filed in JSONB more effectively
On 2023-08-20 21:31, Andy Fan wrote: Highlighting the user case of makeRelableType is interesting! But using the Oid directly looks more promising for this question IMO, it looks like: "you said we can put anything in this arg, so I put an OID const here", seems nothing is wrong. Perhaps one of the more senior developers will chime in, but to me, leaving out the relabel nodes looks more like "all of PostgreSQL's type checking happened before the SupportRequestSimplify, so nothing has noticed that we rewrote the tree with mismatched types, and as long as nothing crashes we sort of got away with it." Suppose somebody writes an extension to double-check that plan trees are correctly typed. Or improves EXPLAIN to check a little more carefully than it seems to. Omitting the relabel nodes could spell trouble then. Or, someone more familiar with the code than I am might say "oh, mismatches like that are common in rewritten trees, we live with it." But unless somebody tells me that, I'm not believing it. But I would say we have proved the concept of SupportRequestSimplify for this task. :) Now, it would make me happy to further reduce some of the code duplication between the original and the _type versions of these functions. I see that you noticed the duplication in the case of jsonb_extract_path, and you factored out jsonb_get_jsonbvalue so it could be reused. There is also some duplication with object_field and array_element. (Also, we may have overlooked jsonb_path_query and jsonb_path_query_first as candidates for the source of the cast. Two more candidates; five total.) Here is one way this could be structured. Observe that every one of those five source candidates operates in two stages: Start: All of the processing until a JsonbValue has been obtained. Finish: Converting the JsonbValue to some form for return. Before this patch, there were two choices for Finish: JsonbValueToJsonb or JsonbValueAsText. 
With this patch, there are four Finish choices: those two, plus PG_RETURN_BOOL(v->val.boolean), PG_RETURN_NUMERIC(v->val.numeric). Clearly, with rewriting, we can avoid 5✕4 = 20 distinct functions. The five candidate functions only differ in Start. Suppose each of those had a _start version, like jsonb_object_field_start, that only proceeds as far as obtaining the JsonbValue, and returns that directly (an 'internal' return type). Naturally, each _start function would need an 'internal' parameter also, even if it isn't used, just to make sure it is not SQL-callable. Now consider four Finish functions: jsonb_finish_jsonb, jsonb_finish_text, jsonb_finish_boolean, jsonb_finish_numeric. Each would have one 'internal' parameter (a JsonbValue), and its return type declared normally. There is no need to pass a type oid to any of these, and they need not contain any switch to select a return type. The correct finisher to use is simply chosen once at the time of rewriting. So cast(jsonb_array_element(jb, 0) as numeric) would just get rewritten as jsonb_finish_numeric(jsonb_array_element_start(jb,0)). The other (int and float) types don't need new code; just have the rewriter add a cast-from-numeric node on top. That's all those other switch cases in cast_jsonbvalue_to_type are doing, anyway. Notice in this structure, less relabeling is needed. The final return does not need relabeling, because each finish function has the expected return type. Each finish function's parameter is typed 'internal' (a JsonbValue), but that's just what each start function returns, so no relabeling needed there either. The rewriter will have to supply some 'internal' constant as a start-function parameter (because of the necessary 'internal' parameter). It might still be civilized to relabel that. Regards, -Chap
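To make the combinatorial argument above concrete, here is a toy sketch (illustrative Python, not the proposed C functions with 'internal' parameters): each "start" stage produces an intermediate value, each "finish" stage converts it, and the rewriter composes one of each, so five starts and four finishes cover all twenty combinations without twenty dedicated functions.

```python
# "Start" stages: obtain an intermediate value (standing in for a JsonbValue).
# Two of the five candidate functions are sketched here.
def object_field_start(jb, key):
    return jb[key]

def array_element_start(jb, i):
    return jb[i]

# "Finish" stages: convert the intermediate value into the final return type.
def finish_numeric(v):
    return float(v)

def finish_text(v):
    return str(v)

# The "rewriter" composes one start with one finish once, at rewrite time,
# so N starts and M finishes cover N*M casts with only N+M functions.
def rewrite(start, finish):
    return lambda *args: finish(start(*args))

# cast(jsonb_array_element(jb, i) as numeric) becomes this composition:
array_element_numeric = rewrite(array_element_start, finish_numeric)
```

The int and float cases then need no start/finish code of their own; the rewriter just stacks an ordinary cast-from-numeric on top of the numeric finisher, as described above.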
Re: Fix pg_stat_reset_single_table_counters function
Apologies. In the previous mail, I mistakenly addressed it to the wrong recipients. Reposted. On 2023/08/15 14:13, Masahiro Ikeda wrote: On 2023-08-15 11:48, Masahiko Sawada wrote: +COMMENT ON DATABASE :current_database IS 'This is a test comment'; -- insert or update in 'pg_shdescription' I think the current_database should be quoted (see other examples where using current_database(), e.g. collate.linux.utf8.sql). Also it would be better to reset the comment after the test. Thanks! I fixed the issues in the v4 patch. I'm not sure about others, but I would avoid using the name "current_database" for the variable. I would just use "database" or "db" instead. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Rethink the wait event names for postgres_fdw, dblink and etc
Hi, Thanks for your comments. I updated the patch to v2. * Update a comment instead of writing documentation about the wait events for pg_prewarm. * Make the names of wait events on different code paths different. Add DblinkGetConnect and PgPrewarmDumpShutdown. * Make variable names shorter like pgfdw_we_receive. * Update documents. * Add some tests with pg_wait_events view. Regards, -- Masahiro Ikeda NTT DATA CORPORATION

From 025525eae164e33117d94f2a90a4219808072b3c Mon Sep 17 00:00:00 2001 From: Masahiro Ikeda Date: Mon, 21 Aug 2023 10:36:10 +0900 Subject: [PATCH] Make to use custom wait events for modules --- contrib/dblink/dblink.c | 16 +++- contrib/dblink/expected/dblink.out| 9 + contrib/dblink/sql/dblink.sql | 4 ++ contrib/pg_prewarm/autoprewarm.c | 22 ++- contrib/postgres_fdw/connection.c | 23 +-- .../postgres_fdw/expected/postgres_fdw.out| 12 ++ contrib/postgres_fdw/sql/postgres_fdw.sql | 7 doc/src/sgml/dblink.sgml | 28 + doc/src/sgml/postgres-fdw.sgml| 39 +++ doc/src/sgml/xfunc.sgml | 6 +-- src/test/modules/test_shm_mq/setup.c | 9 - src/test/modules/test_shm_mq/test.c | 9 - .../modules/worker_spi/t/001_worker_spi.pl| 12 +++--- src/test/modules/worker_spi/worker_spi.c | 2 +- 14 files changed, 179 insertions(+), 19 deletions(-) diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index 41e1f6c91d..104d6e9f37 100644 --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -130,6 +130,10 @@ static void restoreLocalGucs(int nestlevel); static remoteConn *pconn = NULL; static HTAB *remoteConnHash = NULL; +/* value cached, fetched from shared memory */ +static uint32 dblink_we_get_conn = 0; +static uint32 dblink_we_conn = 0; + /* * Following is list that holds multiple remote connections. 
* Calling convention of each dblink function changes to accept @@ -202,8 +206,12 @@ dblink_get_conn(char *conname_or_str, connstr = conname_or_str; dblink_connstr_check(connstr); + /* first time, allocate or get the custom wait event */ + if (dblink_we_get_conn == 0) + dblink_we_get_conn = WaitEventExtensionNew("DblinkGetConnect"); + /* OK to make connection */ - conn = libpqsrv_connect(connstr, WAIT_EVENT_EXTENSION); + conn = libpqsrv_connect(connstr, dblink_we_get_conn); if (PQstatus(conn) == CONNECTION_BAD) { @@ -292,8 +300,12 @@ dblink_connect(PG_FUNCTION_ARGS) /* check password in connection string if not superuser */ dblink_connstr_check(connstr); + /* first time, allocate or get the custom wait event */ + if (dblink_we_conn == 0) + dblink_we_conn = WaitEventExtensionNew("DblinkConnect"); + /* OK to make connection */ - conn = libpqsrv_connect(connstr, WAIT_EVENT_EXTENSION); + conn = libpqsrv_connect(connstr, dblink_we_conn); if (PQstatus(conn) == CONNECTION_BAD) { diff --git a/contrib/dblink/expected/dblink.out b/contrib/dblink/expected/dblink.out index 7809f58d96..c17c7b1361 100644 --- a/contrib/dblink/expected/dblink.out +++ b/contrib/dblink/expected/dblink.out @@ -1209,6 +1209,15 @@ SHOW intervalstyle; postgres (1 row) +-- Check custom wait events are registered +SELECT name FROM pg_wait_events WHERE name ~ '^Dblink' +ORDER BY name COLLATE "C"; + name +-- + DblinkConnect + DblinkGetConnect +(2 rows) + -- Clean up GUC-setting tests SELECT dblink_disconnect('myconn'); dblink_disconnect diff --git a/contrib/dblink/sql/dblink.sql b/contrib/dblink/sql/dblink.sql index 7870ce5d5a..519b3f5266 100644 --- a/contrib/dblink/sql/dblink.sql +++ b/contrib/dblink/sql/dblink.sql @@ -627,6 +627,10 @@ FROM dblink_fetch('myconn','error_cursor', 1) AS t(i int); SHOW datestyle; SHOW intervalstyle; +-- Check custom wait events are registered +SELECT name FROM pg_wait_events WHERE name ~ '^Dblink' +ORDER BY name COLLATE "C"; + -- Clean up GUC-setting tests SELECT 
dblink_disconnect('myconn'); RESET datestyle; diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c index d0efc9e524..2160d40705 100644 --- a/contrib/pg_prewarm/autoprewarm.c +++ b/contrib/pg_prewarm/autoprewarm.c @@ -105,6 +105,16 @@ static AutoPrewarmSharedState *apw_state = NULL; static bool autoprewarm = true; /* start worker? */ static int autoprewarm_interval = 300; /* dump interval */ +/* + * Cached custom wait events, fetched from shared memory. + * + * Note that they are not reported on pg_stat_activity. pgstat_bestart() isn't + * called because the autoprewarm worker isn't an auxiliary process and it + * doesn't connect to a database. + */ +static uint32 autoprewarm_we_shutdown = 0; +static uint32 autoprewarm_we_delay = 0; + /* * Module load callback. */ @@ -233,11 +243,15 @@ autoprewarm_main(Datum main_arg) if (autoprewarm_interval <= 0) { +
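The caching pattern this patch applies throughout — keep a zero-initialized static, and register the custom wait event only on first use — can be sketched generically (illustrative Python, not the C shared-memory API):

```python
# Registry standing in for the shared-memory wait event table.
_registry = {}

def wait_event_new(name):
    """Stand-in for WaitEventExtensionNew(): return the id for `name`,
    allocating a fresh nonzero id the first time the name is seen."""
    if name not in _registry:
        _registry[name] = len(_registry) + 1
    return _registry[name]

# Module-level cache, mirroring `static uint32 dblink_we_get_conn = 0;`
dblink_we_get_conn = 0

def get_conn_wait_event():
    """First time: allocate or fetch the custom wait event; afterwards,
    reuse the cached id without touching the registry."""
    global dblink_we_get_conn
    if dblink_we_get_conn == 0:
        dblink_we_get_conn = wait_event_new("DblinkGetConnect")
    return dblink_we_get_conn
```

In the patch, the cached id is then passed to libpqsrv_connect() in place of the generic WAIT_EVENT_EXTENSION, which is what makes the per-module names show up in pg_wait_events.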
Re: Extract numeric filed in JSONB more effectively
On Sat, Aug 19, 2023 at 3:09 AM Chapman Flack wrote: > On 2023-08-18 14:50, Chapman Flack wrote: > > Now, my guess is EXPLAIN is complaining when it sees the Const > > of type internal, and doesn't know how to show that value. > > Perhaps makeRelabelType is the answer there, too: what if the > > Const has Oid type, so EXPLAIN can show it, and what's inserted > > as the function argument is a relabel node saying it's internal? > Simply changing the Const to be of type Oid makes EXPLAIN happy, > and nothing ever says "hey, why are you passing this oid for an > arg that wants internal?". This is without adding any relabel > nodes anywhere. > Highlighting the user case of makeRelableType is interesting! But using the Oid directly looks more promising for this question IMO, it looks like: "you said we can put anything in this arg, so I put an OID const here", seems nothing is wrong. Compared with the makeRelableType method, I think the current method is more straightforward. Compared with anyelement, it avoids the creation of makeDummyConst which I'm not sure the implementation is always correct. So I am pretty inclined to this way! v10 attached. -- Best Regards Andy Fan v10-0001-optimize-casting-jsonb-to-a-given-type.patch Description: Binary data v10-0002-convert-anyelement-to-internal.patch Description: Binary data
Re: PostgreSQL 16 RC1 + GA release dates
On 8/20/23 10:50 AM, Tom Lane wrote: "Jonathan S. Katz" writes: The date for PostgreSQL 16 Release Candidate 1 (RC1) is August 31, 2023. Please ensure all open items[1] are completed and committed before August 26, 2023 12:00 UTC. FYI, I moved the "Oversight in reparameterize_path_by_child leading to executor crash" open item to the "Older bugs affecting stable branches" section, because it is in fact an old bug: the given test case crashes in every branch that has enable_partitionwise_join. I'll still look at getting in the fix before RC1, but we should understand what we're dealing with. [RMT hat] Thanks -- appreciative of the accurate record keeping. Jonathan OpenPGP_signature Description: OpenPGP digital signature
Re: Adding a LogicalRepWorker type field
On Fri, Aug 18, 2023 at 6:16 PM Zhijie Hou (Fujitsu) wrote: > > On Friday, August 18, 2023 11:20 AM Amit Kapila > wrote: > > > > On Mon, Aug 14, 2023 at 12:08 PM Peter Smith > > wrote: > > > > > > The main patch for adding the worker type enum has been pushed [1]. > > > > > > Here is the remaining (rebased) patch for changing some previous > > > cascading if/else to switch on the LogicalRepWorkerType enum instead. > > > > > > > I see this as being useful if we plan to add more worker types. Does anyone > > else > > see this remaining patch as an improvement? > > +1 > > I have one comment for the new error message. > > + case WORKERTYPE_UNKNOWN: > + ereport(ERROR, > errmsg_internal("should_apply_changes_for_rel: Unknown worker type")); > > I think reporting an ERROR in this case is fine. However, I would suggest > refraining from mentioning the function name in the error message, as > recommended in the error style document [1]. Also, it appears we could use > elog() here. > > [1] https://www.postgresql.org/docs/devel/error-style-guide.html OK. Modified as suggested. Anyway, getting these errors should not even be possible, so I added a comment to emphasise that. PSA v9 -- Kind Regards, Peter Smith. Fujitsu Australia v9-0001-Switch-on-worker-type.patch Description: Binary data
Re: Logging of matching pg_hba.conf entry during auth skips trust auth, potential security issue
On Fri, Aug 18, 2023 at 08:49:16AM +0900, Michael Paquier wrote: > After sleeping on it, I think that I'd just agree with Robert's point > to just use the same language as the message, while also agreeing with > the patch to not set MyClientConnectionInfo.authn_id in the uaTrust > case, only logging something under log_connections. > > +* No authentication was actually performed; this happens e.g. when > the > +* trust method is in use. > > This comment should be reworded a bit, say "No authentication identity > was set; blah ..". Attached is a v3 to do these two things, with adjustments for two SSL tests. Any objections about it? (Note: no backpatch) -- Michael From 13b4c6fe0388bb4eec1d8aab6a25192ff7ad0684 Mon Sep 17 00:00:00 2001 From: Jacob Champion Date: Wed, 16 Aug 2023 14:48:49 -0700 Subject: [PATCH v3] log_connections: add entries for "trust" connections Allow DBAs to audit unexpected "trust" connections. Previously, you had to notice the absence of a "connection authenticated" line between the "connection received" and "connection authorized" entries. Now it's called out explicitly. --- src/backend/libpq/auth.c | 15 +++ src/test/authentication/t/001_password.pl | 8 ++-- src/test/ssl/t/001_ssltests.pl| 4 ++-- 3 files changed, 23 insertions(+), 4 deletions(-) diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c index 315a24bb3f..1a084d21de 100644 --- a/src/backend/libpq/auth.c +++ b/src/backend/libpq/auth.c @@ -645,6 +645,21 @@ ClientAuthentication(Port *port) #endif } + if (Log_connections && (status == STATUS_OK) && + !MyClientConnectionInfo.authn_id) + { + /* + * No authentication identity was set; this happens e.g. when the + * trust method is in use. For audit purposes, log a breadcrumb to + * explain where in the HBA this happened. 
+ */ + ereport(LOG, +errmsg("connection authenticated: user=\"%s\" method=%s " + "(%s:%d)", + port->user_name, hba_authname(port->hba->auth_method), + port->hba->sourcefile, port->hba->linenumber)); + } + if (ClientAuthentication_hook) (*ClientAuthentication_hook) (port, status); diff --git a/src/test/authentication/t/001_password.pl b/src/test/authentication/t/001_password.pl index 12552837a8..4402fa7721 100644 --- a/src/test/authentication/t/001_password.pl +++ b/src/test/authentication/t/001_password.pl @@ -140,9 +140,13 @@ $node->safe_psql('postgres', "CREATE database regex_testdb;"); # considered to be authenticated. reset_pg_hba($node, 'all', 'all', 'trust'); test_conn($node, 'user=scram_role', 'trust', 0, - log_unlike => [qr/connection authenticated:/]); + log_like => [ + qr/connection authenticated: user="scram_role" method=trust/ + ]); test_conn($node, 'user=md5_role', 'trust', 0, - log_unlike => [qr/connection authenticated:/]); + log_like => [ + qr/connection authenticated: user="md5_role" method=trust/ + ]); # SYSTEM_USER is null when not authenticated. $res = $node->safe_psql('postgres', "SELECT SYSTEM_USER IS NULL;"); diff --git a/src/test/ssl/t/001_ssltests.pl b/src/test/ssl/t/001_ssltests.pl index 76442de063..81e36a4e88 100644 --- a/src/test/ssl/t/001_ssltests.pl +++ b/src/test/ssl/t/001_ssltests.pl @@ -801,7 +801,7 @@ $node->connect_ok( . sslkey('client.key'), "auth_option clientcert=verify-full succeeds with matching username and Common Name", # verify-full does not provide authentication - log_unlike => [qr/connection authenticated:/],); + log_like => [qr/connection authenticated: user="ssltestuser" method=trust/],); $node->connect_fails( "$common_connstr user=anotheruser sslcert=ssl/client.crt " @@ -819,7 +819,7 @@ $node->connect_ok( . 
sslkey('client.key'), "auth_option clientcert=verify-ca succeeds with mismatching username and Common Name", # verify-full does not provide authentication - log_unlike => [qr/connection authenticated:/],); + log_like => [qr/connection authenticated: user="yetanotheruser" method=trust/],); # intermediate client_ca.crt is provided by client, and isn't in server's ssl_ca_file switch_server_cert($node, certfile => 'server-cn-only', cafile => 'root_ca'); -- 2.40.1 signature.asc Description: PGP signature
Re: POC, WIP: OR-clause support for indexes
On Thu, Aug 17, 2023 at 3:08 AM a.rybakina wrote: > But now I see an interesting transformation, which was the most interesting > for me. > > EXPLAIN (COSTS OFF) SELECT * FROM tenk1 WHERE thousand = 42 AND (tenthous = 1 > OR tenthous = 3 OR tenthous = 42); It would be even more interesting if it could be an index-only scan as a result of the transformation. For example, we could use an index-only scan with this query (once your patch was in place): "SELECT thousand, tenthous FROM tenk1 WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42)" Index-only scans were the original motivation for adding native ScalarArrayExprOp support to nbtree (in 2011 commit 9e8da0f7), in fact. As I suggested earlier, I suspect that there is too much planner logic that targets BitmapOrs specifically -- maybe even selectivity estimation/restrictinfo stuff. PS I wonder if the correctness issues that you saw could be related to eval_const_expressions(), since "the planner assumes that this [eval_const_expressions] will always flatten nested AND and OR clauses into N-argument form". See its subroutines simplify_or_arguments() and simplify_and_arguments(). -- Peter Geoghegan
Re: POC, WIP: OR-clause support for indexes
On Sun, Aug 20, 2023 at 3:11 PM Peter Geoghegan wrote: > Back in 2003, commit 9888192f removed (or at least simplified) what > were then called "CNF/DNF CONVERSION ROUTINES". Prior to that point > the optimizer README had something about leaving clause lists > un-normalized leading to selectivity estimation problems. Bear in mind > that this is a couple of years before ScalarArrayOpExpr was first > invented. Apparently even back then "The OR-of-ANDs format is useful > for indexscan implementation". It's possible that that old work will > offer some hints on what to do now. There was actually support for OR lists in index AMs prior to ScalarArrayOpExpr. Even though ScalarArrayOpExpr don't really seem all that related to bitmap scans these days (since at least nbtree knows how to execute them "natively"), that wasn't always the case. ScalarArrayOpExpr were invented the same year that bitmap index scans were first added (2005), and seem more or less related to that work. See commits bc843d39, 5b051852, 1e9a6ba5, and 290166f9 (all from 2005). Particularly the last one, which has a commit message that heavily suggests that my interpretation is correct. I think that we currently over-rely on BitmapOr for OR clauses. It's useful that they're so general, of course, but ISTM that we shouldn't even try to use a BitmapOr in simple cases. Things like the "WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42)" tenk1 query that you brought up probably shouldn't even have a BitmapOr path (which I guess they don't with your patch). Note that I recently discussed the same query at length with Tomas Vondra on the ongoing thread for his index filter patch (you probably knew that already). -- Peter Geoghegan
Re: POC, WIP: OR-clause support for indexes
On Thu, Aug 17, 2023 at 3:08 AM a.rybakina wrote: > This is an interesting feature. I didn't notice this function before, I > studied many times consider_new_or_cause, which were called there. As far as > I know, there is a selectivity calculation going on there, but as far as I > remember, I called it earlier after my conversion, and unfortunately it > didn't solve my problem with calculating selectivity. I'll reconsider it > again, maybe I can find something I missed. Back in 2003, commit 9888192f removed (or at least simplified) what were then called "CNF/DNF CONVERSION ROUTINES". Prior to that point the optimizer README had something about leaving clause lists un-normalized leading to selectivity estimation problems. Bear in mind that this is a couple of years before ScalarArrayOpExpr was first invented. Apparently even back then "The OR-of-ANDs format is useful for indexscan implementation". It's possible that that old work will offer some hints on what to do now. In a way it's not surprising that work in this area would have some impact on selectivities. The surprising part is the extent of the problem, I suppose. I see that a lot of the things in this area are just used by BitmapOr clauses, such as build_paths_for_OR() -- but you're not necessarily able to use any of that stuff. Also, choose_bitmap_and() has some stuff about how it compensates to avoid "too-small selectivity that makes a redundant AND step look like it reduces the total cost". It also mentions some problems with match_join_clauses_to_index() + extract_restriction_or_clauses(). Again, this might be a good place to look for more clues. -- Peter Geoghegan
Re: UUID v7
> On 30 Jul 2023, at 13:08, Andrey M. Borodin wrote: > > > After discussion on GitHub with Sergey Prokhorenko [0] I understood that > the counter is an optional, but useful part of UUID v7. It actually promotes > sortability of data generated at high speed. > The standard does not specify how big the counter should be. PFA patch with 16 > bit counter. Maybe it's worth doing an 18-bit counter - it will save us one byte of > PRNG data. Currently we only take 2 bits out of the whole random byte. > Here's a new patch version. Now the counter is initialised with strong random data on every time change (each ms). However, the first bit of the counter is kept at zero. This is done to extend the counter's capacity (I left comments with reference to the RFC with explanations). Thanks! Best regards, Andrey Borodin. v4-0001-Implement-UUID-v7-as-per-IETF-draft.patch Description: Binary data
datetime from a JsonbValue
Hi, Thread [1] concerns (generalizing slightly) the efficient casting to an SQL type of the result of a jsonb extracting operation (array indexing, object keying, path evaluation) that has ended with a scalar JsonbValue. So far, it can efficiently rewrite casts to boolean or numeric types. I notice that, since 6dda292, JsonbValue includes a datetime scalar member. As far as I can tell, the only jsonb extracting operations that might be capable of producing such a JsonbValue would be jsonb_path_query(_first)?(_tz)? with a path ending in .datetime(). If casts existed from jsonb to date/time types, then the same techniques used in [1] would be able to rewrite such casts, eliding the JsonbValueToJsonb and subsequent reconversion via text. But no such casts seem to exist, providing nothing to hang the optimization on. (And, after all, 6dda292 says "These datetime values are allowed for temporary representation only. During serialization datetime values are converted into strings.") Perhaps it isn't worth supplying such casts. The value is held as text within jsonb, so .datetime() in a jsonpath had to parse it. One might lament the extra serialization and reparsing if that path query result goes through ::text::timestamp, but then simply leaving .datetime() off of the jsonpath in the first place would have left the parsing to be done just once by ::timestamp. Optimizable casts might be of more interest if the jsonpath language had more operations on datetimes, so that you might efficiently retrieve the result of some arbitrary expression in the path, not just a literal datetime value that has to get parsed in one place or another anyway. I haven't looked into SQL/JSON to see what it provides in terms of casts to SQL types. 
I'm more familiar with SQL/XML, which does provide XMLCAST, which can take an XML source and SQL date/time target, and does the equivalent of an XML Query ending in "cast as xs:dateTime" and assigns that result to the SQL type (with some time zone subtleties rather carefully specified). So I might assume SQL/JSON has something analogous? On the other hand, XML Query does offer more operations on date/time values, which may, as discussed above, make such a cast more interesting to have around. Thoughts? Regards, -Chap [1] https://www.postgresql.org/message-id/flat/CAKU4AWoqAVya6PBhn+BCbFaBMt3z-2=i5fKO3bW=6hphbid...@mail.gmail.com
Re: Oversight in reparameterize_path_by_child leading to executor crash
I looked over the v3 patch. I think it's going in generally the right direction, but there is a huge oversight: path_is_reparameterizable_by_child does not come close to modeling the success/failure conditions of reparameterize_path_by_child. In all the cases where reparameterize_path_by_child can recurse, path_is_reparameterizable_by_child has to do so also, to check whether the child path is reparameterizable. (I'm somewhat disturbed that apparently we have no test cases that caught that oversight; can we add one cheaply?) BTW, I also note from the cfbot that 9e9931d2b broke this patch by adding more ADJUST_CHILD_ATTRS calls that need to be modified. I wonder if we could get away with having that macro cast to "void *" to avoid needing to change all its call sites. I'm not sure whether pickier compilers might warn about doing it that way. regards, tom lane
Re: [PATCH] Add function to_oct
On Sat, Aug 19, 2023 at 08:35:46AM +0100, Dean Rasheed wrote: > I note that there are no tests for negative inputs. I added some in v8. > Doing a quick test shows that this changes the current behaviour, > because all inputs are now treated as 64-bit: > > HEAD: > > select to_hex((-1234)::int); > to_hex > -- > fffffb2e > > With patch: > > select to_hex((-1234)::int); > to_hex > -- > fffffffffffffb2e Good catch. In v8, I fixed this by first casting the input to uint32 for the 32-bit versions of the functions. This prevents the conversion to uint64 from setting the rest of the bits. AFAICT this behavior is pretty well defined in the standard. > The way that negative inputs are handled really should be documented, > or at least it should include a couple of examples. I used your suggestion and noted that the output is the two's complement representation [0]. [0] https://postgr.es/m/CAEZATCVbkL1ynqpsKiTDpch34%3DSCr5nnau%3DnfNmiy2nM3SJHtw%40mail.gmail.com -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From a03c06d0b26ccbd49bda55c2efab565ab9fa90db Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Tue, 25 Jul 2023 16:09:01 -0700 Subject: [PATCH v8 1/1] add to_binary() and to_oct() --- doc/src/sgml/func.sgml| 47 ++- src/backend/utils/adt/varlena.c | 86 +++ src/include/catalog/pg_proc.dat | 12 src/test/regress/expected/strings.out | 62 ++- src/test/regress/sql/strings.sql | 15 - 5 files changed, 192 insertions(+), 30 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index be2f54c914..835c08905e 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -3737,6 +3737,50 @@ repeat('Pg', 4) PgPgPgPg + + + + to_binary + +to_binary ( integer ) +text + + +to_binary ( bigint ) +text + + +Converts the number to its equivalent two's complement binary +representation.
+ + +to_binary(2147483647) +1111111111111111111111111111111 + + + + + + + to_oct + +to_oct ( integer ) +text + + +to_oct ( bigint ) +text + + +Converts the number to its equivalent two's complement octal +representation. + + +to_oct(2147483647) +17777777777 + + + @@ -3750,7 +3794,8 @@ repeat('Pg', 4) PgPgPgPg text -Converts the number to its equivalent hexadecimal representation. +Converts the number to its equivalent two's complement hexadecimal +representation. to_hex(2147483647) diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c index b1ec5c32ce..cac3577480 100644 --- a/src/backend/utils/adt/varlena.c +++ b/src/backend/utils/adt/varlena.c @@ -4919,53 +4919,87 @@ array_to_text_internal(FunctionCallInfo fcinfo, ArrayType *v, return result; } -#define HEXBASE 16 /* - * Convert an int32 to a string containing a base 16 (hex) representation of - * the number. + * Workhorse for to_binary, to_oct, and to_hex. Note that base must be > 1 and + * <= 16. */ -Datum -to_hex32(PG_FUNCTION_ARGS) +static inline text * +convert_to_base(uint64 value, int base) { - uint32 value = (uint32) PG_GETARG_INT32(0); - char *ptr; const char *digits = "0123456789abcdef"; - char buf[32]; /* bigger than needed, but reasonable */ - ptr = buf + sizeof(buf) - 1; - *ptr = '\0'; + /* We size the buffer for to_binary's longest possible return value. */ + char buf[sizeof(uint64) * BITS_PER_BYTE]; + char *const end = buf + sizeof(buf); + char *ptr = end; + + Assert(base > 1); + Assert(base <= 16); do { - *--ptr = digits[value % HEXBASE]; - value /= HEXBASE; + *--ptr = digits[value % base]; + value /= base; } while (ptr > buf && value); - PG_RETURN_TEXT_P(cstring_to_text(ptr)); + return cstring_to_text_with_len(ptr, end - ptr); +} + +/* + * Convert an integer to a string containing a base-2 (binary) representation + * of the number.
+ */ +Datum +to_binary32(PG_FUNCTION_ARGS) +{ + uint64 value = (uint32) PG_GETARG_INT32(0); + + PG_RETURN_TEXT_P(convert_to_base(value, 2)); +} +Datum +to_binary64(PG_FUNCTION_ARGS) +{ + uint64 value = (uint64) PG_GETARG_INT64(0); + + PG_RETURN_TEXT_P(convert_to_base(value, 2)); } /* - * Convert an int64 to a string containing a base 16 (hex) representation of + * Convert an integer to a string containing a base-8 (oct) representation of * the number. */ Datum -to_hex64(PG_FUNCTION_ARGS) +to_oct32(PG_FUNCTION_ARGS) +{ + uint64 value = (uint32) PG_GETARG_INT32(0); + + PG_RETURN_TEXT_P(convert_to_base(value, 8)); +} +Datum +to_oct64(PG_FUNCTION_ARGS) { uint64
Re: [PoC] pg_upgrade: allow to upgrade publisher node
On Fri, Aug 18, 2023 at 10:51 PM Hayato Kuroda (Fujitsu) wrote: > > Dear Peter, > > PSA new version patch set. > I've looked at the v22 patch set, and here are some comments: 0001: Do we need regression tests to make sure that the slot's confirmed_flush_lsn matches the LSN of the latest shutdown_checkpoint record? 0002: + +Prepare for publisher upgrades + Should this step be done before "8. Stop both servers" as it might require disabling subscriptions and dropping 'lost' replication slots? Why is there no explanation about the slots' confirmed_flush_lsn check as a prerequisite? Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: [PATCH] Add function to_oct
On Sat, Aug 19, 2023 at 11:41:56AM +0700, John Naylor wrote: > This looks nicer, but still doesn't save the starting pointer, and so needs > to lug around that big honking macro. This is what I mean: > > static inline text * > convert_to_base(uint64 value, int base) > { > const char *digits = "0123456789abcdef"; > /* We size the buffer for to_binary's longest possible return value. */ > char buf[sizeof(uint64) * BITS_PER_BYTE]; > char * const end = buf + sizeof(buf); > char *ptr = end; > > Assert(base > 1); > Assert(base <= 16); > > do > { > *--ptr = digits[value % base]; > value /= base; > } while (ptr > buf && value); > > return cstring_to_text_with_len(ptr, end - ptr); > } I will use this in v8. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: PostgreSQL 16 RC1 + GA release dates
"Jonathan S. Katz" writes: > The date for PostgreSQL 16 Release Candidate 1 (RC1) is August 31, 2023. > Please ensure all open items[1] are completed and committed before > August 26, 2023 12:00 UTC. FYI, I moved the "Oversight in reparameterize_path_by_child leading to executor crash" open item to the "Older bugs affecting stable branches" section, because it is in fact an old bug: the given test case crashes in every branch that has enable_partitionwise_join. I'll still look at getting in the fix before RC1, but we should understand what we're dealing with. regards, tom lane
Re: [PoC] pg_upgrade: allow to upgrade publisher node
On Thu, Aug 17, 2023 at 10:31 PM Amit Kapila wrote: > > On Thu, Aug 17, 2023 at 6:07 PM Masahiko Sawada wrote: > > > > On Tue, Aug 15, 2023 at 12:06 PM Amit Kapila > > wrote: > > > > > > On Tue, Aug 15, 2023 at 7:51 AM Masahiko Sawada > > > wrote: > > > > > > > > On Mon, Aug 14, 2023 at 2:07 PM Amit Kapila > > > > wrote: > > > > > > > > > > On Mon, Aug 14, 2023 at 7:57 AM Masahiko Sawada > > > > > wrote: > > > > > > Another idea is (which might have already been discussed though) that we > > > > > > check if the latest shutdown checkpoint LSN in the control file > > > > > > matches the confirmed_flush_lsn in pg_replication_slots view. That > > > > > > way, we can ensure that the slot has consumed all WAL records > > > > > > before the last shutdown. We don't need to worry about WAL records > > > > > > generated after starting the old cluster during the upgrade, at > > > > > > least for logical replication slots. > > > > > > > > > > > > > > > > Right, this is somewhat closer to what Patch is already doing. But > > > > > remember in this case we need to remember and use the latest > > > > > checkpoint from the control file before the old cluster is started > > > > > because otherwise the latest checkpoint location could be even updated > > > > > during the upgrade. So, instead of reading from WAL, we need to change > > > > > so that we rely on the control file's latest LSN. > > > > > > > > Yes, I was thinking the same idea. > > > > > > > > But it works for only replication slots for logical replication. Do we > > > > want to check if no meaningful WAL records are generated after the > > > > latest shutdown checkpoint, for manually created slots (or non-logical > > > > replication slots)? If so, we would need to have something reading WAL > > > > records in the end. > > > > > > > > > > This feature only targets logical replication slots. I don't see a > > > reason to be different for manually created logical replication slots.
> > > Is there something particular that you think we could be missing? > > > > Sorry I was not clear. I meant the logical replication slots that are > > *not* used by logical replication, i.e., are created manually and used > > by third party tools that periodically consume decoded changes. As we > > discussed before, these slots will never be able to pass that > > confirmed_flush_lsn check. > > > > I think normally one would have a background process to periodically > consume changes. Won't one be able to use the walsender infrastructure for > their plugins to consume changes, probably by using the replication > protocol? Not sure. > Also, I feel it is the plugin author's responsibility to > consume changes or advance the slot to the required position before > shutdown. How does the plugin author ensure that the slot consumes all WAL records including shutdown_checkpoint before shutdown? > > > After some thoughts, one thing we might > > need to consider is that in practice, the upgrade project is performed > > during the maintenance window and has a backup plan that reverts the > > upgrade process, in case something bad happens. If we require the > > users to drop such logical replication slots, they cannot resume to > > use the old cluster in that case, since they would need to create new > > slots, missing some changes. > > > > Can't one keep the backup before removing slots? Yes, but restoring the backup could take time. > > > Other checks in pg_upgrade seem to be > > compatibility checks that would eventually be required for the upgrade > > anyway. Do we need to consider this case? For example, we do that > > confirmed_flush_lsn check for only the slots with pgoutput plugin. > > > > I think one is allowed to use pgoutput plugin even for manually > created slots. So, such a check may not work. Right, but I thought it's a very rare case. Since the slot's confirmed_flush_lsn check is not a compatibility check unlike the existing checks, I wonder if we can make it optional.
Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: WIP: new system catalog pg_wait_event
On Sat, Aug 19, 2023 at 06:30:12PM +0200, Drouvot, Bertrand wrote: > Thanks, fixed in v10. Okay. I have done an extra review of it, simplifying a few things in the function, the comments and the formatting, and applied the patch. Thanks! I have somewhat managed to miss the catalog version in the initial commit, but fixed that quickly. -- Michael signature.asc Description: PGP signature