Re: Assert in pageinspect with NULL pages
On 18.02.2022 08:00, Justin Pryzby wrote:
> BRIN can also crash if passed a non-brin index. I've been sitting on
> this one for a while. Feel free to include it in your patchset.
>
> commit 08010a6037fc4e24a9ba05e5386e766f4310d35e
> Author: Justin Pryzby
> Date:   Tue Jan 19 00:25:15 2021 -0600
>
>     pageinspect: brin_page_items(): check that given relation is not
>     only an index, but a brin one
>
> diff --git a/contrib/pageinspect/brinfuncs.c b/contrib/pageinspect/brinfuncs.c
> index f1e64a39ef2..3de6dd943ef 100644
> --- a/contrib/pageinspect/brinfuncs.c
> +++ b/contrib/pageinspect/brinfuncs.c
> @@ -16,6 +16,7 @@
>  #include "access/brin_tuple.h"
>  #include "access/htup_details.h"
>  #include "catalog/index.h"
> +#include "catalog/pg_am.h"
>  #include "catalog/pg_type.h"
>  #include "funcapi.h"
>  #include "lib/stringinfo.h"
> @@ -59,7 +60,7 @@ brin_page_type(PG_FUNCTION_ARGS)
>  	if (raw_page_size != BLCKSZ)
>  		ereport(ERROR,
>  				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> -				 errmsg("input page too small"),
> +				 errmsg("input page wrong size"),
>  				 errdetail("Expected size %d, got %d",
>  						   BLCKSZ, raw_page_size)));
> @@ -97,7 +98,7 @@ verify_brin_page(bytea *raw_page, uint16 type, const char *strtype)
>  	if (raw_page_size != BLCKSZ)
>  		ereport(ERROR,
>  				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> -				 errmsg("input page too small"),
> +				 errmsg("input page wrong size"),
>  				 errdetail("Expected size %d, got %d",
>  						   BLCKSZ, raw_page_size)));
> @@ -169,7 +170,14 @@ brin_page_items(PG_FUNCTION_ARGS)
>  	MemoryContextSwitchTo(oldcontext);
>
>  	indexRel = index_open(indexRelid, AccessShareLock);
> -	bdesc = brin_build_desc(indexRel);
> +
> +	/* Must be a BRIN index */
> +	if (indexRel->rd_rel->relkind != RELKIND_INDEX ||
> +		indexRel->rd_rel->relam != BRIN_AM_OID)
> +		ereport(ERROR,
> +				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
> +				 errmsg("\"%s\" is not a BRIN index",
> +						RelationGetRelationName(indexRel))));
>
>  	/* minimally verify the page we got */
>  	page = verify_brin_page(raw_page, BRIN_PAGETYPE_REGULAR, "regular");
> @@ -178,6 +186,7 @@ brin_page_items(PG_FUNCTION_ARGS)
>  	 * Initialize output functions for all indexed datatypes; simplifies
>  	 * calling them later.
>  	 */
> +	bdesc = brin_build_desc(indexRel);
>  	columns = palloc(sizeof(brin_column_state *) *
>  					 RelationGetDescr(indexRel)->natts);
>
>  	for (attno = 1; attno <= bdesc->bd_tupdesc->natts; attno++)
>  	{

Thanks! This is a very valuable note. I think I will definitely add it to the next version of the patch, as soon as I deal with the error handling.

--
Daria Lepikhova
Postgres Professional
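For reference, the crash Justin describes can be reproduced with something like the following sketch (the table and index names here are invented for illustration). Without the new check, brin_page_items() would reach brin_build_desc() with a non-BRIN index; with the patch it should instead fail cleanly with the new error:

```sql
-- Hypothetical reproducer: feed a btree index page to a BRIN function.
CREATE TABLE t (a int);
CREATE INDEX t_btree_idx ON t USING btree (a);

-- With the patch applied, this is expected to raise:
--   ERROR:  "t_btree_idx" is not a BRIN index
SELECT * FROM brin_page_items(get_raw_page('t_btree_idx', 0), 't_btree_idx');
```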
Re: Assert in pageinspect with NULL pages
First of all, thanks for the review and remarks!

On 18.02.2022 08:02, Michael Paquier wrote:
> On Thu, Feb 17, 2022 at 05:40:41PM +0800, Julien Rouhaud wrote:
>> About the patch, it's incorrectly using a hardcoded 8192 block-size
>> rather than using the computed :block_size (or computing one when it's
>> not already the case).
>
> Well, the tests of pageinspect would already fail when using a
> different page size than 8k, like the checksum ones :)  Anyway, I agree
> with your point that this is not a reason not to make the tests more
> portable if we can do so easily.

Fixed.

>> I'm also wondering if it wouldn't be better to return NULL rather than
>> throwing an error for uninitialized pages.
>
> Agreed that this is a sensible choice.  NULL would be helpful for the
> case where one compiles all the checksums of a relation with a full
> scan based on the relation size, for example.  All these behaviors
> ought to be documented properly, as well.  For SRFs, this should just
> return no rows rather than generating an error.

So, I tried to implement this remark. However, getting rid of throwing an error and replacing it with a NULL return led to reproducing the problem that Alexander Lakhin mentioned, and here I need a little more time to figure it out.

And one more addition: in the previous version of the patch I forgot to add tests for the gist index, but the described problem is also relevant for it.
--
Daria Lepikhova
Postgres Professional

From 7dbfa7f01bf485d2812b24af012721b4a1e1e785 Mon Sep 17 00:00:00 2001
From: "d.lepikhova"
Date: Tue, 22 Feb 2022 16:25:31 +0500
Subject: [PATCH] Add checks for null pages into pageinspect

---
 contrib/pageinspect/brinfuncs.c           | 10 ++
 contrib/pageinspect/btreefuncs.c          | 10 ++
 contrib/pageinspect/expected/brin.out     | 13 +
 contrib/pageinspect/expected/btree.out    |  5 +
 contrib/pageinspect/expected/checksum.out |  7 +++
 contrib/pageinspect/expected/gin.out      |  7 +++
 contrib/pageinspect/expected/gist.out     |  5 +
 contrib/pageinspect/expected/hash.out     |  9 +
 contrib/pageinspect/rawpage.c             | 10 ++
 contrib/pageinspect/sql/brin.sql          |  6 ++
 contrib/pageinspect/sql/btree.sql         |  3 +++
 contrib/pageinspect/sql/checksum.sql      |  3 +++
 contrib/pageinspect/sql/gin.sql           |  5 +
 contrib/pageinspect/sql/gist.sql          |  5 +
 contrib/pageinspect/sql/hash.sql          |  6 ++
 15 files changed, 104 insertions(+)

diff --git a/contrib/pageinspect/brinfuncs.c b/contrib/pageinspect/brinfuncs.c
index 50892b5cc20..a2fee13e2d1 100644
--- a/contrib/pageinspect/brinfuncs.c
+++ b/contrib/pageinspect/brinfuncs.c
@@ -63,6 +63,10 @@ brin_page_type(PG_FUNCTION_ARGS)
 				 errdetail("Expected size %d, got %d",
 						   BLCKSZ, raw_page_size)));

+	/* check that the page is initialized before accessing any fields */
+	if (PageIsNew(page))
+		PG_RETURN_NULL();
+
 	switch (BrinPageType(page))
 	{
 		case BRIN_PAGETYPE_META:
@@ -103,6 +107,12 @@ verify_brin_page(bytea *raw_page, uint16 type, const char *strtype)

 	page = VARDATA(raw_page);

+	/* check that the page is initialized before accessing any fields */
+	if (PageIsNew(page))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("input page is empty")));
+
 	/* verify the special space says this page is what we want */
 	if (BrinPageType(page) != type)
 		ereport(ERROR,
diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
index 03debe336ba..5ea7af5b998 100644
--- a/contrib/pageinspect/btreefuncs.c
+++ b/contrib/pageinspect/btreefuncs.c
@@ -519,6 +519,12 @@ bt_page_items_internal(PG_FUNCTION_ARGS, enum pageinspect_version ext_version)
 	UnlockReleaseBuffer(buffer);
 	relation_close(rel, AccessShareLock);

+	/* check that the page is initialized before accessing any fields */
+	if (PageIsNew(uargs->page))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("input page is empty")));
+
 	uargs->offset = FirstOffsetNumber;

 	opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page);
@@ -615,6 +621,10 @@ bt_page_items_bytea(PG_FUNCTION_ARGS)

 	uargs->page = VARDATA(raw_page);

+	/* check that the page is initialized before accessing any fields */
+	if (PageIsNew(uargs->page))
+		PG_RETURN_NULL();
+
 	uargs->offset = FirstOffsetNumber;

 	opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page);
diff --git a/contrib/pageinspect/expected/brin.out b/contrib/pageinspect/expected/brin.out
index 71eb190380c..2839ee9435b 100644
--- a/contrib/pageinspect/expected/brin.out
+++ b/contrib/pageinspect/expected/brin.out
@@ -48,4 +48,17 @@ SELECT * FROM brin_page_items(get_raw_page('test1_a_idx', 2), 'test1_a_idx')
  1 | 0 | 1 | f | f | f | {1 .. 1}
 (1 row)

+-- Check that all functions fail gracefully with an empty page input
+select brin_page_type(repeat(E'\\000', c
Re: Failed transaction statistics to measure the logical replication progress
On Tue, Feb 22, 2022 at 6:45 AM tanghy.f...@fujitsu.com wrote: > > I found a problem when using it. When a replication workers exits, the > transaction stats should be sent to stats collector if they were not sent > before > because it didn't reach PGSTAT_STAT_INTERVAL. But I saw that the stats weren't > updated as expected. > > I looked into it and found that the replication worker would send the > transaction stats (if any) before it exits. But it got invalid subid in > pgstat_send_subworker_xact_stats(), which led to the following result: > > postgres=# select pg_stat_get_subscription_worker(0, null); > pg_stat_get_subscription_worker > - > (0,,2,0,00,"",) > (1 row) > > I think that's because subid has already been cleaned when trying to send the > stats. I printed the value of before_shmem_exit_list, the functions in this > list > would be called in shmem_exit() when the worker exits. > logicalrep_worker_onexit() would clean up the worker info (including subid), > and > pgstat_shutdown_hook() would send stats if any. logicalrep_worker_onexit() > was > called before calling pgstat_shutdown_hook(). > Yeah, I think that is a problem and maybe we can think of solving it by sending the stats via logicalrep_worker_onexit before subid is cleared but not sure that is a good idea. I feel we need to go back to the idea of v21 for sending stats instead of using pgstat_report_stat. I think the one thing which we could improve is to avoid trying to send it each time before receiving each message by walrcv_receive and rather try to send it before we try to wait (WaitLatchOrSocket). Trying after each message doesn't seem to be required and could lead to some overhead as well. What do you think? -- With Regards, Amit Kapila.
Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit
On Tue, Feb 22, 2022 at 1:03 AM Fujii Masao wrote:
> On 2022/02/21 14:45, Etsuro Fujita wrote:
> > On Fri, Feb 18, 2022 at 1:46 AM Fujii Masao wrote:
> >> I reviewed 0001 patch. It looks good to me except the following minor
> >> things. If these are addressed, I think that the 0001 patch can be
> >> marked as ready for committer.
> >> +* Also determine to commit (sub)transactions opened on the remote server
> >> +* in parallel at (sub)transaction end.
> >>
> >> Like the comment "Determine whether to keep the connection ...",
> >> "determine to commit" should be "determine whether to commit"?
> >
> > Agreed. I’ll change it as such.

Done.

> Thanks! If that's updated, IMO it's ok to commit the 0001 patch.

Cool! Attached is an updated patch. Other changes since the last version: 1) I added the curlevel parameter to pgfdw_finish_pre_subcommit_cleanup() to avoid doing GetCurrentTransactionNestLevel() there, as proposed, and 2) I tweaked comments a bit further, mostly for/in pgfdw_finish_pre_commit_cleanup() and pgfdw_finish_pre_subcommit_cleanup(). Barring objections, I’ll commit the patch.

Thanks for reviewing!

Best regards,
Etsuro Fujita

v5-0001-postgres-fdw-Add-support-for-parallel-commit.patch
Description: Binary data
Handle infinite recursion in logical replication setup
Hi,

In logical replication, the walsender currently sends both the data that is generated locally and the data that was replicated from other instances. This results in infinite recursion in a circular logical replication setup. Here the user is trying to have a 2-way replication setup, with node1 publishing data to node2 and node2 publishing data to node1, so that DML operations can be performed from either node; it can act as a 2-way multi-master replication setup. This problem can be reproduced with the following steps:

-- Instance 1
create table t1(c1 int);
create publication pub1 for table t1;

-- Instance 2
create table t1(c1 int);
create publication pub2 for table t1;
create subscription sub1 CONNECTION 'dbname=postgres port=5432' publication pub1;

-- Instance 1
create subscription sub2 CONNECTION 'dbname=postgres port=5433' publication pub2;
insert into t1 values(10);

In this scenario, the walsender for publication pub1 sends data to the apply worker for subscription sub1; the apply worker maps the data to local tables and applies the individual changes as they are received. Then the walsender for publication pub2 sends that data to the apply worker for subscription sub2, which likewise maps the data to local tables and applies the individual changes as they are received. This process repeats infinitely. Currently we do not differentiate between locally generated data and replicated data, so we send both, which causes the infinite recursion. We can see that the record count has increased significantly within a short time:

select count(*) from t1;
 count
-------
   400
(1 row)

If the table had a primary key constraint, we would notice that the first insert is successful, and when the same insert is sent back, it fails with a constraint error:

2022-02-23 09:28:43.592 IST [14743] ERROR: duplicate key value violates unique constraint "t1_pkey"
2022-02-23 09:28:43.592 IST [14743] DETAIL: Key (c1)=(10) already exists.
2022-02-23 09:28:43.592 IST [14743] CONTEXT: processing remote data during "INSERT" for replication target relation "public.t1" in transaction 727 at 2022-02-23 09:28:43.406738+05:30
2022-02-23 09:28:43.593 IST [14678] LOG: background worker "logical replication worker" (PID 14743) exited with exit code 1
2022-02-23 09:28:48.608 IST [14745] LOG: logical replication apply worker for subscription "sub2" has started
2022-02-23 09:28:48.624 IST [14745] ERROR: duplicate key value violates unique constraint "t1_pkey"
2022-02-23 09:28:48.624 IST [14745] DETAIL: Key (c1)=(10) already exists.
2022-02-23 09:28:48.624 IST [14745] CONTEXT: processing remote data during "INSERT" for replication target relation "public.t1" in transaction 727 at 2022-02-23 09:28:43.406738+05:30
2022-02-23 09:28:48.626 IST [14678] LOG: background worker "logical replication worker" (PID 14745) exited with exit code 1

The same problem can occur in any circular node setup of 3 nodes, 4 nodes, etc., like: a) node1 publishing to node2, b) node2 publishing to node3, c) node3 publishing back to node1.

Here there are two problems for the user: a) incremental synchronization of a table, where the walsender sends both local data and replicated data, and b) initial table synchronization using the copy command, which likewise sends both local data and replicated data.

The first problem, incremental synchronization of a table by the walsender, can be solved as follows: currently, locally generated data has no replication origin associated with it, while data that originated on another instance does. We could use this information to differentiate locally generated data from replicated data and send only the locally generated data.
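To make the origin-based idea above concrete: each subscription's apply worker applies remote changes under a replication origin (currently named pg_<suboid>), so those changes carry a nonzero origin id in WAL, whereas locally generated changes carry none. A quick way to see the origins in play on a subscriber (output shape will vary with the setup):

```sql
-- List the replication origins created for subscriptions on this node;
-- locally generated changes are not associated with any of these, which
-- is the property the proposed only_local filtering would rely on.
SELECT roident, roname FROM pg_replication_origin;
```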
This "only_local" behavior could be provided as an option while the subscription is created, e.g.:

CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres port=5433' PUBLICATION pub1 with (only_local = on);

I have attached a basic patch for this. If the idea is accepted, I will work further to test more scenarios, add documentation, and test and post an updated patch.

For the second problem, initial table synchronization that copies both local data and replicated data, let us consider the following scenario: a) node1 publishing to node2, and b) node2 publishing to node1. In this case node1 will have replicated data from node2 and vice versa. Now suppose the user wants node3 to subscribe to data from node2. The user will have to create a subscription on node3 to get the data from node2. During table synchronization we send the complete table data from node2 to node3, and node2 holds both its own local data and the data replicated from node1. Currently we don't have an option to differentiate between locally generated data and replicated data in the heap, which will cause the infinite recursion described above. To handle this, if the user has specified the only_local option, we could throw a warning or error out while creating the subscription in this case; we could have a colum
RE: logical replication empty transactions
On Wed, Feb 23, 2022 at 10:58 PM Ajin Cherian wrote:

A few comments on v19-0001:

1. I think we should adjust the alignment format.

git am ../v19-0001-Skip-empty-transactions-for-logical-replication.patch
.git/rebase-apply/patch:197: indent with spaces.
        * Before we send schema, make sure that STREAM START/BEGIN/BEGIN PREPARE
.git/rebase-apply/patch:198: indent with spaces.
        * is sent. If not, send now.
.git/rebase-apply/patch:199: indent with spaces.
        */
.git/rebase-apply/patch:201: indent with spaces.
        pgoutput_send_stream_start(ctx, toptxn);
.git/rebase-apply/patch:204: indent with spaces.
        pgoutput_begin(ctx, toptxn);
warning: 5 lines add whitespace errors.

2. Structure member initialization.

static void
pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
+	PGOutputTxnData *txndata = MemoryContextAllocZero(ctx->context,
+													  sizeof(PGOutputTxnData));
+
+	txndata->sent_begin_txn = false;
+	txn->output_plugin_private = txndata;
+}

Do we need to set sent_stream_start and sent_any_stream to false here?

3. Maybe we should add Assert(txndata) in the other functions, as function pgoutput_commit_txn does.

4. In addition, I think we should keep a unified style.
a) Log style (maybe the first one is better):
   First style: "Skipping replication of an empty transaction in XXX"
   Second style: "skipping replication of an empty transaction"
b) Flag name (maybe the second one is better):
   First style: variable "sent_begin_txn" in the pgoutput_stream_* functions.
   Second style: variable "skip" in function pgoutput_commit_txn.

Regards,
Wang wei
Re: Design of pg_stat_subscription_workers vs pgstats
Below are my review comments just for the v1 patch test code.

==

1. "table sync error" versus "tablesync error"

+# Wait for the table sync error to be reported.
+$node_subscriber->poll_query_until(
+	'postgres',
+	qq[
+SELECT apply_error_count = 0 AND sync_error_count > 0
+FROM pg_stat_subscription_activity
+WHERE subname = 'tap_sub'
+]) or die "Timed out while waiting for table sync error";
+
+# Truncate test_tab1 so that table sync can continue.
+$node_subscriber->safe_psql('postgres', "TRUNCATE test_tab1;");
+
 # Wait for initial table sync for test_tab1 to finish.

IMO all these "table sync" should be changed to "tablesync", because a table "sync error" sounds like something completely different to a "tablesync error".

SUGGESTIONS
- "Wait for the table sync error to be reported." --> "Wait for the tablesync error to be reported."
- "Timed out while waiting for table sync error" --> "Timed out while waiting for tablesync error"
- "Truncate test_tab1 so that table sync can continue." --> "Truncate test_tab1 so that the tablesync worker can run to completion."
- "Wait for initial table sync for test_tab1 to finish." --> "Wait for the tablesync worker of test_tab1 to finish."

~~~

2. Unnecessary INSERT VALUES (2)?

(I think this is a duplicate of what [Osumi] #6 reported)

+# Insert more data to test_tab1 on the subscriber and then on the publisher, raising an
+# error on the subscriber due to violation of the unique constraint on test_tab1.
+$node_subscriber->safe_psql('postgres', "INSERT INTO test_tab1 VALUES (2)");

Why does the test do INSERT data (2)? There is already data (1) from the tablesync which will cause an apply worker PK violation when another VALUES (1) is published. Note, the test comment also needs to change...

~~~

3. Wait for the apply worker error

+# Wait for the apply error to be reported.
+$node_subscriber->poll_query_until(
+	'postgres',
+	qq[
+SELECT apply_error_count > 0 AND sync_error_count > 0
+FROM pg_stat_subscription_activity
+WHERE subname = 'tap_sub'
+]) or die "Timed out while waiting for apply error";

This test is only for apply worker errors. So why is the test SQL checking "AND sync_error_count > 0"?

(This is similar to what [Tang] #2 reported, but I think she was referring to the other tablesync test)

~~~

4. Wrong worker?

(This looks like a duplicate of what [Osumi] #5 already reported)

+
+# Truncate test_tab1 so that table sync can continue.
+$node_subscriber->safe_psql('postgres', "TRUNCATE test_tab1;");

 $node_subscriber->stop('fast');
 $node_publisher->stop('fast');

Cut/paste error? Aren't you doing TRUNCATE here so the apply worker can continue, not the tablesync worker (which already completed)?

"Truncate test_tab1 so that table sync can continue." --> "Truncate test_tab1 so that the apply worker can continue."

--
[Osumi] https://www.postgresql.org/message-id/CAD21AoBRt%3DcyKsZP83rcMkHnT498gHH0TEP34fZBrGCxT-Ahwg%40mail.gmail.com
[Tang] https://www.postgresql.org/message-id/TYCPR01MB612840D018FEBD38268CC83BFB3C9%40TYCPR01MB6128.jpnprd01.prod.outlook.com

Kind Regards,
Peter Smith.
Fujitsu Australia
Allow file inclusion in pg_hba and pg_ident files
Hi,

I recently had to work on an internal tool which, among other things, has to merge pg_hba.conf and pg_ident.conf files between an existing (possibly already updated) instance and some external repository. My biggest concern is that any issue in either the external repository content or the tool could leave the instance entirely unreachable. There are some protection attempts to avoid that, but really there isn't any practical way to make sure that some of the entries are left alone, and not hidden, no matter what.

To address that, I'd like to propose the possibility to include files in the hba and ident configuration files. This was already discussed in the past and, in my understanding, is mostly wanted, although some people expressed concerns about use cases that wouldn't rely on thousands of entries. In my case, there shouldn't be more than a dozen generated pg_hba/pg_ident lines. All I want is to have my main hba (and ident) files do something like:

include hba_forbid_non_ssl.conf
include hba_superuser.conf
include hba_replication.conf
include hba_application_generated.conf

So that the tool has a way to make sure that it won't prevent DBA login or replication, or allow unsecure connections, and can simply rewrite its target file rather than merging it, sorting it with weird rules and deciding which lines are OK to be removed and which ones aren't.

I'm attaching a patchset that implements $SUBJECT:

0001 adds a new pg_ident_file_mappings view, which is basically the same as the pg_hba_file_rules view but for mappings. It's probably already useful on its own, for instance if you need to tweak some regexp.

0002 adds a new "include" directive to both auth files, which will include the given file at this exact position. I chose to do this in the tokenization rather than the parsing, as it makes things simpler in general and also allows a single code path for both files.
It also adds 2 new columns to the pg_hba_file_rules and pg_ident_file_mappings views: file_name and (rule|mapping)_number, so it's possible to identify where a rule is coming from and, more importantly, which one takes priority over the other. Note that I only added "include", but maybe people would want something like include_dir or include_if_exists too.

Both patches have updated documentation, but no tests yet. If there's an agreement on the feature, I plan to add some TAP tests for it.

Finally, I also added 0003, which is a POC for a new pg_hba_matches() function that can help DBAs understand why their configuration isn't working as they expect. This is only to start the discussion on that topic; the code is for now really hackish, as I don't know how much this is wanted and/or if some other behavior would be better, and there's also no documentation or test. The function for now only takes an optional inet (null means unix socket), the target role and an optional ssl flag, and returns the file, line and raw line matching, if any, or null. For instance:

=# select * from pg_hba_matches('127.0.0.1'::inet, 'postgres');
-[ RECORD 1 ]----------------------
file_name | /tmp/pgbuild/toto.conf
line_num  | 5
raw_line  | host	all	all	127.0.0.1/32	trust

I'm wondering for instance if it would be better to (optionally?) keep iterating over the lines and print all the lines that would have matched, and in which order, or if the function should return the parsed line information rather than the raw line, although that could make the output worse when using huge secondary auth files. I'm also not sure if the various possible flags (ssl, gss) should be explicitly set or if the function should try various permutations on its own.

Note that this function only works against the current configuration files, not the ones currently active. As far as I can see, PostgresMain gets rid of PostmasterContext, which is where they currently live, so they don't exist anymore in the backends.
To make it work we would have to always keep those in each backend, but also reload them along with the main config file on SIGHUP — except on Windows (and -DEXEC_BACKEND), as the current version of the auth files is always used there anyway. So would it make sense to add the possibility to use the loaded files instead, given the required extra cost and the fact that it wouldn't make sense on Windows? If yes, I guess that we should also allow that in the pg_hba / pg_ident views.

While at it, I noticed this comment added in de16ab72388:

> The pg_hba.conf file is read on start-up and when
> the main server process receives a SIGHUP
> signal. If you edit the file on an
> active system, you will need to signal the postmaster
> (using pg_ctl reload, calling the SQL function
> pg_reload_conf(), or using kill -HUP) to make it re-read the file.
> [...]
> The preceding statement is not true on Microsoft Windows: there, any
> changes in the pg_hba.conf file are immediately
> applied by subsequent n
Re: pg_stat_get_replication_slot and pg_stat_get_subscription_worker incorrectly marked as proretset
On Tue, Feb 22, 2022 at 8:15 AM Michael Paquier wrote: > > On Mon, Feb 21, 2022 at 05:00:43PM +0530, Amit Kapila wrote: > > On Mon, Feb 21, 2022 at 4:56 PM Michael Paquier wrote: > > > > It is still under discussion. I'll take the necessary action along > > with other things related to that view based on the conclusion on that > > thread. > > Sounds good to me. The patch you are proposing upthread is OK for > me. > Thanks, so you are okay with me pushing that patch just to HEAD. We don't want to backpatch this to 14 as this is a catalog change and won't cause any user-visible issue, is that correct? -- With Regards, Amit Kapila.
Re: Frontend error logging style
On 2022-02-22 22:44:25 -0500, Tom Lane wrote: > Andres Freund writes: > > What about adding a pg_fatal() that's pg_log_fatal() + exit()? That keeps > > pg_log_* stuff "log only", but adds something adjacent enough to hopefully > > reduce future misunderstandings? > > I'd be okay with that, except that pg_upgrade already has a pg_fatal > (because it has its *own* logging system, just in case you thought > this wasn't enough of a mess yet). I'm in favor of aligning > pg_upgrade's logging with the rest, but I'd hoped to leave that for > later. Making the names collide would be bad even as a short-term > thing, I fear. I guess we could name pg_upgrade's out of the way... > I'm not against choosing some name other than pg_log_fatal, but that > particular suggestion has got conflicts. Got any other ideas? Maybe pg_fatal_exit(), pg_exit_fatal() or pg_fatal_error()?
Re: Design of pg_stat_subscription_workers vs pgstats
On Tue, Feb 22, 2022 at 8:02 PM Masahiko Sawada wrote: > > On Tue, Feb 22, 2022 at 6:53 PM Amit Kapila wrote: > > > > > 3. Can we send error stats pgstat_report_stat() as that will be called > > via proc_exit() path. We can set the phase (apply/sync) in > > apply_error_callback_arg and then use that to send the appropriate > > message. I think this will obviate the need for try..catch. > > If we use pgstat_report_stat() to send subscription stats messages, > all processes end up going through that path. It might not bring > overhead in practice but I'd like to avoid it. > I am not sure about overhead but I see another problem if we use that approach. In the exit path, logicalrep_worker_onexit() will get called before pgstat_report_stat() and that will clear the MyLogicalRepWorker->subid, so we won't know the id for which to send stats. So, the way patch is doing seems reasonable to me unless someone has better ideas. -- With Regards, Amit Kapila.
Re: Frontend error logging style
Andres Freund writes: > What about adding a pg_fatal() that's pg_log_fatal() + exit()? That keeps > pg_log_* stuff "log only", but adds something adjacent enough to hopefully > reduce future misunderstandings? I'd be okay with that, except that pg_upgrade already has a pg_fatal (because it has its *own* logging system, just in case you thought this wasn't enough of a mess yet). I'm in favor of aligning pg_upgrade's logging with the rest, but I'd hoped to leave that for later. Making the names collide would be bad even as a short-term thing, I fear. Looks like libpq_pipeline.c has its own pg_fatal, too. I'm not against choosing some name other than pg_log_fatal, but that particular suggestion has got conflicts. Got any other ideas? regards, tom lane
Re: Frontend error logging style
Hi,

On 2022-02-22 18:23:19 -0500, Tom Lane wrote:
> Robert Haas writes:
> > On Fri, Nov 19, 2021 at 5:17 AM Peter Eisentraut wrote:
> >> If people want to do that kind of thing (I'm undecided whether the
> >> complexity is worth it), then make it a different API. The pg_log_*
> >> calls are for writing formatted output. They normalized existing
> >> hand-coded patterns at the time. We can wrap another API on top of
> >> them that does flow control and output. The pg_log_* stuff is more
> >> on the level of syslog(), which also just outputs stuff. Nobody is
> >> suggesting that syslog(LOG_EMERG) should exit the program
> >> automatically. But you can wrap higher-level APIs such as ereport()
> >> on top of that that might do that.

That'd maybe be a convincing argument in a project that didn't have elog(FATAL).

> > Yeah, that might be a way forward.
>
> This conversation seems to have tailed off without full resolution,
> but I observe that pretty much everyone except Peter is on board
> with defining pg_log_fatal as pg_log_error + exit(1). So I think
> we should just do that, unless Peter wants to produce a finished
> alternative proposal.

What about adding a pg_fatal() that's pg_log_fatal() + exit()? That keeps pg_log_* stuff "log only", but adds something adjacent enough to hopefully reduce future misunderstandings?

To further decrease the chance of such mistakes, we could rename pg_log_fatal() to something else. Or add a pg_log_fatal() function that's marked pg_nodiscard(), so that each callsite explicitly has to be (void) pg_log_fatal().

> I've now gone through and fleshed out the patch I sketched upthread.
> This exercise convinced me that we absolutely should do something
> very like this, because
>
> (1) As it stands, this patch removes a net of just about 1000 lines
> of code. So we clearly need *something*.
> (2) The savings would be even more, except that people have invented
> macros for pg_log_error + exit(1) in at least four places already.
> (None of them quite the same of course.) Here I've just fixed those
> macro definitions to use pg_log_fatal. In the name of consistency, we
> should probably get rid of those macros in favor of using pg_log_fatal
> directly; but I've not done so here, since this patch is eyewateringly
> long and boring already.

Agreed. I don't care that much about the naming here.

For a moment I was wondering whether there'd be a benefit for having the "pure logging" stuff be in a different static library than the erroring variant. So that e.g. libpq's check for exit() doesn't run into trouble. But then I remembered that libpq doesn't link against common...

Greetings,

Andres Freund
Re: Design of pg_stat_subscription_workers vs pgstats
Hi. Below are my review comments for the v1 patch. == 1. Commit message - wording As the result of the discussion, we've concluded that the stats collector is not an appropriate place to store the error information of subscription workers. SUGGESTION It was decided (refer to the Discussion link below) that the stats collector is not an appropriate place to store the error information of subscription workers. ~~~ 2. Commit message - wording This commits changes the view so that it stores only statistics counters: apply_error_count and sync_error_count. SUGGESTION "This commits changes the view" --> "This patch changes the pg_stat_subscription_workers view" ~~~ 3. Commit message - wording Removing these error details, since we don't need to record the error information for apply workers and tablesync workers separately, the view now has one entry per subscription. DID THIS MEAN... After removing these error details, there is no longer separate information for apply workers and tablesync workers, so the view now has only one entry per subscription. -- But anyway, that is not entirely true either because those counters are separate information for the different workers, right? ~~ 4. Commit message - wording Also, it changes the view name to pg_stat_subscription_activity since the word "worker" is an implementation detail that we use one worker for one tablesync. SUGGESTION "Also, it changes" --> "The patch also changes" ... ~~~ 5. doc/src/sgml/monitoring.sgml - wording - The pg_stat_subscription_workers view will contain - one row per subscription worker on which errors have occurred, for workers - applying logical replication changes and workers handling the initial data - copy of the subscribed tables. The statistics entry is removed when the + The pg_stat_subscription_activity view will contain + one row per subscription. The statistics entry is removed when the corresponding subscription is dropped. 
SUGGESTION "The statistics entry is removed" --> "This row is removed" ~~~ 6. doc/src/sgml/monitoring.sgml - why two counters? Please forgive this noob question... I see there are 2 error_count columns (one for each kind of worker) but I was wondering why it is useful for users to be able to distinguish if the error came from the tablesync workers or from the apply workers? Do you have any example? Also, IIRC sometimes the tablesync might actually do a few "apply" changes itself... so the distinction may become a bit fuzzy... ~~~ 7. src/backend/postmaster/pgstat.c - comment @@ -1313,13 +1312,13 @@ pgstat_vacuum_stat(void) } /* - * Repeat for subscription workers. Similarly, we needn't bother in the - * common case where no subscription workers' stats are being collected. + * Repeat for subscription. Similarly, we needn't bother in the common + * case where no subscription stats are being collected. */ typo? "Repeat for subscription." --> "Repeat for subscriptions." ~~~ 8. src/backend/postmaster/pgstat.c @@ -3000,32 +2968,29 @@ pgstat_fetch_stat_funcentry(Oid func_id) /* * - - * pgstat_fetch_stat_subworker_entry() - + * pgstat_fetch_stat_subentry() - * * Support function for the SQL-callable pgstat* functions. Returns - * the collected statistics for subscription worker or NULL. + * the collected statistics for subscription or NULL. * - */ -PgStat_StatSubWorkerEntry * -pgstat_fetch_stat_subworker_entry(Oid subid, Oid subrelid) +PgStat_StatSubEntry * +pgstat_fetch_stat_subentry(Oid subid) There seems some kind of inconsistency because the member name is called "subscriptions" but sometimes it seems singular. Some places (e.g. pgstat_vacuum_stat) will iterate multiple results, but then other places (like this function) just return to a single "subscription" (or "entry"). I suspect all the code may be fine; probably it is just some inconsistent (singular/plural) comments that have confused things a bit. ~~~ 9. 
src/backend/replication/logical/worker.c - subscription id + /* Report the worker failed during table synchronization */ + pgstat_report_subscription_error(MyLogicalRepWorker->subid, false); and + /* Report the worker failed during the application of the change */ + pgstat_report_subscription_error(MyLogicalRepWorker->subid, true); Why don't these use MySubscription->oid instead of MyLogicalRepWorker->subid? ~~~ 10. src/include/pgstat.h - enum order @@ -84,8 +84,8 @@ typedef enum StatMsgType PGSTAT_MTYPE_REPLSLOT, PGSTAT_MTYPE_CONNECT, PGSTAT_MTYPE_DISCONNECT, + PGSTAT_MTYPE_SUBSCRIPTIONERROR, PGSTAT_MTYPE_SUBSCRIPTIONPURGE, - PGSTAT_MTYPE_SUBWORKERERROR, } StatMsgType; This change rearranges the enum order. Maybe it is safer not to do this? ~~~ 11. src/include/pgstat.h @@ -767,8 +747,8 @@ typedef union PgStat_Msg PgStat_MsgReplSlot msg_replslot; PgStat_MsgConnect msg_connect; PgStat_MsgDisconnect msg_disconnect; + PgStat_MsgSubscriptionError msg_subscriptionerror; PgStat_Ms
RE: Design of pg_stat_subscription_workers vs pgstats
On Tue, Feb 22, 2022 1:45 PM Masahiko Sawada wrote: > > I've attached a patch that changes pg_stat_subscription_workers view. > It removes non-cumulative values such as error details such as > error-XID and the error message from the view, and consequently the > view now has only cumulative statistics counters: apply_error_count > and sync_error_count. Since the new view name is under discussion I > temporarily chose pg_stat_subscription_activity. > Thanks for your patch. Few comments: 1. + apply_error_count uint8 ... + sync_error_count uint8 It seems that Postgres has no data type named uint8, should we change it to bigint? 2. +# Wait for the table sync error to be reported. +$node_subscriber->poll_query_until( + 'postgres', + qq[ +SELECT apply_error_count = 0 AND sync_error_count > 0 +FROM pg_stat_subscription_activity +WHERE subname = 'tap_sub' +]) or die "Timed out while waiting for table sync error"; We want to check table sync error here, but do we need to check "apply_error_count = 0"? I am not sure if it is possible that the apply worker has an unexpected error, which would cause this test to fail. Regards, Tang
Re: logical replication empty transactions
On Thu, Feb 17, 2022 at 9:42 PM Amit Kapila wrote: > > On Mon, Jan 31, 2022 at 6:18 PM Ajin Cherian wrote: > > > > Few comments: > = > 1. Is there any particular reason why the patch is not skipping empty xacts > for streaming (in-progress) transactions, as noted in the commit > message as well? > I have added support for skipping streaming transactions. > 2. > +static void > +pgoutput_begin(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) > +{ > bool send_replication_origin = txn->origin_id != InvalidRepOriginId; > + PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private; > + > + Assert(txndata); > > I think here you can add an assert for sent_begin_txn to be always false? > Added. > 3. > +/* > + * Send BEGIN. > + * This is where the BEGIN is actually sent. This is called > + * while processing the first change of the transaction. > + */ > > Have an empty line between the first two lines to ensure consistency > with nearby comments. Also, the formatting of these lines appears > awkward, either run pgindent or make sure lines are not too short. > Changed. > 4. Do we really need to make any changes in PREPARE > transaction-related functions if we can't skip in that case? I think you > can have a check: if the output plugin private variable is not set, then > ignore the special optimization for sending begin. > I have modified this as well. I have also rebased the patch after it did not apply due to a new commit. I will next work on testing and improving the keepalive logic while skipping transactions. regards, Ajin Cherian Fujitsu Australia v19-0001-Skip-empty-transactions-for-logical-replication.patch Description: Binary data
Re: [BUG] Panic due to incorrect missingContrecPtr after promotion
>Ooh, nice find and diagnosis. I can confirm that the test fails as you >described without the code fix, and doesn't fail with it. >I attach the same patch, with the test file put in its final place >rather than as a patch. Due to recent xlog.c changes this needs a bit of >work to apply to back branches; I'll see about getting it in all >branches soon. Thank you for reviewing! Sami
Re: catalog access with reset GUCs during parallel worker startup
Hi, I think we need to do something about this, because afaics this has data corruption risks, albeit with a narrow window. E.g. consider what happens if the value of wal_sync_method is reset to the boot value and one of the catalog accesses in GUC check hooks causes WAL writes (e.g. due to pruning, buffer replacement). One backend doing fdatasync() and the rest O_DSYNC or vice versa won't lead to reliable durability. On 2022-02-09 21:47:09 -0500, Robert Haas wrote: > I remember trying that, and if I remember correctly, it broke with > core GUCs, without any need for extensions in the picture. I don't > remember the details too well, unfortunately, but I think it had to do > with the behavior of some of the check and/or assign hooks. > > It's probably time to try it again, because (a) maybe things are > different now, or (b) maybe I did it wrong, and in any event (c) I > can't really remember why it didn't work and we probably want to know > that. The tests fail: +ERROR: invalid value for parameter "session_authorization": "andres" +CONTEXT: while setting parameter "session_authorization" to "andres" +parallel worker +while scanning relation "public.pvactst" but that's easily worked around. In fact, there's already code to do so, it just is lacking awareness of session_authorization: static bool can_skip_gucvar(struct config_generic *gconf) ... * Role must be handled specially because its current value can be an * invalid value (for instance, if someone dropped the role since we set * it). So if we tried to serialize it normally, we might get a failure. * We skip it here, and use another mechanism to ensure the worker has the * right value. Don't those reasons apply just as well to session_authorization as they do to role? If I skip session_authorization in can_skip_gucvar() and move RestoreGUCState() out of the transaction, all tests pass.
Unsurprisingly we get bogus results for parallel accesses to session_authorization:

postgres[3312622][1]=> SELECT current_setting('session_authorization');
┌─────────────────┐
│ current_setting │
├─────────────────┤
│ frak            │
└─────────────────┘
(1 row)

postgres[3312622][1]=> SET force_parallel_mode = on;
SET
postgres[3312622][1]=> SELECT current_setting('session_authorization');
┌─────────────────┐
│ current_setting │
├─────────────────┤
│ andres          │
└─────────────────┘
(1 row)

but that could be remedied just as already done for role. Greetings, Andres Freund
Re: Design of pg_stat_subscription_workers vs pgstats
On Wed, Feb 23, 2022 at 11:14 AM Andres Freund wrote: > > Hi, > > On 2022-02-22 14:45:19 +0900, Masahiko Sawada wrote: > > I've attached a patch that changes pg_stat_subscription_workers view. > > Thanks for working on this! > > Why are the stats stored in the per-database stats file / as a second level > below the database? While they're also associated with a database, it's a > global catalog, so it seems to make more sense to have them "live" globally as > well? Good point. The reason why we used to use per-database stats file is that we were storing some relation information there. But now that we don't need to have such information, it makes more sense to have them live globally. I'll change the patch accordingly. > > Not just from an aesthetical perspective, but there might also be cases where > it's useful to send stats from the stats launcher. E.g. the number of times > the launcher couldn't start a worker because the max numbers of workers was > already active or such. Good idea. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: shared_preload_libraries = 'pg_stat_statements' failing with installcheck (compute_query_id = auto)
On Tue, Feb 22, 2022 at 11:04:16AM +0900, Michael Paquier wrote: > On Fri, Feb 18, 2022 at 05:38:56PM +0800, Julien Rouhaud wrote: > > On Fri, Feb 18, 2022 at 05:22:36PM +0900, Michael Paquier wrote: > >> So, I have been looking at this problem, and I don't see a problem in > >> doing something like the attached, where we add a "regress" mode to > >> compute_query_id that is a synonym of "auto". Or, in short, we have > >> the default of letting a module decide if a query ID can be computed > >> or not, at the exception that we hide its result in EXPLAIN outputs. > Okay, thanks. I have worked on that today and applied the patch down While rebasing, I noticed that this patch does part of what I added in another thread. https://commitfest.postgresql.org/37/3409/ | explain_regress, explain(MACHINE), and default to explain(BUFFERS) - if (es->verbose && plannedstmt->queryId != UINT64CONST(0)) + if (es->verbose && plannedstmt->queryId != UINT64CONST(0) && es->machine) { /* * Output the queryid as an int64 rather than a uint64 so we match Apparently, I first wrote this two years ago today. -- Justin
Re: Use generation context to speed up tuplesorts
Hi, On 2022-02-18 12:10:51 +1300, David Rowley wrote: > The other way I thought to fix it was by changing the logic for when > generation blocks are freed. In the problem case mentioned above, the > block being freed is the current block (which was just allocated). I > made some changes to adjust this behaviour so that we no longer free > the block when the final chunk is pfree()'d. Instead, that now lingers > and can be reused by future allocations, providing they fit inside it. That makes sense to me, as long as we keep just one such block. > The problem I see with this method is that there still could be some > pathological case that causes us to end up storing just a single tuple per > generation block. Crazy idea: Detect the situation, and recompact. Create a new context, copy all the tuples over, delete the old context. That could be a win even in less adversarial situations than "a single tuple per generation block". Greetings, Andres Freund
Re: small development tip: Consider using the gold linker
On Tue, Feb 22, 2022 at 4:26 PM Andres Freund wrote: > I've switched back and forth between gold and lld a couple times. Eventually I > always get frustrated by lld causing slowdowns and other issues for gdb. Did > you ever hit one of those? Now that you mention it...gdb was very slow the last few times I used it, which wasn't for long enough for it to really matter. This must have been why. I might have to rescind my recommendation of lld. -- Peter Geoghegan
Re: Design of pg_stat_subscription_workers vs pgstats
Hi, On 2022-02-22 14:45:19 +0900, Masahiko Sawada wrote: > I've attached a patch that changes pg_stat_subscription_workers view. Thanks for working on this! Why are the stats stored in the per-database stats file / as a second level below the database? While they're also associated with a database, it's a global catalog, so it seems to make more sense to have them "live" globally as well? Not just from an aesthetical perspective, but there might also be cases where it's useful to send stats from the stats launcher. E.g. the number of times the launcher couldn't start a worker because the max numbers of workers was already active or such. Greetings, Andres Freund
Re: row filtering for logical replication
On Tue, Feb 22, 2022 at 4:47 AM Euler Taveira wrote: > > On Thu, Feb 17, 2022, at 3:36 AM, Amit Kapila wrote: > > As there is a new version, I would like to wait for a few more days > before committing. I am planning to commit this early next week (by > Tuesday) unless others or I see any more things that can be improved. > > Amit, I don't have additional comments or suggestions. Let's move on. Next > topic. :-) > Pushed! -- With Regards, Amit Kapila.
Re: Race conditions in 019_replslot_limit.pl
Hi, I think I did find a bug related to the test, but afaics not the cause of the test failures we're seeing. See https://www.postgresql.org/message-id/20220223014855.4lsddr464i7mymk2%40alap3.anarazel.de I don't think it's related to the problem of this thread, because the logs of primary3 don't have a single mention of ereport(LOG, (errmsg("terminating process %d to release replication slot \"%s\"", active_pid, NameStr(slotname)))); On 2022-02-18 15:14:15 -0800, Andres Freund wrote: > I'm running out of ideas for how to try to reproduce this. I think we might > need some additional debugging information to get more information from the > buildfarm. > I'm thinking of adding log_min_messages=DEBUG2 to primary3, passing --verbose > to pg_basebackup in $node_primary3->backup(...). > > It might also be worth adding DEBUG2 messages to ReplicationSlotShmemExit(), > ReplicationSlotCleanup(), InvalidateObsoleteReplicationSlots(). Planning to commit something like the attached. Greetings, Andres Freund >From afdeff10526e29e3fc63b18c08100458780489d9 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Tue, 22 Feb 2022 18:02:34 -0800 Subject: [PATCH v1] Add temporary debug info to help debug 019_replslot_limit.pl failures. I have not been able to reproduce the occasional failures of 019_replslot_limit.pl we are seeing in the buildfarm and not for lack of trying. The additional logging and increased log level will hopefully help. Will be reverted once the cause is identified.
Discussion: https://postgr.es/m/20220218231415.c4plkp4i3reqc...@alap3.anarazel.de --- src/backend/replication/slot.c| 21 + src/bin/pg_basebackup/pg_basebackup.c | 10 +- src/test/recovery/t/019_replslot_limit.pl | 5 - 3 files changed, 34 insertions(+), 2 deletions(-) diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index 5da5fa825a2..3d39fddaaef 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -177,6 +177,10 @@ ReplicationSlotInitialize(void) static void ReplicationSlotShmemExit(int code, Datum arg) { + /* temp debugging aid to analyze 019_replslot_limit failures */ + elog(DEBUG3, "replication slot exit hook, %s active slot", + MyReplicationSlot != NULL ? "with" : "without"); + /* Make sure active replication slots are released */ if (MyReplicationSlot != NULL) ReplicationSlotRelease(); @@ -554,6 +558,9 @@ ReplicationSlotCleanup(void) Assert(MyReplicationSlot == NULL); restart: + /* temp debugging aid to analyze 019_replslot_limit failures */ + elog(DEBUG3, "temporary replication slot cleanup: begin"); + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED); for (i = 0; i < max_replication_slots; i++) { @@ -579,6 +586,8 @@ restart: } LWLockRelease(ReplicationSlotControlLock); + + elog(DEBUG3, "temporary replication slot cleanup: done"); } /* @@ -1284,6 +1293,12 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN, (void) kill(active_pid, SIGTERM); last_signaled_pid = active_pid; } + else + { +/* temp debugging aid to analyze 019_replslot_limit failures */ +elog(DEBUG3, "not signalling process %d during invalidation of slot \"%s\"", + active_pid, NameStr(slotname)); + } /* Wait until the slot is released. 
*/ ConditionVariableSleep(&s->active_cv, @@ -1347,6 +1362,10 @@ InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno) XLogSegNoOffsetToRecPtr(oldestSegno, 0, wal_segment_size, oldestLSN); restart: + /* temp debugging aid to analyze 019_replslot_limit failures */ + elog(DEBUG3, "begin invalidating obsolete replication slots older than %X/%X", + LSN_FORMAT_ARGS(oldestLSN)); + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED); for (int i = 0; i < max_replication_slots; i++) { @@ -1372,6 +1391,8 @@ restart: ReplicationSlotsComputeRequiredLSN(); } + elog(DEBUG3, "done invalidating obsolete replication slots"); + return invalidated; } diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c index 08b07d5a06e..8c77c533e64 100644 --- a/src/bin/pg_basebackup/pg_basebackup.c +++ b/src/bin/pg_basebackup/pg_basebackup.c @@ -700,8 +700,16 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier) bgchild = fork(); if (bgchild == 0) { + int ret; + /* in child process */ - exit(LogStreamerMain(param)); + ret = LogStreamerMain(param); + + /* temp debugging aid to analyze 019_replslot_limit failures */ + if (verbose) + pg_log_info("log streamer with pid %d exiting", getpid()); + + exit(ret); } else if (bgchild < 0) { diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl index 4257bd4d35a..0c9da9bf272 100644 --- a/src/test/recovery/t/019_replslot_limit.pl +++ b/src/test/recovery/t/019_replslot_limit.pl @@ -316,13 +316,16 @@
Re: Race condition in InvalidateObsoleteReplicationSlots()
Hi, On 2022-02-22 17:48:55 -0800, Andres Freund wrote: > I think the minor changes might unfortunately have introduced a race? Before > the patch just used ConditionVariableSleep(), but now it also has a > ConditionVariablePrepareToSleep(). Without re-checking the sleep condition > until > /* Wait until the slot is released. */ > ConditionVariableSleep(&s->active_cv, >WAIT_EVENT_REPLICATION_SLOT_DROP); > > which directly violates what ConditionVariablePrepareToSleep() documents: > > * This can optionally be called before entering a test/sleep loop. > * Doing so is more efficient if we'll need to sleep at least once. > * However, if the first test of the exit condition is likely to succeed, > * it's more efficient to omit the ConditionVariablePrepareToSleep call. > * See comments in ConditionVariableSleep for more detail. > * > * Caution: "before entering the loop" means you *must* test the exit > * condition between calling ConditionVariablePrepareToSleep and calling > * ConditionVariableSleep. If that is inconvenient, omit calling > * ConditionVariablePrepareToSleep. > > > Afaics this means we can potentially sleep forever if the prior owner of the > slot releases it before the ConditionVariablePrepareToSleep(). > > There's a comment that's mentioning this danger: > > /* > * Prepare the sleep on the slot's condition variable before > * releasing the lock, to close a possible race condition if the > * slot is released before the sleep below. > */ > ConditionVariablePrepareToSleep(&s->active_cv); > > LWLockRelease(ReplicationSlotControlLock); > > but afaics that is bogus, because releasing a slot doesn't take > ReplicationSlotControlLock. That just protects against the slot being dropped, > not against it being released. > > We can ConditionVariablePrepareToSleep() here, but we'd have to it earlier, > before the checks at the start of the while loop. Not at the start of the while loop, outside of the while loop. 
Doing it in the loop body doesn't make sense, even if it's at the top. Each ConditionVariablePrepareToSleep() will unregister itself: /* * If some other sleep is already prepared, cancel it; this is necessary * because we have just one static variable tracking the prepared sleep, * and also only one cvWaitLink in our PGPROC. It's okay to do this * because whenever control does return to the other test-and-sleep loop, * its ConditionVariableSleep call will just re-establish that sleep as * the prepared one. */ if (cv_sleep_target != NULL) ConditionVariableCancelSleep(); The intended use is documented in this comment: * This should be called in a predicate loop that tests for a specific exit * condition and otherwise sleeps, like so: * * ConditionVariablePrepareToSleep(cv); // optional * while (condition for which we are waiting is not true) * ConditionVariableSleep(cv, wait_event_info); * ConditionVariableCancelSleep(); Greetings, Andres Freund
Re: Time to drop plpython2?
Mark Wong writes: > Take 3. :) > I've upgraded everyone to the v14 buildfarm scripts and made sure the > --test passed on HEAD on each one. So I hopefully didn't miss any > (other than the one EOL OpenSUSE version that I will plan on upgrading.) Thanks! However ... it seems like most of your animals haven't run at all in the last couple of days. Did you maybe forget to re-enable their cron jobs after messing with them, or something like that? regards, tom lane
Re: Race condition in InvalidateObsoleteReplicationSlots()
Hi, On 2021-06-11 12:27:57 -0400, Álvaro Herrera wrote: > On 2021-Jun-10, Álvaro Herrera wrote: > > > Here's a version that I feel is committable (0001). There was an issue > > when returning from the inner loop, if in a previous iteration we had > > released the lock. In that case we need to return with the lock not > > held, so that the caller can acquire it again, but weren't. This seems > > pretty hard to hit in practice (I suppose somebody needs to destroy the > > slot just as checkpointer killed the walsender, but before checkpointer > > marks it as its own) ... but if it did happen, I think checkpointer > > would block with no recourse. Also added some comments and slightly > > restructured the code. > > > > There are plenty of conflicts in pg13, but it's all easy to handle. > > Pushed, with additional minor changes. I stared at this code, due to [1], and I think I found a bug. I think it's not the cause of the failures in that thread, but we probably should still do something about it. I think the minor changes might unfortunately have introduced a race? Before the patch just used ConditionVariableSleep(), but now it also has a ConditionVariablePrepareToSleep(). Without re-checking the sleep condition until /* Wait until the slot is released. */ ConditionVariableSleep(&s->active_cv, WAIT_EVENT_REPLICATION_SLOT_DROP); which directly violates what ConditionVariablePrepareToSleep() documents: * This can optionally be called before entering a test/sleep loop. * Doing so is more efficient if we'll need to sleep at least once. * However, if the first test of the exit condition is likely to succeed, * it's more efficient to omit the ConditionVariablePrepareToSleep call. * See comments in ConditionVariableSleep for more detail. * * Caution: "before entering the loop" means you *must* test the exit * condition between calling ConditionVariablePrepareToSleep and calling * ConditionVariableSleep. If that is inconvenient, omit calling * ConditionVariablePrepareToSleep. 
Afaics this means we can potentially sleep forever if the prior owner of the slot releases it before the ConditionVariablePrepareToSleep(). There's a comment that's mentioning this danger: /* * Prepare the sleep on the slot's condition variable before * releasing the lock, to close a possible race condition if the * slot is released before the sleep below. */ ConditionVariablePrepareToSleep(&s->active_cv); LWLockRelease(ReplicationSlotControlLock); but afaics that is bogus, because releasing a slot doesn't take ReplicationSlotControlLock. That just protects against the slot being dropped, not against it being released. We can ConditionVariablePrepareToSleep() here, but we'd have to it earlier, before the checks at the start of the while loop. Greetings, Andres Freund [1] https://www.postgresql.org/message-id/20220218231415.c4plkp4i3reqcwip%40alap3.anarazel.de
RE: Design of pg_stat_subscription_workers vs pgstats
On Tuesday, February 22, 2022 11:47 PM Masahiko Sawada wrote: > On Tue, Feb 22, 2022 at 9:22 PM osumi.takami...@fujitsu.com > wrote: > > (4) doc/src/sgml/monitoring.sgml > > > > > > role="column_definition"> > > - last_error_time timestamp with > time zone > > + sync_error_count uint8 > > > > > > - Last time at which this error occurred > > + Number of times the error occurred during the initial data > > + copy > > > > > > I supposed it might be better to use "initial data sync" > > or "initial data synchronization", rather than "initial data copy". > > "Initial data synchronization" sounds like the whole table synchronization > process including COPY and applying changes to catch up. But > sync_error_count is incremented only during COPY so I used "initial data > copy". > What do you think? Okay. Please keep it as is. Best Regards, Takamichi Osumi
Re: pgcrypto: Remove internal padding implementation
On Mon, 2022-02-21 at 11:42 +0100, Peter Eisentraut wrote: > On 15.02.22 00:07, Jacob Champion wrote: > > After this patch, bad padding is no longer ignored during decryption, > > and encryption without padding now requires the input size to be a > > multiple of the block size. To see the difference you can try the > > following queries with and without the patch: > > > > select encrypt_iv('foo', '0123456', 'abcd', 'aes/pad:none'); > > select > > encode(decrypt_iv('\xa21a9c15231465964e3396d32095e67eb52bab05f556a581621dee1b85385789', > > '0123456', 'abcd', 'aes'), 'escape'); > > Interesting. I have added test cases about this. Could you describe > how you arrived at the second test case? Sure -- that second ciphertext is the result of running SELECT encrypt_iv('abcdefghijklmnopqrstuvwxyz', '0123456', 'abcd', 'aes'); and then incrementing the last byte of the first block (i.e. the 16th byte) to corrupt the padding in the last block. > > Both changes seem correct to me. I can imagine some system out there > > being somehow dependent on the prior decryption behavior to avoid a > > padding oracle -- but if that's a concern, hopefully you're not using > > unauthenticated encryption in the first place? It might be worth a note > > in the documentation. > > Hmm. I started reading up on "padding oracle attack". I don't > understand it well enough yet to know whether this is a concern here. I *think* the only reasonable way to prevent that attack is by authenticating with a MAC or an authenticated cipher, to avoid running decryption on corrupted ciphertext in the first place. But I'm out of my expertise here. > Is there any reasonable meaning of the previous behaviors? I definitely don't think the previous behavior was reasonable. It's possible that someone built a strange system on top of that unreasonable behavior, but I hope not. > Does bad padding even give correct answers on decryption? 
No -- or rather, I'm not really sure how to define "correct answer" for garbage input. I especially don't like that two different ciphertexts can silently decrypt to the same plaintext. > What does encryption without padding even do on incorrect input sizes? Looks like it's being silently zero-padded by the previous code, which doesn't seem very helpful to me. The documentation says "data must be multiple of cipher block size" for the pad:none case, though I suppose it doesn't say what happens if you ignore the "must". --Jacob
Re: small development tip: Consider using the gold linker
Hi, On 2022-01-13 19:24:09 -0800, Peter Geoghegan wrote: > On Fri, Jul 20, 2018 at 8:16 PM Peter Geoghegan wrote: > > On Mon, Jul 9, 2018 at 4:40 PM, Andres Freund wrote: > > > FWIW, I've lately noticed that I spend a fair time waiting for the > > > linker during edit-compile-test cycles. Due to an independent issue I > > > just used the gold linker, and the speedup is quite noticable. > > > For me just adding '-fuse-ld=gold' to CFLAGS works. > > > > I tried this out today. It makes quite a noticeable difference for me. > > Thanks for the tip. > > The 2022 version of the same tip: switching over to LLVM lld seems to > offer a further significant speedup over gold. Not surprising, since > lld is specifically promoted as a linker that is faster than gold [1]. I've switched back and forth between gold and lld a couple times. Eventually I always get frustrated by lld causing slowdowns and other issues for gdb. Did you ever hit one of those? > Again, this appears to just be a case of installing lld, and adding > '-fuse-ld=lld' to CFLAGS -- a very easy switch to make. Link time is > still a noticeable contributor to overall build time for me, even with > gold (I use ccache, of course). Yea. gold also isn't really maintained anymore, so it'd be nice to move on... > There is another linker called mold [2]. It claims to be faster than > lld (which was faster than gold, which was faster than ld). I didn't > bother trying it out. Yea, didn't yet quite seem there yet last time I looked, and lld seems plenty fast for our purposes. Greetings, Andres Freund
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
On 2/22/22 21:12, Tomas Vondra wrote: > On 2/22/22 01:36, Fujii Masao wrote: >> >> >> On 2022/02/18 22:28, Tomas Vondra wrote: >>> Hi, >>> >>> here's a slightly updated version of the patch series. >> >> Thanks for updating the patches! >> >> >>> The 0001 part >>> adds tracking of server_version_num, so that it's possible to enable >>> other features depending on it. >> >> Like configure_remote_session() does, can't we use PQserverVersion() >> instead of implementing new function GetServerVersion()? >> > > Ah! My knowledge of libpq is limited, so I wasn't sure this function > exists. It'll simplify the patch. > And here's the slightly simplified patch, without the part adding the unnecessary GetServerVersion() function. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL CompanyFrom 089ecf8c9ee44b89becfbcfb9c7a0ddc6c8f197a Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Fri, 18 Feb 2022 00:33:19 +0100 Subject: [PATCH] postgres_fdw: sample data on remote node for ANALYZE When performing ANALYZE on a foreign tables, we need to collect sample of rows. Until now, we simply read all data from the remote node and built the sample locally. That is very expensive, especially in terms of network traffic etc. But it's possible to move the sampling to the remote node, and use either TABLESAMPLE or simply random() to transfer just much smaller amount of data. So we run either SELECT * FROM table TABLESAMPLE SYSTEM(fraction) or SELECT * FROM table WHERE random() < fraction depending on the server version (TABLESAMPLE is supported since 9.5). To do that, we need to determine what fraction of the table to sample. We rely on reltuples (fetched from the remote node) to be sufficiently accurate and up to date, and calculate the fraction based on that. We increase the sample size a bit (in case the table shrunk), and we still do the reservoir sampling (in case it grew). 
Using tsm_system_rows would allow specifying sample size in rows, without determining sampling rate. But the sampling method may not be installed, and we'd still have to determine the relation size. This adds 'sample' option for remote servers / tables. By default, it's set to 'true' which enables remote sampling. Setting it to 'false' uses the old approach of fetching everything and sampling locally. --- contrib/postgres_fdw/deparse.c | 152 contrib/postgres_fdw/option.c | 7 +- contrib/postgres_fdw/postgres_fdw.c | 142 +- contrib/postgres_fdw/postgres_fdw.h | 7 ++ 4 files changed, 304 insertions(+), 4 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index bf12eac0288..32f2c0d5fb3 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -2267,6 +2267,26 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel) appendStringInfo(buf, "::pg_catalog.regclass) / %d", BLCKSZ); } +/* + * Construct SELECT statement to acquire numbe of rows of given relation. + * + * Note: Maybe this should compare relpages and current relation size + * and adjust reltuples accordingly? + */ +void +deparseAnalyzeTuplesSql(StringInfo buf, Relation rel) +{ + StringInfoData relname; + + /* We'll need the remote relation name as a literal. */ + initStringInfo(&relname); + deparseRelation(&relname, rel); + + appendStringInfoString(buf, "SELECT reltuples FROM pg_catalog.pg_class WHERE oid = "); + deparseStringLiteral(buf, relname.data); + appendStringInfoString(buf, "::pg_catalog.regclass"); +} + /* * Construct SELECT statement to acquire sample rows of given relation. * @@ -2328,6 +2348,138 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs) deparseRelation(buf, rel); } +/* + * Construct SELECT statement to acquire sample rows of given relation, + * by sampling a fraction of the table using TABLESAMPLE SYSTEM. 
+ * + * SELECT command is appended to buf, and list of columns retrieved + * is returned to *retrieved_attrs. + * + * Note: We could allow selecting system/bernoulli, and maybe even the + * optional TSM modules (especially tsm_system_rows would help). + */ +void +deparseAnalyzeTableSampleSql(StringInfo buf, Relation rel, List **retrieved_attrs, double sample_frac) +{ + Oid relid = RelationGetRelid(rel); + TupleDesc tupdesc = RelationGetDescr(rel); + int i; + char *colname; + List *options; + ListCell *lc; + bool first = true; + + *retrieved_attrs = NIL; + + appendStringInfoString(buf, "SELECT "); + for (i = 0; i < tupdesc->natts; i++) + { + /* Ignore dropped columns. */ + if (TupleDescAttr(tupdesc, i)->attisdropped) + continue; + + if (!first) + appendStringInfoString(buf, ", "); + first = false; + + /* Use attribute name or column_name option. */ + colname = NameStr(TupleDescAttr(tupdesc, i)->attname); + options = GetForeignColumnOptions(relid, i + 1); + + foreach(lc, options) + { + DefElem*def = (DefElem *) lfirst(
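The overall flow the commit message describes comes down to a handful of remote queries. A hedged sketch follows; the table name and the 3.3% fraction are illustrative placeholders, not values taken from the patch:

```sql
-- 1. Fetch the remote row-count estimate, as deparseAnalyzeTuplesSql
--    deparses it:
SELECT reltuples FROM pg_catalog.pg_class
  WHERE oid = 'some_remote_table'::pg_catalog.regclass;

-- 2. Locally derive the fraction: roughly target-rows / reltuples,
--    padded a bit in case the table shrank, clamped to a valid range.

-- 3. Fetch the sample, depending on the remote server version
--    (TABLESAMPLE takes a percentage):
SELECT * FROM some_remote_table TABLESAMPLE SYSTEM (3.3);  -- 9.5 and later
SELECT * FROM some_remote_table WHERE random() < 0.033;    -- older servers
```

Either way, the local side still runs reservoir sampling over whatever comes back, so a sample that is slightly too large (table shrank) or too small (table grew) is tolerated.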
Re: logical decoding and replication of sequences
On 2/19/22 06:33, Amit Kapila wrote: > On Sat, Feb 19, 2022 at 7:48 AM Tomas Vondra > wrote: >> >> Do we need to handle both FOR ALL TABLES and FOR ALL SEQUENCES at the >> same time? That seems like a reasonable thing people might want. >> > > +1. That seems reasonable to me as well. I think the same will apply > to FOR ALL TABLES IN SCHEMA and FOR ALL SEQUENCES IN SCHEMA. > It already works for "IN SCHEMA" because that's handled as a publication object, but FOR ALL TABLES and FOR ALL SEQUENCES are defined directly in CreatePublicationStmt. Which also means you can't do ALTER PUBLICATION and change it to FOR ALL TABLES. Which is a bit annoying, but OK. It's a bit weird that FOR ALL TABLES is mentioned in the docs for ALTER PUBLICATION as if it were supported. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
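For context, a sketch of the statements under discussion. This is the patchset's proposed syntax for sequence publication, not released PostgreSQL syntax, and the publication names are made up:

```sql
-- Proposed in the patchset: sequence analogue of FOR ALL TABLES.
CREATE PUBLICATION pub_seqs FOR ALL SEQUENCES;

-- What the thread asks for: allowing both catch-all classes in one
-- publication, e.g. something like
--   CREATE PUBLICATION pub_everything FOR ALL TABLES, ALL SEQUENCES;

-- "IN SCHEMA" variants already combine, because each is a publication
-- object in the grammar rather than a flag on CreatePublicationStmt:
CREATE PUBLICATION pub_s FOR ALL SEQUENCES IN SCHEMA myschema;
```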
wal_compression=zstd
Since 4035cd5d4, wal_compression=lz4 is supported. I think pg15 should also support wal_compression=zstd. There are free bits in the WAL header. The last message on that thread includes a patch doing just that, which I've rebased. https://www.postgresql.org/message-id/ynqwd2gsmrnqw...@paquier.xyz It might be nice to also support wal_compression=zstd-level, but that seems optional and could wait for pg16... In [0], I showed a case where lz4 is just as fast as uncompressed data, and writes ~60% as much. zstd writes half as much as LZ4 and is 12% slower. [0] https://www.postgresql.org/message-id/20210614012412.GA31772%40telsasoft.com I am not claiming that zstd is generally better for WAL. Rather, if we're going to support alternate compression methods, it's nice to give a couple options (in addition to pglz). Some use cases would certainly suffer from slower WAL. But providing the option will benefit some others. Note that a superuser can set wal_compression for a given session - this would probably benefit bulk-loading in an otherwise OLTP environment. As written, this patch uses zstd level=1 (whereas ZSTD's default compress level is 6). 0001 adds support for zstd 0002 makes more efficient our use of bits in the WAL header 0003 makes it the default compression, to exercise on CI (windows will fail). 0004 adds support for zstd-level 0005-6 are needed to allow recovery tests to pass when wal compression is enabled; 0007 (not included) also adds support for zlib. I'm of the impression nobody cares about this, otherwise it would've been included 5-10 years ago. Only 0001 should be reviewed for pg15 - the others are optional/future work. 
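As a sketch of the per-session use case mentioned above (assuming a build configured with --with-zstd; the table and file names are made up):

```sql
-- Enable zstd full-page-image compression just for a bulk load,
-- without changing the cluster-wide setting.  Superuser only.
SET wal_compression = 'zstd';
COPY big_table FROM '/tmp/bulk_data.csv' WITH (FORMAT csv);
RESET wal_compression;
```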
>From 9253013c789ffb121272bfeeaa9dcdebbef79ced Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Fri, 18 Feb 2022 22:54:18 -0600 Subject: [PATCH 1/6] add wal_compression=zstd --- doc/src/sgml/config.sgml | 9 +++--- doc/src/sgml/installation.sgml| 10 +++ src/backend/access/transam/xloginsert.c | 30 ++- src/backend/access/transam/xlogreader.c | 20 + src/backend/utils/misc/guc.c | 3 ++ src/backend/utils/misc/postgresql.conf.sample | 2 +- src/bin/pg_waldump/pg_waldump.c | 2 ++ src/include/access/xlog.h | 3 +- src/include/access/xlogrecord.h | 5 +++- 9 files changed, 76 insertions(+), 8 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 7ed8c82a9d..97740c7b66 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -3154,10 +3154,11 @@ include_dir 'conf.d' server compresses full page images written to WAL when is on or during a base backup. A compressed page image will be decompressed during WAL replay. -The supported methods are pglz and -lz4 (if PostgreSQL was -compiled with --with-lz4). The default value is -off. Only superusers can change this setting. +The supported methods are pglz, and +(if configured when PostgreSQL was built) +lz4 and zstd. +The default value is off. +Only superusers can change this setting. diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml index 0f74252590..d32767b229 100644 --- a/doc/src/sgml/installation.sgml +++ b/doc/src/sgml/installation.sgml @@ -271,6 +271,14 @@ su - postgres + + + The ZSTD library can be used to enable + compression using that method; see + . + + + To build the PostgreSQL documentation, @@ -988,6 +996,8 @@ build-postgresql: Build with ZSTD compression support. + This enables use of ZSTD for + compression of WAL data. 
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c index c260310c4c..847ce0c3fc 100644 --- a/src/backend/access/transam/xloginsert.c +++ b/src/backend/access/transam/xloginsert.c @@ -44,9 +44,17 @@ #define LZ4_MAX_BLCKSZ 0 #endif +#ifdef USE_ZSTD +#include <zstd.h> +#define ZSTD_MAX_BLCKSZ ZSTD_COMPRESSBOUND(BLCKSZ) +#else +#define ZSTD_MAX_BLCKSZ 0 +#endif + +/* Buffer size required to store a compressed version of backup block image */ #define PGLZ_MAX_BLCKSZ PGLZ_MAX_OUTPUT(BLCKSZ) -#define COMPRESS_BUFSIZE Max(PGLZ_MAX_BLCKSZ, LZ4_MAX_BLCKSZ) +#define COMPRESS_BUFSIZE Max(Max(PGLZ_MAX_BLCKSZ, LZ4_MAX_BLCKSZ), ZSTD_MAX_BLCKSZ) /* * For each block reference registered with XLogRegisterBuffer, we fill in @@ -695,6 +703,14 @@ XLogRecordAssemble(RmgrId rmid, uint8 info, #endif break; + case WAL_COMPRESSION_ZSTD: +#ifdef USE_ZSTD + bimg.bimg_info |= BKPIMAGE_COMPRESS_ZSTD; +#else + elog(ERROR, "ZSTD is not supported by this build"); +#endif + break; + case WAL_COMPRESSION_NONE: Assert(false); /* cannot happen */
Re: making pg_regress less noisy by removing boilerplate
Hi, On 2022-02-22 08:55:23 -0500, Andrew Dunstan wrote: > I'm pretty sure all my Windows machines with buildfarm animals are > sufficiently modern except the XP machine, which only builds release 10 > nowadays. Cool. > What's involved in moving to require Unix socket support? It works today (the CI scripts use it on windows for example). But it's awkward because make_temp_sockdir() defaults to /tmp/ if TMPDIR isn't set. Which it is not by default on windows. There's PG_REGRESS_SOCK_DIR, which kind of works for the msvc build, because pg_regress tests aren't run concurrently (whereas tap tests can be run concurrently with PROVE_FLAGS=-j). I think we just need to make make_temp_sockdir() a tad smarter. Perhaps by lifting the code in src/bin/psql/command.c:do_edit() to src/port/path.c or such? Greetings, Andres Freund
Using operators to do query hints
I've been playing with an idea I had a while back. Basically that it would be useful to have some "noop" operators that are used purely to influence the planner. For context I've suggested in the past that there are two categories of hints: 1 Hints that override the planner's decisions with explicit planning decisions. These we don't like for a variety of reasons, notably because users don't have very good knowledge of all the plan types available and new plan types come out in the future. 2 Hints that just influence the estimates that the planner uses to make its decisions. These seem more acceptable as it amounts to admitting the users know the distribution of their data and the behaviour of their functions better than the statistics. Also, there are plenty of cases where the statistics are really just wild guesses. They still allow the planner to make decisions about what join orders or types and so on given the updated data. So I thought adding some noop operators that took a boolean on one side and a float on the other and simply returned the boolean at run-time but used the float (which would have to be a constant) from the lhs as the selectivity would be useful. In practice it's usable but not nearly as widely useful as I expected. 1) The boolean being passed through has to be a real clause with real fields. Otherwise the planner is too clever and turns it into a one-time filter which doesn't affect the plan :/ 2) If it's a clause that is potentially going to be an index condition then you need to repeat the clause outside the selectivity noop operator. Otherwise the selectivity operator hides it from the planner. 3) It doesn't work on joins at all. Joins are done using the join selectivity function on the join clause's operator. There's no way to construct a simple noop wrapper that would still work with that, as far as I can see. Nonetheless what I have does seem to be somewhat handy for simple cases. 
I added some simple SQL wrapper functions such as pg_unlikely(): postgres=# explain select * from test where i<100; QUERY PLAN - Index Only Scan using i on test (cost=0.28..10.01 rows=99 width=4) Index Cond: (i < 100) (2 rows) postgres=# explain select * from test where pg_unlikely(i<100); QUERY PLAN Index Only Scan using i on test (cost=0.28..10.50 rows=1 width=4) Index Cond: (i < 100) Filter: ('1e-06'::double precision %%% (i < 100)) (3 rows) However this doesn't really do what you might really be hoping. Specifically, it doesn't actually affect the planner's choice of the index scan in that query. It affects the estimate of the result of the scan and that might be useful for subsequent nodes of the plan. But not for the index scan itself: postgres=# explain select * from test where pg_unlikely(i<500); QUERY PLAN - Seq Scan on test (cost=0.00..22.50 rows=1 width=4) Filter: ((i < 500) AND ('1e-06'::double precision %%% (i < 500))) (2 rows) So. I dunno. This feels like it's something that could be quite handy but it's not as good as I had hoped and I think I'm out of ideas for making it more powerful. -- greg
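For reference, a minimal sketch of what such a wrapper could look like. This assumes the %%% noop operator (with its C-level restriction estimator reading the constant lhs) is already defined; it is a guess at the shape, not the actual code from the patch:

```sql
-- Hypothetical: the float lhs is a constant consumed only by the
-- operator's selectivity estimator; at run time the rhs boolean is
-- returned unchanged.
CREATE FUNCTION pg_unlikely(bool) RETURNS bool
    LANGUAGE sql IMMUTABLE
    AS $$ SELECT '1e-06'::float8 %%% $1 $$;
```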
Re: Time to drop plpython2?
On Mon, Feb 21, 2022 at 03:28:35PM -0500, Tom Lane wrote: > Mark Wong writes: > > On Sat, Feb 19, 2022 at 08:22:29AM -0800, Andres Freund wrote: > >> Unfortunately it looks like it wasn't quite enough. All, or nearly all, > >> your > >> animals that ran since still seem to be failing in the same spot... > > > Oops, made another pass for python3 dev libraries. > > You might need to do one more thing, which is manually blow away the cache > files under $BUILDFARM/accache. For example, on demoiselle everything > looks fine in HEAD, but the back branches are failing like this: > > checking for python... (cached) /usr/bin/python > ./configure: line 10334: /usr/bin/python: No such file or directory > configure: using python > ./configure: line 10342: test: : integer expression expected > checking for Python sysconfig module... no > configure: error: sysconfig module not found > > Very recent versions of the buildfarm script will discard accache > automatically after a configure or make failure, but I think the > REL_11 you're running here doesn't have that defense. It'll only > flush accache after a change in the configure script in git. Take 3. :) I've upgraded everyone to the v14 buildfarm scripts and made sure the --test passed on HEAD on each one. So I hopefully didn't miss any (other than the one EOL OpenSUSE version that I will plan on upgrading.) Regards, Mark
Re: C++ Trigger Framework
On Tue, Feb 22, 2022 at 04:02:39PM +0200, Shmuel Kamensky wrote: > I'm interested in creating a Postgres extension that would enable > developers to write triggers in (modern) C++. Does anyone know if there is > already some sort of translation wrapper between the native Postgres C > API's and C++? Or if there's already a project that allows for writing > triggers in C++ with ease? > I see that https://github.com/clkao/plv8js-clkao/blob/master/plv8_type.cc > does implement an abstraction of sorts, but it's specific to v8 types and > is not genericized as a way of interacting with Postgres C API's from C++ > from *any* C++ code. > > Can you imagine any potential technical challenges I may encounter (e.g. > massaging postgres' custom allocator to work with C++'s new and delete > operators, or unresolvable compiler incompatibilities)? This might answer your questions: https://www.postgresql.org/docs/devel/xfunc-c.html#EXTEND-CPP -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: TAP output format in pg_regress
> On 22 Feb 2022, at 18:13, Andres Freund wrote: > On 2022-02-22 15:10:11 +0100, Daniel Gustafsson wrote: >> The error paths that exit(2) the testrun should be converted to "bail out" >> lines >> when running with TAP output, but apart from that I think it's fairly spec >> compliant. > > I'd much rather not use BAIL - I haven't gotten around to doing anything about > it, but I really want to get rid of nearly all our uses of bail: Point. We already error out on stderr in pg_regress so we could probably make die() produce equivalent output to keep the TAP parsing consistent. At any rate, awaiting the conclusions on the bail thread and simply (for some value of) replicating that in this patch is probably the best option? -- Daniel Gustafsson https://vmware.com/
Re: bailing out in tap tests nearly always a bad idea
On 2022-02-22 15:10:30 -0500, Andrew Dunstan wrote: > I'll be surprised if we can't come up with something cleaner than that. Suggestions?
C++ Trigger Framework
Hello all, this is my first time posting here (I first posted on the IRC but didn't get any response), so let me know if there's a different procedure for asking questions. I'm interested in creating a Postgres extension that would enable developers to write triggers in (modern) C++. Does anyone know if there is already some sort of translation wrapper between the native Postgres C API's and C++? Or if there's already a project that allows for writing triggers in C++ with ease? I see that https://github.com/clkao/plv8js-clkao/blob/master/plv8_type.cc does implement an abstraction of sorts, but it's specific to v8 types and is not genericized as a way of interacting with Postgres C API's from C++ from *any* C++ code. Can you imagine any potential technical challenges I may encounter (e.g. massaging postgres' custom allocator to work with C++'s new and delete operators, or unresolvable compiler incompatibilities)? Thanks for any input :-)
Re: [BUG] Panic due to incorrect missingContrecPtr after promotion
On 2022-Feb-22, Imseih (AWS), Sami wrote: > On 13.5 a wal flush PANIC is encountered after a standby is promoted. > > With debugging, it was found that when a standby skips a missing > continuation record on recovery, the missingContrecPtr is not > invalidated after the record is skipped. Therefore, when the standby > is promoted to a primary it writes an overwrite_contrecord with an LSN > of the missingContrecPtr, which is now in the past. On flush time, > this causes a PANIC. From what I can see, this failure scenario can > only occur after a standby is promoted. Ooh, nice find and diagnosis. I can confirm that the test fails as you described without the code fix, and doesn't fail with it. I attach the same patch, with the test file put in its final place rather than as a patch. Due to recent xlog.c changes this needs a bit of work to apply to back branches; I'll see about getting it in all branches soon. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "I'm impressed how quickly you are fixing this obscure issue. I came from MS SQL and it would be hard for me to put into words how much of a better job you all are doing on [PostgreSQL]." Steve Midgley, http://archives.postgresql.org/pgsql-sql/2008-08/msg0.php >From 7948d1acdcd25b5b425c7804fad0d46cfb4b14b0 Mon Sep 17 00:00:00 2001 From: "Sami Imseih (AWS)" Date: Tue, 22 Feb 2022 19:09:36 + Subject: [PATCH v2] Fix "missing continuation record" after standby promotion Fix a condition where a recently promoted standby attempts to write an OVERWRITE_RECORD with an LSN of the previously read aborted record. 
Author: Sami Imseih Discussion: https://postgr.es/m/44d259de-7542-49c4-8a52-2ab01534d...@amazon.com --- src/backend/access/transam/xlog.c | 16 ++- .../t/029_overwrite_contrecord_promotion.pl | 111 ++ 2 files changed, 126 insertions(+), 1 deletion(-) create mode 100644 src/test/recovery/t/029_overwrite_contrecord_promotion.pl diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 0d2bd7a357..62e942f41a 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -5423,11 +5423,25 @@ StartupXLOG(void) * made it through and start writing after the portion that persisted. * (It's critical to first write an OVERWRITE_CONTRECORD message, which * we'll do as soon as we're open for writing new WAL.) + * + * If the last wal record is ahead of the missing contrecord, this is a + * recently promoted primary and we should not write an overwrite + * contrecord. */ if (!XLogRecPtrIsInvalid(missingContrecPtr)) { Assert(!XLogRecPtrIsInvalid(abortedRecPtr)); - EndOfLog = missingContrecPtr; + if (endOfRecoveryInfo->lastRec < missingContrecPtr) + { + elog(DEBUG2, "setting end of wal to missing continuation record %X/%X", + LSN_FORMAT_ARGS(missingContrecPtr)); + EndOfLog = missingContrecPtr; + } + else + { + elog(DEBUG2, "resetting aborted record"); + abortedRecPtr = InvalidXLogRecPtr; + } } /* diff --git a/src/test/recovery/t/029_overwrite_contrecord_promotion.pl b/src/test/recovery/t/029_overwrite_contrecord_promotion.pl new file mode 100644 index 00..ea4ebb32c0 --- /dev/null +++ b/src/test/recovery/t/029_overwrite_contrecord_promotion.pl @@ -0,0 +1,111 @@ +# Copyright (c) 2021-2022, PostgreSQL Global Development Group + +# Tests for resetting the "aborted record" after a promotion. 
+ +use strict; +use warnings; + +use FindBin; +use PostgreSQL::Test::Cluster; +use PostgreSQL::Test::Utils; +use Test::More; + +# Test: Create a physical replica that's missing the last WAL file, +# then restart the primary to create a divergent WAL file and observe +# that the replica resets the "aborted record" after a promotion. + +my $node = PostgreSQL::Test::Cluster->new('primary'); +$node->init(allows_streaming => 1); +# We need these settings for stability of WAL behavior. +$node->append_conf( + 'postgresql.conf', qq( +autovacuum = off +wal_keep_size = 1GB +log_min_messages = DEBUG2 +)); +$node->start; + +$node->safe_psql('postgres', 'create table filler (a int, b text)'); + +# Now consume all remaining room in the current WAL segment, leaving +# space enough only for the start of a largish record. +$node->safe_psql( + 'postgres', q{ +DO $$ +DECLARE +wal_segsize int := setting::int FROM pg_settings WHERE name = 'wal_segment_size'; +remain int; +iters int := 0; +BEGIN +LOOP +INSERT into filler +select g, repeat(md5(g::text), (random() * 60 + 1)::int) +from generate_series(1, 10) g; + +remain := wal_segsize - (pg_current_wal_insert_lsn() - '0/0') % wal_segsize; +IF remain < 2 * setting::int from pg_settings where name = 'block_size' THEN +RAISE log 'exiting after % iterations, % bytes to end of WAL segment', iters, remain; +EXIT; +END IF; +iters := iters + 1; +END LOOP; +END +$$; +}); + +
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
On 2/22/22 01:36, Fujii Masao wrote: > > > On 2022/02/18 22:28, Tomas Vondra wrote: >> Hi, >> >> here's a slightly updated version of the patch series. > > Thanks for updating the patches! > > >> The 0001 part >> adds tracking of server_version_num, so that it's possible to enable >> other features depending on it. > > Like configure_remote_session() does, can't we use PQserverVersion() > instead of implementing new function GetServerVersion()? > Ah! My knowledge of libpq is limited, so I wasn't sure this function exists. It'll simplify the patch. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: bailing out in tap tests nearly always a bad idea
On 2/22/22 13:19, Andres Freund wrote: > Hi, > > On 2022-02-22 09:28:37 -0800, Andres Freund wrote: >> On 2022-02-14 09:46:20 -0800, Andres Freund wrote: t/die-immediately.t aieee at t/die-immediately.t line 1. t/die-immediately.t Dubious, test returned 255 (wstat 65280, 0xff00) No subtests run >>> Hm. Looks different when a test is including our test helpers. >>> >>> t/000_die_pg_test_utils.pl .. Dubious, test returned 25 (wstat 6400, 0x1900) >>> No subtests run >>> t/000_die_test_more.pl .. error at t/000_die_test_more.pl line 2. >>> t/000_die_test_more.pl .. Dubious, test returned 255 (wstat 65280, >>> 0xff00) >>> No subtests run >>> >>> So I think we broke something... I think it's the log file redirection stuff >>> in INIT. >> Any chance somebody with more perl knowledge could look into this? Clearly >> our >> effort to copy all the output the original file descriptors isn't successful >> here. >> >> $ perl -I /home/andres/src/postgresql/src/test/perl/ -e "use >> PostgreSQL::Test::Utils; die 'blorb';" >> $ perl -I /home/andres/src/postgresql/src/test/perl/ -e "use Test::More; die >> 'blorb';" >> blorb at -e line 1. > So the problem is that die just outputs to stderr, but we've decided to have > stderr connected to the output going to tap for some reason. I guess that > prevents us from messing up the tap output, but it's still weird. Because of > that redirection, die printing to stderr isn't visible to tap. > > Seems we can use black magic to "fix" that... Putting the following into > INIT{} seems to do the trick: > > # Because of the above redirection the tap output wouldn't contain > # information about tests failing due to die etc. Fix that by also > # printing the failure to the original stderr. > $SIG{__DIE__} = sub > { > # Don't redirect if it's a syntax error ($^S is undefined) or > in an > # eval block ($^S == 1). 
> if (defined $^S and $^S == 0) > { > print $orig_stderr "$_[0]\n"; > #fail("died: $_[0]"); > #done_testing(); > } > }; > > > $ cat /tmp/die_pg_utils.pl > use PostgreSQL::Test::Utils; > use Test::More; > > ok(1, "foo"); > die 'blorb'; > done_testing(); > > With the print we get something like: > > $ perl -I /home/andres/src/postgresql/src/test/perl/ /tmp/die_pg_utils.pl > ok 1 - foo > blorb at /tmp/die_pg_utils.pl line 5. > > # Tests were run but no plan was declared and done_testing() was not seen. > # Looks like your test exited with 25 just after 1. > > With the fail() and done_testing() we get > > $ perl -I /home/andres/src/postgresql/src/test/perl/ /tmp/die_pg_utils.pl > ok 1 - foo > not ok 2 - died: blorb at /tmp/die_pg_utils.pl line 5. > # > # Failed test 'died: blorb at /tmp/die_pg_utils.pl line 5. > # ' > # at /home/andres/src/postgresql/src/test/perl/PostgreSQL/Test/Utils.pm > line 235. > 1..2 > # Looks like your test exited with 25 just after 2. > > > The latter output doesn't confuse with stuff about plans and exit codes. But > contains redundant messages (whatever) and it doesn't seem particularly clean > to hijack a "test number". > I'll be surprised if we can't come up with something cleaner than that. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: remove more archiving overhead
On Tue, Feb 22, 2022 at 09:37:11AM -0800, Nathan Bossart wrote: > In my testing, I found that when I killed the server just before unlink() > during WAL recycling, I ended up with links to the same file in pg_wal after > restarting. My latest test produced links to the same file for the current > WAL file and the next one. Maybe WAL recycling should use durable_rename() > instead of durable_rename_excl(). Here is an updated patch. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From e080d493985cf0a203122c1e07952b0f765019e4 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Tue, 22 Feb 2022 11:39:28 -0800 Subject: [PATCH v2 1/1] reduce archiving overhead --- src/backend/access/transam/xlog.c | 10 ++ src/backend/postmaster/pgarch.c | 14 +- 2 files changed, 19 insertions(+), 5 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 0d2bd7a357..2ad047052f 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -3334,13 +3334,15 @@ InstallXLogFileSegment(XLogSegNo *segno, char *tmppath, } /* - * Perform the rename using link if available, paranoidly trying to avoid - * overwriting an existing file (there shouldn't be one). + * Perform the rename. Ideally, we'd use link() and unlink() to avoid + * overwriting an existing file (there shouldn't be one). However, that + * approach opens up the possibility that pg_wal will contain multiple hard + * links to the same WAL file after a crash. 
*/ - if (durable_rename_excl(tmppath, path, LOG) != 0) + if (durable_rename(tmppath, path, LOG) != 0) { LWLockRelease(ControlFileLock); - /* durable_rename_excl already emitted log message */ + /* durable_rename already emitted log message */ return false; } diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c index d916ed39a8..641297e9f5 100644 --- a/src/backend/postmaster/pgarch.c +++ b/src/backend/postmaster/pgarch.c @@ -746,7 +746,19 @@ pgarch_archiveDone(char *xlog) StatusFilePath(rlogready, xlog, ".ready"); StatusFilePath(rlogdone, xlog, ".done"); - (void) durable_rename(rlogready, rlogdone, WARNING); + + /* + * To avoid extra overhead, we don't durably rename the .ready file to + * .done. Archive commands and libraries must already gracefully handle + * attempts to re-archive files (e.g., if the server crashes just before + * this function is called), so it should be okay if the .ready file + * reappears after a crash. + */ + if (rename(rlogready, rlogdone) < 0) + ereport(WARNING, +(errcode_for_file_access(), + errmsg("could not rename file \"%s\" to \"%s\": %m", + rlogready, rlogdone))); } -- 2.25.1
[BUG] Panic due to incorrect missingContrecPtr after promotion
On 13.5 a wal flush PANIC is encountered after a standby is promoted. With debugging, it was found that when a standby skips a missing continuation record on recovery, the missingContrecPtr is not invalidated after the record is skipped. Therefore, when the standby is promoted to a primary it writes an overwrite_contrecord with an LSN of the missingContrecPtr, which is now in the past. On flush time, this causes a PANIC. From what I can see, this failure scenario can only occur after a standby is promoted. The overwrite_contrecord was introduced in 13.5 with https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ff9f111bce24. Attached is a patch and a TAP test to handle this condition. The patch ensures that an overwrite_contrecord is only created if the missingContrecPtr is ahead of the last wal record. To reproduce: Run the new tap test recovery/t/029_overwrite_contrecord_promotion.pl without the attached patch 2022-02-22 18:38:15.526 UTC [31138] LOG: started streaming WAL from primary at 0/200 on timeline 1 2022-02-22 18:38:15.535 UTC [31105] LOG: successfully skipped missing contrecord at 0/1FFC620, overwritten at 2022-02-22 18:38:15.136482+00 2022-02-22 18:38:15.535 UTC [31105] CONTEXT: WAL redo at 0/228 for XLOG/OVERWRITE_CONTRECORD: lsn 0/1FFC620; time 2022-02-22 18:38:15.136482+00 … ….. 2022-02-22 18:38:15.575 UTC [31103] PANIC: xlog flush request 0/201EC70 is not satisfied --- flushed only to 0/288 2022-02-22 18:38:15.575 UTC [31101] LOG: checkpointer process (PID 31103) was terminated by signal 6: Aborted …. ….. With the patch, running the same tap test succeeds and a PANIC is not observed. Thanks Sami Imseih Amazon Web Services 0001-Fix-missing-continuation-record-after-standby-promot.patch
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
On Tue, 22 Feb 2022 at 07:39, Nitin Jadhav wrote: > > > > Thank you for sharing the information. 'triggering backend PID' (int) > > > - can be stored without any problem. 'checkpoint or restartpoint?' > > > (boolean) - can be stored as an integer value like > > > PROGRESS_CHECKPOINT_TYPE_CHECKPOINT(0) and > > > PROGRESS_CHECKPOINT_TYPE_RESTARTPOINT(1). 'elapsed time' (store as > > > start time in stat_progress, timestamp fits in 64 bits) - As > > > Timestamptz is of type int64 internally, we can store the timestamp > > > value in the progress parameter and then expose a function like > > > 'pg_stat_get_progress_checkpoint_elapsed' which takes int64 (not > > > Timestamptz) as argument and then returns a string representing the > > > elapsed time. > > > > No need to use a string there; I think exposing the checkpoint start > > time is good enough. The conversion of int64 to timestamp[tz] can be > > done in SQL (although I'm not sure the internal bitwise > > representation of Interval should be exposed to that extent) [0]. > > Users can then extract the duration interval using now() - start_time, > > which also allows the user to use their own preferred formatting. > > The reason for showing the elapsed time rather than exposing the > timestamp directly is in case of checkpoint during shutdown and > end-of-recovery, I am planning to log a message in server logs using > 'log_startup_progress_interval' infrastructure which displays elapsed > time. So just to match both behaviours I am displaying elapsed > time here. I feel that elapsed time gives a quicker feel of the > progress. Kindly let me know if you still feel just exposing the > timestamp is better than showing the elapsed time. 
At least for pg_stat_progress_checkpoint, storing only a timestamp in the pg_stat storage (instead of repeatedly updating the field as a duration) seems to provide much more precise measures of 'time elapsed' for other sessions if one step of the checkpoint is taking a long time. I understand the desire to integrate the log-based reporting in the same API, but I don't think that is necessarily the right approach: pg_stat_progress_* has low-overhead infrastructure specifically to ensure that most tasks will not run much slower while reporting, never waiting for locks. Logging, however, needs to take locks (if only to prevent concurrent writes to the output file at a kernel level) and thus has a not insignificant overhead, making it not very useful for precise and very frequent statistics updates. So, although similar in nature, I don't think it is smart to use the exact same infrastructure between pgstat_progress*-based reporting and log-based progress reporting, especially if your logging-based progress reporting is not intended to be a debugging-only configuration option similar to log_min_messages=DEBUG[1..5]. - Matthias
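To illustrate the point about deriving the duration client-side (the view and column names here are guesses at the patch's eventual shape, not committed catalog names):

```sql
-- Hypothetical: store only the start timestamp in the progress slot
-- and let each reader compute a precise elapsed time on demand.
SELECT pid, now() - start_time AS elapsed
  FROM pg_stat_progress_checkpoint;
```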
Re: bailing out in tap tests nearly always a bad idea
Hi, On 2022-02-22 09:28:37 -0800, Andres Freund wrote: > On 2022-02-14 09:46:20 -0800, Andres Freund wrote: > > > t/die-immediately.t aieee at t/die-immediately.t line 1. > > > t/die-immediately.t Dubious, test returned 255 (wstat 65280, 0xff00) > > > No subtests run > > > > Hm. Looks different when a test is including our test helpers. > > > > t/000_die_pg_test_utils.pl .. Dubious, test returned 25 (wstat 6400, 0x1900) > > No subtests run > > t/000_die_test_more.pl .. error at t/000_die_test_more.pl line 2. > > t/000_die_test_more.pl .. Dubious, test returned 255 (wstat 65280, > > 0xff00) > > No subtests run > > > > So I think we broke something... I think it's the log file redirection stuff > > in INIT. > > Any chance somebody with more perl knowledge could look into this? Clearly our > effort to copy all the output the original file descriptors isn't successful > here. > > $ perl -I /home/andres/src/postgresql/src/test/perl/ -e "use > PostgreSQL::Test::Utils; die 'blorb';" > $ perl -I /home/andres/src/postgresql/src/test/perl/ -e "use Test::More; die > 'blorb';" > blorb at -e line 1. So the problem is that die just outputs to stderr, but we've decided to have stderr connected to the output going to tap for some reason. I guess that prevents us from messing up the tap output, but it's still weird. Because of that redirection, die printing to stderr isn't visible to tap. Seems we can use black magic to "fix" that... Putting the following into INIT{} seems to do the trick: # Because of the above redirection the tap output wouldn't contain # information about tests failing due to die etc. Fix that by also # printing the failure to the original stderr. $SIG{__DIE__} = sub { # Don't redirect if it's a syntax error ($^S is undefined) or in an # eval block ($^S == 1). 
if (defined $^S and $^S == 0) { print $orig_stderr "$_[0]\n"; #fail("died: $_[0]"); #done_testing(); } }; $ cat /tmp/die_pg_utils.pl use PostgreSQL::Test::Utils; use Test::More; ok(1, "foo"); die 'blorb'; done_testing(); With the print we get something like: $ perl -I /home/andres/src/postgresql/src/test/perl/ /tmp/die_pg_utils.pl ok 1 - foo blorb at /tmp/die_pg_utils.pl line 5. # Tests were run but no plan was declared and done_testing() was not seen. # Looks like your test exited with 25 just after 1. With the fail() and done_testing() we get $ perl -I /home/andres/src/postgresql/src/test/perl/ /tmp/die_pg_utils.pl ok 1 - foo not ok 2 - died: blorb at /tmp/die_pg_utils.pl line 5. # # Failed test 'died: blorb at /tmp/die_pg_utils.pl line 5. # ' # at /home/andres/src/postgresql/src/test/perl/PostgreSQL/Test/Utils.pm line 235. 1..2 # Looks like your test exited with 25 just after 2. The latter output doesn't confuse with stuff about plans and exit codes. But contains redundant messages (whatever) and it doesn't seem particularly clean to hijack a "test number". Greetings, Andres Freund
Re: Documentation about PL transforms
On 02/21/22 16:09, Chapman Flack wrote: > On 02/07/22 15:14, Chapman Flack wrote: >> I'll work on some doc patches. > > It seems a bit of an impedance mismatch that there is a get_func_trftypes > producing a C Oid[] (with its length returned separately), while > get_transform_fromsql and get_transform_tosql both expect a List. > ... > Would it be reasonable to deprecate get_func_trftypes and instead > provide a get_call_trftypes It would have been painful to write documentation of get_func_trftypes saying its result isn't what get_transform_{from,to}sql expect, so patch 1 does add a get_call_trftypes that returns a List *. Patch 2 then updates the docs as discussed in this thread. It turned out plhandler.sgml was kind of a long monolith of text even before adding transform information, so I broke it into sections first. This patch adds the section markup without reindenting, so the changes aren't obscured. The chapter had also fallen behind the introduction of procedures, so I have changed many instances of 'function' to the umbrella term 'routine'. Patch 3 simply reindents for the new section markup and rewraps. Whitespace only. Patch 4 updates src/test/modules/plsample to demonstrate handling of transforms (and to add some more comments generally). I'll add this to the CF. Regards, -Chap >From 7e6027cdcf1dbf7a10c0bad5059816cef762b1b9 Mon Sep 17 00:00:00 2001 From: Chapman Flack Date: Mon, 21 Feb 2022 20:56:28 -0500 Subject: [PATCH 1/4] Warmup: add a get_call_trftypes function The existing get_func_trftypes function produces an Oid[], where both existing get_transform_{from,to}sql functions that depend on the result expect a List*. Rather than writing documentation awkwardly describing functions that won't play together, add a get_call_trftypes function that returns List*. (The name get_call_... to distinguish from get_func_... follows the naming used in funcapi.h for a function returning information about either a function or a procedure.)
--- src/backend/utils/cache/lsyscache.c | 18 ++ src/include/funcapi.h | 5 + src/include/utils/lsyscache.h | 1 + 3 files changed, 24 insertions(+) diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c index feef999..a49ccad 100644 --- a/src/backend/utils/cache/lsyscache.c +++ b/src/backend/utils/cache/lsyscache.c @@ -2107,6 +2107,24 @@ get_transform_tosql(Oid typid, Oid langid, List *trftypes) return InvalidOid; } +/* + * get_call_trftypes + * + * A helper function that does not itself query the transform cache, but + * constructs the transform-type List expected by the functions above. + */ +List * +get_call_trftypes(HeapTuple procTup) +{ + Datum protrftypes; + bool isNull; + + protrftypes = SysCacheGetAttr(PROCOID, procTup, + Anum_pg_proc_protrftypes, + &isNull); + return isNull ? NIL : oid_array_to_list(protrftypes); +} + /*-- TYPE CACHE -- */ diff --git a/src/include/funcapi.h b/src/include/funcapi.h index ba927c2..7c61560 100644 --- a/src/include/funcapi.h +++ b/src/include/funcapi.h @@ -175,7 +175,12 @@ extern int get_func_arg_info(HeapTuple procTup, extern int get_func_input_arg_names(Datum proargnames, Datum proargmodes, char ***arg_names); +/* + * A deprecated earlier version of get_call_trftypes (in lsyscache.h). + * That version produces a List, which is the form downstream functions expect. 
+ */ extern int get_func_trftypes(HeapTuple procTup, Oid **p_trftypes); + extern char *get_func_result_name(Oid functionId); extern TupleDesc build_function_result_tupdesc_d(char prokind, diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h index b8dd27d..93b19e7 100644 --- a/src/include/utils/lsyscache.h +++ b/src/include/utils/lsyscache.h @@ -139,6 +139,7 @@ extern char get_rel_relkind(Oid relid); extern bool get_rel_relispartition(Oid relid); extern Oid get_rel_tablespace(Oid relid); extern char get_rel_persistence(Oid relid); +extern List *get_call_trftypes(HeapTuple procTup); extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes); extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes); extern bool get_typisdefined(Oid typid); -- 2.7.3 >From 447a4aaecdf3a23a47c11ad741b49f81475e34e3 Mon Sep 17 00:00:00 2001 From: Chapman Flack Date: Mon, 21 Feb 2022 20:59:32 -0500 Subject: [PATCH 2/4] Update PL handler implementation docs The original purpose was to add information on support for CREATE TRANSFORM (which must be explicitly coded in any PL implementation intending to support it). But the plhandler section was about as long as a monolith of text ought to be, even before adding transform information, so reorganized first into sections. Front-loaded with short descriptions of the three possible functions (call handler, validator, inline handler) registered with CREATE LANGUAGE. The latter two were afterthoughts in historical sequence, but the
Re: remove more archiving overhead
On Mon, Feb 21, 2022 at 05:19:48PM -0800, Nathan Bossart wrote: > I also spent some time investigating whether durably renaming the archive > status files was even necessary. In theory, it shouldn't be. If a crash > happens before the rename is persisted, we might try to re-archive files, > but your archive_command/archive_library should handle that. If the file > was already recycled or removed, the archiver will skip it (thanks to > 6d8727f). But when digging further, I found that WAL file recycling uses > durable_rename_excl(), which has the following note: > >* Note that a crash in an unfortunate moment can leave you with two > links to >* the target file. > > IIUC this means that in theory, a crash at an unfortunate moment could > leave you with a .ready file, the file to archive, and another link to that > file with a "future" WAL filename. If you re-archive the file after it has > been reused, you could end up with corrupt WAL archives. I think this > might already be possible today. Syncing the directory after every rename > probably reduces the likelihood quite substantially, but if > RemoveOldXlogFiles() quickly picks up the .done file and attempts > recycling before durable_rename() calls fsync() on the directory, > presumably the same problem could occur. In my testing, I found that when I killed the server just before unlink() during WAL recycling, I ended up with links to the same file in pg_wal after restarting. My latest test produced links to the same file for the current WAL file and the next one. Maybe WAL recycling should use durable_rename() instead of durable_rename_excl(). -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
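For what it's worth, the "two links" window quoted from durable_rename_excl() is easy to reproduce outside PostgreSQL: the function is essentially link() followed by unlink(). A minimal C sketch (simplified error handling, hypothetical paths — not the actual fd.c code):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Simulate a crash between the link() and unlink() steps of
 * durable_rename_excl(): the old name is never removed, so both
 * directory entries end up pointing at the same inode.
 * Returns the old name's resulting link count, or -1 on error.
 */
static long
crash_between_link_and_unlink(const char *oldpath, const char *newpath)
{
	struct stat st;

	if (link(oldpath, newpath) != 0)	/* step 1: create the new name */
		return -1;
	/* ... crash here: unlink(oldpath) never runs ... */
	if (stat(oldpath, &st) != 0)
		return -1;
	return (long) st.st_nlink;	/* both names reference the same file */
}
```

This is exactly the state described above: the recycled "future" WAL name and the old name both resolve to one inode, so re-archiving the old name after the file has been reused would archive new contents under an old name.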
Re: bailing out in tap tests nearly always a bad idea
Hi, On 2022-02-14 09:46:20 -0800, Andres Freund wrote: > > t/die-immediately.t aieee at t/die-immediately.t line 1. > > t/die-immediately.t Dubious, test returned 255 (wstat 65280, 0xff00) > > No subtests run > > Hm. Looks different when a test is including our test helpers. > > t/000_die_pg_test_utils.pl .. Dubious, test returned 25 (wstat 6400, 0x1900) > No subtests run > t/000_die_test_more.pl .. error at t/000_die_test_more.pl line 2. > t/000_die_test_more.pl .. Dubious, test returned 255 (wstat 65280, 0xff00) > No subtests run > > So I think we broke something... I think it's the log file redirection stuff > in INIT. Any chance somebody with more perl knowledge could look into this? Clearly our effort to copy all the output the original file descriptors isn't successful here. $ perl -I /home/andres/src/postgresql/src/test/perl/ -e "use PostgreSQL::Test::Utils; die 'blorb';" $ perl -I /home/andres/src/postgresql/src/test/perl/ -e "use Test::More; die 'blorb';" blorb at -e line 1. (note that the former will create a tmp_check in your current directory) Greetings, Andres Freund
Re: TAP output format in pg_regress
Hi, Thanks for the updated version! On 2022-02-22 15:10:11 +0100, Daniel Gustafsson wrote: > The errorpaths that exit(2) the testrun should be converted to "bail out" > lines > when running with TAP output, but apart from that I think it's fairly spec > compliant. I'd much rather not use BAIL - I haven't gotten around to doing anything about it, but I really want to get rid of nearly all our uses of bail: https://www.postgresql.org/message-id/20220213232249.7sevhlioapydla37%40alap3.anarazel.de Greetings, Andres Freund
Re: CREATEROLE and role ownership hierarchies
On Thu, Feb 17, 2022 at 12:40 PM Robert Haas wrote: > > On Mon, Jan 31, 2022 at 1:57 PM Joshua Brindle > wrote: > > This is precisely the use case I am trying to accomplish with this > > patchset, roughly: > > > > - An automated bot that creates users and adds them to the employees role > > - Bot cannot access any employee (or other roles) table data > > - Bot cannot become any employee > > - Bot can disable the login of any employee > > > > Yes there are attack surfaces around the fringes of login, etc but > > those can be mitigated with certificate authentication. My pg_hba > > would require any role in the employees role to use cert auth. > > > > This would adequately mitigate many threats while greatly enhancing > > user management. > > So, where do we go from here? > > I've been thinking about this comment a bit. On the one hand, I have > some reservations about the phrase "the use case I am trying to > accomplish with this patchset," because in the end, this is not your > patch set. It's not reasonable to complain that a patch someone else > wrote doesn't solve your problem; of course everyone writes patches to > solve their own problems, or those of their employer, not other > people's problems. And that's as it should be, else we will have few > contributors. On the other hand, to the extent that this patch set > makes things worse for a reasonable use case which you have in mind, > that's an entirely legitimate complaint. Yes, absolutely. It is my understanding that generally a community consensus is attempted, I was throwing my (and Crunchy's) use case out there as a possible goal, and I have spent time reviewing and testing the patch, so I think that is fair. Obviously I am not in the position to stipulate hard requirements. > After a bit of testing, it seems to me that as things stand today, > things are nearly perfect for the use case that you have in mind. I > would be interested to know whether you agree. 
If I set up an account > and give it CREATEROLE, it can create users, and it can put them into > the employees role, but it can't SET ROLE to any of those accounts. It > can also ALTER ROLE ... NOLOGIN on any of those accounts. The only gap > I see is that there are certain role-based flags which the CREATEROLE > account cannot set: SUPERUSER, BYPASSRLS, REPLICATION. You might > prefer a system where your bot account had the option to grant those > privileges also, and I think that's a reasonable thing to want. I believe the only issue in the existing patchset was that membership in employees was required for the Bot, but I can apply the current patchset and test it out more in a bit. > However, I *also* think it's reasonable to want an account that can > create roles but can't give to those roles membership in roles that it > does not itself possess. Likewise, I think it's reasonable to want an > account that can only drop roles which that account itself created. > These kinds of requirements stem from a different use case than what > you are talking about here, but they seem like fine things to want, > and as far as I know we have pretty broad agreement that they are > reasonable. It seems extremely difficult to make a convincing argument > that this is not a thing which anyone should want to block: > > rhaas=> create role bob role pg_execute_server_program; > CREATE ROLE > > Honestly, that seems like a big yikes from here. How is it OK to block > "create role bob superuser" yet allow that command? I'm inclined to > think that's just broken. Even if the role were pg_write_all_data > rather than pg_execute_server_program, it's still a heck of a lot of > power to be handing out, and I don't see how anyone could make a > serious argument that we shouldn't have an option to restrict that. Yes, agreed 100%. To be clear, I do not want Bot in the above use case to be able to add any role other than employees to new roles it creates.
So we are in complete agreement there, the only difference is that I do not want Bot to be able to become those roles (or use any access granted via those roles), it's only job is to manage roles, not look at data. > Let me separate the two features that I just mentioned and talk about > them individually: > > 1. Don't allow a CREATEROLE user to give out membership in groups > which that user does not possess. Leaving aside the details of any > previously-proposed patches and just speaking theoretically, how can > this be implemented? I can think of a few ideas. We could (1A) just > change CREATEROLE to work that way, but IIUC that would break the use > case you outline here, so I guess that's off the table unless I am > misunderstanding the situation. We could also (1B) add a second role > attribute with a different name, like, err, CREATEROLEWEAKLY, that > behaves in that way, leaving the existing one untouched. But we could > also take it a lot further, because someone might want to let an > account hand out a set of privileges which correspon
Re: Patch a potential memory leak in describeOneTableDetails()
Kyotaro Horiguchi writes: > Anyway, don't we use the -ftabstop option flag to silence compiler? > The warning is contradicting our convention. The attached adds that > flag. No, we use pgindent to not have such inconsistent indentation. regards, tom lane
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
I'm not sure about the current status, but found it while playing around with the latest changes a bit, so thought of sharing it here. + + strategy + + + This is used for copying the database directory. Currently, we have + two strategies the WAL_LOG_BLOCK and the Isn't it wal_log instead of wal_log_block? I think when users input wrong strategy with createdb command, we should provide a hint message showing allowed values for strategy types along with an error message. This will be helpful for the users. On Tue, Feb 15, 2022 at 5:19 PM Dilip Kumar wrote: > > On Tue, Feb 15, 2022 at 2:01 AM Maciek Sakrejda wrote: > > > Here is the updated version of the patch, the changes are 1) Fixed > review comments given by Robert and one open comment from Ashutosh. > 2) Preserved the old create db method. 3) As agreed upthread for now > we are using the new strategy only for createdb not for movedb so I > have removed the changes in ForgetDatabaseSyncRequests() and > DropDatabaseBuffers(). 3) Provided a database creation strategy > option as of now I have kept it as below. > > CREATE DATABASE ... WITH (STRATEGY = WAL_LOG); -- default if > option is omitted > CREATE DATABASE ... WITH (STRATEGY = FILE_COPY); > > I have updated the document but I was not sure how much internal > information to be exposed to the user so I will work on that based on > feedback from others. > > -- > Regards, > Dilip Kumar > EnterpriseDB: http://www.enterprisedb.com
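On the hint-message suggestion above, the validation logic might look something like this sketch. All names here are invented stand-ins for illustration, not the actual createdb()/ereport() code:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for CREATE DATABASE's allowed STRATEGY values. */
static const char *const allowed_strategies[] = {"wal_log", "file_copy"};
#define N_STRATEGIES (sizeof(allowed_strategies) / sizeof(allowed_strategies[0]))

static int
strategy_is_valid(const char *value)
{
	for (size_t i = 0; i < N_STRATEGIES; i++)
	{
		if (strcmp(value, allowed_strategies[i]) == 0)
			return 1;
	}
	return 0;
}

/* Build the kind of hint message suggested above, listing allowed values. */
static void
strategy_hint(char *buf, size_t buflen)
{
	snprintf(buf, buflen, "Valid strategies are \"%s\" and \"%s\".",
			 allowed_strategies[0], allowed_strategies[1]);
}
```

The point of the hint is that a user who typos the strategy (say, the stale `wal_log_block` spelling) sees the accepted values immediately instead of having to consult the documentation.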
Re: Design of pg_stat_subscription_workers vs pgstats
On Tue, Feb 22, 2022 at 9:22 PM osumi.takami...@fujitsu.com wrote: > > On Tuesday, February 22, 2022 2:45 PM Masahiko Sawada > wrote: > > I've attached a patch that changes pg_stat_subscription_workers view. > > It removes non-cumulative values such as error details such as error-XID and > > the error message from the view, and consequently the view now has only > > cumulative statistics counters: apply_error_count and sync_error_count. > > Since > > the new view name is under discussion I temporarily chose > > pg_stat_subscription_activity. > Hi, thank you for sharing the patch. > > > Few minor comments for v1. Thank you for the comments! > > (1) commit message's typo > > This commits changes the view so that it stores only statistics > counters: apply_error_count and sync_error_count. > > "This commits" -> "This commit" Will fix. > > (2) minor improvement suggestion for the commit message > > I suggest that we touch the commit id 8d74fc9 > that introduces the pg_stat_subscription_workers > in the commit message, for better traceability. Below is an example. > > From: > As the result of the discussion, we've concluded that the stats > collector is not an appropriate place to store the error information of > subscription workers. > > To: > As the result of the discussion about the view introduced by 8d74fc9,... Okay, will add the commit reference. > > (3) doc/src/sgml/logical-replication.sgml > > Kindly refer to commit id 85c61ba for the detail. > You forgot "the" in the below sentence. > > @@ -346,8 +346,6 @@ > > A conflict will produce an error and will stop the replication; it must be > resolved manually by the user. Details about the conflict can be found in > - > - pg_stat_subscription_workers and the > subscriber's server log. > > > From: > subscriber's server log. > to: > the subscriber's server log. Will fix. 
> > (4) doc/src/sgml/monitoring.sgml > > > > - last_error_time timestamp with time > zone > + sync_error_count uint8 > > > - Last time at which this error occurred > + Number of times the error occurred during the initial data copy > > > I supposed it might be better to use "initial data sync" > or "initial data synchronization", rather than "initial data copy". "Initial data synchronization" sounds like the whole table synchronization process including COPY and applying changes to catch up. But sync_error_count is incremented only during COPY so I used "initial data copy". What do you think? > > (5) src/test/subscription/t/026_worker_stats.pl > > +# Truncate test_tab1 so that table sync can continue. > +$node_subscriber->safe_psql('postgres', "TRUNCATE test_tab1;"); > > The second truncate is for apply, isn't it? Therefore, kindly change > > From: > Truncate test_tab1 so that table sync can continue. > To: > Truncate test_tab1 so that apply can continue. Right, will fix. > > (6) src/test/subscription/t/026_worker_stats.pl > > +# Insert more data to test_tab1 on the subscriber and then on the publisher, > raising an > +# error on the subscriber due to violation of the unique constraint on > test_tab1. > +$node_subscriber->safe_psql('postgres', "INSERT INTO test_tab1 VALUES (2)"); > > Did we need this insert ? > If you want to indicate the apply is working okay after the error of table > sync is solved, > waiting for the max value in the test_tab1 becoming 2 on the subscriber by > polling query > would work. But, I was not sure if this is essentially necessary for the > testing purpose. You're right, it's not necessary. Also, it seems better to change the TAP test file name from 026_worker_stats.pl to 026_stats.pl. Will incorporate these changes. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
+/* Kinds of checkpoint (as advertised via PROGRESS_CHECKPOINT_KIND) */ +#define PROGRESS_CHECKPOINT_KIND_WAL 0 +#define PROGRESS_CHECKPOINT_KIND_TIME 1 +#define PROGRESS_CHECKPOINT_KIND_FORCE 2 +#define PROGRESS_CHECKPOINT_KIND_UNKNOWN 3 On what basis have you classified the above into the various types of checkpoints? AFAIK, the first two types are based on what triggered the checkpoint (whether it was the checkpoint_timeout or max_wal_size settings) while the third type indicates the force checkpoint that can happen when the checkpoint is triggered for various reasons, e.g., during createdb or dropdb. It is quite possible that both the PROGRESS_CHECKPOINT_KIND_TIME and PROGRESS_CHECKPOINT_KIND_FORCE flags are set for the checkpoint because multiple checkpoint requests are processed at one go, so what type of checkpoint would that be? +*/ + if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY)) == 0) + { + pgstat_progress_start_command(PROGRESS_COMMAND_CHECKPOINT, InvalidOid); + checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_PHASE, + PROGRESS_CHECKPOINT_PHASE_INIT); + if (flags & CHECKPOINT_CAUSE_XLOG) + checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND, + PROGRESS_CHECKPOINT_KIND_WAL); + else if (flags & CHECKPOINT_CAUSE_TIME) + checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND, + PROGRESS_CHECKPOINT_KIND_TIME); + else if (flags & CHECKPOINT_FORCE) + checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND, + PROGRESS_CHECKPOINT_KIND_FORCE); + else + checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND, + PROGRESS_CHECKPOINT_KIND_UNKNOWN); + } +} -- With Regards, Ashutosh Sharma. On Thu, Feb 10, 2022 at 12:23 PM Nitin Jadhav wrote: > > > > We need at least a trace of the number of buffers to sync (num_to_scan) > > > before the checkpoint start, instead of just emitting the stats at the > > > end.
> > > > > > Bharat, it would be good to show the buffers synced counter and the total > > > buffers to sync, checkpointer pid, substep it is running, whether it is > > > on target for completion, checkpoint_Reason > > > (manual/times/forced). BufferSync has several variables tracking the sync > > > progress locally, and we may need some refactoring here. > > > > I agree to provide above mentioned information as part of showing the > > progress of current checkpoint operation. I am currently looking into > > the code to know if any other information can be added. > > Here is the initial patch to show the progress of checkpoint through > pg_stat_progress_checkpoint view. Please find the attachment. > > The information added to this view are pid - process ID of a > CHECKPOINTER process, kind - kind of checkpoint indicates the reason > for checkpoint (values can be wal, time or force), phase - indicates > the current phase of checkpoint operation, total_buffer_writes - total > number of buffers to be written, buffers_processed - number of buffers > processed, buffers_written - number of buffers written, > total_file_syncs - total number of files to be synced, files_synced - > number of files synced. > > There are many operations happen as part of checkpoint. For each of > the operation I am updating the phase field of > pg_stat_progress_checkpoint view. The values supported for this field > are initializing, checkpointing replication slots, checkpointing > snapshots, checkpointing logical rewrite mappings, checkpointing CLOG > pages, checkpointing CommitTs pages, checkpointing SUBTRANS pages, > checkpointing MULTIXACT pages, checkpointing SLRU pages, checkpointing > buffers, performing sync requests, performing two phase checkpoint, > recycling old XLOG files and Finalizing. In case of checkpointing > buffers phase, the fields total_buffer_writes, buffers_processed and > buffers_written shows the detailed progress of writing buffers. 
In > case of performing sync requests phase, the fields total_file_syncs > and files_synced shows the detailed progress of syncing files. In > other phases, only the phase field is getting updated and it is > difficult to show the progress because we do not get the total number > of files count without traversing the directory. It is not worth to > calculate that as it affects the performance of the checkpoint. I also > gave a thought to just mention the number of files processed, but this > wont give a meaningful progress information (It can be treated as > statistics). Hence just updating the phase field in those scenarios. > > Apart from above fields, I am planning to add few more fields to the > view in the next patch. That is, process ID of the backend process > which triggered
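Ashutosh's question at the top of this message — what kind gets reported when several cause bits are set at once — comes down to the if/else ladder picking the first match and silently dropping the rest. A self-contained sketch (flag values invented for illustration, not PostgreSQL's actual checkpoint request flags):

```c
#include <assert.h>
#include <string.h>

/* Invented flag bits, mirroring the shape of the checkpoint request flags. */
#define CKPT_CAUSE_XLOG	(1 << 0)
#define CKPT_CAUSE_TIME	(1 << 1)
#define CKPT_FORCE		(1 << 2)

/*
 * With an if/else ladder, a request carrying several cause bits is
 * reported as only one kind: the first branch that matches wins.
 */
static const char *
classify_checkpoint(int flags)
{
	if (flags & CKPT_CAUSE_XLOG)
		return "wal";
	else if (flags & CKPT_CAUSE_TIME)
		return "time";
	else if (flags & CKPT_FORCE)
		return "force";
	return "unknown";
}
```

So when merged requests carry both the "time" and "force" bits, a view built this way would report only "time" — which is exactly the ambiguity being asked about.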
Re: Design of pg_stat_subscription_workers vs pgstats
On Tue, Feb 22, 2022 at 6:53 PM Amit Kapila wrote: > > On Tue, Feb 22, 2022 at 11:15 AM Masahiko Sawada > wrote: > > > > I've attached a patch that changes pg_stat_subscription_workers view. > > It removes non-cumulative values such as error details such as > > error-XID and the error message from the view, and consequently the > > view now has only cumulative statistics counters: apply_error_count > > and sync_error_count. Since the new view name is under discussion I > > temporarily chose pg_stat_subscription_activity. > > > > Few comments: > = Thank you for the comments! > 1. > --- a/src/backend/catalog/system_functions.sql > +++ b/src/backend/catalog/system_functions.sql > @@ -637,11 +637,9 @@ REVOKE EXECUTE ON FUNCTION > pg_stat_reset_single_table_counters(oid) FROM public; > > REVOKE EXECUTE ON FUNCTION > pg_stat_reset_single_function_counters(oid) FROM public; > > -REVOKE EXECUTE ON FUNCTION pg_stat_reset_replication_slot(text) FROM public; > - > -REVOKE EXECUTE ON FUNCTION pg_stat_reset_subscription_worker(oid) FROM > public; > +REVOKE EXECUTE ON FUNCTION > pg_stat_reset_single_subscription_counters(oid) FROM public; > > -REVOKE EXECUTE ON FUNCTION pg_stat_reset_subscription_worker(oid, > oid) FROM public; > +REVOKE EXECUTE ON FUNCTION pg_stat_reset_replication_slot(text) FROM public; > > Is there a need to change anything about > pg_stat_reset_replication_slot() in this patch? It doesn't change pg_stat_reset_replication_slot() but just changes the order in order to put the modified function pg_stat_reset_single_subscription_counters() closer to other similar functions such as pg_stat_reset_single_function_counters(). > > 2. Do we still need to use LATERAL in the view's query? There are some functions that use LATERAL in a similar way but it seems no need to put LATERAL before the function call. Will remove. > 3. Can we send error stats pgstat_report_stat() as that will be called > via proc_exit() path. 
We can set the phase (apply/sync) in > apply_error_callback_arg and then use that to send the appropriate > message. I think this will obviate the need for try..catch. If we use pgstat_report_stat() to send subscription stats messages, all processes end up going through that path. It might not bring overhead in practice but I'd like to avoid it. And, since the apply worker also calls pgstat_report_stat() at the end of the transaction, we might need to change pgstat_report_stat() so that it doesn't send the subscription messages when it gets called at the end of the transaction. I think it's likely that PG_TRY() and PG_CATCH() will be added, for example, when the disable_on_error feature or the storing error details feature is introduced, so obviating the need for them at this point would not benefit much. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: psql - add SHOW_ALL_RESULTS option
I wrote a few more small tests for psql to address uncovered territory in SendQuery() especially: - \timing - client encoding handling - notifications What's still missing is: - \watch - pagers For \watch, I think one would need something like the current cancel test (since you need to get the psql pid to send a signal to stop the watch). It would work in principle, but it will require more work to refactor the cancel test. For pagers, I don't know. It would be pretty easy to write a simple script that acts as a pass-through pager and check that it is called. There were some discussions earlier in the thread that some version of some patch had broken some use of pagers. Does anyone remember details? Anything worth testing specifically? From 6e75c1bec73f2128b94131305e6a37b97257f7c3 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Tue, 22 Feb 2022 13:42:38 +0100 Subject: [PATCH 1/2] Improve some psql test code Split psql_like() into two functions psql_like() and psql_fails_like() and make them mirror the existing command_like() and command_fails_like() more closely. In particular, follow the universal convention that the test name is the last argument. --- src/bin/psql/t/001_basic.pl | 59 ++--- 1 file changed, 29 insertions(+), 30 deletions(-) diff --git a/src/bin/psql/t/001_basic.pl b/src/bin/psql/t/001_basic.pl index ba3dd846ba..f416e0ab5e 100644 --- a/src/bin/psql/t/001_basic.pl +++ b/src/bin/psql/t/001_basic.pl @@ -12,40 +12,36 @@ program_version_ok('psql'); program_options_handling_ok('psql'); -my ($stdout, $stderr); -my $result; - -# Execute a psql command and check its result patterns. +# Execute a psql command and check its output.
sub psql_like { local $Test::Builder::Level = $Test::Builder::Level + 1; - my $node= shift; - my $test_name = shift; - my $query = shift; - my $expected_stdout = shift; - my $expected_stderr = shift; + my ($node, $sql, $expected_stdout, $test_name) = @_; + + my ($ret, $stdout, $stderr) = $node->psql('postgres', $sql); + + is($ret,0, "$test_name: exit code 0"); + is($stderr, '', "$test_name: no stderr"); + like($stdout, $expected_stdout, "$test_name: matches"); + + return; +} + +# Execute a psql command and check that it fails and check the stderr. +sub psql_fails_like +{ + local $Test::Builder::Level = $Test::Builder::Level + 1; - die "cannot specify both expected stdout and stderr here" - if (defined($expected_stdout) && defined($expected_stderr)); + my ($node, $sql, $expected_stderr, $test_name) = @_; # Use the context of a WAL sender, some of the tests rely on that. my ($ret, $stdout, $stderr) = $node->psql( - 'postgres', $query, - on_error_die => 0, + 'postgres', $sql, replication => 'database'); - if (defined($expected_stdout)) - { - is($ret,0, "$test_name: expected result code"); - is($stderr, '', "$test_name: no stderr"); - like($stdout, $expected_stdout, "$test_name: stdout matches"); - } - if (defined($expected_stderr)) - { - isnt($ret, 0, "$test_name: expected result code"); - like($stderr, $expected_stderr, "$test_name: stderr matches"); - } + isnt($ret, 0, "$test_name: exit code not 0"); + like($stderr, $expected_stderr, "$test_name: matches"); return; } @@ -53,6 +49,9 @@ sub psql_like # test --help=foo, analogous to program_help_ok() foreach my $arg (qw(commands variables)) { + my ($stdout, $stderr); + my $result; + $result = IPC::Run::run [ 'psql', "--help=$arg" ], '>', \$stdout, '2>', \$stderr; ok($result, "psql --help=$arg exit code 0"); @@ -70,15 +69,15 @@ sub psql_like }); $node->start; -psql_like($node, '\copyright', '\copyright', qr/Copyright/, undef); -psql_like($node, '\help without arguments', '\help', qr/ALTER/, undef); -psql_like($node, 
'\help with argument', '\help SELECT', qr/SELECT/, undef); +psql_like($node, '\copyright', qr/Copyright/, '\copyright'); +psql_like($node, '\help', qr/ALTER/, '\help without arguments'); +psql_like($node, '\help SELECT', qr/SELECT/, '\help with argument'); # Test clean handling of unsupported replication command responses -psql_like( +psql_fails_like( $node, - 'handling of unexpected PQresultStatus', 'START_REPLICATION 0/0', - undef, qr/unexpected PQresultStatus: 8$/); + qr/unexpected PQresultStatus: 8$/, + 'handling of unexpected PQresultStatus'); done_testing(); -- 2.35.1 From 189e977e47c505230195551833e0a61ec71dced3 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Tue, 22 Feb 2022 14:20:05 +0100 Subject: [PATCH 2/2] psql: Additional tests Add a few TAP tests for things that happen while a user query is bein
Re: TAP output format in pg_regress
> On 22 Feb 2022, at 00:08, Daniel Gustafsson wrote:
> I'll fix that.

The attached v3 fixes that thinko, and cleans up a lot of the output which isn't diagnostic per the TAP spec, to make it less noisy. It also fixes tag support used in the ECPG tests, plus a few small cleanups. There is a blank line printed which needs to be fixed, but I'm running out of time and wanted to get a non-broken version on the list before putting it aside for today.

The error paths that exit(2) from the test run should be converted to "bail out" lines when running with TAP output, but apart from that I think it's fairly spec compliant.

--
Daniel Gustafsson		https://vmware.com/

v3-0001-pg_regress-TAP-output-format.patch
Description: Binary data
Re: making pg_regress less noisy by removing boilerplate
On 2/21/22 12:31, Andres Freund wrote:
>
> We still have a few issues with ports conflicts on windows. We should really
> consider just desupporting all windows versions without unix socket
> support. But until then it seems useful to be able to figure out random
> failures.
>

I'm pretty sure all my Windows machines with buildfarm animals are sufficiently modern, except the XP machine, which only builds release 10 nowadays.

What's involved in moving to require Unix socket support?

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: External data files possible?
Hi Chris,

> Breaking up the data into pages with page headers means a lot of extra work
> [...]. It would be much better to store the index in a set of external data
> files. This seems possible so long as I put the files under the database's
> directory and name things properly.

Just curious: what is your index for, and how are you going to handle crash recovery?

--
Best regards,
Aleksander Alekseev
RE: Design of pg_stat_subscription_workers vs pgstats
On Tuesday, February 22, 2022 2:45 PM Masahiko Sawada wrote:
> I've attached a patch that changes pg_stat_subscription_workers view.
> It removes non-cumulative values such as error details such as error-XID and
> the error message from the view, and consequently the view now has only
> cumulative statistics counters: apply_error_count and sync_error_count. Since
> the new view name is under discussion I temporarily chose
> pg_stat_subscription_activity.

Hi, thank you for sharing the patch. A few minor comments for v1.

(1) typo in the commit message

This commits changes the view so that it stores only statistics counters:
apply_error_count and sync_error_count.

"This commits" -> "This commit"

(2) minor improvement suggestion for the commit message

I suggest mentioning commit 8d74fc9, which introduced pg_stat_subscription_workers, in the commit message for better traceability. Below is an example.

From:
As the result of the discussion, we've concluded that the stats collector is not an appropriate place to store the error information of subscription workers.

To:
As the result of the discussion about the view introduced by 8d74fc9,...

(3) doc/src/sgml/logical-replication.sgml

Kindly refer to commit 85c61ba for the details. You forgot "the" in the sentence below.

@@ -346,8 +346,6 @@
 A conflict will produce an error and will stop the replication; it must be
 resolved manually by the user. Details about the conflict can be found in
-
- pg_stat_subscription_workers and
 the subscriber's server log.

From: subscriber's server log.
To: the subscriber's server log.

(4) doc/src/sgml/monitoring.sgml

- last_error_time timestamp with time zone
+ sync_error_count uint8

- Last time at which this error occurred
+ Number of times the error occurred during the initial data copy

I suppose it might be better to use "initial data sync" or "initial data synchronization" rather than "initial data copy".
(5) src/test/subscription/t/026_worker_stats.pl

+# Truncate test_tab1 so that table sync can continue.
+$node_subscriber->safe_psql('postgres', "TRUNCATE test_tab1;");

The second truncate is for apply, isn't it? Therefore, kindly change

From: Truncate test_tab1 so that table sync can continue.
To: Truncate test_tab1 so that apply can continue.

(6) src/test/subscription/t/026_worker_stats.pl

+# Insert more data to test_tab1 on the subscriber and then on the publisher, raising an
+# error on the subscriber due to violation of the unique constraint on test_tab1.
+$node_subscriber->safe_psql('postgres', "INSERT INTO test_tab1 VALUES (2)");

Do we need this insert? If you want to show that apply works correctly after the table sync error is resolved, waiting for the max value in test_tab1 to become 2 on the subscriber with a polling query would work. But I was not sure whether this is essential for the testing purpose.

Best Regards,
Takamichi Osumi
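The "polling query" suggested in (6) is the usual TAP-test idiom (poll until the subscriber catches up, with a timeout). A rough Python sketch of that retry loop, with a fake query function standing in for the subscriber-side SQL (all names here are illustrative):

```python
import time

def poll_until(query, expected, timeout=5.0, interval=0.1):
    """Re-run query() until it returns expected, or give up after timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if query() == expected:
            return True
        time.sleep(interval)
    return False

# Fake subscriber-side query: the replicated max value "arrives" on the
# third poll, simulating apply catching up after the error is resolved.
state = {"calls": 0}
def max_in_test_tab1():
    state["calls"] += 1
    return 2 if state["calls"] >= 3 else 1

assert poll_until(max_in_test_tab1, 2)
print("apply caught up")
```

Compared with a one-shot assertion, the poll tolerates replication latency while still failing deterministically when apply is genuinely stuck.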
Re: Support logical replication of DDLs
On Mon, Feb 21, 2022 at 13:13, Zheng Li wrote:
>
> 2. Table level
> Allows DDLs on the published tables to be replicated except for
> certain edge cases.
>

Think about how to handle triggers and functions with the same name but a different purpose.

Publisher:

create function public.audit() returns trigger language plpgsql as $$
begin
  new.Audit_User = current_user();
  new.Audit_Date_Time = now();
  return new;
end;$$;

create trigger audit before insert or update on foo
  for each row execute procedure public.audit();

Subscriber:

create function public.audit() returns trigger language plpgsql as $$
begin
  insert into Audit(Audit_Date_Time, Audit_User, Schema_Name, Table_Name,
                    Audit_Action, Field_Values)
  values(new.Audit_ts, new.Audit_User, tg_table_schema, tg_table_name, tg_op,
         row_to_json(case when tg_op = 'DELETE' then old.* else new.* end));
  return null;
end;$$;

create trigger audit after insert or update or delete on foo
  for each row execute procedure public.audit();

regards,
Marcos
Re: Support logical replication of DDLs
Hi Zheng, > I’m working on a patch to support logical replication of data > definition language statements (DDLs). That's great! > However, there are still many edge cases to sort out because not every > DDL statement can/should be replicated. Maybe the first implementation shouldn't be perfect as long as known limitations are documented and the future improvements are unlikely to break anything for the users. Committing an MVP and iterating on this is much simpler in terms of development and code review. Also, it will deliver value to the users and give us feedback sooner. > 1. DDL involving multiple tables where only some tables are replicated, e.g. > > DROP TABLE replicated_foo, notreplicated_bar; > I would add DROP TABLE ... CASCADE to the list. Also, let's not forget that PostgreSQL supports table inheritance and table partitioning. > 2. Any DDL that calls a volatile function, such as NOW() or RAND(), is > likely to generate a different value on each replica. It is possible > to work around these issues—for example, the publisher can replace any > volatile function calls with a fixed return value when the statement > is logged so that the subscribers all get the same value. We will have > to consider some other cases. That would make sense. > [...] > Whether a DDL should be replicated also depends on what granularity do > we define DDL replication. For example, we can define DDL replication > on these levels: > > 1. Database level > Allows all DDLs for a database to be replicated except for certain > edge cases (refer to the edge cases mentioned above). > This is likely a major use case, such as in online major version upgrade. To my knowledge, this is not a primary use case for logical replication. Also, I suspect that implementing it may be a bit challenging. What if we focus on table-level replication for now? -- Best regards, Aleksander Alekseev
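The point about volatile functions in the quoted text can be illustrated outside PostgreSQL: replaying the same statement text on each replica diverges when it contains a volatile call, while resolving the call once on the publisher and shipping the fixed value keeps replicas identical. A toy Python sketch (not the actual patch mechanism; apply_ddl and the "RAND()" marker are invented for illustration):

```python
import random

def apply_ddl(stmt, resolve_volatile):
    # "Execute" a toy DDL whose DEFAULT may be a volatile call.
    if stmt["default"] == "RAND()":
        default = resolve_volatile()  # evaluated independently on each replica
    else:
        default = stmt["default"]     # fixed value replayed verbatim
    return {"column_default": default}

stmt = {"default": "RAND()"}

# Naive statement replication: each replica evaluates RAND() itself,
# so the resulting defaults (almost surely) differ across replicas.
naive = [apply_ddl(stmt, random.random) for _ in range(3)]

# Publisher-side substitution: resolve the volatile call once at log time
# and ship the fixed value instead of the statement text.
logged = {"default": random.random()}
substituted = [apply_ddl(logged, random.random) for _ in range(3)]

assert len({r["column_default"] for r in substituted}) == 1
print("replicas agree after substitution")
```

This is why logging the resolved value (rather than the raw statement) is attractive for DDL replication, at the cost of rewriting the statement on the publisher.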
Re: Design of pg_stat_subscription_workers vs pgstats
On Tue, Feb 22, 2022 at 11:15 AM Masahiko Sawada wrote:
>
> I've attached a patch that changes pg_stat_subscription_workers view.
> It removes non-cumulative values such as error details such as
> error-XID and the error message from the view, and consequently the
> view now has only cumulative statistics counters: apply_error_count
> and sync_error_count. Since the new view name is under discussion I
> temporarily chose pg_stat_subscription_activity.
>

Few comments:
============
1.
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -637,11 +637,9 @@
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_table_counters(oid) FROM public;

 REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_function_counters(oid) FROM public;

-REVOKE EXECUTE ON FUNCTION pg_stat_reset_replication_slot(text) FROM public;
-
-REVOKE EXECUTE ON FUNCTION pg_stat_reset_subscription_worker(oid) FROM public;
+REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_subscription_counters(oid) FROM public;

-REVOKE EXECUTE ON FUNCTION pg_stat_reset_subscription_worker(oid, oid) FROM public;
+REVOKE EXECUTE ON FUNCTION pg_stat_reset_replication_slot(text) FROM public;

Is there a need to change anything about pg_stat_reset_replication_slot() in this patch?

2. Do we still need to use LATERAL in the view's query?

3. Can we send error stats in pgstat_report_stat(), as that will be called via the proc_exit() path? We can set the phase (apply/sync) in apply_error_callback_arg and then use that to send the appropriate message. I think this will obviate the need for try..catch.

--
With Regards,
Amit Kapila.
Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes
Hi,

On 2022-02-22 01:11:21 -0800, Andres Freund wrote:
> I've started to work on a few debugging aids to find problem like
> these. Attached are two WIP patches:

Forgot to attach. Also importantly includes a tap test for several of these issues.

Greetings,

Andres Freund

From 0bc64874f8e5faae9a38731a83aa7b001095cc35 Mon Sep 17 00:00:00 2001
From: Andres Freund
Date: Mon, 21 Feb 2022 15:44:02 -0800
Subject: [PATCH v1 1/4] WIP: test for file reuse dangers around database and
 tablespace commands.

---
 src/test/recovery/t/029_relfilenode_reuse.pl | 233 +++
 1 file changed, 233 insertions(+)
 create mode 100644 src/test/recovery/t/029_relfilenode_reuse.pl

diff --git a/src/test/recovery/t/029_relfilenode_reuse.pl b/src/test/recovery/t/029_relfilenode_reuse.pl
new file mode 100644
index 000..22d8e85614c
--- /dev/null
+++ b/src/test/recovery/t/029_relfilenode_reuse.pl
@@ -0,0 +1,233 @@
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+use File::Basename;
+
+
+my $node_primary = PostgreSQL::Test::Cluster->new('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->append_conf('postgresql.conf', q[
+allow_in_place_tablespaces = true
+log_connections=on
+# to avoid "repairing" corruption
+full_page_writes=off
+log_min_messages=debug2
+autovacuum_naptime=1s
+shared_buffers=1MB
+]);
+$node_primary->start;
+
+
+# Create streaming standby linking to primary
+my $backup_name = 'my_backup';
+$node_primary->backup($backup_name);
+my $node_standby = PostgreSQL::Test::Cluster->new('standby');
+$node_standby->init_from_backup($node_primary, $backup_name,
+	has_streaming => 1);
+$node_standby->start;
+
+# To avoid hanging while expecting some specific input from a psql
+# instance being driven by us, add a timeout high enough that it
+# should never trigger even on very slow machines, unless something
+# is really wrong.
+my $psql_timeout = IPC::Run::timer(300);
+
+my %psql_primary = (stdin => '', stdout => '', stderr => '');
+$psql_primary{run} = IPC::Run::start(
+	[ 'psql', '-XA', '-f', '-', '-d', $node_primary->connstr('postgres') ],
+	'<',
+	\$psql_primary{stdin},
+	'>',
+	\$psql_primary{stdout},
+	'2>',
+	\$psql_primary{stderr},
+	$psql_timeout);
+
+my %psql_standby = ('stdin' => '', 'stdout' => '', 'stderr' => '');
+$psql_standby{run} = IPC::Run::start(
+	[ 'psql', '-XA', '-f', '-', '-d', $node_standby->connstr('postgres') ],
+	'<',
+	\$psql_standby{stdin},
+	'>',
+	\$psql_standby{stdout},
+	'2>',
+	\$psql_standby{stderr},
+	$psql_timeout);
+
+
+# Create template database with a table that we'll update, to trigger dirty
+# rows. Using a template database + preexisting rows makes it a bit easier to
+# reproduce, because there's no cache invalidations generated.
+
+$node_primary->safe_psql('postgres', "CREATE DATABASE conflict_db_template OID = 5;");
+$node_primary->safe_psql('conflict_db_template', q[
+CREATE TABLE large(id serial primary key, dataa text, datab text);
+INSERT INTO large(dataa, datab) SELECT g.i::text, 1 FROM generate_series(1, 4000) g(i);]);
+$node_primary->safe_psql('postgres', "CREATE DATABASE conflict_db TEMPLATE conflict_db_template OID = 50001;");
+
+$node_primary->safe_psql('postgres', q[
+CREATE EXTENSION pg_prewarm;
+CREATE TABLE replace_sb(data text);
+INSERT INTO replace_sb(data) SELECT random()::text FROM generate_series(1, 15000);]);
+
+# Use longrunning transactions, so that AtEOXact_SMgr doesn't close files
+send_query_and_wait(
+	\%psql_primary,
+	q[BEGIN;],
+	qr/BEGIN/m);
+send_query_and_wait(
+	\%psql_standby,
+	q[BEGIN;],
+	qr/BEGIN/m);
+
+# Cause lots of dirty rows in shared_buffers
+$node_primary->safe_psql('conflict_db', "UPDATE large SET datab = 1;");
+
+# Now do a bunch of work in another database. That will end up needing to
+# write back dirty data from the previous step, opening the relevant file
+# descriptors
+cause_eviction(\%psql_primary, \%psql_standby);
+
+# drop and recreate database
+$node_primary->safe_psql('postgres', "DROP DATABASE conflict_db;");
+$node_primary->safe_psql('postgres', "CREATE DATABASE conflict_db TEMPLATE conflict_db_template OID = 50001;");
+
+verify($node_primary, $node_standby, 1,
+	   "initial contents as expected");
+
+# Again cause lots of dirty rows in shared_buffers, but use a different update
+# value so we can check everything is OK
+$node_primary->safe_psql('conflict_db', "UPDATE large SET datab = 2;");
+
+# Again cause a lot of IO. That'll again write back dirty data, but uses (XXX
+# adjust after bugfix) the already opened file descriptor.
+# FIXME
+cause_eviction(\%psql_primary, \%psql_standby);
+
+verify($node_primary, $node_standby, 2,
+	   "update to reused relfilenode (due to DB oid conflict) is not lost");
+
+
+$node_primary->safe_psql('conflict_db', "VACUUM FULL large;");
+$node_primary->safe_psql('conflict_db', "UPDATE large SET datab = 3;");
+
+verify($node_primary, $node_standby, 3,
+	   "restored conte
Re: pg_receivewal.exe unhandled exception in zlib1.dll
Please excuse my late reply.

On Wed, Feb 16, 2022 at 12:20 PM Juan José Santamaría Flecha <juanjo.santama...@gmail.com> wrote:
> On Tue, Feb 15, 2022 at 5:25 PM Andres Freund wrote:
>> FWIW, I've looked at using either vcpkg or conan to more easily install /
>> build dependencies on windows, in the right debug / release mode.
>>
>> The former was easier, but unfortunately the zlib, lz4, ... packages don't
>> include the gzip, lz4 etc binaries right now. That might be easy enough
>> to fix with an option. It does require building the dependencies locally.
>>
>> Conan seemed to be a nicer architecture, but there's no builtin way to
>> update to newer dependency versions, and non-versioned dependencies don't
>> actually work. It has pre-built dependencies for the common cases.
>
> I will have to do some testing with Conan and see how both compare.

I have found a couple of additional issues when trying to build using Conan: all libraries are static, and there aren't any prebuilt packages for MSVC 2022 yet. Also, for some specific packages:

- zstd doesn't include the binaries either.
- gettext only provides binaries, no libraries nor headers.
- openssl builds the binaries, but is missing some objects in the libraries (error LNK2019: unresolved external symbol).
- libxml builds the binaries, but is missing some objects in the libraries (error LNK2019: unresolved external symbol).

Regards,

Juan José Santamaría Flecha
Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes
Hi,

On 2022-02-10 14:26:59 -0800, Andres Freund wrote:
> On 2022-02-11 09:10:38 +1300, Thomas Munro wrote:
> > It seems like I should go ahead and do that today, and we can study
> > further uses for PROCSIGNAL_BARRIER_SMGRRELEASE in follow-on work?
>
> Yes.

I wrote a test to show the problem. While looking at the code I unfortunately found that CREATE DATABASE ... OID isn't the only problem. Two consecutive ALTER DATABASE ... SET TABLESPACE commands can also cause corruption.

The test doesn't work on < 15 (due to the PostgresNode renaming and the use of allow_in_place_tablespaces). But manually it's unfortunately quite reproducible :(

Start a server with
  shared_buffers = 1MB
  autovacuum = off
  bgwriter_lru_maxpages = 1
  bgwriter_delay = 1
These are just to give more time / to require smaller tables.

Start two psql sessions:

s1: \c postgres
s1: CREATE DATABASE conflict_db;
s1: CREATE TABLESPACE test_tablespace LOCATION '/tmp/blarg';
s1: CREATE EXTENSION pg_prewarm;

s2: \c conflict_db
s2: CREATE TABLE large(id serial primary key, dataa text, datab text);
s2: INSERT INTO large(dataa, datab) SELECT g.i::text, 0 FROM generate_series(1, 4000) g(i);
s2: UPDATE large SET datab = 1;

-- prevent AtEOXact_SMgr
s1: BEGIN;
-- Fill shared_buffers with other contents. This needs to write out the prior
-- dirty contents, leading to opening smgr rels / file descriptors
s1: SELECT SUM(pg_prewarm(oid)) warmed_buffers FROM pg_class WHERE pg_relation_filenode(oid) != 0;

s2: \c postgres
-- unlinks the files
s2: ALTER DATABASE conflict_db SET TABLESPACE test_tablespace;
-- now new files with the same relfilenode exist
s2: ALTER DATABASE conflict_db SET TABLESPACE pg_default;

s2: \c conflict_db
-- dirty buffers again
s2: UPDATE large SET datab = 2;

-- this again writes out the dirty buffers. But because nothing forced the
-- smgr handles to be closed, fds pointing to the *old* file contents are
-- used. I.e. we'll write to the wrong file
s1: SELECT SUM(pg_prewarm(oid)) warmed_buffers FROM pg_class WHERE pg_relation_filenode(oid) != 0;

-- verify that everything is [not] ok
s2: SELECT datab, count(*) FROM large GROUP BY 1 ORDER BY 1 LIMIT 10;

┌───────┬───────┐
│ datab │ count │
├───────┼───────┤
│     1 │  2117 │
│     2 │    67 │
└───────┴───────┘
(2 rows)

oops.

I don't yet have a good idea how to tackle this in the backbranches.

I've started to work on a few debugging aids to find problems like these. Attached are two WIP patches:

1) Use fstat(...).st_nlink == 0 to detect dangerous actions on fds with
   deleted files. That seems to work (almost) everywhere. Optionally, on
   linux, use readlink(/proc/$pid/fd/$fd) to get the filename.

2) On linux, iterate over PGPROC to get a list of pids. Then iterate over
   /proc/$pid/fd/ to check for deleted files. This only can be done after
   we've just made sure there's no fds open to deleted files.

Greetings,

Andres Freund
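The failure mode in the repro above (writes going through a cached fd for an unlinked file, while a new file at the same path holds the real data) is plain POSIX behavior and can be demonstrated without PostgreSQL. A small Python sketch, unrelated to the smgr code itself:

```python
import os
import tempfile

dirpath = tempfile.mkdtemp()
path = os.path.join(dirpath, "relfile")

# "Open smgr rel": create the file and keep a cached fd to it.
with open(path, "w") as f:
    f.write("old contents")
stale_fd = os.open(path, os.O_WRONLY)

# "ALTER DATABASE ... SET TABLESPACE": the file is unlinked and a new file
# with the same name (same "relfilenode") is created.
os.unlink(path)
with open(path, "w") as f:
    f.write("new contents")

# Writing through the cached fd modifies the *deleted* inode, not the new
# file, so the write is silently lost as far as the path is concerned.
os.pwrite(stale_fd, b"dirty page", 0)
os.close(stale_fd)

with open(path) as f:
    assert f.read() == "new contents"
print("write through stale fd was lost")
```

This is exactly why the proposed fix revolves around forcing smgr handles (and thus fds) closed via a procsignal barrier before the relfilenode can be reused.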
Re: Patch a potential memory leak in describeOneTableDetails()
> On 22 Feb 2022, at 07:14, Kyotaro Horiguchi wrote:
> Anyway, don't we use the -ftabstop option flag to silence the compiler?
> The warning is contradicting our convention. The attached adds that
> flag.

Isn't this flag contradicting our convention? From the docs section you reference further down: "Source code formatting uses 4 column tab spacing, with tabs preserved (i.e., tabs are not expanded to spaces)."

The -ftabstop option exists (to a large extent, if not entirely) to warn when tabs and spaces are mixed, creating misleading indentation that the author didn't even notice due to tab-width settings. ISTM we are better off getting these warnings than suppressing them.

> By the way, the doc says that:
>
> https://www.postgresql.org/docs/14/source-format.html
>
>> The text browsing tools more and less can be invoked as:
>> more -x4
>> less -x4
>> to make them show tabs appropriately.
>
> But AFAICS the "more" command on CentOS doesn't have -x options nor
> any option to change tab width and I don't find a document that
> suggests its existence on other platforms.

macOS, FreeBSD and NetBSD all show the less(1) manpage for more(1), which suggests that more is implemented using less there, and thus supports -x (it does on macOS). OpenBSD and Solaris more(1) do not show a -x option, but AIX does have it. Linux in its various flavors doesn't seem to have it.

The section in question was added to our docs 22 years ago. To make it reflect the operating systems in current use, maybe just not mentioning more(1) is easiest?:

"The text browsing tool less can be invoked as less -x4 to make it show tabs appropriately."

Or perhaps remove that section altogether?

--
Daniel Gustafsson		https://vmware.com/
Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
At Tue, 22 Feb 2022 01:59:45 +0900, Fujii Masao wrote in
>> Of the following, I think we should do (a) and (b) to make future
>> backpatchings easier.
>> a) Use RedoRecPtr and PriorRedoPtr after they are assigned.
>> b) Move assignment to PriorRedoPtr into the ControlFileLock section.
>
> I failed to understand how (a) and (b) can make the backpatching
> easier. How easy to backpatch seems the same whether we apply (a) and
> (b) or not...

That presumes that the patch applied to master contains (a) and (b). So if it doesn't, those are not needed by older branches.

>> c) Skip update of minRecoveryPoint only when the checkpoint gets old.
>
> Yes.

>> d) Skip call to UpdateCheckPointDistanceEstimate() when RedoRecPtr <=
>>    PriorRedoPtr.
>
> But "RedoRecPtr <= PriorRedoPtr" will never happen, will it? Because a
> restartpoint is skipped at the beginning of CreateRestartPoint() in
> that case. If this understanding is right, the check of "RedoRecPtr <=
> PriorRedoPtr" is not necessary before calling
> UpdateCheckPointDistanceEstimate().

I didn't believe that it happens. (So it came from my conservatism, or laziness, or both :p) The code dates from 2009, when StartupXLOG made a concurrent checkpoint with the bgwriter. But as of at least 9.5, StartupXLOG doesn't directly call CreateCheckPoint, so I think that won't happen. In short, I agree to remove it or turn it into an Assert().

> +	ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
> +	ControlFile->minRecoveryPointTLI = 0;
>
> Don't we need to update LocalMinRecoveryPoint and
> LocalMinRecoveryPointTLI after this? Maybe it's not necessary, but
> ISTM that it's safer and better to always update them whether the
> state is DB_IN_ARCHIVE_RECOVERY or not.

Agreed that it's safer and tidier.

> 	if (flags & CHECKPOINT_IS_SHUTDOWN)
> 		ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
>
> Same as above. IMO it's safer and better to always update the state
> (whether the state is DB_IN_ARCHIVE_RECOVERY or not) if
> CHECKPOINT_IS_SHUTDOWN flag is passed.

That means we may exit recovery mode after ShutdownXLOG called CreateRestartPoint. I don't think that may happen. So I'd rather add Assert((flags & CHECKPOINT_IS_SHUTDOWN) == 0) there instead.

I'll post the new version later.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: List of all* PostgreSQL EXTENSIONs in the world
Hi Joel,

> I've compiled a list of all* PostgreSQL EXTENSIONs in the world:

The list is great. Thanks for doing this!

Just curious, is this a one-time experiment, or are you going to keep the list up to date, similarly to the awesome-* lists on GitHub, or ...?

--
Best regards,
Aleksander Alekseev