Re: Addition of --no-sync to pg_upgrade for test speedup
On Mon, Dec 20, 2021 at 10:46:13AM -0300, Alvaro Herrera wrote:
> On 2021-Dec-16, Michael Paquier wrote:
>> In pg_upgrade, we let the flush happen with initdb --sync-only, based
>> on the binary path of the new cluster, so I think that we are not
>> going to miss any test coverage by skipping that.
>
> There was one patch of mine with breakage that only manifested in the
> pg_upgrade test *because* of its lack of no-sync.  I'm afraid that this
> change would hide certain problems.
> https://postgr.es/m/20210130023011.n545o54j65t4k...@alap3.anarazel.de

Hmm.  This talks about fsync=on being a factor counting in detecting a
failure with the backend.  Why would the fsync done with initdb
--sync-only on the target cluster once pg_upgrade is done change
something here?

> I'm not 100% comfortable with this.  What can we do to preserve *some*
> testing that include syncing?  Maybe some option that a few buildfarm
> animals use?

If you object about this part, I am fine to revert the change in
test.sh until there is a better facility to enforce syncs across tests
in the buildfarm, though.  I can hack something to centralize all that,
of course, but I am not sure when I'll be able to do so in the short
term.  Could I keep that in MSVC's vcregress.pl at least for the time
being?
--
Michael
Re: Unifying VACUUM VERBOSE and log_autovacuum_min_duration output
On Tue, Dec 21, 2021 at 2:39 AM Peter Geoghegan wrote:
>
> On Mon, Nov 29, 2021 at 6:51 PM Peter Geoghegan wrote:
> > Attached is a WIP patch doing this.
>
> This has bitrot, so I attach v2, mostly just to keep the CFTester
> status green. The only real change is one minor simplification to how
> we set everything up, inside heap_vacuum_rel().

I've looked at the patch and here are comments:

@@ -3076,16 +3021,12 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
 						LVRelState *vacrel)
 {
 	IndexVacuumInfo ivinfo;
-	PGRUsage	ru0;
 	LVSavedErrInfo saved_err_info;

-	pg_rusage_init(&ru0);
-
 	ivinfo.index = indrel;
 	ivinfo.analyze_only = false;
 	ivinfo.report_progress = false;
 	ivinfo.estimated_count = estimated_count;
-	ivinfo.message_level = elevel;

I think we should set message_level. Otherwise, index AM will set an
invalid log level, although any index AM in core seems not to use it.

---
-	/*
-	 * Update error traceback information.  This is the last phase during
-	 * which we add context information to errors, so we don't need to
-	 * revert to the previous phase.
-	 */

Why is this comment removed? ISTM this comment is still valid.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Re: Emit a warning if the extension's GUC is set incorrectly
On 2021-12-22 02:23, Tom Lane wrote:
> Kyotaro Horiguchi writes:
>> At Mon, 20 Dec 2021 21:05:23 +0900, Shinya Kato wrote in
>>> We should use EmitWarningsOnPlaceholders when we use
>>> DefineCustomXXXVariable. I don't think there is any room for debate.
>
>> Unfortunately, pltcl.c defines variables both in pltcl and pltclu but
>> the patch adds the warning only for pltclu.
>
> Right.  The rest of 0001 looks fine, so pushed with that correction.
>
> I concur that 0002 is unacceptable.  The result of it will be that a
> freshly-loaded extension will fail to hook into the system as it
> should, because it will fail to complete its initialization.  That
> will cause user frustration and bug reports.
>
> (BTW, I wonder if EmitWarningsOnPlaceholders should be using LOG
> rather than WARNING when it's running in the postmaster.  Use of
> WARNING is moderately likely to result in nothing getting printed at
> all.)
>
> 0003 looks to me like it was superseded by 75d22069e.  I do not
> particularly like that patch though; it seems both inefficient and
> ill-designed.  0003 is adding a check in an equally bizarre place.
> Why isn't add_placeholder_variable the place to deal with this?
> Also, once we know that a prefix is reserved, why don't we throw an
> error not just a warning for attempts to set some unknown variable
> under that prefix?  We don't allow you to set unknown non-prefixed
> variables.
>
> 			regards, tom lane

Thank you for pushing!
Thank you all for the reviews, I think I will take down 0002 and 0003.

--
Regards,

--
Shinya Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Re: pg_upgrade should truncate/remove its logs before running
On Mon, Dec 20, 2021 at 09:39:26PM -0600, Justin Pryzby wrote:
> On Mon, Dec 20, 2021 at 08:21:51PM +0900, Michael Paquier wrote:
>> we could choose something simpler for the default, like
>> "pg_upgrade_log".  I don't have a good history in naming new things,
>> though :)
>
> I specifically called it .d to made it obvious that it's a dir -
> nearly everything that ends in "log" is a file, so people are likely
> to run "rm" and "less" on it - including myself.

Okay.

>> .gitignore should be updated, I guess?
>
> Are you suggesting to remove these ?
> -/pg_upgrade_internal.log
> -/loadable_libraries.txt

Yep, it looks so as these are part of the logs, the second one being a
failure state.

> -/reindex_hash.sql

But this one is not, no?

>> Besides, this patch has no documentation.
>
> TBH I'm not even sure if the dir needs to be configurable ?

I'd think it is better to have some control on that.  Not sure what
the opinion of others is on this specific point, though.
--
Michael
Re: row filtering for logical replication
On Tue, Dec 21, 2021 at 9:03 PM vignesh C wrote:
> ...
> Few comments:
> 1) list_free(schemarelids) is called inside if and outside if in
> break case. Can we move it above continue so that it does not get
> called in the break case:
> +			schemarelids = GetAllSchemaPublicationRelations(pub->oid,
> +															pub->pubviaroot ?
> +															PUBLICATION_PART_ROOT :
> +															PUBLICATION_PART_LEAF);
> +			if (list_member_oid(schemarelids, entry->relid))
> +			{
> +				if (pub->pubactions.pubinsert)
> +					no_filter[idx_ins] = true;
> +				if (pub->pubactions.pubupdate)
> +					no_filter[idx_upd] = true;
> +				if (pub->pubactions.pubdelete)
> +					no_filter[idx_del] = true;
> +
> +				list_free(schemarelids);
> +
> +				/* Quick exit loop if all pubactions have no row-filter. */
> +				if (no_filter[idx_ins] && no_filter[idx_upd] && no_filter[idx_del])
> +					break;
> +
> +				continue;
> +			}
> +			list_free(schemarelids);

I think this review comment is mistaken. The break will break from the
loop, so the free is still needed. So I skipped this comment. If you
still think there is a problem please give a more detailed explanation
about it.

> 2) create_edata_for_relation also is using similar logic, can it also
> call this function to reduce duplicate code
> +static EState *
> +create_estate_for_relation(Relation rel)
> +{
> +	EState	   *estate;
> +	RangeTblEntry *rte;
> +
> +	estate = CreateExecutorState();
> +
> +	rte = makeNode(RangeTblEntry);
> +	rte->rtekind = RTE_RELATION;
> +	rte->relid = RelationGetRelid(rel);
> +	rte->relkind = rel->rd_rel->relkind;
> +	rte->rellockmode = AccessShareLock;
> +	ExecInitRangeTable(estate, list_make1(rte));
> +
> +	estate->es_output_cid = GetCurrentCommandId(false);
> +
> +	return estate;
> +}

Yes, that other code looks similar, but I am not sure it is worth
rearranging things for the sake of trying to make use of only 5 or 6
common LOC.

Anyway, I felt this review comment is not really related to the RF
patch; it seems more like a potential idea for a future patch to use
some common code *after* the RF code is committed. So I skipped this
comment.

> 3) In one place select is in lower case, it can be changed to upper case
> +		resetStringInfo(&cmd);
> +		appendStringInfo(&cmd,
> +						 "SELECT DISTINCT pg_get_expr(prqual, prrelid) "
> +						 "  FROM pg_publication p "
> +						 "  INNER JOIN pg_publication_rel pr ON (p.oid = pr.prpubid) "
> +						 " WHERE pr.prrelid = %u AND p.pubname IN ( %s ) "
> +						 " AND NOT (select bool_or(puballtables) "
> +						 "  FROM pg_publication "
> +						 "  WHERE pubname in ( %s )) "
> +						 " AND (SELECT count(1)=0 "
> +						 "  FROM pg_publication_namespace pn, pg_class c "
> +						 "  WHERE c.oid = %u AND c.relnamespace = pn.pnnspid)",
> +						 lrel->remoteid,
> +						 pub_names.data,
> +						 pub_names.data,
> +						 lrel->remoteid);

Fixed in v52-0001 [1]

> 5) Should we mention user should take care of deletion of row filter
> records after table sync is done.
> +    ALL TABLES or FOR ALL TABLES IN SCHEMA publication,
> +    then all other WHERE clauses (for the same publish operation)
> +    become redundant.
> +    If the subscriber is a PostgreSQL version before 15
> +    then any row filtering is ignored during the initial data
> +    synchronization phase.

Fixed in v52-0001 [1]

> 6) Should this be an Assert, since this will be taken care earlier by
> GetTransformedWhereClause->transformWhereClause->coerce_to_boolean:
> +	expr = (Expr *) coerce_to_target_type(NULL, rfnode, exprtype,
> +										  BOOLOID, -1, COERCION_ASSIGNMENT,
> +										  COERCE_IMPLICIT_CAST, -1);
> +
> +	if (expr == NULL)
> +		ereport(ERROR,
> +				(errcode(ERRCODE_CANNOT_COERCE),
Re: Confused comment about drop replica identity index
On Mon, Dec 20, 2021 at 11:57:32AM -0300, Euler Taveira wrote:
> On Mon, Dec 20, 2021, at 8:11 AM, Michael Paquier wrote:
>> That's mostly fine.  I have made some adjustments as per the
>> attached.
>
> Your patch looks good to me.

Thanks.  I have done this part for now.
--
Michael
Re: In-placre persistance change of a relation
Hello, Jakub.

At Tue, 21 Dec 2021 13:07:28 +0000, Jakub Wartak wrote in
> So what's suspicious is that 122880 -> 0 file size truncation. I've
> investigated WAL and it seems to contain TRUNCATE records after
> logged FPI images, so when the crash recovery would kick in it
> probably clears this table (while it shouldn't).

Darn.. It is too silly that I wrongly issued truncate records for the
target relation of the function (rel) instead of the relation on which
we're currently operating at that time (r).

> However if I perform CHECKPOINT just before crash the WAL stream
> contains just RUNNING_XACTS and CHECKPOINT_ONLINE redo records, this
> probably prevents truncating. I'm newbie here so please take this
> theory with grain of salt, it can be something completely different.

It is because the WAL records are inconsistent with the on-disk state.
After a crash before a checkpoint after the SET LOGGED, recovery ends
with recovering the broken WAL records, but after that the on-disk
state is persisted and the broken WAL records are not replayed.

The following fix works.

--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5478,7 +5478,7 @@ RelationChangePersistence(AlteredTableInfo *tab, char persistence,
 				xl_smgr_truncate xlrec;

 				xlrec.blkno = 0;
-				xlrec.rnode = rel->rd_node;
+				xlrec.rnode = r->rd_node;
 				xlrec.flags = SMGR_TRUNCATE_ALL;

I made another change in this version.  Previously only btree among
all index AMs was processed in the in-place manner.  In this version
we do that for all AMs except GiST.  Maybe if gistGetFakeLSN behaved
the same way for permanent and unlogged indexes, we could skip index
rebuild in exchange of some extra WAL records emitted while it is
unlogged.

regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center

From 0cac0fade05322c1aa8b7ec020f8fe1f9e5fb50e Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Wed, 11 Nov 2020 21:51:11 +0900
Subject: [PATCH v11 1/2] In-place table persistence change

Even though ALTER TABLE SET LOGGED/UNLOGGED does not require data
rewriting, currently it runs heap rewrite which causes large amount of
file I/O.  This patch makes the command run without heap rewrite.
Addition to that, SET LOGGED while wal_level > minimal emits WAL using
XLOG_FPI instead of massive number of HEAP_INSERT's, which should be
smaller.

Also this allows for the cleanup of files left behind in the crash of
the transaction that created it.
---
 src/backend/access/rmgrdesc/smgrdesc.c |  52 +++
 src/backend/access/transam/README      |   8 +
 src/backend/access/transam/xact.c      |   7 +
 src/backend/access/transam/xlog.c      |  17 +
 src/backend/catalog/storage.c          | 545 -
 src/backend/commands/tablecmds.c       | 265 ++--
 src/backend/replication/basebackup.c   |   3 +-
 src/backend/storage/buffer/bufmgr.c    |  88 
 src/backend/storage/file/fd.c          |   4 +-
 src/backend/storage/file/reinit.c      | 344 +++-
 src/backend/storage/smgr/md.c          |  93 -
 src/backend/storage/smgr/smgr.c        |  32 ++
 src/backend/storage/sync/sync.c        |  20 +-
 src/bin/pg_rewind/parsexlog.c          |  24 ++
 src/common/relpath.c                   |  47 ++-
 src/include/catalog/storage.h          |   3 +
 src/include/catalog/storage_xlog.h     |  42 +-
 src/include/common/relpath.h           |   9 +-
 src/include/storage/bufmgr.h           |   2 +
 src/include/storage/fd.h               |   1 +
 src/include/storage/md.h               |   8 +-
 src/include/storage/reinit.h           |  10 +-
 src/include/storage/smgr.h             |  17 +
 23 files changed, 1459 insertions(+), 182 deletions(-)

diff --git a/src/backend/access/rmgrdesc/smgrdesc.c b/src/backend/access/rmgrdesc/smgrdesc.c
index 773d57..d251f22207 100644
--- a/src/backend/access/rmgrdesc/smgrdesc.c
+++ b/src/backend/access/rmgrdesc/smgrdesc.c
@@ -40,6 +40,49 @@ smgr_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec->blkno, xlrec->flags);
 		pfree(path);
 	}
+	else if (info == XLOG_SMGR_UNLINK)
+	{
+		xl_smgr_unlink *xlrec = (xl_smgr_unlink *) rec;
+		char	   *path = relpathperm(xlrec->rnode, xlrec->forkNum);
+
+		appendStringInfoString(buf, path);
+		pfree(path);
+	}
+	else if (info == XLOG_SMGR_MARK)
+	{
+		xl_smgr_mark *xlrec = (xl_smgr_mark *) rec;
+		char	   *path = GetRelationPath(xlrec->rnode.dbNode,
+										   xlrec->rnode.spcNode,
+										   xlrec->rnode.relNode,
+										   InvalidBackendId,
+										   xlrec->forkNum, xlrec->mark);
+		char	   *action;
+
+		switch (xlrec->action)
+		{
+			case XLOG_SMGR_MARK_CREATE:
+				action = "CREATE";
+				break;
+			case XLOG_SMGR_MARK_UNLINK:
+				action = "DELETE";
+				break;
+			default:
+				action = "<unknown>";
+				break;
+		}
+
+		appendStringInfo(buf, "%s %s", action, path);
+		pfree(path);
+	}
+	else if (info ==
Re: Unifying VACUUM VERBOSE and log_autovacuum_min_duration output
I haven't read the patch yet. But some thoughts based on the posted
output:

1) At first I was quite skeptical about losing the progress reporting.
I've often found it quite useful. But looking at the examples I'm
convinced. Or rather I think a better way to look at it is that the
progress output for the operator should be separated from the metrics
logged.

As an operator what I want to see is some progress indicator:
"starting table scan", "overflow at x% of table scanned, starting
index scan", "processing index 1", "index 2"... so I can have some
idea of how much longer the vacuum will take and see whether I need to
raise maintenance_work_mem and by how much. I don't need to see all
the metrics while it's running.

2) I don't much like the format. I want to be able to parse the output
with awk or mtail or even just grep for relevant lines. Things like
"index scan not needed" make it hard to parse since you don't know
what it will look like if they are needed. I would have expected
something like "index scans: 0" which is actually already there up
above. I'm not clear how this line is meant to be read. Is it just
explaining *why* the index scan was skipped? It would just be missing
entirely if it wasn't skipped?

Fwiw, having it be parsable is why I wouldn't want it to be multiple
ereports. That would mean it could get interleaved with other errors
from other backends. That would be a disaster.
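The parseability point above can be illustrated with a small sketch. The sample log lines below are paraphrased from the "key: value" style discussed in the thread (they are illustrative, not the patch's exact output): a fixed `key: value` convention lets one generic pattern extract every metric, whereas free-form English like "index scan not needed" needs a new pattern per phrasing.

```python
import re

# Hypothetical vacuum log excerpt in the fixed "key: value" style
# discussed above (illustrative lines, not actual patch output).
log = """\
automatic vacuum of table "postgres.public.pgbench_accounts":
index scans: 0
pages: 0 removed, 181818 remain
tuples: 0 removed, 10000000 remain
"""

# One generic pattern recovers every metric, regardless of which
# keys happen to appear in a given run.
metrics = dict(re.findall(r"^([a-z ]+): (.+)$", log, re.M))
print(metrics["index scans"])  # the awk/grep-friendly form Greg asks for
```

A free-form line such as "index scan not needed" would simply never match this pattern, which is exactly the operational problem described.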
Re: postgres_fdw: incomplete subabort cleanup of connections used in async execution
Hi Alexander,

On Wed, Dec 22, 2021 at 1:08 AM Alexander Pyhalov wrote:
> Etsuro Fujita wrote 2021-12-19 13:25:
>> While working on [1], I noticed $SUBJECT: postgres_fdw resets the
>> per-connection states of connections, which store async requests
>> sent to remote servers in async_capable mode, during post-abort
>> (pgfdw_xact_callback()), but it fails to do so during post-subabort
>> (pgfdw_subxact_callback()).  This causes a crash when re-executing a
>> query that was aborted in a subtransaction:
>
> Looks good to me.

Great!  Thanks for reviewing!

Best regards,
Etsuro Fujita
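The bug described above is a symmetry problem: cleanup that runs at top-level abort must also run at subtransaction abort, or stale async state survives a rolled-back SAVEPOINT. A toy Python model of that callback symmetry (hypothetical names; this is not postgres_fdw's actual code) looks like:

```python
# Toy model of a connection cache whose per-connection async state must
# be cleared by BOTH abort callbacks, mirroring the roles of
# pgfdw_xact_callback() and pgfdw_subxact_callback() described above.
class Connection:
    def __init__(self):
        self.pending_async_request = None


class ConnectionCache:
    def __init__(self):
        self.conns = []

    def on_xact_abort(self):
        # Top-level transaction abort: reset every connection's state.
        for c in self.conns:
            c.pending_async_request = None

    def on_subxact_abort(self):
        # Subtransaction abort: the same reset is required -- this was
        # the missing piece that the fix adds.
        for c in self.conns:
            c.pending_async_request = None


cache = ConnectionCache()
conn = Connection()
cache.conns.append(conn)

conn.pending_async_request = "fetch batch 1"  # sent inside a SAVEPOINT
cache.on_subxact_abort()                      # SAVEPOINT rolled back
```

Without the `on_subxact_abort` reset, a later query would find leftover state from the aborted subtransaction, which is the crash scenario quoted above.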
Re: sequences vs. synchronous replication
On 2021/12/22 10:57, Tomas Vondra wrote:
> On 12/19/21 04:03, Amit Kapila wrote:
>> On Sat, Dec 18, 2021 at 7:24 AM Tomas Vondra wrote:
>>> while working on logical decoding of sequences, I ran into an issue
>>> with nextval() in a transaction that rolls back, described in [1].
>>> But after thinking about it a bit more (and chatting with Petr
>>> Jelinek), I think this issue affects physical sync replication too.
>>>
>>> Imagine you have a primary <-> sync_replica cluster, and you do this:
>>>
>>>   CREATE SEQUENCE s;
>>>
>>>   -- shutdown the sync replica
>>>
>>>   BEGIN;
>>>   SELECT nextval('s') FROM generate_series(1,50);
>>>   ROLLBACK;
>>>
>>>   BEGIN;
>>>   SELECT nextval('s');
>>>   COMMIT;
>>>
>>> The natural expectation would be the COMMIT gets stuck, waiting for
>>> the sync replica (which is not running), right? But it does not.
>>
>> How about if we always WAL log the first sequence change in a
>> transaction?
>
> I've been thinking about doing something like this, but I think it
> would not have any significant advantages compared to using
> "SEQ_LOG_VALS 0".  It would still have the same performance hit for
> plain nextval() calls, and there's no measurable impact on simple
> workloads that already write WAL in transactions even with
> SEQ_LOG_VALS 0.

Just idea; if wal_level > minimal, how about making nextval_internal()
(1) check whether WAL is replicated to sync standbys, up to the page
lsn of the sequence, and (2) forcibly emit a WAL record if not
replicated yet?  The similar check is performed at the beginning of
SyncRepWaitForLSN(), so probably we can reuse that code.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
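The decision rule sketched in the idea above reduces to a small predicate. This is a hedged Python sketch with assumed names (it is not PostgreSQL code and not SyncRepWaitForLSN's actual logic): emit WAL when the cached range is exhausted, as today, and additionally when the sequence page's LSN has not yet reached the sync standby.

```python
# Toy sketch of the proposed nextval_internal() decision (assumed
# names, not PostgreSQL source): log a WAL record either when the
# pre-logged range is used up, or when WAL up to the sequence page's
# LSN has not been confirmed by the sync standby.
def nextval_needs_wal(page_lsn, sync_flush_lsn, log_cnt):
    """Return True if this nextval() call should emit a WAL record."""
    if log_cnt == 0:
        return True                    # cached range exhausted: log anyway
    return page_lsn > sync_flush_lsn   # page's WAL not yet replicated
```

Under this rule, the ROLLBACK-then-COMMIT scenario quoted above would force a record in the second transaction (the page LSN is ahead of the stopped standby), giving the COMMIT something to wait for.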
Re: row filtering for logical replication
On Tue, Dec 21, 2021 at 2:29 PM Peter Smith wrote:
>
> Here is the v51* patch set:

I tweaked the query slightly based on Euler's changes; the explain
analyze of the updated query based on Euler's suggestions, the
existing query, and Euler's query is given below:

1) updated query based on Euler's suggestion:

explain analyze SELECT DISTINCT pg_get_expr(prqual, prrelid)
  FROM pg_publication p
  INNER JOIN pg_publication_rel pr ON (p.oid = pr.prpubid)
  WHERE pr.prrelid = 16384 AND p.pubname IN ( 'pub1' )
    AND NOT (select bool_or(puballtables) FROM pg_publication
             WHERE pubname in ( 'pub1' ))
    AND NOT EXISTS (SELECT 1 FROM pg_publication_namespace pn, pg_class c
                    WHERE c.oid = 16384 AND c.relnamespace = pn.pnnspid);

                              QUERY PLAN
 -----------------------------------------------------------------------
 Unique  (cost=14.68..14.69 rows=1 width=32) (actual time=0.121..0.126 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Aggregate  (cost=1.96..1.97 rows=1 width=1) (actual time=0.025..0.026 rows=1 loops=1)
           ->  Seq Scan on pg_publication  (cost=0.00..1.96 rows=1 width=1) (actual time=0.016..0.017 rows=1 loops=1)
                 Filter: (pubname = 'pub1'::name)
   InitPlan 2 (returns $1)
     ->  Nested Loop  (cost=0.27..8.30 rows=1 width=0) (actual time=0.002..0.003 rows=0 loops=1)
           Join Filter: (pn.pnnspid = c.relnamespace)
           ->  Seq Scan on pg_publication_namespace pn  (cost=0.00..0.00 rows=1 width=4) (actual time=0.001..0.002 rows=0 loops=1)
           ->  Index Scan using pg_class_oid_index on pg_class c  (cost=0.27..8.29 rows=1 width=4) (never executed)
                 Index Cond: (oid = '16384'::oid)
   ->  Sort  (cost=4.40..4.41 rows=1 width=32) (actual time=0.119..0.121 rows=1 loops=1)
         Sort Key: (pg_get_expr(pr.prqual, pr.prrelid)) COLLATE "C"
         Sort Method: quicksort  Memory: 25kB
         ->  Result  (cost=0.00..4.39 rows=1 width=32) (actual time=0.094..0.098 rows=1 loops=1)
               One-Time Filter: ((NOT $0) AND (NOT $1))
               ->  Nested Loop  (cost=0.00..4.39 rows=1 width=36) (actual time=0.013..0.015 rows=1 loops=1)
                     Join Filter: (p.oid = pr.prpubid)
                     ->  Seq Scan on pg_publication p  (cost=0.00..1.96 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=1)
                           Filter: (pubname = 'pub1'::name)
                     ->  Seq Scan on pg_publication_rel pr  (cost=0.00..2.41 rows=1 width=40) (actual time=0.005..0.005 rows=1 loops=1)
                           Filter: (prrelid = '16384'::oid)
 Planning Time: 1.014 ms
 Execution Time: 0.259 ms
(24 rows)

2) Existing query:

postgres=# explain analyze SELECT DISTINCT pg_get_expr(prqual, prrelid)
  FROM pg_publication p
  INNER JOIN pg_publication_rel pr ON (p.oid = pr.prpubid)
  WHERE pr.prrelid = 16384 AND p.pubname IN ( 'pub1' )
    AND NOT (select bool_or(puballtables) FROM pg_publication
             WHERE pubname in ( 'pub1' ))
    AND (SELECT count(1)=0 FROM pg_publication_namespace pn, pg_class c
         WHERE c.oid = 16384 AND c.relnamespace = pn.pnnspid);

                              QUERY PLAN
 -----------------------------------------------------------------------
 Unique  (cost=14.69..14.70 rows=1 width=32) (actual time=0.162..0.166 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Aggregate  (cost=1.96..1.97 rows=1 width=1) (actual time=0.023..0.025 rows=1 loops=1)
           ->  Seq Scan on pg_publication  (cost=0.00..1.96 rows=1 width=1) (actual time=0.014..0.016 rows=1 loops=1)
                 Filter: (pubname = 'pub1'::name)
   InitPlan 2 (returns $1)
     ->  Aggregate  (cost=8.30..8.32 rows=1 width=1) (actual time=0.044..0.045 rows=1 loops=1)
           ->  Nested Loop  (cost=0.27..8.30 rows=1 width=0) (actual time=0.028..0.029 rows=0 loops=1)
                 Join Filter: (pn.pnnspid = c.relnamespace)
                 ->  Seq Scan on pg_publication_namespace pn  (cost=0.00..0.00 rows=1 width=4) (actual time=0.004..0.004 rows=0 loops=1)
                 ->  Index Scan using pg_class_oid_index on pg_class c  (cost=0.27..8.29 rows=1 width=4) (never executed)
                       Index Cond: (oid = '16384'::oid)
   ->  Sort  (cost=4.40..4.41 rows=1 width=32) (actual time=0.159..0.161 rows=1 loops=1)
         Sort Key: (pg_get_expr(pr.prqual, pr.prrelid)) COLLATE "C"
         Sort Method: quicksort  Memory: 25kB
         ->  Result  (cost=0.00..4.39 rows=1 width=32) (actual time=0.142..0.147 rows=1 loops=1)
               One-Time Filter: ((NOT $0) AND $1)
               ->  Nested Loop  (cost=0.00..4.39 rows=1 width=36) (actual time=0.016..0.018 rows=1 loops=1)
                     Join Filter: (p.oid = pr.prpubid)
                     ->  Seq Scan on pg_publication p  (cost=0.00..1.96
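The two formulations being compared differ only in how they test "this relation's schema is not published": `NOT EXISTS (...)` versus `(SELECT count(1)=0 ...)`. They are semantically equivalent, though `EXISTS` can stop at the first matching row while `count` must scan them all. A small stand-in demonstration on SQLite (toy tables with assumed names, not the pg_catalog query above):

```python
import sqlite3

# Toy stand-ins for pg_class / pg_publication_namespace; names and OIDs
# are illustrative only.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE klass(oid INTEGER, relnamespace INTEGER);
    CREATE TABLE pubns(pnnspid INTEGER);
    INSERT INTO klass VALUES (16384, 2200);  -- our table, schema 2200
    INSERT INTO pubns VALUES (11);           -- a different schema is published
""")

q_exists = """SELECT NOT EXISTS (SELECT 1 FROM pubns pn, klass c
                                 WHERE c.oid = 16384
                                   AND c.relnamespace = pn.pnnspid)"""
q_count = """SELECT (SELECT count(*) = 0 FROM pubns pn, klass c
                     WHERE c.oid = 16384
                       AND c.relnamespace = pn.pnnspid)"""

r_exists = db.execute(q_exists).fetchone()[0]
r_count = db.execute(q_count).fetchone()[0]
```

Both forms answer "true" here (the schema is not published), matching the identical row counts in the EXPLAIN output above; the plan shape, not the result, is what differs.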
Re: parallel vacuum comments
On Tue, Dec 21, 2021 at 10:24 PM Amit Kapila wrote:
>
> On Tue, Dec 21, 2021 at 11:24 AM Masahiko Sawada wrote:
> >
> > On Tue, Dec 21, 2021 at 2:04 PM Amit Kapila wrote:
> > >
> > Thank you for the comment. Agreed.
> >
> > I've attached updated version patches. Please review them.
>
> These look mostly good to me. Please find attached the slightly edited
> version of the 0001 patch. I have modified comments, ran pgindent, and
> modify the commit message. I'll commit this tomorrow if you are fine
> with it.

Thank you for committing the first patch. I've attached an updated
version second patch. Please review it.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

v11-0001-Move-parallel-vacuum-code-to-vacuumparallel.c.patch
Description: Binary data
Re: do only critical work during single-user vacuum?
On Wed, Dec 22, 2021 at 6:56 AM Peter Geoghegan wrote:
>
> On Tue, Dec 21, 2021 at 1:31 PM John Naylor wrote:
> > On second thought, we don't really need another number here. We could
> > simply go by the existing failsafe parameter, and if the admin wants a
> > different value, it's already possible to specify
> > vacuum_failsafe_age/vacuum_multixact_failsafe_age in a session,
> > including in single-user mode. Perhaps a new boolean called
> > FAILSAFE_ONLY. If no table is specified, then when generating the list
> > of tables, include only those with relfrozenxid/relminmxid greater
> > than their failsafe thresholds.
>
> That's equivalent to the quick and dirty patch I wrote (assuming that
> the user actually uses this new FAILSAFE_ONLY thing).
>
> But if we're going to add a new option to the VACUUM command (or
> something of similar scope), then we might as well add a new behavior
> that is reasonably exact -- something that (say) only *starts* a
> VACUUM for those tables whose relfrozenxid age currently exceeds half
> the autovacuum_freeze_max_age for the table (usually taken from the
> GUC, sometimes taken from the reloption), which also forces the
> failsafe. And with similar handling for
> relminmxid/autovacuum_multixact_freeze_max_age.
>
> In other words, while triggering the failsafe is important, simply
> *not starting* VACUUM for relations where there is really no need for
> it is at least as important. We shouldn't even think about pruning or
> freezing with these tables. (ISTM that the only thing that might be a
> bit controversial about any of this is my definition of "safe", which
> seems like about the right trade-off to me.)

+1

> This new command/facility should probably not be a new flag to the
> VACUUM command, as such. Rather, I think that it should either be an
> SQL-callable function, or a dedicated top-level command (that doesn't
> accept any tables). The only reason to have this is for scenarios
> where the user is already in a tough spot with wraparound failure,
> like that client of yours. Nobody wants to force the failsafe for one
> specific table. It's not general purpose, at all, and shouldn't claim
> to be.

Even outside the situation where the database has to run in
single-user mode to freeze tuples, I think there would be some use
cases where users want to run vacuum (in failsafe mode) on tables with
relfrozenxid/relminmxid greater than their failsafe thresholds before
falling into that situation. I think it's common that users are
somehow monitoring relfrozenxid/relminmxid and want to manually run
vacuum on such tables rather than relying on autovacuum. The
--min-xid-age and --min-mxid-age options of the vacuumdb command would
be good examples. So I think this new command/facility might not
necessarily need to be specific to single-user mode.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
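The selection rule proposed above (start VACUUM only for tables whose relfrozenxid age already exceeds half of the applicable autovacuum_freeze_max_age, reloption first, GUC otherwise) can be sketched in a few lines. This is a hypothetical illustration with assumed names and made-up ages, not PostgreSQL's catalog API:

```python
# Sketch of the "reasonably exact" table selection discussed above.
# 200 million is the autovacuum_freeze_max_age GUC default; the table
# names, ages, and reloptions below are illustrative only.
AUTOVACUUM_FREEZE_MAX_AGE = 200_000_000


def tables_needing_emergency_vacuum(tables,
                                    freeze_max_age=AUTOVACUUM_FREEZE_MAX_AGE):
    """tables: iterable of (name, relfrozenxid_age, reloption_or_None)."""
    picked = []
    for name, age, reloption in tables:
        # Per-table reloption wins over the GUC; trigger at half the limit.
        limit = (reloption or freeze_max_age) // 2
        if age > limit:
            picked.append(name)
    return picked


tables = [
    ("small_config", 5_000_000, None),         # nowhere near wraparound
    ("big_history", 150_000_000, None),        # past half the GUC default
    ("tuned_table", 90_000_000, 100_000_000),  # past half its reloption
]
```

The point of the rule is visible in the data: only the two at-risk tables are picked, and `small_config` is never vacuumed at all, which is the "simply not starting VACUUM where there is no need" half of the proposal.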
Re: sequences vs. synchronous replication
On 12/19/21 04:03, Amit Kapila wrote:
> On Sat, Dec 18, 2021 at 7:24 AM Tomas Vondra wrote:
>> while working on logical decoding of sequences, I ran into an issue
>> with nextval() in a transaction that rolls back, described in [1].
>> But after thinking about it a bit more (and chatting with Petr
>> Jelinek), I think this issue affects physical sync replication too.
>>
>> Imagine you have a primary <-> sync_replica cluster, and you do this:
>>
>>   CREATE SEQUENCE s;
>>
>>   -- shutdown the sync replica
>>
>>   BEGIN;
>>   SELECT nextval('s') FROM generate_series(1,50);
>>   ROLLBACK;
>>
>>   BEGIN;
>>   SELECT nextval('s');
>>   COMMIT;
>>
>> The natural expectation would be the COMMIT gets stuck, waiting for
>> the sync replica (which is not running), right? But it does not.
>
> How about if we always WAL log the first sequence change in a
> transaction?

I've been thinking about doing something like this, but I think it
would not have any significant advantages compared to using
"SEQ_LOG_VALS 0".  It would still have the same performance hit for
plain nextval() calls, and there's no measurable impact on simple
workloads that already write WAL in transactions even with
SEQ_LOG_VALS 0.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
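The surprising non-wait comes from how sequences pre-log values: one WAL record covers roughly the next SEQ_LOG_VALS (32) nextval() calls, so a later nextval() in a different transaction may emit no WAL at all, and a synchronous-replication commit then has no sequence WAL to wait for. A toy Python model of that behaviour (simplified and approximate; not PostgreSQL source):

```python
SEQ_LOG_VALS = 32


class ToySequence:
    """Toy model of nextval()'s log-ahead WAL behaviour."""

    def __init__(self):
        self.value = 0
        self.log_cnt = 0  # calls still covered by the last WAL record
        self.wal = []     # WAL records emitted so far

    def nextval(self):
        self.value += 1
        if self.log_cnt == 0:
            # Log ahead: one record covers the next SEQ_LOG_VALS calls.
            self.wal.append(("SEQ_LOG", self.value + SEQ_LOG_VALS))
            self.log_cnt = SEQ_LOG_VALS
        self.log_cnt -= 1
        return self.value


seq = ToySequence()

# Transaction 1: several nextval() calls, then ROLLBACK.  The WAL
# record stays in the stream; the sequence advance is not undone.
for _ in range(5):
    seq.nextval()
wal_after_rollback = len(seq.wal)

# Transaction 2: one nextval() that COMMITs -- yet it emits no new WAL,
# so a sync-replication commit has nothing sequence-related to wait on.
seq.nextval()
```

This is the scenario in the quoted reproducer: the only sequence WAL was written by the rolled-back transaction, so the committing transaction's wait-for-standby never covers the value it handed out.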
Re: Checkpointer crashes with "PANIC: could not fsync file "pg_tblspc/.."
On Wed, 22 Dec 2021 at 12:28 AM, Ashutosh Sharma wrote:
>
> Is it okay to share the same tablespace (infact relfile) between the
> primary and standby server? Perhaps NO.

Oops, yeah absolutely they can never share the tablespace path. So we
can ignore this issue.

— Dilip

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: CREATEROLE and role ownership hierarchies
> On Dec 21, 2021, at 5:11 PM, Shinya Kato wrote:
>
> I fixed the patches because they cannot be applied to HEAD.

Thank you.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements
On 03.12.2021 19:55, Andrei Zubkov wrote:
> On Fri, 2021-12-03 at 17:03 +0300, Andrei Zubkov wrote:
> ...
> What if we'll create a new view for such resettable fields? It will
> make description of views and reset functions in the docs much more
> clear.

Hi, Andrey!

I completely agree that creating a separate view for these new fields
is the most correct thing to do.

As for variable names, the term "global" is already used for global
statistics, in particular in the struct pgssGlobalStats. The
considered timestamps refer to the per-statement level as pointed out
in the struct pgssEntry's comment. Therefore, I think it's better to
rename gstats_since to just stats_reset in the same way. Also it might
be better to use the term 'auxiliary' and use the same approach as for
existent similar vars. So internally it might look something like
this:

    double      aux_min_time[PGSS_NUMKIND];
    double      aux_max_time[PGSS_NUMKIND];
    TimestampTz aux_stats_reset;

And at the view level:

    aux_min_plan_time float8,
    aux_max_plan_time float8,
    aux_min_exec_time float8,
    aux_max_exec_time float8,
    aux_stats_reset timestamp with time zone

Function names might be pg_stat_statements_aux() and
pg_stat_statements_aux_reset(). The top-level application may find out
the period over which the aux extrema were collected by determining
which reset was closer, as follows:

    data_collect_period = aux_stats_reset > stats_reset ?
                          now - aux_stats_reset : now - stats_reset;

and decide not to trust this data if the period was too small.

For correct work, aux_stats_reset must be updated and the aux extrema
values must be reset simultaneously with the update of stats_reset, so
some synchronization is needed to avoid a race with a possible
external reset.

I've tested the patch v4 and didn't find any evident problems.
Contrib installcheck said:

    test pg_stat_statements           ... ok          163 ms
    test oldextversions               ... ok           46 ms

With best regards,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
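The period computation proposed above can be stated as a tiny function. The sketch below is illustrative (names mirror the suggested view columns; the timestamps are invented, and this is not the extension's code): whichever of the two reset timestamps is more recent bounds how long the auxiliary min/max values have been accumulating.

```python
from datetime import datetime, timedelta


# Sketch of the data_collect_period computation discussed above.
def aux_collect_period(now, stats_reset, aux_stats_reset):
    """How long the auxiliary min/max extrema have been accumulating."""
    if aux_stats_reset > stats_reset:
        return now - aux_stats_reset  # aux values reset more recently
    return now - stats_reset          # full reset was the more recent one


now = datetime(2021, 12, 22, 12, 0, 0)
full_reset = datetime(2021, 12, 1)             # last full stats reset
aux_reset = datetime(2021, 12, 22, 11, 59)     # aux reset a minute ago

period = aux_collect_period(now, full_reset, aux_reset)
```

With only one minute of accumulation, a monitoring application would, per the suggestion above, discount the aux extrema as not yet trustworthy.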
RE: parallel vacuum comments
On Wed, Dec 22, 2021 8:38 AM Masahiko Sawada wrote: > On Tue, Dec 21, 2021 at 10:24 PM Amit Kapila > wrote: > > > > On Tue, Dec 21, 2021 at 11:24 AM Masahiko Sawada > wrote: > > > > > > On Tue, Dec 21, 2021 at 2:04 PM Amit Kapila > wrote: > > > > > > > > > > Thank you for the comment. Agreed. > > > > > > I've attached updated version patches. Please review them. > > > > > > > These look mostly good to me. Please find attached the slightly edited > > version of the 0001 patch. I have modified comments, ran pgindent, and > > modify the commit message. I'll commit this tomorrow if you are fine > > with it. > > It looks good to me. +1, the patch passes check-world and looks good to me. Best regards, Hou zj
Re: Use extended statistics to estimate (Var op Var) clauses
> On Dec 21, 2021, at 4:28 PM, Mark Dilger wrote: > > Maybe there is some reason this is ok. ... and there is. Sorry for the noise. The planner appears to be smart enough to know that column "salary" is not being changed, and therefore NEW.salary and OLD.salary are equal. If I test a different update statement that contains a new value for "salary", the added assertion is not triggered. (I didn't quite realize what the clause's varnosyn field was telling me until after I hit "send".) — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: parallel vacuum comments
On Tue, Dec 21, 2021 at 10:24 PM Amit Kapila wrote: > > On Tue, Dec 21, 2021 at 11:24 AM Masahiko Sawada > wrote: > > > > On Tue, Dec 21, 2021 at 2:04 PM Amit Kapila wrote: > > > > > > > Thank you for the comment. Agreed. > > > > I've attached updated version patches. Please review them. > > > > These look mostly good to me. Please find attached the slightly edited > version of the 0001 patch. I have modified comments, ran pgindent, and > modify the commit message. I'll commit this tomorrow if you are fine > with it. Thank you for the patch! It looks good to me. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Use extended statistics to estimate (Var op Var) clauses
> On Dec 12, 2021, at 6:21 PM, Tomas Vondra wrote:
>
> <0001-Improve-estimates-for-Var-op-Var-with-the-same-Var.patch>

    + * It it's (variable = variable) with the same variable on both sides, it's

s/It it's/If it's/

0001 lacks regression coverage.

> <0002-simplification.patch>

Changing comments introduced by patch 0001 in patch 0002 just creates git churn:

    - * estimate the selectivity as 1.0 (or 0.0 if it's negated).
    + * estimate the selectivity as 1.0 (or 0.0 when it's negated).

and:

      * matching_restriction_variable
    - *      Examine the args of a restriction clause to see if it's of the
    - *      form (variable op variable) with the same variable on both sides.
    + *      Check if the two arguments of a restriction clause refer to the same
    + *      variable, i.e. if the condition is of the form (variable op variable).
    + *      We can deduce selectivity for such (in)equality clauses.

0002 also lacks regression coverage.

> <0003-relax-the-restrictions.patch>

0003 also lacks regression coverage.

> <0004-main-patch.patch>

Ok.

> <0005-Don-t-treat-Var-op-Var-as-simple-clauses.patch>

0005 again lacks regression coverage.

There might be a problem in how selectivity thinks about comparison between identical columns from the NEW and OLD pseudotables. To show this, add an Assert to see where matching_restriction_variables() might return true:

    --- a/src/backend/utils/adt/selfuncs.c
    +++ b/src/backend/utils/adt/selfuncs.c
    @@ -4952,6 +4952,8 @@ matching_restriction_variables(PlannerInfo *root, List *args, int varRelid)
             return false;

         /* The two variables need to match */
    +    Assert(!equal(left, right));
    +
         return equal(left, right);

This results in the regression tests failing on "update rtest_emp set ename = 'wiecx' where ename = 'wiecc';".

It may seem counterintuitive that matching_restriction_variables() would return true for a where-clause with only one occurrence of variable "ename", until you read the rule defined in rules.sql:

    create rule rtest_emp_upd as on update to rtest_emp
        where new.salary != old.salary
        do insert into rtest_emplog values (new.ename, current_user, 'honored', new.salary, old.salary);

I think what's really happening here is that "new.salary != old.salary" is being processed by matching_restriction_variables() and returning that the new.salary refers to the same thing that old.salary refers to. Here is the full stack trace, for reference:

    (lldb) bt
    * thread #1, stop reason = signal SIGSTOP
        frame #0: 0x7fff6d8fd33a libsystem_kernel.dylib`__pthread_kill + 10
        frame #1: 0x7fff6d9b9e60 libsystem_pthread.dylib`pthread_kill + 430
        frame #2: 0x7fff6d884808 libsystem_c.dylib`abort + 120
        frame #3: 0x0001048a6c31 postgres`ExceptionalCondition(conditionName="!equal(left, right)", errorType="FailedAssertion", fileName="selfuncs.c", lineNumber=4955) at assert.c:69:2
        frame #4: 0x00010481e733 postgres`matching_restriction_variables(root=0x7fe65e02b2d0, args=0x7fe65e02bf38, varRelid=0) at selfuncs.c:4955:2
        frame #5: 0x00010480f63c postgres`eqsel_internal(fcinfo=0x7ffeebb9aeb8, negate=true) at selfuncs.c:265:6
        frame #6: 0x00010481040a postgres`neqsel(fcinfo=0x7ffeebb9aeb8) at selfuncs.c:565:2
        frame #7: 0x0001048b420c postgres`FunctionCall4Coll(flinfo=0x7ffeebb9af38, collation=0, arg1=140627396440784, arg2=901, arg3=140627396443960, arg4=0) at fmgr.c:1212:11
        frame #8: 0x0001048b50e6 postgres`OidFunctionCall4Coll(functionId=102, collation=0, arg1=140627396440784, arg2=901, arg3=140627396443960, arg4=0) at fmgr.c:1448:9
      * frame #9: 0x000104557e4d postgres`restriction_selectivity(root=0x7fe65e02b2d0, operatorid=901, args=0x7fe65e02bf38, inputcollid=0, varRelid=0) at plancat.c:1828:26
        frame #10: 0x0001044d4c76 postgres`clause_selectivity_ext(root=0x7fe65e02b2d0, clause=0x7fe65e02bfe8, varRelid=0, jointype=JOIN_INNER, sjinfo=0x, use_extended_stats=true) at clausesel.c:902:10
        frame #11: 0x0001044d4186 postgres`clauselist_selectivity_ext(root=0x7fe65e02b2d0, clauses=0x7fe65e02cdd0, varRelid=0, jointype=JOIN_INNER, sjinfo=0x, use_extended_stats=true) at clausesel.c:185:8
        frame #12: 0x0001044d3f97 postgres`clauselist_selectivity(root=0x7fe65e02b2d0, clauses=0x7fe65e02cdd0, varRelid=0, jointype=JOIN_INNER, sjinfo=0x) at clausesel.c:108:9
        frame #13: 0x0001044de192 postgres`set_baserel_size_estimates(root=0x7fe65e02b2d0, rel=0x7fe65e01e640) at costsize.c:4931:3
        frame #14: 0x0001044d16be postgres`set_plain_rel_size(root=0x7fe65e02b2d0, rel=0x7fe65e01e640, rte=0x7fe65e02ac40) at allpaths.c:584:2
        frame #15: 0x0001044d0a79 postgres`set_rel_size(root=0x7fe65e02b2d0, rel=0x7fe65e01e640, rti=1, rte=0x7fe65e02ac40) at allpaths.c:413:6
        frame #16:
Re: row filtering for logical replication
On Tue, Dec 21, 2021 at 5:58 AM Euler Taveira wrote:
>
> On Mon, Dec 20, 2021, at 12:10 AM, houzj.f...@fujitsu.com wrote:
>
> Attach the V49 patch set, which addressed all the above comments on the 0002
> patch.
>
> I've been testing the latest versions of this patch set. I'm attaching a new
> patch set based on v49. The suggested fixes are in separate patches after the
> current one so it is easier to integrate them into the related patch. The
> majority of these changes explains some decision to improve readability IMO.
>
> row-filter x row filter. I'm not a native speaker but "row filter" is widely
> used in similar contexts so I suggest to use it. (I didn't adjust the commit
> messages)

Fixed in v51* [1]. And I also updated the commit comments.

> An ancient patch use the term coerce but it was changed to cast. Coercion
> implies an implicit conversion [1]. If you look at a few lines above you will
> see that this expression expects an implicit conversion.

Fixed in v51* [1]

> I modified the query to obtain the row filter expressions to (i) add the schema
> pg_catalog to some objects and (ii) use NOT EXISTS instead of subquery (it
> reads better IMO).

Not changed in v51, but IIUC this might be fixed soon if it is confirmed to be better. [2 Amit]

> A detail message requires you to capitalize the first word of sentences and
> includes a period at the end.

Fixed in v51* [1]

> It seems all server messages and documentation use the terminology "WHERE
> clause". Let's adopt it instead of "row filter".

Fixed in v51* [1]

> I reviewed 0003. It uses TupleTableSlot instead of HeapTuple. I probably missed
> the explanation but it requires more changes (logicalrep_write_tuple and 3 new
> entries into RelationSyncEntry). I replaced this patch with a slightly
> different one (0005 in this patch set) that uses HeapTuple instead. I didn't
> only simple tests and it requires tests. I noticed that this patch does not
> include a test to cover the case where TOASTed values are not included in the
> new tuple. We should probably add one.

Not changed in v51. See response from Ajin [3 Ajin].

> I agree with Amit that it is a good idea to merge 0001, 0002, and 0005. I would
> probably merge 0004 because it is just isolated changes.

Fixed in v51* [1] per Amit's suggestion (so the 0004 is still separate)

--
[1] https://www.postgresql.org/message-id/CAHut%2BPs%2BdACvefCZasRE%3DP%3DDtaNmQvM3kiGyKyBHANA0yGcTZw%40mail.gmail.com
[2 Amit] https://www.postgresql.org/message-id/CAA4eK1KwoA5k8v9z9e4ZPN_X%3D1GAmQmsWyauFwZpKiSHqy6eZA%40mail.gmail.com
[3 Ajin] https://www.postgresql.org/message-id/CAFPTHDbfpPNh3GLGjySS%3DAuRfbQPQFNvfiyG1GDQW2kz1yT7Og%40mail.gmail.com

Kind Regards, Peter Smith. Fujitsu Australia
Re: row filtering for logical replication
On Mon, Dec 20, 2021 at 9:30 PM Amit Kapila wrote:
>
> On Mon, Dec 20, 2021 at 8:41 AM houzj.f...@fujitsu.com wrote:
> >
> > Thanks for the comments, I agree with all the comments.
> > Attach the V49 patch set, which addressed all the above comments on the 0002
> > patch.
> >
>
> Few comments/suugestions:
> ==
> 1.
> + Oid publish_as_relid = InvalidOid;
> +
> + /*
> +  * For a partition, if pubviaroot is true, check if any of the
> +  * ancestors are published. If so, note down the topmost ancestor
> +  * that is published via this publication, the row filter
> +  * expression on which will be used to filter the partition's
> +  * changes. We could have got the topmost ancestor when collecting
> +  * the publication oids, but that will make the code more
> +  * complicated.
> +  */
> + if (pubform->pubviaroot && relation->rd_rel->relispartition)
> + {
> +     if (pubform->puballtables)
> +         publish_as_relid = llast_oid(ancestors);
> +     else
> +         publish_as_relid = GetTopMostAncestorInPublication(pubform->oid,
> +                                                            ancestors);
> + }
> +
> + if (publish_as_relid == InvalidOid)
> +     publish_as_relid = relid;
>
> I think you can initialize publish_as_relid as relid and then later
> override it if required. That will save the additional check of
> publish_as_relid.
>

Fixed in v51* [1]

> 2. I think your previous version code in GetRelationPublicationActions
> was better as now we have to call memcpy at two places.
>

Fixed in v51* [1]

> 3.
> +
> + if (list_member_oid(GetRelationPublications(ancestor),
> +                     puboid) ||
> +     list_member_oid(GetSchemaPublications(get_rel_namespace(ancestor)),
> +                     puboid))
> + {
> +     topmost_relid = ancestor;
> + }
>
> I think here we don't need to use braces ({}) as there is just a
> single statement in the condition.
>

Fixed in v51* [1]

> 4.
> +#define IDX_PUBACTION_n 3
> + ExprState *exprstate[IDX_PUBACTION_n]; /* ExprState array for row filter.
> +                                           One per publication action. */
> ..
> ..
>
> I think we can have this define outside the structure. I don't like
> this define name, can we name it NUM_ROWFILTER_TYPES or something like
> that?
>

Partly fixed in v51* [1], I've changed the #define name but I did not move it. The adjacent comment talks about these ExprState caches and explains the reason why the number is 3. So if I move the #define then half that comment would have to move with it. I thought it is better to keep all the related parts grouped together with the one explanatory comment, but if you still want the #define moved please confirm and I can do it in a future version.

> I think we can now merge 0001, 0002, and 0005. We are still evaluating
> the performance for 0003, so it is better to keep it separate. We can
> take the decision to merge it once we are done with our evaluation.
>

Merged as suggested in v51* [1]

--
[1] https://www.postgresql.org/message-id/CAHut%2BPs%2BdACvefCZasRE%3DP%3DDtaNmQvM3kiGyKyBHANA0yGcTZw%40mail.gmail.com

Kind Regards, Peter Smith. Fujitsu Australia
Re: do only critical work during single-user vacuum?
On Tue, Dec 21, 2021 at 5:56 PM Peter Geoghegan wrote: > > On Tue, Dec 21, 2021 at 1:31 PM John Naylor > wrote: > > On second thought, we don't really need another number here. We could > > simply go by the existing failsafe parameter, and if the admin wants a > > different value, it's already possible to specify > > vacuum_failsafe_age/vacuum_multixact_failsafe_age in a session, > > including in single-user mode. Perhaps a new boolean called > > FAILSAFE_ONLY. If no table is specified, then when generating the list > > of tables, include only those with relfrozenxid/relminmxid greater > > than their failsafe thresholds. > > That's equivalent to the quick and dirty patch I wrote (assuming that > the user actually uses this new FAILSAFE_ONLY thing). Equivalent but optional. > But if we're going to add a new option to the VACUUM command (or > something of similar scope), then we might as well add a new behavior > that is reasonably exact -- something that (say) only *starts* a > VACUUM for those tables whose relfrozenxid age currently exceeds half > the autovacuum_freeze_max_age for the table (usually taken from the > GUC, sometimes taken from the reloption), which also forces the > failsafe. And with similar handling for > relminmxid/autovacuum_multixact_freeze_max_age. > > In other words, while triggering the failsafe is important, simply *not > starting* VACUUM for relations where there is really no need for it is > at least as important. We shouldn't even think about pruning or > freezing with these tables. Right, not starting where not necessary is crucial for getting out of single-user mode as quickly as possible. We're in agreement there. > (ISTM that the only thing that might be a > bit controversial about any of this is my definition of "safe", which > seems like about the right trade-off to me.) It seems reasonable to want to start back up and not immediately have anti-wraparound vacuums kick in. 
On the other hand, it's not good to do work while unable to monitor progress, and while more WAL is piling up. I'm not sure where the right trade off is. > This new command/facility should probably not be a new flag to the > VACUUM command, as such. Rather, I think that it should either be an > SQL-callable function, or a dedicated top-level command (that doesn't > accept any tables). The only reason to have this is for scenarios > where the user is already in a tough spot with wraparound failure, > like that client of yours. Nobody wants to force the failsafe for one > specific table. It's not general purpose, at all, and shouldn't claim > to be. Makes sense, I'll have a think about what that would look like. -- John Naylor EDB: http://www.enterprisedb.com
Re: A test for replay of regression tests
Hi,

Rebased and updated based on feedback. Responses to multiple emails below:

On Thu, Dec 16, 2021 at 1:22 PM Michael Paquier wrote:
> On Wed, Dec 15, 2021 at 05:50:45PM +0900, Michael Paquier wrote:
> > Hmm. FWIW, I am working on doing similar for pg_upgrade to switch to
> > TAP there, and we share a lot in terms of running pg_regress on an
> > exising cluster. Wouldn't it be better to move this definition to
> > src/Makefile.global.in rather than src/test/recovery/?
> >
> > My pg_regress command is actually very similar to yours, so I am
> > wondering if this would be better if somehow centralized, perhaps in
> > Cluster.pm.

Thanks for looking. Right, it sounds like you'll have the same problems I ran into. I haven't updated this patch for that yet, as I'm not sure exactly what you need and we could easily move it in a later commit. Does that seem reasonable?

> By the way, while I was sorting out my things, I have noticed that v4
> does not handle EXTRA_REGRESS_OPT. Is that wanted? You could just
> add that into your patch set and push the extra options to the
> pg_regress command:
>     my $extra_opts_val = $ENV{EXTRA_REGRESS_OPT} || "";
>     my @extra_opts = split(/\s+/, $extra_opts_val);

Seems like a good idea for consistency, but isn't that a variable that's supposed to be expanded by a shell, not naively split on whitespace? Perhaps we should use the single-argument variant of system(), so the whole escaped enchilada is passed to a shell? Tried like that in this version (though now I'm wondering what the correct perl incantation is to shell-escape $outputdir and $dlpath...)

On Sat, Dec 11, 2021 at 1:17 PM Andres Freund wrote:
> On 2021-12-10 12:58:01 +1300, Thomas Munro wrote:
> > +$node_primary->adjust_conf('postgresql.conf', 'max_connections', '25', 1);
>
> Possible that this will cause problem on some *BSD platform with a limited
> count of semaphores. But we can deal with that if / when it happens.

Right, those systems don't work out of the box for us already without sysctl tweaks, so it doesn't matter if animals have to be adjusted further.

> > @@ -590,16 +595,35 @@ create_tablespace_directories(const char *location, const Oid tablespaceoid)
> >     char *linkloc;
> >     char *location_with_version_dir;
> >     struct stat st;
> > +   bool in_place;
> >
> >     linkloc = psprintf("pg_tblspc/%u", tablespaceoid);
> > -   location_with_version_dir = psprintf("%s/%s", location,
> > +
> > +   /*
> > +    * If we're asked to make an 'in place' tablespace, create the directory
> > +    * directly where the symlink would normally go. This is a developer-only
> > +    * option for now, to facilitate regression testing.
> > +    */
> > +   in_place =
> > +       (allow_in_place_tablespaces || InRecovery) && strlen(location) == 0;
>
> Why is in_place set to true by InRecovery?

Well, the real condition is strlen(location) == 0, and the other part is a sort of belt-and-braces check, but yeah, I should just remove that part. Done.

> ISTM that allow_in_place_tablespaces should be checked in CreateTableSpace(),
> and create_tablespace_directories() should just do whatever it's told?
> Otherwise it seems there's ample potential for confusion, e.g. because of the
> GUC differing between primary and replica, or between crash and a new start.

Agreed, that was the effect but the extra unnecessary check was a bit confusing.

> >     /*
> >      * Attempt to coerce target directory to safe permissions. If this fails,
> >      * it doesn't exist or has the wrong owner.
> >      */
> > -   if (chmod(location, pg_dir_create_mode) != 0)
> > +   if (!in_place && chmod(location, pg_dir_create_mode) != 0)
> >     {
> >         if (errno == ENOENT)
> >             ereport(ERROR,
>
> Maybe add a coment saying that we don't need to chmod here because
> MakePGDirectory() takes care of that?

Done.

> > @@ -648,13 +672,13 @@ create_tablespace_directories(const char *location, const Oid tablespaceoid)
> >     /*
> >      * In recovery, remove old symlink, in case it points to the wrong place.
> >      */
> > -   if (InRecovery)
> > +   if (!in_place && InRecovery)
> >         remove_tablespace_symlink(linkloc);
>
> I don't think it's right to check !in_place as you do here, given that it
> currently depends on a GUC setting that's possibly differs between WAL
> generation and replay time?

I have to, because otherwise we'll remove the directory we just created at the top of the function. It doesn't really depend on a GUC (clearer after previous change).

> Perhaps also add a test that we catch "in-place" tablespace creation with
> allow_in_place_tablespaces = false? Although perhaps that's better done in the
> previous commit...

There was a test for that already, see this bit:

    +-- empty tablespace locations are not usually allowed
    +CREATE TABLESPACE regress_tblspace
Re: do only critical work during single-user vacuum?
On Tue, Dec 21, 2021 at 1:31 PM John Naylor wrote: > On second thought, we don't really need another number here. We could > simply go by the existing failsafe parameter, and if the admin wants a > different value, it's already possible to specify > vacuum_failsafe_age/vacuum_multixact_failsafe_age in a session, > including in single-user mode. Perhaps a new boolean called > FAILSAFE_ONLY. If no table is specified, then when generating the list > of tables, include only those with relfrozenxid/relminmxid greater > than their failsafe thresholds. That's equivalent to the quick and dirty patch I wrote (assuming that the user actually uses this new FAILSAFE_ONLY thing). But if we're going to add a new option to the VACUUM command (or something of similar scope), then we might as well add a new behavior that is reasonably exact -- something that (say) only *starts* a VACUUM for those tables whose relfrozenxid age currently exceeds half the autovacuum_freeze_max_age for the table (usually taken from the GUC, sometimes taken from the reloption), which also forces the failsafe. And with similar handling for relminmxid/autovacuum_multixact_freeze_max_age. In other words, while triggering the failsafe is important, simply *not starting* VACUUM for relations where there is really no need for it is at least as important. We shouldn't even think about pruning or freezing with these tables. (ISTM that the only thing that might be a bit controversial about any of this is my definition of "safe", which seems like about the right trade-off to me.) This new command/facility should probably not be a new flag to the VACUUM command, as such. Rather, I think that it should either be an SQL-callable function, or a dedicated top-level command (that doesn't accept any tables). The only reason to have this is for scenarios where the user is already in a tough spot with wraparound failure, like that client of yours. Nobody wants to force the failsafe for one specific table. 
It's not general purpose, at all, and shouldn't claim to be. -- Peter Geoghegan
Re: daitch_mokotoff module
Hello again,

It turns out that there actually exists an(other) implementation of the Daitch-Mokotoff Soundex System which gets it right: the JOS Soundex Calculator at https://www.jewishgen.org/jos/jossound.htm

Other implementations I have been able to find, like the one in Apache Commons Codec used in e.g. Elasticsearch, are far from correct.

The source code for the JOS Soundex Calculator is not available, as far as I can tell. However, I have run the complete list of 98412 last names at https://raw.githubusercontent.com/philipperemy/name-dataset/master/names_dataset/v1/last_names.all.txt through the calculator, in order to have a good basis for comparison.

This revealed a few shortcomings in my implementation. In particular I had to go back to the drawing board in order to handle the dual nature of "J" correctly. "J" can be either a vowel or a consonant in Daitch-Mokotoff soundex, and this complicates encoding of the *previous* character.

I have also done a more thorough review and refactoring of the code, which should hopefully make things quite a bit more understandable to others.

The changes are summarized as follows:

* Returns NULL for input without any encodable characters.
* Uses the same "unofficial" rules for "UE" as other implementations.
* Correctly considers "J" as either a vowel or a consonant.
* Corrected encoding for e.g. "HANNMANN".
* Code refactoring and comments for readability.
* Better examples in the documentation.

The implementation is now in correspondence with the JOS Soundex Calculator for the 98412 last names mentioned above, with only the following exceptions:

    JOS: cedeño 43 53          PG: cedeño 436000 536000
    JOS: sadab(khura) 437000   PG: sadab(khura) 437590

I hope this addition to the fuzzystrmatch extension module will prove to be useful to others as well! This is my very first code contribution to PostgreSQL, and I would be grateful for any advice on how to proceed in order to get the patch accepted.
Best regards

Dag Lem

diff --git a/contrib/fuzzystrmatch/Makefile b/contrib/fuzzystrmatch/Makefile
index 0704894f88..826e529e3e 100644
--- a/contrib/fuzzystrmatch/Makefile
+++ b/contrib/fuzzystrmatch/Makefile
@@ -3,11 +3,12 @@
 MODULE_big = fuzzystrmatch
 OBJS = \
 	$(WIN32RES) \
+	daitch_mokotoff.o \
 	dmetaphone.o \
 	fuzzystrmatch.o
 
 EXTENSION = fuzzystrmatch
-DATA = fuzzystrmatch--1.1.sql fuzzystrmatch--1.0--1.1.sql
+DATA = fuzzystrmatch--1.2.sql fuzzystrmatch--1.1--1.2.sql fuzzystrmatch--1.0--1.1.sql
 PGFILEDESC = "fuzzystrmatch - similarities and distance between strings"
 
 REGRESS = fuzzystrmatch
@@ -22,3 +23,8 @@
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
+
+daitch_mokotoff.o: daitch_mokotoff.h
+
+daitch_mokotoff.h: daitch_mokotoff_header.pl
+	perl $< > $@
diff --git a/contrib/fuzzystrmatch/daitch_mokotoff.c b/contrib/fuzzystrmatch/daitch_mokotoff.c
new file mode 100644
index 00..1b7263c349
--- /dev/null
+++ b/contrib/fuzzystrmatch/daitch_mokotoff.c
@@ -0,0 +1,593 @@
+/*
+ * Daitch-Mokotoff Soundex
+ *
+ * Copyright (c) 2021 Finance Norway
+ * Author: Dag Lem
+ *
+ * This implementation of the Daitch-Mokotoff Soundex System aims at high
+ * performance.
+ *
+ * - The processing of each phoneme is initiated by an O(1) table lookup.
+ * - For phonemes containing more than one character, a coding tree is traversed
+ *   to process the complete phoneme.
+ * - The (alternate) soundex codes are produced digit by digit in-place in
+ *   another tree structure.
+ *
+ * References:
+ *
+ * https://www.avotaynu.com/soundex.htm
+ * https://www.jewishgen.org/InfoFiles/Soundex.html
+ * https://familypedia.fandom.com/wiki/Daitch-Mokotoff_Soundex
+ * https://stevemorse.org/census/soundex.html (dmlat.php, dmsoundex.php)
+ * https://github.com/apache/commons-codec/ (dmrules.txt, DaitchMokotoffSoundex.java)
+ * https://metacpan.org/pod/Text::Phonetic (DaitchMokotoff.pm)
+ *
+ * A few notes on other implementations:
+ *
+ * - All other known implementations have the same unofficial rules for "UE",
+ *   these are also adapted by this implementation (0, 1, NC).
+ * - The only other known implementation which is capable of generating all
+ *   correct soundex codes in all cases is the JOS Soundex Calculator at
+ *   https://www.jewishgen.org/jos/jossound.htm
+ * - "J" is considered (only) a vowel in dmlat.php
+ * - The official rules for "RS" are commented out in dmlat.php
+ * - Identical code digits for adjacent letters are not collapsed correctly in
+ *   dmsoundex.php when double digit codes are involved. E.g. "BESST" yields
+ *   744300 instead of 743000 as for "BEST".
+ * - "J" is considered (only) a consonant in DaitchMokotoffSoundex.java
+ * - "Y" is not considered a vowel in DaitchMokotoffSoundex.java
+ *
+ * Permission to use, copy, modify, and distribute this software and its
+ * documentation for any purpose, without fee, and without a written agreement
+ * is hereby granted,
Re: do only critical work during single-user vacuum?
I wrote: > Instead of a boolean, it seems like the new option should specify some > age below which VACUUM will skip the table entirely, and above which > will enter fail-safe mode. As mentioned earlier, the shutdown hint > could spell out the exact command. With this design, it would specify > the fail-safe default, or something else, to use with the option. On second thought, we don't really need another number here. We could simply go by the existing failsafe parameter, and if the admin wants a different value, it's already possible to specify vacuum_failsafe_age/vacuum_multixact_failsafe_age in a session, including in single-user mode. Perhaps a new boolean called FAILSAFE_ONLY. If no table is specified, then when generating the list of tables, include only those with relfrozenxid/relminmxid greater than their failsafe thresholds. -- John Naylor EDB: http://www.enterprisedb.com
Re: Buildfarm support for older versions
On 12/16/2021 3:23 pm, Andrew Dunstan wrote:
> On 12/16/21 15:53, Larry Rosenman wrote:
> > I get: ERROR for site owner: Invalid domain for site key on
> > https://pgbuildfarm.org/cgi-bin/register-form.pl
>
> try https://buildfarm.postgresql.org/cgi-bin/register-form.pl
>
> cheers
>
> andrew

I filled out that form on the 16th, and haven't gotten a new animal assignment. Is there a problem with my data?

-- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: XMAX_LOCK_ONLY and XMAX_COMMITTED (fk/multixact code)
> On Dec 1, 2021, at 10:59 AM, Bossart, Nathan wrote:
>
> here is a v3

It took a while for me to get to this.

    @@ -1304,33 +1370,46 @@ HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer, TransactionId *de
 
         if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
         {
    +        if (unlikely(tuple->t_infomask & HEAP_XMAX_COMMITTED))
    +        {
    +            if (tuple->t_infomask & HEAP_XMAX_LOCK_ONLY)
    +                ereport(ERROR,
    +                        (errcode(ERRCODE_DATA_CORRUPTED),
    +                         errmsg_internal("found tuple with HEAP_XMAX_COMMITTED "
    +                                         "and HEAP_XMAX_LOCK_ONLY")));
    +
    +            /* pre-v9.3 lock-only bit pattern */
    +            ereport(ERROR,
    +                    (errcode(ERRCODE_DATA_CORRUPTED),
    +                     errmsg_internal("found tuple with HEAP_XMAX_COMMITTED and"
    +                                     "HEAP_XMAX_EXCL_LOCK set and "
    +                                     "HEAP_XMAX_IS_MULTI unset")));
    +        }
    +

I find this bit hard to understand. Does the comment mean to suggest that the *upgrade* process should have eliminated all pre-v9.3 bit patterns, and therefore any such existing patterns are certainly corruption, or does it mean that data written by pre-v9.3 servers (and not subsequently updated) is defined as corrupt, or something else? I am not complaining that the logic is wrong, just trying to wrap my head around what the comment means.

— Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: minor gripe about lax reloptions parsing for views
Rebased patch attached: v3-0001-Reject-storage-options-in-toast-namespace-in-view.patch Description: Binary data — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Checkpointer crashes with "PANIC: could not fsync file "pg_tblspc/.."
This is happening because, in the case of the primary server, we let the checkpointer process unlink the first segment of the rel file, but that's not the case with the standby server. In the case of standby, the startup/recovery process unlinks the first segment of the rel file immediately during WAL replay. Now, as the tablespace path is shared between the primary and standby servers, when the startup process unlinks the first segment of the rel file, it gets unlinked/deleted for the primary server as well. So, when we run the checkpoint on the primary server, the subsequent fsync fails with the error "No such file..", which causes the database system to PANIC.

Please have a look at the code below in mdunlinkfork():

    if (isRedo || forkNum != MAIN_FORKNUM || RelFileNodeBackendIsTemp(rnode))
    {
        if (!RelFileNodeBackendIsTemp(rnode))
        {
            /* Prevent other backends' fds from holding on to the disk space */
            ret = do_truncate(path);

            /* Forget any pending sync requests for the first segment */
            register_forget_request(rnode, forkNum, 0 /* first seg */ );
        }
        else
            ret = 0;

        /* Next unlink the file, unless it was already found to be missing */
        if (ret == 0 || errno != ENOENT)
        {
            ret = unlink(path);
            if (ret < 0 && errno != ENOENT)
                ereport(WARNING,
                        (errcode_for_file_access(),
                         errmsg("could not remove file \"%s\": %m", path)));
        }
    }
    else
    {
        /* Prevent other backends' fds from holding on to the disk space */
        ret = do_truncate(path);

        /* Register request to unlink first segment later */
        register_unlink_segment(rnode, forkNum, 0 /* first seg */ );
    }

==

Is it okay to share the same tablespace (in fact, the same rel file) between the primary and standby server? Perhaps NO.

-- With Regards, Ashutosh Sharma.

On Tue, Dec 21, 2021 at 4:47 PM Dilip Kumar wrote:
> While testing the below case with the hot standby setup (with the
> latest code), I have noticed that the checkpointer process crashed
> with the $subject error. As per my observation, we have registered the
> SYNC_REQUEST when inserting some tuple into the table, and later on
> ALTER SET TABLESPACE we have registered the SYNC_UNLINK_REQUEST, which
> looks fine so far, then I have noticed that only when the standby is
> connected the underlying table file w.r.t the old tablespace is
> already deleted. Now, in AbsorbFsyncRequests we don't do anything for
> the SYNC_REQUEST even though we have SYNC_UNLINK_REQUEST for the same
> file, but since the underlying file is already deleted the
> checkpointer cashed while processing the SYNC_REQUEST.
>
> I have spent some time on this but could not figure out how the
> relfilenodenode file w.r.t. to the old tablespace is getting deleted
> and if I disconnect the standby then it is not getting deleted, not
> sure how walsender is playing a role in deleting the file even before
> checkpointer process the unlink request.
>
> postgres[8905]=# create tablespace tab location
> '/home/dilipkumar/work/PG/install/bin/test';
> CREATE TABLESPACE
> postgres[8905]=# create tablespace tab1 location
> '/home/dilipkumar/work/PG/install/bin/test1';
> CREATE TABLESPACE
> postgres[8905]=# create database test tablespace tab;
> CREATE DATABASE
> postgres[8905]=# \c test
> You are now connected to database "test" as user "dilipkumar".
> test[8912]=# create table t( a int PRIMARY KEY,b text);
> CREATE TABLE
> test[8912]=# insert into t values (generate_series(1,10), 'aaa');
> INSERT 0 10
> test[8912]=# alter table t set tablespace tab1 ;
> ALTER TABLE
> test[8912]=# CHECKPOINT ;
> WARNING: 57P02: terminating connection because of crash of another
> server process
>
> log shows:
> PANIC: could not fsync file
> "pg_tblspc/16384/PG_15_202112131/16386/16387": No such file or
> directory
>
> backtrace:
> #0 0x7f2f865ff387 in raise () from /lib64/libc.so.6
> #1 0x7f2f86600a78 in abort () from /lib64/libc.so.6
> #2 0x00b13da3 in errfinish (filename=0xcf283f "sync.c", ..
> #3 0x00978dc7 in ProcessSyncRequests () at sync.c:439
> #4 0x005949d2 in CheckPointGuts (checkPointRedo=67653624, flags=108) at xlog.c:9590
> #5 0x005942fe in CreateCheckPoint (flags=108) at xlog.c:9318
> #6 0x008a80b7 in CheckpointerMain () at checkpointer.c:444
>
> Note: This smaller test case is derived from one of the bigger
> scenarios raised by Neha Sharma [1]
>
> [1] https://www.postgresql.org/message-id/CANiYTQs0E8TcB11eU0C4eNN0tUd%3DSQqsqEtL1AVZP1%3DEnD-49A%40mail.gmail.com
>
> --
> Regards,
> Dilip Kumar
> EnterpriseDB: http://www.enterprisedb.com
Re: pg14 psql broke \d datname.nspname.relname
Rebased patch attached: v3-0001-Reject-patterns-with-too-many-parts-or-wrong-db.patch Description: Binary data — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Emit a warning if the extension's GUC is set incorrectly
I wrote: > 0003 looks to me like it was superseded by 75d22069e. I do not > particularly like that patch though; it seems both inefficient > and ill-designed. 0003 is adding a check in an equally bizarre > place. Why isn't add_placeholder_variable the place to deal > with this? Also, once we know that a prefix is reserved, > why don't we throw an error not just a warning for attempts to > set some unknown variable under that prefix? We don't allow > you to set unknown non-prefixed variables. Concretely, I think we should do the attached, which removes much of 75d22069e and does the check at the point of placeholder creation. (After looking closer, add_placeholder_variable's caller is the function that is responsible for rejecting bad names, not add_placeholder_variable itself.) This also fixes an indisputable bug in 75d22069e, namely that it did prefix comparisons incorrectly and would throw warnings for cases it shouldn't. Reserving "plpgsql" shouldn't have the effect of reserving "pl" as well. I'm tempted to propose that we also rename EmitWarningsOnPlaceholders to something like MarkGUCPrefixReserved, to more clearly reflect what it does now. (We could provide the old name as a macro alias to avoid breaking extensions needlessly.) 
regards, tom lane diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index bff949a40b..f345eceb5b 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -148,6 +148,8 @@ extern bool optimize_bounded_sort; static int GUC_check_errcode_value; +static List *reserved_class_prefix = NIL; + /* global variables for check hook support */ char *GUC_check_errmsg_string; char *GUC_check_errdetail_string; @@ -235,8 +237,6 @@ static bool check_recovery_target_lsn(char **newval, void **extra, GucSource sou static void assign_recovery_target_lsn(const char *newval, void *extra); static bool check_primary_slot_name(char **newval, void **extra, GucSource source); static bool check_default_with_oids(bool *newval, void **extra, GucSource source); -static void check_reserved_prefixes(const char *varName); -static List *reserved_class_prefix = NIL; /* Private functions in guc-file.l that need to be called from guc.c */ static ConfigVariable *ProcessConfigFileInternal(GucContext context, @@ -5569,18 +5569,44 @@ find_option(const char *name, bool create_placeholders, bool skip_errors, * doesn't contain a separator, don't assume that it was meant to be a * placeholder. */ - if (strchr(name, GUC_QUALIFIER_SEPARATOR) != NULL) + const char *sep = strchr(name, GUC_QUALIFIER_SEPARATOR); + + if (sep != NULL) { - if (valid_custom_variable_name(name)) -return add_placeholder_variable(name, elevel); - /* A special error message seems desirable here */ - if (!skip_errors) -ereport(elevel, - (errcode(ERRCODE_INVALID_NAME), - errmsg("invalid configuration parameter name \"%s\"", -name), - errdetail("Custom parameter names must be two or more simple identifiers separated by dots."))); - return NULL; + size_t classLen = sep - name; + ListCell *lc; + + /* The name must be syntactically acceptable ... 
*/ + if (!valid_custom_variable_name(name)) + { +if (!skip_errors) + ereport(elevel, + (errcode(ERRCODE_INVALID_NAME), + errmsg("invalid configuration parameter name \"%s\"", + name), + errdetail("Custom parameter names must be two or more simple identifiers separated by dots."))); +return NULL; + } + /* ... and it must not match any previously-reserved prefix */ + foreach(lc, reserved_class_prefix) + { +const char *rcprefix = lfirst(lc); + +if (strlen(rcprefix) == classLen && + strncmp(name, rcprefix, classLen) == 0) +{ + if (!skip_errors) + ereport(elevel, +(errcode(ERRCODE_INVALID_NAME), + errmsg("invalid configuration parameter name \"%s\"", + name), + errdetail("\"%s\" is a reserved prefix.", + rcprefix))); + return NULL; +} + } + /* OK, create it */ + return add_placeholder_variable(name, elevel); } } @@ -8764,7 +8790,6 @@ ExecSetVariableStmt(VariableSetStmt *stmt, bool isTopLevel) (superuser() ? PGC_SUSET : PGC_USERSET), PGC_S_SESSION, action, true, 0, false); - check_reserved_prefixes(stmt->name); break; case VAR_SET_MULTI: @@ -8850,8 +8875,6 @@ ExecSetVariableStmt(VariableSetStmt *stmt, bool isTopLevel) (superuser() ? PGC_SUSET : PGC_USERSET), PGC_S_SESSION, action, true, 0, false); - - check_reserved_prefixes(stmt->name); break; case VAR_RESET_ALL: ResetAllOptions(); @@ -9345,8 +9368,9 @@ EmitWarningsOnPlaceholders(const char *className) { int classLen = strlen(className); int i; - MemoryContext oldcontext; + MemoryContext oldcontext; + /* Check for existing placeholders. */ for (i = 0; i <
Re: do only critical work during single-user vacuum?
On Tue, Dec 21, 2021 at 3:56 AM Masahiko Sawada wrote: > > On Tue, Dec 21, 2021 at 1:53 PM Peter Geoghegan wrote: > > > > On Mon, Dec 20, 2021 at 8:40 PM Masahiko Sawada > > wrote: > > > BTW a vacuum automatically enters failsafe mode under the situation > > > where the user has to run a vacuum in the single-user mode, right? > > > > Only for the table that had the problem. Maybe there are no other > > tables that a database level "VACUUM" will need to spend much time on, > > or maybe there are, and they will make it take much much longer (it > > all depends). > > > > The goal of the patch is to make sure that when we're in single user > > mode, we'll consistently trigger the failsafe, for every VACUUM > > against every table -- not just the table (or tables) whose > > relfrozenxid is very old. That's still naive, but much less naive than > > simply telling users to VACUUM the whole database in single user mode > > while vacuuming indexes, etc. > > I understand the patch, thank you for the explanation! > > I remember Simon proposed a VACUUM command option[1], called > FAST_FREEZE, to turn off index cleanup and heap truncation. Now that > we have failsafe mechanism probably we can have a VACUUM command > option to turn on failsafe mode instead. I've been thinking a bit more about this, and I see two desirable goals of anti-wraparound vacuum in single-user mode: 1. Get out of single-user mode as quickly as possible. 2. Minimize the catch-up work we have to do once we're out. Currently, a naive vacuum does as much work as possible and leaves a bunch of WAL streaming and archiving work for later, so that much is easy to improve upon and we don't have to be terribly sophisticated. Keeping in mind Andres' point that we don't want to force possibly unwanted behavior just because we're in single-user mode, it makes sense to have some kind of option that has the above two goals. 
Instead of a boolean, it seems like the new option should specify some age below which VACUUM will skip the table entirely, and above which it will enter fail-safe mode. As mentioned earlier, the shutdown hint could spell out the exact command; with this design, it would specify the fail-safe default, or something else, to use with the option. That seems doable for v15 -- any thoughts on that approach?

In standard operation, the above goals could be restated as "advance xmin as quickly as possible" and "generate as little future 'work/debt' as possible, whether dirty pages or WAL". There are some more sophisticated things we can do in this regard, but something like the above could also be useful in normal operation. In fact, that "normal" could be just after we restart, having done the bare minimum in single-user mode, and want to continue freezing and keep things under control.

--
John Naylor
EDB: http://www.enterprisedb.com
Re: Emit a warning if the extension's GUC is set incorrectly
Kyotaro Horiguchi writes: > At Mon, 20 Dec 2021 21:05:23 +0900, Shinya Kato > wrote in >> We should use EmitWarningsOnPlaceholders when we use >> DefineCustomXXXVariable. >> I don't think there is any room for debate. > Unfortunately, pltcl.c defines variables both in pltcl and pltclu but > the patch adds the warning only for pltclu. Right. The rest of 0001 looks fine, so pushed with that correction. I concur that 0002 is unacceptable. The result of it will be that a freshly-loaded extension will fail to hook into the system as it should, because it will fail to complete its initialization. That will cause user frustration and bug reports. (BTW, I wonder if EmitWarningsOnPlaceholders should be using LOG rather than WARNING when it's running in the postmaster. Use of WARNING is moderately likely to result in nothing getting printed at all.) 0003 looks to me like it was superseded by 75d22069e. I do not particularly like that patch though; it seems both inefficient and ill-designed. 0003 is adding a check in an equally bizarre place. Why isn't add_placeholder_variable the place to deal with this? Also, once we know that a prefix is reserved, why don't we throw an error not just a warning for attempts to set some unknown variable under that prefix? We don't allow you to set unknown non-prefixed variables. regards, tom lane
Re: postgres_fdw: incomplete subabort cleanup of connections used in async execution
Etsuro Fujita wrote on 2021-12-19 13:25:
> Hi,
>
> While working on [1], I noticed $SUBJECT: postgres_fdw resets the
> per-connection states of connections, which store async requests sent to
> remote servers in async_capable mode, during post-abort
> (pgfdw_xact_callback()), but it fails to do so during post-subabort
> (pgfdw_subxact_callback()). This causes a crash when re-executing a query
> that was aborted in a subtransaction:

Hi. Looks good to me.

--
Best regards,
Alexander Pyhalov,
Postgres Professional
RE: Optionally automatically disable logical replication subscriptions on error
On Thursday, December 16, 2021 9:51 PM I wrote:
> Attached the updated patch v14.

FYI, I've conducted a test of the disable_on_error flag using pg_upgrade. I prepared PG14 and HEAD with the disable_on_error patch applied. Then, I set up a logical replication pair of publisher and subscriber on 14 and executed pg_upgrade for the publisher and the subscriber individually. After the upgrade, on the subscriber, I've confirmed that disable_on_error is false via both pg_subscription and \dRs+, as expected.

Best Regards,
Takamichi Osumi
Re: parallel vacuum comments
On Tue, Dec 21, 2021 at 11:24 AM Masahiko Sawada wrote:
> On Tue, Dec 21, 2021 at 2:04 PM Amit Kapila wrote:
> >
> Thank you for the comment. Agreed.
> I've attached updated version patches. Please review them.

These look mostly good to me. Please find attached a slightly edited version of the 0001 patch. I have modified comments, run pgindent, and modified the commit message. I'll commit this tomorrow if you are fine with it.

--
With Regards,
Amit Kapila.

v11-0001-Move-index-vacuum-routines-to-vacuum.c.patch Description: Binary data
RE: In-placre persistance change of a relation
Hi Kyotaro,

> I took a bit too long detour but the patch gets to pass make-world for me.

Good news, v10 passes all the tests for me (including the TAP recovery ones). There's a major problem, I think:

drop table t6;
create unlogged table t6 (id bigint, t text);
create sequence s1;
insert into t6 select nextval('s1'), repeat('A', 1000) from generate_series(1, 100);
alter table t6 set logged;
select pg_sleep(1); -- optional checkpoint, more on this later.

/usr/pgsql-15/bin/pg_ctl -D /var/lib/pgsql/15/data -l logfile -m immediate stop
/usr/pgsql-15/bin/pg_ctl -D /var/lib/pgsql/15/data -l logfile start

select count(*) from t6; -- shows 0 rows

But if I perform a checkpoint before the crash, the data is there; apparently the missing steps done by the checkpointer help. If no checkpoint is done, some peeking reveals that upon startup there is a truncation (?!):

$ /usr/pgsql-15/bin/pg_ctl -D /var/lib/pgsql/15/data -l logfile -m immediate stop
$ find /var/lib/pgsql/15/data/ -name '73741*' -ls
112723206  120 -rw------- 1 postgres postgres 122880 Dec 21 12:42 /var/lib/pgsql/15/data/base/73740/73741
112723202   24 -rw------- 1 postgres postgres  24576 Dec 21 12:42 /var/lib/pgsql/15/data/base/73740/73741_fsm
$ /usr/pgsql-15/bin/pg_ctl -D /var/lib/pgsql/15/data -l logfile start
waiting for server to start done
server started
$ find /var/lib/pgsql/15/data/ -name '73741*' -ls
112723206    0 -rw------- 1 postgres postgres      0 Dec 21 12:42 /var/lib/pgsql/15/data/base/73740/73741
112723202   16 -rw------- 1 postgres postgres  16384 Dec 21 12:42 /var/lib/pgsql/15/data/base/73740/73741_fsm

So what's suspicious is that 122880 -> 0 file-size truncation. I've investigated the WAL and it seems to contain TRUNCATE records after the logged FPI images, so when crash recovery kicks in it probably clears this table (while it shouldn't). However, if I perform a CHECKPOINT just before the crash, the WAL stream contains just RUNNING_XACTS and CHECKPOINT_ONLINE redo records, which probably prevents the truncation.
I'm a newbie here, so please take this theory with a grain of salt; it could be something completely different. -J.
Re: Multi-Column List Partitioning
On Tue, Dec 21, 2021 at 2:47 PM Ashutosh Sharma wrote:
> On Mon, Dec 20, 2021 at 7:04 PM Amit Langote wrote:
>> On Mon, Dec 13, 2021 at 11:37 PM Ashutosh Sharma wrote:
>> >
>> > Hi,
>> >
>> > Is this okay?
>> >
>> > postgres=# CREATE TABLE t1 (a int, b int) PARTITION BY LIST ( a, a, a );
>> > CREATE TABLE
>> >
>> > postgres=# CREATE TABLE t1_1 PARTITION OF t1 FOR VALUES IN ((1, 2, 3), (4, 5, 6));
>> > CREATE TABLE
>> >
>> > postgres=# \d t1
>> >    Partitioned table "public.t1"
>> >  Column |  Type   | Collation | Nullable | Default
>> > --------+---------+-----------+----------+---------
>> >  a      | integer |           |          |
>> >  b      | integer |           |          |
>> > Partition key: LIST (a, a, a)
>> > Number of partitions: 1 (Use \d+ to list them.)
>>
>> I'd say it's not okay for a user to expect this to work sensibly, and
>> I don't think it would be worthwhile to write code to point that out
>> to the user if that is what you were implying.
>
> OK. As you wish.

Actually, we *do* have some code in check_new_partition_bound() to point it out if an empty range is specified for a partition, something that one (or a DDL script) may accidentally do:

    /*
     * First check if the resulting range would be empty with
     * specified lower and upper bounds...
     */
    cmpval = partition_rbound_cmp(key->partnatts,
                                  key->partsupfunc,
                                  key->partcollation,
                                  lower->datums, lower->kind,
                                  true, upper);
    Assert(cmpval != 0);
    if (cmpval > 0)
    {
        /* Point to problematic key in the lower datums list. */
        PartitionRangeDatum *datum = list_nth(spec->lowerdatums, cmpval - 1);

        ereport(ERROR,
                (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
                 errmsg("empty range bound specified for partition \"%s\"",
                        relname),
                 errdetail("Specified lower bound %s is greater than or equal to upper bound %s.",
                           get_range_partbound_string(spec->lowerdatums),
                           get_range_partbound_string(spec->upperdatums)),
                 parser_errposition(pstate, datum->location)));
    }

So one may wonder why we don't catch and point out more such user mistakes, like the one in your example.
It may not be hard to implement a proof that the partition bound definition a user entered results in a self-contradictory partition constraint, using the facilities given in predtest.c. (The empty-range proof seemed simple enough to implement as the above block of code.) I don't however see why we should do that for partition constraints if we don't do the same for CHECK constraints; for example, the following definition, while allowed, is not very useful:

create table foo (a int check (a = 1 and a = 2));
\d foo
                Table "public.foo"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 a      | integer |           |          |
Check constraints:
    "foo_a_check" CHECK (a = 1 AND a = 2)

Maybe partitioning should be looked at differently than the free-form CHECK constraints, but I'm not so sure. Or if others insist that it may be worthwhile to improve the user experience in such cases, we could do that separately from the patch to implement multi-column list partitioning.

--
Amit Langote
EDB: http://www.enterprisedb.com
RE: [PATCH] pg_stat_toast v0.3
Dear Gunnar,

> The attached v0.3 includes
> * columns "storagemethod" and "compressmethod" added as per Hayato
> Kuroda's suggestion

I prefer your implementation that refers to another system view.

> * gathering timing information

Here is a minor comment: I'm not sure when we should start to measure time. If you want to record the time spent compressing, toast_compress_datum() should be sandwiched by INSTR_TIME_SET_CURRENT() calls and the time duration calculated from them. Currently time_spent is calculated in pgstat_report_toast_activity(), but this has a systematic error. If you want to record the time spent inserting/updating, however, I think we should start measuring at the upper layer, maybe heap_toast_insert_or_update().

> Any hints on how to write meaningful tests would be much appreciated!
> I.e., will a test

I searched for hints in the PG sources, and I think the modules/ subdirectory can be used. Here is an example: https://github.com/postgres/postgres/tree/master/src/test/modules/commit_ts

I attached a patch to create a new test. Could you rewrite the sql and the output file?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

adding_new_test.patch Description: adding_new_test.patch
Re: Schema variables - new implementation for Postgres 15
út 21. 12. 2021 v 13:36 odesílatel Justin Pryzby napsal: > On Tue, Dec 21, 2021 at 01:29:00PM +0100, Pavel Stehule wrote: > > Hi > > > > út 21. 12. 2021 v 0:09 odesílatel Justin Pryzby > > napsal: > > > > > I don't understand what 0002 patch does relative to the 0001 patch. > > > Is 0002 to change the error messages from "schema variables" to > "session > > > variables" , in a separate commit to show that the main patch doesn't > > > change > > > regression results ? Could you add commit messages ? > > > > > > I mentioned before that there's a pre-existing use of the phrase > "session > > > variable", which you should change to something else: > > > > > > origin:doc/src/sgml/ref/set_role.sgml: SET ROLE > does > > > not process session variables as specified by > > > origin:doc/src/sgml/ref/set_role.sgml- the role's > > linkend="sql-alterrole">ALTER ROLE settings; > > > this only happens during > > > origin:doc/src/sgml/ref/set_role.sgml- login. > > > > > > Maybe "session variable" should be added to the glossary. > > > > > > The new tests crash if debug_discard_caches=on. > > > > > > 2021-12-20 16:15:44.476 CST postmaster[7478] LOG: server process (PID > > > 7657) was terminated by signal 6: Aborted > > > 2021-12-20 16:15:44.476 CST postmaster[7478] DETAIL: Failed process > was > > > running: DISCARD VARIABLES; > > > > How do you inject this parameter to regress tests? > > You can run PGOPTIONS='-c debug_invalidate_caches=1' make check > > I used make installcheck against a running instance where I'd used > ALTER SYSTEM SET debug_discard_caches=on. > > You can also manually run psql against the .sql file itself. > ...which is a good idea since this causes the regression tests take hours. > > Or just add SET debug_discard_caches=on to your .sql file. > ok thank you I'll try it. > -- > Justin >
Re: Schema variables - new implementation for Postgres 15
On Tue, Dec 21, 2021 at 01:29:00PM +0100, Pavel Stehule wrote:
> Hi
>
> út 21. 12. 2021 v 0:09 odesílatel Justin Pryzby napsal:
>
> > I don't understand what 0002 patch does relative to the 0001 patch.
> > Is 0002 to change the error messages from "schema variables" to "session
> > variables", in a separate commit to show that the main patch doesn't change
> > regression results? Could you add commit messages?
> >
> > I mentioned before that there's a pre-existing use of the phrase "session
> > variable", which you should change to something else:
> >
> > origin:doc/src/sgml/ref/set_role.sgml: SET ROLE does
> > not process session variables as specified by
> > origin:doc/src/sgml/ref/set_role.sgml- the role's linkend="sql-alterrole">ALTER ROLE settings;
> > this only happens during
> > origin:doc/src/sgml/ref/set_role.sgml- login.
> >
> > Maybe "session variable" should be added to the glossary.
> >
> > The new tests crash if debug_discard_caches=on.
> >
> > 2021-12-20 16:15:44.476 CST postmaster[7478] LOG: server process (PID
> > 7657) was terminated by signal 6: Aborted
> > 2021-12-20 16:15:44.476 CST postmaster[7478] DETAIL: Failed process was
> > running: DISCARD VARIABLES;
>
> How do you inject this parameter to regress tests?

You can run PGOPTIONS='-c debug_discard_caches=1' make check

I used make installcheck against a running instance where I'd used
ALTER SYSTEM SET debug_discard_caches=on.

You can also manually run psql against the .sql file itself.
...which is a good idea since this causes the regression tests to take hours.

Or just add SET debug_discard_caches=on to your .sql file.

--
Justin
Re: Schema variables - new implementation for Postgres 15
Hi út 21. 12. 2021 v 0:09 odesílatel Justin Pryzby napsal: > I don't understand what 0002 patch does relative to the 0001 patch. > Is 0002 to change the error messages from "schema variables" to "session > variables" , in a separate commit to show that the main patch doesn't > change > regression results ? Could you add commit messages ? > > I mentioned before that there's a pre-existing use of the phrase "session > variable", which you should change to something else: > > origin:doc/src/sgml/ref/set_role.sgml: SET ROLE does > not process session variables as specified by > origin:doc/src/sgml/ref/set_role.sgml- the role's linkend="sql-alterrole">ALTER ROLE settings; > this only happens during > origin:doc/src/sgml/ref/set_role.sgml- login. > > Maybe "session variable" should be added to the glossary. > > The new tests crash if debug_discard_caches=on. > > 2021-12-20 16:15:44.476 CST postmaster[7478] LOG: server process (PID > 7657) was terminated by signal 6: Aborted > 2021-12-20 16:15:44.476 CST postmaster[7478] DETAIL: Failed process was > running: DISCARD VARIABLES; > How do you inject this parameter to regress tests? 
Regards Pavel > TRAP: FailedAssertion("sessionvars", File: "sessionvariable.c", Line: 270, > PID: 7657) > > #2 0x564858a4f1a8 in ExceptionalCondition > (conditionName=conditionName@entry=0x564858b8626d "sessionvars", > errorType=errorType@entry=0x564858aa700b "FailedAssertion", > fileName=fileName@entry=0x564858b86234 "sessionvariable.c", > lineNumber=lineNumber@entry=270) at assert.c:69 > #3 0x56485874fec6 in sync_sessionvars_xact_callback (event= out>, arg=) at sessionvariable.c:270 > #4 sync_sessionvars_xact_callback (event=, arg= out>) at sessionvariable.c:253 > #5 0x56485868030a in CallXactCallbacks (event=XACT_EVENT_PRE_COMMIT) > at xact.c:3644 > #6 CommitTransaction () at xact.c:2178 > #7 0x564858681975 in CommitTransactionCommand () at xact.c:3043 > #8 0x56485892b7a9 in finish_xact_command () at postgres.c:2722 > #9 0x56485892dc5b in finish_xact_command () at postgres.c:2720 > #10 exec_simple_query () at postgres.c:1240 > #11 0x56485892f70a in PostgresMain () at postgres.c:4498 > #12 0x56485889a479 in BackendRun (port=, > port=) at postmaster.c:4594 > #13 BackendStartup (port=) at postmaster.c:4322 > #14 ServerLoop () at postmaster.c:1802 > #15 0x56485889b47c in PostmasterMain () at postmaster.c:1474 > #16 0x5648585c60c0 in main (argc=5, argv=0x564858e553f0) at main.c:198 >
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On Thu, Dec 16, 2021 at 9:26 PM Neha Sharma wrote: > > Hi, > > While testing the v8 patches in a hot-standby setup, it was observed the > master is crashing with the below error; > > 2021-12-16 19:32:47.757 +04 [101483] PANIC: could not fsync file > "pg_tblspc/16385/PG_15_202112111/16386/16391": No such file or directory > 2021-12-16 19:32:48.917 +04 [101482] LOG: checkpointer process (PID 101483) > was terminated by signal 6: Aborted > > Parameters configured at master: > wal_level = hot_standby > max_wal_senders = 3 > hot_standby = on > max_standby_streaming_delay= -1 > wal_consistency_checking='all' > max_wal_size= 10GB > checkpoint_timeout= 1d > log_min_messages=debug1 > > Test Case: > create tablespace tab1 location > '/home/edb/PGsources/postgresql/inst/bin/test1'; > create tablespace tab location '/home/edb/PGsources/postgresql/inst/bin/test'; > create database test tablespace tab; > \c test > create table t( a int PRIMARY KEY,b text); > CREATE OR REPLACE FUNCTION large_val() RETURNS TEXT LANGUAGE SQL AS 'select > array_agg(md5(g::text))::text from generate_series(1, 256) g'; > insert into t values (generate_series(1,10), large_val()); > alter table t set tablespace tab1 ; > \c postgres > create database test1 template test; > \c test1 > alter table t set tablespace tab; > \c postgres > alter database test1 set tablespace tab1; > > --cancel the below command > alter database test1 set tablespace pg_default; --press ctrl+c > \c test1 > alter table t set tablespace tab1; > > > Log file attached for reference. Seems like this is an existing issue and I am able to reproduce on the PostgreSQL head as well [1] [1] https://www.postgresql.org/message-id/CAFiTN-szX%3DayO80EnSWonBu1YMZrpOr9V0R3BzHBSjMrMPAeMg%40mail.gmail.com -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Checkpointer crashes with "PANIC: could not fsync file "pg_tblspc/.."
While testing the below case with the hot standby setup (with the latest code), I have noticed that the checkpointer process crashed with the $subject error. As per my observation, we have registered the SYNC_REQUEST when inserting some tuple into the table, and later on ALTER SET TABLESPACE we have registered the SYNC_UNLINK_REQUEST, which looks fine so far. Then I have noticed that, only when the standby is connected, the underlying table file w.r.t. the old tablespace is already deleted. Now, in AbsorbFsyncRequests we don't do anything for the SYNC_REQUEST even though we have a SYNC_UNLINK_REQUEST for the same file, but since the underlying file is already deleted, the checkpointer crashed while processing the SYNC_REQUEST.

I have spent some time on this but could not figure out how the relfilenode file w.r.t. the old tablespace is getting deleted, and if I disconnect the standby then it is not getting deleted. I'm not sure how the walsender is playing a role in deleting the file even before the checkpointer processes the unlink request.

postgres[8905]=# create tablespace tab location '/home/dilipkumar/work/PG/install/bin/test';
CREATE TABLESPACE
postgres[8905]=# create tablespace tab1 location '/home/dilipkumar/work/PG/install/bin/test1';
CREATE TABLESPACE
postgres[8905]=# create database test tablespace tab;
CREATE DATABASE
postgres[8905]=# \c test
You are now connected to database "test" as user "dilipkumar".
test[8912]=# create table t( a int PRIMARY KEY,b text);
CREATE TABLE
test[8912]=# insert into t values (generate_series(1,10), 'aaa');
INSERT 0 10
test[8912]=# alter table t set tablespace tab1 ;
ALTER TABLE
test[8912]=# CHECKPOINT ;
WARNING: 57P02: terminating connection because of crash of another server process

log shows:
PANIC: could not fsync file "pg_tblspc/16384/PG_15_202112131/16386/16387": No such file or directory

backtrace:
#0 0x7f2f865ff387 in raise () from /lib64/libc.so.6
#1 0x7f2f86600a78 in abort () from /lib64/libc.so.6
#2 0x00b13da3 in errfinish (filename=0xcf283f "sync.c", ..
#3 0x00978dc7 in ProcessSyncRequests () at sync.c:439
#4 0x005949d2 in CheckPointGuts (checkPointRedo=67653624, flags=108) at xlog.c:9590
#5 0x005942fe in CreateCheckPoint (flags=108) at xlog.c:9318
#6 0x008a80b7 in CheckpointerMain () at checkpointer.c:444

Note: This smaller test case is derived from one of the bigger scenarios raised by Neha Sharma [1]

[1] https://www.postgresql.org/message-id/CANiYTQs0E8TcB11eU0C4eNN0tUd%3DSQqsqEtL1AVZP1%3DEnD-49A%40mail.gmail.com

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: In-placre persistance change of a relation
At Tue, 21 Dec 2021 17:13:21 +0900 (JST), Kyotaro Horiguchi wrote in
> Ugh! I completely forgot about TAP tests.. Thanks for the testing and
> sorry for the bugs.
>
> This is a bit big change so I need a bit of time before posting the
> next version.

I took a bit too long a detour, but the patch gets to pass make-world for me. In this version:

- When relation persistence is changed from logged to unlogged, buffer persistence is flipped, then an init fork is created along with a mark file for the fork (RelationCreateInitFork). The mark file is removed at commit but left alone after a crash before commit. At the next startup, ResetUnloggedRelationsInDbspaceDir() removes the init fork file if it finds the corresponding mark file.

- When relation persistence is changed from unlogged to logged, buffer persistence is flipped, then the existing init fork is marked to be dropped at commit (RelationDropInitFork). Finally the whole content is WAL-logged in a page-wise manner (RelationChangePersistence).

- The two operations above are repeatable within a transaction; commit makes the last operation persist, and rollback abandons all the operations.

- Storage files are created along with a "mark" file for the relfilenode. This behaves in the same way as the above, except that the mark file corresponds to the whole relfilenode.

- The at-commit operations this patch adds need to be WAL-logged, so they don't fit the pendingDeletes list, which is executed after commit. I added a new pending-work list, pendingCleanups, that is executed just after pendingSyncs. (new in this version)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From bc7e14b8af3c72e4ab99c964688d18ef4545f8b9 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Wed, 11 Nov 2020 21:51:11 +0900
Subject: [PATCH v10 1/2] In-place table persistence change

Even though ALTER TABLE SET LOGGED/UNLOGGED does not require data rewriting, currently it runs a heap rewrite which causes a large amount of file I/O.
This patch makes the command run without heap rewrite. In addition,
SET LOGGED while wal_level > minimal emits WAL using XLOG_FPI instead of
a massive number of HEAP_INSERTs, which should be smaller. This also
allows cleanup of files left behind by a crash of the transaction that
created them.
---
 src/backend/access/rmgrdesc/smgrdesc.c |  52 +++
 src/backend/access/transam/README      |   8 +
 src/backend/access/transam/xact.c      |   7 +
 src/backend/access/transam/xlog.c      |  17 +
 src/backend/catalog/storage.c          | 545 -
 src/backend/commands/tablecmds.c       | 256 ++--
 src/backend/replication/basebackup.c   |   3 +-
 src/backend/storage/buffer/bufmgr.c    |  88 
 src/backend/storage/file/fd.c          |   4 +-
 src/backend/storage/file/reinit.c      | 344 +++-
 src/backend/storage/smgr/md.c          |  93 -
 src/backend/storage/smgr/smgr.c        |  32 ++
 src/backend/storage/sync/sync.c        |  20 +-
 src/bin/pg_rewind/parsexlog.c          |  24 ++
 src/common/relpath.c                   |  47 ++-
 src/include/catalog/storage.h          |   3 +
 src/include/catalog/storage_xlog.h     |  42 +-
 src/include/common/relpath.h           |   9 +-
 src/include/storage/bufmgr.h           |   2 +
 src/include/storage/fd.h               |   1 +
 src/include/storage/md.h               |   8 +-
 src/include/storage/reinit.h           |  10 +-
 src/include/storage/smgr.h             |  17 +
 23 files changed, 1450 insertions(+), 182 deletions(-)

diff --git a/src/backend/access/rmgrdesc/smgrdesc.c b/src/backend/access/rmgrdesc/smgrdesc.c
index 773d57..d251f22207 100644
--- a/src/backend/access/rmgrdesc/smgrdesc.c
+++ b/src/backend/access/rmgrdesc/smgrdesc.c
@@ -40,6 +40,49 @@ smgr_desc(StringInfo buf, XLogReaderState *record)
 			xlrec->blkno, xlrec->flags);
 		pfree(path);
 	}
+	else if (info == XLOG_SMGR_UNLINK)
+	{
+		xl_smgr_unlink *xlrec = (xl_smgr_unlink *) rec;
+		char	   *path = relpathperm(xlrec->rnode, xlrec->forkNum);
+
+		appendStringInfoString(buf, path);
+		pfree(path);
+	}
+	else if (info == XLOG_SMGR_MARK)
+	{
+		xl_smgr_mark *xlrec = (xl_smgr_mark *) rec;
+		char	   *path = GetRelationPath(xlrec->rnode.dbNode,
+										   xlrec->rnode.spcNode,
+										   xlrec->rnode.relNode,
+										   InvalidBackendId,
+										   xlrec->forkNum, xlrec->mark);
+		char	   *action;
+
+		switch (xlrec->action)
+		{
+			case XLOG_SMGR_MARK_CREATE:
+				action = "CREATE";
+				break;
+			case XLOG_SMGR_MARK_UNLINK:
+				action = "DELETE";
+				break;
+			default:
+				action = "";
+				break;
+		}
+
+		appendStringInfo(buf, "%s %s", action, path);
+		pfree(path);
+	}
+	else if (info == XLOG_SMGR_BUFPERSISTENCE)
+	{
+		xl_smgr_bufpersistence *xlrec = (xl_smgr_bufpersistence *) rec;
+		char	   *path = relpathperm(xlrec->rnode, MAIN_FORKNUM);
+
+		appendStringInfoString(buf, path);
+		appendStringInfo(buf, " persistence %d",
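The crash-safety idea in the patch above — create a mark file alongside an init fork, remove the mark at commit, and have startup remove any init fork whose mark file survived a crash — can be sketched as a toy simulation. This is illustrative Python, not PostgreSQL code; the file-name suffixes and function names are invented for the example.

```python
import os
import tempfile

def create_init_fork(relpath):
    """Create an init fork plus a mark file recording the in-progress change."""
    open(relpath + "_init", "w").close()
    open(relpath + "_init.mark", "w").close()

def commit(relpath):
    """At commit the mark file is removed; the init fork becomes permanent."""
    os.remove(relpath + "_init.mark")

def startup_reset(datadir):
    """Crash recovery: any init fork whose mark file survived is removed."""
    for name in os.listdir(datadir):
        if name.endswith("_init.mark"):
            os.remove(os.path.join(datadir, name[:-len(".mark")]))
            os.remove(os.path.join(datadir, name))

d = tempfile.mkdtemp()

# Committed change: the init fork survives a later restart.
create_init_fork(os.path.join(d, "16384"))
commit(os.path.join(d, "16384"))
startup_reset(d)
print(sorted(os.listdir(d)))  # ['16384_init']

# Crashed change: the mark file is still there, so startup removes the
# orphaned init fork along with the mark.
create_init_fork(os.path.join(d, "16385"))
startup_reset(d)
print(sorted(os.listdir(d)))  # ['16384_init']
```

The key property is that commit and crash are distinguished purely by whether the mark file still exists at the next startup.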
Re: row filtering for logical replication
On Tue, Dec 21, 2021 at 2:29 PM Peter Smith wrote:
>
> Here is the v51* patch set:
>
> Main changes from Euler's v50* are
> 1. Most of Euler's "fixes" patches are now merged back in
> 2. Patches are then merged per Amit's suggestion [Amit 20/12]
> 3. Some other review comments are addressed
>
> ~~
>
> Phase 1 - Merge the Euler fixes
> ===
>
> v51-0001 (main) <== v50-0001 (main 0001) + v50-0002 (fixes 0001)
> - OK, accepted and merged all the "fixes" back into the 0001
> - (fixed typos)
> - There is a slight disagreement with what the PG docs say about
>   NULLs, but I will raise a separate comment on -hackers later
>   (meanwhile, the current PG docs text is from the Euler patch)
>
> v51-0002 (validation) <== v50-0003 (validation 0002) + v50-0004 (fixes 0002)
> - OK, accepted and merged all these "fixes" back into the 0002
> - (fixed typo)
>
> v51-0003 (new/old) <== v50-0005 (new/old 0003)
> - REVERTED to the v49 version of this patch!
> - Please see Ajin's explanation why the v49 code was required [Ajin 21/12]
>
> v51-0004 (tab-complete + dump) <== v50-0006 (tab-complete + dump 0004)
> - No changes
>
> v51-0005 (for all tables) <== v50-0007 (for all tables 0005) +
> v50-0008 (fixes 0005)
> - OK, accepted and merged most of these "fixes" back into the 0005
> - (fixed typo/grammar)
> - Amit requested we not use Euler's new tablesync SQL just yet [Amit 21/12]
>
>
> Phase 2 - Merge main (Phase 1) patches per Amit's suggestion
> 
>
> v51-0001 (main) <== v51-0001 (main) + v51-0002 (validation) + v51-0005
> (for all tables)
> - (also combined all the commit comments)
>
> v51-0002 (new/old) <== v51-0003 (new/old)
>
> v51-0005 (tab-complete + dump) <== v51-0004
>
>
> Review comments (details)
> =
>
> v51-0001 (main)
> - Addressed review comments from Amit.
> [Amit 20/12] #1,#2,#3,#4
>
> v51-0002 (new/old tuple)
> - Includes a patch from Greg (provided internally)
>
> v51-0003 (tab, dump)
> - No changes
>
> --
> [Amit 20/12]
> https://www.postgresql.org/message-id/CAA4eK1JJgfEPKYvQAcwGa5jjuiUiQRQZ0Pgo-HF0KFHh-jyNQQ%40mail.gmail.com
> [Ajin 21/12]
> https://www.postgresql.org/message-id/CAFPTHDbfpPNh3GLGjySS%3DAuRfbQPQFNvfiyG1GDQW2kz1yT7Og%40mail.gmail.com
> [Amit 21/12]
> https://www.postgresql.org/message-id/CAA4eK1KwoA5k8v9z9e4ZPN_X%3D1GAmQmsWyauFwZpKiSHqy6eZA%40mail.gmail.com

Few comments:

1) list_free(schemarelids) is called both inside and outside the if. Can we
move it above the continue so that it does not get called in the break case:

+			schemarelids = GetAllSchemaPublicationRelations(pub->oid,
+															pub->pubviaroot ?
+															PUBLICATION_PART_ROOT :
+															PUBLICATION_PART_LEAF);
+			if (list_member_oid(schemarelids, entry->relid))
+			{
+				if (pub->pubactions.pubinsert)
+					no_filter[idx_ins] = true;
+				if (pub->pubactions.pubupdate)
+					no_filter[idx_upd] = true;
+				if (pub->pubactions.pubdelete)
+					no_filter[idx_del] = true;
+
+				list_free(schemarelids);
+
+				/* Quick exit loop if all pubactions have no row-filter. */
+				if (no_filter[idx_ins] && no_filter[idx_upd] && no_filter[idx_del])
+					break;
+
+				continue;
+			}
+			list_free(schemarelids);

2) create_edata_for_relation also uses similar logic; can it also call this
function to reduce duplicate code?

+static EState *
+create_estate_for_relation(Relation rel)
+{
+	EState	   *estate;
+	RangeTblEntry *rte;
+
+	estate = CreateExecutorState();
+
+	rte = makeNode(RangeTblEntry);
+	rte->rtekind = RTE_RELATION;
+	rte->relid = RelationGetRelid(rel);
+	rte->relkind = rel->rd_rel->relkind;
+	rte->rellockmode = AccessShareLock;
+	ExecInitRangeTable(estate, list_make1(rte));
+
+	estate->es_output_cid = GetCurrentCommandId(false);
+
+	return estate;
+}

3) In one place, select is in lower case; it can be changed to upper case:

+	resetStringInfo();
+	appendStringInfo(,
+					 "SELECT DISTINCT pg_get_expr(prqual, prrelid) "
+					 " FROM pg_publication p "
+					 " INNER JOIN pg_publication_rel pr ON (p.oid = pr.prpubid) "
+					 "WHERE pr.prrelid = %u AND p.pubname IN ( %s ) "
+					 "AND NOT (select
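The restructuring proposed in comment (1) — computing the membership test first, then freeing the list at a single point so that the break, continue, and fall-through paths all share one list_free() — can be illustrated with a small Python analogue. This is not the patch's C code; the names and data shapes are invented for the example, with a callback standing in for list_free().

```python
def pick_no_filter_flags(pubs, relid, get_schema_relids, free):
    """Set no-filter flags per pubaction; the relid list is freed exactly
    once per publication, regardless of which control-flow path is taken."""
    no_filter = {"ins": False, "upd": False, "del": False}
    for pub in pubs:
        schemarelids = get_schema_relids(pub)
        is_member = relid in schemarelids
        free(schemarelids)          # single free point covers every path
        if is_member:
            for action in ("ins", "upd", "del"):
                if pub["pubactions"][action]:
                    no_filter[action] = True
            if all(no_filter.values()):
                break               # quick exit: all pubactions unfiltered
            continue
        # ... per-relation row-filter handling would go here ...
    return no_filter

# Toy data: two publications that both publish relid 42 via a schema.
frees = []
pubs = [
    {"pubactions": {"ins": True, "upd": False, "del": False}},
    {"pubactions": {"ins": True, "upd": True, "del": True}},
]
flags = pick_no_filter_flags(pubs, 42, lambda p: [42], frees.append)
print(flags, len(frees))  # {'ins': True, 'upd': True, 'del': True} 2
```

The second publication triggers the quick-exit break, yet the free callback still runs once for each publication visited.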
Re: Failed transaction statistics to measure the logical replication progress
On Mon, Dec 20, 2021 at 8:40 PM osumi.takami...@fujitsu.com wrote:
>
> Updated the patch to address your concern.
>

Some review comments on the v18 patches:

v18-0002

doc/src/sgml/monitoring.sgml
(1) tablesync worker stats?

Shouldn't the comment below only mention the apply worker? (since we're no
longer recording stats of the tablesync worker)

+       Number of transactions that failed to be applied by the table
+       sync worker or main apply worker in this subscription. This
+       counter is updated after confirming the error is not same as
+       the previous one.
+

Also, it should say "... the error is not the same as the previous one."

src/backend/catalog/system_views.sql
(2) pgstat_report_subworker_xact_end()

Fix typo and some wording:

BEFORE:
+ * This should be called before the call of process_syning_tables() not to
AFTER:
+ * This should be called before the call of process_syncing_tables(), so to not

src/backend/postmaster/pgstat.c
(3) pgstat_send_subworker_xact_stats()

BEFORE:
+ * Send a subworker transaction stats to the collector.
AFTER:
+ * Send a subworker's transaction stats to the collector.

(4) Wouldn't it be best for:

+	if (!TimestampDifferenceExceeds(last_report, now, PGSTAT_STAT_INTERVAL))

to be:

+	if (last_report != 0 &&
+		!TimestampDifferenceExceeds(last_report, now, PGSTAT_STAT_INTERVAL))

?

(5) pgstat_send_subworker_xact_stats()

I think that the comment:

+	 * Clear out the statistics buffer, so it can be re-used.

should instead say:

+	 * Clear out the supplied statistics.

because the current comment implies that repWorker is pointed at the
MyLogicalRepWorker buffer (which it might be, but it shouldn't be relying
on that).

Also, I think that the function header should mention something like:
"The supplied repWorker statistics are cleared upon return, to assist
re-use by the caller."

Regards,
Greg Nancarrow
Fujitsu Australia
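The guard discussed in point (4) is the usual rate-limiting pattern: send the very first report unconditionally (last_report is still zero), then at most one report per interval. A toy model of that pattern, in Python rather than the backend's C (the class and constant here are invented for illustration):

```python
PGSTAT_STAT_INTERVAL = 0.5  # seconds; the backend uses milliseconds

class XactStatsReporter:
    """Rate-limit stats messages: the first report always goes out, and
    subsequent ones are suppressed until the interval has elapsed."""

    def __init__(self):
        self.last_report = 0.0
        self.sent = []

    def maybe_send(self, stats, now):
        # Explicit first-report check, then the interval check; this makes
        # the "zero means never reported" convention visible in the code.
        if self.last_report != 0 and (now - self.last_report) < PGSTAT_STAT_INTERVAL:
            return False
        self.last_report = now
        self.sent.append(stats)
        return True

r = XactStatsReporter()
print(r.maybe_send("a", 100.0))   # True  (first report always goes out)
print(r.maybe_send("b", 100.1))   # False (within the interval)
print(r.maybe_send("c", 100.7))   # True  (interval elapsed)
```

Writing the `last_report != 0` condition explicitly documents the intent instead of relying on how the timestamp comparison behaves for a zero value.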
Re: simplifying foreign key/RI checks
On Mon, Dec 20, 2021 at 5:17 AM Amit Langote wrote:
> On Mon, Dec 20, 2021 at 6:19 PM Zhihong Yu wrote:
> > On Sun, Dec 19, 2021 at 10:20 PM Amit Langote wrote:
> >>
> >> On Mon, Dec 20, 2021 at 2:00 PM Corey Huinker wrote:
> >> > Sorry for the delay. This patch no longer applies, it has some
> >> > conflict with d6f96ed94e73052f99a2e545ed17a8b2fdc1fb8a
> >>
> >> Thanks Corey for the heads up. Rebased with some cosmetic adjustments.
> >>
> > Hi,
> >
> > + Assert(partidx < 0 || partidx < partdesc->nparts);
> > + partoid = partdesc->oids[partidx];
> >
> > If partidx < 0, do we still need to fill out partoid and is_leaf? It
> > seems we can return early based on (should call table_close(rel) first):
> >
> > + /* No partition found. */
> > + if (partidx < 0)
> > + return NULL;
>
> Good catch, thanks. Patch updated.
>

Hi,

+	int			lockflags = 0;
+	TM_Result	test;
+
+	lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;

The above assignment can be merged with the line where the variable
lockflags is declared.

+	GetUserIdAndSecContext(_userid, _sec_context);

save_userid -> saved_userid
save_sec_context -> saved_sec_context

+	 * the transaction-snapshot mode. If we didn't push one already, do

didn't push -> haven't pushed

For ri_PerformCheck():

+	bool		source_is_pk = true;

It seems the value of source_is_pk doesn't change - the value true can be
plugged into the ri_ExtractValues() calls directly.

Cheers
Re: In-place persistence change of a relation
Ugh! I completely forgot about TAP tests.. Thanks for the testing and
sorry for the bugs.

This is a bit big change, so I need a bit of time before posting the
next version.

At Mon, 20 Dec 2021 13:38:35 +, Jakub Wartak wrote in
> 1) check-world seems OK but make -C src/test/recovery check shows a couple
> of failing tests here locally and in
> https://cirrus-ci.com/task/4699985735319552?logs=test#L807 :
> t/009_twophase.pl (Wstat: 256 Tests: 24 Failed: 1)
>   Failed test: 21
>   Non-zero exit status: 1

PREPARE TRANSACTION requires uncommitted file creation to be committed.
Concretely, we need to remove the "mark" files for relation files created
in-transaction during PREPARE TRANSACTION. pendingSync is not a parallel
mechanism with pendingDeletes, so we cannot move mark deletion to
pendingSync. In the end, I decided to add a separate list, pendingCleanups,
for pending non-deletion tasks, kept apart from pendingDeletes, and to
execute it before inserting the commit record.

Not only the above, but all of the following failures vanished with the
change as well.

> t/014_unlogged_reinit.pl (Wstat: 512 Tests: 12 Failed: 2)
>   Failed tests: 9-10
>   Non-zero exit status: 2
> t/018_wal_optimize.pl (Wstat: 7424 Tests: 0 Failed: 0)
>   Non-zero exit status: 29
>   Parse errors: Bad plan. You planned 38 tests but ran 0.
> t/022_crash_temp_files.pl (Wstat: 7424 Tests: 6 Failed: 0)
>   Non-zero exit status: 29
>   Parse errors: Bad plan. You planned 9 tests but ran 6.
> 018 made no sense. I've tried to take a quick look with wal_level=minimal
> at why it is failing; it is a mystery to me, as the sequence seems to be
> pretty basic but the outcome is not:

I think this shares the same cause.

> ~> cat repro.sql
> create tablespace tbs1 location '/tbs1';
> CREATE TABLE moved (id int);
> INSERT INTO moved VALUES (1);
> BEGIN;
> ALTER TABLE moved SET TABLESPACE tbs1;
> CREATE TABLE originated (id int);
> INSERT INTO originated VALUES (1);
> CREATE UNIQUE INDEX ON originated(id) TABLESPACE tbs1;
> COMMIT;
..
> ERROR: could not open file "base/32833/32839": No such file or directory
> z3=# \dt+
>                             List of relations
>  Schema |    Name    | Type  |  Owner   | Persistence |  Size   | Description
> --------+------------+-------+----------+-------------+---------+-------------
>  public | moved      | table | postgres | permanent   | 0 bytes |
>  public | originated | table | postgres | permanent   | 0 bytes |
>
> This happens even without placing on a tablespace at all (for the
> originated table, but not for the moved one); some major mishap is there
> (commit should guarantee correctness) or I'm tired and having sloppy
> fingers.
>
> 2) minor one testcase, still something is odd.
>
> drop tablespace tbs1;
> create tablespace tbs1 location '/tbs1';
> CREATE UNLOGGED TABLE t4 (a int) tablespace tbs1;
> CREATE UNLOGGED TABLE t5 (a int) tablespace tbs1;
> CREATE UNLOGGED TABLE t6 (a int) tablespace tbs1;
> CREATE TABLE t7 (a int) tablespace tbs1;
> insert into t7 values (1);
> insert into t5 values (1);
> insert into t6 values (1);
> \dt+
>                           List of relations
>  Schema | Name | Type  |  Owner   | Persistence |    Size    | Description
> --------+------+-------+----------+-------------+------------+-------------
>  public | t4   | table | postgres | unlogged    | 0 bytes    |
>  public | t5   | table | postgres | unlogged    | 8192 bytes |
>  public | t6   | table | postgres | unlogged    | 8192 bytes |
>  public | t7   | table | postgres | permanent   | 8192 bytes |
> (4 rows)
>
> ALTER TABLE ALL IN TABLESPACE tbs1 set logged;
> ==> STILL WARNING: unrecognized node type: 349
> \dt+
>                           List of relations
>  Schema | Name | Type  |  Owner   | Persistence |    Size    | Description
> --------+------+-------+----------+-------------+------------+-------------
>  public | t4   | table | postgres | permanent   | 0 bytes    |
>  public | t5   | table | postgres | permanent   | 8192 bytes |
>  public | t6   | table | postgres | permanent   | 8192 bytes |
>  public | t7   | table | postgres | permanent   | 8192 bytes |
>
> So it did rewrite, however this warning seems to be unfixed. I've tested
> on e2c52beecdea152ca680a22ef35c6a7da55aa30f.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
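The ordering constraint behind the pendingCleanups list described upthread — cleanup actions may need WAL, so they must run just before the commit record, while deletes are irreversible and must wait until after commit — can be modeled with a toy event log. This is illustrative Python with invented names, not the actual backend code.

```python
class Transaction:
    """Toy model of separating pendingCleanups (executed just before the
    commit record, so they can still be WAL-logged) from pendingDeletes
    (executed only after the commit outcome is durable)."""

    def __init__(self, log):
        self.log = log
        self.pending_cleanups = []   # e.g. "remove mark file for relfilenode X"
        self.pending_deletes = []    # e.g. "unlink files of dropped relation Y"

    def commit(self):
        # Cleanups may emit WAL, so they must precede the commit record.
        for action in self.pending_cleanups:
            self.log.append("cleanup:" + action)
        self.log.append("COMMIT")
        # File unlinks are irreversible, so they run only once the commit
        # record is on disk.
        for action in self.pending_deletes:
            self.log.append("unlink:" + action)

log = []
xact = Transaction(log)
xact.pending_cleanups.append("mark-16384")
xact.pending_deletes.append("rel-16385")
xact.commit()
print(log)  # ['cleanup:mark-16384', 'COMMIT', 'unlink:rel-16385']
```

The event log makes the invariant visible: every cleanup precedes the commit record, and every unlink follows it.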