Re: row filtering for logical replication
On Thu, Dec 23, 2021 at 7:23 PM Peter Smith wrote: > > Here is the v54* patch set: > > Main changes from v53* are > 1. All files of all three patches have been pgindented. > 2. Another review comment is addressed > > ~~ > > Details > === > > v51-0001 (main) > - pgindent for all source files if this patch > > v51-0002 (new/old tuple) > - pgindent for all source files of this patch > - Merged the Euler v50-0005 (pgoutput.c) comments as suggested [Amit 21/12] #5 > - Also updated the commit message to include the Euler v50-0005 commit > message text > > v51-0003 (tab, dump) > - pgindent for all source files of this patch > Sorry for the confusing cut/paste typos in the previous mail just sent. Of course, the "details" should refer to v54* not v51* ~ Here is the v54* patch set: Main changes from v53* are 1. All files of all three patches have been pgindented. 2. Another review comment is addressed ~~ Details === v54-0001 (main) - pgindent for all source files of this patch v54-0002 (new/old tuple) - pgindent for all source files of this patch - Merged the Euler v50-0005 (pgoutput.c) comments as suggested [Amit 21/12] #5 - Also updated the commit message to include the Euler v50-0005 commit message text v54-0003 (tab, dump) - pgindent for all source files of this patch -- [Amit 21/12] https://www.postgresql.org/message-id/CAA4eK1JgdhDnAvFV-eEWcqMmXYwo9kmCE1wA17xWGE621e8WDg%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: Add index scan progress to pg_stat_progress_vacuum
On Tue, Dec 21, 2021 at 3:37 AM Peter Geoghegan wrote: > > On Wed, Dec 15, 2021 at 2:10 PM Bossart, Nathan wrote: > > nitpick: Shouldn't index_blks_scanned be index_blks_vacuumed? IMO it > > is more analogous to heap_blks_vacuumed. > > +1. > > > This will tell us which indexes are currently being vacuumed and the > > current progress of those operations, but it doesn't tell us which > > indexes have already been vacuumed or which ones are pending vacuum. > > VACUUM will process a table's indexes in pg_class OID order (outside > of parallel VACUUM, I suppose). See comments about sort order above > RelationGetIndexList(). Right. > > Anyway, it might be useful to add ordinal numbers to each index, that > line up with this processing/OID order. It would also be reasonable to > display the same number in log_autovacuum* (and VACUUM VERBOSE) > per-index output, to reinforce the idea. Note that we don't > necessarily display a distinct line for each distinct index in this > log output, which is why including the ordinal number there makes > sense. An alternative idea would be to show the number of indexes on the table and the number of indexes that have been processed in the leader's entry of pg_stat_progress_vacuum. Even in parallel vacuum cases, since we have index vacuum status for each index it would not be hard for the leader process to count how many indexes have been processed. Regarding the details of the progress of index vacuum, I'm not sure this progress information can fit into pg_stat_progress_vacuum. As Peter already mentioned, the behavior varies quite a bit depending on the index AM. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
RE: Failed transaction statistics to measure the logical replication progress
On Wednesday, December 22, 2021 10:30 PM osumi.takami...@fujitsu.com wrote: > On Wednesday, December 22, 2021 8:38 PM I wrote: > > Do we expect these commit counts which come from empty transactions ? > This is another issue discussed in [1] > where the patch in the thread is a work in progress, I think. > .. > IMHO, the conclusion is we are currently in the middle of fixing the behavior. Thank you for telling me this. After applying v19-* and v15-0001-Skip-empty-transactions-for-logical-replication.patch, I retested the v19-* patches. The result of the previous case looks good to me. But the results of the following cases are similar to the previous unexpected results, where commit_count or abort_count is increased unexpectedly. [1] (Based on the environment in the previous example, set TWO_PHASE=true) [Publisher] begin; insert into replica_test1 values(1,'1'); prepare transaction 'id'; commit prepared 'id'; On the subscriber side, the commit_count of two records (sub1 and sub2) is increased. [2] (Based on the environment in the previous example, set STREAMING=on) [Publisher] begin; INSERT INTO replica_test1 SELECT i, md5(i::text) FROM generate_series(1, 5000) s(i); commit; On the subscriber side, the commit_count of two records (sub1 and sub2) is increased. [3] (Based on the environment in the previous example, set TWO_PHASE=true) [Publisher] begin; insert into replica_test1 values(1,'1'); prepare transaction 'id'; rollback prepared 'id'; On the subscriber side, the abort_count of two records (sub1 and sub2) is increased. I think the problem may be that the patch you mentioned (Skip-empty-transactions-for-logical-replication.patch) is not finished yet. I am sharing this information here. Regards, Wang wei
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
On Thu, Dec 23, 2021 at 5:53 AM SATYANARAYANA NARLAPURAM wrote: > > Hi Hackers, > > I am considering implementing RPO (recovery point objective) enforcement > feature for Postgres where the WAL writes on the primary are stalled when the > WAL distance between the primary and standby exceeds the configured > (replica_lag_in_bytes) threshold. This feature is useful particularly in the > disaster recovery setups where primary and standby are in different regions > and synchronous replication can't be set up for latency and performance > reasons yet requires some level of RPO enforcement. +1 for the idea in general. However, blocking writes on the primary seems like an extremely radical idea. The replicas can fall behind transiently at times, and blocking writes on the primary may cause applications to fail during these transient periods. This is not a problem if the applications have retry logic for the writes. How about blocking writes on the primary only if the replicas fall behind it for a certain period of time? > The idea here is to calculate the lag between the primary and the standby > (Async?) server during XLogInsert and block the caller until the lag is less > than the threshold value. We can calculate the max lag by iterating over > ReplicationSlotCtl->replication_slots. The "falling behind" can also be quantified by the number of write-transactions on the primary. I think it's good to have the users choose what "falling behind" means for them. We can have something like the "recovery_target" param with different options: name, xid, time, lsn. > If this is not something we don't want to do in the core, at least adding a > hook for XlogInsert is of great value.
IMHO, this feature may not be needed by everyone; the hook-way seems reasonable so that postgres vendors can provide different implementations (for instance, they can write an extension that implements this hook which can block writes on the primary, write some log messages, inform some service layer of the replicas falling behind the primary, etc.). If we were to have the hook in XLogInsert, which gets called very frequently and is a hot path, the hook really should do as little work as possible; otherwise the latency of write operations may increase. > A few other scenarios I can think of with the hook are: > > Enforcing RPO as described above > Enforcing rate limit and slow throttling when sync standby is falling behind > (could be flush lag or replay lag) > Transactional log rate governance - useful for cloud providers to provide SKU > sizes based on allowed WAL writes. > > Thoughts? The hook can help to achieve the above objectives, but where to place it and what parameters it should take as input (or what info it should emit out of the server via the hook) are important too. Having said all that, the RPO feature can also be implemented outside of postgres. A simple implementation could be: get the primary's current WAL LSN using pg_current_wal_lsn and all the replicas' restart_lsn values using pg_replication_slots; if they differ by a certain amount, then issue the ALTER SYSTEM SET READ ONLY command [1] on the primary. This requires connections to the server and proper access rights. This feature can also be implemented as an extension (without the hook) which doesn't require any connections to the server yet can access the required info (the primary's current_wal_lsn, the restart_lsn of the replication slots, etc.), but the RPO enforcement may not be immediate as the server doesn't have any hooks in XLogInsert or some other area. [1] - https://www.postgresql.org/message-id/CAAJ_b967uKBiW6gbHr5aPzweURYjEGv333FHVHxvJmMhanwHXA%40mail.gmail.com Regards, Bharath Rupireddy.
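To make the lag arithmetic in the external-monitoring approach above concrete, here is a small illustrative sketch (hypothetical helper names, not code from any patch in this thread). It relies only on the documented textual pg_lsn format, where the part before the slash is the high 32 bits and the part after is the low 32 bits, both in hex.

```python
def parse_lsn(lsn):
    """Convert a textual LSN such as '0/2CA0' to a 64-bit byte position.

    The part before the slash is the high 32 bits and the part after is
    the low 32 bits, both hexadecimal (PostgreSQL's pg_lsn text format).
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def max_replica_lag(current_lsn, slot_restart_lsns):
    """Largest distance, in bytes, between the primary's current WAL
    position and any replication slot's restart_lsn."""
    cur = parse_lsn(current_lsn)
    return max(cur - parse_lsn(lsn) for lsn in slot_restart_lsns)


# A monitoring loop would compare this against the configured threshold
# (replica_lag_in_bytes in the proposal) and react, e.g. by making the
# primary read-only until the standbys catch up.
lag = max_replica_lag("1/0", ["0/FFFFF000", "0/FF000000"])
print(lag)  # 16777216 -- the slot at 0/FF000000 is 16 MB behind
```

Server-side, the same computation is what pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) returns for each row of pg_replication_slots, so a real monitor would typically just run that query rather than parse LSNs itself.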
Re: Add index scan progress to pg_stat_progress_vacuum
On 21/12/2021 00:05, Peter Geoghegan wrote: * Some index AMs don't work like nbtree and GiST in that they cannot do their scan sequentially -- they have to do something like a logical/keyspace order scan instead, which is *totally* different to heapam (not just a bit different). There is no telling how many times each page will be accessed in these other index AMs, and in what order, even under optimal conditions. We should arguably not even try to provide any granular progress information here, since it'll probably be too messy. Maybe we could add callbacks to the AM interface for the send/receive/representation implementation of progress? Each AM would then define a set of parameters to send to the stats collector and show to users. -- regards, Andrey Lepikhov Postgres Professional
Re: Logical replication timeout problem
On Wed, Dec 22, 2021 at 8:50 PM Fabrice Chapuis wrote: > > Hello Amit, > > I was able to reproduce the timeout problem in the lab. > After loading more than 20 millions of rows in a table which is not > replicated (insert command ends without error), errors related to logical > replication processes appear in the postgres log. > Approximately every 5 minutes worker process is restarted. The snap files in > the slot directory are still present. The replication system seems to be > blocked. Why these snap files are not removed. What do they contain? > These contain the changes from the insert. I think these are not removed in your case because your long transaction never finishes. As mentioned earlier, for such cases, it is better to use the 'streaming' feature released as part of PG-14, but anyway here we are trying to debug the timeout problem. > I will recompile postgres with your patch to debug. > Okay, that might help. -- With Regards, Amit Kapila.
skip replication slot snapshot/map file removal during end-of-recovery checkpoint
Hi, Currently the end-of-recovery checkpoint can be much slower, impacting the server availability, if there are many replication slot files (.snap or map-) to be enumerated and deleted. How about skipping the .snap and map- file handling during the end-of-recovery checkpoint? It makes the server available faster, and the next regular checkpoint can deal with these files. If required, we can have a GUC (skip_replication_slot_file_handling or some other better name) to control this, with the default being the existing behavior. Thoughts? Regards, Bharath Rupireddy.
correct the sizes of values and nulls arrays in pg_control_checkpoint
Hi, pg_control_checkpoint emits 18 columns whereas the values and nulls arrays are defined to be of size 19. Although it's not critical, attaching a tiny patch to fix this. diff --git a/src/backend/utils/misc/pg_controldata.c b/src/backend/utils/misc/pg_controldata.c index 209a20a882..b1db9a8d07 100644 --- a/src/backend/utils/misc/pg_controldata.c +++ b/src/backend/utils/misc/pg_controldata.c @@ -79,8 +79,8 @@ pg_control_system(PG_FUNCTION_ARGS) Datum pg_control_checkpoint(PG_FUNCTION_ARGS) { - Datum values[19]; - boolnulls[19]; + Datum values[18]; + boolnulls[18]; TupleDesc tupdesc; HeapTuple htup; ControlFileData *ControlFile; Regards, Bharath Rupireddy. v1-0001-correct-the-sizes-of-values-and-nulls-arrays-in-p.patch Description: Binary data
Re: psql - add SHOW_ALL_RESULTS option
Hello Peter, I finally took some time to look at this. Attached v11 is a rebase. This patch still has a few of the problems reported earlier this year. In [0], it was reported that certain replication commands result in infinite loops because of faulty error handling. This still happens. I wrote a test for it, attached here. (I threw in a few more basic tests, just to have some more coverage that was lacking, and to have a file to put the new test in.) Hmmm… For some unclear reason, on errors in a PGRES_COPY_* state, PQgetResult keeps returning an empty result. PQexec manually ignores it, so I did the same with a comment, but for me the real bug is somehow in PQgetResult's behavior… In [1], it was reported that server crashes produce duplicate error messages. This also still happens. I didn't write a test for it, maybe you have an idea. (Obviously, we could check whether the error message is literally there twice in the output, but that doesn't seem very general.) But it's easy to test manually: just have psql connect, shut down the server, then run a query. This is also a feature/bug of libpq which happens to be hidden by PQexec: when one command crashes, PQgetResult actually returns *2* results. The first one carries the FATAL message; the second one comes when libpq figures out that the connection was lost, with the second message appended to the first. PQexec just happens to silently ignore the first result. I added a manual reset of the error message when it is first shown so that it is not shown twice. It is unclear to me whether the reset should be somewhere in libpq instead. I added a voluntary crash at the end of the psql test. Attached v12 somehow fixes these issues in "psql" code rather than in libpq.
-- Fabien.diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out index e0abe34bb6..8f7f93172a 100644 --- a/contrib/pg_stat_statements/expected/pg_stat_statements.out +++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out @@ -50,8 +50,28 @@ BEGIN \; SELECT 2.0 AS "float" \; SELECT 'world' AS "text" \; COMMIT; + float +--- + 2.0 +(1 row) + + text +--- + world +(1 row) + -- compound with empty statements and spurious leading spacing \;\; SELECT 3 + 3 \;\;\; SELECT ' ' || ' !' \;\; SELECT 1 + 4 \;; + ?column? +-- +6 +(1 row) + + ?column? +-- + ! +(1 row) + ?column? -- 5 @@ -61,6 +81,11 @@ COMMIT; SELECT 1 + 1 + 1 AS "add" \gset SELECT :add + 1 + 1 AS "add" \; SELECT :add + 1 + 1 AS "add" \gset + add +- + 5 +(1 row) + -- set operator SELECT 1 AS i UNION SELECT 2 ORDER BY i; i diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index ae38d3dcc3..1d411ae124 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -127,18 +127,11 @@ echo '\x \\ SELECT * FROM foo;' | psql commands included in the string to divide it into multiple transactions. (See for more details about how the server handles multi-query strings.) - Also, psql only prints the - result of the last SQL command in the string. - This is different from the behavior when the same string is read from - a file or fed to psql's standard input, - because then psql sends - each SQL command separately. - Because of this behavior, putting more than one SQL command in a - single -c string often has unexpected results. 
- It's better to use repeated -c commands or feed - multiple commands to psql's standard input, + If having several commands executed in one transaction is not desired, + use repeated -c commands or feed multiple commands to + psql's standard input, either using echo as illustrated above, or via a shell here-document, for example: @@ -3564,10 +3557,6 @@ select 1\; select 2\; select 3; commands included in the string to divide it into multiple transactions. (See for more details about how the server handles multi-query strings.) -psql prints only the last query result -it receives for each request; in this example, although all -three SELECTs are indeed executed, psql -only prints the 3. @@ -4154,6 +4143,18 @@ bar +SHOW_ALL_RESULTS + + +When this variable is set to off, only the last +result of a combined query (\;) is shown instead of +all of them. The default is on. The off behavior +is for compatibility with older versions of psql. + + + + + SHOW_CONTEXT diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c index ec975c3e2a..e06699878b 100644 --- a/src/bin/psql/common.c +++ b/src/bin/psql/common.c @@ -33,6 +33,8 @@ static bool DescribeQue
Re: Re: Re: Re: Re: Re: Is it worth pushing conditions to sublink/subplan?
Fixed a bug found during testing. Wenjing -- Original Message -- From: 曾文旌(义从) Sent: Sun Dec 12 20:51:08 2021 To: Zhihong Yu Cc: Tomas Vondra , wjzeng , PostgreSQL Hackers , shawn wang , ggys...@gmail.com Subject: Re: Re: Re: Re: Is it worth pushing conditions to sublink/subplan? -- Original Message -- From: Zhihong Yu Sent: Sun Dec 12 01:13:11 2021 To: 曾文旌(义从) Cc: Tomas Vondra , wjzeng , PostgreSQL Hackers , shawn wang , ggys...@gmail.com Subject: Re: Re: Re: Is it worth pushing conditions to sublink/subplan? On Sat, Dec 11, 2021 at 7:31 AM 曾文旌(义从) wrote: -- Original Message -- From: Tomas Vondra Sent: Wed Dec 8 11:26:35 2021 To: 曾文旌(义从) , shawn wang , ggys...@gmail.com , PostgreSQL Hackers Cc: wjzeng Subject: Re: Re: Is it worth pushing conditions to sublink/subplan? Hi, On 12/7/21 10:44, 曾文旌(义从) wrote: > Hi Hackers > > For my previous proposal, I developed a prototype and passed > regression testing. It works similarly to subquery's qual pushdown. > We know that sublink expands at the beginning of each level of > query. At this stage, The query's conditions and equivalence classes > are not processed. But after generate_base_implied_equalities the > conditions are processed, which is why qual can push down to > subquery but sublink not. > > My POC implementation chose to delay the sublink expansion in the > SELECT clause (targetList) and where clause. Specifically, it is > delayed after generate_base_implied_equalities. Thus, the equivalent > conditions already established in the Up level query can be easily > obtained in the sublink expansion process (make_subplan). > > For example, if the up level query has a.id = 10 and the sublink > query has a.id = b.id, then we get b.id = 10 and push it down to the > sublink quey. If b is a partitioned table and is partitioned by id, > then a large number of unrelated subpartitions are pruned out, This > optimizes a significant amount of Planner and SQL execution time, > especially if the partitioned table has a large number of > subpartitions and is what I want.
> > Currently, There were two SQL failures in the regression test, > because the expansion order of sublink was changed, which did not > affect the execution result of SQL. > > Look forward to your suggestions on this proposal. > I took a quick look, and while I don't see / can't think of any problems with delaying it until after generating implied equalities, there seems to be a number of gaps. Thank you for your attention. 1) Are there any regression tests exercising this modified behavior? Maybe there are, but if the only changes are due to change in order of targetlist entries, that doesn't seem like a clear proof. It'd be good to add a couple tests exercising both the positive and negative case (i.e. when we can and can't pushdown a qual). I added several samples to the regress(qual_pushdown_to_sublink.sql). and I used the partition table to show the plan status of qual being pushed down into sublink. Hopefully this will help you understand the details of this patch. Later, I will add more cases. 2) apparently, contrib/postgres_fdw does crash like this: #3 0x0077b412 in adjust_appendrel_attrs_mutator (node=0x13f7ea0, context=0x7fffc3351b30) at appendinfo.c:470 470 Assert(!IsA(node, SubLink)); (gdb) p node $1 = (Node *) 0x13f7ea0 (gdb) p *node $2 = {type = T_SubLink} Backtrace attached. For the patch attached in the last email, I passed all the tests under src/test/regress. As you pointed out, there was a problem with regression under contrib(in contrib/postgres_fdw). This time I fixed it and the current patch (V2) can pass the check-world. 3) various parts of the patch really need at least some comments, like: - try_push_outer_qual_to_sublink_query really needs some docs - new stuff at the end of initsplan.c Ok, I added some comments and will add more. If you have questions about any details, please point them out directly. 4) generate_base_implied_equalities shouldn't this if (ec->ec_processed) ; really be? if (ec->ec_processed) continue; You are right. 
I fixed it. 5) I'm not sure why we need the new ec_processed flag. I did this to eliminate duplicate equalities from the two generate_base_implied_equalities calls 1) I need the base equivalent expression generated after generate_base_implied_equalities, which is used to pushdown qual to sublink(lazy_process_sublinks) 2) The expansion of sublink may result in an equivalent expression with parameters, such as a = $1, which needs to deal with the equivalence classes again. 3) So, I added ec_processed and asked to process it again (generate_base_implied_equalities) after the equivalence class changed (add_eq_member/process_equivalence). Maybe you have a better suggestion, please let me know. 6) So we now have lazy_process_sublink callback? Does that mea
Re: more descriptive message for process termination due to max_slot_wal_keep_size
On Wed, Dec 15, 2021 at 9:42 AM Kyotaro Horiguchi wrote: > > At Tue, 14 Dec 2021 19:31:21 +0530, Ashutosh Bapat > wrote in > > On Tue, Dec 14, 2021 at 9:35 AM Kyotaro Horiguchi > > wrote: > > > > [17605] LOG: terminating process 17614 to release replication slot "s1" > > > + [17605] DETAIL: The slot's restart_lsn 0/2CA0 exceeds > > > max_slot_wal_keep_size. > > > > [17614] FATAL: terminating connection due to administrator command > > > > [17605] LOG: invalidating slot "s1" because its restart_lsn 0/2CA0 > > > > exceeds max_slot_wal_keep_size > > > > > > Somewhat the second and fourth lines look inconsistent each other but > > > that wouldn't be such a problem. I don't think we want to concatenate > > > the two lines together as the result is a bit too long. > > > > > > > LOG: terminating process 17614 to release replication slot "s1" > > > > because it's restart_lsn 0/2CA0 exceeds max_slot_wal_keep_size. > > > > > > What do you think about this? > > > > Agree. I think we should also specify the restart_lsn value which > > would be within max_slot_wal_keep_size for better understanding. > > Thanks! It seems to me the main message of the "invalidating" log has > no room for further detail. So I split the reason out to DETAILS line > the same way with the "terminating" message in the attached second > patch. (It is separated from the first patch just for review) I > believe someone can make the DETAIL message simpler or more natural. > > The attached patch set emits the following message. > > > LOG: invalidating slot "s1" > > DETAIL: The slot's restart_lsn 0/1D68 is behind the limit 0/1100 > > defined by max_slot_wal_keep_size. > > The second line could be changed like the following or anything other. > > > DETAIL: The slot's restart_lsn 0/1D68 got behind the limit 0/1100 > > determined by max_slot_wal_keep_size. > . > The second version looks better as it gives more details. I am fine with either of the above wordings. 
I would prefer everything in the same message though, since "invalidating slot ..." is too short a LOG message. Not everybody has details enabled all the time. -- Best Wishes, Ashutosh Bapat
pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary
Hi, pg_archivecleanup currently takes a WAL file name as input and deletes the WAL files prior to it [1]. As suggested by Satya (cc-ed) in the pg_replslotdata thread [2], can we enhance pg_archivecleanup to automatically detect the last checkpoint LSN (from the control file), calculate the lowest restart_lsn required by the replication slots, if any (by reading the replication slot info from the pg_logical directory), and archive the unneeded WAL files (an archive_command similar to the one provided in the server config can be provided as an input) before finally deleting them? Making the pg_archivecleanup tool an end-to-end solution will help greatly in disk-full situations caused by WAL file growth (inactive replication slots, archive command failures, infrequent checkpoints, etc.). Thoughts? [1] - When used as a standalone program all WAL files logically preceding the oldestkeptwalfile will be removed from archivelocation. https://www.postgresql.org/docs/devel/pgarchivecleanup.html [2] - https://www.postgresql.org/message-id/CAHg%2BQDc9xwN7EmuONT3T91pCqFG6Q-BCe6B-kM-by7r1uPEicg%40mail.gmail.com Regards, Bharath Rupireddy.
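As a reference point for the enhancement proposed above, the core rule pg_archivecleanup applies today is simple: WAL segment file names are fixed-width hex strings that sort in WAL order, so every name that lexicographically precedes the oldest file to keep is removable. The following is only a rough illustrative model of that selection rule (hypothetical function name; it deliberately ignores details such as compressed-file extensions and .backup files):

```python
import os
import tempfile


def removable_wal_files(archive_dir, oldest_kept):
    """List archived WAL segments that logically precede oldest_kept.

    WAL segment names are 24 hex digits (timeline, log, segment), so
    plain string comparison matches WAL order.  This is only a sketch of
    pg_archivecleanup's selection rule, not a faithful reimplementation.
    """
    hexdigits = set("0123456789ABCDEF")
    doomed = []
    for name in sorted(os.listdir(archive_dir)):
        base = name[:24]
        if len(base) == 24 and set(base) <= hexdigits and base < oldest_kept[:24]:
            doomed.append(name)
    return doomed


# Demo against a throwaway directory holding three fake segments.
with tempfile.TemporaryDirectory() as d:
    for seg in ("000000010000000000000001",
                "000000010000000000000002",
                "000000010000000000000003"):
        open(os.path.join(d, seg), "w").close()
    print(removable_wal_files(d, "000000010000000000000003"))
    # ['000000010000000000000001', '000000010000000000000002']
```

The proposal would, in effect, compute oldest_kept automatically from the control file's checkpoint LSN and the slots' restart_lsn values, and run the archive step before the deletion, instead of requiring the user to supply the cutoff file name.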
Re: Multi-Column List Partitioning
On Tue, Dec 21, 2021 at 6:34 PM Nitin Jadhav wrote: > > --- > > > + if (isnulls && isnulls[i]) > > + cmpval = 0; /* NULL "=" NULL */ > > + else > > + cmpval = 1; /* NULL ">" not-NULL */ > > + } > > + else if (isnulls && isnulls[i]) > > + cmpval = -1;/* not-NULL "<" NULL */ > > > > I really doubt this assumption is correct; aren't those strict operators? > > Now there are possibilities of multiple NULL values. We should have a > mechanism to sort it when the bound values contain Non NULL and NULL > values. As per the above logic we put the NULL values at the end. > Please let me know if I am wrong. Ok, but I am not sure about the comparison approach, let's see what others think. > --- [...] > > > typedef struct PartitionBoundInfoData > > { > >charstrategy; /* hash, list or range? */ > > + int partnatts; /* number of partition key columns */ > >int ndatums;/* Length of the datums[] array */ > >Datum **datums; > > + bool **isnulls; > > > > Adding "partnatts" to this struct seems to be unnecessary, AFAIUC, > > added that for partition_bound_accepts_nulls(), but we can easily get > > that value from the partitioning key & pass an additional argument. > > Also, no information about the length of the "isnulls" array. > > This is required during merge_list_bounds(). AFAIK partition key > information is not available here. > You can get that as an argument, see merge_range_bounds(). > > I think it would be helpful if you could split the patch: one for > > multi-value list partitioning and another for the partition wise join, > > thanks. > > I have split the patch into 2 patches. One is for the multi column > list partitioning core changes and the other is for partition-wise > join support. Each patch has its respective test cases in the > regression suit and regression tests run successfully on each patch. > Kindly let me know if any other changes are required here. 
> Thanks for the split, that is much helpful. I have a few comments for the 0001 patch as follows: + char **colname = (char **) palloc0(partnatts * sizeof(char *)); palloc0 is unnecessary. --- + foreach(cell2, rowexpr->args) + { + int idx = foreach_current_index(cell2); + Node*expr = lfirst(cell2); + Const*val = + transformPartitionBoundValue(pstate, expr, colname[i], + get_partition_col_typid(key, idx), + get_partition_col_typmod(key, idx), + get_partition_col_collation(key, idx)); + + values = lappend(values, val); + } Array index for colname should be "idx". --- result->scan_default = partition_bound_has_default(boundinfo); + return result; ... /* Always include the default partition if any. */ result->scan_default = partition_bound_has_default(boundinfo); - return result; ... else result->scan_default = partition_bound_has_default(boundinfo); + return result; ... - /* Add columns specified to SET NULL or SET DEFAULT if provided. */ + /* +* Add columns specified to SET NULL or SET DEFAULT if +* provided. +*/ spurious change -- looks like something not related to your patch. -- -* For range partitioning, we must only perform pruning with values -* for either all partition keys or a prefix thereof. +* For range partitioning and list partitioning, we must only perform +* pruning with values for either all partition keys or a prefix +* thereof. */ - if (keyno > nvalues && context->strategy == PARTITION_STRATEGY_RANGE) + if (keyno > nvalues && (context->strategy == PARTITION_STRATEGY_RANGE || + context->strategy == PARTITION_STRATEGY_LIST)) break; I think this is not true for multi-value list partitions; we might still want to prune partitions for e.g. (100, IS NULL, 20). Correct me if I am missing something here. --- /* -* For range partitioning, if we have no clauses for the current key, -* we can't consider any later keys either, so we can stop here. 
+* For range partitioning and list partitioning, if we have no clauses +* for the current key, we can't consider any later keys either, so we +* can stop here. */ - if (part_scheme->strategy == PARTITION_STRATEGY_RANGE && + if ((part_scheme->strategy == PARTITION_STRATEGY_RANGE || +part_scheme->strategy == PARTITION_STRATEGY_LIST) && clauselist == NIL) break Similarly, why would this be true for list partitioning? How can we prune partitions if values is for e.g. (100, , 20). -- - if (bms_is_member(keyno, opstep->nullkeys)) + if (bms_is_member(keyno, opstep->nullkeys) && + context->strategy != PARTITION_STRATEGY_LIST) continue; Will that p
Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)
Hello Justin, It seems that the v31 patch does not apply anymore: postgresql> git apply ~/v31-0001-Document-historic-behavior-of-links-to-directori.patch error: patch failed: doc/src/sgml/func.sgml:27410 error: doc/src/sgml/func.sgml: patch does not apply -- Fabien.
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
On Thu, Dec 23, 2021 at 5:53 AM SATYANARAYANA NARLAPURAM wrote: > > Hi Hackers, > > I am considering implementing RPO (recovery point objective) enforcement > feature for Postgres where the WAL writes on the primary are stalled when the > WAL distance between the primary and standby exceeds the configured > (replica_lag_in_bytes) threshold. This feature is useful particularly in the > disaster recovery setups where primary and standby are in different regions > and synchronous replication can't be set up for latency and performance > reasons yet requires some level of RPO enforcement. Limiting the transaction rate when the standby falls behind is a good feature ... > > The idea here is to calculate the lag between the primary and the standby > (Async?) server during XLogInsert and block the caller until the lag is less > than the threshold value. We can calculate the max lag by iterating over > ReplicationSlotCtl->replication_slots. If this is not something we don't want > to do in the core, at least adding a hook for XlogInsert is of great value. but doing it in XLogInsert does not seem to be a good idea. It's a common point for all kinds of logging, including VACUUM. We could accidentally stall a critical VACUUM operation because of that. As Bharath described, it is better handled by application-level monitoring. -- Best Wishes, Ashutosh Bapat
Re: Buildfarm support for older versions
On 12/22/21 23:20, Larry Rosenman wrote: > On 12/22/2021 10:15 pm, Tom Lane wrote: >> Larry Rosenman writes: >>> On 12/22/2021 9:59 pm, Tom Lane wrote: Does it work if you drop --enable-nls? (It'd likely be worth fixing if so, but I'm trying to narrow the possible causes.) >> >>> Nope... >> >> OK. Since 9.3 succeeds, it seems like it's a link problem >> we fixed at some point. Can you bisect to find where we >> fixed it? >> >> regards, tom lane > > I can try -- I haven't been very good at that. > > I can give you access to the machine and the id the Buildfarm runs under. > > (or give me a good process starting from a buildfarm layout). > > I will work on it on my FBSD setup. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Allow escape in application_name
On 2021/12/17 16:50, Kyotaro Horiguchi wrote: Thus rewriting the code we're focusing on like the following would make sense to me. if (strcmp(keywords[i], "application_name") == 0) { values[i] = process_pgfdw_appname(values[i]); /* * Break if we have a non-empty string. If we end up failing with * all candidates, fallback_application_name would work. */ if (appname[0] != '\0') break; } I'm ok to remove the check "values[i] != NULL", but think that it's better to keep the other check "*(values[i]) != '\0'" as it is. Because *(values[i]) can be a null character, and it's a waste of cycles to call process_pgfdw_appname() in that case. Thanks for revisiting. #1. use "[unknown]" #2. add the check but not use "[unknown]" #3. don't add the check (i.e., what the current patch does) For now, I'm ok to choose #2 or #3. As I said before, given that we don't show "unknown" or something like that as the fallback, I'm fine with not having a NULL check since anyway it bumps into SEGV immediately. In short I'm fine with #3 here. Yep, let's use the #3 approach. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: sequences vs. synchronous replication
On 2021/12/23 3:49, Tomas Vondra wrote: Attached is a patch tweaking WAL logging - in wal_level=minimal we do the same thing as now, in higher levels we log every sequence fetch. Thanks for the patch! With the patch, I found that the regression test for sequences failed. + fetch = log = fetch; This should be "log = fetch"? On second thought, originally a sequence doesn't guarantee that the value already returned by nextval() will never be returned by subsequent nextval() after the server crash recovery. That is, nextval() may return the same value across crash recovery. Is this understanding right? For example, this case can happen if the server crashes after nextval() returned the value but before WAL for the sequence was flushed to the permanent storage. So it's not a bug that sync standby may return the same value as already returned in the primary because the corresponding WAL has not been replicated yet, isn't it? BTW, if the returned value is stored in database, the same value is guaranteed not to be returned again after the server crash or by sync standby. Because in that case the WAL of the transaction storing that value is flushed and replicated. So I think this makes it acceptable / manageable. Of course, this means the values are much less monotonous (across backends), but I don't think we really promised that. And I doubt anyone is really using sequences like this (just nextval) in performance critical use cases. I think that this approach is not acceptable to some users. So, if we actually adopt WAL-logging every sequence fetch, also how about exposing SEQ_LOG_VALS as reloption for a sequence? If so, those who want to log every sequence fetch can set this SEQ_LOG_VALS reloption to 0. OTOH, those who prefer the current behavior in spite of the risk we're discussing at this thread can set the reloption to 32 like it is for now, for example. 
Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
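The log-ahead behavior being discussed (pre-log SEQ_LOG_VALS values under wal_level=minimal, log every fetch otherwise, i.e. "log = fetch") can be sketched in miniature. This is an illustrative simplification, not the patch itself: the real nextval_internal() also handles caching, cycling, and crash-safety details omitted here.

```c
#include <stdbool.h>

#define SEQ_LOG_VALS 32

typedef struct SeqState
{
	long		last_value;		/* last value handed out */
	long		log_cnt;		/* values still covered by already-written WAL */
} SeqState;

static long
next_val(SeqState *seq, bool wal_level_minimal, bool *need_wal)
{
	long		fetch = 1;				/* values we must account for */
	long		log = seq->log_cnt;		/* values still pre-logged */

	if (log < fetch)
	{
		/* must write WAL before handing out the value */
		if (wal_level_minimal)
			log = fetch + SEQ_LOG_VALS;	/* pre-log a batch, as today */
		else
			log = fetch;				/* log exactly this fetch */
		*need_wal = true;
	}
	else
		*need_wal = false;

	seq->last_value++;
	seq->log_cnt = log - fetch;
	return seq->last_value;
}
```

Under wal_level=minimal this emits one WAL record per 32 fetches; at higher levels every fetch that is not already covered by WAL emits a record, which is what makes the value durable before it can be returned.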
Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary
On Thu, Dec 23, 2021, at 9:58 AM, Bharath Rupireddy wrote: > pg_archivecleanup currently takes a WAL file name as input to delete > the WAL files prior to it [1]. As suggested by Satya (cc-ed) in the > pg_replslotdata thread [2], can we enhance pg_archivecleanup to > automatically detect the last checkpoint LSN (from the control file), > calculate the lowest restart_lsn required by the replication slots, if > any (by reading the replication slot info from the pg_logical directory), > and archive the unneeded WAL files (an archive_command similar to the one > provided in the server config can be provided as an input) before finally > deleting them? Making the pg_archivecleanup tool an end-to-end solution > would help greatly in disk-full situations caused by WAL file growth > (inactive replication slots, archive command failures, infrequent > checkpoints, etc.). pg_archivecleanup is a tool to remove WAL files from the *archive*. Are you suggesting to use it for removing files from the pg_wal directory too? No, thanks. WAL files are a key component of backup and replication. Hence, you cannot deliberately allow a tool to remove WAL files from PGDATA. IMO this issue wouldn't occur if you have a monitoring system with alerts and someone to keep an eye on it. If the disk-full situation was caused by a failed archive command or a disconnected standby, it is easy to figure out; the fix is simple. -- Euler Taveira EDB https://www.enterprisedb.com/
Add new function to convert 32-bit XID to 64-bit
Hi, When periodically collecting and accumulating statistics or status information like pg_locks, pg_stat_activity, pg_prepared_xacts, etc. for future troubleshooting or other reasons, I'd like to store the transaction ID of such information as a 64-bit value, so that the information for a specified transaction can easily be identified and picked up by transaction ID. Otherwise it's not easy to distinguish transactions with the same 32-bit XID but different epochs using only the 32-bit XID. But since pg_locks, pg_stat_activity, etc. return a 32-bit XID, we cannot store it as a 64-bit value. To improve this situation, I'd like to propose adding a new function that converts a given 32-bit XID (i.e., xid data type) to a 64-bit one (xid8 data type). If we do this, for example, we can easily get the 64-bit XID from pg_stat_activity by "SELECT convert_xid_to_xid8(backend_xid) FROM pg_stat_activity", if necessary. Thoughts? As another approach, we could change the data type of pg_stat_activity.backend_xid or pg_locks.transaction, etc. from xid to xid8. But this idea looks like overkill to me, and it may break existing applications accessing pg_locks etc. So IMO it's better to just provide the conversion function. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
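A sketch of the epoch-widening arithmetic such a conversion function would need, given the current 64-bit next-transaction-ID for epoch context. The function name is hypothetical, and special XIDs below FirstNormalTransactionId are ignored here for brevity:

```c
#include <stdint.h>

/*
 * Widen a 32-bit xid to 64 bits using the current 64-bit "next
 * transaction ID" (epoch in the high 32 bits) as context. Any xid we
 * can observe must be in the past, i.e. no further than one epoch
 * behind next_fxid.
 */
static uint64_t
widen_xid(uint32_t xid, uint64_t next_fxid)
{
	uint32_t	next_xid = (uint32_t) next_fxid;
	uint64_t	epoch = next_fxid >> 32;

	/*
	 * A value numerically greater than the next 32-bit xid cannot belong
	 * to the current epoch: it must have been assigned in the previous one.
	 */
	if (xid > next_xid)
		epoch--;

	return (epoch << 32) | xid;
}
```

This is the same widening that has to happen somewhere regardless of whether the conversion is exposed as a SQL function or the catalog views are changed to xid8.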
Add checkpoint and redo LSN to LogCheckpointEnd log message
Hi, It is useful (for debugging purposes) if the checkpoint end message has the checkpoint LSN and REDO LSN [1]. It gives more context while analyzing checkpoint-related issues. The pg_controldata gives the last checkpoint LSN and REDO LSN, but having this info alongside the log message helps analyze issues that happened previously, connect the dots and identify the root cause. Attaching a small patch herewith. Thoughts? [1] 2021-12-23 14:58:54.694 UTC [1965649] LOG: checkpoint starting: shutdown immediate 2021-12-23 14:58:54.714 UTC [1965649] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.025 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB; LSN=0/14D0AD0, REDO LSN=0/14D0AD0 Regards, Bharath Rupireddy. v1-0001-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patch Description: Binary data
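For reference, the "LSN=0/14D0AD0" form in the sample log line above is a 64-bit WAL position printed as two 32-bit halves separated by a slash. A minimal sketch of that formatting (the helper name is hypothetical; PostgreSQL itself splits the value the same way with casts and "%X/%X"):

```c
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;

/* Print an LSN in the customary high/low form, e.g. 0/14D0AD0. */
static void
format_lsn(char *buf, size_t len, XLogRecPtr lsn)
{
	snprintf(buf, len, "%X/%X",
			 (unsigned int) (lsn >> 32), (unsigned int) lsn);
}
```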
Re: Are datcollate/datctype always libc even under --with-icu ?
Chapman Flack wrote: > Next question: the "currently" in that comment suggests that could change, > but is there any present intention to change it, or is this likely to just > be the way it is for the foreseeable future? Some related patches and discussions: * ICU as default collation provider https://commitfest.postgresql.org/21/1543/ * ICU for global collation https://commitfest.postgresql.org/25/2256/ Best regards, -- Daniel Vérité PostgreSQL-powered mailer: https://www.manitou-mail.org Twitter: @DanielVerite
Be clear about what log_checkpoints emits in the documentation
Hi, Currently the documentation of the log_checkpoints GUC says the following: Some statistics are included in the log messages, including the number of buffers written and the time spent writing them. Usage of the word "Some" makes it a vague statement. Why can't we just be clear about what statistics the log_checkpoints GUC can emit, like the attached patch? Thoughts? Regards, Bharath Rupireddy. v1-0001-Be-clear-about-what-log_checkpoints-emits-in-the-.patch Description: Binary data
Re: correct the sizes of values and nulls arrays in pg_control_checkpoint
On Thu, Dec 23, 2021, at 8:39 AM, Bharath Rupireddy wrote: > pg_control_checkpoint emits 18 columns whereas the values and nulls > arrays are defined to be of size 19. Although it's not critical, > attaching a tiny patch to fix this. Good catch! I'm wondering if a constant wouldn't be useful for such a case. -- Euler Taveira EDB https://www.enterprisedb.com/
Re: correct the sizes of values and nulls arrays in pg_control_checkpoint
On Thu, Dec 23, 2021 at 9:13 PM Euler Taveira wrote: > > On Thu, Dec 23, 2021, at 8:39 AM, Bharath Rupireddy wrote: > > pg_control_checkpoint emits 18 columns whereas the values and nulls > arrays are defined to be of size 19. Although it's not critical, > attaching a tiny patch to fix this. > > Good catch! I'm wondering if a constant wouldn't be useful for such a case. Thanks. I thought of having a macro, but it would create a large diff against the previous versions, since we would have to change the other pg_control_XXX functions as well. Regards, Bharath Rupireddy.
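The constant Euler suggests would look something like the following in miniature: size the arrays and the fill loop from one #define so the element count and the emitted column count cannot drift apart. The macro name and the Datum stand-in are hypothetical, not the actual patch:

```c
#include <stdbool.h>

#define PG_CONTROL_CHECKPOINT_COLS 18	/* hypothetical name */

typedef long Datum;						/* stand-in for the real Datum */

/* Fill one output row; returns the number of columns written. */
static int
fill_row(Datum *values, bool *nulls)
{
	for (int i = 0; i < PG_CONTROL_CHECKPOINT_COLS; i++)
	{
		values[i] = (Datum) 0;
		nulls[i] = false;
	}
	return PG_CONTROL_CHECKPOINT_COLS;
}
```

With this pattern, the 18-vs-19 mismatch the patch fixes could not have been written in the first place.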
Re: Delay the variable initialization in get_rel_sync_entry
On Wed, Dec 22, 2021, at 10:11 AM, houzj.f...@fujitsu.com wrote: > When reviewing some logical replication patches. I noticed that in > function get_rel_sync_entry() we always invoke get_rel_relispartition() > and get_rel_relkind() at the beginning which could cause unnecessary > cache access. > > --- > get_rel_sync_entry(PGOutputData *data, Oid relid) > { > RelationSyncEntry *entry; > bool am_partition = get_rel_relispartition(relid); > char relkind = get_rel_relkind(relid); > --- > > The extra cost could sometimes be noticeable because get_rel_sync_entry is a > hot function which is executed for each change. And the 'am_partition' and > 'relkind' are necessary only when we need to rebuild the RelationSyncEntry. > > Here is the perf result for the case when inserted large amounts of data into > a > un-published table in which case the cost is noticeable. > > --12.83%--pgoutput_change > |--11.84%--get_rel_sync_entry > |--4.76%--get_rel_relispartition > |--4.70%--get_rel_relkind Good catch. WFM. Deferring variable initialization close to its first use is good practice. -- Euler Taveira EDB https://www.enterprisedb.com/
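The deferral being proposed follows a common pattern: touch the catalogs only on the cache-miss path, not on every call. A standalone sketch with stub lookups, counting the expensive calls to show the saving (all names are hypothetical stand-ins for the syscache lookups):

```c
#include <stdbool.h>

typedef struct Entry
{
	bool		valid;
	bool		am_partition;
	char		relkind;
} Entry;

static int	lookups;			/* counts the expensive catalog lookups */

static bool get_relispartition(int relid) { (void) relid; lookups++; return false; }
static char get_relkind(int relid)        { (void) relid; lookups++; return 'r'; }

/* Touch the catalogs only when the cached entry must be (re)built. */
static Entry *
get_sync_entry(Entry *cache, int relid)
{
	Entry	   *entry = &cache[relid];

	if (!entry->valid)
	{
		entry->am_partition = get_relispartition(relid);
		entry->relkind = get_relkind(relid);
		entry->valid = true;
	}
	return entry;
}
```

On the hot path (entry already valid) the lookup count stays flat, which is exactly the get_rel_relispartition/get_rel_relkind cost the perf profile shows disappearing.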
Re: Buildfarm support for older versions
On 12/23/21 08:50, Andrew Dunstan wrote: > On 12/22/21 23:20, Larry Rosenman wrote: >> On 12/22/2021 10:15 pm, Tom Lane wrote: >>> Larry Rosenman writes: On 12/22/2021 9:59 pm, Tom Lane wrote: > Does it work if you drop --enable-nls? (It'd likely be worth fixing > if so, but I'm trying to narrow the possible causes.) Nope... >>> OK. Since 9.3 succeeds, it seems like it's a link problem >>> we fixed at some point. Can you bisect to find where we >>> fixed it? >>> >>> regards, tom lane >> I can try -- I haven't been very good at that. >> >> I can give you access to the machine and the id the Buildfarm runs under. >> >> (or give me a good process starting from a buildfarm layout). >> >> > I will work on it on my FBSD setup. > > For the 9.2 error, try setting this in the config_env stanza: CFLAGS => '-O2 -fPIC', cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Buildfarm support for older versions
On 12/23/2021 10:13 am, Andrew Dunstan wrote: On 12/23/21 08:50, Andrew Dunstan wrote: On 12/22/21 23:20, Larry Rosenman wrote: On 12/22/2021 10:15 pm, Tom Lane wrote: Larry Rosenman writes: On 12/22/2021 9:59 pm, Tom Lane wrote: Does it work if you drop --enable-nls? (It'd likely be worth fixing if so, but I'm trying to narrow the possible causes.) Nope... OK. Since 9.3 succeeds, it seems like it's a link problem we fixed at some point. Can you bisect to find where we fixed it? regards, tom lane I can try -- I haven't been very good at that. I can give you access to the machine and the id the Buildfarm runs under. (or give me a good process starting from a buildfarm layout). I will work on it on my FBSD setup. For the 9.2 error, try setting this in the config_env stanza: CFLAGS => '-O2 -fPIC', cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com That got us further, but it dies on startdb: $ cat startdb-C-1.log waiting for server to start stopped waiting pg_ctl: could not start server Examine the log output. === db log file == LOG: unrecognized configuration parameter "unix_socket_directories" in file "/home/pgbuildfarm/buildroot/REL9_2_STABLE/inst/data-C/postgresql.conf" line 576 FATAL: configuration file "/home/pgbuildfarm/buildroot/REL9_2_STABLE/inst/data-C/postgresql.conf" contains errors $ And we have the errors on the other branches with a temp(?) directory. -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
> On Dec 22, 2021, at 12:01 AM, Pavel Borisov wrote: > > Thank you, Mark! > > In v6 (PFA) I've made the changes on your advice i.e. > > - pg_amcheck with --checkunique option will ignore uniqueness check (with a > warning) if amcheck version in a db is <1.4 and doesn't support the feature. Ok. + int vmaj = 0, + vmin = 0, + vrev = 0; + const char *amcheck_version = pstrdup(PQgetvalue(result, 0, 1)); + + sscanf(amcheck_version, "%d.%d.%d", &vmaj, &vmin, &vrev); The pstrdup is unnecessary but harmless. > - fixed unnecessary drop table in regression Ok. > - use the existing table for uniqueness check in 005_opclass_damage.pl It appears you still create a new table, bttest_unique, rather than using the existing table int4tbl. That's fine. > - added tests into 003_check.pl . It is only smoke test that just verifies > new functions. + +$node->command_checks_all( + [ + @cmd, '-s', 's1', '-i', 't1_btree', '--parent-check', + '--checkunique', 'db1' + ], + 2, + [$index_missing_relation_fork_re], + [$no_output_re], + 'pg_amcheck smoke test --parent-check'); + +$node->command_checks_all( + [ + @cmd, '-s', 's1', '-i', 't1_btree', '--heapallindexed', + '--rootdescend', '--checkunique', 'db1' + ], + 2, + [$index_missing_relation_fork_re], + [$no_output_re], + 'pg_amcheck smoke test --heapallindexed --rootdescend'); + +$node->command_checks_all( + [ @cmd, '--checkunique', '-d', 'db1', '-d', 'db2', '-d', 'db3', '-S', 's*' ], + 0, [$no_output_re], [$no_output_re], + 'pg_amcheck excluding all corrupt schemas'); + You have borrowed the existing tests but forgot to change their names. (The last line of each check is the test name, such as 'pg_amcheck smoke test --parent-check'.) Please make each test name unique. > - added test contrib/amcheck/t/004_verify_nbtree_unique.pl it is more > extensive test based on opclass damage which was intended to be main test for > amcheck, but which I've forgotten to add to commit in v5. 
> 005_opclass_damage.pl test, which you've seen in v5 is largely based on first > part of 004_verify_nbtree_unique.pl (with the later calling pg_amcheck, and > the former calling bt_index_check(), bt_index_parent_check() ) Ok. > - added forgotten upgrade script amcheck--1.3--1.4.sql (from v4) Ok. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
On Thu, Dec 23, 2021 at 5:18 AM Ashutosh Bapat wrote: > On Thu, Dec 23, 2021 at 5:53 AM SATYANARAYANA NARLAPURAM > wrote: > > > > Hi Hackers, > > > > I am considering implementing an RPO (recovery point objective) enforcement > feature for Postgres where the WAL writes on the primary are stalled when > the WAL distance between the primary and standby exceeds the configured > (replica_lag_in_bytes) threshold. This feature is useful particularly in > disaster recovery setups where primary and standby are in different > regions and synchronous replication can't be set up for latency and > performance reasons, yet some level of RPO enforcement is required. > > Limiting the transaction rate when the standby falls behind is a good feature > ... > > > > > The idea here is to calculate the lag between the primary and the > standby (Async?) server during XLogInsert and block the caller until the > lag is less than the threshold value. We can calculate the max lag by > iterating over ReplicationSlotCtl->replication_slots. If this is not > something we want to do in the core, at least adding a hook for > XLogInsert is of great value. > > but doing it in XLogInsert does not seem to be a good idea. Isn't XLogInsert the best place to throttle/govern in a simple and fair way, particularly for long-running transactions on the server? > It's a > common point for all kinds of logging including VACUUM. We could > accidentally stall a critical VACUUM operation because of that. > Agreed, but again this is a policy decision that the DBA can relax/enforce. I expect the RPO to be in the range of a few hundred MBs to GBs, and on a healthy system the lag typically never comes close to this value.
The hook implementation can take care of the nitty-gritty details of policy enforcement based on the needs, for example: not throttling some backend processes like vacuum and the checkpointer; throttling based on roles, for example not throttling superuser connections; and throttling based on replay lag, write lag, checkpoints taking longer, or the disk being close to full. Each of these can easily be translated into GUCs. Depending on the direction of the thread on the hook vs a feature in the core, I can add more implementation details. > As Bharath described, it is better handled by application-level > monitoring. > Both RPO-based WAL throttling and application-level monitoring can coexist, as each has its own merits and challenges. Each application developer has to implement their own throttling logic, and oftentimes it is hard to get it right. > -- > Best Wishes, > Ashutosh Bapat >
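The policy layer described above (exempt VACUUM/checkpointer, exempt superusers, stall everyone else once the RPO threshold is exceeded) could be sketched as a pure predicate. All names are hypothetical; a real hook would additionally need to wait, be woken when the lag shrinks, and expose the knobs as GUCs:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct ThrottlePolicy
{
	bool		exempt_vacuum;		/* never stall VACUUM / checkpointer */
	bool		exempt_superuser;
	uint64_t	max_lag_bytes;		/* RPO threshold */
} ThrottlePolicy;

static bool
should_stall(const ThrottlePolicy *p, bool is_vacuum, bool is_superuser,
			 uint64_t current_lag)
{
	if (is_vacuum && p->exempt_vacuum)
		return false;
	if (is_superuser && p->exempt_superuser)
		return false;
	return current_lag > p->max_lag_bytes;
}
```

Keeping the exemptions in the policy rather than in XLogInsert itself is what addresses Ashutosh's concern about accidentally stalling a critical VACUUM.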
Re: Buildfarm support for older versions
On 12/23/21 11:27, Larry Rosenman wrote: > >>> >> >> For the 9.2 error, try setting this in the config_env stanza: >> >> >> CFLAGS => '-O2 -fPIC', >> >> >> > > That got us further, but it dies on startdb: > $ cat startdb-C-1.log > waiting for server to start stopped waiting > pg_ctl: could not start server > Examine the log output. > === db log file == > LOG: unrecognized configuration parameter "unix_socket_directories" > in file > "/home/pgbuildfarm/buildroot/REL9_2_STABLE/inst/data-C/postgresql.conf" > line 576 > FATAL: configuration file > "/home/pgbuildfarm/buildroot/REL9_2_STABLE/inst/data-C/postgresql.conf" > contains errors > $ > > And we have the errors on the other branches with a temp(?) directory. looks like it's picking up the wrong perl libraries. Please show us the output of grep -v secret /home/pgbuildfarm/buildroot/REL9_2_STABLE/$animal.lastrun-logs/web-txn.data cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)
On Thu, Dec 23, 2021 at 09:14:18AM -0400, Fabien COELHO wrote: > It seems that the v31 patch does not apply anymore: > > postgresql> git apply > ~/v31-0001-Document-historic-behavior-of-links-to-directori.patch > error: patch failed: doc/src/sgml/func.sgml:27410 > error: doc/src/sgml/func.sgml: patch does not apply Thanks for continuing to follow this patch ;) I fixed a conflict with output/tablespace from d1029bb5a et seq. I'm not sure why you got a conflict with 0001, though. I think the 2nd half of the patches are still waiting for fixes to lstat() on windows. You complained before that there were too many patches, and I can see how it might be a pain depending on your workflow. But the division is deliberate. Dealing with patchsets is easy for me: I use "mutt" and for each patch attachment, I type "|git am" (or |git am -3). >From b8ed86daad1bf9fbb647f7c007130bb0878a2a94 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Mon, 16 Mar 2020 14:12:55 -0500 Subject: [PATCH v32 01/11] Document historic behavior of links to directories.. Backpatch to 9.5: pg_stat_file --- doc/src/sgml/func.sgml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index e58efce5865..b5c1befe627 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -27495,7 +27495,7 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8'); Returns a record containing the file's size, last access time stamp, last modification time stamp, last file status change time stamp (Unix platforms only), file creation time stamp (Windows only), and a flag -indicating if it is a directory. +indicating if it is a directory (or a symbolic link to a directory). 
This function is restricted to superusers by default, but other users -- 2.17.0 >From 88b324a6877ce916589872f5067bd5810e60608e Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Tue, 17 Mar 2020 13:16:24 -0500 Subject: [PATCH v32 02/11] Add tests on pg_ls_dir before changing it --- src/test/regress/expected/misc_functions.out | 24 src/test/regress/sql/misc_functions.sql | 8 +++ 2 files changed, 32 insertions(+) diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out index 1013d17f87d..830de507e7c 100644 --- a/src/test/regress/expected/misc_functions.out +++ b/src/test/regress/expected/misc_functions.out @@ -243,6 +243,30 @@ select count(*) > 0 from t (1 row) +select * from (select pg_ls_dir('.', false, true) as name) as ls where ls.name='.'; -- include_dot_dirs=true + name +-- + . +(1 row) + +select * from (select pg_ls_dir('.', false, false) as name) as ls where ls.name='.'; -- include_dot_dirs=false + name +-- +(0 rows) + +select pg_ls_dir('does not exist', true, false); -- ok with missingok=true + pg_ls_dir +--- +(0 rows) + +select pg_ls_dir('does not exist'); -- fails with missingok=false +ERROR: could not open directory "does not exist": No such file or directory +-- Check that expected columns are present +select * from pg_stat_file('.') limit 0; + size | access | modification | change | creation | isdir +--++--++--+--- +(0 rows) + -- -- Test replication slot directory functions -- diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql index 7ab9b2a1509..422a1369aeb 100644 --- a/src/test/regress/sql/misc_functions.sql +++ b/src/test/regress/sql/misc_functions.sql @@ -91,6 +91,14 @@ select count(*) > 0 from where spcname = 'pg_default') pts join pg_database db on pts.pts = db.oid; +select * from (select pg_ls_dir('.', false, true) as name) as ls where ls.name='.'; -- include_dot_dirs=true +select * from (select pg_ls_dir('.', false, false) as name) as ls where ls.name='.'; 
-- include_dot_dirs=false +select pg_ls_dir('does not exist', true, false); -- ok with missingok=true +select pg_ls_dir('does not exist'); -- fails with missingok=false + +-- Check that expected columns are present +select * from pg_stat_file('.') limit 0; + -- -- Test replication slot directory functions -- -- 2.17.0 >From 6b7b4a3a7f4fe3edcb5a67f467b10543fa47c343 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Mon, 9 Mar 2020 22:40:24 -0500 Subject: [PATCH v32 03/11] Add pg_ls_dir_metadata to list a dir with file metadata.. Generalize pg_ls_dir_files and retire pg_ls_dir Need catversion bumped? --- doc/src/sgml/func.sgml | 21 ++ src/backend/catalog/system_functions.sql | 1 + src/backend/utils/adt/genfile.c | 239 +++ src/include/catalog/pg_proc.dat | 12 + src/test/regress/expected/misc_functions.out | 24 ++ src/test/regress/expected/tablespace.out | 8 + src/test/regress/sql/misc_functions.sql | 11 + src/test/regress/sql/tablespace.sql |
Re: Buildfarm support for older versions
On 12/23/2021 11:23 am, Andrew Dunstan wrote: On 12/23/21 11:27, Larry Rosenman wrote: For the 9.2 error, try setting this in the config_env stanza: CFLAGS => '-O2 -fPIC', That got us further, but it dies on startdb: $ cat startdb-C-1.log waiting for server to start stopped waiting pg_ctl: could not start server Examine the log output. === db log file == LOG: unrecognized configuration parameter "unix_socket_directories" in file "/home/pgbuildfarm/buildroot/REL9_2_STABLE/inst/data-C/postgresql.conf" line 576 FATAL: configuration file "/home/pgbuildfarm/buildroot/REL9_2_STABLE/inst/data-C/postgresql.conf" contains errors $ And we have the errors on the other branches with a temp(?) directory. looks like it's picking up the wrong perl libraries. Please show us the output of grep -v secret /home/pgbuildfarm/buildroot/REL9_2_STABLE/$animal.lastrun-logs/web-txn.data cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com $ grep -v secret web-txn.data $changed_this_run = ''; $changed_since_success = ''; $branch = 'REL9_2_STABLE'; $status = 1; $stage = 'StartDb-C:1'; $animal = 'gerenuk'; $ts = 1640276469; $log_data = 'Last file mtime in snapshot: Wed Dec 15 23:00:28 2021 GMT === waiting for server to start stopped waiting pg_ctl: could not start server Examine the log output. === db log file == LOG: unrecognized configuration parameter "unix_socket_directories" in file "/home/pgbuildfarm/buildroot/REL9_2_STABLE/inst/data-C/postgresql.conf" line 576 FATAL: configuration file "/home/pgbuildfarm/buildroot/REL9_2_STABLE/inst/data-C/postgresql.conf" contains errors '; $confsum = 'This file was created by PostgreSQL configure 9.2.24, which was generated by GNU Autoconf 2.63. 
Invocation command line was $ ./configure --enable-cassert --enable-debug --enable-nls --enable-tap-tests \\ --with-perl --prefix=/home/pgbuildfarm/buildroot/REL9_2_STABLE/inst \\ --with-pgport=5700 --cache-file=/home/pgbuildfarm/buildroot/accache-gerenuk/config-REL9_2_STABLE.cache hostname = borg.lerctr.org uname -m = amd64 uname -r = 14.0-CURRENT uname -s = FreeBSD uname -v = FreeBSD 14.0-CURRENT #26 main-n251898-acdc1de369a: Wed Dec 22 18:11:08 CST 2021 r...@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL /usr/bin/uname -p = amd64 PATH: /home/ler/.asdf/shims PATH: /home/ler/.asdf/bin PATH: /sbin PATH: /bin PATH: /usr/sbin PATH: /usr/bin PATH: /usr/local/sbin PATH: /usr/local/bin PATH: /home/ler/bin PATH: /home/ler/go/bin PATH: /usr/local/opt/terraform@0.12/bin PATH: /home/ler/bin $Script_Config = { \'alerts\' => {}, \'animal\' => \'gerenuk\', \'base_port\' => 5678, \'bf_perl_version\' => \'5.32.1\', \'build_env\' => { \'INCLUDES\' => \'-I/usr/local/include\', \'LDFLAGS\' => \'-L/usr/local/lib\' }, \'build_root\' => \'/home/pgbuildfarm/buildroot\', \'ccache_failure_remove\' => undef, \'config\' => [], \'config_env\' => { \'CFLAGS\' => \'-O2 -fPIC\' }, \'config_opts\' => [ \'--enable-cassert\', \'--enable-debug\', \'--enable-nls\', \'--enable-tap-tests\', \'--with-perl\' ], \'core_file_glob\' => \'*.core*\', \'extra_config\' => { \'DEFAULT\' => [ \'log_line_prefix = \\\'%m [%p:%l] %q%a \\\'\', \'log_connections = \\\'true\\\'\', \'log_disconnections = \\\'true\\\'\', \'log_statement = \\\'all\\\'\', \'fsync = off\' ] }, \'force_every\' => {}, \'git_ignore_mirror_failure\' => 1, \'git_keep_mirror\' => 1, \'git_use_workdirs\' => 1, \'invocation_args\' => [ \'--config\', \'/home/pgbuildfarm/conf/gerenuk.conf\', \'--test\', \'
Re: Buildfarm support for older versions
On 12/23/21 12:23, Andrew Dunstan wrote: > On 12/23/21 11:27, Larry Rosenman wrote: >>> For the 9.2 error, try setting this in the config_env stanza: >>> >>> >>> CFLAGS => '-O2 -fPIC', >>> >>> >>> >> That got us further, but it dies on startdb: >> $ cat startdb-C-1.log >> waiting for server to start stopped waiting >> pg_ctl: could not start server >> Examine the log output. >> === db log file == >> LOG: unrecognized configuration parameter "unix_socket_directories" >> in file >> "/home/pgbuildfarm/buildroot/REL9_2_STABLE/inst/data-C/postgresql.conf" >> line 576 >> FATAL: configuration file >> "/home/pgbuildfarm/buildroot/REL9_2_STABLE/inst/data-C/postgresql.conf" >> contains errors >> $ >> >> And we have the errors on the other branches with a temp(?) directory. > > looks like it's picking up the wrong perl libraries. Please show us the > output of > >grep -v secret > /home/pgbuildfarm/buildroot/REL9_2_STABLE/$animal.lastrun-logs/web-txn.data > > Oh, you need to be building with the buildfarm client's git tip, not the released code. Alternatively, apply this patch: https://github.com/PGBuildFarm/client-code/commit/75c762ba74fdec96ebf6c2433d61d3eeead825c3 cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
> > The pstrdup is unnecessary but harmless. > > > - use the existing table for uniqueness check in 005_opclass_damage.pl > > It appears you still create a new table, bttest_unique, rather than using > the existing table int4tbl. That's fine. > > > - added tests into 003_check.pl . It is only smoke test that just > verifies new functions. > > You have borrowed the existing tests but forgot to change their names. > (The last line of each check is the test name, such as 'pg_amcheck smoke > test --parent-check'.) Please make each test name unique. > Thanks for your review! Fixed all these remaining things from patch v6. PFA v7 patch. --- Best regards, Maxim Orlov. v7-0001-Add-option-for-amcheck-and-pg_amcheck-to-check-un.patch Description: Binary data
Re: sequences vs. synchronous replication
On 12/23/21 15:42, Fujii Masao wrote: On 2021/12/23 3:49, Tomas Vondra wrote: Attached is a patch tweaking WAL logging - in wal_level=minimal we do the same thing as now, in higher levels we log every sequence fetch. Thanks for the patch! With the patch, I found that the regression test for sequences failed. + fetch = log = fetch; This should be "log = fetch"? On second thought, originally a sequence doesn't guarantee that the value already returned by nextval() will never be returned by subsequent nextval() after the server crash recovery. That is, nextval() may return the same value across crash recovery. Is this understanding right? For example, this case can happen if the server crashes after nextval() returned the value but before WAL for the sequence was flushed to the permanent storage. I think the important step is commit. We don't guarantee anything for changes in uncommitted transactions. If you do nextval in a transaction and the server crashes before the WAL gets flushed before COMMIT, then yes, nextval may generate the same nextval again. But after commit that is not OK - it must not happen. So it's not a bug that sync standby may return the same value as already returned in the primary because the corresponding WAL has not been replicated yet, isn't it? No, I don't think so. Once the COMMIT happens (and gets confirmed by the sync standby), it should be possible to failover to the sync replica without losing any data in committed transaction. Generating duplicate values is a clear violation of that. IMHO the fact that we allow a transaction to commit (even just locally) without flushing all the WAL it depends on is clearly a data loss bug. BTW, if the returned value is stored in database, the same value is guaranteed not to be returned again after the server crash or by sync standby. Because in that case the WAL of the transaction storing that value is flushed and replicated. True, assuming the table is WAL-logged etc. 
I agree the issue may be affecting a fairly small fraction of workloads, because most people use sequences to generate data for inserts etc. So I think this makes it acceptable / manageable. Of course, this means the values are much less monotonic (across backends), but I don't think we really promised that. And I doubt anyone is really using sequences like this (just nextval) in performance critical use cases. I think that this approach is not acceptable to some users. So, if we actually adopt WAL-logging every sequence fetch, also how about exposing SEQ_LOG_VALS as a reloption for a sequence? If so, those who want to log every sequence fetch can set this SEQ_LOG_VALS reloption to 0. OTOH, those who prefer the current behavior in spite of the risk we're discussing on this thread can set the reloption to 32 like it is for now, for example. I think it'd be worth explaining why you think it's not acceptable? I've demonstrated the impact on regular workloads (with other changes that write stuff to WAL) is not measurable, and enabling sequence caching eliminates most of the overhead for the rare corner case workloads if needed. It does generate a bit more WAL, but the sequence WAL records are pretty tiny. I'm opposed to adding reloptions that affect correctness - it just seems like a bad idea to me. Moreover, setting the CACHE for a sequence does almost the same thing - if you set CACHE 32, we only generate WAL once every 32 increments. The only difference is that this cache is not shared between backends, so one backend will generate 1,2,3,... and another backend will generate 33,34,35,... etc. I don't think that's a problem, because if you want strictly monotonic / gapless sequences you can't use our sequences anyway. Yes, with short-lived backends this may consume the sequences faster, but well - short-lived backends are expensive anyway and overflowing bigserial is still unlikely.
regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: CREATEROLE and role ownership hierarchies
On Tue, Dec 21, 2021 at 8:26 PM Mark Dilger wrote: > > > > > On Dec 21, 2021, at 5:11 PM, Shinya Kato > > wrote: > > > > I fixed the patches because they cannot be applied to HEAD. > > Thank you. I reviewed and tested these and they LGTM. FYI the rebased v3 patches upthread are raw diffs so git am won't apply them. I can add myself to the CF as a reviewer if it is helpful.
Fwd: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
Please find the attached draft patch.

On Thu, Dec 23, 2021 at 2:47 AM Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote:
> On Thu, Dec 23, 2021 at 5:53 AM SATYANARAYANA NARLAPURAM wrote:
> >
> > Hi Hackers,
> >
> > I am considering implementing RPO (recovery point objective) enforcement
> > feature for Postgres where the WAL writes on the primary are stalled when
> > the WAL distance between the primary and standby exceeds the configured
> > (replica_lag_in_bytes) threshold. This feature is useful particularly in
> > the disaster recovery setups where primary and standby are in different
> > regions and synchronous replication can't be set up for latency and
> > performance reasons yet requires some level of RPO enforcement.
>
> +1 for the idea in general. However, blocking writes on primary seems
> an extremely radical idea. The replicas can fall behind transiently at
> times and blocking writes on the primary may stop applications failing
> for these transient times. This is not a problem if the applications
> have retry logic for the writes. How about blocking writes on primary
> if the replicas fall behind the primary for a certain period of time?

My proposal is to block the caller from writing until the lag situation is improved. Don't want to throw any errors and fail the transaction. I think we are aligned?

> > The idea here is to calculate the lag between the primary and the
> > standby (Async?) server during XLogInsert and block the caller until the
> > lag is less than the threshold value. We can calculate the max lag by
> > iterating over ReplicationSlotCtl->replication_slots.
>
> The "falling behind" can also be quantified by the number of
> write-transactions on the primary. I think it's good to have the users
> choose what the "falling behind" means for them. We can have something
> like the "recovery_target" param with different options name, xid,
> time, lsn.
The transactions can be of arbitrary size and length and these options may not provide the desired results. Time is a worthy option to add.

> > If this is not something we don't want to do in the core, at least
> > adding a hook for XlogInsert is of great value.
>
> IMHO, this feature may not be needed by everyone, the hook-way seems
> reasonable so that the postgres vendors can provide different
> implementations (for instance they can write an extension that
> implements this hook which can block writes on primary, write some log
> messages, inform some service layer of the replicas falling behind the
> primary etc.). If we were to have the hook in XLogInsert which gets
> called so frequently or XLogInsert is a hot-path, the hook really
> should do as little work as possible, otherwise the write operations
> latency may increase.

A hook is a good start. If there is enough interest then an extension can be added to the contrib module.

> > A few other scenarios I can think of with the hook are:
> >
> > Enforcing RPO as described above
> > Enforcing rate limit and slow throttling when sync standby is falling
> > behind (could be flush lag or replay lag)
> > Transactional log rate governance - useful for cloud providers to
> > provide SKU sizes based on allowed WAL writes.
> >
> > Thoughts?
>
> The hook can help to achieve the above objectives but where to place
> it and what parameters it should take as input (or what info it should
> emit out of the server via the hook) are important too.

XLogInsert in my opinion is the best place to call it and the hook can be something like this "void xlog_insert_hook(NULL)" as all the throttling logic required is the current flush position which can be obtained from GetFlushRecPtr and the ReplicationSlotCtl. Attached a draft patch.
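The throttling decision being discussed can be sketched like this - a toy Python model, not the draft patch: names such as replica_lag_in_bytes come from the proposal above, and the real hook would live in XLogInsert reading GetFlushRecPtr() and ReplicationSlotCtl rather than taking plain integers.

```python
# Sketch only: compute the lag of the furthest-behind replica and decide
# whether the WAL-inserting backend should wait (block, not error out).
def max_replica_lag(flush_lsn, slot_restart_lsns):
    """Bytes of WAL the furthest-behind replica still has to receive."""
    if not slot_restart_lsns:
        return 0
    return flush_lsn - min(slot_restart_lsns)

def must_block(flush_lsn, slot_restart_lsns, replica_lag_in_bytes):
    """True when the caller should wait until the lag drops below the
    threshold - no error is thrown and no transaction fails."""
    return max_replica_lag(flush_lsn, slot_restart_lsns) > replica_lag_in_bytes

# Primary has flushed up to byte 10_000_000; two replicas' slots lag behind.
lag = max_replica_lag(10_000_000, [9_500_000, 6_000_000])
blocked = must_block(10_000_000, [9_500_000, 6_000_000], 1_000_000)
caught_up = must_block(10_000_000, [9_900_000], 1_000_000)
```

In the blocked case the backend would sleep and re-check, which is why keeping the per-insert work tiny (as Bharath notes) matters so much for a hook on this path.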
> > Having said all, the RPO feature can also be implemented outside of
> > the postgres, a simple implementation could be - get the primary
> > current wal lsn using pg_current_wal_lsn and all the replicas
> > restart_lsn using pg_replication_slot, if they differ by certain
> > amount, then issue ALTER SYSTEM SET READ ONLY command [1] on the
> > primary, this requires the connections to the server and proper access
> > rights. This feature can also be implemented as an extension (without
> > the hook) which doesn't require any connections to the server yet can
> > access the required info primary current_wal_lsn, restart_lsn of the
> > replication slots etc, but the RPO enforcement may not be immediate as
> > the server doesn't have any hooks in XLogInsert or some other area.

READ ONLY is a decent choice but can fail the writes or not take effect until the end of the transaction?

> [1] - https://www.postgresql.org/message-id/CAAJ_b967uKBiW6gbHr5aPzweURYjEGv333FHVHxvJmMhanwHXA%40mail.gmail.com
>
> Regards,
> Bharath Rupireddy.

0001-Add-xlog_insert_hook-to-give-control-to-the-plugins.patch
Description: Binary data
Re: Delay the variable initialization in get_rel_sync_entry
On Thu, Dec 23, 2021 at 12:54:41PM -0300, Euler Taveira wrote: > On Wed, Dec 22, 2021, at 10:11 AM, houzj.f...@fujitsu.com wrote: >> The extra cost could sometimes be noticeable because get_rel_sync_entry is a >> hot function which is executed for each change. And the 'am_partition' and >> 'relkind' are necessary only when we need to rebuild the RelationSyncEntry. >> >> Here is the perf result for the case when inserted large amounts of data >> into a >> un-published table in which case the cost is noticeable. >> >> --12.83%--pgoutput_change >> |--11.84%--get_rel_sync_entry >> |--4.76%--get_rel_relispartition >> |--4.70%--get_rel_relkind How does the perf balance change once you apply the patch? Do we have anything else that stands out? Getting rid of this bottleneck is fine by itself, but I am wondering if there are more things to worry about or not. > Good catch. WFM. Deferring variable initialization close to its first use is > good practice. Yeah, it is usually a good practice to have the declaration within the code block that uses it rather than piling everything at the beginning of the function. Being able to see that in profiles is annoying, and the change is simple, so I'd like to backpatch it. This is a period of vacations for a lot of people, so I'll wait until the beginning-ish of January before doing anything. -- Michael signature.asc Description: PGP signature
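The reason the deferred initialization pays off can be illustrated with a toy cache (a Python stand-in, not the pgoutput code; the counter and dictionary are made up for the sketch): the expensive catalog lookups are paid only on the miss path that rebuilds the entry, so the per-change hot path touches no catalog at all.

```python
# Hypothetical stand-in for get_rel_relkind()/get_rel_relispartition():
# in the real code these are syscache lookups, the hot spots in the
# profile quoted above.
lookup_calls = {"count": 0}

def expensive_catalog_lookup(relid):
    lookup_calls["count"] += 1
    return {"relkind": "r", "is_partition": False}

cache = {}

def get_rel_sync_entry(relid):
    """Per-change cache lookup; catalog access deferred to the miss path."""
    entry = cache.get(relid)
    if entry is not None and entry["valid"]:
        return entry                  # hot path: no catalog lookup at all
    # Miss path only: now it is worth paying for the lookup.
    info = expensive_catalog_lookup(relid)
    entry = {"valid": True, **info}
    cache[relid] = entry
    return entry

for _ in range(1000):                 # 1000 changes for the same relation
    entry = get_rel_sync_entry(16384)
```

Initializing the expensive values up front would have performed 1000 lookups here; deferring them performs exactly one.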
Inconsistent ellipsis in regression test error message?
The most recent cfbot run for a patch I am interested in has failed a newly added regression test. Please see http://cfbot.cputube.org/ for 36/2906 ~ The failure logs [2] are very curious because the error message is what was expected but it has a different position of the ellipsis (...). But only for Windows. -- fail - publication WHERE clause must be boolean ALTER PUBLICATION testpub5 SET TABLE testpub_rf_tbl3 WHERE (1234); ERROR: argument of PUBLICATION WHERE must be type boolean, not type integer -LINE 1: ALTER PUBLICATION testpub5 SET TABLE testpub_rf_tbl3 WHERE (... +LINE 1: ...PUBLICATION testpub5 SET TABLE testpub_rf_tbl3 WHERE (1234); How is that possible? Is this a problem caused by the patch code? If so, how? Or is this some obscure boundary case bug of the error ellipsis calculation which I've exposed by accident due to the specific length of my bad command? Thanks for any ideas! -- [2] https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.157408 Kind Regards, Peter Smith. Fujitsu Australia
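For what it's worth, the ellipsis comes from clipping the query line around the error cursor. Here is a much-simplified Python sketch of that kind of clipping - not the actual libpq logic (which, if I recall correctly, lives in reportErrorPosition() in fe-protocol3.c and weighs per-character display widths, exactly the sort of computation that could land the "..." in a different place on another platform):

```python
# Simplified sketch: `width` stands in for the computed display budget.
def clip_query(query, cursor, width=60):
    if len(query) <= width:
        return query                          # short line: no ellipsis
    if cursor < width:
        return query[:width] + "..."          # error early: clip the tail
    start = cursor - width + len("...")
    return "..." + query[start:cursor + 1]    # error late: clip the head

q = "ALTER PUBLICATION testpub5 SET TABLE testpub_rf_tbl3 WHERE (1234)"
cursor = q.index("(1234)")
```

With the same cursor position, a smaller width budget flips the output from tail-clipped (ellipsis at the end) to head-clipped (ellipsis at the start) - the same shape as the Linux-vs-Windows diff above.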
Re: Buildfarm support for older versions
On 12/23/2021 11:58 am, Andrew Dunstan wrote: Oh, you need to be building with the buildfarm client's git tip, not the released code. Alternatively, apply this patch: https://github.com/PGBuildFarm/client-code/commit/75c762ba74fdec96ebf6c2433d61d3eeead825c3 with git tip: output attached. I can give access if needed. $ grep -v secret conf/gerenuk.conf # -*-perl-*- hey - emacs - this is a perl file =comment Copyright (c) 2003-2010, Andrew Dunstan See accompanying License file for license details =cut package PGBuild; use strict; use warnings FATAL => 'qw'; use vars qw(%conf); # use vars qw($VERSION); $VERSION = 'REL_4.19'; my $branch; { no warnings qw(once); $branch = $main::branch; } %conf =( # identity animal => "gerenuk", # source code scm => 'git', # or 'cvs' git_keep_mirror => 1, # manage a git mirror in the build root git_ignore_mirror_failure => 1, # ignore failures in fetching to mirror # use symlinked git repo from non-HEAD branches, # like git-new-workdir does git_use_workdirs => 1, scmrepo => undef, # default is community repo for either type scm_url => undef, # webref for diffs on server - use default for community # git_reference => undef, # for --reference on git repo # cvsmethod => 'update', # or 'export' use_git_cvsserver => undef, # or 'true' if repo is a git cvsserver # external commands and control make => 'gmake', # or gmake if required. can include path if necessary. 
make_jobs => 10, # >1 for parallel "make" and "make check" steps tar_log_cmd => undef, # default is "tar -z -cf runlogs.tgz *.log" # replacement must have the same effect # max time in seconds allowed for a single branch run # undef/0 means unlimited wait_timeout => undef, # where and how to build # must be absolute, can be either Unix or Windows style for MSVC # undef means default, buildroot dir in script directory build_root => '/home/pgbuildfarm/buildroot', , # or '/path/to/buildroot', use_vpath => undef, # set true to do vpath builds # path to directory with auxiliary web script # if relative, the must be relative to buildroot/branch # Now only used on older Msys installations # aux_path => "../..", keep_error_builds => 0, core_file_glob => "*.core*", # Linux style, use "*.core" for BSD # where to report status target => "https://buildfarm.postgresql.org/cgi-bin/pgstatus.pl";, # where to report change in OS version or compiler version upgrade_target => "https://buildfarm.postgresql.org/cgi-bin/upgrade.pl";, # change this to a true value if using MSVC, in which case also # see MSVC section below using_msvc => undef, # if force_every is a scalar it will be used on all branches, like this # for legacy reasons: # force_every => 336 , # max hours between builds, undef or 0 = unforced # we now prefer it to be a hash with branch names as the keys, like this # # this setting should be kept conservatively high, or not used at all - # for the most part it's best to let the script decide if something # has changed that requires a new run for the branch. 
# # an entry with a name of 'default' matches any branch not named force_every => { # HEAD => 48, # REL8_3_STABLE => 72, # default => 168, }, # alerts are triggered if the server doesn't see a build on a branch after # this many hours, and then sent out every so often, alerts => { #HEAD => { alert_after => 72, alert_every => 24 }, # REL8_1_STABLE => { alert_after => 240, alert_every => 48 }, }, # include / exclude patterns for files that trigger a build # if both are specified then they are both applied as filters # undef means don't ignore anything. # exclude qr[^doc/|\.po$] to ignore changes to docs and po files # (recommended) # undef means null filter. trigger_exclude => qr[^doc/|\.po$], trigger_include => undef, # settings for mail notices - default to notifying nobody # these lists contain addresses to be notified # must be complete email addresses, as the email is sent from the server mail_events =>{ all => [], # unconditional fail => ["ler\@lerctr.org"], # if this build fails change => ["ler\@lerctr.org"], # if this build causes a state change green => ["ler\@lerctr.org"], # if this build causes a state change to/from OK }, # if this flag is set and ccache is used, an unsuccessful run will result # in the removal of the ccache directory (and you need to make sure that # its parent is writable). The default is off - ccache should be able to # handle failures, although there have been suspicions in the past that # it's not quite as reliable as we'd want, and thus we have this option. ccache_failure_remove => u
Re: parallel vacuum comments
On Thu, Dec 23, 2021 at 10:56 AM Masahiko Sawada wrote: > > On Wed, Dec 22, 2021 at 10:55 PM Amit Kapila wrote: > > > > On Wed, Dec 22, 2021 at 6:22 PM Amit Kapila wrote: > > > > > > On Wed, Dec 22, 2021 at 5:39 PM houzj.f...@fujitsu.com > > > wrote: > > > > > > > > > > > > 2) > > > > +#include "utils/rel.h" > > > > +#include "utils/lsyscache.h" > > > > +#include "utils/memutils.h" > > > > > > > > It might be better to keep the header file in alphabetical order. > > > > : > > > > +#include "utils/lsyscache.h" > > > > +#include "utils/memutils.h" > > > > +#include "utils/rel.h" > > > > > > > > > > Right, I'll take care of this as I am already making some other edits > > > in the patch. > > > > > > > Fixed this and made a few other changes in the patch that includes (a) > > passed down the num_index_scans information in parallel APIs based on > > which it can make the decision whether to reinitialize DSM or consider > > conditional parallel vacuum clean up; (b) get rid of first-time > > variable in ParallelVacuumState as that is not required if we have > > num_index_scans information; (c) there seems to be quite a few > > unnecessary includes in vacuumparallel.c which I have removed; (d) > > unnecessary error callback info was being set in ParallelVacuumState > > in leader backend; (e) changed/added comments at quite a few places. > > > > Can you please once verify the changes in the attached? > > Thank you for updating the patch! > > I agreed with these changes and it looks good to me. > Pushed. As per my knowledge, we have addressed the improvements raised/discussed in this thread. -- With Regards, Amit Kapila.
Re: parallel vacuum comments
On Fri, Dec 24, 2021 at 11:59 AM Amit Kapila wrote: > > On Thu, Dec 23, 2021 at 10:56 AM Masahiko Sawada > wrote: > > > > On Wed, Dec 22, 2021 at 10:55 PM Amit Kapila > > wrote: > > > > > > On Wed, Dec 22, 2021 at 6:22 PM Amit Kapila > > > wrote: > > > > > > > > On Wed, Dec 22, 2021 at 5:39 PM houzj.f...@fujitsu.com > > > > wrote: > > > > > > > > > > > > > > > 2) > > > > > +#include "utils/rel.h" > > > > > +#include "utils/lsyscache.h" > > > > > +#include "utils/memutils.h" > > > > > > > > > > It might be better to keep the header file in alphabetical order. > > > > > : > > > > > +#include "utils/lsyscache.h" > > > > > +#include "utils/memutils.h" > > > > > +#include "utils/rel.h" > > > > > > > > > > > > > Right, I'll take care of this as I am already making some other edits > > > > in the patch. > > > > > > > > > > Fixed this and made a few other changes in the patch that includes (a) > > > passed down the num_index_scans information in parallel APIs based on > > > which it can make the decision whether to reinitialize DSM or consider > > > conditional parallel vacuum clean up; (b) get rid of first-time > > > variable in ParallelVacuumState as that is not required if we have > > > num_index_scans information; (c) there seems to be quite a few > > > unnecessary includes in vacuumparallel.c which I have removed; (d) > > > unnecessary error callback info was being set in ParallelVacuumState > > > in leader backend; (e) changed/added comments at quite a few places. > > > > > > Can you please once verify the changes in the attached? > > > > Thank you for updating the patch! > > > > I agreed with these changes and it looks good to me. > > > > Pushed. Thank you for committing the patch! > As per my knowledge, we have addressed the improvements > raised/discussed in this thread. Yeah, I think so too. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Emit a warning if the extension's GUC is set incorrectly
On 2021/12/22 3:33, Tom Lane wrote:
> I wrote:
> > 0003 looks to me like it was superseded by 75d22069e. I do not
> > particularly like that patch though; it seems both inefficient and
> > ill-designed. 0003 is adding a check in an equally bizarre place. Why
> > isn't add_placeholder_variable the place to deal with this? Also, once
> > we know that a prefix is reserved, why don't we throw an error not
> > just a warning for attempts to set some unknown variable under that
> > prefix? We don't allow you to set unknown non-prefixed variables.
>
> Concretely, I think we should do the attached, which removes much of
> 75d22069e and does the check at the point of placeholder creation.
> (After looking closer, add_placeholder_variable's caller is the
> function that is responsible for rejecting bad names, not
> add_placeholder_variable itself.)
>
> This also fixes an indisputable bug in 75d22069e, namely that it did
> prefix comparisons incorrectly and would throw warnings for cases it
> shouldn't. Reserving "plpgsql" shouldn't have the effect of reserving
> "pl" as well.

Thank you for creating the patch. This is exactly what I wanted to create. It looks good to me.

> I'm tempted to propose that we also rename EmitWarningsOnPlaceholders
> to something like MarkGUCPrefixReserved, to more clearly reflect what
> it does now. (We could provide the old name as a macro alias to avoid
> breaking extensions needlessly.)

+1

--
Regards,

--
Shinya Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
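The prefix-comparison bug Tom describes is easy to model (toy Python, not the guc.c code): a correct check compares the whole class-name component of "class.name" against the reserved set, while comparing only leading characters - as the buggy version effectively did - lets a reserved "plpgsql" swallow "pl" as well.

```python
reserved_prefixes = {"plpgsql", "pg_stat_statements"}

def is_reserved_class(name):
    """Correct: the entire first component must match a reserved prefix."""
    classname, sep, _ = name.partition(".")
    return bool(sep) and classname in reserved_prefixes

def buggy_is_reserved_class(name):
    """Incorrect: compares just the first len(classname) characters of
    each reserved name, so 'pl' matches 'plpgsql'."""
    classname = name.partition(".")[0]
    return any(p.startswith(classname) for p in reserved_prefixes)
```

With the correct check in place, an attempt to set an unknown variable under a reserved prefix can safely be rejected outright, matching how unknown non-prefixed variables are treated.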
Re: more descriptive message for process termination due to max_slot_wal_keep_size
At Thu, 23 Dec 2021 18:08:08 +0530, Ashutosh Bapat wrote in
> On Wed, Dec 15, 2021 at 9:42 AM Kyotaro Horiguchi wrote:
> > > LOG: invalidating slot "s1"
> > > DETAIL: The slot's restart_lsn 0/1D68 is behind the limit 0/1100
> > > defined by max_slot_wal_keep_size.
> >
> > The second line could be changed like the following or anything other.
> >
> > > DETAIL: The slot's restart_lsn 0/1D68 got behind the limit
> > > 0/1100 determined by max_slot_wal_keep_size.
>
> The second version looks better as it gives more details. I am fine
> with either of the above wordings.
>
> I would prefer everything in the same message though since
> "invalidating slot ..." is too short a LOG message. Not everybody
> enabled details always.

Mmm. Right. I have gone too much to the same way with the process-termination message. I rearranged the messages as follows in the attached version.

(at master)
> LOG: terminating process %d to release replication slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size
> DETAIL: The slot got behind the limit %X/%X determined by max_slot_wal_keep_size.

> LOG: invalidating slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size
> DETAIL: The slot got behind the limit %X/%X determined by max_slot_wal_keep_size.

The message is actually incomplete even in 13 so I think the change to the errmsg() message of the first message is worth back-patching.

- v3-0001-Make-a-message-on-process-termination-more-dscrip.patch
  Changes only the first main message and it can be back-patched to 14.
- v3-0001-Make-a-message-on-process-termination-more-dscrip_13.patch
  The same to the above but for 13, which doesn't have LSN_FORMAT_ARGS.
- v3-0002-Add-detailed-information-to-slot-invalidation-mes.patch
  Attaches the DETAIL line shown above to both messages, only for the master.

regards.
-- Kyotaro Horiguchi NTT Open Source Software Center >From 4bfbb58801e9d2a0e0ab8320dd95bc850a0a397b Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Fri, 24 Dec 2021 12:52:07 +0900 Subject: [PATCH v3 1/2] Make a message on process termination more dscriptive The message at process termination due to slot limit doesn't provide the reason. In the major scenario the message is followed by another message about slot invalidatation, which shows the detail for the termination. However the second message is missing if the slot is temporary one. Augment the first message with the reason same as the second message. Backpatch through to 13 where the message was introduced. Reported-by: Alex Enachioaie Author: Kyotaro Horiguchi Reviewed-by: Ashutosh Bapat Reviewed-by: Bharath Rupireddy Discussion: https://www.postgresql.org/message-id/17327-89d0efa8b9ae6271%40postgresql.org Backpatch-through: 13 --- src/backend/replication/slot.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index 90ba9b417d..0ceac3a54d 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -1253,8 +1253,9 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN, if (last_signaled_pid != active_pid) { ereport(LOG, - (errmsg("terminating process %d to release replication slot \"%s\"", -active_pid, NameStr(slotname; + (errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size", +active_pid, NameStr(slotname), +LSN_FORMAT_ARGS(restart_lsn; (void) kill(active_pid, SIGTERM); last_signaled_pid = active_pid; -- 2.27.0 >From 444b3bb64ee907280606696e0cb90f6b022c9cd6 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Fri, 24 Dec 2021 13:23:54 +0900 Subject: [PATCH v3] Make a message on process termination more dscriptive The message at process termination due to slot limit doesn't provide the reason. 
In the major scenario the message is followed by another message about slot invalidatation, which shows the detail for the termination. However the second message is missing if the slot is temporary one. Augment the first message with the reason same as the second message. Backpatch through to 13 where the message was introduced. Reported-by: Alex Enachioaie Author: Kyotaro Horiguchi Reviewed-by: Ashutosh Bapat Reviewed-by: Bharath Rupireddy Discussion: https://www.postgresql.org/message-id/17327-89d0efa8b9ae6271%40postgresql.org Backpatch-through: 13 --- src/backend/replication/slot.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index 02047ea920..15b8934ae2 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -1228,8 +1228,10 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN, if (last_signaled_pid != active_pid) { er
Re: Allow escape in application_name
At Thu, 23 Dec 2021 23:10:38 +0900, Fujii Masao wrote in
> On 2021/12/17 16:50, Kyotaro Horiguchi wrote:
> > Thus rewriting the code we're focusing on like the following would
> > make sense to me.
> >
> >> if (strcmp(keywords[i], "application_name") == 0)
> >> {
> >>     values[i] = process_pgfdw_appname(values[i]);
> >>
> >>     /*
> >>      * Break if we have a non-empty string. If we end up failing with
> >>      * all candidates, fallback_application_name would work.
> >>      */
> >>     if (appanme[0] != '\0')

(appname?)

> >>         break;
> >> }
>
> I'm ok to remove the check "values[i] != NULL", but think that it's
> better to keep the other check "*(values[i]) != '\0'" as it is.
> Because *(values[i]) can be null character and it's a waste of
> cycles to call process_pgfdw_appname() in that case.

Right. I removed too much.

> > Thanks for revisiting.
> >
> >> #1. use "[unknown]"
> >> #2. add the check but not use "[unknown]"
> >> #3. don't add the check (i.e., what the current patch does)
> >>
> >> For now, I'm ok to choose #2 or #3.
> >
> > As I said before, given that we don't show "unknown" or something
> > like that as the fallback, I'm fine with not having a NULL check since
> > anyway it bumps into SEGV immediately. In short I'm fine with #3 here.
>
> Yep, let's use #3 approach.

Agreed.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: row filtering for logical replication
The current PG docs text for CREATE PUBLICATION (in the v54-0001 patch) has a part that says

+ A nullable column in the WHERE clause could cause the
+ expression to evaluate to false; avoid using columns without not-null
+ constraints in the WHERE clause.

I felt that the caution to "avoid using" nullable columns is too strongly worded. AFAIK nullable columns will work perfectly fine so long as you take due care of them in the WHERE clause. In fact, it might be very useful sometimes to filter on nullable columns. Here is a small test example:

// publisher
test_pub=# create table t1 (id int primary key, msg text null);
test_pub=# create publication p1 for table t1 where (msg != 'three');

// subscriber
test_sub=# create table t1 (id int primary key, msg text null);
test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost dbname=test_pub application_name=sub1' PUBLICATION p1;

// insert some data
test_pub=# insert into t1 values (1, 'one'), (2, 'two'), (3, 'three'), (4, null), (5, 'five');
test_pub=# select * from t1;
 id |  msg
----+-------
  1 | one
  2 | two
  3 | three
  4 |
  5 | five
(5 rows)

// data at sub
test_sub=# select * from t1;
 id | msg
----+------
  1 | one
  2 | two
  5 | five
(3 rows)

Notice the row 4 with the NULL is also not replicated. But, perhaps we were expecting it to be replicated (because NULL is not 'three'). To do this, simply rewrite the WHERE clause to properly account for nulls.
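The behavior in that example follows from SQL's three-valued logic: the row filter publishes a row only when the expression is strictly true, and NULL != 'three' yields NULL, not true. A toy Python model of the two filters (None standing in for SQL NULL; this is an illustration, not the server's evaluator):

```python
# Rows from the example; a row is published only when the predicate is
# strictly True - a NULL (None) result counts as "does not match".
rows = [(1, "one"), (2, "two"), (3, "three"), (4, None), (5, "five")]

def ne_three(msg):                    # WHERE (msg != 'three')
    return None if msg is None else msg != "three"

def null_or_ne_three(msg):            # WHERE (msg is null or msg != 'three')
    return msg is None or msg != "three"

def published(pred):
    return [rid for rid, msg in rows if pred(msg) is True]

pub1 = published(ne_three)            # NULL row 4 is filtered out
pub2 = published(null_or_ne_three)    # NULL row 4 is now replicated
```

The rewritten clause rescues the NULL row because the IS NULL arm short-circuits before the comparison can produce NULL.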
// truncate both sides
test_pub=# truncate table t1;
test_sub=# truncate table t1;

// alter the WHERE clause
test_pub=# alter publication p1 set table t1 where (msg is null or msg != 'three');

// insert data at pub
test_pub=# insert into t1 values (1, 'one'), (2, 'two'), (3, 'three'), (4, null), (5, 'five');
INSERT 0 5
test_pub=# select * from t1;
 id |  msg
----+-------
  1 | one
  2 | two
  3 | three
  4 |
  5 | five
(5 rows)

// data at sub (now it includes the row 4)
test_sub=# select * from t1;
 id | msg
----+------
  1 | one
  2 | two
  4 |
  5 | five
(4 rows)

~~

So, IMO the PG docs wording for this part should be relaxed a bit. e.g.

BEFORE:
+ A nullable column in the WHERE clause could cause the
+ expression to evaluate to false; avoid using columns without not-null
+ constraints in the WHERE clause.

AFTER:
+ A nullable column in the WHERE clause could cause the
+ expression to evaluate to false. To avoid unexpected results, any possible
+ null values should be accounted for.

Thoughts?

--
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: sequences vs. synchronous replication
At Thu, 23 Dec 2021 19:50:22 +0100, Tomas Vondra wrote in
> On 12/23/21 15:42, Fujii Masao wrote:
> > On 2021/12/23 3:49, Tomas Vondra wrote:
> > > Attached is a patch tweaking WAL logging - in wal_level=minimal we do
> > > the same thing as now, in higher levels we log every sequence fetch.
> >
> > Thanks for the patch!
> >
> > With the patch, I found that the regression test for sequences failed.
> >
> > + fetch = log = fetch;
> >
> > This should be "log = fetch"?
> >
> > On second thought, originally a sequence doesn't guarantee that the
> > value already returned by nextval() will never be returned by
> > subsequent nextval() after the server crash recovery. That is,
> > nextval() may return the same value across crash recovery. Is this
> > understanding right? For example, this case can happen if the server
> > crashes after nextval() returned the value but before WAL for the
> > sequence was flushed to the permanent storage.
>
> I think the important step is commit. We don't guarantee anything for
> changes in uncommitted transactions. If you do nextval in a
> transaction and the server crashes before the WAL gets flushed before
> COMMIT, then yes, nextval may generate the same nextval again. But
> after commit that is not OK - it must not happen.

I don't mean to stand on Fujii-san's side particularly, but it seems to me sequences of RDBMSs are not rolled back generally. Some googling told me that at least Oracle (documented), MySQL, DB2 and MS-SQL Server don't rewind sequences at rollback, that is, sequences are incremented independently from transaction control. It seems common to think that two nextval() calls for the same sequence must not return the same value in any context.

> > So it's not a bug that sync standby may return the same value as
> > already returned in the primary because the corresponding WAL has not
> > been replicated yet, isn't it?
>
> No, I don't think so.
> Once the COMMIT happens (and gets confirmed by the sync standby), it
> should be possible to failover to the sync replica without losing any
> data in committed transaction. Generating duplicate values is a clear
> violation of that.

So, strictly speaking, that is a violation of the constraint I mentioned regardless of whether the transaction is committed or not. However we have technical limitations as below.

> IMHO the fact that we allow a transaction to commit (even just
> locally) without flushing all the WAL it depends on is clearly a data
> loss bug.
>
> > BTW, if the returned value is stored in database, the same value is
> > guaranteed not to be returned again after the server crash or by sync
> > standby. Because in that case the WAL of the transaction storing that
> > value is flushed and replicated.
>
> True, assuming the table is WAL-logged etc. I agree the issue may be
> affecting a fairly small fraction of workloads, because most people
> use sequences to generate data for inserts etc.

It seems to me, from the fact that sequences are designed explicitly untransactional and that behavior is widely adopted, the discussion might be missing some significant use-cases. But there's a possibility that the spec of sequence came from some technical limitation in the past, but I'm not sure..

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
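The invariant Tomas argues for - a commit may only be acknowledged once WAL is flushed up to the highest LSN of every record the transaction depends on, including the record that reserved its nextval() values - can be sketched as tracking a "must flush" LSN per transaction (a toy Python model, not the server's code; all names are made up):

```python
# Toy model of the commit-flush dependency discussed above.
class Wal:
    def __init__(self):
        self.insert_lsn = 0
        self.flush_lsn = 0

    def insert(self, _record):
        self.insert_lsn += 1
        return self.insert_lsn

    def flush_up_to(self, lsn):
        self.flush_lsn = max(self.flush_lsn, lsn)

class Transaction:
    def __init__(self, wal):
        self.wal = wal
        self.must_flush = 0          # highest LSN this xact depends on

    def nextval_logged(self):
        # The sequence advance is WAL-logged; the xact now depends on it.
        lsn = self.wal.insert("seq advance")
        self.must_flush = max(self.must_flush, lsn)

    def commit(self):
        lsn = self.wal.insert("commit")
        self.must_flush = max(self.must_flush, lsn)
        self.wal.flush_up_to(self.must_flush)   # no ack before this point
        return self.wal.flush_lsn >= self.must_flush

wal = Wal()
xact = Transaction(wal)
xact.nextval_logged()
durable = xact.commit()
```

If the commit acknowledgement could happen before `flush_up_to` reached `must_flush`, a crash (or failover to a sync standby) could resurrect a sequence state that re-issues an already-returned value - the data loss scenario in the discussion.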
Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
At Thu, 23 Dec 2021 20:35:54 +0530, Bharath Rupireddy wrote in
> Hi,
>
> It is useful (for debugging purposes) if the checkpoint end message
> has the checkpoint LSN and REDO LSN [1]. It gives more context while
> analyzing checkpoint-related issues. The pg_controldata gives the last
> checkpoint LSN and REDO LSN, but having this info alongside the log
> message helps analyze issues that happened previously, connect the
> dots and identify the root cause.
>
> Attaching a small patch herewith. Thoughts?

A big +1 from me. I thought about proposing the same in the past.

> [1]
> 2021-12-23 14:58:54.694 UTC [1965649] LOG: checkpoint starting: shutdown immediate
> 2021-12-23 14:58:54.714 UTC [1965649] LOG: checkpoint complete: wrote
> 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled;
> write=0.001 s, sync=0.001 s, total=0.025 s; sync files=0,
> longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB;
> LSN=0/14D0AD0, REDO LSN=0/14D0AD0

I thought about something like the following, but your proposal may be clearer.

> WAL range=[0/14D0340, 0/14D0AD0]

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Be clear about what log_checkpoints emits in the documentation
At Thu, 23 Dec 2021 20:56:22 +0530, Bharath Rupireddy wrote in > Hi, > > Currently the documentation of the log_checkpoints GUC says the following: > > Some statistics are included in the log messages, including the number > of buffers written and the time spent writing them. > > Usage of the word "Some" makes it a vague statement. Why can't we just > be clear about what statistics the log_checkpoints GUC can emit, like > the attached patch? > > Thoughts? It seems like simply a maintenance burden of documentation since it doesn't add any further detail of any item in a checkpoint log message. But I'm not sure we want detailed explanations for them and it seems to me we deliberately never explained the detail of any log messages. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: An obsolete comment of pg_stat_statements
At Mon, 22 Nov 2021 22:50:04 +0800, Julien Rouhaud wrote in
> On Mon, Nov 22, 2021 at 2:48 PM Kyotaro Horiguchi wrote:
> >
> > At Mon, 22 Nov 2021 15:38:23 +0900 (JST), Kyotaro Horiguchi wrote in
> > > * queryId is supposed to be a valid value, otherwise this function dosen't
> > > * calucate it by its own as before then returns immediately.
> >
> > Mmm. That's bad. This is the corrected version.
> >
> > * queryId is supposed to be a valid value, otherwise this function doesn't
> > * calculate it by its own as before then returns immediately.
>
> Ah good catch! Indeed the semantics changed and I missed that comment.
>
> I think that the new comment should be a bit more precise about what
> is a valid value and should probably not refer to a previous version
> of the code. How about something like:
>
> - * If queryId is 0 then this is a utility statement for which we couldn't
> - * compute a queryId during parse analysis, and we should compute a suitable
> - * queryId internally.
> + * If queryId is 0 then no query fingerprinting source has been enabled, so we
> + * act as if the extension was disabled: silently exit without doing any work.
>   *

Thanks! Looks better. It is used as-is in the attached. And I will register this to the next CF.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 1563bd869cb8da080e3488e64116da402f79be8c Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Fri, 24 Dec 2021 15:25:57 +0900
Subject: [PATCH] Fix a function comment

Commit 5fd9dfa5f50e4906c35133a414ebec5b6d518493 forgot to fix the comment. Fix it so that it desribes the new behavior.
Author: Julien Rouhaud
Reviewed-by: Kyotaro Horiguchi
---
 contrib/pg_stat_statements/pg_stat_statements.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 726ba59e2b..3224da9275 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -1191,9 +1191,8 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 /*
  * Store some statistics for a statement.
  *
- * If queryId is 0 then this is a utility statement for which we couldn't
- * compute a queryId during parse analysis, and we should compute a suitable
- * queryId internally.
+ * If queryId is 0 then no query fingerprinting source has been enabled, so we
+ * act as if the extension was disabled: silently exit without doing any work.
  *
  * If jstate is not NULL then we're trying to create an entry for which
  * we have no statistics as yet; we just want to record the normalized
-- 
2.27.0
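The behavior the new comment documents can also be observed from SQL. A hypothetical psql session sketching it (assumes pg_stat_statements is in shared_preload_libraries and the extension is created; settings as in PostgreSQL 14+, where compute_query_id exists):

```sql
-- With no query fingerprinting source enabled, queryId stays 0 and
-- pgss_store() exits silently, as if the extension were disabled.
SET compute_query_id = off;
SELECT pg_stat_statements_reset();
SELECT 1 + 1;
SELECT count(*) FROM pg_stat_statements;  -- no entry expected for "SELECT 1 + 1"

-- Re-enable fingerprinting and statements are tracked again.
SET compute_query_id = on;
SELECT 1 + 1;
SELECT query FROM pg_stat_statements WHERE query LIKE 'SELECT%';
```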
Re: sequences vs. synchronous replication
On 12/24/21 06:37, Kyotaro Horiguchi wrote:
> At Thu, 23 Dec 2021 19:50:22 +0100, Tomas Vondra wrote in
>> On 12/23/21 15:42, Fujii Masao wrote:
>>> On 2021/12/23 3:49, Tomas Vondra wrote:
>>>> Attached is a patch tweaking WAL logging - in wal_level=minimal we do
>>>> the same thing as now, in higher levels we log every sequence fetch.
>>>
>>> Thanks for the patch! With the patch, I found that the regression test
>>> for sequences failed.
>>>
>>> + fetch = log = fetch;
>>>
>>> This should be "log = fetch"?
>>>
>>> On second thought, originally a sequence doesn't guarantee that the
>>> value already returned by nextval() will never be returned by
>>> subsequent nextval() after the server crash recovery. That is,
>>> nextval() may return the same value across crash recovery. Is this
>>> understanding right? For example, this case can happen if the server
>>> crashes after nextval() returned the value but before WAL for the
>>> sequence was flushed to the permanent storage.
>>
>> I think the important step is commit. We don't guarantee anything for
>> changes in uncommitted transactions. If you do nextval in a transaction
>> and the server crashes before the WAL gets flushed before COMMIT, then
>> yes, nextval may generate the same value again. But after commit that
>> is not OK - it must not happen.
>
> I don't mean to stand on Fujii-san's side particularly, but it seems to
> me sequences of RDBMSs are not rolled back generally. Some googling told
> me that at least Oracle (documented), MySQL, DB2 and MS-SQL Server don't
> rewind sequences at rollback, that is, sequences are incremented
> independently from transaction control. It seems common to think that
> two nextval() calls for the same sequence must not return the same value
> in any context.

Yes, sequences are not rolled back on abort generally. That would require much stricter locking, and that'd go against using sequences in concurrent sessions. But we're not talking about sequence rollback - we're talking about data loss, caused by failure to flush WAL for a sequence.
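The non-transactional behavior described above is easy to demonstrate in PostgreSQL itself; a minimal sketch (the sequence name `s` is arbitrary):

```sql
CREATE SEQUENCE s;

BEGIN;
SELECT nextval('s');   -- returns 1
ROLLBACK;

-- The rollback did not rewind the sequence: the next call skips 1.
SELECT nextval('s');   -- returns 2
```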
But that affects the *current* code too, and to a much greater extent. Consider this:

  BEGIN;
  SELECT nextval('s') FROM generate_series(1,1000) s(i);
  ROLLBACK; -- or crash of a different backend

  BEGIN;
  SELECT nextval('s');
  COMMIT;

With the current code, this may easily lose the WAL, and we'll generate duplicate values from the sequence. We pretty much ignore the COMMIT. With the proposed change to WAL logging, that is not possible. The COMMIT flushes enough WAL to prevent this issue. So this actually makes the issue less severe.

Maybe I'm missing some important detail, though. Can you show an example where the proposed changes make the issue worse?

>>> So it's not a bug that a sync standby may return the same value as
>>> already returned in the primary because the corresponding WAL has not
>>> been replicated yet, isn't it?

No, I don't think so. Once the COMMIT happens (and gets confirmed by the sync standby), it should be possible to fail over to the sync replica without losing any data in committed transactions. Generating duplicate values is a clear violation of that.

> So, strictly speaking, that is a violation of the constraint I mentioned
> regardless of whether the transaction is committed or not. However we
> have technical limitations as below.

I don't follow. What violates what? If the transaction commits (and gets a confirmation from the sync replica), the modified WAL logging prevents duplicate values. It does nothing for uncommitted transactions. Seems like an improvement to me.

IMHO the fact that we allow a transaction to commit (even just locally) without flushing all the WAL it depends on is clearly a data loss bug.

>>> BTW, if the returned value is stored in the database, the same value is
>>> guaranteed not to be returned again after a server crash or by a sync
>>> standby, because in that case the WAL of the transaction storing that
>>> value is flushed and replicated.

True, assuming the table is WAL-logged etc.
I agree the issue may be affecting a fairly small fraction of workloads, because most people use sequences to generate data for inserts etc.

> It seems to me, from the fact that sequences are designed explicitly
> untransactional and that behavior is widely adopted, the discussion
> might be missing some significant use-cases. But there's a possibility
> that the spec of sequences came from some technical limitation in the
> past, but I'm not sure.

No idea. IMHO from the correctness / behavior point of view, the modified logging is an improvement. The only issue is the additional overhead, and I think the cache addresses that quite well.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company