Re: row filtering for logical replication

2021-12-23 Thread Peter Smith
On Thu, Dec 23, 2021 at 7:23 PM Peter Smith  wrote:
>
> Here is the v54* patch set:
>
> Main changes from v53* are
> 1. All files of all three patches have been pgindented.
> 2. Another review comment is addressed
>
> ~~
>
> Details
> ===
>
> v51-0001 (main)
> - pgindent for all source files if this patch
>
> v51-0002 (new/old tuple)
> - pgindent for all source files of this patch
> - Merged the Euler v50-0005 (pgoutput.c) comments as suggested [Amit 21/12] #5
> - Also updated the commit message to include the Euler v50-0005 commit
> message text
>
> v51-0003 (tab, dump)
> - pgindent for all source files of this patch
>

Sorry for the confusing cut/paste typos in the previous mail just
sent. Of course, the "details" should refer to v54*, not v51*.

~

Here is the v54* patch set:

Main changes from v53* are
1. All files of all three patches have been pgindented.
2. Another review comment is addressed

~~

Details
===

v54-0001 (main)
- pgindent for all source files of this patch

v54-0002 (new/old tuple)
- pgindent for all source files of this patch
- Merged the Euler v50-0005 (pgoutput.c) comments as suggested [Amit 21/12] #5
- Also updated the commit message to include the Euler v50-0005 commit
message text

v54-0003 (tab, dump)
- pgindent for all source files of this patch

--
[Amit 21/12] 
https://www.postgresql.org/message-id/CAA4eK1JgdhDnAvFV-eEWcqMmXYwo9kmCE1wA17xWGE621e8WDg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Add index scan progress to pg_stat_progress_vacuum

2021-12-23 Thread Masahiko Sawada
On Tue, Dec 21, 2021 at 3:37 AM Peter Geoghegan  wrote:
>
> On Wed, Dec 15, 2021 at 2:10 PM Bossart, Nathan  wrote:
> > nitpick: Shouldn't index_blks_scanned be index_blks_vacuumed?  IMO it
> > is more analogous to heap_blks_vacuumed.
>
> +1.
>
> > This will tell us which indexes are currently being vacuumed and the
> > current progress of those operations, but it doesn't tell us which
> > indexes have already been vacuumed or which ones are pending vacuum.
>
> VACUUM will process a table's indexes in pg_class OID order (outside
> of parallel VACUUM, I suppose). See comments about sort order above
> RelationGetIndexList().

Right.

>
> Anyway, it might be useful to add ordinal numbers to each index, that
> line up with this processing/OID order. It would also be reasonable to
> display the same number in log_autovacuum* (and VACUUM VERBOSE)
> per-index output, to reinforce the idea. Note that we don't
> necessarily display a distinct line for each distinct index in this
> log output, which is why including the ordinal number there makes
> sense.

An alternative idea would be to show the number of indexes on the
table and the number of indexes that have been processed in the
leader's entry of pg_stat_progress_vacuum. Even in parallel vacuum
cases, since we have the index vacuum status for each index, it would
not be hard for the leader process to count how many indexes have been
processed.
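
To illustrate, a rough sketch of that idea (the two PROGRESS_VACUUM_*
slots below are invented for illustration, since only slots 0-6 are used
by the vacuum progress code today; nindexes and indexes_processed stand
in for the leader's bookkeeping):

/* hypothetical new pg_stat_progress_vacuum counters */
#define PROGRESS_VACUUM_INDEXES_TOTAL       7
#define PROGRESS_VACUUM_INDEXES_PROCESSED   8

/* in the leader, once the list of indexes to be vacuumed is known */
pgstat_progress_update_param(PROGRESS_VACUUM_INDEXES_TOTAL, nindexes);

/* in the leader, each time one index (local or parallel) finishes */
pgstat_progress_update_param(PROGRESS_VACUUM_INDEXES_PROCESSED,
                             ++indexes_processed);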

Regarding the details of the progress of index vacuuming, I'm not sure
this progress information fits in pg_stat_progress_vacuum. As Peter
already mentioned, the behavior varies quite a bit depending on the
index AM.

Regards,


--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




RE: Failed transaction statistics to measure the logical replication progress

2021-12-23 Thread wangw.f...@fujitsu.com
On Wednesday, December 22, 2021 10:30 PM
osumi.takami...@fujitsu.com  wrote:
> On Wednesday, December 22, 2021 8:38 PM I wrote:
> > Do we expect these commit counts which come from empty transactions ?
> This is another issue discussed in [1]
> where the patch in the thread is a work in progress, I think.
> ..
> IMHO, the conclusion is we are currently in the middle of fixing the behavior.

Thank you for telling me this.
After applying v19-* and
v15-0001-Skip-empty-transactions-for-logical-replication.patch,
I retested the v19-* patches. The result of the previous case looks good to me.

But the results of the following cases are similar to the previous
unexpected result: commit_count or abort_count is increased unexpectedly.
[1]
(Based on environment in the previous example, set TWO_PHASE=true)
[Publisher]
begin;
insert into replica_test1 values(1,'1');
prepare transaction 'id';
commit prepared 'id';

On the subscriber side, the commit_count of both records (sub1 and sub2) is increased.

[2]
(Based on environment in the previous example, set STREAMING=on)
[Publisher]
begin;
INSERT INTO replica_test1 SELECT i, md5(i::text) FROM generate_series(1, 5000) 
s(i);
commit;

On the subscriber side, the commit_count of both records (sub1 and sub2) is increased.

[3]
(Based on environment in the previous example, set TWO_PHASE=true)
[Publisher]
begin;
insert into replica_test1 values(1,'1');
prepare transaction 'id';
rollback prepared 'id';

On the subscriber side, the abort_count of both records (sub1 and sub2) is increased.

I think the problem may be that the patch you mentioned
(Skip-empty-transactions-for-logical-replication.patch) is not finished yet.
I am sharing this information here.


Regards,
Wang wei


Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

2021-12-23 Thread Bharath Rupireddy
On Thu, Dec 23, 2021 at 5:53 AM SATYANARAYANA NARLAPURAM
 wrote:
>
> Hi Hackers,
>
> I am considering implementing RPO (recovery point objective) enforcement 
> feature for Postgres where the WAL writes on the primary are stalled when the 
> WAL distance between the primary and standby exceeds the configured 
> (replica_lag_in_bytes) threshold. This feature is useful particularly in the 
> disaster recovery setups where primary and standby are in different regions 
> and synchronous replication can't be set up for latency and performance 
> reasons yet requires some level of RPO enforcement.

+1 for the idea in general. However, blocking writes on the primary seems
an extremely radical idea. The replicas can fall behind transiently at
times, and blocking writes on the primary may cause applications to fail
during these transient periods. This is not a problem if the applications
have retry logic for their writes. How about blocking writes on the primary
only if the replicas fall behind it for a certain period of time?

> The idea here is to calculate the lag between the primary and the standby 
> (Async?) server during XLogInsert and block the caller until the lag is less 
> than the threshold value. We can calculate the max lag by iterating over 
> ReplicationSlotCtl->replication_slots.

The "falling behind" can also be quantified by the number of
write-transactions on the primary. I think it's good to have the users
choose what the "falling behind" means for them. We can have something
like the "recovery_target" param with different options name, xid,
time, lsn.

> If this is not something we want to do in the core, at least adding a 
> hook for XLogInsert is of great value.

IMHO, this feature may not be needed by everyone, so the hook approach
seems reasonable: postgres vendors can then provide different
implementations (for instance, an extension implementing this hook
could block writes on the primary, write some log messages, inform some
service layer that the replicas are falling behind the primary, etc.).
Since XLogInsert is a hot path that gets called very frequently, the
hook really should do as little work as possible; otherwise the latency
of write operations may increase.
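
To make that concrete, one possible shape for such a hook (neither the
typedef nor the variable exists in core today; this is only a sketch):

typedef void (*xlog_insert_hook_type) (void);
extern PGDLLIMPORT xlog_insert_hook_type xlog_insert_hook;

/* in XLogInsert(), before any WAL space is reserved */
if (xlog_insert_hook != NULL)
    xlog_insert_hook();     /* must be cheap: this is a hot path */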

> A few other scenarios I can think of with the hook are:
>
> Enforcing RPO as described above
> Enforcing rate limit and slow throttling when sync standby is falling behind 
> (could be flush lag or replay lag)
> Transactional log rate governance - useful for cloud providers to provide SKU 
> sizes based on allowed WAL writes.
>
> Thoughts?

The hook can help to achieve the above objectives, but where to place
it and what parameters it should take as input (or what info it should
emit out of the server via the hook) are important questions too.

Having said all that, the RPO feature can also be implemented outside of
postgres. A simple implementation could be: get the primary's current WAL
LSN using pg_current_wal_lsn() and each replica's restart_lsn from
pg_replication_slots; if they differ by a certain amount, then issue the
ALTER SYSTEM SET READ ONLY command [1] on the primary. This requires
connections to the server and proper access rights. The feature could
also be implemented as an extension (without the hook) which doesn't
require any connections to the server yet can access the required info
(the primary's current WAL LSN, the restart_lsn of the replication
slots, etc.), but then the RPO enforcement may not be immediate, as the
server doesn't have any hooks in XLogInsert or some other area.

[1] - 
https://www.postgresql.org/message-id/CAAJ_b967uKBiW6gbHr5aPzweURYjEGv333FHVHxvJmMhanwHXA%40mail.gmail.com
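
For the out-of-core variant, a minimal watcher sketch, assuming only
libpq and the pg_replication_slots view (ALTER SYSTEM READ ONLY is from
the patch in [1] and is not in core, so this sketch merely reports the
breach):

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int
main(void)
{
    /* threshold: 1GB, standing in for replica_lag_in_bytes */
    const long long threshold = 1024LL * 1024 * 1024;
    PGconn     *conn = PQconnectdb("dbname=postgres");
    PGresult   *res;
    long long   lag;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* maximum slot lag in bytes; pg_lsn subtraction yields a numeric */
    res = PQexec(conn,
                 "SELECT coalesce(max(pg_current_wal_lsn() - restart_lsn), 0)"
                 " FROM pg_replication_slots");
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
        return 1;
    }
    lag = atoll(PQgetvalue(res, 0, 0));
    PQclear(res);

    if (lag > threshold)
        printf("max slot lag %lld exceeds threshold; act here\n", lag);

    PQfinish(conn);
    return 0;
}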

Regards,
Bharath Rupireddy.




Re: Add index scan progress to pg_stat_progress_vacuum

2021-12-23 Thread Andrey Lepikhov

On 21/12/2021 00:05, Peter Geoghegan wrote:

* Some index AMs don't work like nbtree and GiST in that they cannot
do their scan sequentially -- they have to do something like a
logical/keyspace order scan instead, which is *totally* different to
heapam (not just a bit different). There is no telling how many times
each page will be accessed in these other index AMs, and in what
order, even under optimal conditions. We should arguably not even try
to provide any granular progress information here, since it'll
probably be too messy.


Maybe we could add callbacks to the AM interface for the
send/receive/representation parts of progress reporting?
Each AM would then define the set of parameters to send to the stats
collector and show to users.
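
As a sketch of what that could look like (the amprogress name and
signature are invented here; ambulkdelete and amvacuumcleanup are the
existing members it would sit beside):

/* amapi.h sketch: an optional, AM-defined progress reporting callback */
typedef void (*amprogress_function) (Relation indexRelation,
                                     int64 *params, int nparams);

typedef struct IndexAmRoutine
{
    ...
    ambulkdelete_function ambulkdelete;
    amvacuumcleanup_function amvacuumcleanup;
    amprogress_function amprogress;     /* may be NULL */
    ...
} IndexAmRoutine;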


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Logical replication timeout problem

2021-12-23 Thread Amit Kapila
On Wed, Dec 22, 2021 at 8:50 PM Fabrice Chapuis  wrote:
>
> Hello Amit,
>
> I was able to reproduce the timeout problem in the lab.
> After loading more than 20 million rows into a table which is not
> replicated (the insert command ends without error), errors related to
> logical replication processes appear in the postgres log.
> Approximately every 5 minutes the worker process is restarted. The snap
> files in the slot directory are still present. The replication system
> seems to be blocked. Why are these snap files not removed? What do they
> contain?
>

These contain the changes of the insert. I think they are not removed in
your case because your long transaction never finishes. As mentioned
earlier, for such cases it is better to use the 'streaming' feature
released as part of PG 14, but anyway, here we are trying to debug the
timeout problem.

> I will recompile postgres with your patch to debug.
>

Okay, that might help.

-- 
With Regards,
Amit Kapila.




skip replication slot snapshot/map file removal during end-of-recovery checkpoint

2021-12-23 Thread Bharath Rupireddy
Hi,

Currently the end-of-recovery checkpoint can be much slower, impacting
server availability, if there are many logical decoding files (snapshot
files ending in .snap and rewrite mapping files starting with map-) to
be enumerated and deleted. How about skipping the .snap and map- file
handling during the end-of-recovery checkpoint? It makes the server
available faster, and the next regular checkpoint can deal with these
files. If required, we can have a GUC
(skip_replication_slot_file_handling or some other better name) to
control this, the default being the existing behavior.
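
A minimal sketch of the idea (the GUC name is the one proposed above and
does not exist today):

/* in CheckPointGuts() (xlog.c) */
static void
CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
{
    CheckPointRelationMap();
    CheckPointReplicationSlots();

    /*
     * Defer enumerating/removing pg_logical snapshot (*.snap) and rewrite
     * mapping (map-*) files at the end-of-recovery checkpoint; the next
     * regular checkpoint cleans them up.
     */
    if (!(skip_replication_slot_file_handling &&
          (flags & CHECKPOINT_END_OF_RECOVERY)))
    {
        CheckPointSnapBuild();
        CheckPointLogicalRewriteHeap();
    }

    /* ... rest of CheckPointGuts() unchanged ... */
}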

Thoughts?

Regards,
Bharath Rupireddy.




correct the sizes of values and nulls arrays in pg_control_checkpoint

2021-12-23 Thread Bharath Rupireddy
Hi,

pg_control_checkpoint emits 18 columns whereas the values and nulls
arrays are defined to be of size 19. Although it's not critical,
attaching a tiny patch to fix this.

diff --git a/src/backend/utils/misc/pg_controldata.c
b/src/backend/utils/misc/pg_controldata.c
index 209a20a882..b1db9a8d07 100644
--- a/src/backend/utils/misc/pg_controldata.c
+++ b/src/backend/utils/misc/pg_controldata.c
@@ -79,8 +79,8 @@ pg_control_system(PG_FUNCTION_ARGS)
 Datum
 pg_control_checkpoint(PG_FUNCTION_ARGS)
 {
-   Datum       values[19];
-   bool        nulls[19];
+   Datum       values[18];
+   bool        nulls[18];
TupleDesc   tupdesc;
HeapTuple   htup;
ControlFileData *ControlFile;

Regards,
Bharath Rupireddy.


v1-0001-correct-the-sizes-of-values-and-nulls-arrays-in-p.patch
Description: Binary data


Re: psql - add SHOW_ALL_RESULTS option

2021-12-23 Thread Fabien COELHO


Hello Peter,

I finally took some time to look at this.


Attached v11 is a rebase.


This patch still has a few of the problems reported earlier this year.

In [0], it was reported that certain replication commands result in infinite 
loops because of faulty error handling.  This still happens.  I wrote a test 
for it, attached here.  (I threw in a few more basic tests, just to have some 
more coverage that was lacking, and to have a file to put the new test in.)


Hmmm… For some unclear reason, on errors in a PGRES_COPY_* state
PQgetResult keeps returning an empty result. PQexec manually ignores
it, so I did the same with a comment, but to me the real bug is somehow
in PQgetResult's behavior…


In [1], it was reported that server crashes produce duplicate error messages. 
This also still happens.  I didn't write a test for it, maybe you have an 
idea.  (Obviously, we could check whether the error message is literally 
there twice in the output, but that doesn't seem very general.)  But it's 
easy to test manually: just have psql connect, shut down the server, then run 
a query.


This is also a feature/bug of libpq which happens to be hidden by PQexec:
when one command crashes, PQgetResult actually returns *2* results: the
first one with the FATAL message, and a second one when libpq figures out
that the connection was lost, with the second message appended to the
first. PQexec just happens to silently ignore the first result. I added a
manual reset of the error message when first shown so that it is not shown
twice. It is unclear to me whether the reset should be somewhere in libpq
instead. I added a voluntary crash at the end of the psql test.


Attached v12 somehow fixes these issues in "psql" code rather than in 
libpq.


--
Fabien.

diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index e0abe34bb6..8f7f93172a 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -50,8 +50,28 @@ BEGIN \;
 SELECT 2.0 AS "float" \;
 SELECT 'world' AS "text" \;
 COMMIT;
+ float 
+---
+   2.0
+(1 row)
+
+ text  
+---
+ world
+(1 row)
+
 -- compound with empty statements and spurious leading spacing
 \;\;   SELECT 3 + 3 \;\;\;   SELECT ' ' || ' !' \;\;   SELECT 1 + 4 \;;
+ ?column? 
+--
+6
+(1 row)
+
+ ?column? 
+--
+   !
+(1 row)
+
  ?column? 
 --
 5
@@ -61,6 +81,11 @@ COMMIT;
 SELECT 1 + 1 + 1 AS "add" \gset
 SELECT :add + 1 + 1 AS "add" \;
 SELECT :add + 1 + 1 AS "add" \gset
+ add 
+-
+   5
+(1 row)
+
 -- set operator
 SELECT 1 AS i UNION SELECT 2 ORDER BY i;
  i 
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index ae38d3dcc3..1d411ae124 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -127,18 +127,11 @@ echo '\x \\ SELECT * FROM foo;' | psql
commands included in the string to divide it into multiple
transactions.  (See 
for more details about how the server handles multi-query strings.)
-   Also, psql only prints the
-   result of the last SQL command in the string.
-   This is different from the behavior when the same string is read from
-   a file or fed to psql's standard input,
-   because then psql sends
-   each SQL command separately.
   
   
-   Because of this behavior, putting more than one SQL command in a
-   single -c string often has unexpected results.
-   It's better to use repeated -c commands or feed
-   multiple commands to psql's standard input,
+   If having several commands executed in one transaction is not desired, 
+   use repeated -c commands or feed multiple commands to
+   psql's standard input,
either using echo as illustrated above, or
via a shell here-document, for example:
 
@@ -3564,10 +3557,6 @@ select 1\; select 2\; select 3;
 commands included in the string to divide it into multiple
 transactions.  (See 
 for more details about how the server handles multi-query strings.)
-psql prints only the last query result
-it receives for each request; in this example, although all
-three SELECTs are indeed executed, psql
-only prints the 3.
 
 
   
@@ -4154,6 +4143,18 @@ bar
   
 
   
+SHOW_ALL_RESULTS
+
+
+When this variable is set to off, only the last
+result of a combined query (\;) is shown instead of
+all of them.  The default is on.  The off behavior
+is for compatibility with older versions of psql.
+
+
+  
+
+   
 SHOW_CONTEXT
 
 
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index ec975c3e2a..e06699878b 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -33,6 +33,8 @@ static bool DescribeQue

Re: Re: Re: Re: Re: Re: Is it worth pushing conditions to sublink/subplan?

2021-12-23 Thread 曾文旌(义从)

Fixed a bug found during testing.


Wenjing



 --Original Message --
From: 曾文旌(义从) 
Sent: Sun Dec 12 20:51:08 2021
To: Zhihong Yu 
Cc: Tomas Vondra , wjzeng , 
PostgreSQL Hackers , shawn wang 
, ggys...@gmail.com 
Subject: Re: Re: Re: Re: Re: Is it worth pushing conditions to sublink/subplan?




 --Original Message --
From: Zhihong Yu 
Sent: Sun Dec 12 01:13:11 2021
To: 曾文旌(义从) 
Cc: Tomas Vondra , wjzeng , 
PostgreSQL Hackers , shawn wang 
, ggys...@gmail.com 
Subject: Re: Re: Re: Re: Is it worth pushing conditions to sublink/subplan?



On Sat, Dec 11, 2021 at 7:31 AM 曾文旌(义从)  wrote:




 --Original Message --
From: Tomas Vondra 
Sent: Wed Dec 8 11:26:35 2021
To: 曾文旌(义从) , shawn wang 
, ggys...@gmail.com , PostgreSQL 
Hackers 
Cc: wjzeng 
Subject: Re: Re: Re: Is it worth pushing conditions to sublink/subplan?

Hi,

On 12/7/21 10:44, 曾文旌(义从) wrote:
> Hi Hackers
> 
> For my previous proposal, I developed a prototype and passed
> regression testing. It works similarly to subquery's qual pushdown.
> We know that sublink expands at the beginning of each level of
> query. At this stage, The query's conditions and equivalence classes
> are not processed. But after generate_base_implied_equalities the
> conditions are processed,  which is why qual can push down to 
> subquery but sublink not.
> 
> My POC implementation chose to delay the sublink expansion in the
> SELECT clause (targetList) and where clause. Specifically, it is
> delayed after generate_base_implied_equalities. Thus, the equivalent
> conditions already established in the Up level query can be easily
> obtained in the sublink expansion process (make_subplan).
> 
> For example, if the up-level query has a.id = 10 and the sublink
> query has a.id = b.id, then we get b.id = 10 and push it down to the
> sublink query. If b is a partitioned table and is partitioned by id,
> then a large number of unrelated subpartitions are pruned out. This
> saves a significant amount of planner and SQL execution time,
> especially if the partitioned table has a large number of
> subpartitions, which is exactly my case.
> 
> Currently, there were two SQL failures in the regression test,
> because the expansion order of sublinks was changed, which did not
> affect the execution results of the SQL.
> 
> Look forward to your suggestions on this proposal.
> 

I took a quick look, and while I don't see / can't think of any problems
with delaying it until after generating implied equalities, there seem
to be a number of gaps.

Thank you for your attention.

1) Are there any regression tests exercising this modified behavior?
Maybe there are, but if the only changes are due to change in order of
targetlist entries, that doesn't seem like a clear proof.

It'd be good to add a couple tests exercising both the positive and
negative case (i.e. when we can and can't pushdown a qual).

I added several samples to the regression tests
(qual_pushdown_to_sublink.sql), and I used a partitioned table to show
the plan status of quals being pushed down into sublinks.
Hopefully this will help you understand the details of this patch. Later,
I will add more cases.
2) apparently, contrib/postgres_fdw does crash like this:

  #3  0x0077b412 in adjust_appendrel_attrs_mutator
  (node=0x13f7ea0, context=0x7fffc3351b30) at appendinfo.c:470
  470  Assert(!IsA(node, SubLink));
  (gdb) p node
  $1 = (Node *) 0x13f7ea0
  (gdb) p *node
  $2 = {type = T_SubLink}

  Backtrace attached.

For the patch attached in the last email, I passed all the tests under
src/test/regress.
As you pointed out, there was a problem with the regression tests under
contrib (in contrib/postgres_fdw).
This time I fixed it, and the current patch (v2) passes the check-world.


3) various parts of the patch really need at least some comments, like:

  - try_push_outer_qual_to_sublink_query really needs some docs

  - new stuff at the end of initsplan.c

Ok, I added some comments and will add more. If you have questions about any 
details,
please point them out directly.

4) generate_base_implied_equalities

   shouldn't this

if (ec->ec_processed)
;

   really be?

if (ec->ec_processed)
continue;

You are right. I fixed it.

5) I'm not sure why we need the new ec_processed flag.

I did this to eliminate duplicate equalities from the two
generate_base_implied_equalities calls:
1) I need the base equivalence expressions generated by
generate_base_implied_equalities, which are used to push quals down to
sublinks (lazy_process_sublinks).
2) The expansion of a sublink may produce an equivalence expression with
parameters, such as a = $1, which requires processing the equivalence
classes again.
3) So, I added ec_processed and process the equivalence classes again
(generate_base_implied_equalities) after an equivalence class changes
(add_eq_member/process_equivalence).

Maybe you have a better suggestion; please let me know.

6) So we now have lazy_process_sublink callback? Does that mea

Re: more descriptive message for process termination due to max_slot_wal_keep_size

2021-12-23 Thread Ashutosh Bapat
On Wed, Dec 15, 2021 at 9:42 AM Kyotaro Horiguchi
 wrote:
>
> At Tue, 14 Dec 2021 19:31:21 +0530, Ashutosh Bapat 
>  wrote in
> > On Tue, Dec 14, 2021 at 9:35 AM Kyotaro Horiguchi
> >  wrote:
> > > > [17605] LOG:  terminating process 17614 to release replication slot "s1"
> > > + [17605] DETAIL:  The slot's restart_lsn 0/2CA0 exceeds 
> > > max_slot_wal_keep_size.
> > > > [17614] FATAL:  terminating connection due to administrator command
> > > > [17605] LOG:  invalidating slot "s1" because its restart_lsn 0/2CA0 
> > > > exceeds max_slot_wal_keep_size
> > >
> > > Somewhat the second and fourth lines look inconsistent each other but
> > > that wouldn't be such a problem.  I don't think we want to concatenate
> > > the two lines together as the result is a bit too long.
> > >
> > > > LOG:  terminating process 17614 to release replication slot "s1" 
> > > > because it's restart_lsn 0/2CA0 exceeds max_slot_wal_keep_size.
> > >
> > > What do you think about this?
> >
> > Agree. I think we should also specify the restart_lsn value which
> > would be within max_slot_wal_keep_size for better understanding.
>
> Thanks!  It seems to me the main message of the "invalidating" log has
> no room for further detail.  So I split the reason out to a DETAIL line
> in the same way as the "terminating" message in the attached second
> patch. (It is separated from the first patch just for review.) I
> believe someone can make the DETAIL message simpler or more natural.
>
> The attached patch set emits the following message.
>
> > LOG:  invalidating slot "s1"
> > DETAIL:  The slot's restart_lsn 0/1D68 is behind the limit 0/1100 
> > defined by max_slot_wal_keep_size.
>
> The second line could be changed like the following or anything other.
>
> > DETAIL:  The slot's restart_lsn 0/1D68 got behind the limit 0/1100 
> > determined by max_slot_wal_keep_size.
> .
>

The second version looks better as it gives more details. I am fine
with either of the above wordings.

I would prefer everything in the same message though, since
"invalidating slot ..." is too short a LOG message. Not everybody
enables details always.



-- 
Best Wishes,
Ashutosh Bapat




pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary

2021-12-23 Thread Bharath Rupireddy
Hi,

pg_archivecleanup currently takes a WAL file name as input and deletes
the WAL files prior to it [1]. As suggested by Satya (cc-ed) in the
pg_replslotdata thread [2], can we enhance pg_archivecleanup to
automatically detect the last checkpoint LSN (from the control file),
calculate the lowest restart_lsn required by the replication slots, if
any (by reading the replication slot info from the pg_replslot
directory), and archive the unneeded WAL files (an archive_command
similar to the one provided in the server config can be provided as an
input) before finally deleting them? Making the pg_archivecleanup tool
an end-to-end solution will help greatly in disk-full situations caused
by WAL file growth (inactive replication slots, archive command
failures, infrequent checkpoints, etc.).

Thoughts?

[1] - When used as a standalone program all WAL files logically
preceding the oldestkeptwalfile will be removed from archivelocation.
https://www.postgresql.org/docs/devel/pgarchivecleanup.html
[2] - 
https://www.postgresql.org/message-id/CAHg%2BQDc9xwN7EmuONT3T91pCqFG6Q-BCe6B-kM-by7r1uPEicg%40mail.gmail.com

Regards,
Bharath Rupireddy.




Re: Multi-Column List Partitioning

2021-12-23 Thread Amul Sul
On Tue, Dec 21, 2021 at 6:34 PM Nitin Jadhav
 wrote:
>
> ---
>
> > +   if (isnulls && isnulls[i])
> > +   cmpval = 0; /* NULL "=" NULL */
> > +   else
> > +   cmpval = 1; /* NULL ">" not-NULL */
> > +   }
> > +   else if (isnulls && isnulls[i])
> > +   cmpval = -1;/* not-NULL "<" NULL */
> >
> > I really doubt this assumption is correct; aren't those strict operators?
>
> Now there are possibilities of multiple NULL values. We should have a
> mechanism to sort it when the bound values contain Non NULL and NULL
> values. As per the above logic we put the NULL values at the end.
> Please let me know if I am wrong.

Ok, but I am not sure about the comparison approach; let's see what
others think.

> ---
[...]
>
> > typedef struct PartitionBoundInfoData
> > {
> >charstrategy;   /* hash, list or range? */
> > +   int partnatts;  /* number of partition key columns */
> >int ndatums;/* Length of the datums[] array */
> >Datum **datums;
> > +   bool  **isnulls;
> >
> > Adding "partnatts" to this struct seems to be unnecessary, AFAIUC,
> > added that for partition_bound_accepts_nulls(), but we can easily get
> > that value from the partitioning key & pass an additional argument.
> > Also, no information about the length of the "isnulls" array.
>
> This is required during merge_list_bounds(). AFAIK partition key
> information is not available here.
>

You can get that as an argument; see merge_range_bounds().

> > I think it would be helpful if you could split the patch: one for
> > multi-value list partitioning and another for the partition wise join, 
> > thanks.
>
> I have split the patch into 2 patches. One is for the multi column
> list partitioning core changes and the other is for partition-wise
> join support. Each patch has its respective test cases in the
> regression suit and regression tests run successfully on each patch.
> Kindly let me know if any other changes are required here.
>

Thanks for the split, that helps a lot. I have a few comments on the
0001 patch, as follows:

+ char   **colname = (char **) palloc0(partnatts * sizeof(char *));

palloc0 is unnecessary.
---

+ foreach(cell2, rowexpr->args)
+ {
+     int         idx = foreach_current_index(cell2);
+     Node       *expr = lfirst(cell2);
+     Const      *val =
+         transformPartitionBoundValue(pstate, expr, colname[i],
+                                      get_partition_col_typid(key, idx),
+                                      get_partition_col_typmod(key, idx),
+                                      get_partition_col_collation(key, idx));
+
+     values = lappend(values, val);
+ }

Array index for colname should be "idx".
---

result->scan_default = partition_bound_has_default(boundinfo);
+
return result;
...

/* Always include the default partition if any. */
result->scan_default = partition_bound_has_default(boundinfo);
-
return result;

...
else
result->scan_default = partition_bound_has_default(boundinfo);
+
return result;
...

-   /* Add columns specified to SET NULL or SET DEFAULT if
provided. */
+   /*
+* Add columns specified to SET NULL or SET DEFAULT if
+* provided.
+*/

spurious change -- look like something not related to your patch.
--

-* For range partitioning, we must only perform pruning with values
-* for either all partition keys or a prefix thereof.
+* For range partitioning and list partitioning, we must only perform
+* pruning with values for either all partition keys or a prefix
+* thereof.
 */
-   if (keyno > nvalues && context->strategy == PARTITION_STRATEGY_RANGE)
+   if (keyno > nvalues && (context->strategy == PARTITION_STRATEGY_RANGE ||
+   context->strategy == PARTITION_STRATEGY_LIST))
break;

I think this is not true for multi-value list partitions; we might
still want to prune partitions for e.g. (100, IS NULL, 20).  Correct me
if I am missing something here.
---

/*
-* For range partitioning, if we have no clauses for the current key,
-* we can't consider any later keys either, so we can stop here.
+* For range partitioning and list partitioning, if we have no clauses
+* for the current key, we can't consider any later keys either, so we
+* can stop here.
 */
-   if (part_scheme->strategy == PARTITION_STRATEGY_RANGE &&
+   if ((part_scheme->strategy == PARTITION_STRATEGY_RANGE ||
+part_scheme->strategy == PARTITION_STRATEGY_LIST) &&
clauselist == NIL)
break

Similarly, why would this be true for list partitioning? How can we
prune partitions if the values are e.g. (100, , 20)?
--

-   if (bms_is_member(keyno, opstep->nullkeys))
+   if (bms_is_member(keyno, opstep->nullkeys) &&
+   context->strategy != PARTITION_STRATEGY_LIST)
continue;
Will that p

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

2021-12-23 Thread Fabien COELHO



Hello Justin,

It seems that the v31 patch does not apply anymore:

  postgresql> git apply 
~/v31-0001-Document-historic-behavior-of-links-to-directori.patch
  error: patch failed: doc/src/sgml/func.sgml:27410
  error: doc/src/sgml/func.sgml: patch does not apply

--
Fabien.




Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

2021-12-23 Thread Ashutosh Bapat
On Thu, Dec 23, 2021 at 5:53 AM SATYANARAYANA NARLAPURAM
 wrote:
>
> Hi Hackers,
>
> I am considering implementing RPO (recovery point objective) enforcement 
> feature for Postgres where the WAL writes on the primary are stalled when the 
> WAL distance between the primary and standby exceeds the configured 
> (replica_lag_in_bytes) threshold. This feature is useful particularly in the 
> disaster recovery setups where primary and standby are in different regions 
> and synchronous replication can't be set up for latency and performance 
> reasons yet requires some level of RPO enforcement.

Limiting the transaction rate when the standby falls behind is a good feature ...

>
> The idea here is to calculate the lag between the primary and the standby 
> (Async?) server during XLogInsert and block the caller until the lag is less 
> than the threshold value. We can calculate the max lag by iterating over 
> ReplicationSlotCtl->replication_slots. If this is not something we want 
> to do in the core, at least adding a hook for XLogInsert is of great value.

but doing it in XLogInsert does not seem to be a good idea. It's a
common point for all kinds of logging, including VACUUM. We could
accidentally stall a critical VACUUM operation because of that.

As Bharath described, it is better handled with application-level
monitoring.

-- 
Best Wishes,
Ashutosh Bapat




Re: Buildfarm support for older versions

2021-12-23 Thread Andrew Dunstan


On 12/22/21 23:20, Larry Rosenman wrote:
> On 12/22/2021 10:15 pm, Tom Lane wrote:
>> Larry Rosenman  writes:
>>> On 12/22/2021 9:59 pm, Tom Lane wrote:
 Does it work if you drop --enable-nls?  (It'd likely be worth fixing
 if so, but I'm trying to narrow the possible causes.)
>>
>>> Nope...
>>
>> OK.  Since 9.3 succeeds, it seems like it's a link problem
>> we fixed at some point.  Can you bisect to find where we
>> fixed it?
>>
>>     regards, tom lane
>
> I can try -- I haven't been very good at that.
>
> I can give you access to the machine and the id the Buildfarm runs under.
>
> (or give me a good process starting from a buildfarm layout).
>
>

I will work on it on my FBSD setup.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Allow escape in application_name

2021-12-23 Thread Fujii Masao




On 2021/12/17 16:50, Kyotaro Horiguchi wrote:

Thus rewriting the code we're focusing on like the following would
make sense to me.


if (strcmp(keywords[i], "application_name") == 0)
{
    values[i] = process_pgfdw_appname(values[i]);

    /*
     * Break if we have a non-empty string. If we end up failing with
     * all candidates, fallback_application_name would work.
     */
    if (*values[i] != '\0')
        break;
}


I'm OK with removing the check "values[i] != NULL", but I think it's
better to keep the other check "*(values[i]) != '\0'" as it is, because
*(values[i]) can be the null character, and it's a waste of cycles to
call process_pgfdw_appname() in that case.


Thanks for revisiting.


#1. use "[unknown]"
#2. add the check but not use "[unknown]"
#3. don't add the check (i.e., what the current patch does)

For now, I'm ok to choose #2 or #3.


As I said before, given that we don't show "unknown" or something like
it as the fallback, I'm fine with not having a NULL check, since it
would bump into a SEGV immediately anyway.  In short, I'm fine with #3
here.


Yep, let's use the #3 approach.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: sequences vs. synchronous replication

2021-12-23 Thread Fujii Masao




On 2021/12/23 3:49, Tomas Vondra wrote:

Attached is a patch tweaking WAL logging - in wal_level=minimal we do the same 
thing as now, in higher levels we log every sequence fetch.


Thanks for the patch!

With the patch, I found that the regression test for sequences failed.

+   fetch = log = fetch;

This should be "log = fetch"?

On second thought, a sequence doesn't originally guarantee that a value
already returned by nextval() will never be returned by a subsequent
nextval() after server crash recovery. That is, nextval() may return the
same value across crash recovery. Is this understanding right? For
example, this case can happen if the server crashes after nextval()
returned the value but before the WAL for the sequence was flushed to
permanent storage. So it's not a bug that a sync standby may return the
same value as already returned on the primary because the corresponding
WAL has not been replicated yet, is it?

BTW, if the returned value is stored in the database, the same value is
guaranteed not to be returned again after a server crash or by a sync
standby, because in that case the WAL of the transaction storing that
value is flushed and replicated.


So I think this makes it acceptable / manageable. Of course, this means the 
values are much less monotonous (across backends), but I don't think we really 
promised that. And I doubt anyone is really using sequences like this (just 
nextval) in performance critical use cases.


I think that this approach is not acceptable to some users. So, if we
actually adopt WAL-logging every sequence fetch, how about also exposing
SEQ_LOG_VALS as a reloption for a sequence? Then those who want to log
every sequence fetch can set this reloption to 0, while those who prefer
the current behavior in spite of the risk we're discussing in this
thread can set the reloption to 32 as it is now, for example.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary

2021-12-23 Thread Euler Taveira
On Thu, Dec 23, 2021, at 9:58 AM, Bharath Rupireddy wrote:
> pg_archivecleanup currently takes a WAL file name as input to delete
> the WAL files prior to it [1]. As suggested by Satya (cc-ed) in
> pg_replslotdata thread [2], can we enhance the pg_archivecleanup to
> automatically detect the last checkpoint (from control file) LSN,
> calculate the lowest restart_lsn required by the replication slots, if
> any (by reading the replication slot info from pg_logical directory),
> archive the unneeded (an archive_command similar to that of the one
> provided in the server config can be provided as an input) WAL files
> before finally deleting them? Making pg_archivecleanup tool as an
> end-to-end solution will help greatly in disk full situations because
> of WAL files growth (inactive replication slots, archive command
> failures, infrequent checkpoint etc.).
pg_archivecleanup is a tool to remove WAL files from the *archive*. Are you
suggesting to use it for removing files from pg_wal directory too? No, thanks.
WAL files are a key component for backup and replication. Hence, you cannot
deliberately allow a tool to remove WAL files from PGDATA. IMO this issue
wouldn't occur if you have a monitoring system and alerts and someone to keep
an eye on it. If the disk full situation was caused by a failed archive command
or a disconnected standby, it is easy to figure out; the fix is simple.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Add new function to convert 32-bit XID to 64-bit

2021-12-23 Thread Fujii Masao

Hi,

When periodically collecting and accumulating statistics or status
information like pg_locks, pg_stat_activity, pg_prepared_xacts, etc. for
future troubleshooting or other reasons, I'd like to store the
transaction ID of such information as a 64-bit value so that the
information for a specified transaction can easily be identified and
picked up by transaction ID. Otherwise it's not easy to distinguish
transactions with the same 32-bit XID but different epochs by the 32-bit
XID alone.

But since pg_locks, pg_stat_activity, etc. return a 32-bit XID, we
cannot store it as a 64-bit value. To improve this situation, I'd like
to propose adding a new function that converts a given 32-bit XID (i.e.,
the xid data type) to a 64-bit one (the xid8 data type). If we do this,
for example, we can easily get the 64-bit XID from pg_stat_activity with
"SELECT convert_xid_to_xid8(backend_xid) FROM pg_stat_activity", if
necessary. Thoughts?

As another approach, we could change the data type of
pg_stat_activity.backend_xid or pg_locks.transaction, etc. from xid to
xid8. But this idea looks like overkill to me, and it may break existing
applications accessing pg_locks etc. So IMO it's better to just provide
the conversion function.
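
To sketch the semantics of such a function (the name convert_xid_to_xid8
is just the proposal above; the epoch-widening logic below mirrors what
the existing xid8 code does, assuming any normal xid lies no further
than 2^31 behind the next full transaction ID):

#include "access/transam.h"

FullTransactionId
convert_xid_to_xid8(TransactionId xid)
{
    FullTransactionId next_fxid = ReadNextFullTransactionId();
    uint32      epoch = EpochFromFullTransactionId(next_fxid);

    /* special XIDs (Invalid/Bootstrap/Frozen) carry no epoch */
    if (!TransactionIdIsNormal(xid))
        return FullTransactionIdFromEpochAndXid(0, xid);

    /*
     * If xid is numerically above the 32-bit part of next_fxid, it must
     * have been allocated in the previous epoch.
     */
    if (xid > XidFromFullTransactionId(next_fxid))
        epoch--;

    return FullTransactionIdFromEpochAndXid(epoch, xid);
}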

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Add checkpoint and redo LSN to LogCheckpointEnd log message

2021-12-23 Thread Bharath Rupireddy
Hi,

It is useful (for debugging purposes) if the checkpoint end message
has the checkpoint LSN and REDO LSN [1]. It gives more context while
analyzing checkpoint-related issues. The pg_controldata gives the last
checkpoint LSN and REDO LSN, but having this info alongside the log
message helps analyze issues that happened previously, connect the
dots and identify the root cause.

Attaching a small patch herewith. Thoughts?

[1]
2021-12-23 14:58:54.694 UTC [1965649] LOG:  checkpoint starting:
shutdown immediate
2021-12-23 14:58:54.714 UTC [1965649] LOG:  checkpoint complete: wrote
0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled;
write=0.001 s, sync=0.001 s, total=0.025 s; sync files=0,
longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB;
LSN=0/14D0AD0, REDO LSN=0/14D0AD0

Regards,
Bharath Rupireddy.


v1-0001-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patch
Description: Binary data


Re: Are datcollate/datctype always libc even under --with-icu ?

2021-12-23 Thread Daniel Verite
Chapman Flack wrote:

> Next question: the "currently" in that comment suggests that could change,
> but is there any present intention to change it, or is this likely to just
> be the way it is for the foreseeable future?

Some related patches and discussions:

* ICU as default collation provider
https://commitfest.postgresql.org/21/1543/

* ICU for global collation
https://commitfest.postgresql.org/25/2256/


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Be clear about what log_checkpoints emits in the documentation

2021-12-23 Thread Bharath Rupireddy
Hi,

Currently the documentation of the log_checkpoints GUC says the following:

Some statistics are included in the log messages, including the number
of buffers written and the time spent writing them.

Usage of the word "Some" makes it a vague statement. Why can't we just
be clear about what statistics the log_checkpoints GUC can emit, like
the attached patch?

Thoughts?

Regards,
Bharath Rupireddy.


v1-0001-Be-clear-about-what-log_checkpoints-emits-in-the-.patch
Description: Binary data


Re: correct the sizes of values and nulls arrays in pg_control_checkpoint

2021-12-23 Thread Euler Taveira
On Thu, Dec 23, 2021, at 8:39 AM, Bharath Rupireddy wrote:
> pg_control_checkpoint emits 18 columns whereas the values and nulls
> arrays are defined to be of size 19. Although it's not critical,
> attaching a tiny patch to fix this.
Good catch! I'm wondering if a constant wouldn't be useful for such case.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: correct the sizes of values and nulls arrays in pg_control_checkpoint

2021-12-23 Thread Bharath Rupireddy
On Thu, Dec 23, 2021 at 9:13 PM Euler Taveira  wrote:
>
> On Thu, Dec 23, 2021, at 8:39 AM, Bharath Rupireddy wrote:
>
> pg_control_checkpoint emits 18 columns whereas the values and nulls
> arrays are defined to be of size 19. Although it's not critical,
> attaching a tiny patch to fix this.
>
> Good catch! I'm wondering if a constant wouldn't be useful for such case.

Thanks. I thought of having a macro, but it creates a lot of diff from
the previous versions, as we would have to change the other
pg_control_XXX functions too.
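
For the record, the suggestion would look something like this (the macro
name is made up here):

#define PG_CONTROL_CHECKPOINT_COLS  18

Datum       values[PG_CONTROL_CHECKPOINT_COLS];
bool        nulls[PG_CONTROL_CHECKPOINT_COLS];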

Regards,
Bharath Rupireddy.




Re: Delay the variable initialization in get_rel_sync_entry

2021-12-23 Thread Euler Taveira
On Wed, Dec 22, 2021, at 10:11 AM, houzj.f...@fujitsu.com wrote:
> When reviewing some logical replication patches. I noticed that in
> function get_rel_sync_entry() we always invoke get_rel_relispartition() 
> and get_rel_relkind() at the beginning which could cause unnecessary
> cache access.
> 
> ---
> get_rel_sync_entry(PGOutputData *data, Oid relid)
> {
> RelationSyncEntry *entry;
> bool am_partition = get_rel_relispartition(relid);
> char relkind = get_rel_relkind(relid);
> ---
> 
> The extra cost could sometimes be noticeable because get_rel_sync_entry is a
> hot function which is executed for each change. And the 'am_partition' and
> 'relkind' are necessary only when we need to rebuild the RelationSyncEntry.
> 
> Here is the perf result for a case where large amounts of data were
> inserted into an un-published table, in which case the cost is
> noticeable.
> 
> --12.83%--pgoutput_change
> |--11.84%--get_rel_sync_entry
> |--4.76%--get_rel_relispartition
> |--4.70%--get_rel_relkind
Good catch. WFM. Deferring variable initialization close to its first use is
good practice.
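
For clarity, the proposed shape is roughly the following sketch (the
function has more to it than shown here):

static RelationSyncEntry *
get_rel_sync_entry(PGOutputData *data, Oid relid)
{
    RelationSyncEntry *entry;
    bool        found;

    entry = (RelationSyncEntry *) hash_search(RelationSyncCache,
                                              (void *) &relid,
                                              HASH_ENTER, &found);

    if (!entry->replicate_valid)
    {
        /* pay for the syscache lookups only when rebuilding the entry */
        bool        am_partition = get_rel_relispartition(relid);
        char        relkind = get_rel_relkind(relid);

        /* ... rebuild the entry as before ... */
    }

    return entry;
}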


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Buildfarm support for older versions

2021-12-23 Thread Andrew Dunstan


On 12/23/21 08:50, Andrew Dunstan wrote:
> On 12/22/21 23:20, Larry Rosenman wrote:
>> On 12/22/2021 10:15 pm, Tom Lane wrote:
>>> Larry Rosenman  writes:
 On 12/22/2021 9:59 pm, Tom Lane wrote:
> Does it work if you drop --enable-nls?  (It'd likely be worth fixing
> if so, but I'm trying to narrow the possible causes.)
 Nope...
>>> OK.  Since 9.3 succeeds, it seems like it's a link problem
>>> we fixed at some point.  Can you bisect to find where we
>>> fixed it?
>>>
>>>     regards, tom lane
>> I can try -- I haven't been very good at that.
>>
>> I can give you access to the machine and the id the Buildfarm runs under.
>>
>> (or give me a good process starting from a buildfarm layout).
>>
>>
> I will work on it on my FBSD setup.
>
>

For the 9.2 error, try setting this in the config_env stanza:


    CFLAGS => '-O2 -fPIC',


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Buildfarm support for older versions

2021-12-23 Thread Larry Rosenman

On 12/23/2021 10:13 am, Andrew Dunstan wrote:

On 12/23/21 08:50, Andrew Dunstan wrote:

On 12/22/21 23:20, Larry Rosenman wrote:

On 12/22/2021 10:15 pm, Tom Lane wrote:

Larry Rosenman  writes:

On 12/22/2021 9:59 pm, Tom Lane wrote:
Does it work if you drop --enable-nls?  (It'd likely be worth fixing
if so, but I'm trying to narrow the possible causes.)

Nope...

OK.  Since 9.3 succeeds, it seems like it's a link problem
we fixed at some point.  Can you bisect to find where we
fixed it?

    regards, tom lane

I can try -- I haven't been very good at that.

I can give you access to the machine and the id the Buildfarm runs under.


(or give me a good process starting from a buildfarm layout).



I will work on it on my FBSD setup.




For the 9.2 error, try setting this in the config_env stanza:


    CFLAGS => '-O2 -fPIC',


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


That got us further, but it dies on startdb:
$ cat startdb-C-1.log
waiting for server to start stopped waiting
pg_ctl: could not start server
Examine the log output.
=== db log file ==
LOG:  unrecognized configuration parameter "unix_socket_directories" in 
file 
"/home/pgbuildfarm/buildroot/REL9_2_STABLE/inst/data-C/postgresql.conf" 
line 576
FATAL:  configuration file 
"/home/pgbuildfarm/buildroot/REL9_2_STABLE/inst/data-C/postgresql.conf" 
contains errors

$

And we have the errors on the other branches with a temp(?) directory.
--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106




Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2021-12-23 Thread Mark Dilger



> On Dec 22, 2021, at 12:01 AM, Pavel Borisov  wrote:
> 
> Thank you, Mark!
> 
> In v6 (PFA) I've made the changes on your advice i.e.
> 
> - pg_amcheck with --checkunique option will ignore uniqueness check (with a 
> warning) if amcheck version in a db is <1.4 and doesn't support the feature.

Ok.

+   int         vmaj = 0,
+               vmin = 0,
+               vrev = 0;
+   const char *amcheck_version = pstrdup(PQgetvalue(result, 0, 1));
+
+   sscanf(amcheck_version, "%d.%d.%d", &vmaj, &vmin, &vrev);

The pstrdup is unnecessary but harmless.

> - fixed unnecessary drop table in regression

Ok.

> - use the existing table for uniqueness check in 005_opclass_damage.pl

It appears you still create a new table, bttest_unique, rather than using the 
existing table int4tbl.  That's fine.

> - added tests into 003_check.pl . It is only smoke test that just verifies 
> new functions.

+
+$node->command_checks_all(
+   [
+   @cmd, '-s', 's1', '-i', 't1_btree', '--parent-check',
+   '--checkunique', 'db1'
+   ],
+   2,
+   [$index_missing_relation_fork_re],
+   [$no_output_re],
+   'pg_amcheck smoke test --parent-check');
+
+$node->command_checks_all(
+   [
+   @cmd, '-s', 's1', '-i', 't1_btree', '--heapallindexed',
+   '--rootdescend', '--checkunique',  'db1'
+   ],
+   2,
+   [$index_missing_relation_fork_re],
+   [$no_output_re],
+   'pg_amcheck smoke test --heapallindexed --rootdescend');
+
+$node->command_checks_all(
+   [ @cmd, '--checkunique', '-d', 'db1', '-d', 'db2', '-d', 'db3', '-S', 's*' 
],
+   0, [$no_output_re], [$no_output_re],
+   'pg_amcheck excluding all corrupt schemas');
+

You have borrowed the existing tests but forgot to change their names.  (The 
last line of each check is the test name, such as 'pg_amcheck smoke test 
--parent-check'.)  Please make each test name unique.

> - added test contrib/amcheck/t/004_verify_nbtree_unique.pl; it is a more
> extensive test based on opclass damage, which was intended to be the main
> test for amcheck, but which I forgot to add to the commit in v5.
> 005_opclass_damage.pl, which you've seen in v5, is largely based on the
> first part of 004_verify_nbtree_unique.pl (with the latter calling
> pg_amcheck, and the former calling bt_index_check(), bt_index_parent_check()).

Ok.

> - added forgotten upgrade script amcheck--1.3--1.4.sql (from v4)

Ok.


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

2021-12-23 Thread SATYANARAYANA NARLAPURAM
On Thu, Dec 23, 2021 at 5:18 AM Ashutosh Bapat 
wrote:

> On Thu, Dec 23, 2021 at 5:53 AM SATYANARAYANA NARLAPURAM
>  wrote:
> >
> > Hi Hackers,
> >
> > I am considering implementing RPO (recovery point objective) enforcement
> feature for Postgres where the WAL writes on the primary are stalled when
> the WAL distance between the primary and standby exceeds the configured
> (replica_lag_in_bytes) threshold. This feature is useful particularly in
> the disaster recovery setups where primary and standby are in different
> regions and synchronous replication can't be set up for latency and
> performance reasons yet requires some level of RPO enforcement.
>
> Limiting the transaction rate when the standby falls behind is a good
> feature ...
>
> >
> > The idea here is to calculate the lag between the primary and the
> standby (Async?) server during XLogInsert and block the caller until the
> lag is less than the threshold value. We can calculate the max lag by
> iterating over ReplicationSlotCtl->replication_slots. If this is not
> something we want to do in the core, at least adding a hook for
> XLogInsert is of great value.
>
> but doing it in XLogInsert does not seem to be a good idea.


Isn't XLogInsert the best place to throttle/govern in a simple and fair
way, particularly for long-running transactions on the server?


> It's a
> common point for all kinds of logging including VACUUM. We could
> accidently stall a critical VACUUM operation because of that.
>

Agreed, but again this is a policy decision that the DBA can
relax/enforce. I expect the RPO to be in the range of a few hundred MBs
to GBs, and on a healthy system the lag typically never comes close to
this value. The hook implementation can take care of the nitty-gritty
details of the policy enforcement based on the needs, for example: not
throttling some backend processes like vacuum and the checkpointer;
throttling based on roles, for example not throttling superuser
connections; and throttling based on replay lag, write lag, checkpoints
taking longer, or the disk getting close to full. Each of these can
easily be translated into GUCs. Depending on the direction of the thread
on the hook vs. a feature in core, I can add more implementation
details.
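
As a sketch of what an extension's hook body could look like under such
a policy (max_slot_lag_bytes() and rpo_lag_threshold are invented names
here):

static void
rpo_xlog_insert_hook(void)
{
    /* never throttle superusers or non-client processes */
    if (superuser() || MyBackendType != B_BACKEND)
        return;

    while (max_slot_lag_bytes() > rpo_lag_threshold)
    {
        CHECK_FOR_INTERRUPTS();
        pg_usleep(10000L);  /* back off 10ms until replicas catch up */
    }
}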



> As Bharath described, it better be handled at the application level
> monitoring.
>

Both RPO-based WAL throttling and application-level monitoring can
co-exist, as each one has its own merits and challenges. Each
application developer has to implement their own throttling logic, and
oftentimes it is hard to get it right.


> --
> Best Wishes,
> Ashutosh Bapat
>


Re: Buildfarm support for older versions

2021-12-23 Thread Andrew Dunstan


On 12/23/21 11:27, Larry Rosenman wrote:
>
>>>
>>
>> For the 9.2 error, try setting this in the config_env stanza:
>>
>>
>>     CFLAGS => '-O2 -fPIC',
>>
>>
>>
>
> That got us further, but it dies on startdb:
> $ cat startdb-C-1.log
> waiting for server to start stopped waiting
> pg_ctl: could not start server
> Examine the log output.
> === db log file ==
> LOG:  unrecognized configuration parameter "unix_socket_directories"
> in file
> "/home/pgbuildfarm/buildroot/REL9_2_STABLE/inst/data-C/postgresql.conf"
> line 576
> FATAL:  configuration file
> "/home/pgbuildfarm/buildroot/REL9_2_STABLE/inst/data-C/postgresql.conf"
> contains errors
> $
>
> And we have the errors on the other branches with a temp(?) directory.


looks like it's picking up the wrong perl libraries. Please show us the
output of

   grep -v secret 
/home/pgbuildfarm/buildroot/REL9_2_STABLE/$animal.lastrun-logs/web-txn.data


cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

2021-12-23 Thread Justin Pryzby
On Thu, Dec 23, 2021 at 09:14:18AM -0400, Fabien COELHO wrote:
> It seems that the v31 patch does not apply anymore:
> 
>   postgresql> git apply 
> ~/v31-0001-Document-historic-behavior-of-links-to-directori.patch
>   error: patch failed: doc/src/sgml/func.sgml:27410
>   error: doc/src/sgml/func.sgml: patch does not apply

Thanks for continuing to follow this patch ;)

I fixed a conflict with output/tablespace from d1029bb5a et seq.
I'm not sure why you got a conflict with 0001, though.

I think the 2nd half of the patches are still waiting for fixes to lstat() on
windows.

You complained before that there were too many patches, and I can see how it
might be a pain depending on your workflow.  But the division is deliberate.
Dealing with patchsets is easy for me: I use "mutt" and for each patch
attachment, I type "|git am" (or |git am -3).
From b8ed86daad1bf9fbb647f7c007130bb0878a2a94 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 16 Mar 2020 14:12:55 -0500
Subject: [PATCH v32 01/11] Document historic behavior of links to
 directories..

Backpatch to 9.5: pg_stat_file
---
 doc/src/sgml/func.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e58efce5865..b5c1befe627 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -27495,7 +27495,7 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 Returns a record containing the file's size, last access time stamp,
 last modification time stamp, last file status change time stamp (Unix
 platforms only), file creation time stamp (Windows only), and a flag
-indicating if it is a directory.
+indicating if it is a directory (or a symbolic link to a directory).


 This function is restricted to superusers by default, but other users
-- 
2.17.0

From 88b324a6877ce916589872f5067bd5810e60608e Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 17 Mar 2020 13:16:24 -0500
Subject: [PATCH v32 02/11] Add tests on pg_ls_dir before changing it

---
 src/test/regress/expected/misc_functions.out | 24 
 src/test/regress/sql/misc_functions.sql  |  8 +++
 2 files changed, 32 insertions(+)

diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index 1013d17f87d..830de507e7c 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -243,6 +243,30 @@ select count(*) > 0 from
  t
 (1 row)
 
+select * from (select pg_ls_dir('.', false, true) as name) as ls where ls.name='.'; -- include_dot_dirs=true
+ name 
+--
+ .
+(1 row)
+
+select * from (select pg_ls_dir('.', false, false) as name) as ls where ls.name='.'; -- include_dot_dirs=false
+ name 
+--
+(0 rows)
+
+select pg_ls_dir('does not exist', true, false); -- ok with missingok=true
+ pg_ls_dir 
+---
+(0 rows)
+
+select pg_ls_dir('does not exist'); -- fails with missingok=false
+ERROR:  could not open directory "does not exist": No such file or directory
+-- Check that expected columns are present
+select * from pg_stat_file('.') limit 0;
+ size | access | modification | change | creation | isdir 
+--++--++--+---
+(0 rows)
+
 --
 -- Test replication slot directory functions
 --
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index 7ab9b2a1509..422a1369aeb 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -91,6 +91,14 @@ select count(*) > 0 from
where spcname = 'pg_default') pts
   join pg_database db on pts.pts = db.oid;
 
+select * from (select pg_ls_dir('.', false, true) as name) as ls where ls.name='.'; -- include_dot_dirs=true
+select * from (select pg_ls_dir('.', false, false) as name) as ls where ls.name='.'; -- include_dot_dirs=false
+select pg_ls_dir('does not exist', true, false); -- ok with missingok=true
+select pg_ls_dir('does not exist'); -- fails with missingok=false
+
+-- Check that expected columns are present
+select * from pg_stat_file('.') limit 0;
+
 --
 -- Test replication slot directory functions
 --
-- 
2.17.0

From 6b7b4a3a7f4fe3edcb5a67f467b10543fa47c343 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 9 Mar 2020 22:40:24 -0500
Subject: [PATCH v32 03/11] Add pg_ls_dir_metadata to list a dir with file
 metadata..

Generalize pg_ls_dir_files and retire pg_ls_dir

Need catversion bumped?
---
 doc/src/sgml/func.sgml   |  21 ++
 src/backend/catalog/system_functions.sql |   1 +
 src/backend/utils/adt/genfile.c  | 239 +++
 src/include/catalog/pg_proc.dat  |  12 +
 src/test/regress/expected/misc_functions.out |  24 ++
 src/test/regress/expected/tablespace.out |   8 +
 src/test/regress/sql/misc_functions.sql  |  11 +
 src/test/regress/sql/tablespace.sql  |

Re: Buildfarm support for older versions

2021-12-23 Thread Larry Rosenman

On 12/23/2021 11:23 am, Andrew Dunstan wrote:

On 12/23/21 11:27, Larry Rosenman wrote:






For the 9.2 error, try setting this in the config_env stanza:


    CFLAGS => '-O2 -fPIC',





That got us further, but it dies on startdb:
$ cat startdb-C-1.log
waiting for server to start stopped waiting
pg_ctl: could not start server
Examine the log output.
=== db log file ==
LOG:  unrecognized configuration parameter "unix_socket_directories"
in file
"/home/pgbuildfarm/buildroot/REL9_2_STABLE/inst/data-C/postgresql.conf"
line 576
FATAL:  configuration file
"/home/pgbuildfarm/buildroot/REL9_2_STABLE/inst/data-C/postgresql.conf"
contains errors
$

And we have the errors on the other branches with a temp(?) directory.



looks like it's picking up the wrong perl libraries. Please show us the
output of

   grep -v secret
/home/pgbuildfarm/buildroot/REL9_2_STABLE/$animal.lastrun-logs/web-txn.data


cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com



$ grep -v secret web-txn.data
$changed_this_run = '';
$changed_since_success = '';
$branch = 'REL9_2_STABLE';
$status = 1;
$stage = 'StartDb-C:1';
$animal = 'gerenuk';
$ts = 1640276469;
$log_data = 'Last file mtime in snapshot: Wed Dec 15 23:00:28 2021 GMT
===
waiting for server to start stopped waiting
pg_ctl: could not start server
Examine the log output.
=== db log file ==
LOG:  unrecognized configuration parameter "unix_socket_directories" in 
file 
"/home/pgbuildfarm/buildroot/REL9_2_STABLE/inst/data-C/postgresql.conf" 
line 576
FATAL:  configuration file 
"/home/pgbuildfarm/buildroot/REL9_2_STABLE/inst/data-C/postgresql.conf" 
contains errors

';
$confsum = 'This file was created by PostgreSQL configure 9.2.24, which 
was

generated by GNU Autoconf 2.63.  Invocation command line was

  $ ./configure --enable-cassert --enable-debug --enable-nls 
--enable-tap-tests \\
--with-perl 
--prefix=/home/pgbuildfarm/buildroot/REL9_2_STABLE/inst \\
--with-pgport=5700 
--cache-file=/home/pgbuildfarm/buildroot/accache-gerenuk/config-REL9_2_STABLE.cache



hostname = borg.lerctr.org
uname -m = amd64
uname -r = 14.0-CURRENT
uname -s = FreeBSD
uname -v = FreeBSD 14.0-CURRENT #26 main-n251898-acdc1de369a: Wed Dec 22 
18:11:08 CST 2021 
r...@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL


/usr/bin/uname -p = amd64


PATH: /home/ler/.asdf/shims
PATH: /home/ler/.asdf/bin
PATH: /sbin
PATH: /bin
PATH: /usr/sbin
PATH: /usr/bin
PATH: /usr/local/sbin
PATH: /usr/local/bin
PATH: /home/ler/bin
PATH: /home/ler/go/bin
PATH: /usr/local/opt/terraform@0.12/bin
PATH: /home/ler/bin




$Script_Config = {
   \'alerts\' => {},
   \'animal\' => \'gerenuk\',
   \'base_port\' => 5678,
   \'bf_perl_version\' => \'5.32.1\',
   \'build_env\' => {
\'INCLUDES\' => 
\'-I/usr/local/include\',

\'LDFLAGS\' => \'-L/usr/local/lib\'
  },
   \'build_root\' => \'/home/pgbuildfarm/buildroot\',
   \'ccache_failure_remove\' => undef,
   \'config\' => [],
   \'config_env\' => {
 \'CFLAGS\' => \'-O2 -fPIC\'
   },
   \'config_opts\' => [
  \'--enable-cassert\',
  \'--enable-debug\',
  \'--enable-nls\',
  \'--enable-tap-tests\',
  \'--with-perl\'
],
   \'core_file_glob\' => \'*.core*\',
   \'extra_config\' => {
   \'DEFAULT\' => [
  \'log_line_prefix 
= \\\'%m [%p:%l] %q%a \\\'\',
  \'log_connections 
= \\\'true\\\'\',
  
\'log_disconnections = \\\'true\\\'\',
  \'log_statement = 
\\\'all\\\'\',

  \'fsync = off\'
]
 },
   \'force_every\' => {},
   \'git_ignore_mirror_failure\' => 1,
   \'git_keep_mirror\' => 1,
   \'git_use_workdirs\' => 1,
   \'invocation_args\' => [
  \'--config\',
  
\'/home/pgbuildfarm/conf/gerenuk.conf\',

  \'--test\',
  \'

Re: Buildfarm support for older versions

2021-12-23 Thread Andrew Dunstan


On 12/23/21 12:23, Andrew Dunstan wrote:
> On 12/23/21 11:27, Larry Rosenman wrote:
>>> For the 9.2 error, try setting this in the config_env stanza:
>>>
>>>
>>>     CFLAGS => '-O2 -fPIC',
>>>
>>>
>>>
>> That got us further, but it dies on startdb:
>> $ cat startdb-C-1.log
>> waiting for server to start stopped waiting
>> pg_ctl: could not start server
>> Examine the log output.
>> === db log file ==
>> LOG:  unrecognized configuration parameter "unix_socket_directories"
>> in file
>> "/home/pgbuildfarm/buildroot/REL9_2_STABLE/inst/data-C/postgresql.conf"
>> line 576
>> FATAL:  configuration file
>> "/home/pgbuildfarm/buildroot/REL9_2_STABLE/inst/data-C/postgresql.conf"
>> contains errors
>> $
>>
>> And we have the errors on the other branches with a temp(?) directory.
>
> looks like it's picking up the wrong perl libraries. Please show us the
> output of
>
>grep -v secret 
> /home/pgbuildfarm/buildroot/REL9_2_STABLE/$animal.lastrun-logs/web-txn.data
>
>


Oh, you need to be building with the buildfarm client's git tip, not the
released code. Alternatively, apply this patch:


https://github.com/PGBuildFarm/client-code/commit/75c762ba74fdec96ebf6c2433d61d3eeead825c3


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2021-12-23 Thread Максим Орлов
>
> The pstrdup is unnecessary but harmless.
>
> > - use the existing table for uniqueness check in 005_opclass_damage.pl
>
> It appears you still create a new table, bttest_unique, rather than using
> the existing table int4tbl.  That's fine.
>
> > - added tests into 003_check.pl . It is only smoke test that just
> verifies new functions.
>
> You have borrowed the existing tests but forgot to change their names.
> (The last line of each check is the test name, such as 'pg_amcheck smoke
> test --parent-check'.)  Please make each test name unique.
>

Thanks for your review! Fixed all these remaining things from patch v6.
PFA v7 patch.

---
Best regards,
Maxim Orlov.


v7-0001-Add-option-for-amcheck-and-pg_amcheck-to-check-un.patch
Description: Binary data


Re: sequences vs. synchronous replication

2021-12-23 Thread Tomas Vondra

On 12/23/21 15:42, Fujii Masao wrote:



On 2021/12/23 3:49, Tomas Vondra wrote:
Attached is a patch tweaking WAL logging - in wal_level=minimal we do 
the same thing as now, in higher levels we log every sequence fetch.


Thanks for the patch!

With the patch, I found that the regression test for sequences failed.

+    fetch = log = fetch;

This should be "log = fetch"?

On second thought, originally a sequence doesn't guarantee that the 
value already returned by nextval() will never be returned by subsequent 
nextval() after the server crash recovery. That is, nextval() may return 
the same value across crash recovery. Is this understanding right? For 
example, this case can happen if the server crashes after nextval() 
returned the value but before WAL for the sequence was flushed to the 
permanent storage.


I think the important step is commit. We don't guarantee anything for 
changes in uncommitted transactions. If you do nextval in a transaction 
and the server crashes before the WAL gets flushed at COMMIT, then
yes, nextval may generate the same value again. But after commit that
is not OK - it must not happen.



So it's not a bug that sync standby may return the same value as
already returned in the primary because the corresponding WAL has not
been replicated yet, isn't it?



No, I don't think so. Once the COMMIT happens (and gets confirmed by the 
sync standby), it should be possible to failover to the sync replica 
without losing any data in committed transaction. Generating duplicate 
values is a clear violation of that.


IMHO the fact that we allow a transaction to commit (even just locally) 
without flushing all the WAL it depends on is clearly a data loss bug.


BTW, if the returned value is stored in database, the same value is 
guaranteed not to be returned again after the server crash or by sync 
standby. Because in that case the WAL of the transaction storing that 
value is flushed and replicated.




True, assuming the table is WAL-logged etc. I agree the issue may be 
affecting a fairly small fraction of workloads, because most people use 
sequences to generate data for inserts etc.
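
For example, the common pattern that makes the fetched value durable
looks like this (a sketch; the table and sequence names are
illustrative):

BEGIN;
INSERT INTO t (id) VALUES (nextval('s'));
COMMIT;  -- the INSERT's WAL (and any earlier sequence WAL) is flushed
         -- and confirmed by the sync standby here, so this value of
         -- 's' cannot be handed out again after a failover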


So I think this makes it acceptable / manageable. Of course, this 
means the values are much less monotonous (across backends), but I 
don't think we really promised that. And I doubt anyone is really 
using sequences like this (just nextval) in performance critical use 
cases.


I think that this approach is not acceptable to some users. So, if we 
actually adopt WAL-logging every sequence fetch, also how about exposing 
SEQ_LOG_VALS as reloption for a sequence? If so, those who want to log 
every sequence fetch can set this SEQ_LOG_VALS reloption to 0. OTOH, 
those who prefer the current behavior in spite of the risk we're 
discussing at this thread can set the reloption to 32 like it is for 
now, for example.




I think it'd be worth explaining why you think it's not acceptable?

I've demonstrated the impact on regular workloads (with other changes 
that write stuff to WAL) is not measurable, and enabling sequence 
caching eliminates most of the overhead for the rare corner case 
workloads if needed. It does generate a bit more WAL, but the sequence 
WAL records are pretty tiny.


I'm opposed to adding relooptions that affect correctness - it just 
seems like a bad idea to me. Moreover setting the CACHE for a sequence 
does almost the same thing - if you set CACHE 32, we only generate WAL 
once every 32 increments. The only difference is that this cache is not 
shared between backends, so one backend will generate 1,2,3,... and 
another backend will generate 33,34,35,... etc. I don't think that's a 
problem, because if you want strictly monotonous / gap-less sequences 
you can't use our sequences anyway. Yes, with short-lived backends this 
may consume the sequences faster, but well - short-lived backends are 
expensive anyway and overflowing bigserial is still unlikely.
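
To illustrate the caching (a sketch; the exact values depend on how
the sessions interleave):

CREATE SEQUENCE s CACHE 32;
-- session A: SELECT nextval('s');  -- 1, then 2, 3, ... from its cache
-- session B: SELECT nextval('s');  -- 33, then 34, ... (A holds 1..32)

So each session writes sequence WAL only about once per 32 increments,
at the cost of globally non-monotonous output.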


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: CREATEROLE and role ownership hierarchies

2021-12-23 Thread Joshua Brindle
On Tue, Dec 21, 2021 at 8:26 PM Mark Dilger
 wrote:
>
>
>
> > On Dec 21, 2021, at 5:11 PM, Shinya Kato  
> > wrote:
> >
> > I fixed the patches because they cannot be applied to HEAD.
>
> Thank you.

I reviewed and tested these and they LGTM. FYI the rebased v3 patches
upthread are raw diffs so git am won't apply them. I can add myself to
the CF as a reviewer if it is helpful.




Fwd: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

2021-12-23 Thread SATYANARAYANA NARLAPURAM
Please find the attached draft patch.

On Thu, Dec 23, 2021 at 2:47 AM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Thu, Dec 23, 2021 at 5:53 AM SATYANARAYANA NARLAPURAM
>  wrote:
> >
> > Hi Hackers,
> >
> > I am considering implementing RPO (recovery point objective) enforcement
> feature for Postgres where the WAL writes on the primary are stalled when
> the WAL distance between the primary and standby exceeds the configured
> (replica_lag_in_bytes) threshold. This feature is useful particularly in
> the disaster recovery setups where primary and standby are in different
> regions and synchronous replication can't be set up for latency and
> performance reasons yet requires some level of RPO enforcement.
>
> +1 for the idea in general. However, blocking writes on primary seems
> an extremely radical idea. The replicas can fall behind transiently at
> times and blocking writes on the primary may stop applications failing
> for these transient times. This is not a problem if the applications
> have retry logic for the writes. How about blocking writes on primary
> if the replicas fall behind the primary for a certain period of time?
>

My proposal is to block the caller from writing until the lag situation is
improved. Don't want to throw any errors and fail the transaction. I think
we are aligned?


>
> > The idea here is to calculate the lag between the primary and the
> standby (Async?) server during XLogInsert and block the caller until the
> lag is less than the threshold value. We can calculate the max lag by
> iterating over ReplicationSlotCtl->replication_slots.
>
> The "falling behind" can also be quantified by the number of
> write-transactions on the primary. I think it's good to have the users
> choose what the "falling behind" means for them. We can have something
> like the "recovery_target" param with different options name, xid,
> time, lsn.
>

The transactions can be of arbitrary size and length and these options may
not provide the desired results. Time is a worthy option to add.


>
> > If this is not something we don't want to do in the core, at least
> adding a hook for XlogInsert is of great value.
>
> IMHO, this feature may not be needed by everyone, the hook-way seems
> reasonable so that the postgres vendors can provide different
> implementations (for instance they can write an extension that
> implements this hook which can block writes on primary, write some log
> messages, inform some service layer of the replicas falling behind the
> primary etc.). If we were to have the hook in XLogInsert which gets
> called so frequently or XLogInsert is a hot-path, the hook really
> should do as little work as possible, otherwise the write operations
> latency may increase.
>

A Hook is a good start. If there is enough interest then an extension can
be added to the contrib module.


> > A few other scenarios I can think of with the hook are:
> >
> > Enforcing RPO as described above
> > Enforcing rate limit and slow throttling when sync standby is falling
> behind (could be flush lag or replay lag)
> > Transactional log rate governance - useful for cloud providers to
> provide SKU sizes based on allowed WAL writes.
> >
> > Thoughts?
>
> The hook can help to achieve the above objectives but where to place
> it and what parameters it should take as input (or what info it should
> emit out of the server via the hook) are important too.
>

XLogInsert in my opinion is the best place to call it and the hook can be
something like this "void xlog_insert_hook(NULL)" as all the throttling
logic required is the current flush position which can be obtained
from GetFlushRecPtr and the ReplicationSlotCtl. Attached a draft patch.


>
> Having said all, the RPO feature can also be implemented outside of
> the postgres, a simple implementation could be - get the primary
> current wal lsn using pg_current_wal_lsn and all the replicas
> restart_lsn using pg_replication_slot, if they differ by certain
> amount, then issue ALTER SYSTEM SET READ ONLY command [1] on the
> primary, this requires the connections to the server and proper access
> rights. This feature can also be implemented as an extension (without
> the hook) which doesn't require any connections to the server yet can
> access the required info primary current_wal_lsn, restart_lsn of the
> replication slots etc, but the RPO enforcement may not be immediate as
> the server doesn't have any hooks in XLogInsert or some other area.
>

READ ONLY is a decent choice, but it can fail the writes, or not take
effect until the end of the transaction?
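
For reference, the polling side of that external approach could be as
simple as this (a sketch, assuming the standbys are attached via
replication slots):

SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS lag_bytes
FROM pg_replication_slots
WHERE restart_lsn IS NOT NULL;

with the throttling (or READ ONLY) action taken once lag_bytes crosses
the configured threshold.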


> [1] -
> https://www.postgresql.org/message-id/CAAJ_b967uKBiW6gbHr5aPzweURYjEGv333FHVHxvJmMhanwHXA%40mail.gmail.com
>
> Regards,
> Bharath Rupireddy.
>


0001-Add-xlog_insert_hook-to-give-control-to-the-plugins.patch
Description: Binary data


Re: Delay the variable initialization in get_rel_sync_entry

2021-12-23 Thread Michael Paquier
On Thu, Dec 23, 2021 at 12:54:41PM -0300, Euler Taveira wrote:
> On Wed, Dec 22, 2021, at 10:11 AM, houzj.f...@fujitsu.com wrote:
>> The extra cost could sometimes be noticeable because get_rel_sync_entry is a
>> hot function which is executed for each change. And the 'am_partition' and
>> 'relkind' are necessary only when we need to rebuild the RelationSyncEntry.
>> 
>> Here is the perf result for the case when inserted large amounts of data 
>> into a
>> un-published table in which case the cost is noticeable.
>> 
>> --12.83%--pgoutput_change
>> |--11.84%--get_rel_sync_entry
>> |--4.76%--get_rel_relispartition
>> |--4.70%--get_rel_relkind

How does the perf balance change once you apply the patch?  Do we have
anything else that stands out?  Getting rid of this bottleneck is fine
by itself, but I am wondering if there are more things to worry about
or not.

> Good catch. WFM. Deferring variable initialization close to its first use is
> good practice.

Yeah, it is usually a good practice to have the declaration within
the code block that uses it rather than piling everything at the
beginning of the function.  Being able to see that in profiles is
annoying, and the change is simple, so I'd like to backpatch it.

This is a period of vacations for a lot of people, so I'll wait until
the beginning-ish of January before doing anything.
--
Michael


signature.asc
Description: PGP signature


Inconsistent ellipsis in regression test error message?

2021-12-23 Thread Peter Smith
The most recent cfbot run for a patch I am interested in has failed a
newly added regression test.

Please see http://cfbot.cputube.org/ for 36/2906

~

The failure logs [2] are very curious because the error message is
what was expected, but the ellipsis (...) is in a different position.

But only for Windows.

 -- fail - publication WHERE clause must be boolean
 ALTER PUBLICATION testpub5 SET TABLE testpub_rf_tbl3 WHERE (1234);
 ERROR:  argument of PUBLICATION WHERE must be type boolean, not type integer
-LINE 1: ALTER PUBLICATION testpub5 SET TABLE testpub_rf_tbl3 WHERE (...
+LINE 1: ...PUBLICATION testpub5 SET TABLE testpub_rf_tbl3 WHERE (1234);


How is that possible?

Is this a problem caused by the patch code? If so, how?

Or is this some obscure boundary case bug of the error ellipsis
calculation which I've exposed by accident due to the specific length
of my bad command?

Thanks for any ideas!

--
[2] https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.157408

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Buildfarm support for older versions

2021-12-23 Thread Larry Rosenman

On 12/23/2021 11:58 am, Andrew Dunstan wrote:



Oh, you need to be building with the buildfarm client's git tip, not 
the

released code. Alternatively, apply this patch:


https://github.com/PGBuildFarm/client-code/commit/75c762ba74fdec96ebf6c2433d61d3eeead825c3




with git tip:
output attached.

I can give access if needed.

$ grep -v secret conf/gerenuk.conf
# -*-perl-*- hey - emacs - this is a perl file

=comment

Copyright (c) 2003-2010, Andrew Dunstan

See accompanying License file for license details

=cut

package PGBuild;

use strict;

use warnings FATAL => 'qw';

use vars qw(%conf);

# use vars qw($VERSION); $VERSION = 'REL_4.19';

my $branch;
{
no warnings qw(once);
$branch = $main::branch;
}

%conf =(

# identity
animal => "gerenuk",

# source code
scm => 'git', # or 'cvs'
git_keep_mirror => 1,  # manage a git mirror in the build root
git_ignore_mirror_failure => 1, # ignore failures in fetching to 
mirror


# use symlinked git repo from non-HEAD branches,
# like git-new-workdir does
git_use_workdirs => 1,

scmrepo => undef, # default is community repo for either type
scm_url => undef, # webref for diffs on server - use default for 
community

# git_reference => undef, # for --reference on git repo
# cvsmethod => 'update', # or 'export'
use_git_cvsserver => undef, # or 'true' if repo is a git cvsserver

# external commands and control
make => 'gmake', # or gmake if required. can include path if 
necessary.

make_jobs => 10, # >1 for parallel "make" and "make check" steps
tar_log_cmd => undef, # default is "tar -z -cf runlogs.tgz *.log"
# replacement must have the same effect

# max time in seconds allowed for a single branch run
# undef/0 means unlimited
wait_timeout => undef,

# where and how to build
# must be absolute, can be either Unix or Windows style for MSVC
# undef means default, buildroot dir in script directory
build_root =>   '/home/pgbuildfarm/buildroot', , #  or 
'/path/to/buildroot',

use_vpath => undef, # set true to do vpath builds

# path to directory with auxiliary web script
# if relative, the must be relative to buildroot/branch
# Now only used on older Msys installations
# aux_path => "../..",

keep_error_builds => 0,
core_file_glob => "*.core*",   # Linux style, use "*.core" for BSD

# where to report status
target => "https://buildfarm.postgresql.org/cgi-bin/pgstatus.pl";,

# where to report change in OS version or compiler version
upgrade_target => 
"https://buildfarm.postgresql.org/cgi-bin/upgrade.pl";,


# change this to a true value if using MSVC, in which case also
# see MSVC section below

using_msvc => undef,

# if force_every is a scalar it will be used on all branches, like 
this

# for legacy reasons:
# force_every => 336 , # max hours between builds, undef or 0 = 
unforced
# we now prefer it to be a hash with branch names as the keys, like 
this

#
# this setting should be kept conservatively high, or not used at 
all  -

# for the most part it's best to let the script decide if something
# has changed that requires a new run for the branch.
#
# an entry with a name of 'default' matches any branch not named
force_every => {

# HEAD => 48,
# REL8_3_STABLE => 72,
# default => 168,
},

# alerts are triggered if the server doesn't see a build on a branch 
after

# this many hours, and then sent out every so often,

alerts => {

#HEAD  => { alert_after => 72,  alert_every => 24 },
# REL8_1_STABLE => { alert_after => 240, alert_every => 48 },
},

# include / exclude patterns for files that trigger a build
# if both are specified then they are both applied as filters
# undef means don't ignore anything.
# exclude qr[^doc/|\.po$] to ignore changes to docs and po files
# (recommended)
# undef means null filter.
trigger_exclude => qr[^doc/|\.po$],
trigger_include => undef,

# settings for mail notices - default to notifying nobody
# these lists contain addresses to be notified
# must be complete email addresses, as the email is sent from the 
server


mail_events =>{
all => [], # unconditional
fail => ["ler\@lerctr.org"], # if this build fails
change => ["ler\@lerctr.org"], # if this build causes a state 
change
green => ["ler\@lerctr.org"], # if this build causes a state 
change to/from OK

},

# if this flag is set and ccache is used, an unsuccessful run will 
result
# in the removal of the ccache directory (and you need to make sure 
that
# its parent is writable). The default is off - ccache should be 
able to
# handle failures, although there have been suspicions in the past 
that
# it's not quite as reliable as we'd want, and thus we have this 
option.


ccache_failure_remove => u

Re: parallel vacuum comments

2021-12-23 Thread Amit Kapila
On Thu, Dec 23, 2021 at 10:56 AM Masahiko Sawada  wrote:
>
> On Wed, Dec 22, 2021 at 10:55 PM Amit Kapila  wrote:
> >
> > On Wed, Dec 22, 2021 at 6:22 PM Amit Kapila  wrote:
> > >
> > > On Wed, Dec 22, 2021 at 5:39 PM houzj.f...@fujitsu.com
> > >  wrote:
> > > >
> > > >
> > > > 2)
> > > > +#include "utils/rel.h"
> > > > +#include "utils/lsyscache.h"
> > > > +#include "utils/memutils.h"
> > > >
> > > > It might be better to keep the header file in alphabetical order.
> > > > :
> > > > +#include "utils/lsyscache.h"
> > > > +#include "utils/memutils.h"
> > > > +#include "utils/rel.h"
> > > >
> > >
> > > Right, I'll take care of this as I am already making some other edits
> > > in the patch.
> > >
> >
> > Fixed this and made a few other changes in the patch that includes (a)
> > passed down the num_index_scans information in parallel APIs based on
> > which it can make the decision whether to reinitialize DSM or consider
> > conditional parallel vacuum clean up; (b) get rid of first-time
> > variable in ParallelVacuumState as that is not required if we have
> > num_index_scans information; (c) there seems to be quite a few
> > unnecessary includes in vacuumparallel.c which I have removed; (d)
> > unnecessary error callback info was being set in ParallelVacuumState
> > in leader backend; (e) changed/added comments at quite a few places.
> >
> > Can you please once verify the changes in the attached?
>
> Thank you for updating the patch!
>
> I agreed with these changes and it looks good to me.
>

Pushed. As per my knowledge, we have addressed the improvements
raised/discussed in this thread.

-- 
With Regards,
Amit Kapila.




Re: parallel vacuum comments

2021-12-23 Thread Masahiko Sawada
On Fri, Dec 24, 2021 at 11:59 AM Amit Kapila  wrote:
>
> On Thu, Dec 23, 2021 at 10:56 AM Masahiko Sawada  
> wrote:
> >
> > On Wed, Dec 22, 2021 at 10:55 PM Amit Kapila  
> > wrote:
> > >
> > > On Wed, Dec 22, 2021 at 6:22 PM Amit Kapila  
> > > wrote:
> > > >
> > > > On Wed, Dec 22, 2021 at 5:39 PM houzj.f...@fujitsu.com
> > > >  wrote:
> > > > >
> > > > >
> > > > > 2)
> > > > > +#include "utils/rel.h"
> > > > > +#include "utils/lsyscache.h"
> > > > > +#include "utils/memutils.h"
> > > > >
> > > > > It might be better to keep the header file in alphabetical order.
> > > > > :
> > > > > +#include "utils/lsyscache.h"
> > > > > +#include "utils/memutils.h"
> > > > > +#include "utils/rel.h"
> > > > >
> > > >
> > > > Right, I'll take care of this as I am already making some other edits
> > > > in the patch.
> > > >
> > >
> > > Fixed this and made a few other changes in the patch that includes (a)
> > > passed down the num_index_scans information in parallel APIs based on
> > > which it can make the decision whether to reinitialize DSM or consider
> > > conditional parallel vacuum clean up; (b) get rid of first-time
> > > variable in ParallelVacuumState as that is not required if we have
> > > num_index_scans information; (c) there seems to be quite a few
> > > unnecessary includes in vacuumparallel.c which I have removed; (d)
> > > unnecessary error callback info was being set in ParallelVacuumState
> > > in leader backend; (e) changed/added comments at quite a few places.
> > >
> > > Can you please once verify the changes in the attached?
> >
> > Thank you for updating the patch!
> >
> > I agreed with these changes and it looks good to me.
> >
>
> Pushed.

Thank you for committing the patch!

> As per my knowledge, we have addressed the improvements
> raised/discussed in this thread.

Yeah, I think so too.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Emit a warning if the extension's GUC is set incorrectly

2021-12-23 Thread Shinya Kato

On 2021/12/22 3:33, Tom Lane wrote:

I wrote:

0003 looks to me like it was superseded by 75d22069e.  I do not
particularly like that patch though; it seems both inefficient
and ill-designed.  0003 is adding a check in an equally bizarre
place.  Why isn't add_placeholder_variable the place to deal
with this?  Also, once we know that a prefix is reserved,
why don't we throw an error not just a warning for attempts to
set some unknown variable under that prefix?  We don't allow
you to set unknown non-prefixed variables.

Concretely, I think we should do the attached, which removes much of
75d22069e and does the check at the point of placeholder creation.
(After looking closer, add_placeholder_variable's caller is the
function that is responsible for rejecting bad names, not
add_placeholder_variable itself.)

This also fixes an indisputable bug in 75d22069e, namely that it
did prefix comparisons incorrectly and would throw warnings for
cases it shouldn't.  Reserving "plpgsql" shouldn't have the effect
of reserving "pl" as well.


Thank you for creating the patch.
This is exactly what I wanted to create. It looks good to me.
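
To make the intended behavior concrete, a sketch (the setting names
here are illustrative):

SET plpgsql.no_such_setting = 'x';  -- ERROR: "plpgsql" is a reserved prefix
SET someext.anything = 'x';         -- unreserved prefix: a placeholder
                                    -- is still created, as before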



I'm tempted to propose that we also rename EmitWarningsOnPlaceholders
to something like MarkGUCPrefixReserved, to more clearly reflect
what it does now.  (We could provide the old name as a macro alias
to avoid breaking extensions needlessly.)


+1


--
Regards,

--
Shinya Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION






Re: more descriptive message for process termination due to max_slot_wal_keep_size

2021-12-23 Thread Kyotaro Horiguchi
At Thu, 23 Dec 2021 18:08:08 +0530, Ashutosh Bapat 
 wrote in 
> On Wed, Dec 15, 2021 at 9:42 AM Kyotaro Horiguchi
>  wrote:
> > > LOG:  invalidating slot "s1"
> > > DETAIL:  The slot's restart_lsn 0/1D68 is behind the limit 0/1100 
> > > defined by max_slot_wal_keep_size.
> >
> > The second line could be changed like the following, or something else.
> >
> > > DETAIL:  The slot's restart_lsn 0/1D68 got behind the limit 
> > > 0/1100 determined by max_slot_wal_keep_size.
> > .
> >
> 
> The second version looks better as it gives more details. I am fine
> with either of the above wordings.
> 
> I would prefer everything in the same message though since
> "invalidating slot ..." is too short a LOG message. Not everybody
> enables details always.

Mmm. Right. I had gone too far in the same direction as the
process-termination message.

I rearranged the messages as follows in the attached version (at master):

> LOG:  terminating process %d to release replication slot \"%s\" because its 
> restart_lsn %X/%X exceeds max_slot_wal_keep_size
> DETAIL:  The slot got behind the limit %X/%X determined by 
> max_slot_wal_keep_size.

> LOG:  invalidating slot \"%s\" because its restart_lsn %X/%X exceeds 
> max_slot_wal_keep_size
> DETAIL:  The slot got behind the limit %X/%X determined by 
> max_slot_wal_keep_size.

The message is actually incomplete even in 13, so I think the change
to the errmsg() text of the first message is worth back-patching.

- v3-0001-Make-a-message-on-process-termination-more-dscrip.patch

  Changes only the first main message and it can be back-patched to 14. 

- v3-0001-Make-a-message-on-process-termination-more-dscrip_13.patch

  The same to the above but for 13, which doesn't have LSN_FORMAT_ARGS.

- v3-0002-Add-detailed-information-to-slot-invalidation-mes.patch

  Attaches the DETAIL line shown above to both messages, only for the
  master.
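
As an aside, slots approaching this limit are visible before any
termination happens, for example:

SELECT slot_name, wal_status, safe_wal_size FROM pg_replication_slots;

where wal_status moves to "unreserved" and then "lost" as
max_slot_wal_keep_size is exceeded.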

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 4bfbb58801e9d2a0e0ab8320dd95bc850a0a397b Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 24 Dec 2021 12:52:07 +0900
Subject: [PATCH v3 1/2] Make a message on process termination more descriptive

The message at process termination due to slot limit doesn't provide
the reason. In the major scenario the message is followed by another
message about slot invalidation, which shows the detail for the
termination.  However, the second message is missing if the slot is
a temporary one.

Augment the first message with the reason same as the second message.

Backpatch through to 13 where the message was introduced.

Reported-by: Alex Enachioaie 
Author: Kyotaro Horiguchi 
Reviewed-by: Ashutosh Bapat 
Reviewed-by: Bharath Rupireddy 
Discussion: https://www.postgresql.org/message-id/17327-89d0efa8b9ae6271%40postgresql.org
Backpatch-through: 13
---
 src/backend/replication/slot.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 90ba9b417d..0ceac3a54d 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1253,8 +1253,9 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
 			if (last_signaled_pid != active_pid)
 			{
 ereport(LOG,
-		(errmsg("terminating process %d to release replication slot \"%s\"",
-active_pid, NameStr(slotname;
+		(errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size",
+active_pid, NameStr(slotname),
+LSN_FORMAT_ARGS(restart_lsn;
 
 (void) kill(active_pid, SIGTERM);
 last_signaled_pid = active_pid;
-- 
2.27.0

From 444b3bb64ee907280606696e0cb90f6b022c9cd6 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 24 Dec 2021 13:23:54 +0900
Subject: [PATCH v3] Make a message on process termination more descriptive

The message at process termination due to slot limit doesn't provide
the reason. In the major scenario the message is followed by another
message about slot invalidation, which shows the detail for the
termination.  However, the second message is missing if the slot is
a temporary one.

Augment the first message with the reason same as the second message.

Backpatch through to 13 where the message was introduced.

Reported-by: Alex Enachioaie 
Author: Kyotaro Horiguchi 
Reviewed-by: Ashutosh Bapat 
Reviewed-by: Bharath Rupireddy 
Discussion: https://www.postgresql.org/message-id/17327-89d0efa8b9ae6271%40postgresql.org
Backpatch-through: 13
---
 src/backend/replication/slot.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 02047ea920..15b8934ae2 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1228,8 +1228,10 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
 			if (last_signaled_pid != active_pid)
 			{
 er

Re: Allow escape in application_name

2021-12-23 Thread Kyotaro Horiguchi
At Thu, 23 Dec 2021 23:10:38 +0900, Fujii Masao  
wrote in 
> 
> 
> On 2021/12/17 16:50, Kyotaro Horiguchi wrote:
> > Thus rewriting the code we're focusing on like the following would
> > make sense to me.
> > 
> >>if (strcmp(keywords[i], "application_name") == 0)
> >>{
> >>values[i]  = process_pgfdw_appname(values[i]);
> >>
> >>/*
> >> * Break if we have a non-empty string. If we end up failing 
> >> with
> >> * all candidates, fallback_application_name would work.
> >> */
> >>if (appanme[0] != '\0')

(appname?)

> >>break;
> >>}   
> 
> I'm ok to remove the check "values[i] != NULL", but think that it's
> better to keep the other check "*(values[i]) != '\0'" as it
> is. Because *(values[i]) can be null character and it's a waste of
> cycles to call process_pgfdw_appname() in that case.

Right. I removed too much.

> > Thanks for revisiting.
> > 
> >> #1. use "[unknown]"
> >> #2. add the check but not use "[unknown]"
> >> #3. don't add the check (i.e., what the current patch does)
> >>
> >> For now, I'm ok to choose #2 or #3.
> > As I said before, given that we don't show "unknown" or something like
> > that as the fallback, I'm fine with not having a NULL check since anyway it
> > bumps into SEGV immediately.  In short I'm fine with #3 here.
> 
> Yep, let's use #3 approach.

Agreed.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: row filtering for logical replication

2021-12-23 Thread Peter Smith
The current PG docs text for CREATE PUBLICATION (in the v54-0001
patch) has a part that says

+   A nullable column in the WHERE clause could cause the
+   expression to evaluate to false; avoid using columns without not-null
+   constraints in the WHERE clause.

I felt that the caution to "avoid using" nullable columns is too
strongly worded. AFAIK nullable columns will work perfectly fine so
long as you take due care of them in the WHERE clause. In fact, it
might be very useful sometimes to filter on nullable columns.

Here is a small test example:

// publisher
test_pub=# create table t1 (id int primary key, msg text null);
test_pub=# create publication p1 for table t1 where (msg != 'three');
// subscriber
test_sub=# create table t1 (id int primary key, msg text null);
test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost
dbname=test_pub application_name=sub1' PUBLICATION p1;

// insert some data
test_pub=# insert into t1 values (1, 'one'), (2, 'two'), (3, 'three'),
(4, null), (5, 'five');
test_pub=# select * from t1;
 id |  msg
+---
  1 | one
  2 | two
  3 | three
  4 |
  5 | five
(5 rows)

// data at sub
test_sub=# select * from t1;
 id | msg
+--
  1 | one
  2 | two
  5 | five
(3 rows)

Notice that row 4, with the NULL, is also not replicated. But perhaps we
were expecting it to be replicated (because NULL is not 'three'). To
achieve that, simply rewrite the WHERE clause to properly account for
nulls.
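
That is just SQL three-valued logic at work - the filter only
replicates rows for which the expression yields true, and a comparison
against NULL yields NULL, not true:

test_pub=# select (null != 'three') as neq, (null is distinct from 'three') as isdf;
 neq | isdf
-----+------
     | t
(1 row)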

// truncate both sides
test_pub=# truncate table t1;
test_sub=# truncate table t1;

// alter the WHERE clause
test_pub=# alter publication p1 set table t1 where (msg is null or msg
!= 'three');

// insert data at pub
test_pub=# insert into t1 values (1, 'one'), (2, 'two'), (3, 'three'),
(4, null), (5, 'five');
INSERT 0 5
test_pub=# select * from t1;
 id |  msg
+---
  1 | one
  2 | two
  3 | three
  4 |
  5 | five
(5 rows)

// data at sub (now it includes row 4)
test_sub=# select * from t1;
 id | msg
+--
  1 | one
  2 | two
  4 |
  5 | five
(4 rows)

~~

So, IMO the PG docs wording for this part should be relaxed a bit.

e.g.
BEFORE:
+   A nullable column in the WHERE clause could cause the
+   expression to evaluate to false; avoid using columns without not-null
+   constraints in the WHERE clause.
AFTER:
+   A nullable column in the WHERE clause could cause the
+   expression to evaluate to false. To avoid unexpected results, any possible
+   null values should be accounted for.

Thoughts?

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: sequences vs. synchronous replication

2021-12-23 Thread Kyotaro Horiguchi
At Thu, 23 Dec 2021 19:50:22 +0100, Tomas Vondra 
 wrote in 
> On 12/23/21 15:42, Fujii Masao wrote:
> > On 2021/12/23 3:49, Tomas Vondra wrote:
> >> Attached is a patch tweaking WAL logging - in wal_level=minimal we do
> >> the same thing as now, in higher levels we log every sequence fetch.
> > Thanks for the patch!
> > With the patch, I found that the regression test for sequences failed.
> > +    fetch = log = fetch;
> > This should be "log = fetch"?
> > On second thought, originally a sequence doesn't guarantee that the
> > value already returned by nextval() will never be returned by
> > subsequent nextval() after the server crash recovery. That is,
> > nextval() may return the same value across crash recovery. Is this
> > understanding right? For example, this case can happen if the server
> > crashes after nextval() returned the value but before WAL for the
> > sequence was flushed to the permanent storage.
> 
> I think the important step is commit. We don't guarantee anything for
> changes in uncommitted transactions. If you do nextval in a
> transaction and the server crashes before the WAL gets flushed at
> COMMIT, then yes, nextval may generate the same value again. But
> after commit that is not OK - it must not happen.

I don't mean to stand on Fujii-san's side particularly, but it seems
to me sequences in RDBMSs are not rolled back generally.  Some googling
told me that at least Oracle (documented), MySQL, DB2 and MS-SQL
Server don't rewind sequences at rollback; that is, sequences are
incremented independently of transaction control.  It seems common to
think that two nextval() calls for the same sequence must not return
the same value in any context.

> > So it's not a bug that sync standby may return the same value as
> > already returned in the primary because the corresponding WAL has not
> > been replicated yet, isn't it?
> > 
> 
> No, I don't think so. Once the COMMIT happens (and gets confirmed by
> the sync standby), it should be possible to failover to the sync
> replica without losing any data in committed transaction. Generating
> duplicate values is a clear violation of that.

So, strictly speaking, that is a violation of the constraint I
mentioned regardless of whether the transaction is committed or
not. However we have technical limitations as below.

> IMHO the fact that we allow a transaction to commit (even just
> locally) without flushing all the WAL it depends on is clearly a data
> loss bug.
> 
> > BTW, if the returned value is stored in database, the same value is
> > guaranteed not to be returned again after the server crash or by sync
> > standby. Because in that case the WAL of the transaction storing that
> > value is flushed and replicated.
> > 
> 
> True, assuming the table is WAL-logged etc. I agree the issue may be
> affecting a fairly small fraction of workloads, because most people
> use sequences to generate data for inserts etc.

It seems to me, from the fact that sequences are explicitly designed
to be nontransactional and that behavior is widely adopted, the
discussion might be missing some significant use-cases.  There's also
a possibility that the sequence spec came from some technical
limitation in the past, but I'm not sure...

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Add checkpoint and redo LSN to LogCheckpointEnd log message

2021-12-23 Thread Kyotaro Horiguchi
At Thu, 23 Dec 2021 20:35:54 +0530, Bharath Rupireddy 
 wrote in 
> Hi,
> 
> It is useful (for debugging purposes) if the checkpoint end message
> has the checkpoint LSN and REDO LSN [1]. It gives more context while
> analyzing checkpoint-related issues. The pg_controldata gives the last
> checkpoint LSN and REDO LSN, but having this info alongside the log
> message helps analyze issues that happened previously, connect the
> dots and identify the root cause.
> 
> Attaching a small patch herewith. Thoughts?

A big +1 from me.  I thought about proposing the same in the past.

> [1]
> 2021-12-23 14:58:54.694 UTC [1965649] LOG:  checkpoint starting:
> shutdown immediate
> 2021-12-23 14:58:54.714 UTC [1965649] LOG:  checkpoint complete: wrote
> 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled;
> write=0.001 s, sync=0.001 s, total=0.025 s; sync files=0,
> longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB;
> LSN=0/14D0AD0, REDO LSN=0/14D0AD0

I thought about something like the following, but your proposal may be
clearer.

> WAL range=[0/14D0340, 0/14D0AD0]
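
As an aside, the latest checkpoint's two positions are also queryable
at runtime (the same information pg_controldata shows), which can help
when correlating with the most recent log lines:

SELECT checkpoint_lsn, redo_lsn FROM pg_control_checkpoint();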

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Be clear about what log_checkpoints emits in the documentation

2021-12-23 Thread Kyotaro Horiguchi
At Thu, 23 Dec 2021 20:56:22 +0530, Bharath Rupireddy 
 wrote in 
> Hi,
> 
> Currently the documentation of the log_checkpoints GUC says the following:
> 
> Some statistics are included in the log messages, including the number
> of buffers written and the time spent writing them.
> 
> Usage of the word "Some" makes it a vague statement. Why can't we just
> be clear about what statistics the log_checkpoints GUC can emit, like
> the attached patch?
> 
> Thoughts?

It seems like it would simply be a documentation maintenance burden,
since it doesn't add any further detail about any item in a checkpoint
log message.  But I'm not sure we want detailed explanations for them,
and it seems to me we have deliberately never explained the details of
any log messages.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: An obsolete comment of pg_stat_statements

2021-12-23 Thread Kyotaro Horiguchi
At Mon, 22 Nov 2021 22:50:04 +0800, Julien Rouhaud  wrote 
in 
> On Mon, Nov 22, 2021 at 2:48 PM Kyotaro Horiguchi
>  wrote:
> >
> > At Mon, 22 Nov 2021 15:38:23 +0900 (JST), Kyotaro Horiguchi 
> >  wrote in
> > >  * queryId is supposed to be a valid value, otherwise this function 
> > > dosen't
> > >  * calucate it by its own as before then returns immediately.
> >
> > Mmm. That's bad. This is the corrected version.
> >
> >  * queryId is supposed to be a valid value, otherwise this function doesn't
> >  * calculate it by its own as before then returns immediately.
> 
> Ah good catch!  Indeed the semantics changed and I missed that comment.
> 
> I think that the new comment should be a bit more precise about what
> is a valid value and should probably not refer to a previous version
> of the code.  How about something like:
> 
> - * If queryId is 0 then this is a utility statement for which we couldn't
> - * compute a queryId during parse analysis, and we should compute a suitable
> - * queryId internally.
> + * If queryId is 0 then no query fingerprinting source has been enabled, so 
> we
> + * act as if the extension was disabled: silently exit without doing any 
> work.
>   *

Thanks! Looks better.  It is used as-is in the attached.

And I will register this to the next CF.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 1563bd869cb8da080e3488e64116da402f79be8c Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 24 Dec 2021 15:25:57 +0900
Subject: [PATCH] Fix a function comment

Commit 5fd9dfa5f50e4906c35133a414ebec5b6d518493 forgot to fix the
comment. Fix it so that it describes the new behavior.

Author: Julien Rouhaud
Reviewed-by: Kyotaro Horiguchi
---
 contrib/pg_stat_statements/pg_stat_statements.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 726ba59e2b..3224da9275 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -1191,9 +1191,8 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 /*
  * Store some statistics for a statement.
  *
- * If queryId is 0 then this is a utility statement for which we couldn't
- * compute a queryId during parse analysis, and we should compute a suitable
- * queryId internally.
+ * If queryId is 0 then no query fingerprinting source has been enabled, so we
+ * act as if the extension was disabled: silently exit without doing any work.
  *
  * If jstate is not NULL then we're trying to create an entry for which
  * we have no statistics as yet; we just want to record the normalized
-- 
2.27.0



Re: sequences vs. synchronous replication

2021-12-23 Thread Tomas Vondra




On 12/24/21 06:37, Kyotaro Horiguchi wrote:

At Thu, 23 Dec 2021 19:50:22 +0100, Tomas Vondra 
 wrote in

On 12/23/21 15:42, Fujii Masao wrote:

On 2021/12/23 3:49, Tomas Vondra wrote:

Attached is a patch tweaking WAL logging - in wal_level=minimal we do
the same thing as now, in higher levels we log every sequence fetch.

Thanks for the patch!
With the patch, I found that the regression test for sequences failed.
+    fetch = log = fetch;
This should be "log = fetch"?
On second thought, originally a sequence doesn't guarantee that the
value already returned by nextval() will never be returned by
subsequent nextval() after the server crash recovery. That is,
nextval() may return the same value across crash recovery. Is this
understanding right? For example, this case can happen if the server
crashes after nextval() returned the value but before WAL for the
sequence was flushed to the permanent storage.


I think the important step is commit. We don't guarantee anything for
changes in uncommitted transactions. If you do nextval in a
transaction and the server crashes before the WAL gets flushed at
COMMIT, then yes, nextval may generate the same value again. But
after commit that is not OK - it must not happen.


I don't mean to stand on Fujii-san's side particularly, but it seems
to me sequences in RDBMSs are not rolled back generally.  Some googling
told me that at least Oracle (documented), MySQL, DB2 and MS-SQL
Server don't rewind sequences at rollback; that is, sequences are
incremented independently of transaction control.  It seems common to
think that two nextval() calls for the same sequence must not return
the same value in any context.



Yes, sequences are not rolled back on abort generally. That would 
require much stricter locking, and that'd go against using sequences in 
concurrent sessions.


But we're not talking about sequence rollback - we're talking about data 
loss, caused by failure to flush WAL for a sequence. But that affects 
the *current* code too, and to a much greater extent.


Consider this:

BEGIN;
SELECT nextval('s') FROM generate_series(1,1000) s(i);
ROLLBACK; -- or crash of a different backend

BEGIN;
SELECT nextval('s');
COMMIT;

With the current code, this may easily lose the WAL, and we'll generate 
duplicate values from the sequence. We pretty much ignore the COMMIT.


With the proposed change to WAL logging, that is not possible. The 
COMMIT flushes enough WAL to prevent this issue.


So this actually makes this issue less severe.

Maybe I'm missing some important detail, though. Can you show an example 
where the proposed changes make the issue worse?



So it's not a bug that sync standby may return the same value as
already returned in the primary because the corresponding WAL has not
been replicated yet, isn't it?



No, I don't think so. Once the COMMIT happens (and gets confirmed by
the sync standby), it should be possible to failover to the sync
replica without losing any data in committed transaction. Generating
duplicate values is a clear violation of that.


So, strictly speaking, that is a violation of the constraint I
mentioned regardless of whether the transaction is committed or
not. However we have technical limitations as below.



I don't follow. What violates what?

If the transaction commits (and gets a confirmation from sync replica), 
the modified WAL logging prevents duplicate values. It does nothing for 
uncommitted transactions. Seems like an improvement to me.



IMHO the fact that we allow a transaction to commit (even just
locally) without flushing all the WAL it depends on is clearly a data
loss bug.


BTW, if the returned value is stored in database, the same value is
guaranteed not to be returned again after the server crash or by sync
standby. Because in that case the WAL of the transaction storing that
value is flushed and replicated.



True, assuming the table is WAL-logged etc. I agree the issue may be
affecting a fairly small fraction of workloads, because most people
use sequences to generate data for inserts etc.


It seems to me, from the fact that sequences are explicitly designed
to be nontransactional and that behavior is widely adopted, the
discussion might be missing some significant use-cases.  There's also
a possibility that the sequence spec came from some technical
limitation in the past, but I'm not sure...



No idea. IMHO from the correctness / behavior point of view, the 
modified logging is an improvement. The only issue is the additional 
overhead, and I think the cache addresses that quite well.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company