Re: pglz compression performance, take two

2020-12-25 Thread Andrey Borodin


> On 12 Dec 2020, at 22:47, Andrey Borodin wrote:
> 
> 

I've cleaned up the comments, checked that the memory alignment logic actually
makes sense for 32-bit ARM (according to Godbolt), and did some more code cleanup.
PFA v2 patch.

I'm still in doubt whether I should register this patch in the CF or not. I'm willing
to work on this, but it's not clear whether it will hit PGv14. And I hope that for
PGv15 we will have lz4 or something better for WAL compression.

Best regards, Andrey Borodin.



v2-0001-Reorganize-pglz-compression-code.patch
Description: Binary data


Re: On login trigger: take three

2020-12-25 Thread Pavel Stehule
On Sat, 26 Dec 2020 at 8:00, Pavel Stehule wrote:

> Hi
>
>
> Thank you.
>> I have applied all your fixes in on_connect_event_trigger-12.patch.
>>
>> Concerning the enable_client_connection_trigger GUC, I think that it is
>> really useful: it is the fastest and simplest way to disable login triggers
>> in case
>> of some problems with them (not only for the superuser itself, but for all
>> users). Yes, it can also be done using "ALTER EVENT TRIGGER DISABLE".
>> But assume that you have a lot of databases with different login policies
>> enforced by on-login event triggers, and you want to temporarily disable them
>> all, for example for testing purposes.
>> In this case a GUC is the most convenient way to do it.
>>
>>
> There was a typo in the patch:
>
> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
> index f810789..8861f1b 100644
> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -1,4 +1,4 @@
> -
> +\
>
> I don't have any objections to the functionality or design. I tested the
> performance, and there is no negative impact when this feature is not
> used. There is significant overhead related to PL/pgSQL runtime
> initialization, but when this trigger is used, then probably some
> other PL/pgSQL procedures and functions will be used too, and then this
> overhead can be ignored.
>
> * make without warnings
> * make check-world passed
> * doc build passed
>
> Possible ToDo:
>
> The documentation could contain a note that using connect triggers in
> environments with short-lived sessions and very short, fast queries, without
> any other use of PL/pgSQL functions or procedures, can have a negative impact
> on performance due to the overhead of initializing the PL/pgSQL engine.
>
> I'll mark this patch as ready for committers
>

It looks like this patch has no entry in the commitfest app for 2021-01.

Regards

Pavel


> Regards
>
> Pavel
>
>
>
>>
>> Konstantin Knizhnik
>> Postgres Professional: http://www.postgrespro.com
>> The Russian Postgres Company
>>
>>


Re: On login trigger: take three

2020-12-25 Thread Pavel Stehule
Hi


Thank you.
> I have applied all your fixes in on_connect_event_trigger-12.patch.
>
> Concerning the enable_client_connection_trigger GUC, I think that it is really
> useful: it is the fastest and simplest way to disable login triggers in case
> of some problems with them (not only for the superuser itself, but for all
> users). Yes, it can also be done using "ALTER EVENT TRIGGER DISABLE".
> But assume that you have a lot of databases with different login policies
> enforced by on-login event triggers, and you want to temporarily disable them
> all, for example for testing purposes.
> In this case a GUC is the most convenient way to do it.
>
>
There was a typo in the patch:

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f810789..8861f1b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1,4 +1,4 @@
-
+\

I don't have any objections to the functionality or design. I tested the
performance, and there is no negative impact when this feature is not
used. There is significant overhead related to PL/pgSQL runtime
initialization, but when this trigger is used, then probably some
other PL/pgSQL procedures and functions will be used too, and then this
overhead can be ignored.

* make without warnings
* make check-world passed
* doc build passed

Possible ToDo:

The documentation could contain a note that using connect triggers in
environments with short-lived sessions and very short, fast queries, without
any other use of PL/pgSQL functions or procedures, can have a negative impact
on performance due to the overhead of initializing the PL/pgSQL engine.

I'll mark this patch as ready for committers

Regards

Pavel



>
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>
diff --git a/doc/src/sgml/bki.sgml b/doc/src/sgml/bki.sgml
index 036a72c81e..bb62f25b2a 100644
--- a/doc/src/sgml/bki.sgml
+++ b/doc/src/sgml/bki.sgml
@@ -182,7 +182,7 @@
 { oid => '1', oid_symbol => 'TemplateDbOid',
   descr => 'database\'s default template',
   datname => 'template1', encoding => 'ENCODING', datcollate => 'LC_COLLATE',
-  datctype => 'LC_CTYPE', datistemplate => 't', datallowconn => 't',
+  datctype => 'LC_CTYPE', datistemplate => 't', datallowconn => 't', dathaslogontriggers => 'f',
   datconnlimit => '-1', datlastsysoid => '0', datfrozenxid => '0',
   datminmxid => '1', dattablespace => 'pg_default', datacl => '_null_' },
 
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3a2266526c..8100ff761a 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2944,6 +2944,17 @@ SCRAM-SHA-256$:&l
   
  
 
+ 
+  
+   dathaslogontriggers bool
+  
+  
+Indicates that there are client connection triggers defined for this database.
+This flag is used to avoid an extra lookup of the pg_event_trigger table on each backend startup.
+It is used internally by Postgres and should not be changed manually by the DBA or an application.
+  
+ 
+
  
   
datconnlimit int4
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 4b60382778..c1914c5ceb 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -996,6 +996,24 @@ include_dir 'conf.d'
   
  
 
+ 
+  enable_client_connection_trigger (boolean)
+  
+   enable_client_connection_trigger configuration parameter
+  
+  
+  
+   
+Enables firing the client_connection
+trigger when a client connects. This parameter is switched on by default.
+Errors in trigger code can prevent users from logging in to the system.
+In this case, disabling this parameter in the connection string can solve the problem:
+psql "dbname=postgres options='-c enable_client_connection_trigger=false'".
+Only superusers can change this variable.
+   
+  
+ 
+
  
  
 
diff --git a/doc/src/sgml/ecpg.sgml b/doc/src/sgml/ecpg.sgml
index 0fef9bfcbe..1ecb8c1f45 100644
--- a/doc/src/sgml/ecpg.sgml
+++ b/doc/src/sgml/ecpg.sgml
@@ -4673,6 +4673,7 @@ datdba = 10 (type: 1)
 encoding = 0 (type: 5)
 datistemplate = t (type: 1)
 datallowconn = t (type: 1)
+dathaslogontriggers = f (type: 1)
 datconnlimit = -1 (type: 5)
 datlastsysoid = 11510 (type: 1)
 datfrozenxid = 379 (type: 1)
@@ -4698,6 +4699,7 @@ datdba = 10 (type: 1)
 encoding = 0 (type: 5)
 datistemplate = f (type: 1)
 datallowconn = t (type: 1)
+dathaslogontriggers = f (type: 1)
 datconnlimit = -1 (type: 5)
 datlastsysoid = 11510 (type: 1)
 datfrozenxid = 379 (type: 1)
diff --git a/doc/src/sgml/event-trigger.sgml b/doc/src/sgml/event-trigger.sgml
index 60366a950e..ae40a8e1a2 100644
--- a/doc/src/sgml/event-trigger.sgml
+++ b/doc/src/sgml/event-trigger.sgml
@@ -28,6 +28,7 @@
  An event trigger fires whenever the event with which it is associated
  occurs in the database in which it is defined. Currently, the only
  supported events are
+ client_connection,
  ddl_co

Re: Better client reporting for "immediate stop" shutdowns

2020-12-25 Thread Bharath Rupireddy
On Sat, Dec 26, 2020 at 4:33 AM Andres Freund  wrote:
> On 2020-12-21 16:43:33 -0500, Tom Lane wrote:
> > * I chose to report the same message for immediate shutdown as we
> > already use for SIGTERM (fast shutdown or pg_terminate_backend()).
> > Should it be different, and if so what?
>
> To do better I think we'd have to distinguish the different cases? An
> error message like
> "terminating connection due to {fast shutdown,immediate shutdown,connection 
> termination} administrator command"
> or such could be helpful, but I don't think your patch adds *quite*
> enough state?

Currently, for fast shutdown, the "FATAL:  terminating connection due
to administrator command" message is shown in the server logs per backend.
The idea used for immediate shutdown can be extended to fast shutdown
as well: the postmaster can set the signal state just before
signalling the backends with SIGTERM, and later in ProcessInterrupts()
the status can be checked in order to report something like "FATAL:
terminating connection due to fast shutdown command".

And for smart shutdown, since the postmaster waits for the normal
backends to go away on their own and no FATAL messages get logged,
we don't need to set the signal state.

> I'd like to not log all these repeated messages into the server
> log. It's quite annoying to have to dig through thousands of lines of
> repeated "terminating connection..." lines that add absolutely no
> additional information, just because I am shutting down the
> server. Similarly, trying to find the reason for a PANIC is often hard
> due to all the other messages.

Currently, only one "terminating connection due to "
message (WARNING for immediate shutdown, FATAL for fast shutdown) gets
logged in the server logs per backend, so the number of log messages
for each shutdown depends on the number of active backends plus other
bg workers, if any. If we don't want each active backend to emit
these messages separately, then how about the postmaster (as it anyway
knows which active backends it currently has) checking whether all
the backends have exited properly and showing only one message,
something like "the active backends are terminated due to "?

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: proposal: schema variables

2020-12-25 Thread Pavel Stehule
On Sat, 26 Dec 2020 at 7:18, Erik Rijkers wrote:

> On 2020-12-26 05:52, Pavel Stehule wrote:
> > On Sat, 19 Dec 2020 at 7:57, Pavel Stehule wrote:
> > [schema-variables-20201222.patch.gz (~]
> >
> >> Hi
> >>
> >> only rebase
> >>
> >
> > rebase and comments fixes
> >
>
> Hi Pavel,
>
> This file is exactly the same as the file you sent on Tuesday. Is it a
> mistake?
>

It is the same. Unfortunately, it was sent in an isolated thread, and the
commitfest application didn't find the latest version. I tried to attach the
new thread to the commitfest application, but without success.


Re: proposal: schema variables

2020-12-25 Thread Erik Rijkers

On 2020-12-26 05:52, Pavel Stehule wrote:
On Sat, 19 Dec 2020 at 7:57, Pavel Stehule wrote:
[schema-variables-20201222.patch.gz (~]


Hi

only rebase



rebase and comments fixes



Hi Pavel,

This file is exactly the same as the file you sent on Tuesday. Is it a
mistake?









Re: Parallel Inserts in CREATE TABLE AS

2020-12-25 Thread Dilip Kumar
On Thu, Dec 24, 2020 at 1:07 PM Bharath Rupireddy
 wrote:
>
> On Thu, Dec 24, 2020 at 10:25 AM vignesh C  wrote:
> > You could change intoclause_len = strlen(intoclausestr) to
> > strlen(intoclausestr) + 1 and use intoclause_len in the remaining
> > places. We can avoid the +1 in the other places.
> > +   /* Estimate space for into clause for CTAS. */
> > +   if (IS_CTAS(intoclause) && OidIsValid(objectid))
> > +   {
> > +   intoclausestr = nodeToString(intoclause);
> > +   intoclause_len = strlen(intoclausestr);
> > +   shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len + 
> > 1);
> > +   shm_toc_estimate_keys(&pcxt->estimator, 1);
> > +   }
>
> Done.
>
> > Can we use  node->nworkers_launched == 0 in place of
> > node->need_to_scan_locally, that way the setting and resetting of
> > node->need_to_scan_locally can be removed. Unless need_to_scan_locally
> > is needed in any of the functions that gets called.
> > +   /* Enable leader to insert in case no parallel workers were 
> > launched. */
> > +   if (node->nworkers_launched == 0)
> > +   node->need_to_scan_locally = true;
> > +
> > +   /*
> > +* By now, for parallel workers (if launched any), would have
> > started their
> > +* work i.e. insertion to target table. In case the leader is 
> > chosen to
> > +* participate for parallel inserts in CTAS, then finish its
> > share before
> > +* going to wait for the parallel workers to finish.
> > +*/
> > +   if (node->need_to_scan_locally)
> > +   {
>
> need_to_scan_locally is set in ExecGather(); even if
> nworkers_launched > 0 it can still be true, so I think we cannot
> remove need_to_scan_locally in ExecParallelInsertInCTAS.
>
> Attaching v15 patch set for further review. Note that the change is
> only in 0001 patch, other patches remain unchanged from v14.

I have reviewed part of the v15-0001 patch; I have a few comments and will
continue reviewing.

1.

@@ -763,18 +763,34 @@ GetCurrentCommandId(bool used)
 /* this is global to a transaction, not subtransaction-local */
 if (used)
 {
-/*
- * Forbid setting currentCommandIdUsed in a parallel worker, because
- * we have no provision for communicating this back to the leader.  We
- * could relax this restriction when currentCommandIdUsed was already
- * true at the start of the parallel operation.
- */
-Assert(!IsParallelWorker());
+ /*
+  * This is a temporary hack for all common parallel insert cases i.e.
+  * insert into, ctas, copy from. To be changed later. In a parallel
+  * worker, set currentCommandIdUsed to true only if it was not set to
+  * true at the start of the parallel operation (by way of
+  * SetCurrentCommandIdUsedForWorker()). We have to do this because
+  * GetCurrentCommandId(true) may be called from anywhere, especially
+  * for parallel inserts, within parallel worker.
+  */
+Assert(!(IsParallelWorker() && !currentCommandIdUsed));

Why is this a temporary hack, and what is the plan for removing it?

2.
+/*
+ * ChooseParallelInsertsInCTAS --- determine whether or not parallel
+ * insertion is possible, if yes set the parallel insert state i.e. push down
+ * the dest receiver to the Gather nodes.
+ */
+void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
+{
+if (!IS_CTAS(into))
+return;

When will this be hit? The function name suggests that it is for CTAS,
but now you have a check that returns if it is
not CTAS; can you add a comment about when you
expect this case?

Also, the function name should start on a new line, i.e.
void
ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)

3.
+/*
+ * ChooseParallelInsertsInCTAS --- determine whether or not parallel
+ * insertion is possible, if yes set the parallel insert state i.e. push down
+ * the dest receiver to the Gather nodes.
+ */

Push down to the Gather nodes?  I think the right statement would be
"push down below the Gather node".


4.
intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
{
 DR_intorel *myState = (DR_intorel *) self;

+if (myState->is_parallel_worker)
+{
+/* In the worker */

+SetCurrentCommandIdUsedForWorker();
+myState->output_cid = GetCurrentCommandId(false);
+}
+else
 {
non-parallel worker code
}
}

I think instead of moving all the code related to the non-parallel worker
into the else branch, we can do better. This
will avoid unnecessary code movement.

5.
intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
{
 DR_intorel *myState = (DR_intorel *) self;

-- Comment -> in parallel worker we don't need to create dest recv blah blah
+if (myState->is_parallel_worker)
{
--parallel worker handling--
return;
 

Re: proposal: schema variables

2020-12-25 Thread Pavel Stehule
On Sat, 19 Dec 2020 at 7:57, Pavel Stehule wrote:

> Hi
>
> only rebase
>

rebase and comments fixes

Regards

Pavel




> Regards
>
> Pavel
>


schema-variables-20201222.patch.gz
Description: application/gzip


RE: [bug fix] ALTER TABLE SET LOGGED/UNLOGGED on a partitioned table does nothing silently

2020-12-25 Thread tsunakawa.ta...@fujitsu.com
From: Michael Paquier 
> On Wed, Dec 09, 2020 at 09:52:17AM -0300, Alvaro Herrera wrote:
> > Well, that definition seems unfriendly to me.  I prefer the stance
> > that if you change the value for the parent, then future partitions
> > inherit that value.
> 
> That's indeed more interesting from the user perspective.  So +1 from me.

As I mentioned below, some properties apply to that, and some don't.

--
That would be right when the storage property is an optional specification such
as fillfactor.  For example, when I run ALTER TABLE mytable SET (fillfactor =
70) and then CREATE TABLE mytable_p1 PARTITION OF mytable, I find it nice that
the fillfactor of mytable_p1 is also 70 (but I won't complain if it isn't,
since I can run ALTER TABLE SET on the parent table again.)

OTOH, CREATE TABLE and CREATE UNLOGGED TABLE are explicit requests to create a
logged and an unlogged relation, respectively.  I would find it strange if CREATE
TABLE mytable_p1 PARTITION OF mytable created an unlogged partition.
--


Anyway, I think I'll group the ALTER TABLE/INDEX altering actions based on some
common factors and suggest what would be a desirable behavior, asking for
opinions.  I'd like to explore the consensus on the basic policy for fixes.
Then, I hope we will be able to work on fixes for each ALTER action in patches
that can be released separately.  I'd like to resist requiring all fixes to be
arranged at once, since that may become a high bar for those who volunteer to
fix some of the actions.  (Even a committer, Alvaro-san, struggled to fix one
action, ALTER TABLE REPLICA IDENTITY.)


Regards
Takayuki Tsunakawa






Re: SQL/JSON: functions

2020-12-25 Thread Zhihong Yu
For 0001-Common-SQL-JSON-clauses-v51.patch :

+   /*  | implementation_defined_JSON_representation_option (BSON, AVRO
etc) */

I don't find implementation_defined_JSON_representation_option in the
patchset. Maybe rephrase the above as a comment
without implementation_defined_JSON_representation_option?

For getJsonEncodingConst(), should the function error out in the default
case of switch (encoding)?

0002-SQL-JSON-constructors-v51.patch :

+   Assert(!OidIsValid(collation)); /* result is always an
json[b] type */

an json -> a json

+   /* XXX TEXTOID is default by standard */
+   returning->typid = JSONOID;

Comment doesn't seem to match the assignment.

For json_object_agg_transfn :

+   if (out->len > 2)
+   appendStringInfoString(out, ", ");

Why does the length need to be at least 3 (instead of 2)?

Cheers

On Fri, Dec 25, 2020 at 12:26 PM Nikita Glukhov 
wrote:

> On 17.09.2020 08:41, Michael Paquier wrote:
>
> On Sat, Jul 18, 2020 at 09:24:11AM -0400, Andrew Dunstan wrote:
>
> I think patches 5 and 6 need to be submitted to the next commitfest,
> This is far too much scope creep to be snuck in under the current CF item.
>
> I'll look at patches 1-4.
>
> Even with that, the patch set has been waiting on author for the last
> six weeks, so I am marking it as RwF for now.  Please feel free to
> resubmit.
>
> Attached is the 51st version of the patches, rebased onto current master.
>
>
> There were some shift/reduce conflicts in the SQL grammar that appeared
> after the "expr AS keyword" refactoring in 06a7c3154f.  I'm not sure if I resolved
> them correctly.  The JSON TEXT pseudotype, introduced in #0006, caused a lot of
> grammar conflicts, so it was replaced with a simple explicit pg_catalog.json.
>
> Also, a new CoercionForm COERCE_SQL_SYNTAX was introduced, and this resembles the
> custom function formats that I used in earlier versions of the patches for
> deparsing SQL/JSON constructor expressions that were based on raw json[b]
> function calls.  These custom function formats were replaced in v43 with
> dedicated executor nodes for SQL/JSON constructors.  So, I'm not sure whether it
> is worth trying to replace the nodes back with the new COERCE_SQL_SYNTAX.
>
> --
> Nikita Glukhov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>


Re: Better client reporting for "immediate stop" shutdowns

2020-12-25 Thread Andres Freund
Hi,

On 2020-12-21 16:43:33 -0500, Tom Lane wrote:
> Up to now, if you shut down the database with "pg_ctl stop -m immediate"
> then clients get a scary message claiming that something has crashed,
> because backends can't tell whether the SIGQUIT they got was sent for
> a crash-and-restart situation or because of an immediate-stop command.

+many


> This isn't great from a fit-and-finish perspective, and it occurs to me
> that it's really easy to do better: the postmaster can stick a flag
> into shared memory explaining the reason for SIGQUIT.  While we don't
> like the postmaster touching shared memory, there doesn't seem to be
> any need for interlocking or anything like that, so there is no risk
> involved that's greater than those already taken by pmsignal.c.
> 
> So, here's a very simple proposed patch.  Some issues for possible
> bikeshedding:

> * Up to now, pmsignal.c has only been for child-to-postmaster
> communication, so maybe there is some better place to put the
> support code.  I can't think of one though.

Seems fine with me.


> * I chose to report the same message for immediate shutdown as we
> already use for SIGTERM (fast shutdown or pg_terminate_backend()).
> Should it be different, and if so what?

To do better I think we'd have to distinguish the different cases? An
error message like
"terminating connection due to {fast shutdown,immediate shutdown,connection 
termination} administrator command"
or such could be helpful, but I don't think your patch adds *quite*
enough state?


I'd like to not log all these repeated messages into the server
log. It's quite annoying to have to dig through thousands of lines of
repeated "terminating connection..." lines that add absolutely no
additional information, just because I am shutting down the
server. Similarly, trying to find the reason for a PANIC is often hard
due to all the other messages.

Greetings,

Andres Freund




Re: Logical decoding without slots: decoding in lockstep with recovery

2020-12-25 Thread Andres Freund
Hi,

On 2020-12-23 14:56:07 +0800, Craig Ringer wrote:
> I want to share an idea I've looked at a few times where I've run into
> situations where logical slots were inadvertently dropped, or where it
> became necessary to decode changes in the past on a slot.
> 
> As most of you will know you can't just create a logical slot in the past.
> Even if it was permitted, it'd be unsafe due to catalog_xmin retention
> requirements and missing WAL.
> 
> But if we can arrange a physical replica to replay the WAL of interest and
> decode each commit as soon as it's replayed by the startup process, we know
> the needed catalog rows must all exist, so it's safe to decode the change.
> 
> So it should be feasible to run logical decoding in standby, even without a
> replication slot, so long as we:
> 
> * pause startup process after each xl_xact_commit
> * wake the walsender running logical decoding
> * decode and process until ReorderBufferCommit for the just-committed xact
> returns
> * wake the startup process to decode the up to the next commit

I don't think it's safe to just do this for each xl_xact_commit - we can
remove needed rows at quite a few places, not just around transaction
commit. Rows needed to correctly decode rows earlier in the transaction
might not be available by the time the commit record was logged.

I think you'd basically have to run logical decoding in lockstep with
WAL replay, i.e. replay one record, then call logical decoding for that
record, replay the next record, ...


> Can anyone see any obvious problem with this?

The patch for logical decoding on the standby
https://postgr.es/m/20181212204154.nsxf3gzqv3gesl32%40alap3.anarazel.de
should provide some of the infrastructure to do this properly. Should
really commit it. /me adds an entry to the top of the todo list.

Greetings,

Andres Freund




Re: [HACKERS] [PATCH] Generic type subscripting

2020-12-25 Thread Dmitry Dolgov
> On Tue, Dec 22, 2020 at 02:21:22PM -0500, Tom Lane wrote:
> Pavel Stehule  writes:
> > But maybe we try to design some that are designed already. Is there some
> > info about index specification in SQL/JSON?
>
> We do have precedent for this, it's the rules about resolving argument
> types for overloaded functions.  But the conclusion that that precedent
> leads to is that we should check whether the subscript expression can
> be *implicitly* coerced to either integer or text, and fail if neither
> coercion or both coercions succeed.  I'd be okay with that from a system
> design standpoint, but it's hard to say without trying it whether it
> will work out nicely from a usability standpoint.  In a quick trial
> it seems it might be okay:
>
> regression=# create function mysub(int) returns text language sql
> regression-# as $$select 'int'$$;
> CREATE FUNCTION
> regression=# create function mysub(text) returns text language sql
> as $$select 'text'$$;
> CREATE FUNCTION
> regression=# select mysub(42);
>  mysub
> ---
>  int
> (1 row)
>
> regression=# select mysub('foo');
>  mysub
> ---
>  text
> (1 row)
>
> regression=# select mysub(42::bigint);
> ERROR:  function mysub(bigint) does not exist

I'm not sure I completely follow and can't immediately find the relevant
code for overloaded functions, so I need to do a perception check.
Following this design, in jsonb_subscripting_transform we would try to coerce
the subscript expression to both integer and text (and maybe even to
jsonpath), and based on which coercion succeeded, choose
different logic to handle it, right?

And just so I understand: in the above example of the overloaded
function, the integer can be coerced only to text (since the original
type of the expression is integer), while the bigint could be
coerced to both integer and text, and that's why it fails, isn't it?




Pre-allocating WAL files

2020-12-25 Thread Andres Freund
Hi,

When running write heavy transactional workloads I've many times
observed that one needs to run the benchmarks for quite a while till
they get to their steady state performance. The most significant reason
for that is that initially WAL files will not get recycled, but need to
be freshly initialized. That's 16MB of writes that need to synchronously
finish before a small write transaction can even start to be written
out...

I think there's two useful things we could do:

1) Add pg_wal_preallocate(uint64 bytes) that ensures (bytes +
   segment_size - 1) / segment_size WAL segments exist from the current
   point in the WAL. Perhaps with the number of bytes defaulting to
   min_wal_size if not explicitly specified?

2) Have checkpointer (we want walwriter to run with low latency to flush
   out async commits etc) occasionally check if WAL files need to be
   pre-allocated.

   Checkpointer already tracks the amount of WAL that's expected to be
   generated till the end of the checkpoint, so it seems like it's a
   pretty good candidate to do so.

   To keep checkpointer pre-allocating when idle we could signal it
   whenever a record has crossed a segment boundary.


With a plain pgbench run I see a 2.5x reduction in throughput in the
periods where we initialize WAL files.

Greetings,

Andres Freund




Re: Temporary tables versus wraparound... again

2020-12-25 Thread Noah Misch
On Tue, Nov 10, 2020 at 04:10:57PM +0900, Masahiko Sawada wrote:
> On Mon, Nov 9, 2020 at 3:23 PM Greg Stark  wrote:
> > On Mon, 9 Nov 2020 at 00:17, Noah Misch  wrote:

> > > > @@ -3340,6 +3383,7 @@ heap_truncate_one_rel(Relation rel)
> > > >
> > > >   /* Truncate the underlying relation */
> > > >   table_relation_nontransactional_truncate(rel);
> > > > + ResetVacStats(rel);
> > >
> > > I didn't test, but I expect this will cause a stats reset for the second
> > > TRUNCATE here:
> > >
> > > CREATE TABLE t ();
> > > ...
> > > BEGIN;
> > > TRUNCATE t;
> > > TRUNCATE t;  -- inplace relfrozenxid reset
> > > ROLLBACK;  -- inplace reset survives
> > >
> > > Does that indeed happen?
> >
> > Apparently no, see below.  I have to say I was pretty puzzled by the
> > actual behaviour, which is that the rollback actually does roll back
> > the inplace update. But I *think* what is happening is that the first
> > truncate does an MVCC update, so the inplace update happens only to the
> > newly created tuple, which is never committed.
> 
> I think the in-place update that the patch introduces is not used because
> TRUNCATE doesn't use heap_truncate_one_rel() to truncate a table in
> that scenario. It does an MVCC update of the pg_class tuple for a new
> relfilenode with a new relfrozenxid and other stats; see
> RelationSetNewRelfilenode(). If we create and truncate a table within
> the same transaction, it does the in-place update that the patch
> introduces, but I think it's no problem in this case either.

Agreed.  Rolling back a heap_truncate_one_rel() always implies rolling back to
an earlier version of the entire pg_class tuple.  (That may not be true of
mapped relations, but truncating them is unreasonable.)  Thanks for checking.

> > Thinking about things a bit this does worry me a bit. I wonder if
> > inplace update is really safe outside of vacuum where we know we're
> > not in a transaction that can be rolled back. But IIRC doing a
> > non-inplace update on pg_class for these columns breaks other things.
> > I don't know if that's still true.
> 
> heap_truncate_one_rel() is not a transaction-safe operation. Doing
> in-place updates during that operation seems okay to me unless I'm
> missing something.

Yep.




Re: pgsql: Add key management system

2020-12-25 Thread Bruce Momjian
On Fri, Dec 25, 2020 at 07:30:01PM +0100, Erik Rijkers wrote:
> On 2020-12-25 16:19, Bruce Momjian wrote:
> 
> > Add key management system
> > doc/src/sgml/database-encryption.sgml |  97 +
> 
> Attached are a few typos.
> 
> I also noticed that this option does not occur in the initdb --help:
> 
>   -u  --copy-encryption-keys
> 
> Was that deliberate?

No.  :-(  Attached patch applied.  Thanks.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

diff --git a/doc/src/sgml/database-encryption.sgml b/doc/src/sgml/database-encryption.sgml
index f938c9f574..82bc137a61 100644
--- a/doc/src/sgml/database-encryption.sgml
+++ b/doc/src/sgml/database-encryption.sgml
@@ -13,7 +13,7 @@
   log from being able to access the data stored in those files.
   For example, when using cluster file encryption, users who have read
   access to the cluster directories for backup purposes will not be able
-  to decrypt the data stored in the these files.
+  to decrypt the data stored in these files.
  
 
  
@@ -24,7 +24,7 @@
   Key one is used to encrypt write-ahead log (WAL) files.  Two different
   keys are used so that primary and standby servers can use different zero
   (heap/index/temp) keys, but the same one (WAL) key, so that these keys
-  can eventually be rotated by switching the primary to the standby as
+  can eventually be rotated by switching the primary to the standby
   and then changing the WAL key.
  
 
@@ -68,7 +68,7 @@ initdb -D dbname --cluster-key-command='ckey_passphrase.sh'
During the initdb process, if
--cluster-key-command is specified, two data-level
encryption keys are created.   These two keys are then encrypted with
-   the key enryption key (KEK) supplied by the cluster key command before
+   the key encryption key (KEK) supplied by the cluster key command before
being stored in the database directory.  The key or passphrase that
derives the key must be supplied from the terminal or stored in a
trusted key store, such as key vault software, hardware security module.
@@ -87,7 +87,7 @@ initdb -D dbname --cluster-key-command='ckey_passphrase.sh'
   
 
   
-   The data encryption keys are randomly generated and are of 128, 192,
+   The data encryption keys are randomly generated and are 128, 192,
or 256-bits in length.  They are encrypted by the key encryption key
(KEK) using Advanced Encryption Standard (AES256)
encryption in Galois/Counter Mode (GCM), which also
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 92594772f6..4d07ce6e3f 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2326,6 +2326,8 @@ usage(const char *progname)
 	printf(_("  -R, --authprompt  prompt for a passphrase or PIN\n"));
 	printf(_("  -s, --showshow internal settings\n"));
 	printf(_("  -S, --sync-only   only sync data directory\n"));
+	printf(_("  -u, --copy-encryption-keys=DATADIR\n"
+			 "copy the file encryption key from another cluster\n"));
 	printf(_("\nOther options:\n"));
 	printf(_("  -V, --version output version information, then exit\n"));
 	printf(_("  -?, --helpshow this help, then exit\n"));


Re: pgsql: Add key management system

2020-12-25 Thread Erik Rijkers

On 2020-12-25 16:19, Bruce Momjian wrote:


Add key management system
doc/src/sgml/database-encryption.sgml |  97 +


Attached are a few typos.

I also noticed that this option does not occur in the initdb --help:

  -u  --copy-encryption-keys

Was that deliberate?


Thanks,

Erik Rijkers






--- ./doc/src/sgml/database-encryption.sgml.orig	2020-12-25 19:11:55.809303009 +0100
+++ ./doc/src/sgml/database-encryption.sgml	2020-12-25 19:22:22.558936395 +0100
@@ -13,7 +13,7 @@
   log from being able to access the data stored in those files.
   For example, when using cluster file encryption, users who have read
   access to the cluster directories for backup purposes will not be able
-  to decrypt the data stored in the these files.
+  to decrypt the data stored in these files.
  
 
  
@@ -24,7 +24,7 @@
   Key one is used to encrypt write-ahead log (WAL) files.  Two different
   keys are used so that primary and standby servers can use different zero
   (heap/index/temp) keys, but the same one (WAL) key, so that these keys
-  can eventually be rotated by switching the primary to the standby as
+  can eventually be rotated by switching the primary to the standby
   and then changing the WAL key.
  
 
@@ -68,7 +68,7 @@
During the initdb process, if
--cluster-key-command is specified, two data-level
encryption keys are created.   These two keys are then encrypted with
-   the key enryption key (KEK) supplied by the cluster key command before
+   the key encryption key (KEK) supplied by the cluster key command before
being stored in the database directory.  The key or passphrase that
derives the key must be supplied from the terminal or stored in a
trusted key store, such as key vault software, hardware security module.
@@ -87,7 +87,7 @@
   
 
   
-   The data encryption keys are randomly generated and are of 128, 192,
+   The data encryption keys are randomly generated and are 128, 192,
or 256-bits in length.  They are encrypted by the key encryption key
(KEK) using Advanced Encryption Standard (AES256)
encryption in Galois/Counter Mode (GCM), which also


Re: A failure of standby to follow timeline switch

2020-12-25 Thread Fujii Masao




On 2020/12/25 12:03, Kyotaro Horiguchi wrote:

Thank you for looking this.

At Thu, 24 Dec 2020 15:33:04 +0900, Fujii Masao  
wrote in

When I applied two patches in the master branch and
ran "make check-world", I got the following error.

== creating database "contrib_regression" ==
# Looks like you planned 37 tests but ran 36.
# Looks like your test exited with 255 just after 36.
t/001_stream_rep.pl ..
Dubious, test returned 255 (wstat 65280, 0xff00)
Failed 1/37 subtests
...
Test Summary Report
---
t/001_stream_rep.pl(Wstat: 65280 Tests: 36 Failed: 0)
   Non-zero exit status: 255
   Parse errors: Bad plan.  You planned 37 tests but ran 36.
Files=21, Tests=239, 302 wallclock secs ( 0.10 usr 0.05 sys + 41.69
cusr 39.84 csys = 81.68 CPU)
Result: FAIL
make[2]: *** [check] Error 1
make[1]: *** [check-recovery-recurse] Error 2
make[1]: *** Waiting for unfinished jobs
t/070_dropuser.pl . ok


Mmm. I retried that and saw it succeed (with 0002 applied).

If I modified "user Test::More tests => 37" to 38 in the perl file, I
got a similar result.


What happens if you run make check-world with -j 4? When I ran that,
the test failed. But with -j 1, the test finished with success. I'm not sure
why this happened, though...

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Commit fest manager for 2021-01

2020-12-25 Thread Masahiko Sawada
On Fri, Dec 25, 2020 at 8:20 PM Magnus Hagander  wrote:
>
>
> On Fri, Dec 25, 2020 at 8:48 AM Michael Paquier  wrote:
>>
>> On Fri, Dec 25, 2020 at 04:35:30PM +0900, Masahiko Sawada wrote:
>> > Hmm, on the left of the logout button, I can see only the 'edit
>> > profile' button and 'Activity log' button.
>>
>> Maybe that's a cache issue with your browser?  Magnus, any ideas?
>>
>> I cannot control the permissions of the app, but if that proves to be
>> necessary I am fine to switch the CF status.  Just let me know if you
>> want me to do so when the time comes.
>
>
> Ugh, that was me adding just half the permissions required :/ One should add 
> both the "can have some permissions at all" and the "member of the admins 
> group". I only did the second part :)
>
> Fixed now, sorry about that!

Thank you! I now can see the administration button.

Regards,

-- 
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/




Re: Logical decoding without slots: decoding in lockstep with recovery

2020-12-25 Thread Craig Ringer
On Wed, 23 Dec 2020, 18:57 Amit Kapila,  wrote:

> On Wed, Dec 23, 2020 at 12:26 PM Craig Ringer
>  wrote:
> >
> > Hi all
> >
> > I want to share an idea I've looked at a few times where I've run into
> situations where logical slots were inadvertently dropped, or where it
> became necessary to decode changes in the past on a slot.
> >
> > As most of you will know you can't just create a logical slot in the
> past. Even if it was permitted, it'd be unsafe due to catalog_xmin
> retention requirements and missing WAL.
> >
> > But if we can arrange a physical replica to replay the WAL of interest
> and decode each commit as soon as it's replayed by the startup process, we
> know the needed catalog rows must all exist, so it's safe to decode the
> change.
> >
> > So it should be feasible to run logical decoding in standby, even
> without a replication slot, so long as we:
> >
> > * pause startup process after each xl_xact_commit
> > * wake the walsender running logical decoding
> > * decode and process until ReorderBufferCommit for the just-committed
> xact returns
> > * wake the startup process to replay up to the next commit
> >
>
> How will you deal with subscriber restart?  I think you need some way
> to remember confirmed_flush_lsn and restart_lsn and then need to teach
> WAL machinery to restart from some previous point.
>

The simplest option, albeit slow, would be to require the subscriber to
confirm flush before allowing more WAL redo. That's what I'd initially
assumed. This is a bit of a corner-case situation after all; it's never
going to be fast with the switching back and forth.

More efficient would be to decode and output to a local spool file and
fsync it. Then separately send that to the subscriber, like has been
discussed in other work on more efficient logical decoding. So the output
plugin output would go to a local spool.

That can be implemented with pg_recvlogical if needed.
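The lockstep scheme quoted above — pause redo after each commit, wake the decoder, wait until it has processed the commit, then resume redo — is essentially two workers alternating on a rendezvous. A toy sketch of that scheduling pattern with threads (plain Python, all names illustrative; this is not how the actual startup-process/walsender latch machinery works):

```python
# Toy model of decoding in lockstep with recovery: a "startup process"
# replays one commit at a time and then blocks until a "walsender"
# thread has decoded it.  threading.Event stands in for latches; the
# names here are made up for illustration, not PostgreSQL internals.
import threading
import queue

commits = queue.Queue()                    # commit records handed to the decoder
decoded = []                               # what the decoder produced
redo_may_continue = threading.Event()
redo_may_continue.set()                    # redo may replay the first commit

def startup_process(wal):
    for rec in wal:
        redo_may_continue.wait()           # pause after each xl_xact_commit
        redo_may_continue.clear()
        commits.put(rec)                   # wake the decoder
    commits.put(None)                      # end of WAL

def walsender():
    while (rec := commits.get()) is not None:
        decoded.append("decoded " + rec)   # stands in for ReorderBufferCommit
        redo_may_continue.set()            # let redo proceed to the next commit

wal = ["commit 1", "commit 2", "commit 3"]
t1 = threading.Thread(target=startup_process, args=(wal,))
t2 = threading.Thread(target=walsender)
t1.start(); t2.start()
t1.join(); t2.join()
assert decoded == ["decoded commit 1", "decoded commit 2", "decoded commit 3"]
```

The point of the handshake is that redo never runs ahead of decoding, so catalog rows needed by the decoder are guaranteed to still be reachable.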


> --
> With Regards,
> Amit Kapila.
>


Re: Add session statistics to pg_stat_database

2020-12-25 Thread Masahiro Ikeda

Hi,

As a user, I want this feature to know whether
clients' session activities are as expected.

I have some comments about the patch.


1. pg_proc.dat

The unit of "session time" and so on says "in seconds".
But, is "in milliseconds" right?


2. monitoring.sgml

IIUC, "active_time" includes the time spent executing a fast-path function,
and "idle in transaction" includes "idle in transaction (aborted)" time.

Why don't you reference pg_stat_activity's "state" column and define
"active_time" as the total time when the state is "active" or
"fastpath function call"?

The same goes for "idle in transaction".
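If it helps, the accounting suggested here — credit each interval to the state that was current when it began, folding "active" and "fastpath function call" into active_time — can be sketched like this (plain Python, illustrative only; the state strings follow pg_stat_activity's documented values, the counter names are hypothetical):

```python
# Illustrative only: fold pg_stat_activity-style states into two
# proposed counters.  Timestamps are passed in explicitly, in ms.
from collections import defaultdict

BUCKET = {
    "active": "active_time",
    "fastpath function call": "active_time",
    "idle in transaction": "idle_in_transaction_time",
    "idle in transaction (aborted)": "idle_in_transaction_time",
}

def accumulate(events):
    """events: [(timestamp_ms, state), ...] ordered by time.
    Each interval is credited to the state current when it began."""
    totals = defaultdict(float)
    for (t0, state), (t1, _next_state) in zip(events, events[1:]):
        if state in BUCKET:
            totals[BUCKET[state]] += t1 - t0
    return dict(totals)

events = [(0, "active"), (10, "idle"), (15, "fastpath function call"),
          (18, "idle in transaction"), (30, "idle in transaction (aborted)"),
          (34, "idle")]
assert accumulate(events) == {"active_time": 13.0,
                              "idle_in_transaction_time": 16.0}
```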


3. pgstat.h

The comment of PgStat_MsgConn says "Sent by pgstat_connection".
I thought "pgstat_connection" was a function, but it doesn't exist.

Is "Sent by the backend" right?

Although this is a trivial thing, the following row has too many tabs.
Other structs have only one space.
// }Pgstat_MsgConn;



Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




RE: Disable WAL logging to speed up data loading

2020-12-25 Thread osumi.takami...@fujitsu.com
Hi, Michael


Thank you for your attention to this thread.
On Friday, December 25, 2020 4:09 PM  Michael Paquier  
wrote:
> On Thu, Dec 03, 2020 at 03:52:47AM +, tsunakawa.ta...@fujitsu.com
> wrote:
> > The code looks good, and the performance seems to be nice, so I marked
> > this ready for committer.
> 
> FWIW, I am extremely afraid of this proposal because this is basically a
> footgun able to corrupt customer instances, and I am ready to bet that people
> would switch wal_level to none because they see a lot of benefits in doing so
> at first sight, until the host running the server is plugged off and they 
> need to
> use pg_resetwal in urgency to bring an instance online. 
In the patch, I added plenty of descriptions (especially cautions) to the
documentation.
At least, when users notice the existence of the "none" value for wal_level,
they cannot avoid having a look at those cautions.
Accordingly, it's really unlikely that they see only the merits.

In terms of pg_resetwal,
it should not happen that users apply it to bring back a server
corrupted by a failure of any operation during wal_level=none.

I documented clearly that
this wal_level is designed so that
the server never starts up again when any unexpected crash is detected,
and thus users need to recreate the whole cluster from scratch.

> Users have already
> the option to make things go bad, just by disabling full page writes or fsync,
> but I really don't think that we should put in their hands more options able 
> to
> break instances, nor should we try to spend more efforts in having more
> "protections" that would trigger only once the instance is already fried.
> 
> Perhaps this is something that Horiguchi-san pointed out upthread in [1]
> (last sentence of first paragraph), but did you consider that it is already
> possible to do bulk-loading with a minimal amount of WAL generated as long
> as you do the COPY within the transaction that created the table?  Quoting
> the docs in [2]:
> "COPY is fastest when used within the same transaction as an earlier
> CREATE TABLE or TRUNCATE command. In such cases no WAL needs to be
> written, because in case of an error, the files containing the newly loaded 
> data
> will be removed anyway. However, this consideration only applies when
> wal_level is minimal as all commands must write WAL otherwise."
> 
> Upgrade scenarios have been mentioned in this case as being a pain when it
> comes to take advantage of this optimization.  Wouldn't it be safer if we took
> a client approach instead, where restores are able to load the data with a
> cheap succession of commands by loading the data in a transaction done
> after a TRUNCATE?
In a data warehouse environment,
which is the environment our scenario in this thread has in mind,
I think that truncating the target table
beforehand in the same transaction is not always a realistic solution.
Imagine an append-only loading case that executes a bulk data load
into one specific table that already holds a bunch of records.

Best Regards,
Takamichi Osumi




Re: Commit fest manager for 2021-01

2020-12-25 Thread Magnus Hagander
On Fri, Dec 25, 2020 at 8:48 AM Michael Paquier  wrote:

> On Fri, Dec 25, 2020 at 04:35:30PM +0900, Masahiko Sawada wrote:
> > Hmm, on the left of the logout button, I can see only the 'edit
> > profile' button and 'Activity log' button.
>
> Maybe that's a cache issue with your browser?  Magnus, any ideas?
>
> I cannot control the permissions of the app, but if that proves to be
> necessary I am fine to switch the CF status.  Just let me know if you
> want me to do so when the time comes.
>

Ugh, that was me adding just half the permissions required :/ One should
add both the "can have some permissions at all" and the "member of the
admins group". I only did the second part :)

Fixed now, sorry about that!

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: About to add WAL write/fsync statistics to pg_stat_wal view

2020-12-25 Thread Masahiro Ikeda

Hi,

I rebased the patch to the master branch.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATIONdiff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 4b60382778..45d54cd394 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7388,6 +7388,27 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  track_wal_io_timing (boolean)
+  
+   track_wal_io_timing configuration parameter
+  
+  
+  
+   
+Enables timing of WAL I/O calls. This parameter is off by default,
+because it will repeatedly query the operating system for
+the current time, which may cause significant overhead on some
+platforms.  You can use the  tool to
+measure the overhead of timing on your system.
+I/O timing information is
+displayed in 
+pg_stat_wal.  Only superusers can
+change this setting.
+   
+  
+ 
+
  
   track_functions (enum)
   
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3d6c901306..1a5aad819f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3479,7 +3479,51 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
wal_buffers_full bigint
   
   
-   Number of times WAL data was written to disk because WAL buffers became full
+   Total number of times WAL data was written to disk because WAL buffers became full
+  
+ 
+
+ 
+  
+   wal_write bigint
+  
+  
+   Total number of times WAL data was written to disk
+  
+ 
+
+ 
+  
+   wal_write_time double precision
+  
+  
+   Total amount of time that has been spent writing
+   WAL data to disk, in milliseconds
+   (if  is enabled, otherwise zero).
+   To avoid degrading standby server performance, standby servers don't collect
+   these statistics
+  
+ 
+
+ 
+  
+   wal_sync bigint
+  
+  
+   Total number of times WAL data was synced to disk
+  
+ 
+
+ 
+  
+   wal_sync_time double precision
+  
+  
+   Total amount of time that has been spent syncing
+   WAL data to disk, in milliseconds
+   (if  is enabled, otherwise zero).
+   To avoid degrading standby server performance, standby servers don't collect
+   these statistics
   
  
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9867e1b403..19d101647e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -109,6 +109,7 @@ int			CommitDelay = 0;	/* precommit delay in microseconds */
 int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */
 int			wal_retrieve_retry_interval = 5000;
 int			max_slot_wal_keep_size_mb = -1;
+bool		track_wal_io_timing = false;
 
 #ifdef WAL_DEBUG
 bool		XLOG_DEBUG = false;
@@ -980,6 +981,8 @@ static void WALInsertLockAcquireExclusive(void);
 static void WALInsertLockRelease(void);
 static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt);
 
+static bool fsyncMethodCalled(void);
+
 /*
  * Insert an XLOG record represented by an already-constructed chain of data
  * chunks.  This is a low-level routine; to construct the WAL record header
@@ -2538,6 +2541,8 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 			Size		nbytes;
 			Size		nleft;
 			int			written;
+			instr_time	start;
+			instr_time	duration;
 
 			/* OK to write the page(s) */
 			from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
@@ -2546,9 +2551,24 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 			do
 			{
 errno = 0;
+
+/* Measure i/o timing to write WAL data */
+if (track_wal_io_timing)
+	INSTR_TIME_SET_CURRENT(start);
+
 pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
 written = pg_pwrite(openLogFile, from, nleft, startoffset);
 pgstat_report_wait_end();
+
+if (track_wal_io_timing)
+{
+	INSTR_TIME_SET_CURRENT(duration);
+	INSTR_TIME_SUBTRACT(duration, start);
+	WalStats.m_wal_write_time += INSTR_TIME_GET_MILLISEC(duration);
+}
+
+WalStats.m_wal_write++;
+
 if (written <= 0)
 {
 	char		xlogfname[MAXFNAMELEN];
@@ -10489,6 +10509,27 @@ assign_xlog_sync_method(int new_sync_method, void *extra)
 	}
 }
 
+/*
+ * Check if the fsync method is called.
+ */
+static bool
+fsyncMethodCalled(void)
+{
+	if (!enableFsync)
+		return false;
+
+	switch (sync_method)
+	{
+		case SYNC_METHOD_FSYNC:
+		case SYNC_METHOD_FSYNC_WRITETHROUGH:
+		case SYNC_METHOD_FDATASYNC:
+			return true;
+		default:
+			/* others don't have a specific fsync method */
+			return false;
+	}
+}
+
 
 /*
  * Issue appropriate kind of fsync (if any) for an XLOG output file.
@@ -10499,8 +10540,19 @@ assign_xlog_sync_method(int new_sync_method, void *extra)
 void
 issue_xlog_fsync(int fd, XLogSegNo segno)
 {
+	instr_time	

Re: Get memory contexts of an arbitrary backend process

2020-12-25 Thread Kasahara Tatsuhito
Hi,

On Thu, Dec 10, 2020 at 10:48 AM torikoshia  wrote:
>
> On 2020-12-04 19:16, torikoshia wrote:
> > On 2020-12-03 10:36, Tom Lane wrote:
> >> Fujii Masao  writes:
> >>> I'm starting to study how this feature behaves. At first, when I
> >>> executed
> >>> the following query, the function never returned. ISTM that since
> >>> the autovacuum launcher cannot respond to the request of memory
> >>> contexts dump, the function keeps waiting infinity. Is this a bug?
> >>> Probably we should exclude non-backend proceses from the target
> >>> processes to dump? Sorry if this was already discussed.
> >>
> >>>  SELECT pg_get_backend_memory_contexts(pid) FROM
> >>> pg_stat_activity;
> >
> > Thanks for trying it!
> >
> > It was not discussed explicitly, and I was going to do it later
> > as commented.
> >
> >> +/* TODO: Check also whether backend or not. */
> >
> >>
> >> FWIW, I think this patch is fundamentally unsafe.  It's got a
> >> lot of the same problems that I complained about w.r.t. the
> >> nearby proposal to allow cross-backend stack trace dumping.
> >> It does avoid the trap of thinking that it can do work in
> >> a signal handler, but instead it supposes that it can do
> >> work involving very high-level objects such as shared hash tables
> >> in anyplace that might execute CHECK_FOR_INTERRUPTS.  That's
> >> never going to be safe: the only real expectation the system
> >> has is that CHECK_FOR_INTERRUPTS is called at places where our
> >> state is sane enough that a transaction abort can clean up.
> >> Trying to do things like taking LWLocks is going to lead to
> >> deadlocks or worse.  We need not even get into the hard questions,
> >> such as what happens when one process or the other exits
> >> unexpectedly.
> >
> > Thanks for reviewing!
> >
> > I may misunderstand something, but the dumper works not at
> > CHECK_FOR_INTERRUPTS but during the client read, i.e.,
> > ProcessClientReadInterrupt().
> >
> > Is it also unsafe?
> >
> >
> > BTW, since there was a comment that the shared hash table
> > used too much memory, I'm now rewriting this patch not to use
> > the shared hash table but a simpler static shared memory struct.
>
> Attached a rewritten patch.
Thanks for updating patch.

But when I applied the patch to the current HEAD and ran make, I
got an error due to duplicate OIDs.
You need to rebase the patch.

> Accordingly, I also slightly modified the basic design as below.
>
> ---
> # Communication flow between the dumper and the requestor
> - (1) When requesting memory context dumping, the requestor changes
> the struct on the shared memory state from 'ACCEPTABLE' to
> 'REQUESTING'.
> - (2) The requestor sends the signal to the dumper process and waits on
> the latch.
> - (3) When the dumper notices the signal, it changes the state to
> 'DUMPING'.
> - (4) When the dumper completes dumping, it changes the state to
> 'DONE' and sets the latch.
> - (5) The requestor reads the dump file and shows it to the user.
> Finally, the requestor removes the dump file and resets the shared
> memory state to 'ACCEPTABLE'.
>
> # Query cancellation
> - When the requestor cancels dumping, e.g. signaling using ctrl-C,
> the requestor changes the state of the shared memory to 'CANCELING'.
> - The dumper checks the state when it tries to change the state to
> 'DONE' at (4), and if the state is 'CANCELING', it initializes the
> dump file and resets the shared memory state to 'ACCEPTABLE'.
>
> # Cleanup dump file and the shared memory
> - In the normal case, the dumper removes the dump file and resets
> the shared memory entry as described in (5).
> - When something like query cancellation or process termination
> happens on the dumper after (1) and before (3), in other words,
> the state is 'REQUESTING', the requestor does the cleanup.
> - When something happens on the dumper or the requestor after (3)
> and before (4), in other words, the state is 'DUMPING', the dumper
> does the cleanup. Specifically, if the requestor cancels the query,
> it just changes the state to 'CANCELING' and the dumper notices it
> and cleans up things later.
> OTOH, when the dumper fails to dump, it cleans up the dump file and
> reset the shared memory state.
> - When something happens on the requestor after (4), i.e., the state
> is 'DONE', the requestor does the cleanup.
> - In the case of receiving SIGKILL or power failure, all dump files
> are removed in the crash recovery process.
> ---
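The handshake and cleanup rules quoted above can be checked against a toy model of the state machine (plain Python, illustrative only — the state names mirror the mail, not the actual shared-memory code):

```python
# Allowed transitions of the shared dump status, per the design above.
VALID = {
    "ACCEPTABLE": {"REQUESTING"},                       # (1) requestor asks for a dump
    "REQUESTING": {"DUMPING", "ACCEPTABLE"},            # (3) dumper starts, or requestor cleans up
    "DUMPING":    {"DONE", "CANCELING", "ACCEPTABLE"},  # (4) done, cancel, or dumper-failure cleanup
    "CANCELING":  {"ACCEPTABLE"},                       # dumper notices the cancel and resets
    "DONE":       {"ACCEPTABLE"},                       # (5) requestor reads the file and resets
}

def transition(state, new_state):
    assert new_state in VALID[state], f"illegal: {state} -> {new_state}"
    return new_state

# Normal flow (1)-(5).
s = "ACCEPTABLE"
for nxt in ("REQUESTING", "DUMPING", "DONE", "ACCEPTABLE"):
    s = transition(s, nxt)
assert s == "ACCEPTABLE"

# Query cancellation while dumping: requestor flags CANCELING,
# the dumper notices it later and resets.
s = "ACCEPTABLE"
for nxt in ("REQUESTING", "DUMPING", "CANCELING", "ACCEPTABLE"):
    s = transition(s, nxt)
assert s == "ACCEPTABLE"
```

A model like this makes it easy to see which party is responsible for driving each state back to 'ACCEPTABLE' when the other side dies.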
If the dumper is terminated before it dumps, the requestor will appear
to enter an infinite loop, because the status of mcxtdumpShmem will not
change.
The following are the steps to reproduce.

 - session1
   BEGIN; LOCK TABLE t;
   - session2
 SELECT * FROM t; -- wait
 - session3
   select pg_get_target_backend_memory_contexts(); -- wait
 - session1
   select pg_terminate_backend(); -- kill session2
 - session3 waits forever.

Therefore, you may need to set mcxtdumpShmem->dump_status to
MCXTDUMPSTATUS_CANCELING or another status.

pg_waldump: Limit the number of lines shown at the start

2020-12-25 Thread Kyotaro Horiguchi
Hello.

I'm occasionally annoyed by the behavior of pg_waldump in that it shows
all records in the specified segment before entering the waiting state.

tail -f shows only the last few dozen lines at the start.  I think this
behavior would also be useful for pg_waldump -f.

It was simpler than expected, as attached.

Any thoughts, opinions?
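For reference, the tail-like trick the patch uses — remember the positions of the last N records in a small ring buffer, then restart reading from the oldest surviving one — is easy to illustrate outside of PostgreSQL. A minimal sketch (plain Python; names are made up, not the patch's):

```python
def tail_positions(records, keep=20):
    """Ring buffer of the last `keep` record positions, as in the patch:
    store positions modulo `keep`, and note whether we wrapped around.
    Returns the position output should restart from."""
    last = [None] * keep
    idx = 0
    wrapped = False
    for pos in records:
        last[idx] = pos
        idx = (idx + 1) % keep
        if idx == 0:
            wrapped = True
    if not wrapped:
        return last[0]   # fewer than `keep` records: start at the first
    return last[idx]     # oldest surviving entry in the ring

# With 25 records and keep=20, output should restart at record 5.
assert tail_positions(range(25), keep=20) == 5
assert tail_positions(range(10), keep=20) == 0
```

The patch does the same with `last_recs[]`/`last_idx`/`rounded`, then calls XLogFindNextRecord() on the chosen LSN before printing.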

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 31e99c2a6d..dacdb600bf 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -72,6 +72,8 @@ typedef struct XLogDumpStats
 
 #define fatal_error(...) do { pg_log_fatal(__VA_ARGS__); exit(EXIT_FAILURE); } while(0)
 
+#define FOLLOW_START_LINES 20
+
 static void
 print_rmgr_list(void)
 {
@@ -752,6 +754,10 @@ main(int argc, char **argv)
 	XLogRecPtr	first_record;
 	char	   *waldir = NULL;
 	char	   *errormsg;
+	XLogRecPtr	last_recs[FOLLOW_START_LINES];
+	int			last_idx = 0;
+	bool		storing;
+	bool		rounded;
 
 	static struct option long_options[] = {
 		{"bkp-details", no_argument, NULL, 'b'},
@@ -1065,10 +1071,41 @@ main(int argc, char **argv)
 			   (uint32) (first_record >> 32), (uint32) first_record,
 			   (uint32) (first_record - private.startptr));
 
+	storing = config.follow;
+	rounded = false;
+
 	for (;;)
 	{
 		/* try to read the next record */
 		record = XLogReadRecord(xlogreader_state, &errormsg);
+
+		if (storing)
+		{
+			if (record)
+			{
+/* remember LSNs of the last FOLLOW_START_LINES records  */
+last_recs[last_idx++] = xlogreader_state->ReadRecPtr;
+last_idx = last_idx % FOLLOW_START_LINES;
+
+if (last_idx == 0)
+	rounded = true;
+			}
+			else
+			{
+XLogRecPtr startPtr;
+
+/* find the first record to show */
+if (!rounded)
+	startPtr = last_recs[0];
+else
+	startPtr = last_recs[last_idx];
+
+XLogFindNextRecord(xlogreader_state, startPtr);
+storing = false;
+			}
+			continue;
+		}
+
 		if (!record)
 		{
 			if (!config.follow || private.endptr_reached)


Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2020-12-25 Thread Andy Fan
On Tue, Aug 4, 2020 at 7:49 AM Alvaro Herrera 
wrote:

> I've been working on the ability to detach a partition from a
> partitioned table, without causing blockages to concurrent activity.
> I think this operation is critical for some use cases.
>
>
This would be a great feature.  Since we can't handle thousands of
partitions very well, and users agree to detach some old partitions
automatically, the blocking issue here would be a big obstacle for that
solution.  Thanks for working on this!



> There was a lot of great discussion which ended up in Robert completing
> a much sought implementation of non-blocking ATTACH.  DETACH was
> discussed too because it was a goal initially, but eventually dropped
> from that patch altogether. Nonetheless, that thread provided a lot of
> useful input to this implementation.  Important ones:
>
> [1]
> https://postgr.es/m/CA+TgmoYg4x7AH=_QSptvuBKf+3hUdiCa4frPkt+RvXZyjX1n=w...@mail.gmail.com
> [2]
> https://postgr.es/m/CA+TgmoaAjkTibkEr=xJg3ndbRsHHSiYi2SJgX69MVosj=lj...@mail.gmail.com
> [3]
> https://postgr.es/m/CA+TgmoY13KQZF-=hntrt9uywyx3_oyoqpu9iont49jggidp...@mail.gmail.com
>
> Attached is a patch that implements
> ALTER TABLE ... DETACH PARTITION .. CONCURRENTLY.
>
> In the previous thread we were able to implement the concurrent model
> without the extra keyword.  For this one I think that won't work; my
> implementation works in two transactions so there's a restriction that
> you can't run it in a transaction block.  Also, there's a wait phase
> that makes it slower than the non-concurrent one.  Those two drawbacks
> make me think that it's better to keep both modes available, just like
> we offer both CREATE INDEX and CREATE INDEX CONCURRENTLY.
>
> Why two transactions?  The reason is that in order for this to work, we
> make a catalog change (mark it detached), and commit so that all
> concurrent transactions can see the change.  A second transaction waits
> for anybody who holds any lock on the partitioned table and grabs Access
> Exclusive on the partition (which now no one cares about, if they're
> looking at the partitioned table), where the DDL action on the partition
> can be completed.
>
> ALTER TABLE is normally unable to run in two transactions.  I hacked it
> (0001) so that the relation can be closed and reopened in the Exec phase
> (by having the rel as part of AlteredTableInfo: when ATRewriteCatalogs
> returns, it uses that pointer to close the rel).  It turns out that this
> is sufficient to make that work.  This means that ALTER TABLE DETACH
> CONCURRENTLY cannot work as part of a multi-command ALTER TABLE, but
> that's already enforced by the grammar anyway.
>
> DETACH CONCURRENTLY doesn't work if a default partition exists.  It's
> just too problematic a case; you would still need to have AEL on the
> default partition.
>
>
> I haven't yet experimented with queries running in a standby in tandem
> with a detach.
>
> --
> Álvaro Herrera
>


-- 
Best Regards
Andy Fan