Re: [PATCH] pg_permissions

2021-03-25 Thread Joel Jacobson
On Thu, Mar 25, 2021, at 17:51, Tom Lane wrote:
> "Joel Jacobson" mailto:joel%40compiler.org>> writes:
> > On Thu, Mar 25, 2021, at 16:16, Alvaro Herrera wrote:
> >> Ah, of course -- the only way to obtain the acl columns is by going
> >> through the catalogs individually, so it won't be possible.  I think
> >> this could be fixed with some very simple, quick function pg_get_acl()
> >> that takes a catalog OID and object OID and returns the ACL; then
> >> use aclexplode() to obtain all those details.
> 
> > +1 for adding pg_get_acl().
> 
> I wonder what performance will be like with lots o' objects.

I guess pg_get_acl() would need to be implemented using a switch (classid) with 
36 cases (one for each class)?

Is your performance concern about how such a switch statement will be optimized 
by the C compiler?

I can see how it would be annoyingly slow if the compiler were to pick a chain 
of conditional branches or a binary search,
instead of producing a fast O(1) jump table.

On the topic of C switch statements:

I think the Clang/GCC compiler folks (anyone here?) could actually take 
inspiration from PostgreSQL's PerfectHash.pm.
I think the same strategy could be used in C compilers to optimize switch 
statements with sparse case values,
which currently produce slow O(log n) binary search code, while a perfect hash 
solution would be O(1).

Example showing the unintelligent binary search code produced by GCC: 
https://godbolt.org/z/1G6G3vcjx (Clang is just as bad.) This is a hypothetical 
example with sparse case values. That is not the case here: the classid 
case values are nicely ordered from OCLASS_CLASS..OCLASS_TRANSFORM (0..37), so 
they should produce fast O(1) jump tables.

Maybe there is some other performance concern to reason about that I'm missing 
here?

/Joel

Re: MultiXact\SLRU buffers configuration

2021-03-25 Thread Andrey Borodin
Hi Thomas, Gilles, all!

Thanks for reviewing this!

> On 25 March 2021, at 02:31, Thomas Munro wrote:
> 
> I don't think we should put "slru" (the name of the buffer replacement
> algorithm, implementation detail) in the GUC names.
+1


> +It defaults to 0, in this case CLOG size is taken as
> shared_buffers / 512.
> 
> We already know that increasing the number of CLOG buffers above the
> current number hurts as the linear search begins to dominate
Uh, my intent was to copy the original approach to sizing the CLOG SLRU; I just 
missed that Min(,) clamp in the shared_buffers logic.


> On 26 March 2021, at 08:46, Thomas Munro wrote:
> 
> Hi Andrey, all,
> 
> I propose some changes, and I'm attaching a new version:
> 
> I renamed the GUCs as clog_buffers etc (no "_slru_").  I fixed some
> copy/paste mistakes where the different GUCs were mixed up.  I made
> some changes to the .conf.sample.  I rewrote the documentation so that
> it states the correct unit and defaults, and refers to the
> subdirectories that are cached by these buffers instead of trying to
> give a new definition of each of the SLRUs.
> 
> Do you like those changes?
Yes!

> Some things I thought about but didn't change:
> 
> I'm not entirely sure if we should use the internal and historical
> names well known to hackers (CLOG), or the visible directory names (I
> mean, we could use pg_xact_buffers instead of clog_buffers).
While it is a good idea to note the directory name, I think the real 
naming criterion is helping users find this GUC when they encounter wait events 
in pg_stat_activity. I see no mention of CLOG in the docs [0], only 
XactBuffer, XactSLRU, etc.

> I'm not saying the 0002 patch is bug-free yet though, it's a bit finickity.
I think the idea of speeding up the linear search is really good for scaling 
SLRUs. It's not even about improving the normal performance of the cluster; 
it's important for preventing pathological degradation under certain 
circumstances. A bigger cache really saves SLAs :) I'll look into the patch more 
closely this weekend. Thank you!

Best regards, Andrey Borodin.

[0] 
https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-ACTIVITY-VIEW



Re: Get memory contexts of an arbitrary backend process

2021-03-25 Thread Kyotaro Horiguchi
At Fri, 26 Mar 2021 14:02:49 +0900, Fujii Masao  
wrote in 
> 
> 
> On 2021/03/26 13:28, Kyotaro Horiguchi wrote:
> >> "some contexts are omitted"
> >> "n child contexts: total_bytes = ..."
> > Sorry I missed that is already implemented.  So my opnion is I agree
> > with limiting with a fixed-number, and preferablly sorted in
> > descending order of... totalspace/nblocks?
> 
> This may be an improvement, but makes us modify
> MemoryContextStatsInternal()
> very much. I'm afraid that it's too late to do that at this stage...
> What about leaving the output order as it is at the first version?

So I said "preferably":p  (with a misspelling...)
I'm fine with that.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Get memory contexts of an arbitrary backend process

2021-03-25 Thread Fujii Masao




On 2021/03/26 13:28, Kyotaro Horiguchi wrote:

"some contexts are omitted"
"n child contexts: total_bytes = ..."


Sorry I missed that is already implemented.  So my opnion is I agree
with limiting with a fixed-number, and preferablly sorted in
descending order of... totalspace/nblocks?


This may be an improvement, but makes us modify MemoryContextStatsInternal()
very much. I'm afraid that it's too late to do that at this stage...
What about leaving the output order as it is at the first version?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Get memory contexts of an arbitrary backend process

2021-03-25 Thread Kyotaro Horiguchi
At Fri, 26 Mar 2021 13:17:21 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Fri, 26 Mar 2021 12:49:16 +0900, Fujii Masao  
> wrote in 
> > On 2021/03/26 12:26, Fujii Masao wrote:
> > > On 2021/03/26 12:01, Kyotaro Horiguchi wrote:
> > >> At Thu, 25 Mar 2021 23:45:17 +0900, torikoshia
> > >>  wrote in
> > >>> Attached new one.
> > > Regarding the argument max_children, isn't it better to set its
> > > default value,
> > > e.g., 100 like MemoryContextStats() uses?
> > 
> > +   mcxtLogData->maxChildrenPerContext = max_children;
> > +
> > + if(!SendProcSignal(pid, PROCSIG_LOG_MEMORY_CONTEXT,
> > proc->backendId))
> > +   PG_RETURN_BOOL(true);
> > 
> > Do we need a memory barrier here? Otherwise, there is a race condition
> > where maxChildrenPerContext has not been set yet when the target backend
> > that received the signal reads that shared variable. No?
> > 
> > +Note that when multiple
> > +pg_get_backend_memory_contexts called in
> > + succession or simultaneously, max_children
> > can
> > +be the one of another
> > +pg_get_backend_memory_contexts.
> > +   
> > 
> > This might not be an issue in practice as Horiguchi-san said upthread.
> > But this looks a bit ugly and might be footgun later. The current
> > approach
> > using shared memory to pass this information to the target backends
> > might be overkill, and too complicated to polish the design at the
> > stage
> > just before feature freeze. So I'd withdraw my previous comment and
> > am OK to use the hard-coded value as max_children at the first version
> > of this feature. Thought?
> 
> The dumper function silently suppresses logging when there are too
> many children.  Suppressing part of the output when the user didn't ask
> for it looks like a misbehavior (even if it is written in the
> documentation..), especially when the suppressed contexts occupy the
> majority of memory usage.  I think the fixed limit on lines works if
> the lines are in descending order of memory usage.
> 
> At least we need an additional line to report the suppression.


> "some contexts are omitted"
> "n child contexts: total_bytes = ..."

Sorry I missed that is already implemented.  So my opnion is I agree
with limiting with a fixed-number, and preferablly sorted in
descending order of... totalspace/nblocks?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Get memory contexts of an arbitrary backend process

2021-03-25 Thread Kyotaro Horiguchi
At Fri, 26 Mar 2021 12:49:16 +0900, Fujii Masao  
wrote in 
> On 2021/03/26 12:26, Fujii Masao wrote:
> > On 2021/03/26 12:01, Kyotaro Horiguchi wrote:
> >> At Thu, 25 Mar 2021 23:45:17 +0900, torikoshia
> >>  wrote in
> >>> Attached new one.
> > Regarding the argument max_children, isn't it better to set its
> > default value,
> > e.g., 100 like MemoryContextStats() uses?
> 
> + mcxtLogData->maxChildrenPerContext = max_children;
> +
> + if(!SendProcSignal(pid, PROCSIG_LOG_MEMORY_CONTEXT,
> proc->backendId))
> + PG_RETURN_BOOL(true);
> 
> Do we need a memory barrier here? Otherwise, there is a race condition
> where maxChildrenPerContext has not been set yet when the target backend
> that received the signal reads that shared variable. No?
> 
> +Note that when multiple
> +pg_get_backend_memory_contexts called in
> + succession or simultaneously, max_children
> can
> +be the one of another
> +pg_get_backend_memory_contexts.
> +   
> 
> This might not be an issue in practice as Horiguchi-san said upthread.
> But this looks a bit ugly and might be footgun later. The current
> approach
> using shared memory to pass this information to the target backends
> might be overkill, and too complicated to polish the design at the
> stage
> just before feature freeze. So I'd withdraw my previous comment and
> am OK to use the hard-coded value as max_children at the first version
> of this feature. Thought?

The dumper function silently suppresses logging when there are too
many children.  Suppressing part of the output when the user didn't ask
for it looks like a misbehavior (even if it is written in the
documentation..), especially when the suppressed contexts occupy the
majority of memory usage.  I think the fixed limit on lines works if
the lines are in descending order of memory usage.

At least we need an additional line to report the suppression.

"some contexts are omitted"
"n child contexts: total_bytes = ..."

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: increase size of pg_commit_ts buffers

2021-03-25 Thread Thomas Munro
On Mon, Feb 15, 2021 at 11:56 PM Andrey Borodin  wrote:
> > On 16 January 2021, at 03:07, Alvaro Herrera wrote:
> > Andrey Borodin already has a patch to change the behavior for
> > multixact, which is something we should perhaps also do.  I now notice
> > that they're also reporting a bug in that thread ... sigh
>
> I've tried in that thread [0] to do a more intelligent optimisation than just 
> increasing the number of buffers.
> Though, in fact, bigger memory had a dramatically better effect than the lock tricks.
>
> Maybe let's make all SLRUs buffer sizes configurable?

+1

I got interested in the SLRU sizing problem (the lock trick and CV
stuff sounds interesting too, but I didn't have time to review that in
this cycle).  The main problem I'm aware of with it is the linear
search, so I tried to fix that.  See Andrey's thread for details.




Re: [UNVERIFIED SENDER] Re: Challenges preventing us moving to 64 bit transaction id (XID)?

2021-03-25 Thread Thomas Munro
On Fri, Mar 26, 2021 at 2:57 AM David Steele  wrote:
> On 1/22/21 6:46 PM, Finnerty, Jim wrote:
> > First 3 patches derived from the original 64-bit xid patch set by Alexander 
> > Korotkov
>
> The patches no longer apply
> (http://cfbot.cputube.org/patch_32_2960.log), so marked Waiting on Author.
>
> I also removed the PG14 target since this is a fresh patch set after a
> long hiatus with no new review.

Hi Jim,

I just wanted to say that I'm definitely interested in progress in
this area, and I'm sure many others are too.  Let's talk again about
incremental steps in the PG15 cycle.  The reason for lack of responses
on this thread is most likely due to being at the business end of the
PG14 cycle.




Re: Logical Replication - improve error message while adding tables to the publication in check_publication_add_relation

2021-03-25 Thread Bharath Rupireddy
On Thu, Mar 11, 2021 at 8:26 PM Bharath Rupireddy
 wrote:
>
> On Wed, Mar 10, 2021 at 5:48 PM Euler Taveira  wrote:
> >
> > On Wed, Mar 10, 2021, at 2:14 AM, Bharath Rupireddy wrote:
> >
> > While providing thoughts on [1], I observed that the error messages
> > that are emitted while adding foreign, temporary and unlogged tables
> > can be improved a bit from the existing [2] to [3]. For instance, the
> > existing message when a foreign table is added to a
> > publication, ""f1" is not a table", looks odd, because it says that the
> > foreign table is not a table at all.
> >
> > I wouldn't mix [regular|partitioned|temporary|unlogged] tables with foreign
> > tables. Although, they have a pg_class entry in common, foreign tables 
> > aren't
> > "real" tables (external storage); they even have different DDLs to handle it
> > (CREATE TABLE x CREATE FOREIGN TABLE).
> >
> > postgres=# CREATE PUBLICATION testpub FOR TABLE f1;
> > ERROR:  "f1" is not a table
> > DETAIL:  Only tables can be added to publications.
> >
> > I agree that "f1 is not a table" is a confusing message at first because
> > foreign table has "table" as description. Maybe if we apply the negation in
> > both messages it would be clear (using the same wording as system tables).
> >
> > ERROR:  "f1" is a foreign table
> > DETAIL:  Foreign tables cannot be added to publications.
>
> Thanks. Changed the error message and detail to the way we have it for
> system tables presently. Attaching v2 patch for further review.

Here's the v3 patch rebased on the latest master.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


v3-0001-Improve-error-message-while-adding-tables-to-publ.patch
Description: Binary data


Re: Get memory contexts of an arbitrary backend process

2021-03-25 Thread Kyotaro Horiguchi
At Fri, 26 Mar 2021 12:26:31 +0900, Fujii Masao  
wrote in 
> 
> 
> On 2021/03/26 12:01, Kyotaro Horiguchi wrote:
> > At Thu, 25 Mar 2021 23:45:17 +0900, torikoshia
> >  wrote in
> >> Attached new one.
> 
> Regarding the argument max_children, isn't it better to set its
> default value,
> e.g., 100 like MemoryContextStats() uses?
> + ereport(WARNING,
> + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> + errmsg("must be a superuser to log memory contexts")));
> 
> IMO this type of error, i.e., permission error, should be treated as
> ERROR
> rather than WARNING, like pg_terminate_backend() does.
> + ereport(WARNING,
> + (errmsg("failed to send signal: %m")));
> 
> Per message style rule, "could not send signal to process %d: %m" is
> better?

+1 x 3 for the above.

> > Thanks!
> > -SELECT * FROM pg_get_backend_memory_contexts();
> > +SELECT * FROM pg_get_backend_memory_contexts(0, 0);
> > We can freely change the signature since it's new in 14, but I
> > see the (void) signature as useful.  I vaguely thought of defining
> > pg_get_backend_memory_contexts(void) in pg_proc.dat then defining
> > (int, int) as a different function in system_views.sql.  That allows
> > the function with the second signature to output nothing. You can make a
> > distinction between the two signatures by using PG_NARGS() in the C
> > function.
> > That being said, it's somewhat uncomfortable that two functions with the same
> > name return different values.  I'd like to hear what others feel
> > about the behavior.
> 
> I think that it's confusing to merge two different features into one
> function.
> Isn't it better to leave pg_get_backend_memory_contexts() as it is,
> and
> make the log-memory-contexts feature as separate function, e.g.,
> pg_log_backend_memory_contexts()?

Yeah, that name looks better than pg_print_ba.., and I agree with
separating the two features.  Sorry for dragging the discussion
back and forth.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Get memory contexts of an arbitrary backend process

2021-03-25 Thread Fujii Masao




On 2021/03/26 12:26, Fujii Masao wrote:



On 2021/03/26 12:01, Kyotaro Horiguchi wrote:

At Thu, 25 Mar 2021 23:45:17 +0900, torikoshia  
wrote in

Attached new one.


Regarding the argument max_children, isn't it better to set its default value,
e.g., 100 like MemoryContextStats() uses?


+   mcxtLogData->maxChildrenPerContext = max_children;
+
+   if(!SendProcSignal(pid, PROCSIG_LOG_MEMORY_CONTEXT, proc->backendId))
+   PG_RETURN_BOOL(true);

Do we need a memory barrier here? Otherwise, there is a race condition
where maxChildrenPerContext has not been set yet when the target backend
that received the signal reads that shared variable. No?

+Note that when multiple
+pg_get_backend_memory_contexts called in
+succession or simultaneously, max_children can
+be the one of another
+pg_get_backend_memory_contexts.
+   

This might not be an issue in practice, as Horiguchi-san said upthread.
But this looks a bit ugly and might be a footgun later. The current approach
of using shared memory to pass this information to the target backends
might be overkill, and too complicated to polish the design at the stage
just before feature freeze. So I'd withdraw my previous comment and
am OK with using a hard-coded value for max_children in the first version
of this feature. Thoughts?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: wal stats questions

2021-03-25 Thread Kyotaro Horiguchi
At Fri, 26 Mar 2021 10:32:23 +0900, Fujii Masao  
wrote in 
> > I may be misunderstanding or missing something, but the only place
> > where pgWalUsage counters are increased is XLogInsertRecord.  That is,
> > pgWalUsage counts WAL insertions, not writes nor flushes.  AFAICS
> 
> Yes. And WalStats instead of pgWalUsage includes the stats about
> not only WAL insertions, but also writes and flushes.

Ugh! I was missing a very large blob.  OK, we need an additional check
for the non-pgWalUsage part. Sorry.

> pgstat_send_wal() checks WalStats to determine whether there are
> pending WAL stats to send to the stats collector or not. That is,
> the counters of not only WAL insertions but also writes and flushes
> should be checked to determine whether there are pending stats or not,
> I think..

I think we could have an additional flag to notify about the I/O-stat part,
in contrast to the WAL-insertion part.  Anyway, we do an additional
INSTR_TIME_SET_CURRENT when track_wal_io_timing is enabled.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: MultiXact\SLRU buffers configuration

2021-03-25 Thread Thomas Munro
Hi Andrey, all,

I propose some changes, and I'm attaching a new version:

I renamed the GUCs as clog_buffers etc (no "_slru_").  I fixed some
copy/paste mistakes where the different GUCs were mixed up.  I made
some changes to the .conf.sample.  I rewrote the documentation so that
it states the correct unit and defaults, and refers to the
subdirectories that are cached by these buffers instead of trying to
give a new definition of each of the SLRUs.

Do you like those changes?

Some things I thought about but didn't change:

I'm not entirely sure if we should use the internal and historical
names well known to hackers (CLOG), or the visible directory names (I
mean, we could use pg_xact_buffers instead of clog_buffers).  I am not
sure why these GUCs need to be PGDLLIMPORT, but I see that NBuffers is
like that.

I wanted to do some very simple smoke testing of CLOG sizes on my
local development machine:

  pgbench -i -s1000 postgres
  pgbench -t400 -c8 -j8 -Mprepared postgres

I disabled autovacuum after running that just to be sure it wouldn't
interfere with my experiment:

  alter table pgbench_accounts set (autovacuum_enabled = off);

Then I shut the cluster down and made a copy, so I could do some
repeated experiments from the same initial conditions each time.  At
this point I had 30 files -001E under pg_xact, holding 256kB = ~1
million transactions each.  It'd take ~960 buffers to cache it all.
So how long does VACUUM FREEZE pgbench_accounts take?

I tested with just the 0001 patch, and also with the 0002 patch
(improved version, attached):

clog_buffers=128:  0001=2:28.499, 0002=2:17.891
clog_buffers=1024: 0001=1:38.485, 0002=1:29.701

I'm sure the speedup of the 0002 patch can be amplified by increasing
the number of transactions referenced in the table OR number of
clog_buffers, considering that the linear search produces
O(transactions * clog_buffers) work.  That was 32M transactions and
8MB of CLOG, but I bet if you double both of those numbers once or
twice things start to get hot.  I don't see why you shouldn't be able
to opt to cache literally all of CLOG if you want (something like 50MB
assuming default autovacuum_freeze_max_age, scale to taste, up to
512MB for the theoretical maximum useful value).

I'm not saying the 0002 patch is bug-free yet though, it's a bit finickity.
From e7a8b9a0e27de80bab13f004b4d7e3fa1da677c4 Mon Sep 17 00:00:00 2001
From: Andrey Borodin 
Date: Mon, 15 Feb 2021 21:51:56 +0500
Subject: [PATCH v12 1/2] Make all SLRU buffer sizes configurable.

Provide new GUCs to set the number of buffers, instead of using hard
coded defaults.

Author: Andrey M. Borodin 
Reviewed-by: Anastasia Lubennikova 
Reviewed-by: Tomas Vondra 
Reviewed-by: Alexander Korotkov 
Reviewed-by: Gilles Darold 
Reviewed-by: Thomas Munro 
Discussion: https://postgr.es/m/2BEC2B3F-9B61-4C1D-9FB5-5FAB0F05EF86%40yandex-team.ru
---
 doc/src/sgml/config.sgml  | 137 ++
 src/backend/access/transam/clog.c |   6 +
 src/backend/access/transam/commit_ts.c|   5 +-
 src/backend/access/transam/multixact.c|   8 +-
 src/backend/access/transam/subtrans.c |   5 +-
 src/backend/commands/async.c  |   8 +-
 src/backend/storage/lmgr/predicate.c  |   4 +-
 src/backend/utils/init/globals.c  |   8 +
 src/backend/utils/misc/guc.c  |  77 ++
 src/backend/utils/misc/postgresql.conf.sample |   9 ++
 src/include/access/multixact.h|   4 -
 src/include/access/subtrans.h |   3 -
 src/include/commands/async.h  |   5 -
 src/include/miscadmin.h   |   7 +
 src/include/storage/predicate.h   |   4 -
 15 files changed, 261 insertions(+), 29 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ddc6d789d8..f1112bfa9c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1886,6 +1886,143 @@ include_dir 'conf.d'

   
  
+ 
+
+  multixact_offsets_buffers (integer)
+  
+   multixact_offsets_buffers configuration parameter
+  
+  
+  
+   
+Specifies the amount of shared memory to used to cache the contents
+of pg_multixact/offsets (see
+).
+If this value is specified without units, it is taken as blocks,
+that is BLCKSZ bytes, typically 8kB.
+The default value is 8.
+This parameter can only be set at server start.
+   
+  
+ 
+
+
+  multixact_members_buffers (integer)
+  
+   multixact_members_buffers configuration parameter
+  
+  
+  
+   
+Specifies the amount of shared memory to used to cache the contents
+of pg_multixact/members (see
+).
+If this value is specified without units, it is taken as blocks,
+that is BLCKSZ bytes, typically 8kB.
+The default value is 16.
+This param

Re: [PATCH] add concurrent_abort callback for output plugin

2021-03-25 Thread Amit Kapila
On Thu, Mar 25, 2021 at 2:37 PM Markus Wanner
 wrote:
>
> here is another tidbit from our experience with using logical decoding.
> The attached patch adds a callback to notify the output plugin of a
> concurrent abort.  I'll continue to describe the problem in more detail
> and how this additional callback solves it.
>
> Streamed transactions as well as two-phase commit transactions may get
> decoded before they finish.  At the point the begin_cb is invoked and
> first changes are delivered to the output plugin, it is not necessarily
> known whether the transaction will commit or abort.
>
> This leads to the possibility of the transaction getting aborted
> concurrent to logical decoding.  In that case, it is likely for the
> decoder to error on a catalog scan that conflicts with the abort of the
> transaction.  The reorderbuffer sports a PG_CATCH block to cleanup.
> However, it does not currently inform the output plugin.  From its point
> of view, the transaction is left dangling until another one comes along
> or until the final ROLLBACK or ROLLBACK PREPARED record from WAL gets
> decoded.  Therefore, what the output plugin might see in this case is:
>
> * filter_prepare_cb (txn A)
> * begin_prepare_cb  (txn A)
> * apply_change  (txn A)
> * apply_change  (txn A)
> * apply_change  (txn A)
> * begin_cb  (txn B)
>
> In other words, in this example, only the begin_cb of the following
> transaction implicitly tells the output plugin that txn A could not be
> fully decoded.  And there's no upper time boundary on when that may
> happen.  (It could also be another filter_prepare_cb, if the subsequent
> transaction happens to be a two-phase transaction as well.  Or an
> explicit rollback_prepared_cb or stream_abort if there's no other
> transaction in between.)
>
> An alternative and arguably cleaner approach for streamed transactions
> may be to directly invoke stream_abort.  However, the lsn argument
> passed could not be that of the abort record, as that's not known at the
> point in time of the concurrent abort.  Plus, this seems like a bad fit
> for two-phase commit transactions.
>
> Again, this callback is especially important for output plugins that
> invoke further actions on downstream nodes that delay the COMMIT
> PREPARED of a transaction upstream, e.g. until prepared on other nodes.
> Up until now, the output plugin has no way to learn about a concurrent
> abort of the currently decoded (2PC or streamed) transaction (perhaps
> short of continued polling on the transaction status).
>

As you have noted, stream_abort or rollback_prepared will be sent (the
remaining changes in between will be skipped) as we decode them from WAL,
so it is not clear to me how this causes any delay, as opposed to the case
where we don't detect the concurrent abort, say because we didn't access a
catalog table after it.

-- 
With Regards,
Amit Kapila.




Re: Get memory contexts of an arbitrary backend process

2021-03-25 Thread Fujii Masao




On 2021/03/26 12:01, Kyotaro Horiguchi wrote:

At Thu, 25 Mar 2021 23:45:17 +0900, torikoshia  
wrote in

Attached new one.


Regarding the argument max_children, isn't it better to set its default value,
e.g., 100 like MemoryContextStats() uses?

+   ereport(WARNING,
+   (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+errmsg("must be a superuser to log memory 
contexts")));

IMO this type of error, i.e., permission error, should be treated as ERROR
rather than WARNING, like pg_terminate_backend() does.

+   ereport(WARNING,
+   (errmsg("failed to send signal: %m")));

Per message style rule, "could not send signal to process %d: %m" is better?


Thanks!

-SELECT * FROM pg_get_backend_memory_contexts();
+SELECT * FROM pg_get_backend_memory_contexts(0, 0);

We can freely change the signature since it's new in 14, but I
see the (void) signature as useful.  I vaguely thought of defining
pg_get_backend_memory_contexts(void) in pg_proc.dat then defining
(int, int) as a different function in system_views.sql.  That allows
the function with the second signature to output nothing. You can make a
distinction between the two signatures by using PG_NARGS() in the C
function.

That being said, it's somewhat uncomfortable that two functions with the same
name return different values.  I'd like to hear what others feel
about the behavior.


I think that it's confusing to merge two different features into one function.
Isn't it better to leave pg_get_backend_memory_contexts() as it is, and
make the log-memory-contexts feature as separate function, e.g.,
pg_log_backend_memory_contexts()?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Get memory contexts of an arbitrary backend process

2021-03-25 Thread Kyotaro Horiguchi
At Thu, 25 Mar 2021 23:45:17 +0900, torikoshia  
wrote in 
> Attached new one.

Thanks!

-SELECT * FROM pg_get_backend_memory_contexts();
+SELECT * FROM pg_get_backend_memory_contexts(0, 0);

We can freely change the signature since it's new in 14, but I
see the (void) signature as useful.  I vaguely thought of defining
pg_get_backend_memory_contexts(void) in pg_proc.dat then defining
(int, int) as a different function in system_views.sql.  That allows
the function with the second signature to output nothing. You can make a
distinction between the two signatures by using PG_NARGS() in the C
function.

That being said, it's somewhat uncomfortable that two functions with the same
name return different values.  I'd like to hear what others feel
about the behavior.


+# print memory context of specified backend
+

This looks like garbage.


+   if( pid == 0)

Odd spacing:p

+note "current_logfiles = $current_logfiles";
+
+like(
+   $current_logfiles,
+   qr|^stderr log/postgresql-.*log$|,
+   'current_logfiles is sane');
+
+my $lfname = $current_logfiles;
+$lfname =~ s/^stderr //;
+chomp $lfname;
+
+my $logfile;
+my $print_count;
+
+# Verify that the backtraces of the processes are logged into logfile.
+for (my $attempts = 0; $attempts < $max_attempts; $attempts++)
+{
+   $logfile = $node->data_dir . '/' . $lfname;

It seems that you forgot to include the change on this part in v4.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: making update/delete of inheritance trees scale better

2021-03-25 Thread Tom Lane
Amit Langote  writes:
> On Fri, Mar 26, 2021 at 3:07 AM Tom Lane  wrote:
>> I think the reason is that the parser
>> initially builds all INSERT ... SELECT cases with the SELECT as an
>> explicit subquery, giving rise to a SubqueryScan node just below the
>> ModifyTable, which will project exactly the desired columns and no more.
>> We'll optimize away the SubqueryScan if it's a no-op projection, but not
>> if it is getting rid of junk columns.  There is room for more optimization
>> here: dropping the SubqueryScan in favor of making ModifyTable do the same
>> projection would win by removing one plan node's worth of overhead.

> Oh, so there could possibly be a case where ModifyTable would have to
> do junk filtering for INSERTs, but IIUC only if the planner optimized
> away junk-filtering-SubqueryScan nodes too?  I was thinking that
> perhaps INSERTs used to need junk filtering in the past but no longer
> and now it's just dead code.

I'm honestly not very sure about that.  It's possible that there was
some state of the code in which we supported INSERT/SELECT but didn't
end up putting a SubqueryScan node in there, but if so it was a long
long time ago.  It looks like the SELECT-is-a-subquery parser logic
dates to 05e3d0ee8 of 2000-10-05, which was a long time before
ModifyTable existed as such.  I'm not interested enough to dig
further than that.

However, it's definitely true that we can now generate INSERT plans
where there's a SubqueryScan node that's not doing anything but
stripping junk columns, for instance

INSERT INTO t SELECT x FROM t2 ORDER BY y;

where the ORDER BY has to be done with an explicit sort.  The
sort will be on a resjunk "y" column.

regards, tom lane




Re: Replication slot stats misgivings

2021-03-25 Thread Amit Kapila
On Fri, Mar 26, 2021 at 1:17 AM Andres Freund  wrote:
>
> Hi,
>
> On 2021-03-25 17:12:31 +0530, Amit Kapila wrote:
> > On Thu, Mar 25, 2021 at 11:36 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Wed, Mar 24, 2021 at 7:06 PM Amit Kapila  
> > > wrote:
> > > >
> > > >
> > > > Leaving aside restart case, without some sort of such sanity checking,
> > > > if both drop (of old slot) and create (of new slot) messages are lost
> > > > then we will start accumulating stats in old slots. However, if only
> > > > one of them is lost then there won't be any such problem.
> > > >
> > > > > Perhaps we could have RestoreSlotFromDisk() send something to the 
> > > > > stats
> > > > > collector ensuring the mapping makes sense?
> > > > >
> > > >
> > > > Say if we send just the index location of each slot then probably we
> > > > can setup replSlotStats. Now say before the restart if one of the drop
> > > > messages was missed (by stats collector) and that happens to be at
> > > > some middle location, then we would end up restoring some already
> > > > dropped slot, leaving some of the still required ones. However, if
> > > > there is some sanity identifier like name along with the index, then I
> > > > think that would have worked for such a case.
> > >
> > > Even such messages could also be lost? Given that any message could be
> > > lost under a UDP connection, I think we cannot rely on a single
> > > message. Instead, I think we need to loosely synchronize the indexes
> > > while assuming the indexes in replSlotStats and
> > > ReplicationSlotCtl->replication_slots are not synchronized.
> > >
> > > >
> > > > I think it would have been easier if we would have some OID type of
> > > > identifier for each slot. But, without that may be index location of
> > > > ReplicationSlotCtl->replication_slots and slotname combination can
> > > > reduce the chances of slot stats going wrong, even if not to zero.
> > > > If not name, do we have anything else in a slot that can be used for
> > > > some sort of sanity checking?
> > >
> > > I don't see any useful information in a slot for sanity checking.
> > >
> >
> > In that case, can we do a hard check for which slots exist if
> > replSlotStats runs out of space (that can probably happen only after
> > restart and when we lost some drop messages)?
>
> I suggest we wait doing anything about this until we know if the shared
> stats patch gets in or not (I'd give it 50% maybe). If it does get in
> things get a good bit easier, because we don't have to deal with the
> message loss issues anymore.
>

Okay, that makes sense.

-- 
With Regards,
Amit Kapila.




Re: making update/delete of inheritance trees scale better

2021-03-25 Thread Amit Langote
On Fri, Mar 26, 2021 at 3:07 AM Tom Lane  wrote:
> Robert Haas  writes:
> > - Until I went back and found your earlier remarks on this thread, I
> > was confused as to why you were replacing a JunkFilter with a
> > ProjectionInfo. I think it would be good to try to add some more
> > explicit comments about that design choice, perhaps as header comments
> > for ExecGetUpdateNewTuple, or maybe there's a better place. I'm still
> > not sure why we need to do the same thing for the insert case, which
> > seems to just be about removing junk columns.
>
> I wondered about that too; this patch allegedly isn't touching anything
> interesting about INSERT cases, so why should we modify that?  However,
> when I tried to poke at that, I discovered that it seems to be dead code
> anyway.  A look at coverage.postgresql.org will show you that no
> regression test reaches "junk_filter_needed = true" in
> ExecInitModifyTable's inspection of INSERT tlists, and I've been unable to
> create such a case manually either.

I noticed this too.

>  I think the reason is that the parser
> initially builds all INSERT ... SELECT cases with the SELECT as an
> explicit subquery, giving rise to a SubqueryScan node just below the
> ModifyTable, which will project exactly the desired columns and no more.
> We'll optimize away the SubqueryScan if it's a no-op projection, but not
> if it is getting rid of junk columns.  There is room for more optimization
> here: dropping the SubqueryScan in favor of making ModifyTable do the same
> projection would win by removing one plan node's worth of overhead.

Oh, so there could possibly be a case where ModifyTable would have to
do junk filtering for INSERTs, but IIUC only if the planner optimized
away junk-filtering-SubqueryScan nodes too?  I was thinking that
perhaps INSERTs used to need junk filtering in the past but no longer
and now it's just dead code.

>  But
> I don't think we need to panic about whether the projection is done with
> ExecProject or a junk filter --- we'd be strictly ahead of the current
> situation either way.

I would've liked to confirm that with a performance comparison, but no
test case exists to do so. :(

> Given that background, I agree with Amit's choice to change this,
> just to reduce the difference between how INSERT and UPDATE cases work.
>
> For now, there's no performance difference anyway, since neither the
> ProjectionInfo nor the JunkFilter code can be reached.
>  (Maybe a comment about that would be useful.)

I've added a comment in ExecInitModifyTable() around the block that
initializes new-tuple ProjectionInfo to say that INSERTs don't
actually need to use it today.

> BTW, in the version of the patch that I'm working on (not ready to
> post yet), I've thrown away everything that Amit did in setrefs.c
> and tlist.c, so I don't recommend he spend time improving the comments
> there ;-).

Oh, I removed filter_junk_tlist_entries() in my updated version too
prompted by Robert's comment, but haven't touched
set_update_tlist_references().

>  I did not much like the idea of building a full TargetList
> for each partition; that's per-partition cycles and storage space that
> we won't be able to reclaim with the 0002 patch.  And we don't really
> need it, because there are only going to be three cases at runtime:
> pull a column from the subplan result tuple, pull a column from the
> old tuple, or insert a NULL for a dropped column.  So I've replaced
> the per-target-table tlists with integer lists of the attnums of the
> UPDATE's target columns in that table.  These are compact and they
> don't require any further processing in setrefs.c, and the executor
> can easily build a projection expression from that data.

I remember that in the earliest unposted versions, I had made
ExecInitModifyTable() take up the burden of creating the targetlist
that the projection will compute, which is what your approach sounds
like.  However, I abandoned it due to the concern that it could possibly
hurt prepared statements, because we would build the targetlist on
every execution, whereas it is built only once if the planner does it.

I'm just about done addressing Robert's comments, so will post an
update shortly.

--
Amit Langote
EDB: http://www.enterprisedb.com




Re: Proposal for col LIKE $1 with generic Plan

2021-03-25 Thread Andy Fan
On Thu, Mar 25, 2021 at 10:15 AM Andy Fan  wrote:

> Thanks to get_index_clause_from_support, we can use an index for the
> WHERE a LIKE 'abc%' case. However, this currently only works with a custom
> plan. Think about the case where planning takes lots of time: a custom plan
> will not give a good result, so I want to see if we should support this for
> a generic plan as well.
>
> The first step of this is we need to find an operator that treats the
> prefix as just a literal, which means '%' is just '%', not a match for any
> characters. After trying the 'like' and '~' operators, I find none of them
> can be used. For example:
>
> PREPARE s AS SELECT * FROM t WHERE a LIKE ('^' || $1);
> EXECUTE s('%abc');
>
> '%' is still a special character matching any characters, and so it is
> with '~'.  So I think we need to define a new operator like text(a) ~^
> text(b), which means a is prefixed with b literally. For example:
>
> 'abc' ~^ 'ab' -> true
> 'abc' ~^ 'ab%' -> false
>
> so the above case can be written as:
>
> PREPARE s AS SELECT * FROM t WHERE a ~^ $1;
>
>
During the PoC coding, I found we already have the ^@ operator for this [1],
but we don't implement that for the BTree index so far.  So I will try a
GiST index for my current use case and come back to this thread later.
Thanks!


[1]
https://www.postgresql.org/message-id/20180416155036.36070396%40wp.localdomain


-- 
Best Regards
Andy Fan (https://www.aliyun.com/)


Re: wal stats questions

2021-03-25 Thread Fujii Masao




On 2021/03/26 10:08, Kyotaro Horiguchi wrote:

At Thu, 25 Mar 2021 19:01:23 +0900, Fujii Masao  
wrote in

On 2021/03/25 16:37, Kyotaro Horiguchi wrote:

pgWalUsage was used without resetting and we (I) thought that that
behavior should be preserved.  On second thought, as Andres suggested,
we can just reset pgWalUsage at sending since AFAICS no one takes
difference crossing pgstat_report_stat() calls.


Yes, I agree that we can do that since there seems no such code for
now.
Also if we do that, we can check, for example "pgWalUsage.wal_records > 0"
as you suggested, to easily determine whether there is pending WAL
stats or not.
Anyway I agree it's better to add comments about the design more.

...

If any wal activity has been recorded, wal_records is always positive
thus we can check for wal activity just by "pgWalUsage.wal_records >
0", which should be fast enough.


Maybe there is a case where a backend generates no WAL records,
but just writes them because it needs to do write-ahead logging
when flushing the table data? If yes, "pgWalUsage.wal_records > 0" is not
enough.
Probably other fields also need to be checked.


(I noticed I made the discussion above unconsciously premising
pgWalUsage reset.)

I may be misunderstanding or missing something, but the only place
where pgWalUsage counters are increased is XLogInsertRecord.  That is,
pgWalUsage counts wal insertions, not writes nor flushes.  AFAICS


Yes. And WalStats, unlike pgWalUsage, includes the stats about
not only WAL insertions, but also writes and flushes.
pgstat_send_wal() checks WalStats to determine whether there are
pending WAL stats to send to the stats collector or not. That is,
the counters of not only WAL insertions but also writes and flushes
should be checked to determine whether there are pending stats, I think.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Stronger safeguard for archive recovery not to miss data

2021-03-25 Thread Fujii Masao




On 2021/03/25 23:21, David Steele wrote:

On 1/25/21 3:55 AM, Laurenz Albe wrote:

On Mon, 2021-01-25 at 08:19 +, osumi.takami...@fujitsu.com wrote:

I think you should post another patch where the second, now superfluous, error
message is removed.


Updated. This patch showed no failure during regression tests
and has been aligned by pgindent.


Looks good to me.
I'll set it to "ready for committer" again.


Fujii, does the new patch in [1] address your concerns?


No. I'm still not sure if this patch is a good idea... I understand
why this safeguard is necessary. OTOH I'm afraid it increases
a bit the risk that users get an unstartable database, i.e., lose the whole database.
But maybe I'm concerned about a rare case and my opinion is a minority one.
So I'd like to hear more opinions about this patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: wal stats questions

2021-03-25 Thread Kyotaro Horiguchi
At Thu, 25 Mar 2021 19:01:23 +0900, Fujii Masao  
wrote in 
> On 2021/03/25 16:37, Kyotaro Horiguchi wrote:
> > pgWalUsage was used without resetting and we (I) thought that that
> > behavior should be preserved.  On second thought, as Andres suggested,
> > we can just reset pgWalUsage at sending since AFAICS no one takes
> > difference crossing pgstat_report_stat() calls.
> 
> Yes, I agree that we can do that since there seems no such code for
> now.
> Also if we do that, we can check, for example "pgWalUsage.wal_records > 0"
> as you suggested, to easily determine whether there is pending WAL
> stats or not.
> Anyway I agree it's better to add comments about the design more.
...
> > If any wal activity has been recorded, wal_records is always positive
> > thus we can check for wal activity just by "pgWalUsage.wal_records >
> > 0", which should be fast enough.
> 
> Maybe there is a case where a backend generates no WAL records,
> but just writes them because it needs to do write-ahead logging
> when flushing the table data? If yes, "pgWalUsage.wal_records > 0" is not
> enough.
> Probably other fields also need to be checked.

(I noticed I made the discussion above unconsciously premising
pgWalUsage reset.)

I may be misunderstanding or missing something, but the only place
where pgWalUsage counters are increased is XLogInsertRecord.  That is,
pgWalUsage counts wal insertions, not writes nor flushes.  AFAICS
pgWalUsage.wal_records is always incremented when other counters are
increased.  Looking from another side, we should refrain from adding
counters that increase at a different time than
pgWalUsage.wal_records. (That should be written as a comment there.)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.

2021-03-25 Thread Fujii Masao




On 2021/03/26 9:25, Masahiro Ikeda wrote:


On 2021/03/25 21:23, Fujii Masao wrote:

On 2021/03/25 9:58, Andres Freund wrote:

Outside of very
narrow circumstances a normal exit shouldn't use _exit(1). Neither 1 nor
_exit() (as opposed to exit()) seem appropriate. This seems like a bad
idea.


So you're thinking that the stats collector should do proc_exit(0)
or something even when it receives immediate shutdown request?

One idea to avoid using _exit(1) is to change the SIGQUIT handler
so that it just sets the flag. Then if the stats collector detects that
the flag is set in the main loop, it gets out of the loop,
skips writing the permanent stats file and then exits with exit(0).
That is, normal and immediate shutdown requests are treated
almost the same way in the stats collector. Only the difference of
them is whether it saves the stats to the file or not. Thought?


Although I don't oppose the idea to change from _exit(1) to proc_exit(0), the
reason why I used _exit(1) is that I thought the behavior to skip writing the
stat counters is not a normal exit. Actually, other background processes use
_exit(2) instead of _exit(0) or proc_exit(0) in immediate shutdown although
the status code is different because they touch shared memory.


True.






Also, won't this lead to postmaster now starting to complain about
pgstat having crashed in an immediate shutdown? Right now only exit
status 0 is expected:

     if (pid == PgStatPID)
     {
     PgStatPID = 0;
     if (!EXIT_STATUS_0(exitstatus))
     LogChildExit(LOG, _("statistics collector process"),
  pid, exitstatus);
     if (pmState == PM_RUN || pmState == PM_HOT_STANDBY)
     PgStatPID = pgstat_start();
     continue;
     }


No. In immediate shutdown case, pmdie() sets "Shutdown" variable to
ImmediateShutdown and HandleChildCrash() doesn't complain in that case
because of the following.

 /*
  * We only log messages and send signals if this is the first process
  * crash and we're not doing an immediate shutdown; otherwise, we're only
  * here to update postmaster's idea of live processes.  If we have already
  * signaled children, nonzero exit status is to be expected, so don't
  * clutter log.
  */
 take_action = !FatalError && Shutdown != ImmediateShutdown;


IIUC, in immediate shutdown case (and other cases too?), HandleChildCrash() is
never invoked due to the exit of pgstat. My understanding of Andres-san's
comment is: is it OK to show a log message like the following?

```
LOG:  statistics collector process (PID 64991) exited with exit code 1
```

Surely, other processes don't output a log message like this. So, I agree to suppress
the log message.


Yes. I was mistakenly thinking that HandleChildCrash() was called
even when the stats collector exits with non-zero code, like other processes.
But that's not true.

How should we do this? HandleChildCrash() calls LogChildExit()
when FatalError = false and Shutdown != ImmediateShutdown.
We should use the same conditions for the stats collector as follows?

if (pid == PgStatPID)
{
PgStatPID = 0;
-   if (!EXIT_STATUS_0(exitstatus))
+   if (!EXIT_STATUS_0(exitstatus) && !FatalError &&
+   Shutdown != ImmediateShutdown)
LogChildExit(LOG, _("statistics collector 
process"),
 pid, exitstatus);

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: DETAIL for wrong scram password

2021-03-25 Thread Michael Paquier
On Thu, Mar 25, 2021 at 03:54:10PM +, Jacob Champion wrote:
> It looks like the code paths that lead to a doomed authentication
> already provide their own, more specific, logdetail (role doesn't
> exist, role has no password, role doesn't have a SCRAM secret, etc.).

Yes, you are right here.  I missed the parts before
mock_scram_secret() gets called and there are comments in the whole
area.  Hmm, at the end of the day, I think that would just have
verify_client_proof() fill in logdetail when the client proof does not
match, and use a wording different than what's proposed upthread to
outline that this is a client proof mismatch.
--
Michael




Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.

2021-03-25 Thread Masahiro Ikeda



On 2021/03/25 19:48, Fujii Masao wrote:
> 
> 
> On 2021/03/25 9:31, Masahiro Ikeda wrote:
>>
>>
>> On 2021/03/24 18:36, Fujii Masao wrote:
>>>
>>>
>>> On 2021/03/24 3:51, Andres Freund wrote:
 Hi,

 On 2021-03-23 15:50:46 +0900, Fujii Masao wrote:
> This fact makes me wonder whether, if we collect the statistics about WAL
> writing
> from walreceiver as we discussed in the other thread, the stats collector
> should
> be invoked at an earlier stage. IIUC walreceiver can be invoked before
> PMSIGNAL_BEGIN_HOT_STANDBY is sent.

 FWIW, in the shared memory stats patch the stats subsystem is
 initialized early on by the startup process.
>>>
>>> This is good news!
>>
>> Fujii-san, Andres-san,
>> Thanks for your comments!
>>
>> I didn't think about the start order. From that point of view, I noticed that
>> the current source code has two other concerns.
>>
>>
>> 1. This problem is not only for the wal receiver.
>>
>> The problem that the wal receiver starts before the stats collector
>> is launched during archive recovery is not only for the wal receiver but
>> also the checkpointer and the bgwriter. Before starting redo, the startup
>> process sends the postmaster "PMSIGNAL_RECOVERY_STARTED" signal to launch the
>> checkpointer and the bgwriter to be able to perform creating restartpoint.
>>
>> Although the socket for communication between the stats collector and the
>> other processes is made at an earlier stage via pgstat_init(), I agree that
>> making the stats collector start at an earlier stage is defensive. BTW, in my
>> environment (linux, net.core.rmem_default = 212992), the socket can buffer
>> almost 300 WAL stats messages. This means that, as you said, if the redo phase
>> is too long, it can lose messages easily.
>>
>>
>> 2. To make the stats clear in redo phase.
>>
>> The statistics can be reset after the wal receiver, the checkpointer and
>> the wal writer are started in the redo phase. So, it's not enough that the stats
>> collector is invoked at an earlier stage. We need to fix it.
>>
>>
>>
>> (I hope I am not missing something.)
>> Thanks to Andres-san's work([1]), the above problems will be handled in the
>> shared memory stats patch. The first problem will be resolved since the stats are
>> collected in shared memory, so the stats collector process is unnecessary
>> itself. The second problem will be resolved by removing the reset code because the
>> temporary stats file won't be generated, and if the permanent stats file
>> is corrupted, we can just recreate it.
> 
> Yes. So we should wait for the shared memory stats patch to be committed
> before working on walreceiver stats patch more?

Yes, I agree.

Regards,

-- 
Masahiro Ikeda
NTT DATA CORPORATION




Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.

2021-03-25 Thread Masahiro Ikeda


On 2021/03/25 21:23, Fujii Masao wrote:
> On 2021/03/25 9:58, Andres Freund wrote:
>> Outside of very
>> narrow circumstances a normal exit shouldn't use _exit(1). Neither 1 nor
>> _exit() (as opposed to exit()) seem appropriate. This seems like a bad
>> idea.
> 
> So you're thinking that the stats collector should do proc_exit(0)
> or something even when it receives immediate shutdown request?
> 
> One idea to avoid using _exit(1) is to change the SIGQUIT handler
> so that it just sets the flag. Then if the stats collector detects that
> the flag is set in the main loop, it gets out of the loop,
> skips writing the permanent stats file and then exits with exit(0).
> That is, normal and immediate shutdown requests are treated
> almost the same way in the stats collector. Only the difference of
> them is whether it saves the stats to the file or not. Thought?

Although I don't oppose the idea to change from _exit(1) to proc_exit(0), the
reason why I used _exit(1) is that I thought the behavior to skip writing the
stat counters is not a normal exit. Actually, other background processes use
_exit(2) instead of _exit(0) or proc_exit(0) in immediate shutdown although
the status code is different because they touch shared memory.


>> Also, won't this lead to postmaster now starting to complain about
>> pgstat having crashed in an immediate shutdown? Right now only exit
>> status 0 is expected:
>>
>>     if (pid == PgStatPID)
>>     {
>>     PgStatPID = 0;
>>     if (!EXIT_STATUS_0(exitstatus))
>>     LogChildExit(LOG, _("statistics collector process"),
>>  pid, exitstatus);
>>     if (pmState == PM_RUN || pmState == PM_HOT_STANDBY)
>>     PgStatPID = pgstat_start();
>>     continue;
>>     }
> 
> No. In immediate shutdown case, pmdie() sets "Shutdown" variable to
> ImmediateShutdown and HandleChildCrash() doesn't complain in that case
> because of the following.
> 
> /*
>  * We only log messages and send signals if this is the first process
>  * crash and we're not doing an immediate shutdown; otherwise, we're only
>  * here to update postmaster's idea of live processes.  If we have already
>  * signaled children, nonzero exit status is to be expected, so don't
>  * clutter log.
>  */
> take_action = !FatalError && Shutdown != ImmediateShutdown;

IIUC, in immediate shutdown case (and other cases too?), HandleChildCrash() is
never invoked due to the exit of pgstat. My understanding of Andres-san's
comment is: is it OK to show a log message like the following?

```
LOG:  statistics collector process (PID 64991) exited with exit code 1
```

Surely, other processes don't output a log message like this. So, I agree to suppress
the log message.

FWIW, in immediate shutdown case, since pmdie() sets "pmState" variable to
"PM_WAIT_BACKENDS", pgstat_start() won't be invoked.


Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION




Re: [HACKERS] Custom compression methods

2021-03-25 Thread Justin Pryzby
On Mon, Mar 22, 2021 at 10:41:33AM -0400, Robert Haas wrote:
> trying to explain how TOAST works here, I added a link. It looks,
> though, like that documentation also needs to be patched for this
> change. I'll look into that, and your remaining patches, next.

I added an Open Item for any necessary updates to the toast docs.

https://wiki.postgresql.org/wiki/PostgreSQL_14_Open_Items

-- 
Justin




Re: Proposal: Save user's original authenticated identity for logging

2021-03-25 Thread Michael Paquier
On Thu, Mar 25, 2021 at 06:51:22PM +, Jacob Champion wrote:
> On Thu, 2021-03-25 at 14:41 +0900, Michael Paquier wrote:
>> +   port->authn_id = MemoryContextStrdup(TopMemoryContext, id);
>> It may not be obvious that all the field is copied to TopMemoryContext
>> because the Port requires that.
> 
> I've expanded the comment. (v11 attached, with incremental changes over
> v10 in since-v10.diff.txt.)

That's the addition of "to match the lifetime of the Port".  Looks
good.

>> +$node->stop('fast');
>> +my $log_contents = slurp_file($log);
>> Like 11e1577, let's just truncate the log files in all those tests.
> 
> Hmm... having the full log file contents for the SSL tests has been
> incredibly helpful for me with the NSS work. I'd hate to lose them; it
> can be very hard to recreate the test conditions exactly.

Does it really matter to have the full contents of the file from the
previous tests though?  like() would report the contents of
slurp_file() when it fails if the generated output does not match the
expected one, so you actually get less noise this way.

>> +   if (auth_method < 0 || USER_AUTH_LAST < auth_method)
>> +   {
>> +   Assert((0 <= auth_method) && (auth_method <= USER_AUTH_LAST));
>> What's the point of having the check and the assertion?  NULL does not
>> really seem like a good default here as this should never really
>> happen.  Wouldn't a FATAL be actually safer?
> 
> I think FATAL makes more sense. Changed, thanks.

Thanks.  FWIW, one worry I had here was a corrupted stack that calls
this code path that would remain undetected.

>> For SSPI, I think that this should be moved down once we are sure that
>> there is no error and that pg_SSPI_recvauth() reports STATUS_OK to the
>> caller.  There is a similar issue with CheckCertAuth(), and
>> set_authn_id() is documented so as it should be called only when we
>> are sure that authentication succeeded.
> 
> Authentication *has* succeeded already; that's what the SSPI machinery
> has done above. Likewise for CheckCertAuth, which relies on the TLS
> subsystem to validate the client signature before setting the peer_cn.
> The user mapping is an authorization concern: it answers the question,
> "is an authenticated user allowed to use a particular Postgres user
> name?"

Okay.  Could you make the comments in those various areas more
explicit about the difference and that it is intentional to register
the auth ID before checking the user map?  Anybody reading this code
in the future may get confused with the differences in handling all
that according to the auth type involved if that's not clearly
stated.

> That was my intent, yeah. Getting this into the stats framework was
> more than I could bite off for this first patchset, but having it
> stored in a central location will hopefully help people do more with
> it.

No problem with that.
--
Michael




Re: [HACKERS] Custom compression methods

2021-03-25 Thread Robert Haas
On Thu, Mar 25, 2021 at 5:44 AM Dilip Kumar  wrote:
> Okay got it.  Fixed as suggested.

Committed with a bit of editing of the comments.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Support for NSS as a libpq TLS backend

2021-03-25 Thread Jacob Champion
On Fri, 2021-03-26 at 00:22 +0100, Daniel Gustafsson wrote:
> > On 23 Mar 2021, at 20:04, Stephen Frost  wrote:
> > 
> > Eh, poor wording on my part.  You're right, the question, reworded
> > again, was "Would someone want to get the context returned by
> > NSS_InitContext?".  If we think there's a reason that someone might want
> > that context then perhaps we should allow getting it, in addition to the
> > pr_fd.  If there's really no reason to ever want the context from
> > NSS_InitContext then what you have here where we're returning pr_fd is
> > probably fine.
> 
> I can't think of any reason; maybe Jacob, who has been knee-deep in NSS
> contexts, has insights which tell a different story?

The only thing you can do with a context pointer is shut it down, and I
don't think that's something that should be exposed.

--Jacob


Re: shared memory stats: high level design decisions: consistency, dropping

2021-03-25 Thread Magnus Hagander
On Thu, Mar 25, 2021 at 6:20 PM Andres Freund  wrote:
>
> Hi,
>
> On 2021-03-24 14:42:11 +0100, Magnus Hagander wrote:
> > On Sun, Mar 21, 2021 at 11:34 PM Andres Freund  wrote:
> > > > If I understand what you are proposing, all stats views would become
> > > > completely volatile, without even within-query consistency.  That really
> > > > is not gonna work.  As an example, you could get 
> > > > not-even-self-consistent
> > > > results from a join to a stats view if the planner decides to implement
> > > > it as a nestloop with the view on the inside.
> > >
> > > I don't really think it's a problem that's worth incuring that much cost 
> > > to
> > > prevent. We already have that behaviour for a number of of the pg_stat_* 
> > > views,
> > > e.g. pg_stat_xact_all_tables, pg_stat_replication.
> >
> > Aren't those both pretty bad examples though?
> >
> > pg_stat_xact_all_tables surely is within-query consistent, and would
> > be pretty useless if it wwas within-transaction consistent?
>
> It's not within-query consistent:
>
> postgres[1182102][1]=# SELECT pg_stat_get_xact_numscans('pg_class'::regclass) 
> UNION ALL SELECT count(*) FROM pg_class UNION ALL SELECT 
> pg_stat_get_xact_numscans('pg_class'::regclass);
> ┌───┐
> │ pg_stat_get_xact_numscans │
> ├───┤
> │ 0 │
> │   397 │
> │ 1 │
> └───┘
> (3 rows)

Heh. OK, I admit I didn't consider a UNION query like that -- I only
considered it being present *once* in a query :)

That said, if wanted, that can be dealt with using a WITH MATERIALIZED as
long as it's in the same query, no?
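For instance, something along these lines (a sketch; the self-join shape is illustrative) would pin a single evaluation of the stats view for the rest of the query:

```sql
-- Materialize the stats once, then reference the frozen result twice
-- without re-reading live (possibly changed) counters.
WITH s AS MATERIALIZED (
    SELECT relid, seq_scan FROM pg_stat_user_tables
)
SELECT a.relid, a.seq_scan, b.seq_scan
FROM s AS a
JOIN s AS b USING (relid);
```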


> > pg_stat_replication is a snapshot of what things are right now (like
> > pg_stat_activity), and not collected statistics.
>
> However, pg_stat_activity does have snapshot semantics...

Yeah, yay consistency.


> > Maybe there's inconsistency in that they should've had a different
> > name to separate it out, but fundamentally having xact consistent
> > views there would be a bad thing, no?
>
> True. One weird thing around the _xact_ versions that we, at best,
> *hint* at in the docs, but also contradict, is that _xact_ views are
> actually not tied to the transaction.  It's really about unsubmitted
> stats. E.g. if executing the following via copy-paste
>
> SELECT count(*) FROM pg_class;
> SELECT pg_stat_get_xact_numscans('pg_class'::regclass);
> SELECT count(*) FROM pg_class;
> SELECT pg_stat_get_xact_numscans('pg_class'::regclass);
>
> will most of the time return
> 
> 0
> 
> 1
>
> because after the transaction for the first count(*) commits, we'll not
> have submitted stats for more than PGSTAT_STAT_INTERVAL. But after the
> second count(*) it'll be shorter, therefore the stats won't be
> submitted...

That's... cute. I hadn't realized that part, but then I've never
actually had use for the _xact_ views.


> > > There's just a huge difference between being able to access a table's 
> > > stats in
> > > O(1) time, or having a single stats access be O(database-objects).
> > >
> > > And that includes accesses to things like pg_stat_bgwriter, 
> > > pg_stat_database
> > > (for IO over time stats etc) that often are monitored at a somewhat high
> > > frequency - they also pay the price of reading in all object stats.  On my
> > > example database with 1M tables it takes 0.4s to read pg_stat_database.
> >
> > IMV, singeling things out into "larger groups" would be one perfectly
> > acceptable compromise. That is, say that pg_stat_user_tables can be
> > inconsistent vs with pg_stat_bgwriter, but it cannot be inconsistent
> > with itself.
>
> I don't think that buys us all that much though. It still is a huge
> issue that we need to cache the stats for all relations even though we
> only access the stats for one table.

Well, you yourself just mentioned that access to bgwriter and db stats
are often sampled at a higher frequency.

That said, this can often include *individual* tables as well, but
maybe not all at once.


> > > > I also believe that the snapshotting behavior has advantages in terms
> > > > of being able to perform multiple successive queries and get consistent
> > > > results from them.  Only the most trivial sorts of analysis don't need
> > > > that.
> > >
> > > In most cases you'd not do that in a transaction tho, and you'd need to 
> > > create
> > > temporary tables with a snapshot of the stats anyway.
> >
> > I'd say in most cases this analysis happens in snapshots anyway, and
> > those are snapshots unrelated to what we do in pg_stat. It's either
> > snapshotted to tables, or to storage in a completely separate
> > database.
>
> Agreed. I wonder if some of that work would be made easier if we added a
> function to export all the data in the current snapshot as a json
> document or such? If we add configurable caching (see below) that'd
> really not be a lot of additional work.
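A minimal sketch of the shape such a JSON export could take — the entry struct and function name are invented for illustration; a real implementation would walk the stats hash and use the backend's JSON infrastructure rather than raw snprintf:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flattened stats entry: one object name and a counter. */
typedef struct StatEntry
{
    const char *name;
    long        scans;
} StatEntry;

/* Serialize an array of entries into a single JSON object in buf.
 * Returns the number of characters written. */
static int
stats_snapshot_to_json(const StatEntry *entries, int n,
                       char *buf, size_t buflen)
{
    size_t off = 0;

    off += snprintf(buf + off, buflen - off, "{");
    for (int i = 0; i < n; i++)
        off += snprintf(buf + off, buflen - off, "%s\"%s\":%ld",
                        i ? "," : "", entries[i].name, entries[i].scans);
    off += snprintf(buf + off, buflen - off, "}");
    return (int) off;
}
```

The appeal is that a monitoring job could grab one consistent document per sample and store it externally, instead of relying on in-backend snapshot caching.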

I'd assume if you want the snapshot in the database, yo

Re: [PATCH] add concurrent_abort callback for output plugin

2021-03-25 Thread Markus Wanner

On 25.03.21 21:21, Andres Freund wrote:

... the code added as part of 7259736a6e5b7c75 ...


That's the streaming part, which can be thought of as a more general 
variant of the two-phase decoding, in that it allows multiple "flush 
points" (invoking ReorderBufferProcessTXN).  That's unlike the PREPARE of 
a two-phase commit, after which the reorderbuffer can be sure there's no 
further change to be processed; nor is there any invocation of 
ReorderBufferProcessTXN before that first one at PREPARE time.  With 
that in mind, I'm surprised support for streaming got committed before 
2PC.  It clearly has different use cases, though.


However, I'm sure your inputs on how to improve and cleanup the 
implementation will be appreciated.  The single tiny problem this patch 
addresses is the same for 2PC and streaming decoding: the output plugin 
currently has no way to learn about a concurrent abort of a transaction 
still being decoded, at the time this happens.


Both, 2PC and streaming do require the reorderbuffer to forward changes 
(possibly) prior to the transaction's commit.  That's the whole point of 
these two features.  Therefore, I don't think we can get around 
concurrent aborts.



You may have only meant it as a shorthand: But imo output plugins have
absolutely no business "invoking actions downstream".


From my point of view, that's the raison d'être for an output plugin. 
Even if it does so merely by forwarding messages.  But yeah, of course a 
whole bunch of other components and changes are needed to implement the 
kind of global two-phase commit system I tried to describe.


I'm open to suggestions on how to reference that use case.


diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index c291b05a423..a6d044b870b 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2488,6 +2488,12 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			errdata = NULL;
 			curtxn->concurrent_abort = true;
 
+			/*
+			 * Call the cleanup hook to inform the output plugin that the
+			 * transaction just started had to be aborted.
+			 */
+			rb->concurrent_abort(rb, txn, streaming, commit_lsn);
+
 			/* Reset the TXN so that it is allowed to stream remaining data. */
 			ReorderBufferResetTXN(rb, txn, snapshot_now,
 								  command_id, prev_lsn,


I don't think this would be ok, errors thrown in the callback wouldn't
be handled as they would be in other callbacks.


That's a good point.  Maybe the CATCH block should only set a flag, 
allowing for the callback to be invoked outside of it.
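As a toy illustration of that idea — with setjmp/longjmp standing in for PG_TRY/PG_CATCH, and the types being invented stand-ins rather than the real reorderbuffer structures — the catch block records only a flag, and the callback runs afterwards, where an error it raises would receive the same handling as errors from any other output-plugin callback:

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Invented stand-ins for ReorderBufferTXN and the callback type. */
typedef struct DemoTxn
{
    bool concurrent_abort;
} DemoTxn;

typedef void (*concurrent_abort_cb) (DemoTxn *txn);

static jmp_buf catch_env;

static void
decode_changes(void)
{
    longjmp(catch_env, 1);      /* simulate the conflicting catalog-scan error */
}

static bool
process_txn(DemoTxn *txn, concurrent_abort_cb cb)
{
    bool aborted = false;

    if (setjmp(catch_env) == 0)
        decode_changes();       /* "PG_TRY" section */
    else
    {
        /* "PG_CATCH" section: only record what happened. */
        txn->concurrent_abort = true;
        aborted = true;
    }

    /* Invoke the callback outside the catch block, so any error it
     * throws gets normal error handling. */
    if (aborted && cb != NULL)
        cb(txn);
    return aborted;
}
```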


Regards

Markus my-callbacks-do-not-throw-error Wanner




Re: [PATCH] More docs on what to do and not do in extension code

2021-03-25 Thread Bruce Momjian
On Thu, Mar 25, 2021 at 08:49:44AM -0400, David Steele wrote:
> On 1/22/21 1:36 AM, Craig Ringer wrote:
> > 
> > Would you mind attaching a revised version of the patch with your edits?
> > Otherwise I'll go and merge them in once you've had your say on my
> > comments inline below.
> 
> Bharath, do the revisions in [1] look OK to you?
> 
> > Bruce, Robert, can I have an opinion from you on how best to locate and
> > structure these docs, or whether you think they're suitable for the main
> > docs at all? See patch upthread.
> 
> Bruce, Robert, any thoughts here?

I know I sent an email earlier this month saying we shouldn't
over-document the backend hooks because the code could drift away from
the README content:


https://www.postgresql.org/message-id/20210309172049.GD26575%40momjian.us

Agreed.  If you document the hooks too much, it allows them to drift
away from matching the code, which makes the hook documentation actually
worse than having no hook documentation at all.

However, for this doc patch, the content seems to be more strategic, so
less likely to change, and hard to figure out from the code directly.
Therefore, I think this would be a useful addition to the docs.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-25 Thread Bruce Momjian
On Wed, Mar 24, 2021 at 11:20:49PM +0800, Julien Rouhaud wrote:
> On Wed, Mar 24, 2021 at 08:13:40AM -0400, Bruce Momjian wrote:
> > I have no local modifications.  Please modify the patch I posted and
> > repost your version, thanks.
> 
> Ok!  I used the last version of the patch you sent and addressed the following
> comments from earlier messages in attached v20:
> 
> - copyright year to 2021
> - s/has has been compute/has been compute/
> - use the name CleanQuerytext in the first commit

My apologies --- yes, I made those two changes after I posted my version
of the patch.  I should have reposted my version with those changes.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-25 Thread Alvaro Herrera
On 2020-Nov-30, Justin Pryzby wrote:

> On Tue, Nov 03, 2020 at 08:56:06PM -0300, Alvaro Herrera wrote:

> > * On the first transaction (the one that marks the partition as
> > detached), the partition is locked with ShareLock rather than
> > ShareUpdateExclusiveLock.  This means that DML is not allowed anymore.
> > This seems a reasonable restriction to me; surely by the time you're
> > detaching the partition you're not inserting data into it anymore.
> 
> I don't think it's an issue with your patch, but FYI that sounds like
> something I had to do recently: detach *all* partitions of various tables
> to promote their partition key column from timestamp to timestamptz.  And
> we insert directly into child tables, not routed via parent.
> 
> I don't doubt your patch is still useful, but not to us.  So the
> documentation should be clear about that.

FWIW since you mentioned this detail specifically: I backed away from
doing this (and use ShareUpdateExclusive), because it wasn't buying us
anything anyway.  The reason for it is that I wanted to close the hole
for RI queries, and this seemed the simplest fix; but it really *wasn't*
a fix anyway.  My later games with the active snapshot (which are
present in the version I pushed) better close this problem.  So I don't
think this would be a problem.

-- 
Álvaro Herrera   Valdivia, Chile




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-25 Thread Alvaro Herrera
I added that test as promised, and I couldn't find any problems, so I
have pushed it.

Thanks!

-- 
Álvaro Herrera   Valdivia, Chile




Re: [PATCH] add concurrent_abort callback for output plugin

2021-03-25 Thread Andres Freund
Hi,

On 2021-03-25 10:07:28 +0100, Markus Wanner wrote:
> This leads to the possibility of the transaction getting aborted concurrent
> to logical decoding.  In that case, it is likely for the decoder to error on
> a catalog scan that conflicts with the abort of the transaction.  The
> reorderbuffer sports a PG_CATCH block to clean up.

FWIW, I am seriously suspicious of the code added as part of
7259736a6e5b7c75 and plan to look at it after the code freeze. I can't
really see this code surviving as is. The tableam.h changes, the bsysscan
stuff, ... Leaving correctness aside, the code bloat and performance
effects alone seem problematic.


> Again, this callback is especially important for output plugins that invoke
> further actions on downstream nodes that delay the COMMIT PREPARED of a
> transaction upstream, e.g. until prepared on other nodes. Up until now, the
> output plugin has no way to learn about a concurrent abort of the currently
> decoded (2PC or streamed) transaction (perhaps short of continued polling on
> the transaction status).

You may have only meant it as a shorthand: But imo output plugins have
absolutely no business "invoking actions downstream".


> diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
> index c291b05a423..a6d044b870b 100644
> --- a/src/backend/replication/logical/reorderbuffer.c
> +++ b/src/backend/replication/logical/reorderbuffer.c
> @@ -2488,6 +2488,12 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
>  			errdata = NULL;
>  			curtxn->concurrent_abort = true;
>  
> +			/*
> +			 * Call the cleanup hook to inform the output plugin that the
> +			 * transaction just started had to be aborted.
> +			 */
> +			rb->concurrent_abort(rb, txn, streaming, commit_lsn);
> +
>  			/* Reset the TXN so that it is allowed to stream remaining data. */
>  			ReorderBufferResetTXN(rb, txn, snapshot_now,
>  								  command_id, prev_lsn,

I don't think this would be ok, errors thrown in the callback wouldn't
be handled as they would be in other callbacks.

Greetings,

Andres Freund




Re: [PATCH] Covering SPGiST index

2021-03-25 Thread Pavel Borisov
In v14 I forgot to add the test. PFA v15.

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


v15-0001-Covering-SP-GiST-index-support-for-INCLUDE-colum.patch
Description: Binary data


Re: [PATCH] Covering SPGiST index

2021-03-25 Thread Pavel Borisov
>
> On further contemplation, it occurs to me that if we make the switch
> to "key values are stored per normal rules", then even if there is an
> index with pass-by-value keys out there someplace, it would only break
> on big-endian architectures.  On little-endian, the extra space
> occupied by the Datum format would just seem to be padding space.
> So this probably means that the theoretical compatibility hazard is
> small enough to be negligible, and we should go with my option #2
> (i.e., we need to replace SGLTDATUM with normal attribute-fetch logic).
>
> regards, tom lane
>

I am sorry for the delay in replying. Now I've returned to work on the
patch.
First of all, big thanks for the good advice. I especially liked the
idea of not allocating a big array in the picksplit procedure and instead
deforming and forming tuples on the fly.
I found all the notes mentioned well worth integrating into the patch, and
have made the next version of the patch (with a pgindent run also). PFA v14.

I hope I understood the right way to modify SGLTDATUM; if not, please
let me know. (The macro SGLTDATUM itself is not removed; it now calls
fetch_att. I find this suitable, since the address of the first tuple
attribute is MAXALIGNed.)

Thanks again for your consideration. From now I hope to be able to work on
the feature with not so big delay.

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


v14-0001-Covering-SP-GiST-index-support-for-INCLUDE-colum.patch
Description: Binary data


Re: Replication slot stats misgivings

2021-03-25 Thread Andres Freund
Hi,

On 2021-03-25 17:12:31 +0530, Amit Kapila wrote:
> On Thu, Mar 25, 2021 at 11:36 AM Masahiko Sawada wrote:
> >
> > On Wed, Mar 24, 2021 at 7:06 PM Amit Kapila  wrote:
> > >
> > >
> > > Leaving aside restart case, without some sort of such sanity checking,
> > > if both drop (of old slot) and create (of new slot) messages are lost
> > > then we will start accumulating stats in old slots. However, if only
> > > one of them is lost then there won't be any such problem.
> > >
> > > > Perhaps we could have RestoreSlotFromDisk() send something to the stats
> > > > collector ensuring the mapping makes sense?
> > > >
> > >
> > > Say if we send just the index location of each slot then probably we
> > > can setup replSlotStats. Now say before the restart if one of the drop
> > > messages was missed (by stats collector) and that happens to be at
> > > some middle location, then we would end up restoring some already
> > > dropped slot, leaving some of the still required ones. However, if
> > > there is some sanity identifier like name along with the index, then I
> > > think that would have worked for such a case.
> >
> > Even such messages could also be lost? Given that any message could be
> > lost under a UDP connection, I think we cannot rely on a single
> > message. Instead, I think we need to loosely synchronize the indexes
> > while assuming the indexes in replSlotStats and
> > ReplicationSlotCtl->replication_slots are not synchronized.
> >
> > >
> > > I think it would have been easier if we would have some OID type of
> > > identifier for each slot. But, without that may be index location of
> > > ReplicationSlotCtl->replication_slots and slotname combination can
> > > reduce the chances of slot stats go wrong quite less even if not zero.
> > > If not name, do we have anything else in a slot that can be used for
> > > some sort of sanity checking?
> >
> > I don't see any useful information in a slot for sanity checking.
> >
> 
> In that case, can we do a hard check for which slots exist if
> replSlotStats runs out of space (that can probably happen only after
> restart and when we lost some drop messages)?

I suggest we hold off on doing anything about this until we know if the shared
stats patch gets in or not (I'd give it 50% maybe). If it does get in
things get a good bit easier, because we don't have to deal with the
message loss issues anymore.

Greetings,

Andres Freund




Re: Error on failed COMMIT

2021-03-25 Thread David Steele

On 3/25/21 3:07 PM, Dave Cramer wrote:
On Thu, 25 Mar 2021 at 12:04, David Steele wrote:


Tests are failing on the cfbot for this patch and it looks like a new
patch is needed from Dave, at the least, so marking Waiting on Author.

Should we be considering this patch Returned with Feedback instead?

Not sure; at this point the impetus for this is not getting a lot of 
traction. Honestly, I think the approach I took is too simple and I don't 
have the inclination at the moment to rewrite it.


In that case Returned with Feedback is appropriate so I have done that.

Of course, the conversation can continue on this thread or a new one and 
when you have a new patch you can create a new CF entry.


Regards,
--
-David
da...@pgmasters.net




Re: Error on failed COMMIT

2021-03-25 Thread Dave Cramer
On Thu, 25 Mar 2021 at 12:04, David Steele  wrote:

> On 1/26/21 1:02 PM, Dave Cramer wrote:
> > On Tue, 26 Jan 2021 at 12:46, Laurenz Albe wrote:
> >
> > I see your point from the view of the JDBC driver.
> >
> > It just feels hacky - somewhat similar to what you say
> > above: don't go through the normal transaction rollback steps,
> > but issue an error message.
> >
> > At least we should fake it well...
> >
> > OK, let me look into how we deal with COMMIT and CHAIN.
> >
> > I can see some real issues with this as Vik pointed out.
>
> Tests are failing on the cfbot for this patch and it looks like a new
> patch is needed from Dave, at the least, so marking Waiting on Author.
>
> Should we be considering this patch Returned with Feedback instead?
>
>
Not sure; at this point the impetus for this is not getting a lot of
traction. Honestly, I think the approach I took is too simple and I don't
have the inclination at the moment to rewrite it.


Dave Cramer
www.postgres.rocks

> Regards,
> --
> -David
> da...@pgmasters.net
>


Re: shared memory stats: high level design decisions: consistency, dropping

2021-03-25 Thread Robert Haas
On Sun, Mar 21, 2021 at 12:14 PM Tom Lane  wrote:
> If I understand what you are proposing, all stats views would become
> completely volatile, without even within-query consistency.  That really
> is not gonna work.  As an example, you could get not-even-self-consistent
> results from a join to a stats view if the planner decides to implement
> it as a nestloop with the view on the inside.
>
> I also believe that the snapshotting behavior has advantages in terms
> of being able to perform multiple successive queries and get consistent
> results from them.  Only the most trivial sorts of analysis don't need
> that.
>
> In short, what you are proposing sounds absolutely disastrous for
> usability of the stats views, and I for one will not sign off on it
> being acceptable.
>
> I do think we could relax the consistency guarantees a little bit,
> perhaps along the lines of only caching view rows that have already
> been read, rather than grabbing everything up front.  But we can't
> just toss the snapshot concept out the window.  It'd be like deciding
> that nobody needs MVCC, or even any sort of repeatable read.
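The relaxation Tom sketches above — caching a stats entry only once it has been read, so repeated reads within a transaction stay self-consistent without slurping everything up front — might be modeled like this stand-alone sketch, where fixed arrays stand in for the stats hash table:

```c
#include <assert.h>
#include <stdbool.h>

#define NSTATS 4

static long shared_stats[NSTATS];   /* "live" collector-side values */
static long cached_stats[NSTATS];   /* per-backend, per-transaction cache */
static bool cached[NSTATS];

/* Fetch a stat, caching it on first access so that later reads within
 * the same transaction return the same value even if the shared copy
 * has moved on.  Entries never read cost no memory or fetch time. */
static long
read_stat(int id)
{
    if (!cached[id])
    {
        cached_stats[id] = shared_stats[id];
        cached[id] = true;
    }
    return cached_stats[id];
}

/* At transaction end, discard the cache so the next transaction sees
 * fresh values. */
static void
end_xact(void)
{
    for (int i = 0; i < NSTATS; i++)
        cached[i] = false;
}
```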

So, just as a data point, the output of pg_locks is not stable within
a transaction. In fact, the pg_locks output could technically include
the same exact lock more than once, if it's being moved from the
fast-path table to the main table just as you are reading all the
data. In theory, that could produce the same kinds of problems that
you're concerned about here, and I suspect sometimes it does. But I
haven't seen a lot of people yelling and screaming about it. The
situation isn't ideal, but it's not disastrous either.

I think it's really hard for us as developers to predict what kinds of
effects of these kinds of decisions will have in real-world
deployments. All of us have probably had the experience of making some
behavioral change that we thought would not be too big a deal and it
actually pissed off a bunch of users who were relying on the old
behavior. I know I have. Conversely, I've reluctantly made changes
that seemed rather dangerous to me and heard nary a peep. If somebody
takes the position that changing this behavior is scary because we
don't know how many users will be inconvenienced or how badly, I can
only agree. But saying that it's tantamount to deciding that nobody
needs MVCC is completely over the top. This is statistical data, not
user data, and there are good reasons to think that people don't have
the same expectations in both cases, starting with the fact that we
have some stuff that works like that already.

More than that, there's a huge problem with the way this works today
that can't be fixed without making some compromises. In the test case
Andres mentioned upthread, the stats snapshot burned through 170MB of
RAM. Now, you might dismiss that as not much memory in 2021, but if
you have a lot of backends accessing the stats, that value could be
multiplied by a two digit or even three digit number, and that is
*definitely* a lot of memory, even in 2021. But even if it's not
multiplied by anything, we're shipping with a default work_mem of just
4MB. So, the position we're implicitly taking today is: if you join
two 5MB tables, it's too risky to put the entire contents of one of
them into a single in-memory hash table, because we might run the
machine out of RAM. But if you have millions of objects in your
database and touch the statistics for one of those objects, once, it's
absolutely OK to slurp tens or even hundreds of megabytes of data into
backend-private memory to avoid the possibility that you might later
access another one of those counters and expect snapshot semantics.

To be honest, I don't find either of those positions very believable.
I do not think it likely that the typical user really wants a 5MB hash
join to be done in batches to save memory, and I think it equally
unlikely that everybody wants to read and cache tens or hundreds of
megabytes of data to get MVCC semantics for volatile statistics. I
think there are probably some cases where having that information be
stable across a transaction lifetime is really useful, so if we can
provide that as an optional behavior, I think that would be a pretty
good idea. But I don't think it's reasonable for that to be the only
behavior, and I'm doubtful about whether it should even be the
default. I bet there are a lot of cases where somebody just wants to
take a quick glance at some of the values as they exist right now, and
has no intention of running any more queries that might examine the
same data again later. Not only does caching all the data use a lot of
memory, but having to read all the data in order to cache it is
potentially a lot slower than just reading the data actually
requested. I'm unwilling to dismiss that as a negligible problem.

In short, I agree that there's stuff to worry about here, but I don't
agree that a zero-consistency model is a crazy idea, even though I
also think it would be nice if

Re: some pointless HeapTupleHeaderIndicatesMovedPartitions calls

2021-03-25 Thread Álvaro Herrera
On 2021-Feb-24, Michael Paquier wrote:

> On Mon, Feb 22, 2021 at 05:15:57PM -0300, Álvaro Herrera wrote:
> > I changed my mind on this after noticing that
> > ItemPointerIndicatesMovedPartitions has a few callers; leaving the
> > interface incomplete/asymmetric would be worse.  So I propose to do
> > this.
> 
> Doing that looks fine to me as well.

Thanks! Pushed.

-- 
Álvaro Herrera   Valdivia, Chile
"The least effective workers are systematically moved to the place where
they can do the least possible damage: management."  (The Dilbert Principle)




Re: Proposal: Save user's original authenticated identity for logging

2021-03-25 Thread Jacob Champion
On Thu, 2021-03-25 at 14:41 +0900, Michael Paquier wrote:
> I got to look at the DN patch yesterday, so now's the turn of this
> one.  Nice work.

Thanks!

> +   port->authn_id = MemoryContextStrdup(TopMemoryContext, id);
> It may not be obvious that all the field is copied to TopMemoryContext
> because the Port requires that.

I've expanded the comment. (v11 attached, with incremental changes over
v10 in since-v10.diff.txt.)

> +$node->stop('fast');
> +my $log_contents = slurp_file($log);
> Like 11e1577, let's just truncate the log files in all those tests.

Hmm... having the full log file contents for the SSL tests has been
incredibly helpful for me with the NSS work. I'd hate to lose them; it
can be very hard to recreate the test conditions exactly.

> +   if (auth_method < 0 || USER_AUTH_LAST < auth_method)
> +   {
> +   Assert((0 <= auth_method) && (auth_method <= USER_AUTH_LAST));
> What's the point of having the check and the assertion?  NULL does not
> really seem like a good default here as this should never really
> happen.  Wouldn't a FATAL be actually safer?

I think FATAL makes more sense. Changed, thanks.

> It looks wrong to me to include in the SSL tests some checks related
> to SCRAM authentication.  This should remain in 001_password.pl, as of
> src/test/authentication/.

Agreed. Moved the bad-password SCRAM tests over, and removed the
duplicates. The last SCRAM test in that file, which tests the
interaction between client certificates and SCRAM auth, remains.

> port->gss->princ = MemoryContextStrdup(TopMemoryContext, port->gbuf.value);
> +   set_authn_id(port, gbuf.value);
> I don't think that this position is right for GSSAPI.  Shouldn't this
> be placed at the end of pg_GSS_checkauth() and only if the status is
> OK?

No, and the tests will catch you if you try. Authentication happens
before authorization (user mapping), and can succeed independently even
if authz doesn't. See below.

> For SSPI, I think that this should be moved down once we are sure that
> there is no error and that pg_SSPI_recvauth() reports STATUS_OK to the
> caller.  There is a similar issue with CheckCertAuth(), and
> set_authn_id() is documented so as it should be called only when we
> are sure that authentication succeeded.

Authentication *has* succeeded already; that's what the SSPI machinery
has done above. Likewise for CheckCertAuth, which relies on the TLS
subsystem to validate the client signature before setting the peer_cn.
The user mapping is an authorization concern: it answers the question,
"is an authenticated user allowed to use a particular Postgres user
name?"

Postgres currently conflates authn and authz in many places, and in my
experience, that'll make it difficult to maintain new authorization
features like the ones in the wishlist upthread. This patch is only one
step towards a clearer distinction.

> I am actually in
> favor of keeping this information in the Port with those pluggability
> reasons.

That was my intent, yeah. Getting this into the stats framework was
more than I could bite off for this first patchset, but having it
stored in a central location will hopefully help people do more with
it.

--Jacob
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index bfcffbdb3b..0647d7cc32 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -348,6 +348,10 @@ auth_failed(Port *port, int status, char *logdetail)
  * Auth methods should call this exactly once, as soon as the user is
  * successfully authenticated, even if they have reason to know that
  * authorization will fail later.
+ *
+ * The provided string will be copied into the TopMemoryContext, to match the
+ * lifetime of the Port, so it is safe to pass a string that is managed by an
+ * external library.
  */
 static void
 set_authn_id(Port *port, const char *id)
diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index aa15877ef7..309bbc06d0 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -3128,8 +3128,8 @@ hba_authname(hbaPort *port)
 
if (auth_method < 0 || USER_AUTH_LAST < auth_method)
{
-   Assert((0 <= auth_method) && (auth_method <= USER_AUTH_LAST));
-   return NULL;
+   /* Should never happen. */
+   elog(FATAL, "port has out-of-bounds UserAuth: %d", auth_method);
}
 
return UserAuthName[auth_method];
diff --git a/src/test/authentication/t/001_password.pl b/src/test/authentication/t/001_password.pl
index 3ac137aebd..adeb3bce33 100644
--- a/src/test/authentication/t/001_password.pl
+++ b/src/test/authentication/t/001_password.pl
@@ -17,7 +17,7 @@ if (!$use_unix_sockets)
 }
 else
 {
-   plan tests => 19;
+   plan tests => 21;
 }
 
 
@@ -126,6 +126,23 @@ unlike(
 $log = $node->rotate_logfile();
 $node->start;
 
+# Test that bad passwords are rejected.
+$ENV{"PGPASSWORD"} = 'badpass';
+test_role($node, 'scram_role', 'scram-sha-256', 2);
+$ENV{"PGPASSWORD"} = 'pass

Re: making update/delete of inheritance trees scale better

2021-03-25 Thread Tom Lane
Robert Haas  writes:
> - Until I went back and found your earlier remarks on this thread, I
> was confused as to why you were replacing a JunkFilter with a
> ProjectionInfo. I think it would be good to try to add some more
> explicit comments about that design choice, perhaps as header comments
> for ExecGetUpdateNewTuple, or maybe there's a better place. I'm still
> not sure why we need to do the same thing for the insert case, which
> seems to just be about removing junk columns.

I wondered about that too; this patch allegedly isn't touching anything
interesting about INSERT cases, so why should we modify that?  However,
when I tried to poke at that, I discovered that it seems to be dead code
anyway.  A look at coverage.postgresql.org will show you that no
regression test reaches "junk_filter_needed = true" in
ExecInitModifyTable's inspection of INSERT tlists, and I've been unable to
create such a case manually either.  I think the reason is that the parser
initially builds all INSERT ... SELECT cases with the SELECT as an
explicit subquery, giving rise to a SubqueryScan node just below the
ModifyTable, which will project exactly the desired columns and no more.
We'll optimize away the SubqueryScan if it's a no-op projection, but not
if it is getting rid of junk columns.  There is room for more optimization
here: dropping the SubqueryScan in favor of making ModifyTable do the same
projection would win by removing one plan node's worth of overhead.  But
I don't think we need to panic about whether the projection is done with
ExecProject or a junk filter --- we'd be strictly ahead of the current
situation either way.

Given that background, I agree with Amit's choice to change this,
just to reduce the difference between how INSERT and UPDATE cases work.
For now, there's no performance difference anyway, since neither the
ProjectionInfo nor the JunkFilter code can be reached.  (Maybe a comment
about that would be useful.)

BTW, in the version of the patch that I'm working on (not ready to
post yet), I've thrown away everything that Amit did in setrefs.c
and tlist.c, so I don't recommend he spend time improving the comments
there ;-).  I did not much like the idea of building a full TargetList
for each partition; that's per-partition cycles and storage space that
we won't be able to reclaim with the 0002 patch.  And we don't really
need it, because there are only going to be three cases at runtime:
pull a column from the subplan result tuple, pull a column from the
old tuple, or insert a NULL for a dropped column.  So I've replaced
the per-target-table tlists with integer lists of the attnums of the
UPDATE's target columns in that table.  These are compact and they
don't require any further processing in setrefs.c, and the executor
can easily build a projection expression from that data.
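Those three runtime cases could be sketched as follows — a hypothetical stand-alone function, not the executor code itself: plain ints stand in for Datums, a sentinel stands in for NULL, and the target attnum list is assumed sorted ascending.

```c
#include <assert.h>
#include <stdbool.h>

#define COL_NULL (-1)           /* stand-in for a dropped column's NULL */

/* Build the new tuple for an UPDATE: for each of the target table's
 * attributes (attnums 1..natts), either take the next value from the
 * subplan result (if the attnum is in the UPDATE's target-column list),
 * copy the value from the old tuple, or emit NULL for a dropped column. */
static void
build_update_tuple(const int *oldtup, int natts,
                   const int *target_attnums, const int *newvals,
                   int ntargets, const bool *dropped, int *out)
{
    int j = 0;                  /* position in the subplan's new-value list */

    for (int att = 1; att <= natts; att++)
    {
        if (dropped[att - 1])
            out[att - 1] = COL_NULL;
        else if (j < ntargets && target_attnums[j] == att)
            out[att - 1] = newvals[j++];     /* from the subplan result */
        else
            out[att - 1] = oldtup[att - 1];  /* from the old tuple */
    }
}
```

The point of the compact attnum-list representation is visible here: the merge is a single pass with no per-partition target list needed.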

regards, tom lane




document lock obtained for FKs during detach

2021-03-25 Thread Alvaro Herrera
I failed to document the lock acquired on tables that reference the
partitioned table that we're detaching a partition from.  This patch
(proposed for backpatch to 12, where that feature was added) fixes that
problem.

Noticed the omission a week ago while working out some details of
concurrent detach.

-- 
Álvaro Herrera39°49'30"S 73°17'W
"It takes less than 2 seconds to get to 78% complete; that's a good sign.
A few seconds later it's at 90%, but it seems to have stuck there.  Did
somebody make percentages logarithmic while I wasn't looking?"
http://smylers.hates-software.com/2005/09/08/1995c749.html
>From 98c0fe1d22333ea4ec27256f9c4dda98b1166b65 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 25 Mar 2021 14:53:21 -0300
Subject: [PATCH] Document lock obtained during partition detach

We acquire a SHARE lock on all tables that reference the partitioned table
that we're detaching a partition from, but failed to document this fact.
My oversight in commit f56f8f8da6af.  Repair.  Backpatch to 12.
---
 doc/src/sgml/ref/alter_table.sgml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 0bd0c1a503..cd56d6a17e 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -962,6 +962,8 @@ WITH ( MODULUS numeric_literal, REM
   ties to the table from which it was detached.  Any indexes that were
   attached to the target table's indexes are detached.  Any triggers that
   were created as clones of those in the target table are removed.
+  SHARE lock is obtained on any tables that reference
+  this partitioned table in foreign key constraints.
  
 

-- 
2.20.1



Re: shared memory stats: high level design decisions: consistency, dropping

2021-03-25 Thread Andres Freund
Hi,

On 2021-03-21 14:53:50 -0700, Andres Freund wrote:
> On 2021-03-21 11:41:30 -0400, Stephen Frost wrote:
> > > 2) How to remove stats of dropped objects?
> > >
> > > [...]
> >
> > The current approach sounds pretty terrible and propagating that forward
> > doesn't seem great.  Guess here I'd disagree with your gut feeling and
> > encourage trying to go 'full in', as you put it, or at least put enough
> > effort into it to get a feeling of if it's going to require a *lot* more
> > work or not and then reconsider if necessary.
> 
> I think my gut's argument is that it's already a huge patch, and that
it's better to have the very substantial memory and disk IO savings
> with the crappy vacuum approach, than neither. And given the timeframe
> there does seem to be a substantial danger of "neither" being the
> outcome...  Anyway, I'm mentally sketching out what it'd take.

I implemented this. Far from polished, but it does survive the
regression tests, including new tests in stats.sql.  Function stats,
2PC and lots of naming issues (xl_xact_dropped_stats with members
ndropped and 'dropped_stats') are yet to be addressed - but not
architecturally relevant.

I think there are three hairy corner-cases that I haven't thought
sufficiently about:

- It seems likely that there's a relatively narrow window where a crash
  could end up not dropping stats.

  The xact.c integration is basically parallel to smgrGetPendingDeletes
  etc.  I don't see what prevents a checkpoint from happening after
  RecordTransactionCommit() (which does smgrGetPendingDeletes), but
  before smgrDoPendingDeletes(). Which means that if we crash in that
  window, nothing would clean up those files (and thus also not the
  dropped stats that I implemented very similarly).

  Closing that window seems doable (put the pending file/stat deletion
  list in shared memory after logging the WAL record, but before
  MyProc->delayChkpt = false, do something with that in
  CheckPointGuts()), but not trivial.

  If we had a good way to run pgstat_vacuum_stat() on a very
  *occasional* basis via autovac, that'd be ok. But it'd be a lot nicer
  if it were bulletproof.

- Does temporary table cleanup after a crash properly deal with their
  stats?

- I suspect there are a few cases where pending stats in one connection
  could "revive" an already dropped stat. It'd not be hard to address in
  an inefficient way in the shmstats patch, but it'd come with some
  efficiency penalty - though it might be an irrelevant efficiency difference.


Definitely for later, but if we got this ironed out, we probably could
stop throwing stats away after crashes. Instead storing stats snapshots
alongside redo LSNs or such. It'd be nice if immediate shutdowns etc
wouldn't lead to autovacuum not doing any vacuuming for quite a while.

Greetings,

Andres Freund




Re: shared memory stats: high level design decisions: consistency, dropping

2021-03-25 Thread Andres Freund
Hi,

On 2021-03-24 14:42:11 +0100, Magnus Hagander wrote:
> On Sun, Mar 21, 2021 at 11:34 PM Andres Freund  wrote:
> > > If I understand what you are proposing, all stats views would become
> > > completely volatile, without even within-query consistency.  That really
> > > is not gonna work.  As an example, you could get not-even-self-consistent
> > > results from a join to a stats view if the planner decides to implement
> > > it as a nestloop with the view on the inside.
> >
> > I don't really think it's a problem that's worth incurring that much cost to
> > prevent. We already have that behaviour for a number of the pg_stat_* 
> > views,
> > e.g. pg_stat_xact_all_tables, pg_stat_replication.
> 
> Aren't those both pretty bad examples though?
> 
> pg_stat_xact_all_tables surely is within-query consistent, and would
> be pretty useless if it was within-transaction consistent?

It's not within-query consistent:

postgres[1182102][1]=# SELECT pg_stat_get_xact_numscans('pg_class'::regclass) 
UNION ALL SELECT count(*) FROM pg_class UNION ALL SELECT 
pg_stat_get_xact_numscans('pg_class'::regclass);
┌───────────────────────────┐
│ pg_stat_get_xact_numscans │
├───────────────────────────┤
│                         0 │
│                       397 │
│                         1 │
└───────────────────────────┘
(3 rows)


> pg_stat_replication is a snapshot of what things are right now (like
> pg_stat_activity), and not collected statistics.

However, pg_stat_activity does have snapshot semantics...


> Maybe there's inconsistency in that they should've had a different
> name to separate it out, but fundamentally having xact consistent
> views there would be a bad thing, no?

True. One weird thing around the _xact_ versions that we, at best,
*hint* at in the docs, but also contradict, is that _xact_ views are
actually not tied to the transaction.  It's really about unsubmitted
stats. E.g. if executing the following via copy-paste

SELECT count(*) FROM pg_class;
SELECT pg_stat_get_xact_numscans('pg_class'::regclass);
SELECT count(*) FROM pg_class;
SELECT pg_stat_get_xact_numscans('pg_class'::regclass);

will most of the time return

0

1

because after the transaction for the first count(*) commits, we'll not
have submitted stats for more than PGSTAT_STAT_INTERVAL. But after the
second count(*) it'll be shorter, therefore the stats won't be
submitted...


> > There's just a huge difference between being able to access a table's stats 
> > in
> > O(1) time, or having a single stats access be O(database-objects).
> >
> > And that includes accesses to things like pg_stat_bgwriter, pg_stat_database
> > (for IO over time stats etc) that often are monitored at a somewhat high
> > frequency - they also pay the price of reading in all object stats.  On my
> > example database with 1M tables it takes 0.4s to read pg_stat_database.
> 
> IMV, singling things out into "larger groups" would be one perfectly
> acceptable compromise. That is, say that pg_stat_user_tables can be
> inconsistent vs with pg_stat_bgwriter, but it cannot be inconsistent
> with itself.

I don't think that buys us all that much though. It still is a huge
issue that we need to cache the stats for all relations even though we
only access the stats for one table.


> > We currently also fetch the full stats in places like autovacuum.c. Where we
> > don't need repeated access to be consistent - we even explicitly force the
> > stats to be re-read for every single table that's getting vacuumed.
> >
> > Even if we to just cache already accessed stats, places like do_autovacuum()
> > would end up with a completely unnecessary cache of all tables, blowing up
> > memory usage by a large amount on systems with lots of relations.
> 
> autovacuum is already dealing with things being pretty fuzzy though,
> so it shouldn't matter much there?
> 
> But autovacuum might also deserve its own interface to access the
> data directly and doesn't have to follow the same one as the stats
> views in this new scheme, perhaps?

Yes, we can do that now.


> > > I also believe that the snapshotting behavior has advantages in terms
> > > of being able to perform multiple successive queries and get consistent
> > > results from them.  Only the most trivial sorts of analysis don't need
> > > that.
> >
> > In most cases you'd not do that in a transaction tho, and you'd need to 
> > create
> > temporary tables with a snapshot of the stats anyway.
> 
> I'd say in most cases this analysis happens in snapshots anyway, and
> those are snapshots unrelated to what we do in pg_stat. It's either
> snapshotted to tables, or to storage in a completely separate
> database.

Agreed. I wonder if some of that work would be made easier if we added a
function to export all the data in the current snapshot as a json
document or such? If we add configurable caching (see below) that'd
really not be a lot of additional work.
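For illustration, a manual version of such an export is already possible with the existing JSON aggregates; this is only a rough sketch of the idea (the output key names are invented, not a proposed API):

```sql
-- Rough, manual sketch of exporting a couple of stats views as one json
-- document; a built-in function would presumably cover the whole snapshot.
SELECT jsonb_build_object(
           'snapshot_time', now(),
           'pg_stat_user_tables', (SELECT jsonb_agg(to_jsonb(s))
                                   FROM pg_stat_user_tables AS s),
           'pg_stat_database',    (SELECT jsonb_agg(to_jsonb(d))
                                   FROM pg_stat_database AS d)
       ) AS stats_export;
```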


> I would *like* to have query-level consistent views. But I may be able
> to compro

Re: [PATCH] pg_permissions

2021-03-25 Thread Tom Lane
"Joel Jacobson"  writes:
> On Thu, Mar 25, 2021, at 16:16, Alvaro Herrera wrote:
>> Ah, of course -- the only way to obtain the acl columns is by going
>> through the catalogs individually, so it won't be possible.  I think
>> this could be fixed with some very simple, quick function pg_get_acl()
>> that takes a catalog OID and object OID and returns the ACL; then
>> use aclexplode() to obtain all those details.

> +1 for adding pg_get_acl().

I wonder what performance will be like with lots o' objects.

regards, tom lane




Re: [bug fix] ALTER TABLE SET LOGGED/UNLOGGED on a partitioned table does nothing silently

2021-03-25 Thread David Steele

On 1/27/21 1:00 AM, Kyotaro Horiguchi wrote:

At Wed, 27 Jan 2021 05:30:29 +, "tsunakawa.ta...@fujitsu.com" 
 wrote in

From: Kyotaro Horiguchi 

"CREATE TABLE" is not "CREATE LOGGED TABLE". We can assume that as
"CREATE  TABLE", where the default logged-ness
varies according to context. Or it might have been so since the beginning.
Currently we don't have the syntax "CREATE LOGGED TABLE", but we can add
that syntax.


Yes, I thought of that possibility after a while too.  But I felt a bit(?) 
hesitant to do it considering back-patching.  Also, the idea requires ALTER 
TABLE ATTACH PARTITION will have to modify the logged-ness property of the 
target partition and its subpartitions with that of the parent partitioned 
table.  However, your idea is one candidate worth pursuing, including whether 
or not to back-patch what.


I think this and all possible similar changes won't be back-patched at
all.  It is a disaster for any established version to change this kind
of behavior.



  We pursue releasing all fixes at once, but we might release all fixes other
than some items that cannot be fixed for some technical reasons at the time,
like REPLICA IDENTITY.

I'm not sure how long we will wait for the time of release, though.


Anyway, we have to define the ideal behavior for each ALTER action based on 
some comprehensible principle.  Yeah... this may become a long, tiring journey. 
 (I anticipate some difficulty and compromise in reaching agreement, as was 
seen in the past discussion for the fix for ALTER TABLE REPLICA IDENTITY.  
Scary)


It seems to me that we have agreed that the ideal behavior is for ALTER
TABLE on a parent to propagate to its descendants. An observed objection
is that this behavior is dangerous for operations that require a large
amount of resources, which could lead to failure after a long time has
elapsed. The revealed difficulty of REPLICA IDENTITY is regarded as a
technical issue that should be overcome.


This thread seems to be stalled. Since it represents a bug fix, is there 
something we can do in the meantime while a real solution is worked out?


Perhaps just inform the user that nothing was done? Or get something 
into the documentation?


Regards,
--
-David
da...@pgmasters.net




Re: [PATCH] pg_permissions

2021-03-25 Thread Joel Jacobson
On Thu, Mar 25, 2021, at 16:16, Alvaro Herrera wrote:
> On 2021-Mar-25, Joel Jacobson wrote:
> 
> > pg_shdepend doesn't contain the aclitem info though,
> > so it won't work for pg_permissions if we want to expose
> > privilege_type, is_grantable and grantor.
> 
> Ah, of course -- the only way to obtain the acl columns is by going
> through the catalogs individually, so it won't be possible.  I think
> this could be fixed with some very simple, quick function pg_get_acl()
> that takes a catalog OID and object OID and returns the ACL; then
> use aclexplode() to obtain all those details.

+1 for adding pg_get_acl().
Do you want to write a patch for that?
I could try implementing it otherwise, but it would be good to have buy-in
from some more hackers first on whether we want these system views at all.

Maybe we can try to decide on that first,
i.e. if we want them and what they should return?

In the meantime, if people want to try out the views,
I've modified the patch to use pg_shdepend for pg_ownerships,
while pg_permissions is still UNION ALL.

Both views now also use pg_identify_object().

Example usage:

CREATE ROLE test_user;
CREATE ROLE test_group;
CREATE ROLE test_owner;
CREATE SCHEMA test AUTHORIZATION test_owner;
GRANT ALL ON SCHEMA test TO test_group;
GRANT test_group TO test_user;

SELECT * FROM pg_permissions WHERE grantor = 'test_owner'::regrole;
   classid    | objid | objsubid |  type  | schema | name | identity |  grantor   |  grantee   | privilege_type | is_grantable
--------------+-------+----------+--------+--------+------+----------+------------+------------+----------------+--------------
 pg_namespace | 37128 |        0 | schema |        | test | test     | test_owner | test_owner | USAGE          | f
 pg_namespace | 37128 |        0 | schema |        | test | test     | test_owner | test_owner | CREATE         | f
 pg_namespace | 37128 |        0 | schema |        | test | test     | test_owner | test_group | USAGE          | f
 pg_namespace | 37128 |        0 | schema |        | test | test     | test_owner | test_group | CREATE         | f
(4 rows)

SET ROLE TO test_user;
CREATE TABLE test.a ();
RESET ROLE;
ALTER TABLE test.a OWNER TO test_owner;

SELECT * FROM pg_ownerships WHERE owner = 'test_owner'::regrole;
 classid | objid | objsubid |  type  | schema | name | identity |   owner
---------+-------+----------+--------+--------+------+----------+------------
    1259 | 37129 |        0 | table  | test   | a    | test.a   | test_owner
    2615 | 37128 |        0 | schema |        | test | test     | test_owner
(2 rows)

GRANT INSERT ON test.a TO test_group;

SELECT * FROM pg_permissions WHERE grantee = 'test_group'::regrole;
   classid    | objid | objsubid |  type  | schema | name | identity |  grantor   |  grantee   | privilege_type | is_grantable
--------------+-------+----------+--------+--------+------+----------+------------+------------+----------------+--------------
 pg_class     | 37129 |        0 | table  | test   | a    | test.a   | test_owner | test_group | INSERT         | f
 pg_namespace | 37128 |        0 | schema |        | test | test     | test_owner | test_group | USAGE          | f
 pg_namespace | 37128 |        0 | schema |        | test | test     | test_owner | test_group | CREATE         | f
(3 rows)

Looks good or can we improve them further?

> 
> > The semantics will not be entirely the same,
> > since internal objects are not tracked in pg_shdepend,
> > but I think this is an improvement.
> 
> I just realized that pg_shdepend will not show anything for pinned users
> (the bootstrap superuser).  I *think* this is not a problem.

I also think it's not a problem.

Doing a "SELECT * FROM pg_ownerships" would be very noisy
if such objects were included, as all pre-installed catalog objects would
show up; by excluding them, the user will only see relevant ownerships
explicitly owned by "real" roles.

We would get the same improvement for pg_permissions if pg_shdepend were
used there as well. Right now it's very noisy, as all permissions for the
bootstrap superuser are also included.

/Joel

0005-pg_permissions-and-pg_ownerships.patch
Description: Binary data


Re: [HACKERS] Custom compression methods

2021-03-25 Thread Robert Haas
On Wed, Mar 24, 2021 at 2:15 PM Tom Lane  wrote:
> But let's ignore the case of pg_upgrade and just consider a dump/restore.
> I'd still say that unless you give --no-toast-compression then I would
> expect the dump/restore to preserve the tables' old compression behavior.
> Robert's argument that the pre-v14 database had no particular compression
> behavior seems nonsensical to me.  We know exactly which compression
> behavior it has.

I said that it didn't have a state, not that it didn't have a
behavior. That's not exactly the same thing. But I don't want to argue
about it, either. It's a judgement call what's best here, and I don't
pretend to have all the answers. If you're sure you've got it right
... great!

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Error on failed COMMIT

2021-03-25 Thread David Steele

On 1/26/21 1:02 PM, Dave Cramer wrote:
On Tue, 26 Jan 2021 at 12:46, Laurenz Albe wrote:


I see your point from the view of the JDBC driver.

It just feels hacky - somewhat similar to what you say
above: don't go through the normal transaction rollback steps,
but issue an error message.

At least we should fake it well...

OK, let me look into how we deal with COMMIT and CHAIN.

I can see some real issues with this as Vik pointed out.


Tests are failing on the cfbot for this patch and it looks like a new 
patch is needed from Dave, at the least, so marking Waiting on Author.


Should we be considering this patch Returned with Feedback instead?

Regards,
--
-David
da...@pgmasters.net




Re: DETAIL for wrong scram password

2021-03-25 Thread Jacob Champion
On Thu, 2021-03-25 at 16:41 +0900, Michael Paquier wrote:
> On top of what's
> proposed, would it make sense to have a second logdetail for the case
> of a mock authentication?  We don't log that yet, so I guess that it
> could be useful for audit purposes?
It looks like the code paths that lead to a doomed authentication
already provide their own, more specific, logdetail (role doesn't
exist, role has no password, role doesn't have a SCRAM secret, etc.).

--Jacob


Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-25 Thread Alvaro Herrera
On 2021-Mar-23, Alvaro Herrera wrote:

> I think a possible solution to this problem is that the "detach" flag in
> pg_inherits is not a boolean anymore, but an Xid (or maybe two Xids).
> Not sure exactly which Xid(s) yet, and I'm not sure what are the exact
> rules, but the Xid becomes a marker that indicates an horizon past which
> the partition is no longer visible.  Then, REPEATABLE READ can see the
> partition, but only if its snapshot is older than the Xid.

So a solution to this problem seems similar (but not quite the same) as
pg_index.indcheckxmin: the partition is included in the partition
directory, or not, depending on the pg_inherits tuple visibility for the
active snapshot.  This solves the problem because the RI query uses a
fresh snapshot, for which the partition has already been detached, while
the normal REPEATABLE READ query is using the old snapshot for which the
'detach-pending' row is still seen as in progress.  With this, the weird
hack in a couple of places that needed to check the isolation level is
gone, which makes me a bit more comfortable.

So attached is v9 with this problem solved.

I'll add one more torture test, and if it works correctly I'll push it:
have a cursor in the repeatable read transaction, which can read the
referenced partition table and see the row in the detached partition,
but the RI query must not see that row.  Bonus: the RI query is run from
another cursor that is doing UPDATE WHERE CURRENT OF that cursor.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
"I agree with you that absolute truth does not exist...
The problem is that lies do exist, and you are lying" (G. Lama)
From 61ff1c133bb4fa7e81f119e806af9760b2faf0e4 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 13 Jul 2020 20:15:30 -0400
Subject: [PATCH v9 1/2] Let ALTER TABLE exec routines deal with the relation

This means that ATExecCmd relies on AlteredRelationInfo->rel instead of
keeping the relation as a local variable; this is useful if the
subcommand needs to modify the relation internally.  For example, a
subcommand that internally commits its transaction needs this.
---
 src/backend/commands/tablecmds.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3349bcfaa7..bf7fd6e8ae 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -157,6 +157,8 @@ typedef struct AlteredTableInfo
 	Oid			relid;			/* Relation to work on */
 	char		relkind;		/* Its relkind */
 	TupleDesc	oldDesc;		/* Pre-modification tuple descriptor */
+	/* Transiently set during Phase 2, normally set to NULL */
+	Relation	rel;
 	/* Information saved by Phase 1 for Phase 2: */
 	List	   *subcmds[AT_NUM_PASSES]; /* Lists of AlterTableCmd */
 	/* Information saved by Phases 1/2 for Phase 3: */
@@ -354,7 +356,7 @@ static void ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 	  AlterTableUtilityContext *context);
 static void ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 			  AlterTableUtilityContext *context);
-static void ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
+static void ATExecCmd(List **wqueue, AlteredTableInfo *tab,
 	  AlterTableCmd *cmd, LOCKMODE lockmode, int cur_pass,
 	  AlterTableUtilityContext *context);
 static AlterTableCmd *ATParseTransformCmd(List **wqueue, AlteredTableInfo *tab,
@@ -4569,7 +4571,6 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 		{
 			AlteredTableInfo *tab = (AlteredTableInfo *) lfirst(ltab);
 			List	   *subcmds = tab->subcmds[pass];
-			Relation	rel;
 			ListCell   *lcmd;
 
 			if (subcmds == NIL)
@@ -4578,10 +4579,10 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 			/*
 			 * Appropriate lock was obtained by phase 1, needn't get it again
 			 */
-			rel = relation_open(tab->relid, NoLock);
+			tab->rel = relation_open(tab->relid, NoLock);
 
 			foreach(lcmd, subcmds)
-ATExecCmd(wqueue, tab, rel,
+ATExecCmd(wqueue, tab,
 		  castNode(AlterTableCmd, lfirst(lcmd)),
 		  lockmode, pass, context);
 
@@ -4593,7 +4594,11 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 			if (pass == AT_PASS_ALTER_TYPE)
 ATPostAlterTypeCleanup(wqueue, tab, lockmode);
 
-			relation_close(rel, NoLock);
+			if (tab->rel)
+			{
+relation_close(tab->rel, NoLock);
+tab->rel = NULL;
+			}
 		}
 	}
 
@@ -4619,11 +4624,12 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
  * ATExecCmd: dispatch a subcommand to appropriate execution routine
  */
 static void
-ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
+ATExecCmd(List **wqueue, AlteredTableInfo *tab,
 		  AlterTableCmd *cmd, LOCKMODE lockmode, int cur_pass,
 		  AlterTableUtilityContext *context)
 {
 	ObjectAddress address = InvalidObjectAddress;
+	Relation	rel = tab->rel;
 
 	switch (cmd->subtype)
 	{
@@ -5730,6 +5736,7 @@ ATGetQueueEntry(List **wqueue, Relatio

Re: Is it useful to record whether plans are generic or custom?

2021-03-25 Thread torikoshia

On 2021-03-25 22:14, Fujii Masao wrote:

On 2021/03/23 16:32, torikoshia wrote:

On 2021-03-05 17:47, Fujii Masao wrote:

Thanks for your comments!


Thanks for updating the patch!

PostgreSQL Patch Tester reported that the patched version failed to compile
on Windows. Could you fix this issue?
https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.131238



It seems PGDLLIMPORT was necessary.
Attached a new one.

Regards.

diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 16158525ca..887c4b2be8 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -251,6 +251,72 @@ FROM pg_stat_statements ORDER BY query COLLATE "C";
  UPDATE pgss_test SET b = $1 WHERE a > $2  | 1 |3 | t   | t | t
 (7 rows)
 
+--
+-- Track the number of generic plan
+--
+CREATE TABLE pgss_test (i int, j int, k int);
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset 
+--
+ 
+(1 row)
+
+SET plan_cache_mode TO force_generic_plan;
+SET pg_stat_statements.track_utility = TRUE;
+PREPARE pgss_p1 AS SELECT i FROM pgss_test WHERE i = $1;
+EXECUTE pgss_p1(1);
+ i 
+---
+(0 rows)
+
+-- EXPLAIN ANALZE should be recorded
+PREPARE pgss_p2 AS SELECT j FROM pgss_test WHERE j = $1;
+EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) EXECUTE pgss_p2(1);
+  QUERY PLAN   
+---
+ Seq Scan on pgss_test (actual rows=0 loops=1)
+   Filter: (j = $1)
+(2 rows)
+
+-- Nested Portal
+PREPARE pgss_p3 AS SELECT k FROM pgss_test WHERE k = $1;
+BEGIN;
+DECLARE pgss_c1 CURSOR FOR SELECT name FROM pg_prepared_statements;
+FETCH IN pgss_c1;
+  name   
+-
+ pgss_p2
+(1 row)
+
+EXECUTE pgss_p3(1);
+ k 
+---
+(0 rows)
+
+FETCH IN pgss_c1;
+  name   
+-
+ pgss_p1
+(1 row)
+
+COMMIT;
+SELECT calls, generic_calls, query FROM pg_stat_statements;
+ calls | generic_calls |  query   
+---+---+--
+ 1 | 0 | DECLARE pgss_c1 CURSOR FOR SELECT name FROM pg_prepared_statements
+ 0 | 0 | SELECT calls, generic_calls, query FROM pg_stat_statements
+ 1 | 1 | PREPARE pgss_p1 AS SELECT i FROM pgss_test WHERE i = $1
+ 2 | 0 | FETCH IN pgss_c1
+ 1 | 0 | BEGIN
+ 1 | 0 | SELECT pg_stat_statements_reset()
+ 1 | 1 | EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) EXECUTE pgss_p2(1)
+ 1 | 0 | COMMIT
+ 1 | 1 | PREPARE pgss_p3 AS SELECT k FROM pgss_test WHERE k = $1
+(9 rows)
+
+SET pg_stat_statements.track_utility = FALSE;
+DEALLOCATE ALL;
+DROP TABLE pgss_test;
 --
 -- pg_stat_statements.track = none
 --
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
index 0f63f08f7e..7fdef315ae 100644
--- a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
@@ -44,7 +44,8 @@ CREATE FUNCTION pg_stat_statements(IN showtext boolean,
 OUT blk_write_time float8,
 OUT wal_records int8,
 OUT wal_fpi int8,
-OUT wal_bytes numeric
+OUT wal_bytes numeric,
+OUT generic_calls int8
 )
 RETURNS SETOF record
 AS 'MODULE_PATHNAME', 'pg_stat_statements_1_8'
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 62cccbfa44..b14919c989 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -81,6 +81,7 @@
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
+#include "utils/plancache.h"
 #include "utils/timestamp.h"
 
 PG_MODULE_MAGIC;
@@ -192,6 +193,7 @@ typedef struct Counters
 	int64		wal_records;	/* # of WAL records generated */
 	int64		wal_fpi;		/* # of WAL full page images generated */
 	uint64		wal_bytes;		/* total amount of WAL generated in bytes */
+	int64		generic_calls;	/* # of times generic plans executed */
 } Counters;
 
 /*
@@ -1446,6 +1448,10 @@ pgss_store(const char *query, uint64 queryId,
 			if (e->counters.max_time[kind] < total_time)
 e->counters.max_time[kind] = total_time;
 		}
+
+		if (kind == PGSS_EXEC && is_plan_type_generic)
+			e->counters.generic_calls += 1;
+
 		e->counters.rows += rows;
 		e->counters.shared_blks_hit += bufusage->shared_blks_hit;
 		e->counters.shared_blks_read += bufusage->shared_blks_read;
@@ -1510,8 +1516,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
 #define PG_STAT_STATEMENTS_COLS_V1_1	18
 #define PG_STAT_STATEMENTS_COLS_V1_2	19
 #define PG_STAT_STATEMENTS_COLS_V1_3	23

Re: [PATCH] Add extra statistics to explain for Nested Loop

2021-03-25 Thread Justin Pryzby
> diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
> index afc45429ba4..723eccca013 100644
> --- a/src/backend/commands/explain.c
> +++ b/src/backend/commands/explain.c
> @@ -1589,29 +1589,82 @@ ExplainNode(PlanState *planstate, List *ancestors,
>   double  startup_ms = 1000.0 * 
> planstate->instrument->startup / nloops;
>   double  total_ms = 1000.0 * 
> planstate->instrument->total / nloops;
>   double  rows = planstate->instrument->ntuples / nloops;
> + double  total_rows = planstate->instrument->ntuples;
> + double  min_r = planstate->instrument->min_tuples;
> + double  max_r = planstate->instrument->max_tuples;
> + double  min_t_ms = 1000.0 * 
> planstate->instrument->min_t;
> + double  max_t_ms = 1000.0 * 
> planstate->instrument->max_t;
>  
>   if (es->format == EXPLAIN_FORMAT_TEXT)
>   {
> - if (es->timing)
> - appendStringInfo(es->str,
> -  " (actual 
> time=%.3f..%.3f rows=%.0f loops=%.0f)",
> -  startup_ms, 
> total_ms, rows, nloops);
> + if (nloops > 1 && es->verbose)
> + {
> + if (es->timing)
> + {
> + appendStringInfo(es->str,
> +  " 
> (actual time=%.3f..%.3f rows=%.0f loops=%.0f)\n",
> +  
> startup_ms, total_ms, rows, nloops);
> + ExplainIndentText(es);
> + appendStringInfo(es->str,
> +  "Loop: 
> min_time=%.3f max_time=%.3f min_rows=%.0f max_rows=%.0f total_rows=%.0f",
> +  
> min_t_ms, max_t_ms, min_r, max_r, total_rows);

Lines with "colon" format shouldn't use equal signs, and should use two spaces
between fields.  See:
https://www.postgresql.org/message-id/20200619022001.gy17...@telsasoft.com
https://www.postgresql.org/message-id/20200402054120.gc14...@telsasoft.com
https://www.postgresql.org/message-id/20200407042521.GH2228%40telsasoft.com


> + }
> + else
> + {
> + appendStringInfo(es->str,
> +  " 
> (actual rows=%.0f loops=%.0f)\n",
> +  rows, 
> nloops);
> + ExplainIndentText(es);
> + appendStringInfo(es->str,
> +  "Loop: 
> min_rows=%.0f max_rows=%.0f total_rows=%.0f",
> +  min_r, 
> max_r, total_rows);
> + }
> + }
>   else
> - appendStringInfo(es->str,
> -  " (actual 
> rows=%.0f loops=%.0f)",
> -  rows, nloops);
> + {
> + if (es->timing)
> + appendStringInfo(es->str,
> +  " 
> (actual time=%.3f..%.3f rows=%.0f loops=%.0f)",
> +  
> startup_ms, total_ms, rows, nloops);
> + else
> + appendStringInfo(es->str,
> +  " 
> (actual rows=%.0f loops=%.0f)",
> +  rows, 
> nloops);
> + }
>   }
>   else

Since this is now on a separate line, the "if (nloops > 1 && es->verbose)"
can be after the existing "if (es->timing)", and shouldn't need its own
"if (es->timing)".  It should conditionally add a separate line, rather than
duplicating the "(actual.*" line.

> - if (es->timing)
> + if (nloops > 1 && es->verbose)

In non-text mode, I think you should not check "nloops > 1".  Rather, print the
field as 0.

The whole logic is duplicated for parallel workers.  This could instead be a
function, called from both places.  I think this would handle the computation
as well as the output.  This would 

Re: [PATCH] pg_permissions

2021-03-25 Thread Alvaro Herrera
On 2021-Mar-25, Joel Jacobson wrote:

> pg_shdepend doesn't contain the aclitem info though,
> so it won't work for pg_permissions if we want to expose
> privilege_type, is_grantable and grantor.

Ah, of course -- the only way to obtain the acl columns is by going
through the catalogs individually, so it won't be possible.  I think
this could be fixed with some very simple, quick function pg_get_acl()
that takes a catalog OID and object OID and returns the ACL; then
use aclexplode() to obtain all those details.
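For illustration, usage of the proposed function might look like the sketch below. pg_get_acl() is hypothetical (it does not exist yet, and the exact signature is up for discussion); the second query shows the per-catalog equivalent that works today, here for schemas:

```sql
-- Hypothetical, if pg_get_acl(classid oid, objid oid) existed:
--   SELECT a.*
--   FROM aclexplode(pg_get_acl('pg_namespace'::regclass,
--                              'test'::regnamespace::oid)) AS a;

-- Today's per-catalog equivalent for schemas:
SELECT grantor::regrole, grantee::regrole, privilege_type, is_grantable
FROM pg_namespace, aclexplode(nspacl)
WHERE nspname = 'test';
```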

> The semantics will not be entirely the same,
> since internal objects are not tracked in pg_shdepend,
> but I think this is an improvement.

I just realized that pg_shdepend will not show anything for pinned users
(the bootstrap superuser).  I *think* this is not a problem.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
"E pur si muove" (Galileo Galilei)




Re: FETCH FIRST clause PERCENT option

2021-03-25 Thread David Steele

On 1/26/21 11:29 AM, Surafel Temesgen wrote:


On Mon, Jan 25, 2021 at 2:39 PM Kyotaro Horiguchi wrote:


Sorry for the delay. I started to look at this.

Hi Kyotaro,
Thanks for looking into this, but did we agree to proceed
on this approach? I fear that it will be a waste of effort if
Andrew comes up with the patch for his approach.
Andrew Gierth: Are you working on your approach?


[Added Andrew to recipients]

Andrew, would you care to comment?

Regards,
--
-David
da...@pgmasters.net




Re: Get memory contexts of an arbitrary backend process

2021-03-25 Thread torikoshia

On 2021-03-25 22:02, Fujii Masao wrote:

On 2021/03/25 0:17, torikoshia wrote:

On 2021-03-23 17:24, Kyotaro Horiguchi wrote:

Thanks for reviewing and suggestions!


The patched version failed to be compiled as follows. Could you fix 
this issue?


Sorry, it included a header file that's not contained in
the current version patch.

Attached new one.

mcxtfuncs.c:22:10: fatal error: utils/mcxtfuncs.h: No such file or 
directory

 #include "utils/mcxtfuncs.h"
  ^~~
compilation terminated.
make[4]: *** [: mcxtfuncs.o] Error 1
make[4]: *** Waiting for unfinished jobs
make[3]: *** [../../../src/backend/common.mk:39: adt-recursive] Error 2
make[3]: *** Waiting for unfinished jobs
make[2]: *** [common.mk:39: utils-recursive] Error 2
make[1]: *** [Makefile:42: all-backend-recurse] Error 2
make: *** [GNUmakefile:11: all-src-recurse] Error 2

https://cirrus-ci.com/task/4621477321375744

Regards,
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1d3429fbd9..a4017a0760 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24821,6 +24821,37 @@ SELECT collation for ('foo' COLLATE "de_DE");

   
 
+  
+   
+
+ pg_get_backend_memory_contexts
+
+pg_get_backend_memory_contexts (
+  pid integer,
+  max_children integer )
+setof record
+   
+   
+Get memory contexts whose backend process has the specified process ID.
+max_children limits the max number of children
+to print per one parent context. 0 means unlimited.
+When pid equals 0,
+pg_get_backend_memory_contexts displays all
+the memory contexts of the local process regardless of
+max_children. 
+When pid does not equal 0,
+memory contexts will be printed based on the log configuration set.
+See  for more information.
+Only superusers can call this function even when the specified process
+is non-superuser backend.
+Note that when multiple
+pg_get_backend_memory_contexts called in
+succession or simultaneously, max_children can
+be the one of another
+pg_get_backend_memory_contexts.
+   
+  
+
   

 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 0dca65dc7b..48a1a0e958 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -555,10 +555,10 @@ REVOKE ALL ON pg_shmem_allocations FROM PUBLIC;
 REVOKE EXECUTE ON FUNCTION pg_get_shmem_allocations() FROM PUBLIC;
 
 CREATE VIEW pg_backend_memory_contexts AS
-SELECT * FROM pg_get_backend_memory_contexts();
+SELECT * FROM pg_get_backend_memory_contexts(0, 0);
 
 REVOKE ALL ON pg_backend_memory_contexts FROM PUBLIC;
-REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts() FROM PUBLIC;
+REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts FROM PUBLIC;
 
 -- Statistics views
 
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 3e4ec53a97..ed5393324a 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -46,6 +46,7 @@
 #include "storage/sinvaladt.h"
 #include "storage/spin.h"
 #include "utils/snapmgr.h"
+#include "utils/memutils.h"
 
 /* GUCs */
 int			shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -269,6 +270,7 @@ CreateSharedMemoryAndSemaphores(void)
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	McxtLogShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index c6a8d4611e..c61d5079e2 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -30,6 +30,7 @@
 #include "storage/shmem.h"
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
+#include "utils/memutils.h"
 
 /*
  * The SIGUSR1 signal is multiplexed to support signaling multiple event
@@ -440,6 +441,20 @@ HandleProcSignalBarrierInterrupt(void)
 	/* latch will be set by procsignal_sigusr1_handler */
 }
 
+/*
+ * HandleProcSignalLogMemoryContext
+ *
+ * Handle receipt of an interrupt indicating logging memory context.
+ * Signal handler portion of interrupt handling.
+ */
+static void
+HandleProcSignalLogMemoryContext(void)
+{
+	InterruptPending = true;
+	LogMemoryContextPending = true;
+	/* latch will be set by procsignal_sigusr1_handler */
+}
+
 /*
  * Perform global barrier related interrupt checking.
  *
@@ -580,6 +595,27 @@ ProcessProcSignalBarrier(void)
 	ConditionVariableBroadcast(&MyProcSignalSlot->pss_barrierCV);
 }
 
+/*
+ * ProcessLogMemoryContextInterrupt
+ *		The portion of logging memory context interrupt handling that runs
+ *		outside of the signal handler.
+ */
+void
+ProcessLogMemoryContextInterrupt(void)
+{
+	int		max_children;
+
+	LogMemoryContextPending = false;
+
+	max_children = mcxtLogData->maxChildrenPerContext;
+
+	ereport(LOG,
+		(errmsg("Log

Re: Stronger safeguard for archive recovery not to miss data

2021-03-25 Thread David Steele

On 1/25/21 3:55 AM, Laurenz Albe wrote:

On Mon, 2021-01-25 at 08:19 +, osumi.takami...@fujitsu.com wrote:

I think you should post another patch where the second, now superfluous, error
message is removed.


Updated. This patch showed no failure during regression tests
and has been aligned by pgindent.


Looks good to me.
I'll set it to "ready for committer" again.


Fujii, does the new patch in [1] address your concerns?

Regards,
--
-David
da...@pgmasters.net

[1] 
https://www.postgresql.org/message-id/OSBPR01MB48887EFFCA39FA9B1DBAFB0FEDBD0%40OSBPR01MB4888.jpnprd01.prod.outlook.com





Re: [UNVERIFIED SENDER] Re: Challenges preventing us moving to 64 bit transaction id (XID)?

2021-03-25 Thread David Steele

On 1/22/21 6:46 PM, Finnerty, Jim wrote:

First 3 patches derived from the original 64-bit xid patch set by Alexander 
Korotkov


The patches no longer apply 
(http://cfbot.cputube.org/patch_32_2960.log), so marked Waiting on Author.


I also removed the PG14 target since this is a fresh patch set after a 
long hiatus with no new review.


Regards,
--
-David
da...@pgmasters.net




Re: Tying an object's ownership to datdba

2021-03-25 Thread John Naylor
On Thu, Mar 25, 2021 at 12:07 AM Noah Misch  wrote:
>
> > In the refactoring patch, there is a lingering comment reference to
roles_has_privs_of(). Aside from that, it looks good to me. A possible
thing to consider is an assert that is_admin is not null where we expect
that.
>
> Thanks.  The next version will contain the comment fix and
> "Assert(OidIsValid(admin_of) == PointerIsValid(is_admin));".
>
> > The database owner role patch looks good as well.
>
> Do you plan to put the CF entry into Ready for Committer, or should the
> patches wait for another review?

I've marked it Ready for Committer.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Polyphase merge is obsolete

2021-03-25 Thread David Steele

On 3/25/21 9:41 AM, David Steele wrote:

On 1/22/21 5:19 PM, Heikki Linnakangas wrote:

On 22/10/2020 14:48, Heikki Linnakangas wrote:

On 11/09/2017 13:37, Tomas Vondra wrote:

I planned to do some benchmarking on this patch, but apparently the
patch no longer applies. Rebase please?


Here's a rebase of this. Sorry to keep you waiting :-).


I am amused that the wait here was 3+ years!


Here's an updated version that fixes one bug:


We're near the end of the CF and this has not really seen any review 
after a long hiatus. I'll move this to the next CF on April 30 unless I 
hear objections.


Oops, I meant March 30.

Regards,
--
-David
da...@pgmasters.net




Re: Polyphase merge is obsolete

2021-03-25 Thread David Steele

On 1/22/21 5:19 PM, Heikki Linnakangas wrote:

On 22/10/2020 14:48, Heikki Linnakangas wrote:

On 11/09/2017 13:37, Tomas Vondra wrote:

I planned to do some benchmarking on this patch, but apparently the
patch no longer applies. Rebase please?


Here's a rebase of this. Sorry to keep you waiting :-).


I am amused that the wait here was 3+ years!


Here's an updated version that fixes one bug:


We're near the end of the CF and this has not really seen any review 
after a long hiatus. I'll move this to the next CF on April 30 unless I 
hear objections.


Regards,
--
-David
da...@pgmasters.net




Re: PoC/WIP: Extended statistics on expressions

2021-03-25 Thread Dean Rasheed
On Thu, 25 Mar 2021 at 00:05, Tomas Vondra
 wrote:
>
> here's an updated patch. 0001

The change to the way that CreateStatistics() records dependencies
isn't quite right -- recordDependencyOnSingleRelExpr() will not create
any dependencies if the expression uses only a whole-row Var. However,
pull_varattnos() will include whole-row Vars, and so nattnums_exprs
will be non-zero, and CreateStatistics() will not create a whole-table
dependency when it should.

I suppose that could be fixed up by inspecting the bitmapset returned
by pull_varattnos() in more detail, but I think it's probably safer to
revert to the previous code, which matched what index_create() did.

Regards,
Dean




Re: pg_rewind copies

2021-03-25 Thread David Steele

On 1/22/21 8:26 AM, Heikki Linnakangas wrote:

On 16/12/2020 00:08, Cary Huang wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:    not tested

Hello

The patch seems to do as described and the regression and tap tests 
are passing

+    /*
+ * A local source is not expected to change while we're 
rewinding, so check

+ * that we size of the file matches our earlier expectation.
+ */
Is this a typo?


Yep, thanks! Attached is a new patch version, with that fixed and 
rebased. No other changes.


Cary, does this patch look ready to commit? If so, please change the 
state in the CF entry to "Ready for Committer".


Regards,
--
-David
da...@pgmasters.net




Re: Refactor SSL test framework to support multiple TLS libraries

2021-03-25 Thread Andrew Dunstan


On 3/24/21 7:49 PM, Daniel Gustafsson wrote:
>> On 25 Mar 2021, at 00:26, Alvaro Herrera  wrote:
>>
>> On 2021-Mar-25, Daniel Gustafsson wrote:
>>
>>> Attached is a v2 which addresses the comments raised on the main NSS 
>>> thread, as
>>> well as introduces named parameters for the server cert function to make the
>>> test code easier to read.
>> I don't like this patch.  I think your SSL::Server::OpenSSL and
>> SSL::Server::NSS packages should be doing "use parent SSL::Server";
>> having SSL::Server grow a line to "use" its subclass
>> SSL::Server::OpenSSL and import its get_new_openssl_backend() method
>> seems to go against the grain.
> I'm far from skilled at Perl module inheritance but that makes sense, I'll 
> take
> a stab at that after some sleep and coffee.
>

The thing is that SSLServer isn't currently constructed in an OO
fashion. Typically, OO modules in perl don't export anything, and all
access is via the class (for the constructor or static methods) or
instances, as in

    my $instance = MyClass->new();
    $instance->mymethod();

In such a module you should not see lines using Exporter or defining
@Export.

So probably the first step in this process would be to recast SSLServer
as an OO type module, without subclassing it, and then create a subclass
along the lines Alvaro suggests.

If this is all strange to you, I can help a bit.

Incidentally, I'm not sure why we need to break SSLServer into
SSL::Server - are we expecting to create other children of the SSL
namespace?


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Is it useful to record whether plans are generic or custom?

2021-03-25 Thread Fujii Masao




On 2021/03/23 16:32, torikoshia wrote:

On 2021-03-05 17:47, Fujii Masao wrote:

Thanks for your comments!


Thanks for updating the patch!

PostgreSQL Patch Tester reported that the patched version failed to be compiled
at Windows. Could you fix this issue?
https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.131238

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-25 Thread Fujii Masao




On 2021/03/25 11:50, Masahiro Ikeda wrote:



On 2021/03/23 16:10, Fujii Masao wrote:



On 2021/03/22 20:25, ikedamsh wrote:

Agreed. Users can know whether the stats are for the walreceiver or not. The
pg_stat_wal view on a standby server shows stats for the walreceiver, and on a
primary server it shows stats for the other processes. So, I updated the document.
(v20-0003-Makes-the-wal-receiver-report-WAL-statistics.patch)


Thanks for updating the docs!

There was the discussion about when the stats collector is invoked, at [1].
Currently during archive recovery or standby, the stats collector is
invoked when the startup process reaches the consistent state, sends
PMSIGNAL_BEGIN_HOT_STANDBY, and then the system is starting accepting
read-only connections. But walreceiver can be invoked at earlier stage.
This can cause walreceiver to generate and send the statistics about WAL
writing even though the stats collector has not been running yet. This might
be problematic? If so, maybe we need to ensure that the stats collector is
invoked before walreceiver?

During recovery, the stats collector is not invoked if hot standby mode is
disabled. But walreceiver can be running in this case. So probably we should
change walreceiver so that it's invoked even when hot standby is disabled?
Otherwise we cannot collect the statistics about WAL writing by walreceiver
in that case.

[1]
https://postgr.es/m/e5a982a5-8bb4-5a10-cf9a-40dd1921b...@oss.nttdata.com


Thanks for comments! I didn't notice that.
As I mentioned[1], if my understanding is right, this issue seems to affect not
only the wal receiver.

Since the shared memory thread already handles these issues, does this patch,
which to collect the stats for the wal receiver and make a common function for
writing wal files, have to be committed after the patches for share memory
stats are committed? Or to handle them in this thread because we don't know
when the shared memory stats patches will be committed.

I think the former is better because collecting stats in shared memory is a very
useful feature for users and it makes a big change in design. So, I think it's
beneficial to make an effort to move the shared memory stats thread forward
(by reviewing or testing) instead of handling the issues in this thread.


Sounds reasonable. Agreed.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Get memory contexts of an arbitrary backend process

2021-03-25 Thread Fujii Masao




On 2021/03/25 0:17, torikoshia wrote:

On 2021-03-23 17:24, Kyotaro Horiguchi wrote:

Thanks for reviewing and suggestions!


The patched version failed to be compiled as follows. Could you fix this issue?

mcxtfuncs.c:22:10: fatal error: utils/mcxtfuncs.h: No such file or directory
 #include "utils/mcxtfuncs.h"
  ^~~
compilation terminated.
make[4]: *** [: mcxtfuncs.o] Error 1
make[4]: *** Waiting for unfinished jobs
make[3]: *** [../../../src/backend/common.mk:39: adt-recursive] Error 2
make[3]: *** Waiting for unfinished jobs
make[2]: *** [common.mk:39: utils-recursive] Error 2
make[1]: *** [Makefile:42: all-backend-recurse] Error 2
make: *** [GNUmakefile:11: all-src-recurse] Error 2

https://cirrus-ci.com/task/4621477321375744

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Additional Chapter for Tutorial - arch-dev.sgml

2021-03-25 Thread David Steele

On 1/22/21 4:15 AM, Heikki Linnakangas wrote:

On 21/01/2021 14:38, Jürgen Purtz wrote:

This supervisor process is called postmaster and listens at
a specified TCP/IP port for incoming connections. Whenever he
detects a request for a connection, he spawns a new backend process.


It sounds weird to refer to a process with "he". I left out this hunk, 
and the other with similar changes.


Committed the rest, thanks!


So it looks like this was committed. Is there anything left to do?

If not, we should close the CF entry.

Regards,
--
-David
da...@pgmasters.net




Re: [PATCH] More docs on what to do and not do in extension code

2021-03-25 Thread David Steele

On 1/22/21 1:36 AM, Craig Ringer wrote:


Would you mind attaching a revised version of the patch with your edits? 
Otherwise I'll go and merge them in once you've had your say on my 
comments inline below.


Bharath, do the revisions in [1] look OK to you?

Bruce, Robert, can I have an opinion from you on how best to locate and 
structure these docs, or whether you think they're suitable for the main 
docs at all? See patch upthread.


Bruce, Robert, any thoughts here?

Regards,
--
-David
da...@pgmasters.net

[1] 
https://www.postgresql.org/message-id/CAGRY4nyjHh-Tm89A8eS1x%2BJtZ-qHU7wY%2BR0DEEtWfv5TQ3HcGA%40mail.gmail.com





Re: insensitive collations

2021-03-25 Thread Daniel Verite
Jim Finnerty wrote:

> Currently nondeterministic collations are disabled at the database level. 

Deterministic ICU collations are also disabled.

> The cited reason was because of the lack of LIKE support and because certain
> catalog views use LIKE. 

But the catalogs shouldn't use the default collation of the database.

commit 586b98fdf1aaef4a27744f8b988479aad4bd9a01 provides
some details about this:

Author: Tom Lane 
Date:   Wed Dec 19 17:35:12 2018 -0500

Make type "name" collation-aware.

The "name" comparison operators now all support collations, making them
functionally equivalent to "text" comparisons, except for the different
physical representation of the datatype.  They do, in fact, mostly share
the varstr_cmp and varstr_sortsupport infrastructure, which has been
slightly enlarged to handle the case.

To avoid changes in the default behavior of the datatype, set name's
typcollation to C_COLLATION_OID not DEFAULT_COLLATION_OID, so that
by default comparisons to a name value will continue to use strcmp
semantics.  (This would have been the case for system catalog columns
anyway, because of commit 6b0faf723, but doing this makes it true for
user-created name columns as well.  In particular, this avoids
locale-dependent changes in our regression test results.)

In consequence, tweak a couple of places that made assumptions about
collatable base types always having typcollation DEFAULT_COLLATION_OID.
I have not, however, attempted to relax the restriction that user-
defined collatable types must have that.  Hence, "name" doesn't
behave quite like a user-defined type; it acts more like a domain
with COLLATE "C".  (Conceivably, if we ever get rid of the need for
catalog name columns to be fixed-length, "name" could actually become
such a domain over text.  But that'd be a pretty massive undertaking,
and I'm not volunteering.)

Discussion: https://postgr.es/m/15938.1544377...@sss.pgh.pa.us


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: [PATCH 1/1] Initial mach based shared memory support.

2021-03-25 Thread David Steele

On 1/19/21 6:50 AM, James Hilliard wrote:

On Mon, Jan 18, 2021 at 11:12 PM Tom Lane  wrote:


James Hilliard  writes:

from my understanding due to mach semantics a number of the sanity checks
we are doing for sysv shmem are probably unnecessary when using mach
API's due to the mach port task bindings(mach_task_self()) effectively
ensuring our maps are already task bound and won't conflict with other tasks.


Really?  If so, this whole patch is pretty much dead in the water,
because the fact that sysv shmem is globally accessible is exactly
why we use it (well, that and the fact that you can find out how many
processes are attached to it).  It's been a long time since we cared
about sysv shmem as a source of shared storage.  What we really use
it for is as a form of mutual exclusion, i.e. being able to detect
whether any live children remain from a dead postmaster.  That is,
PGSharedMemoryIsInUse is not some minor ancillary check, it's the main
reason this module exists at all.  If POSIX-style mmap'd shmem had the
same capability we'd likely have dropped sysv altogether long ago.

>

Oh, I had figured the mutual exclusion check was just to ensure that
one didn't accidentally overlap an existing shared memory map.
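
For what it's worth, the nattch-based liveness check Tom describes can be
sketched in a few lines. This is a hypothetical, simplified illustration (the
real logic is in PGSharedMemoryIsInUse in src/backend/port/sysv_shmem.c and
checks considerably more); segment_in_use() is a made-up name:

```c
#include <sys/ipc.h>
#include <sys/shm.h>
#include <stdbool.h>

/*
 * A SysV segment left behind by a dead postmaster still reports how many
 * processes are attached to it (shm_nattch), which is what makes SysV
 * shmem usable as a mutual-exclusion mechanism: nattch > 0 means live
 * children of the old postmaster may still exist.
 */
static bool
segment_in_use(int shmid)
{
	struct shmid_ds buf;

	if (shmctl(shmid, IPC_STAT, &buf) < 0)
		return false;			/* segment is gone entirely */
	return buf.shm_nattch > 0;	/* someone is still attached */
}
```

POSIX-style mmap'd shmem offers no equivalent attach count, which is why sysv
shmem is still used for this check.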


At the very least, this patch does not seem to be ready for review, so I 
have marked it Waiting on Author.


Since it appears the approach needs to be reconsidered, my 
recommendation is to mark it Returned with Feedback at the end of the CF 
so I'll do that unless there are objections.


Thanks,
--
-David
da...@pgmasters.net




Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.

2021-03-25 Thread Fujii Masao




On 2021/03/25 9:58, Andres Freund wrote:

Hi,

On 2021-03-24 18:36:14 +0900, Fujii Masao wrote:

Barring any objection, I'm thinking to commit this patch.


To which branches?


I was thinking to push the patch to the master branch
because this is not a bug fix.


I am *strongly* opposed to backpatching this.


+1


This strikes me as a quite a misleading function name.


Yeah, better name is always welcome :)


Outside of very
narrow circumstances a normal exit shouldn't use _exit(1). Neither 1 nor
_exit() (as opposed to exit()) seems appropriate. This seems like a bad
idea.


So you're thinking that the stats collector should do proc_exit(0)
or something even when it receives immediate shutdown request?

One idea to avoid using _exit(1) is to change the SIGQUIT handler
so that it just sets the flag. Then if the stats collector detects that
the flag is set in the main loop, it gets out of the loop,
skips writing the permanent stats file and then exits with exit(0).
That is, normal and immediate shutdown requests are treated
almost the same way in the stats collector. The only difference between
them is whether it saves the stats to the file or not. Thoughts?
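
A minimal sketch of that flag-based pattern (hypothetical names, not the actual
pgstat code): the SIGQUIT handler only sets a sig_atomic_t flag, and the main
loop later decides to skip writing the permanent stats file and exit normally.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical names -- a sketch of the proposal, not real pgstat code. */
static volatile sig_atomic_t shutdown_requested = 0;

/* SIGQUIT handler: just set a flag; do no real work inside the handler. */
static void
pgstat_quickdie_handler(int signo)
{
	shutdown_requested = 1;
}

static void
pgstat_install_handlers(void)
{
	signal(SIGQUIT, pgstat_quickdie_handler);
}

/*
 * One iteration of the collector's main loop.  Returns true when the
 * process should exit; on immediate shutdown the permanent stats file
 * is deliberately not written, and the caller then exits with exit(0).
 */
static bool
pgstat_loop_step(bool *write_stats_file)
{
	if (shutdown_requested)
	{
		*write_stats_file = false;	/* skip saving the stats */
		return true;
	}
	/* ... normal stats-message processing would happen here ... */
	return false;
}
```

This keeps normal and immediate shutdown on the same exit path, differing only
in whether the stats file is saved.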


Also, won't this lead to postmaster now starting to complain about
pgstat having crashed in an immediate shutdown? Right now only exit
status 0 is expected:

if (pid == PgStatPID)
{
PgStatPID = 0;
if (!EXIT_STATUS_0(exitstatus))
LogChildExit(LOG, _("statistics collector 
process"),
 pid, exitstatus);
if (pmState == PM_RUN || pmState == PM_HOT_STANDBY)
PgStatPID = pgstat_start();
continue;
}


No. In the immediate shutdown case, pmdie() sets the "Shutdown" variable to
ImmediateShutdown and HandleChildCrash() doesn't complain in that case
because of the following.

/*
 * We only log messages and send signals if this is the first process
 * crash and we're not doing an immediate shutdown; otherwise, we're 
only
 * here to update postmaster's idea of live processes.  If we have 
already
 * signaled children, nonzero exit status is to be expected, so don't
 * clutter log.
 */
take_action = !FatalError && Shutdown != ImmediateShutdown;


+ * The statistics collector is started by the postmaster as soon as the
+ * startup subprocess finishes, or as soon as the postmaster is ready
+ * to accept read only connections during archive recovery.  It remains
+ * alive until the postmaster commands it to terminate.  Normal
+ * termination is by SIGUSR2 after the checkpointer must exit(0),
+ * which instructs the statistics collector to save the final statistics
+ * to reuse at next startup and then exit(0).


I can't parse "...after the checkpointer must exit(0), which instructs..."


What about changing that to the following?


Normal termination is requested after checkpointer exits. It's by SIGUSR2,
which instructs the statistics collector to save the final statistics and
then exit(0).



+ * Emergency termination is by SIGQUIT; like any backend, the statistics
+ * collector will exit quickly without saving the final statistics.  It's
+ * ok because the startup process will remove all statistics at next
+ * startup after emergency termination.


Normally there won't be any stats to remove, no? The permanent stats
file has been removed when the stats collector started up.


Yes. In normal case, when the stats collector starts up, it loads the stats
from the file and remove the file. OTOH, when the recovery is necessary,
instead the startup process calls pgstat_reset_all() and removes the stats 
files.

You're thinking that the above comments are confusing?
If so, what about the following?

--
Emergency termination is by SIGQUIT; the statistics collector
will exit quickly without saving the final statistics. In this case
the statistics files don't need to be written because the next startup
will trigger a crash recovery and remove all statistics files forcibly
even if they exist.
--

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: insensitive collations

2021-03-25 Thread Daniel Verite
Jim Finnerty wrote:

> For a UTF8 encoded, case-insensitive (nondeterministic), accent-sensitive
> ICU
> collation, a LIKE predicate can be used with a small transformation of the
> predicate, and the pattern can contain multi-byte characters:
> 
> from:
> 
> SELECT * FROM locations WHERE location LIKE 'midi-Pyrené%';  
> -- ERROR:  nondeterministic collations are not supported for LIKE
> 
> to:
> 
> SELECT * FROM locations WHERE lower(location) COLLATE "C" LIKE
> lower('midi-Pyrené%');

For prefix matching, there's a simpler way with non-deterministic
collations based on the advice in [1]

The trick is that if an ICU collation is assigned to "location",
whether it's deterministic or not,

SELECT * FROM locations WHERE location LIKE 'midi-Pyrené%';

is equivalent to:

SELECT * FROM locations WHERE location BETWEEN
   'midi-Pyrené' AND 'midi-Pyrené' || E'\u';

and that will use a btree index if available.

Also, it works with all features of ND-collations and all encodings, not
just case-insensitiveness and UTF-8.

Now that doesn't solve LIKE '%midi-Pyrené%', or LIKE '%midi_Pyrené%',
but that trick could be a building block for an algorithm implementing
LIKE with ND-collations in the future.

[1]
https://unicode-org.github.io/icu/userguide/collation/architecture.html#generating-bounds-for-a-sort-key-prefix-matching


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: SQL/JSON: JSON_TABLE

2021-03-25 Thread David Steele

On 1/20/21 8:42 PM, Nikita Glukhov wrote:

Thank you for review.

Attached 45th version of the patches. "SQL/JSON functions" patch corresponds to
v52 patch set posted in the separate thread.


Another rebase needed (http://cfbot.cputube.org/patch_32_2902.log), 
marked Waiting on Author.


I can see that Álvaro suggested that the patch be split up so it can be 
reviewed and committed in pieces. It looks like you've done that to some 
extent, but I wonder if more can be done. In particular, it looks like 
that first patch could be broken up -- at lot.


Regards,
--
-David
da...@pgmasters.net




Re: Replication slot stats misgivings

2021-03-25 Thread Amit Kapila
On Thu, Mar 25, 2021 at 11:36 AM Masahiko Sawada  wrote:
>
> On Wed, Mar 24, 2021 at 7:06 PM Amit Kapila  wrote:
> >
> >
> > Leaving aside restart case, without some sort of such sanity checking,
> > if both drop (of old slot) and create (of new slot) messages are lost
> > then we will start accumulating stats in old slots. However, if only
> > one of them is lost then there won't be any such problem.
> >
> > > Perhaps we could have RestoreSlotFromDisk() send something to the stats
> > > collector ensuring the mapping makes sense?
> > >
> >
> > Say if we send just the index location of each slot then probably we
> > can setup replSlotStats. Now say before the restart if one of the drop
> > messages was missed (by stats collector) and that happens to be at
> > some middle location, then we would end up restoring some already
> > dropped slot, leaving some of the still required ones. However, if
> > there is some sanity identifier like name along with the index, then I
> > think that would have worked for such a case.
>
> Even such messages could also be lost? Given that any message could be
> lost under a UDP connection, I think we cannot rely on a single
> message. Instead, I think we need to loosely synchronize the indexes
> while assuming the indexes in replSlotStats and
> ReplicationSlotCtl->replication_slots are not synchronized.
>
> >
> > I think it would have been easier if we would have some OID type of
> > identifier for each slot. But, without that may be index location of
> > ReplicationSlotCtl->replication_slots and slotname combination can
> > reduce the chances of slot stats go wrong quite less even if not zero.
> > If not name, do we have anything else in a slot that can be used for
> > some sort of sanity checking?
>
> I don't see any useful information in a slot for sanity checking.
>

In that case, can we do a hard check for which slots exist if
replSlotStats runs out of space (that can probably happen only after
restart and when we lost some drop messages)?


-- 
With Regards,
Amit Kapila.




Re: PoC/WIP: Extended statistics on expressions

2021-03-25 Thread Dean Rasheed
On Thu, 25 Mar 2021 at 00:05, Tomas Vondra
 wrote:
>
> Actually, I think we don't need that block at all - there's no point in
> keeping the exact expression, because if there was a statistics matching
> it it'd be matched by the examine_variable. So if we get here, we have
> to just split it into the vars anyway. So the second block is entirely
> useless.

Good point.

> That however means we don't need the processing with GroupExprInfo and
> GroupVarInfo lists, i.e. we can revert back to the original simpler
> processing, with a bit of extra logic to match expressions, that's all.
>
> The patch 0003 does this (it's a bit crude, but hopefully enough to
> demonstrate).

Cool. I did wonder about that, but I didn't fully think it through.
I'll take a look.

> 0002 is an attempt to fix an issue I noticed today - we need to handle
> type changes.
>
> I think we have two options:
>
> a) Make UpdateStatisticsForTypeChange smarter to also transform and
> update the expression string, and reset pg_statistics[] data.
>
> b) Just recreate the statistics, just like we do for indexes. Currently
> this does not force analyze, so it just resets all the stats. Maybe it
> should do analyze, though.

I'd vote for (b) without an analyse, and I agree with getting rid of
UpdateStatisticsForTypeChange(). I've always been a bit skeptical
about trying to preserve extended statistics after a type change, when
we don't preserve regular per-column stats.

> BTW I wonder how useful the updated statistics actually is. Consider
> this example:
> ...
> the expression now looks like this:
>
> 
> "public"."s" (ndistinct) ON ((a + b)), b)::numeric)::double
> precision + c)) FROM t
> 
>
> But we're matching it to (((b)::double precision + c)), so that fails.
>
> This is not specific to extended statistics - indexes have exactly the
> same issue. Not sure how common this is in practice.

Hmm, that's unfortunate. Maybe it's not that common in practice
though. I'm not sure if there is any practical way to fix it, but if
there is, I guess we'd want to apply the same fix to both stats and
indexes, and that certainly seems out of scope for this patch.

Regards,
Dean




Re: shared-memory based stats collector

2021-03-25 Thread Kyotaro Horiguchi
At Mon, 15 Mar 2021 17:51:31 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Fri, 12 Mar 2021 22:20:40 -0800, Andres Freund  wrote 
> in 
> > Horiguchi-san, is there a chance you could add a few tests (on master)
> > that test/document the way stats are kept across "normal" restarts, and
> > thrown away after crash restarts/immediate restarts and also thrown away
> > graceful streaming rep shutdowns?
> 
> Roger. I'll give it a try.

Sorry, I forgot this. This is it.

Regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
# Verify stats data persistence

use strict;
use warnings;
use PostgresNode;
use TestLib;
use Test::More tests => 6;

my $backup_name = 'my_backup';

# Initialize a test cluster
my $node_primary = get_new_node('primary');
$node_primary->init(allows_streaming => 1);
$node_primary->start;

$node_primary->safe_psql('postgres', 'CREATE TABLE t(); SELECT * from t;');
my $count =
  $node_primary->safe_psql('postgres',
	'SELECT seq_scan FROM pg_stat_user_tables');
is($count, '1', "counter is incremented");

# Stats numbers are restored after graceful restart
$node_primary->restart;

$count =
  $node_primary->safe_psql('postgres',
	'SELECT seq_scan FROM pg_stat_user_tables');
is($count, '1', "counter is correctly restored");

# Stats numbers are blown away after a crash
$node_primary->stop('immediate');
$node_primary->start;

$count =
  $node_primary->safe_psql('postgres',
	'SELECT seq_scan FROM pg_stat_user_tables');
is($count, '0', "counter is reset after crash");

# Create a standby
$node_primary->backup($backup_name);
my $node_standby = get_new_node('standby');
$node_standby->init_from_backup($node_primary, $backup_name,
	has_streaming => 1);
$node_standby->start;

# Stats numbers are initialized to zero
$count =
  $node_standby->safe_psql('postgres',
	'SELECT seq_scan FROM pg_stat_user_tables');
is($count, '0', "counter is initialized");

# Stats numbers are incremented also on standby
$node_standby->safe_psql('postgres', 'SELECT * from t;');
$count =
  $node_standby->safe_psql('postgres',
	'SELECT seq_scan FROM pg_stat_user_tables');
is($count, '1', "counter is incremented on standby");

# Stats numbers are always blown away even after graceful restart on standby
$node_standby->restart;

$count =
  $node_standby->safe_psql('postgres',
	'SELECT seq_scan FROM pg_stat_user_tables');
is($count, '0', "counter is reset after restart on standby");


Re: row filtering for logical replication

2021-03-25 Thread Peter Eisentraut

On 22.03.21 03:15, Euler Taveira wrote:

I attached a new patch set that addresses:

* fix documentation;
* rename PublicationRelationQual to PublicationRelationInfo;
* remove the memset that was leftover from a previous patch set;
* add new tests to improve coverage (INSERT/UPDATE/DELETE to exercise 
the row

   filter code).


I have committed the 0001 patch.

Attached are a few fixup patches that I recommend you integrate into 
your patch set.  They address backward compatibility with PG13, and a 
few more stylistic issues.


I suggest you combine your 0002, 0003, and 0004 patches into one.  They 
can't be used separately, and for example the psql changes in patch 0003 
already appear as regression test output changes in 0002, so this 
arrangement isn't useful.  (0005 can be kept separately, since it's 
mostly for debugging right now.)
From c29459505db88c86dd7aa61019fa406202c30b0a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 25 Mar 2021 11:57:48 +0100
Subject: [PATCH 1/6] fixup! Row filter for logical replication

Remove unused header files, clean up whitespace.
---
 src/backend/catalog/pg_publication.c| 2 --
 src/backend/replication/pgoutput/pgoutput.c | 4 
 2 files changed, 6 deletions(-)

diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c
index f31ae28de2..78f5780fb7 100644
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -33,11 +33,9 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
-
 #include "parser/parse_clause.h"
 #include "parser/parse_collate.h"
 #include "parser/parse_relation.h"
-
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/catcache.h"
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index ce6da8de19..6151f34925 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -16,14 +16,10 @@
 #include "catalog/partition.h"
 #include "catalog/pg_publication.h"
 #include "catalog/pg_publication_rel.h"
-#include "catalog/pg_type.h"
 #include "commands/defrem.h"
 #include "executor/executor.h"
 #include "fmgr.h"
-#include "nodes/execnodes.h"
 #include "nodes/nodeFuncs.h"
-#include "optimizer/planner.h"
-#include "optimizer/optimizer.h"
 #include "parser/parse_coerce.h"
 #include "replication/logical.h"
 #include "replication/logicalproto.h"
-- 
2.30.2

From 91c20ecb45a8627a9252ed589fe6b85927be8b45 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 25 Mar 2021 11:59:13 +0100
Subject: [PATCH 2/6] fixup! Row filter for logical replication

Allow replication from older PostgreSQL versions without prqual.
---
 src/backend/replication/logical/tablesync.c | 70 +++--
 1 file changed, 37 insertions(+), 33 deletions(-)

diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 40d84dadb5..246510c82e 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -796,49 +796,53 @@ fetch_remote_table_info(char *nspname, char *relname,
walrcv_clear_result(res);
 
/* Get relation qual */
-   resetStringInfo(&cmd);
-   appendStringInfo(&cmd,
-"SELECT pg_get_expr(prqual, prrelid) "
-"  FROM pg_publication p "
-"  INNER JOIN pg_publication_rel pr "
-"   ON (p.oid = pr.prpubid) "
-" WHERE pr.prrelid = %u "
-"   AND p.pubname IN (", lrel->remoteid);
-
-   first = true;
-   foreach(lc, MySubscription->publications)
+   if (walrcv_server_version(wrconn) >= 14)
{
-   char   *pubname = strVal(lfirst(lc));
+   resetStringInfo(&cmd);
+   appendStringInfo(&cmd,
+"SELECT pg_get_expr(prqual, prrelid) "
+"  FROM pg_publication p "
+"  INNER JOIN pg_publication_rel pr "
+"   ON (p.oid = pr.prpubid) "
+" WHERE pr.prrelid = %u "
+"   AND p.pubname IN (", lrel->remoteid);
+
+   first = true;
+   foreach(lc, MySubscription->publications)
+   {
+   char   *pubname = strVal(lfirst(lc));
 
-   if (first)
-   first = false;
-   else
-   appendStringInfoString(&cmd, ", ");
+   if (first)
+   first = false;
+   else
+  

Re: [HACKERS] logical decoding of two-phase transactions

2021-03-25 Thread Amit Kapila
On Wed, Mar 24, 2021 at 3:59 PM houzj.f...@fujitsu.com
 wrote:
>
> > I have incorporated all your changes and additionally made few more changes
> > (a) got rid of LogicalRepBeginPrepareData and instead used
> > LogicalRepPreparedTxnData, (b) made a number of changes in comments and
> > docs, (c) ran pgindent, (d) modified tests to use standard wait_for_catch
> > function and removed few tests to reduce the time and to keep regression
> > tests reliable.
>
> Hi,
>
> When reading the code, I found some comments related to the patch here.
>
>  * XXX Now, this can even lead to a deadlock 
> if the prepare
>  * transaction is waiting to get it logically 
> replicated for
>  * distributed 2PC. Currently, we don't have 
> an in-core
>  * implementation of prepares for distributed 
> 2PC but some
>  * out-of-core logical replication solution 
> can have such an
>  * implementation. They need to inform users 
> to not have locks
>  * on catalog tables in such transactions.
>  */
>
> Since we will have in-core implementation of prepares, should we update the 
> comments here ?
>

Fixed this in the latest patch posted by me. I have additionally
updated the docs to reflect the same.

-- 
With Regards,
Amit Kapila.




Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb

2021-03-25 Thread Amul Sul
On Thu, Mar 25, 2021 at 12:10 PM Kyotaro Horiguchi
 wrote:
>
> Sorry for the bug.
>
> At Thu, 25 Mar 2021 01:50:29 -0400, Tom Lane  wrote in
> > Amul Sul  writes:
> > > On Wed, Mar 24, 2021 at 8:09 PM Tom Lane  wrote:
> > >> static inline struct SMgrRelationData *
> > >> RelationGetSmgr(Relation rel)
> > >> {
> > >> if (unlikely(rel->rd_smgr == NULL))
> > >> RelationOpenSmgr(rel);
> > >> return rel->rd_smgr;
> > >> }
> >
> > > A quick question: Can't it be a macro instead of an inline function
> > > like other macros we have in rel.h?
> >
> > The multiple-evaluation hazard seems like an issue.  We've tolerated
> > such hazards in the past, but mostly just because we weren't relying
> > on static inlines being available, so there wasn't a good way around
> > it.
> >
> > Also, the conditional evaluation here would look rather ugly
> > in a macro, I think, if indeed you could do it at all without
> > provoking compiler warnings.
>
> FWIW, +1 for the function as is.
>

Ok, in the attached patch, I have added the inline function to rel.h, and for
that I ended up including smgr.h in rel.h. I tried to replace all rel->rd_smgr
uses with the RelationGetSmgr() function and removed the nearby
RelationOpenSmgr() calls, which I don't think are needed at all.
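
To make Tom's multiple-evaluation point concrete, here is a small standalone sketch (toy type and function names, not the real smgr API): a macro version evaluates its relation argument more than once, so an argument with side effects misbehaves, while the static inline evaluates it exactly once.

```c
#include <stddef.h>

/* Toy stand-ins for Relation/SMgrRelation; not the real PostgreSQL types. */
typedef struct SMgrRelationDataToy { int dummy; } SMgrRelationDataToy;
typedef struct RelationDataToy { SMgrRelationDataToy *rd_smgr; } RelationDataToy;
typedef RelationDataToy *RelationToy;

static SMgrRelationDataToy smgr_storage;

static void
RelationOpenSmgrToy(RelationToy rel)
{
	rel->rd_smgr = &smgr_storage;
}

/* Macro version: the argument expression is evaluated more than once. */
#define RelationGetSmgrMacro(rel) \
	((rel)->rd_smgr == NULL ? \
	 (RelationOpenSmgrToy(rel), (rel)->rd_smgr) : (rel)->rd_smgr)

/* Inline version: the argument is evaluated exactly once. */
static inline SMgrRelationDataToy *
RelationGetSmgrInlineToy(RelationToy rel)
{
	if (rel->rd_smgr == NULL)
		RelationOpenSmgrToy(rel);
	return rel->rd_smgr;
}

static int	ncalls = 0;
static RelationDataToy rels[4];

/* An argument with a side effect, to expose the hazard. */
static RelationToy
next_rel(void)
{
	return &rels[ncalls++];
}
```

Passing something like `next_rel()` (or `rel_array[i++]`) to the macro runs the side effect two or three times, which is exactly the hazard the inline function avoids.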

Regards,
Amul
From dfff4920996aaa443986d43a7bb1231a0230a1e5 Mon Sep 17 00:00:00 2001
From: Amul Sul 
Date: Thu, 25 Mar 2021 06:27:58 -0400
Subject: [PATCH] Add RelationGetSmgr() inline function

---
 contrib/amcheck/verify_nbtree.c   |  3 +-
 contrib/bloom/blinsert.c  |  6 +--
 contrib/pg_prewarm/autoprewarm.c  |  4 +-
 contrib/pg_prewarm/pg_prewarm.c   |  5 +--
 contrib/pg_visibility/pg_visibility.c |  7 ++--
 src/backend/access/gist/gistbuild.c   | 14 +++
 src/backend/access/hash/hashpage.c|  4 +-
 src/backend/access/heap/heapam_handler.c  |  7 ++--
 src/backend/access/heap/rewriteheap.c | 11 ++
 src/backend/access/heap/visibilitymap.c   | 46 ++-
 src/backend/access/nbtree/nbtree.c|  8 ++--
 src/backend/access/nbtree/nbtsort.c   | 14 ++-
 src/backend/access/spgist/spginsert.c | 16 
 src/backend/access/table/tableam.c|  7 +---
 src/backend/catalog/index.c   |  5 +--
 src/backend/catalog/storage.c | 17 +
 src/backend/commands/tablecmds.c  |  7 ++--
 src/backend/storage/buffer/bufmgr.c   | 24 +++-
 src/backend/storage/freespace/freespace.c | 40 ++--
 src/include/utils/rel.h   | 13 +++
 20 files changed, 118 insertions(+), 140 deletions(-)

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index b31906cbcfd..7f34eed04ba 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -301,8 +301,7 @@ bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed,
 		bool		heapkeyspace,
 	allequalimage;
 
-		RelationOpenSmgr(indrel);
-		if (!smgrexists(indrel->rd_smgr, MAIN_FORKNUM))
+		if (!smgrexists(RelationGetSmgr(indrel), MAIN_FORKNUM))
 			ereport(ERROR,
 	(errcode(ERRCODE_INDEX_CORRUPTED),
 	 errmsg("index \"%s\" lacks a main relation fork",
diff --git a/contrib/bloom/blinsert.c b/contrib/bloom/blinsert.c
index d37ceef753a..79fb92debc5 100644
--- a/contrib/bloom/blinsert.c
+++ b/contrib/bloom/blinsert.c
@@ -178,9 +178,9 @@ blbuildempty(Relation index)
 	 * this even when wal_level=minimal.
 	 */
 	PageSetChecksumInplace(metapage, BLOOM_METAPAGE_BLKNO);
-	smgrwrite(index->rd_smgr, INIT_FORKNUM, BLOOM_METAPAGE_BLKNO,
+	smgrwrite(RelationGetSmgr(index), INIT_FORKNUM, BLOOM_METAPAGE_BLKNO,
 			  (char *) metapage, true);
-	log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
+	log_newpage(&(RelationGetSmgr(index))->smgr_rnode.node, INIT_FORKNUM,
 BLOOM_METAPAGE_BLKNO, metapage, true);
 
 	/*
@@ -188,7 +188,7 @@ blbuildempty(Relation index)
 	 * write did not go through shared_buffers and therefore a concurrent
 	 * checkpoint may have moved the redo pointer past our xlog record.
 	 */
-	smgrimmedsync(index->rd_smgr, INIT_FORKNUM);
+	smgrimmedsync(RelationGetSmgr(index), INIT_FORKNUM);
 }
 
 /*
diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index b3f73ea92d6..0289ea657cb 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -514,15 +514,13 @@ autoprewarm_database_main(Datum main_arg)
 			old_blk->filenode != blk->filenode ||
 			old_blk->forknum != blk->forknum)
 		{
-			RelationOpenSmgr(rel);
-
 			/*
 			 * smgrexists is not safe for illegal forknum, hence check whether
 			 * the passed forknum is valid before using it in smgrexists.
 			 */
 			if (blk->forknum > InvalidForkNumber &&
 blk->forknum <= MAX_FORKNUM &&
-smgrexists(rel->rd_smgr, blk->forknum))
+smgrexists(RelationGetSmgr(rel), blk->forknum))
 nblocks = RelationGetNumberOfBlocksInFork(rel, blk->forknum);
 			else

Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.

2021-03-25 Thread Fujii Masao




On 2021/03/25 9:31, Masahiro Ikeda wrote:



On 2021/03/24 18:36, Fujii Masao wrote:



On 2021/03/24 3:51, Andres Freund wrote:

Hi,

On 2021-03-23 15:50:46 +0900, Fujii Masao wrote:

This fact makes me wonder that if we collect the statistics about WAL writing
from walreceiver as we discussed in other thread, the stats collector should
be invoked at more earlier stage. IIUC walreceiver can be invoked before
PMSIGNAL_BEGIN_HOT_STANDBY is sent.


FWIW, in the shared memory stats patch the stats subsystem is
initialized early on by the startup process.


This is good news!


Fujii-san, Andres-san,
Thanks for your comments!

I didn't think about the start order. From that point of view, I noticed that
the current source code has two other concerns.


1. This problem is not only for the wal receiver.

The problem that the wal receiver starts before the stats collector
is launched during archive recovery applies not only to the wal receiver but
also to the checkpointer and the bgwriter. Before starting redo, the startup
process sends the postmaster the "PMSIGNAL_RECOVERY_STARTED" signal to launch the
checkpointer and the bgwriter so that they can create restartpoints.

Although the socket for communication between the stats collector and the
other processes is made at an earlier stage via pgstat_init(), I agree that
making the stats collector start at an earlier stage is defensive. BTW, in my
environment (linux, net.core.rmem_default = 212992), the socket can buffer
almost 300 WAL stats messages. This means that, as you said, if the redo phase
is too long, messages can easily be lost.
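
For illustration, the "almost 300" figure can be reproduced with a back-of-the-envelope calculation; the per-message cost below is an assumed value (roughly the message size plus kernel per-datagram overhead), not a measured one.

```c
/*
 * Back-of-the-envelope estimate of how many pending stats messages fit in
 * the receive buffer of the collector's datagram socket.  The per-message
 * cost is an illustrative assumption, not a measured value.
 */
static long
stats_socket_capacity(long rmem_default, long assumed_msg_cost)
{
	return rmem_default / assumed_msg_cost;
}
```

With net.core.rmem_default = 212992 and an assumed ~700 bytes per buffered message, this yields about 304 messages, matching the observation above.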


2. To make the stats clear in redo phase.

The statistics can be reset after the wal receiver, the checkpointer and
the wal writer are started in the redo phase. So it's not enough for the stats
collector to be invoked at an earlier stage. We need to fix this.



(I hope I am not missing something.)
Thanks to Andres-san's work([1]), the above problems will be handled in the
shared memory stats patch. The first problem will be resolved since the stats are
collected in shared memory, making the stats collector process itself
unnecessary. The second problem will be resolved by removing the reset code,
because the temporary stats file won't be generated, and if the permanent stats
file is corrupted, it is simply recreated.


Yes. So should we wait for the shared memory stats patch to be committed
before working further on the walreceiver stats patch?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: wal stats questions

2021-03-25 Thread Fujii Masao




On 2021/03/25 16:37, Kyotaro Horiguchi wrote:

At Wed, 24 Mar 2021 21:07:26 -0700, Andres Freund  wrote in

Hi,

On 2021-03-25 10:51:56 +0900, Masahiro Ikeda wrote:

On 2021/03/25 8:22, Andres Freund wrote:

1) What is the motivation to have both prevWalUsage and pgWalUsage,
instead of just accumulating the stats since the last submission?
There doesn't seem to be any comment explaining it? Computing
differences to previous values, and copying to prev*, isn't free. I
assume this is out of a fear that the stats could get reset before
they're used for instrument.c purposes - but who knows?


Is your point that it's better to call pgWalUsage = 0; after sending the
stats?


Yes. At least there should be a comment explaining why it's done the way
it is.


pgWalUsage was used without resetting and we (I) thought that that
behavior should be preserved.  On second thought, as Andres suggested,
we can just reset pgWalUsage at sending, since AFAICS no one takes a
difference across pgstat_report_stat() calls.


Yes, I agree that we can do that since there seems to be no such code for now.
Also if we do that, we can check, for example, "pgWalUsage.wal_records > 0"
as you suggested, to easily determine whether there are pending WAL stats or not.
Anyway, I agree it's better to add more comments about the design.
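
A toy sketch of the two bookkeeping schemes under discussion (illustrative names, not the real pgstat/instrument.c code): scheme A keeps accumulating and sends the difference against a prev* snapshot; scheme B simply resets the counters at send time, which is only safe because, as noted above, nobody takes a difference across pgstat_report_stat() calls.

```c
/* Toy model of the two bookkeeping schemes; illustrative names only. */
typedef struct WalUsageToy
{
	long		wal_records;
} WalUsageToy;

static WalUsageToy pgWalUsageToy;	/* running counters */
static WalUsageToy prevWalUsageToy; /* snapshot at last send (scheme A only) */

/* Scheme A: keep accumulating, send the difference against a snapshot. */
static long
send_by_diff(void)
{
	long		delta = pgWalUsageToy.wal_records - prevWalUsageToy.wal_records;

	prevWalUsageToy = pgWalUsageToy;	/* copy to prev*, which isn't free */
	return delta;
}

/* Scheme B: send the counters and reset them; no snapshot needed. */
static long
send_by_reset(void)
{
	long		delta = pgWalUsageToy.wal_records;

	pgWalUsageToy.wal_records = 0;
	return delta;
}
```

Both schemes report the same deltas; B avoids the struct copy and the prev* state, at the cost of breaking any hypothetical reader that expected the counters to keep growing across reports.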



3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
just to figure out if there's been any changes isn't all that
cheap. This is regularly exercised in read-only workloads too. Seems
adding a boolean WalStats.have_pending = true or such would be
better.


I understood that for backends this may lead to performance degradation, and
this problem applies not only to WalStats but also to SLRUStats.

I wondered whether the degradation is really so big, because pgstat_report_stat()
checks whether at least PGSTAT_STAT_INTERVAL msec has passed since it last sent
stats before checking the diff. If my understanding is correct, getting a
timestamp is more expensive.


Getting a timestamp is expensive, yes. But I think we need to check for
the no-pending-wal-stats *before* the clock check. See the below:



4) For plain backends pgstat_report_wal() is called by
pgstat_report_stat() - but it is not checked as part of the "Don't
expend a clock check if nothing to do" check at the top.  It's
probably rare to have pending wal stats without also passing one of
the other conditions, but ...


(I'm not confident my understanding of your comment is right.)
Do you mean that we need to expend a clock check in pgstat_report_wal()?
I think it's unnecessary because pgstat_report_stat() has already checked it.


No, I mean that right now we might erroneously return early before reaching
pgstat_report_wal() for normal backends in some workloads:

void
pgstat_report_stat(bool disconnect)
...
/* Don't expend a clock check if nothing to do */
if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) &&
pgStatXactCommit == 0 && pgStatXactRollback == 0 &&
!have_function_stats && !disconnect)
return;

might return even if there is only pending WAL activity. This needs to check
whether there are pending WAL stats, which in turn means that the check
should be fast.
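
A minimal sketch of the fix being discussed, assuming a cheap have-pending flag as Andres suggested (names are illustrative, not the real pgstat code): the flag is tested before any clock call, so WAL-only activity no longer hits the early return.

```c
#include <stdbool.h>

/* Set wherever the WAL counters are bumped; cheap to test. */
static bool walstats_have_pending = false;
static int	clock_checks = 0;	/* stands in for GetCurrentTimestamp() etc. */

static void
report_stat_toy(bool have_relation_stats)
{
	/* Don't expend a clock check if nothing to do */
	if (!have_relation_stats && !walstats_have_pending)
		return;

	clock_checks++;				/* the expensive part we wanted to avoid */
	walstats_have_pending = false;	/* pretend the WAL stats were sent */
}
```

The same flag also replaces the memcmp-against-all-zeroes test inside the WAL report path, which is what makes the pre-clock check affordable for read-only workloads.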


Agreed that the condition is wrong.  On the other hand, the counters
are incremented in XLogInsertRecord() and I think we don't want to add
instructions there.


Basically yes. We should avoid that, especially while WALInsertLock is being
held.
But it's not so harmful to do that after the lock is released?


If any wal activity has been recorded, wal_records is always positive,
thus we can check for wal activity just by "pgWalUsage.wal_records > 0",
which should be fast enough.


Maybe there is a case where a backend generates no WAL records,
but just writes them because it needs to do write-ahead logging
when flushing the table data? If so, "pgWalUsage.wal_records > 0" is not enough.
Probably other fields also need to be checked.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [PATCH] Add extra statistics to explain for Nested Loop

2021-03-25 Thread e . sokolova

Thank you all for your feedback and reforms.
I attach a new version of the patch with some changes and fixes. 
Here's a list of the major changes:
1) New format of extra statistics. This is now contained in a line 
separate from the main statistics.


Julien Rouhaud писал 2021-02-01 08:28:
On Thu, Jan 28, 2021 at 8:38 PM Yugo NAGATA  
wrote:


postgres=# explain (analyze, verbose) select * from a,b where a.i=b.j;
   
 QUERY PLAN

---
 Nested Loop  (cost=0.00..2752.00 rows=991 width=8) (actual 
time=0.021..17.651 rows=991 loops=1)

   Output: a.i, b.j
   Join Filter: (a.i = b.j)
   Rows Removed by Join Filter: 99009
   ->  Seq Scan on public.b  (cost=0.00..2.00 rows=100 width=4) 
(actual time=0.009..0.023 rows=100 loops=1)

 Output: b.j
   ->  Seq Scan on public.a  (cost=0.00..15.00 rows=1000 width=4) 
(actual time=0.005..0.091 min_time=0.065 max_time=0.163 min_rows=1000 
rows=1000 max_rows=1000 loops=100)

 Output: a.i
 Planning Time: 0.066 ms
 Execution Time: 17.719 ms
(10 rows)

I don't like this format where the extra statistics appear in the same
line as the existing information, because the output format differs
depending on whether the plan node's loops > 1 or not. This makes the
length of a line too long. Also, other information reported by VERBOSE
doesn't change the existing row format and just adds extra rows for new
information.

Instead, it seems good to me to add extra rows for the new statistics
without changing the existing row format, as other VERBOSE information
does, like below.

   ->  Seq Scan on public.a  (cost=0.00..15.00 rows=1000 width=4) 
(actual time=0.005..0.091 rows=1000  loops=100)

 Output: a.i
 Loops: min_time=0.065 max_time=0.163 min_rows=1000 
max_rows=1000


and so on. What do you think about it?




2) Correction of the case of parallel scan


In parallel scan, the extra statistics are not reported correctly.

This reports the max/min rows or time of the inner scan as 0 in parallel
workers, and as a result only the leader process's values are accounted
for. To fix this, we would change InstrAggNode as below.





3) Adding extra statistics about total number of rows (total rows). 
There were many wishes for this here.


Please don't hesitate to share any thoughts on this topic.

--
Ekaterina Sokolova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index afc45429ba4..723eccca013 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1589,29 +1589,82 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		double		startup_ms = 1000.0 * planstate->instrument->startup / nloops;
 		double		total_ms = 1000.0 * planstate->instrument->total / nloops;
 		double		rows = planstate->instrument->ntuples / nloops;
+		double		total_rows = planstate->instrument->ntuples;
+		double		min_r = planstate->instrument->min_tuples;
+		double		max_r = planstate->instrument->max_tuples;
+		double		min_t_ms = 1000.0 * planstate->instrument->min_t;
+		double		max_t_ms = 1000.0 * planstate->instrument->max_t;
 
 		if (es->format == EXPLAIN_FORMAT_TEXT)
 		{
-			if (es->timing)
-appendStringInfo(es->str,
- " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)",
- startup_ms, total_ms, rows, nloops);
+			if (nloops > 1 && es->verbose)
+			{
+if (es->timing)
+{
+	appendStringInfo(es->str,
+	 " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)\n",
+	 startup_ms, total_ms, rows, nloops);
+	ExplainIndentText(es);
+	appendStringInfo(es->str,
+	 "Loop: min_time=%.3f max_time=%.3f min_rows=%.0f max_rows=%.0f total_rows=%.0f",
+	 min_t_ms, max_t_ms, min_r, max_r, total_rows);
+}
+else
+{
+	appendStringInfo(es->str,
+	 " (actual rows=%.0f loops=%.0f)\n",
+	 rows, nloops);
+	ExplainIndentText(es);
+	appendStringInfo(es->str,
+	 "Loop: min_rows=%.0f max_rows=%.0f total_rows=%.0f",
+	 min_r, max_r, total_rows);
+}
+			}
 			else
-appendStringInfo(es->str,
- " (actual rows=%.0f loops=%.0f)",
- rows, nloops);
+			{
+if (es->timing)
+	appendStringInfo(es->str,
+	 " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)",
+	 startup_ms, total_ms, rows, nloops);
+else
+	appendStringInfo(es->str,
+	 " (actual rows=%.0f loops=%.0f)",
+	 rows, nloops);
+			}
 		}
 		else
 		{
-			if (es->timing)
+			if (nloops > 1 && es->verbose)
+			{
+if (es->timing)
+{
+	ExplainPropertyFloat("Actual Startup Time", "s", startup_ms,
+		 3, es);
+	ExplainPropertyFloat("Actual Total Time", "s", total_ms,
+		 3, es);
+	ExplainPropertyFl

Re: [PATCH] pg_permissions

2021-03-25 Thread Joel Jacobson
On Tue, Mar 23, 2021, at 21:39, Alvaro Herrera wrote:
>I wonder if these views should be defined on top of pg_shdepend instead
>of querying every single catalog.  That would make for much shorter
>queries.

+1

pg_shdepend doesn't contain the aclitem info though,
so it won't work for pg_permissions if we want to expose
privilege_type, is_grantable and grantor.

pg_shdepend should work fine for pg_ownerships though.

The semantics will not be entirely the same,
since internal objects are not tracked in pg_shdepend,
but I think this is an improvement.

Example:

create role baz;
create type foobar as ( foo int, bar boolean );
alter type foobar owner to baz;

-- UNION ALL variant:

select * from pg_ownerships where owner = 'baz'::regrole;
 classid  | objid  | objsubid | owner |      type      | schema |  name   |    identity
----------+--------+----------+-------+----------------+--------+---------+-----------------
 pg_class | 407858 |        0 | baz   | composite type | public | foobar  | public.foobar
 pg_type  | 407860 |        0 | baz   | type           | public | foobar  | public.foobar
 pg_type  | 407859 |        0 | baz   | type           | public | _foobar | public.foobar[]
(3 rows)

-- pg_shdepend variant:

select * from pg_ownerships where owner = 'baz'::regrole;
classid | objid  | objsubid | owner | type | schema |  name  |   identity
-++--+---+--+++---
1247 | 407860 |0 | baz   | type | public | foobar | public.foobar
(1 row)

I'll update the patch.

/Joel

Re: proposal: unescape_text function

2021-03-25 Thread Peter Eisentraut



On 10.03.21 14:52, David Steele wrote:
I thought about it a little bit more, and the prefix specification 
does not make much sense (even less if we implement this functionality as 
a function "unistr"). I removed the optional argument and renamed the 
function to "unistr". The functionality is the same. Now it supports 
the Oracle convention, Java and Python (for Python U) and 
\+XX. These formats were already supported. The compatibility with 
Oracle is nice.


Peter, it looks like Pavel has aligned this function with unistr() as 
you suggested. Thoughts?


I haven't read through the patch in detail yet, but I support the 
proposed details of the functionality.





Re: [HACKERS] Custom compression methods

2021-03-25 Thread Dilip Kumar
On Wed, Mar 24, 2021 at 10:57 PM Robert Haas  wrote:
>
> On Wed, Mar 24, 2021 at 12:14 PM Dilip Kumar  wrote:
> > Actually, we are already doing this, I mean ALTER TABLE .. ALTER
> > COLUMN .. SET COMPRESSION is already updating the compression method
> > of the index attribute.  So 0003 doesn't make sense, sorry for the
> > noise.  However, 0001 and 0002 are still valid, or do you think that
> > we don't want 0001 also?  If we don't need 0001 also then we need to
> > update the test output for 0002 slightly.
>
> It seems to me that 0002 is still not right. We can't fix the
> attcompression to whatever the default is at the time the index is
> created, because the default can be changed later, and there's no way
> to fix index afterward. I mean, it would be fine to do it that way if
> we were going to go with the other model, where the index state is
> separate from the table state, either can be changed independently,
> and it all gets dumped and restored. But, as it is, I think we should
> be deciding how to compress new values for an expression column based
> on the default_toast_compression setting at the time of compression,
> not the time of index creation.
>

Okay got it.  Fixed as suggested.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From c478a311b3410d5360946e953e1bfd53bbef7197 Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Wed, 24 Mar 2021 15:08:14 +0530
Subject: [PATCH v4 1/2] Show compression method in index describe

---
 src/bin/psql/describe.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index eeac0ef..9d15324 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -1897,6 +1897,7 @@ describeOneTableDetails(const char *schemaname,
 		if (pset.sversion >= 14 &&
 			!pset.hide_compression &&
 			(tableinfo.relkind == RELKIND_RELATION ||
+			 tableinfo.relkind == RELKIND_INDEX ||
 			 tableinfo.relkind == RELKIND_PARTITIONED_TABLE ||
 			 tableinfo.relkind == RELKIND_MATVIEW))
 		{
-- 
1.8.3.1

From 75dbb3094973a4e69dafc213b760f5f13ab33a91 Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Wed, 24 Mar 2021 14:02:06 +0530
Subject: [PATCH v4 2/2] Fix attcompression for index expression columns

For the expression columns, set the attcompression to the invalid
compression method and while actually compressing the data if the
attcompression is not valid then use the default compression method.

Basically, attcompression for the index attribute is always in sync with
that of the table attribute but for the expression columns that is not
possible so for them always use the current default method while
compressing the data.
---
 src/backend/access/brin/brin_tuple.c|  8 +---
 src/backend/access/common/indextuple.c  | 16 ++--
 src/backend/catalog/index.c |  9 +
 src/test/regress/expected/compression.out   |  6 ++
 src/test/regress/expected/compression_1.out | 13 +
 src/test/regress/sql/compression.sql|  7 +++
 6 files changed, 54 insertions(+), 5 deletions(-)

diff --git a/src/backend/access/brin/brin_tuple.c b/src/backend/access/brin/brin_tuple.c
index 8d03e60..32ffd9f 100644
--- a/src/backend/access/brin/brin_tuple.c
+++ b/src/backend/access/brin/brin_tuple.c
@@ -220,10 +220,12 @@ brin_form_tuple(BrinDesc *brdesc, BlockNumber blkno, BrinMemTuple *tuple,
 
 /*
  * If the BRIN summary and indexed attribute use the same data
- * type, we can use the same compression method. Otherwise we
- * have to use the default method.
+ * type and it has a valid compression method, we can use the
+ * same compression method. Otherwise we have to use the
+ * default method.
  */
-if (att->atttypid == atttype->type_id)
+if (att->atttypid == atttype->type_id &&
+	CompressionMethodIsValid(att->attcompression))
 	compression = att->attcompression;
 else
 	compression = GetDefaultToastCompression();
diff --git a/src/backend/access/common/indextuple.c b/src/backend/access/common/indextuple.c
index 1f6b7b7..4d58644 100644
--- a/src/backend/access/common/indextuple.c
+++ b/src/backend/access/common/indextuple.c
@@ -103,8 +103,20 @@ index_form_tuple(TupleDesc tupleDescriptor,
 			(att->attstorage == TYPSTORAGE_EXTENDED ||
 			 att->attstorage == TYPSTORAGE_MAIN))
 		{
-			Datum		cvalue = toast_compress_datum(untoasted_values[i],
-	  att->attcompression);
+			Datum	cvalue;
+			char	compression = att->attcompression;
+
+			/*
+			 * If the compression method is not valid then use the default
+			 * compression method. This can happen because, for the expression
+			 * columns, we set the attcompression to the invalid compression
+			 * method, while creating the index so that we can use the current
+			 * default compression method when we actually need to compress.
+			 */
+			if (!CompressionMethodIsValid(compression))
+compression = GetDefaultToastCompression();
+
+			cvalue =

[PATCH] add concurrent_abort callback for output plugin

2021-03-25 Thread Markus Wanner

Hi,

here is another tidbit from our experience with using logical decoding. 
The attached patch adds a callback to notify the output plugin of a 
concurrent abort.  Below, I describe the problem in more detail 
and how this additional callback solves it.


Streamed transactions as well as two-phase commit transactions may get 
decoded before they finish.  At the point the begin_cb is invoked and 
first changes are delivered to the output plugin, it is not necessarily 
known whether the transaction will commit or abort.


This leads to the possibility of the transaction getting aborted 
concurrent to logical decoding.  In that case, it is likely for the 
decoder to error on a catalog scan that conflicts with the abort of the 
transaction.  The reorderbuffer sports a PG_CATCH block to clean up. 
However, it does not currently inform the output plugin.  From its point 
of view, the transaction is left dangling until another one comes along 
or until the final ROLLBACK or ROLLBACK PREPARED record from WAL gets 
decoded.  Therefore, what the output plugin might see in this case is:


* filter_prepare_cb (txn A)
* begin_prepare_cb  (txn A)
* apply_change  (txn A)
* apply_change  (txn A)
* apply_change  (txn A)
* begin_cb  (txn B)

In other words, in this example, only the begin_cb of the following 
transaction implicitly tells the output plugin that txn A could not be 
fully decoded.  And there's no upper time boundary on when that may 
happen.  (It could also be another filter_prepare_cb, if the subsequent 
transaction happens to be a two-phase transaction as well.  Or an 
explicit rollback_prepared_cb or stream_abort if there's no other 
transaction in between.)


An alternative and arguably cleaner approach for streamed transactions 
may be to directly invoke stream_abort.  However, the lsn argument 
passed could not be that of the abort record, as that's not known at the 
point in time of the concurrent abort.  Plus, this seems like a bad fit 
for two-phase commit transactions.


Again, this callback is especially important for output plugins that 
invoke further actions on downstream nodes that delay the COMMIT 
PREPARED of a transaction upstream, e.g. until prepared on other nodes. 
Up until now, the output plugin has no way to learn about a concurrent 
abort of the currently decoded (2PC or streamed) transaction (perhaps 
short of continued polling on the transaction status).


I also think it generally improves the API by allowing the output plugin 
to rely on such a callback, rather than having to implicitly deduce this 
from other callbacks.
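
The problem can be modeled with a toy state machine (illustrative bookkeeping, not the real reorderbuffer or output plugin code): without the new callback, the plugin only learns of txn A's demise when the next begin_cb arrives; with the callback, the notification is explicit and timely.

```c
#include <assert.h>

/* Toy model of the callback ordering problem; illustrative names only. */
static int	open_txn = 0;			/* xid the plugin believes is in progress */
static int	implicitly_dropped = 0; /* txns abandoned without notification */

static void
begin_cb_toy(int xid)
{
	if (open_txn != 0)
		implicitly_dropped++;	/* only now do we learn the old txn went away */
	open_txn = xid;
}

static void
concurrent_abort_cb_toy(int xid)
{
	assert(open_txn == xid);	/* explicit, timely notification */
	open_txn = 0;
}
```

With the callback, a plugin that started downstream work for the aborted transaction can release it immediately instead of holding resources until some unrelated callback happens to fire.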


Thoughts or comments?  If this is agreed on, I can look into adding 
tests (concurrent aborts are not currently covered, it seems).


Regards

Markus
From: Markus Wanner 
Date: Thu, 11 Feb 2021 13:49:55 +0100
Subject: [PATCH] Add a concurrent_abort callback for the output plugin.

Logical decoding of a prepared or streamed transaction may fail if the
transaction got aborted after invoking the begin_cb (and likely having
sent some changes via change_cb), but before the necessary catalog scans
could be performed.  In this case, decoding the transaction is neither
possible nor necessary (given it got rolled back).

To give the output plugin a chance to clean up the aborted transaction as
well, introduce a concurrent_abort callback.  It is only ever invoked to
terminate unfinished transactions, not for normal aborts.

Adjust contrib/test_decoding to define a concurrent_abort callback.
---
 contrib/test_decoding/test_decoding.c | 29 
 doc/src/sgml/logicaldecoding.sgml | 43 +-
 src/backend/replication/logical/logical.c | 45 +++
 .../replication/logical/reorderbuffer.c   |  6 +++
 src/include/replication/output_plugin.h   |  9 
 src/include/replication/reorderbuffer.h   |  7 +++
 6 files changed, 138 insertions(+), 1 deletion(-)

diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index ae5f397f351..a5dd80e0957 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -83,6 +83,9 @@ static void pg_decode_begin_prepare_txn(LogicalDecodingContext *ctx,
 static void pg_decode_prepare_txn(LogicalDecodingContext *ctx,
   ReorderBufferTXN *txn,
   XLogRecPtr prepare_lsn);
+static void pg_decode_concurrent_abort_txn(LogicalDecodingContext *ctx,
+		   ReorderBufferTXN *txn, bool streaming,
+		   XLogRecPtr lsn);
 static void pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx,
 		  ReorderBufferTXN *txn,
 		  XLogRecPtr commit_lsn);
@@ -137,6 +140,7 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
 	cb->change_cb = pg_decode_change;
 	cb->truncate_cb = pg_decode_truncate;
 	cb->commit_cb = pg_decode_commit_txn;
+	cb->concurrent_abort_cb = pg_decode_concurrent_abort_txn;
 	cb->filter_by_origin_cb = pg_decode_filter;
 	cb->shutdown_cb 

Re: [PATCH] Provide more information to filter_prepare

2021-03-25 Thread Markus Wanner

On 22.03.21 09:50, Markus Wanner wrote:
thank you for reconsidering this patch.  I updated it to include the 
required adjustments to the documentation.  Please review.


I tweaked the wording in the docs a bit, resulting in a v3 of this patch.

Regards

Markus
From: Markus Wanner 
Date: Tue, 2 Mar 2021 11:33:54 +0100
Subject: [PATCH] Add an xid argument to the filter_prepare callback for output plugins

---
 contrib/test_decoding/test_decoding.c |  4 ++-
 doc/src/sgml/logicaldecoding.sgml | 34 +++
 src/backend/replication/logical/decode.c  | 17 
 src/backend/replication/logical/logical.c |  5 ++--
 src/include/replication/logical.h |  3 +-
 src/include/replication/output_plugin.h   |  1 +
 6 files changed, 43 insertions(+), 21 deletions(-)

diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index ae5f397f351..de1b6926581 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -77,6 +77,7 @@ static void pg_decode_message(LogicalDecodingContext *ctx,
 			  bool transactional, const char *prefix,
 			  Size sz, const char *message);
 static bool pg_decode_filter_prepare(LogicalDecodingContext *ctx,
+	 TransactionId xid,
 	 const char *gid);
 static void pg_decode_begin_prepare_txn(LogicalDecodingContext *ctx,
 		ReorderBufferTXN *txn);
@@ -440,7 +441,8 @@ pg_decode_rollback_prepared_txn(LogicalDecodingContext *ctx,
  * substring, then we filter it out.
  */
 static bool
-pg_decode_filter_prepare(LogicalDecodingContext *ctx, const char *gid)
+pg_decode_filter_prepare(LogicalDecodingContext *ctx, TransactionId xid,
+		 const char *gid)
 {
 	if (strstr(gid, "_nodecode") != NULL)
 		return true;
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 80eb96d609a..f3ac84aa85a 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -794,20 +794,30 @@ typedef void (*LogicalDecodeMessageCB) (struct LogicalDecodingContext *ctx,
COMMIT PREPARED time. To signal that
decoding should be skipped, return true;
false otherwise. When the callback is not
-   defined, false is assumed (i.e. nothing is
-   filtered).
+   defined, false is assumed (i.e. no filtering, all
+   transactions using two-phase commit are decoded in two phases as well).
 
 typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx,
+  TransactionId xid,
   const char *gid);
 
-  The ctx parameter has the same contents as for the
-  other callbacks. The gid is the identifier that later
-  identifies this transaction for COMMIT PREPARED or
-  ROLLBACK PREPARED.
+   The ctx parameter has the same contents as for
+   the other callbacks. The parameters xid
+   and gid provide two different ways to identify
+   the transaction.  For some systems, the gid may
+   be sufficient.  However, reuse of the same gid
+   for example by a downstream node using multiple subscriptions may lead
+   to it not being a unique identifier.  Therefore, other systems combine
+   the xid with a node identifier to form a
+   globally unique transaction identifier.  The later COMMIT
+   PREPARED or ROLLBACK PREPARED carries both
+   identifiers, providing an output plugin the choice of what to use.
  
  
-  The callback has to provide the same static answer for a given
-  gid every time it is called.
+   The callback may be invoked several times per transaction to decode and
+   must provide the same static answer for a given pair of
+   xid and gid every time
+   it is called.
  
  
 
@@ -1219,9 +1229,11 @@ stream_commit_cb(...);  <-- commit of the streamed transaction

 

-Optionally the output plugin can specify a name pattern in the
-filter_prepare_cb and transactions with gid containing
-that name pattern will not be decoded as a two-phase commit transaction.
+Optionally the output plugin can define filtering rules via
+filter_prepare_cb to decode only specific transaction
+in two phases.  This can be achieved by pattern matching on the
+gid or via lookups using the
+xid.

 

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 5f596135b15..97be4b0f23f 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -80,7 +80,8 @@ static void DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 static void DecodeXLogTuple(char *data, Size len, ReorderBufferTupleBuf *tup);
 
 /* helper functions for decoding transactions */
-static inline bool FilterPrepare(LogicalDecodingContext *ctx, const char *gid);
+static inline bool FilterPrepare(LogicalDecodi

Re: multi-install PostgresNode

2021-03-25 Thread Peter Eisentraut

On 25.03.21 04:47, Michael Paquier wrote:

On Wed, Mar 24, 2021 at 03:33:51PM -0300, Alvaro Herrera wrote:

On 2021-Mar-24, Andrew Dunstan wrote:

+# The install path set in get_new_node needs to be a directory containing
+# bin and lib subdirectories as in a standard PostgreSQL installation, so this
+# can't be used with installations where the bin and lib directories don't have
+# a common parent directory.


I've never heard of an installation where that wasn't true.  If there
was a need for that, seems like it'd be possible to set them with
{ bindir => ..., libdir => ...} but I doubt it'll ever be necessary.


This would imply an installation with some fancy --bindir or --libdir
specified in ./configure.  Never say never, but I also think that what
has been committed is fine.


/usr/lib64/? /usr/lib/x86_64-linux-gnu/?  Seems pretty common.




Re: proposal: schema variables - doc

2021-03-25 Thread Pavel Stehule
Hi



po 22. 3. 2021 v 10:47 odesílatel Pavel Stehule 
napsal:

> Hi
>
> st 17. 3. 2021 v 13:05 odesílatel Erik Rijkers  napsal:
>
>>
>> > On 2021.03.13. 07:01 Pavel Stehule  wrote:
>> > Hi
>> > fresh rebase
>> > [schema-variables-20210313.patch.gz]
>>
>>
>> Hi Pavel,
>>
>> I notice that the phrase 'schema variable' is not in the index at the end
>> ('bookindex.html').  Not good.
>>
>> It is also not in the index at the front of the manual - also not good.
>>
>> Maybe these two (front and back index) can be added?
>>
>
> I inserted new indexterm "schema variable", and now this part of
> bookindex.html looks like:
>
> schema variable
>   altering, ALTER VARIABLE
>   changing, LET
>   defining, CREATE VARIABLE
>   description, Description
>   removing, DROP VARIABLE
>
>
>
>>
>> If a user searches the pdf, the first occurrence he finds is at:
>>
>>   43.13.2.4. Global variables and constants
>>   (in itself that occurrence/mention is all right, but is should not be
>> the first find, I think)
>>
>> (I think there was in earlier versions of the patch an entry in the
>> 'contents', i.e., at the front of the manual).  I think it would be good to
>> have it in the front-index, pointing to either LET or CREATE VARIABLE, or
>> maybe even to a small introductory paragraph somewhere else (again, I seem
>> to remember that there was one in an earlier patch version).
>>
>
>
> I wrote new section to "advanced features" about schema variables
>
>
>>
>>
>> Of the new commands that this patch brings, 'LET' is the most immediately
>> illuminating for a user (even though a CREATE VARIABLE has to be done
>> first).  There is an entry 'LET' in the index (good), but it would be
>> better if the phrase 'schema variable' also occurred with that LET entry.
>> (I don't know if that's possible.)
>>
>>
>> Then, in the CREATE VARIABLE paragraphs it says
>>'Changing a schema variable is non-transactional by default.'
>>
>> I think that, unless there exists a mode where schema vars can be made
>> transactional, 'by default' should be deleted (and there is no such
>> 'transactional mode' for schema variables, is there?).  The 'Description'
>> also has such a 'By default' which is better removed for the same reason.
>>
>
> fixed
>
>
>>
>> In the CREATE VARIABLE page the example is:
>>
>> CREATE VARIABLE var1 AS integer;
>> SELECT var1;
>>
>> I suggest to make that
>>
>> CREATE VARIABLE var1 AS date;
>> LET var1 = (select current_date);
>> SELECT var1;
>>
>> So that the example immediately shows an application of functionality.
>>
>
> done
>
> Thank you for the documentation review.
>
> Updated patch attached
>
> Regards
>
> Pavel
>
>
Fresh update with Erik's documentation changes merged.

Regards

Pavel


>
>>
>> Thanks,
>>
>> Erik Rijkers
>>
>> >
>> > Pavel
>>
>


schema-variables-20210325.patch.gz
Description: application/gzip


Re: wal stats questions

2021-03-25 Thread Kyotaro Horiguchi
At Wed, 24 Mar 2021 21:07:26 -0700, Andres Freund  wrote in 
> Hi,
> 
> On 2021-03-25 10:51:56 +0900, Masahiro Ikeda wrote:
> > On 2021/03/25 8:22, Andres Freund wrote:
> > > 1) What is the motivation to have both prevWalUsage and pgWalUsage,
> > >instead of just accumulating the stats since the last submission?
> > >There doesn't seem to be any comment explaining it? Computing
> > >differences to previous values, and copying to prev*, isn't free. I
> > >assume this is out of a fear that the stats could get reset before
> > >they're used for instrument.c purposes - but who knows?
> >
> > Is your point that it's better to call pgWalUsage = 0; after sending the
> > stats?
> 
> Yes. At least there should be a comment explaining why it's done the way
> it is.

pgWalUsage was previously used without being reset, and we (I) thought that
behavior should be preserved.  On second thought, as Andres suggested, we
can simply reset pgWalUsage at send time, since AFAICS nothing takes a
difference across pgstat_report_stat() calls.

> > > 3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
> > >just to figure out if there's been any changes isn't all that
> > >cheap. This is regularly exercised in read-only workloads too. Seems
> > >adding a boolean WalStats.have_pending = true or such would be
> > >better.
> >
> > I understand that for backends this may lead to performance degradation,
> > and that the problem affects not only WalStats but also SLRUStats.
> >
> > I wondered whether the degradation is really that big, because
> > pgstat_report_stat() checks that at least PGSTAT_STAT_INTERVAL msec has
> > passed since the last send before checking the diff.  If my understanding
> > is correct, getting a timestamp is the more expensive part.
> 
> Getting a timestamp is expensive, yes. But I think we need to check for
> the no-pending-wal-stats *before* the clock check. See the below:
> 
> 
> > > 4) For plain backends pgstat_report_wal() is called by
> > >pgstat_report_stat() - but it is not checked as part of the "Don't
> > >expend a clock check if nothing to do" check at the top.  It's
> > >probably rare to have pending wal stats without also passing one of
> > >the other conditions, but ...
> >
> > (I'm not confident my understanding of your comment is right.)
> > You mean that we need to expend a clock check in pgstat_report_wal()?
> > I think it's unnecessary because pgstat_report_stat() already checks it.
> 
> No, I mean that right now we might erroneously return early from
> pgstat_report_wal() for normal backends in some workloads:
> 
> void
> pgstat_report_stat(bool disconnect)
> ...
>   /* Don't expend a clock check if nothing to do */
>   if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) &&
>   pgStatXactCommit == 0 && pgStatXactRollback == 0 &&
>   !have_function_stats && !disconnect)
>   return;
> 
> might return if there only is pending WAL activity. This needs to check
> whether there are pending WAL stats. Which in turn means that the check
> should be fast.

Agreed that the condition is wrong.  On the other hand, the counters
are incremented in XLogInsertRecord(), and I think we don't want to add
instructions there.

If any WAL activity has been recorded, wal_records is always positive,
so we can check for WAL activity just by "pgWalUsage.wal_records > 0",
which should be fast enough.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: DETAIL for wrong scram password

2021-03-25 Thread Michael Paquier
On Tue, Mar 02, 2021 at 05:48:05PM +, Jacob Champion wrote:
> What would you think about adding the additional detail right after
> verify_client_proof() fails? I.e.

Agreed.  Having that once all the code paths have been taken and the
client proof has been verified looks more solid.  On top of what's
proposed, would it make sense to have a second logdetail for the case
of a mock authentication?  We don't log that yet, so I guess that it
could be useful for audit purposes?
--
Michael


signature.asc
Description: PGP signature


Re: fix typo in reorderbuffer.c

2021-03-25 Thread Michael Paquier
On Wed, Mar 24, 2021 at 10:52:14AM +, houzj.f...@fujitsu.com wrote:
> Sorry for the late reply, and yes, "Combo CID(s)" looks better.
> Attaching a patch which replaces all "Combocid(s)" styles with "Combo CID(s)".

I have double-checked the code, adjusted things a bit to adapt with
some of the context of the code, and applied it.  Thanks!
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] logical decoding of two-phase transactions

2021-03-25 Thread Peter Smith
On Thu, Mar 25, 2021 at 1:40 PM Peter Smith  wrote:
>
> On Wed, Mar 24, 2021 at 11:31 PM Amit Kapila  wrote:
> >
> > On Tue, Mar 23, 2021 at 3:31 PM Peter Smith  wrote:
> > >
> > > On Tue, Mar 23, 2021 at 10:44 AM Peter Smith  
> > > wrote:
> > > >
> > > > PSA patches to fix those.
> > > >
> > >
> > > Hi Amit.
> > >
> > > PSA a patch to allow the ALTER SUBSCRIPTION ... REFRESH PUBLICATION to
> > > work when two-phase tristate is PENDING.
> > >
> > > This is necessary for the pg_dump/pg_restore scenario, or for any
> > > other use-case where the subscription might
> > > start off having no tables.
> > >
> >
> > + subrels = GetSubscriptionRelations(MySubscription->oid);
> > +
> > + /*
> > + * If there are no tables then leave the state as PENDING, which
> > + * allows ALTER SUBSCRIPTION ... REFRESH PUBLICATION to work.
> > + */
> > + become_two_phase_enabled = list_length(subrels) > 0;
> >
> > This code is similar in both places it is used. Isn't it better to
> > move it inside AllTablesyncsReady? If required, we can then change
> > the name of the function.
>
> I agree. That way is better.
>
> PSA a patch which changes the AllTableSyncsReady function to now
> include the zero tables check.
>
> (This patch is to be applied on top of all previous patches)
>
> --

PSA a patch which modifies the FetchTableStates function to use a more
efficient way of testing if the subscription has any tables or not.

(This patch is to be applied on top of all previous v66* patches posted)

--
Kind Regards,
Peter Smith.
Fujitsu Australia.


v66-0006-FetchTableStates-performance-improvements.patch
Description: Binary data

