Re: shared-memory based stats collector

2021-04-01 Thread Kyotaro Horiguchi
At Thu, 1 Apr 2021 19:44:25 -0700, Andres Freund  wrote in 
> Hi,
> 
> I spent quite a bit more time working on the patch. There are large
> changes:
> 
> - postmaster/pgstats.c (which is an incorrect location now that it's not
>   a subprocess anymore) is split into utils/activity/pgstat.c and
>   utils/activity/pgstat_kind.c. I don't love the _kind name, but I
>   couldn't come up with anything better.

The location was not changed in order to keep the footprint smaller.  I agree
that the old place is not appropriate.  pgstat_kind...  How about changing
pgstat.c to pgstat_core.c and pgstat_kind.c to pgstat.c?

> - Implemented a new GUC, stats_fetch_consistency = {none, cache,
>   snapshot}. I think the code overhead of it is pretty ok - most of the
>   handling is entirely generalized.

Sounds good.
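As a reader's note, the intended difference between the three modes can be sketched in C. This is purely a hypothetical illustration of the semantics described above; none of these names exist in the actual pgstat code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical sketch of the three stats_fetch_consistency modes
 * discussed above.  "shared_counter" stands in for a stats value in
 * shared memory; the real implementation caches per-object entries.
 */
typedef enum { FETCH_NONE, FETCH_CACHE, FETCH_SNAPSHOT } FetchConsistency;

static int shared_counter = 0;  /* stand-in for shared-memory stats */
static int cached_value;
static bool cache_valid = false;

static int
fetch_stat(FetchConsistency mode)
{
    if (mode == FETCH_NONE)
        return shared_counter;  /* always re-read the shared value */

    /*
     * "cache" caches each object on first access; "snapshot" would
     * additionally materialize *all* objects at the first access.
     * With a single object the two behave the same.
     */
    if (!cache_valid)
    {
        cached_value = shared_counter;
        cache_valid = true;
    }
    return cached_value;
}
```

Resetting cache_valid would correspond to the end of a transaction, where any cached values are discarded.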

> - Most of the "per stats kind" handling is done in pgstat_kind.c. Nearly
>   all the rest is done through an array with per-stats-kind information
>   (extending what was already done with pgstat_sharedentsize etc).
>
> - There is no separate "pending stats" hash anymore. If there are
>   pending stats, they are referenced from 'pgStatSharedRefHash' (which
>   used to be the "lookup cache" hash). All the entries with pending
>   stats are in the doubly linked list pgStatPending.

Sounds reasonable. A bit similar to TabStatusArray.  Pending stats and
shared stats share the same key, so they are naturally consolidatable.

> - A stat's entry's lwlock, refcount, .. are moved into the dshash
>   entry. There is no need for them to be separate anymore. Also allows
>   to avoid doing some dsa lookups while holding dshash locks.
>
> - The dshash entries are not deleted until the refcount has reached
>   0. That's an important building block to avoid constantly re-creating
>   stats when flushing pending stats for a dropped object.

Does that mean the entries for a dropped object are actually dropped at
exit by the backend that has flushed the stats of the dropped object?
Sounds nice.
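A rough sketch of this refcount-based deferral, with invented names (the real code lives in the dshash machinery and is considerably more involved):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical sketch of the building block described above: dropping
 * a stats object only marks the entry dropped; the actual removal
 * happens when the last backend releases its reference (e.g. at exit,
 * after flushing its pending stats).  Names are illustrative only.
 */
typedef struct StatsEntry
{
    int  refcount;
    bool dropped;
    bool removed;       /* stands in for "deleted from the dshash" */
} StatsEntry;

static void
drop_entry(StatsEntry *e)
{
    e->dropped = true;
    if (e->refcount == 0)
        e->removed = true;      /* nobody references it: remove now */
}

static void
release_entry(StatsEntry *e)
{
    assert(e->refcount > 0);
    if (--e->refcount == 0 && e->dropped)
        e->removed = true;      /* last reference gone: remove now */
}
```

The point is that a backend still holding a reference (with pending stats to flush) cannot observe the entry disappearing and re-creating it.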

> - The reference to the shared entry is established the first time stats
>   for an object are reported. Together with the previous entry that
>   avoids nearly all the avenues for re-creating already dropped stats
>   (see below for the hole).
> 
> - I added a bunch of pg_regress style tests, and a larger amount of
>   isolationtester tests. The latter are possible due to a new
>   pg_stat_force_next_flush() function, avoiding the need to wait until
>   stats are submitted.
> 
> - 2PC support for "precise" dropping of stats has been added, the
>   collect_oids() based approach removed.

Cool!

> - lots of bugfixes, comments, etc...

Thanks for all of them.

> I know of one nontrivial issue that can lead to dropped stats being
> revived:
> 
> Within a transaction, a function can be called even when another
> transaction that dropped that function has already committed. I added a
> spec test reproducing the issue:
> 
> # FIXME: this shows the bug that stats will be revived, because the
> # shared stats in s2 is only referenced *after* the DROP FUNCTION
> # committed. That's only possible because there is no locking (and
> # thus no stats invalidation) around function calls.
> permutation
>   "s1_track_funcs_all" "s2_track_funcs_none"
>   "s1_func_call" "s2_begin" "s2_func_call" "s1_func_drop" 
> "s2_track_funcs_all" "s2_func_call" "s2_commit" "s2_ff" "s1_func_stats" 
> "s2_func_stats"
> 
> I think the best fix here would be to associate an xid with the dropped
> stats object, and only delete the dshash entry once there's no alive
> transaction with a horizon from before that xid...

I'm not sure how we could do that while avoiding a full scan of the dshash..

> There's also a second issue (stats for newly created objects surviving
> the transaction), but that's pretty simple to resolve.
> 
> Here's all the gory details of my changes happening incrementally:
> 
> https://github.com/anarazel/postgres/compare/master...shmstat
> 
> I'll squash and split tomorrow. Too tired for today.

Thank you very much for all of your immense effort.

> I think this is starting to look a lot better than what we have now. But
> I'm getting less confident that it's realistic to get any of this into
> PG14, given the state of the release cycle.
> 
> 
> 
> > I'm impressed by the way you resolved "who should load stats". Using
> > a static shared memory area to hold the pointer to the existing DSA
> > memory resolves the "first attacher problem".  However, I'm somewhat
> > doubtful about "who should write the stats file", but I think it is
> > reasonable in general.
> >
> > But the current place of calling pgstat_write_stats() is a bit too
> > early.  Checkpointer reports some stats *after* calling
> > ShutdownXLOG().  Perhaps we need to move it after the pg_stat_report_*()
> > calls in HandleCheckpointerInterrupts().
> 
> I now moved it into a before_shmem_exit(). I think that should avoid
> that problem?

I think so.

> > https://github.com/anarazel/postgres/commit/03824a236597c87c99d07aa14b9af9d6fe04dd37
> >

Fix typo in verify_heapam.c

2021-04-01 Thread Masahiko Sawada
Hi all,

I found typos in verify_heapam.c.

s/comitted/committed/

Please find an attached patch.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


fix_typo.patch
Description: Binary data


Re: libpq debug log

2021-04-01 Thread Kyotaro Horiguchi
At Fri, 02 Apr 2021 01:45:06 -0400, Tom Lane  wrote in 
> Kyotaro Horiguchi  writes:
> > At Fri, 2 Apr 2021 02:56:44 +, "iwata@fujitsu.com" 
> >  wrote in 
> >> Do ErrorResponse and NoticeResponse vary from test to test ...?
> >> If so, it seems difficult to make the trace logs usable for the
> >> regression test.
> >> I will also investigate the cause of this issue.
> 
> > The redacted fields F, L and R contained the source file, source line and
> > source function respectively. It is a reasonable guess that the
> > difference comes from them, but I'm not sure how they make a difference
> > of 50 bytes in length...
> 
> Some compilers include the full path of the source file in F ...

Ah. I suspected that, but dismissed the possibility after checking my
environment, gcc 8.  That makes complete sense. Thank you!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Replication slot stats misgivings

2021-04-01 Thread Bharath Rupireddy
On Fri, Apr 2, 2021 at 9:57 AM vignesh C  wrote:
> Thanks for the comments, I will fix the comments and provide a patch
> for this soon.

Here are some comments:
1) How about something like below
+(errmsg("skipping \"%s\" replication slot
statistics as the statistic collector process does not have enough
statistic slots",
instead of
+(errmsg("skipping \"%s\" replication
slot's statistic as the statistic collector process does not have
enough statistic slots",

2) Does it mean "pg_statistic slots" when we say "statistic slots" in
the above warning? If yes, why can't we use "pg_statistic slots"
instead of "statistic slots" as with another existing message
"insufficient pg_statistic slots for array stats"?

3) Should we change the if condition to max_replication_slots <=
nReplSlotStats instead of max_replication_slots == nReplSlotStats? The
scenario mentions that "one of the replication slots is dropped", but
will this issue occur when multiple replication slots are dropped?

4) Let's end the statement after this and start a new one, something like below
+ * this. To avoid writing beyond the max_replication_slots
instead of
+ * this, to avoid writing beyond the max_replication_slots

5) How about something like below
+ * this. To avoid writing beyond the max_replication_slots,
+ * this replication slot statistics information will
be skipped.
+ */
instead of
+ * this, to avoid writing beyond the max_replication_slots
+ * these replication slot statistic information will
be skipped.
+ */

6) Any specific reason to use a new local variable replSlotStat and
later memcpy into replSlotStats[nReplSlotStats]? Instead, we could
directly fread into &replSlotStats[nReplSlotStats] and do
memset(&replSlotStats[nReplSlotStats], 0,
sizeof(PgStat_ReplSlotStats)); before the warnings. As warning
scenarios seem to be infrequent, we could avoid doing the memcpy
every time.
-if (fread(&replSlotStats[nReplSlotStats], 1,
sizeof(PgStat_ReplSlotStats), fpin)
+if (fread(&replSlotStat, 1, sizeof(PgStat_ReplSlotStats), fpin)

+memcpy(&replSlotStats[nReplSlotStats], &replSlotStat,
sizeof(PgStat_ReplSlotStats));
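A minimal sketch of what suggestion 6 amounts to; the struct is a simplified stand-in for PgStat_ReplSlotStats, and the real warning path reports through ereport() rather than silently returning:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for PgStat_ReplSlotStats (hypothetical fields). */
typedef struct ReplSlotStats
{
    char name[64];
    long spill_bytes;
} ReplSlotStats;

/*
 * fread() directly into the destination array element, and memset() it
 * only on the (rare) short-read/warning path, instead of reading into
 * a local variable and memcpy()ing on every success.
 * Returns 1 when a full entry was read, 0 otherwise (entry zeroed).
 */
static int
read_slot_stats(FILE *fpin, ReplSlotStats *slots, int n)
{
    if (fread(&slots[n], 1, sizeof(ReplSlotStats), fpin)
        != sizeof(ReplSlotStats))
    {
        memset(&slots[n], 0, sizeof(ReplSlotStats));
        return 0;
    }
    return 1;
}
```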

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Replication slot stats misgivings

2021-04-01 Thread Amit Kapila
On Thu, Apr 1, 2021 at 6:19 PM Masahiko Sawada  wrote:
>
> On Tue, Mar 30, 2021 at 9:58 AM Andres Freund  wrote:
> >
> > IMO, independent of the shutdown / startup issue, it'd be worth writing
> > a patch tracking the bytes sent independently of the slot stats storage
> > issues. That would also make the testing for the above cheaper...
>
> Agreed.
>
> I think the bytes sent should be recorded by the decoding plugin, not
> by the core side. Given table filtering and row filtering, tracking
> the bytes passed to the decoding plugin would not help gauge the
> actual network I/O. In that sense, the description of stream_bytes
> in the doc seems not accurate:
>
> ---
> This and other streaming counters for this slot can be used to gauge
> the network I/O which occurred during logical decoding and allow
> tuning logical_decoding_work_mem.
> ---
>
> It can surely be used for tuning logical_decoding_work_mem, but that
> is not true for gauging the network I/O which occurred during
> logical decoding.
>

Agreed. I think we can adjust the wording accordingly.

-- 
With Regards,
Amit Kapila.




Re: libpq debug log

2021-04-01 Thread Kyotaro Horiguchi
At Fri, 02 Apr 2021 14:40:09 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Fri, 2 Apr 2021 02:56:44 +, "iwata@fujitsu.com" 
>  wrote in 
> > Hi Alvaro san
> > 
> > Thank you for your fix of trace log code.
> > 
> > > From: 'alvhe...@alvh.no-ip.org' 
> > > Sent: Friday, April 2, 2021 11:30 AM
> > ...
> > > It still didn't fix it!  Drongo is now reporting a difference in the 
> > > expected trace --
> > > and the differences all seem to be message lengths.
> > > Now that is pretty mysterious, because the messages themselves are printed
> > > identically.  Perl's output is pretty unhelpful, but I wrote them to a 
> > > file and diffed
> > > manually; it's attached.
> >
> > Do ErrorResponse and NoticeResponse vary from test to test ...?
> > If so, it seems difficult to make the trace logs usable for the
> > regression test.
> > I will also investigate the cause of this issue.
> 
> The redacted fields F, L and R contained the source file, source line and
> source function respectively. It is a reasonable guess that the
> difference comes from them, but I'm not sure how they make a difference
> of 50 bytes in length...
> 
> Anyway if the length is wrong, we should get an error after emitting
> the log line.
> 
> > if (logCursor - 1 != length)
> > fprintf(conn->Pfdebug,
> > "mismatched message length: consumed %d, 
> > expected %d\n",
> > logCursor - 1, length);
> 
> So, the cheapest measure for regression test would be just making the

So, the cheapest measure for regression test would be just *masking* the

> length field, while regress is true...

tired?

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: libpq debug log

2021-04-01 Thread Tom Lane
Kyotaro Horiguchi  writes:
> At Fri, 2 Apr 2021 02:56:44 +, "iwata@fujitsu.com" 
>  wrote in 
>> Do ErrorResponse and NoticeResponse vary from test to test ...?
>> If so, it seemed difficult to make the trace logs available for regression 
>> test. 
>> I will also investigate the cause of this issue.

> The redacted fields F, L and R contained the source file, source line and
> source function respectively. It is a reasonable guess that the
> difference comes from them, but I'm not sure how they make a difference
> of 50 bytes in length...

Some compilers include the full path of the source file in F ...

regards, tom lane




policies with security definer option for allowing inline optimization

2021-04-01 Thread Dan Lynch
RLS policy quals/checks are optimized inline, so I generally avoid
writing a separate procedure in order to let the optimizer do its thing.

However, if you need a security definer to avoid recursive RLS when
you're doing a more complex query, say on a join table, does anyone wish
there were a flag on the policy itself to specify that the `WITH CHECK`
or `USING` expression could be run via security definer?

The main reason for this is to avoid writing a separate security definer
function, so you can still benefit from the optimizer.

Is this possible? Would this be worth a feature request to postgres core?

Cheers!

Dan


Re: pgbench - add pseudo-random permutation function

2021-04-01 Thread Fabien COELHO



See attached v27 proposal.


As usual, it is easier to see with the actual attachment :-)

--
Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 50cf22ba6b..84d9566f49 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -1057,7 +1057,7 @@ pgbench  options  d
 
   
 default_seed 
-   seed used in hash functions by default
+   seed used in hash and pseudorandom permutation functions by default
   
 
   
@@ -1864,6 +1864,24 @@ SELECT 4 AS four \; SELECT 5 AS five \aset

   
 
+  
+   
+permute ( i, size [, seed ] )
+integer
+   
+   
+Permuted value of i, in the range
+[0, size).  This is the new position of
+i (modulo size) in a
+pseudorandom permutation of the integers 0...size-1,
+parameterized by seed.
+   
+   
+permute(0, 4)
+an integer between 0 and 3
+   
+  
+
   

 pi ()
@@ -2071,29 +2089,70 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
 

 
+   
+
+  When designing a benchmark which selects rows non-uniformly, be aware
+  that the rows chosen may be correlated with other data such as IDs from
+  a sequence or the physical row ordering, which may skew performance
+  measurements.
+
+
+  To avoid this, you may wish to use the permute
+  function, or some other additional step with similar effect, to shuffle
+  the selected rows and remove such correlations.
+
+   
+
   
 Hash functions hash, hash_murmur2 and
 hash_fnv1a accept an input value and an optional seed parameter.
 In case the seed isn't provided the value of :default_seed
 is used, which is initialized randomly unless set by the command-line
--D option. Hash functions can be used to scatter the
-distribution of random functions such as random_zipfian or
-random_exponential. For instance, the following pgbench
-script simulates possible real world workload typical for social media and
-blogging platforms where few accounts generate excessive load:
+-D option.
+  
+
+  
+permute accepts an input value, a size, and an optional
+seed parameter.  It generates a pseudorandom permutation of integers in
+the range [0, size), and returns the index of the input
+value in the permuted values.  The permutation chosen is parameterized by
+the seed, which defaults to :default_seed, if not
+specified.  Unlike the hash functions, permute ensures
+that there are no collisions or holes in the output values.  Input values
+outside the interval are interpreted modulo the size.  The function raises
+an error if the size is not positive.  permute can be
+used to scatter the distribution of non-uniform random functions such as
+random_zipfian or random_exponential
+so that values drawn more often are not trivially correlated.  For
+instance, the following pgbench script
+simulates a possible real world workload typical for social media and
+blogging platforms where a few accounts generate excessive load:
 
 
-\set r random_zipfian(0, 1, 1.07)
-\set k abs(hash(:r)) % 100
+\set size 100
+\set r random_zipfian(1, :size, 1.07)
+\set k 1 + permute(:r, :size)
 
 
 In some cases several distinct distributions are needed which don't correlate
-with each other and this is when implicit seed parameter comes in handy:
+with each other and this is when the optional seed parameter comes in handy:
 
 
-\set k1 abs(hash(:r, :default_seed + 123)) % 100
-\set k2 abs(hash(:r, :default_seed + 321)) % 100
+\set k1 1 + permute(:r, :size, :default_seed + 123)
+\set k2 1 + permute(:r, :size, :default_seed + 321)
 
+
+A similar behavior can also be approximated with hash:
+
+
+\set size 100
+\set r random_zipfian(1, 100 * :size, 1.07)
+\set k 1 + abs(hash(:r)) % :size
+
+
+However, since hash generates collisions, some values
+will not be reachable and others will be more frequent than expected from
+the original distribution.
   
 
   
diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index 4d529ea550..56f75ccd25 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -19,6 +19,7 @@
 #define PGBENCH_NARGS_VARIABLE	(-1)
 #define PGBENCH_NARGS_CASE		(-2)
 #define PGBENCH_NARGS_HASH		(-3)
+#define PGBENCH_NARGS_PERMUTE	(-4)
 
 PgBenchExpr *expr_parse_result;
 
@@ -370,6 +371,9 @@ static const struct
 	{
 		"hash_fnv1a", PGBENCH_NARGS_HASH, PGBENCH_HASH_FNV1A
 	},
+	{
+		"permute", PGBENCH_NARGS_PERMUTE, PGBENCH_PERMUTE
+	},
 	/* keep as last array element */
 	{
 		NULL, 0, 0
@@ -482,6 +486,19 @@ make_func(yyscan_t yyscanner, int fnumber, PgBenchExprList *args)
 			}
 			break;
 
+		/* pseudorandom permutation function with optional seed argument */
+		case PGBENCH_NARGS_PERMUTE:
+			if (len < 2 |

Re: libpq debug log

2021-04-01 Thread Kyotaro Horiguchi
At Fri, 2 Apr 2021 02:56:44 +, "iwata@fujitsu.com" 
 wrote in 
> Hi Alvaro san
> 
> Thank you for your fix of trace log code.
> 
> > From: 'alvhe...@alvh.no-ip.org' 
> > Sent: Friday, April 2, 2021 11:30 AM
> ...
> > It still didn't fix it!  Drongo is now reporting a difference in the 
> > expected trace --
> > and the differences all seem to be message lengths.
> > Now that is pretty mysterious, because the messages themselves are printed
> > identically.  Perl's output is pretty unhelpful, but I wrote them to a file 
> > and diffed
> > manually; it's attached.
>
> Do ErrorResponse and NoticeResponse vary from test to test ...?
> If so, it seems difficult to make the trace logs usable for the
> regression test.
> I will also investigate the cause of this issue.

The redacted fields F, L and R contained the source file, source line and
source function respectively. It is a reasonable guess that the
difference comes from them, but I'm not sure how they make a difference
of 50 bytes in length...

Anyway if the length is wrong, we should get an error after emitting
the log line.

>   if (logCursor - 1 != length)
>   fprintf(conn->Pfdebug,
>   "mismatched message length: consumed %d, 
> expected %d\n",
>   logCursor - 1, length);

So, the cheapest measure for regression test would be just making the
length field, while regress is true...

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Fix pg_checksums progress report

2021-04-01 Thread Fujii Masao




On 2021/04/02 14:23, shinya11.k...@nttdata.com wrote:

Hi,

I found a problem in pg_checksums.c.

The total_size is calculated by scanning the directory.
The current_size is calculated by scanning the files, but it does not
include the size of new pages.

This may cause the pg_checksums progress report to never reach 100%.
I have attached a patch that fixes this.


Thanks for the report and patch!

I could reproduce this issue and confirmed that the patch fixes it.

Regarding the patch, I think that it's better to add a comment about
why current_size needs to include new pages.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: pgbench - add pseudo-random permutation function

2021-04-01 Thread Fabien COELHO



  r = (uint64) (pg_erand48(random_state.xseed) * size);

I do not understand why the random values are multiplied by anything in
the first place…


These are just random integers in the range [0,mask] and [0,size-1],
formed in exactly the same way as getrand().


Indeed, erand returns a double; this was the part I was missing. I did not
realize that you had switched to doubles in your approach.


I think that permute should only use integer operations. I'd suggest using
one of the integer variants instead of going through a double
computation and casting back to int. The internal state is based on
integers; I do not see the added value of going through floats, possibly
enduring floating point issues (underflow, rounding, normalization,
whatever) on the way, whereas from start to finish we just need ints.


See attached v27 proposal.

I still think that *rand48 is a poor (relatively small state) and
inefficient (the implementation includes packing and unpacking 16-bit
ints to build a 64-bit int) choice.
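For what it's worth, a uniform integer draw in [0, size) can be made without touching floating point at all, using rejection sampling. This is only an illustration of the integer-only idea; splitmix64 here is a stand-in generator, not what pgbench uses:

```c
#include <assert.h>
#include <stdint.h>

/* splitmix64: a small 64-bit PRNG used here purely as a stand-in. */
static uint64_t
splitmix64(uint64_t *state)
{
    uint64_t z = (*state += 0x9e3779b97f4a7c15ULL);

    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

/*
 * Unbiased draw in [0, size) using only integer operations: reject the
 * low (2^64 mod size) raw values so the remaining range divides evenly
 * into size buckets.  size must be > 0.
 */
static uint64_t
randrange(uint64_t *state, uint64_t size)
{
    uint64_t threshold = (0 - size) % size;     /* == 2^64 mod size */
    uint64_t r;

    do
        r = splitmix64(state);
    while (r < threshold);
    return r % size;
}
```

The rejection loop almost never iterates more than once for small sizes, so there is no meaningful performance cost compared with the double round trip.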


--
Fabien.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-01 Thread Julien Rouhaud
On Thu, Apr 01, 2021 at 03:27:11PM -0400, Bruce Momjian wrote:
> 
> OK, I am happy with your design decisions, thanks.

Thanks!  While double checking, I noticed that I failed to remove a (now)
useless include of pgstat.h in nodeGatherMerge.c in the last version.  I'm
attaching v22 to fix that, no other change.
From 819a45faf520dfd60b4fe3e9aea71e3a2b69 Mon Sep 17 00:00:00 2001
From: Bruce Momjian 
Date: Mon, 22 Mar 2021 17:43:22 -0400
Subject: [PATCH v22 1/3] Move pg_stat_statements query jumbling to core.

A new compute_query_id GUC is also added, to control whether a query identifier
should be computed by the core.  It's therefore now possible to disable core
queryid computation and use pg_stat_statements with a different algorithm to
compute the query identifier, by using a third-party module.

To ensure that a single, well-defined source of query identifiers is used,
modules that calculate a query identifier should throw an error if
compute_query_id is enabled or if a query identifier was already calculated.
---
 .../pg_stat_statements/pg_stat_statements.c   | 805 +
 .../pg_stat_statements.conf   |   1 +
 doc/src/sgml/config.sgml  |  25 +
 doc/src/sgml/pgstatstatements.sgml|  20 +-
 src/backend/parser/analyze.c  |  14 +-
 src/backend/tcop/postgres.c   |   6 +-
 src/backend/utils/misc/Makefile   |   1 +
 src/backend/utils/misc/guc.c  |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c  | 834 ++
 src/include/parser/analyze.h  |   4 +-
 src/include/utils/guc.h   |   1 +
 src/include/utils/queryjumble.h   |  58 ++
 13 files changed, 995 insertions(+), 785 deletions(-)
 create mode 100644 src/backend/utils/misc/queryjumble.c
 create mode 100644 src/include/utils/queryjumble.h

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 62cccbfa44..bd8c96728c 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -8,24 +8,9 @@
  * a shared hashtable.  (We track only as many distinct queries as will fit
  * in the designated amount of shared memory.)
  *
- * As of Postgres 9.2, this module normalizes query entries.  Normalization
- * is a process whereby similar queries, typically differing only in their
- * constants (though the exact rules are somewhat more subtle than that) are
- * recognized as equivalent, and are tracked as a single entry.  This is
- * particularly useful for non-prepared queries.
- *
- * Normalization is implemented by fingerprinting queries, selectively
- * serializing those fields of each query tree's nodes that are judged to be
- * essential to the query.  This is referred to as a query jumble.  This is
- * distinct from a regular serialization in that various extraneous
- * information is ignored as irrelevant or not essential to the query, such
- * as the collations of Vars and, most notably, the values of constants.
- *
- * This jumble is acquired at the end of parse analysis of each query, and
- * a 64-bit hash of it is stored into the query's Query.queryId field.
- * The server then copies this value around, making it available in plan
- * tree(s) generated from the query.  The executor can then use this value
- * to blame query costs on the proper queryId.
+ * Starting in Postgres 9.2, this module normalized query entries.  As of
+ * Postgres 14, the normalization is done by the core if compute_query_id is
+ * enabled, or optionally by third-party modules.
  *
  * To facilitate presenting entries to users, we create "representative" query
  * strings in which constants are replaced with parameter symbols ($n), to
@@ -114,8 +99,6 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
-#define JUMBLE_SIZE1024	/* query serialization buffer size */
-
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -235,40 +218,6 @@ typedef struct pgssSharedState
 	pgssGlobalStats stats;		/* global statistics for pgss */
 } pgssSharedState;
 
-/*
- * Struct for tracking locations/lengths of constants during normalization
- */
-typedef struct pgssLocationLen
-{
-	int			location;		/* start offset in query text */
-	int			length;			/* length in bytes, or -1 to ignore */
-} pgssLocationLen;
-
-/*
- * Working state for computing a query jumble and producing a normalized
- * query string
- */
-typedef struct pgssJumbleState
-{
-	/* Jumble of current query tree */
-	unsigned char *jumble;
-
-	/* Number of bytes used in jumble[] */
-	Size		jumble_len;
-
-	/* Array of locations of constants that should be removed */
-	pgssLocationLen *clocations;

RE: [bug fix] ALTER TABLE SET LOGGED/UNLOGGED on a partitioned table does nothing silently

2021-04-01 Thread tsunakawa.ta...@fujitsu.com
From: Bharath Rupireddy 
> Attaching a small patch that emits a warning when the persistence setting of a
> partitioned table is changed, and also added a note to the docs. Please have
> a look at it.

Thank you.  However, I think I'll withdraw this CF entry, since I'm not sure I
can find the time for now to research and fix the other various ALTER
TABLE/INDEX issues.  I'll mark this as withdrawn around the middle of next week
unless someone wants to continue this.


Regards
Takayuki Tsunakawa




Fix pg_checksums progress report

2021-04-01 Thread Shinya11.Kato
Hi,

I found a problem in pg_checksums.c.

The total_size is calculated by scanning the directory.
The current_size is calculated by scanning the files, but it does not
include the size of new pages.

This may cause the pg_checksums progress report to never reach 100%.
I have attached a patch that fixes this.

Regards,
Shinya Kato


fix_pg_checksums_progress_report.patch
Description: fix_pg_checksums_progress_report.patch


RE: Add client connection check during the execution of the query

2021-04-01 Thread tsunakawa.ta...@fujitsu.com
From: Thomas Munro 
> > Following PostmasterIsAlive(), maybe it's better to name it
> > pq_connection_is_alive(), pq_client_is_alive(), or pq_frontend_is_alive(),
> > like pqcomm.c's head comment uses the word frontend?
> 
> I think it's OK, because it matches the name of the GUC.  I'm more concerned
> about the name of the GUC.  Will we still be happy with this name if a future
> release sends a heartbeat message?  I think that is still OK, so I'm happy
> with these names for now, but if someone has a better name, please speak up
> very soon.

OK, agreed.


> > (4)
> > I think the new GUC works for walsender as well.  If so, how do we
> > explain the relationship between the new GUC and wal_receiver_timeout,
> > and recommend settings for them?
> 
> No, it only works while executing a query.  (Is there something in logical
> decoding, perhaps, that I have failed to consider?)

When I saw the following code, I thought the new GUC takes effect at the same
time as wal_sender_timeout.  But they don't become effective simultaneously.
The new GUC is effective before the logical replication starts, and
wal_sender_timeout takes effect after the physical/logical replication starts.
So there's no problem.

[PostgresMain]
if (am_walsender)
{
if (!exec_replication_command(query_string))
exec_simple_query(query_string);
}
else
exec_simple_query(query_string);


The patch looks committable to me.


> PS The "from" headers in emails received from Fujitsu seem to have the
> names stripped, somewhere in the tubes of the internet.  I see the full
> version when people from Fujitsu quote other people from Fujitsu.
> I copied one of those into the commit message, complete with its magnificent
> kanji characters (perhaps these are the cause of the filtering?), and I hope
> that's OK with you.

Certainly, it seems that only my email address appears in the sender field
in my mailer.  I'll check my Outlook settings.


Regards
Takayuki Tsunakawa




Re: Proposal: Save user's original authenticated identity for logging

2021-04-01 Thread Michael Paquier
On Fri, Apr 02, 2021 at 12:03:21AM +, Jacob Champion wrote:
> On Thu, 2021-04-01 at 10:21 +0900, Michael Paquier wrote:
> > This stuff can take advantage of 0d1a3343, and I
> > think that we should make the kerberos, ldap, authentication and SSL
> > test suites just use connect_ok() and connect_fails() from
> > PostgresNode.pm.  They just need to be extended a bit with a new
> > argument for the log pattern check.
> 
> v16, attached, migrates all tests in those suites to connect_ok/fails
> (in the first two patches), and also adds the log pattern matching (in
> the final feature patch).

Thanks.  I have been looking at 0001 and 0002, and found the addition
of %params to connect_ok() and connect_fails() confusing at first, as
it is only required for the 12th test of 001_password.pl (failure to
grab a password for md5_role when it is not located in a pgpass file and
PGPASSWORD is not set).  Instead of falling into a trap where the tests
could remain stuck, I think that we could just pass down -w from
connect_ok() and connect_fails() to PostgresNode::psql.

This change also made the parameter handling of the kerberos tests
more confusing on two points:
- PostgresNode::psql takes a query as an argument, so there was a mix-up
where the query was passed down within the set of parameters, but then
removed from the list.
- PostgresNode::psql already uses -XAt, so there is no need to define
it again.

> A since-v15 diff is attached, but it should be viewed with suspicion
> since I've rebased on top of the new SSL tests at the same time.

That did not seem that suspicious to me ;)

Anyway, after looking at 0003, the main patch, it becomes quite clear
that matching logs with like() or unlike() is much more elegant once we
make use of parameters in connect_ok() and connect_fails(), but I think
that it is a mistake to blindly pass the parameters down to psql and
delete some of them on the way while keeping the others.  The existing
code of HEAD only requires a SQL query or some expected stderr or stdout
output, so let's make all that parameterized first.

Attached is what I have come up with as the first building piece,
which is basically a combination of 0001 and 0002, except that I
modified things so that the number of arguments remains minimal for all
the routines.  This avoids the manipulation of the list of parameters
passed down to PostgresNode::psql.  The arguments for the optional
query, the expected stdout and stderr are part of the parameter set
(0001 was not doing that).  For the main patch, this will need to be
extended with two more parameters in each routine: log_like and
log_unlike to match for the log patterns, handled as arrays of
regexes.  That's what 0003 is basically doing already.

As a whole, this is a consolidation of its own, so let's apply this
part first.
--
Michael
From a184fe1e6488ca9a5f7c98ab02ae428e05a84449 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 2 Apr 2021 13:44:39 +0900
Subject: [PATCH v18] Plug more TAP test suites with PostgresNode's new
 routines

This switches the kerberos, ldap and authentication test suites to use
connect_ok() and connect_fails(), recently introduced in PostgresNode.pm.
The SSL tests need some extra juggling to accommodate the changes.
---
 src/test/authentication/t/001_password.pl |  17 +++-
 src/test/authentication/t/002_saslprep.pl |  18 +++-
 src/test/kerberos/t/001_auth.pl   |  41 +++-
 src/test/ldap/t/001_auth.pl   |  15 ++-
 src/test/perl/PostgresNode.pm |  67 +---
 src/test/ssl/t/001_ssltests.pl| 118 +++---
 src/test/ssl/t/002_scram.pl   |  16 +--
 7 files changed, 170 insertions(+), 122 deletions(-)

diff --git a/src/test/authentication/t/001_password.pl b/src/test/authentication/t/001_password.pl
index 36a616d7c7..4d5e304de1 100644
--- a/src/test/authentication/t/001_password.pl
+++ b/src/test/authentication/t/001_password.pl
@@ -46,12 +46,19 @@ sub test_role
 
 	$status_string = 'success' if ($expected_res eq 0);
 
-	local $Test::Builder::Level = $Test::Builder::Level + 1;
+	my $connstr = "user=$role";
+	my $testname = "authentication $status_string for method $method, role $role";
 
-	my $res = $node->psql('postgres', undef, extra_params => [ '-U', $role, '-w' ]);
-	is($res, $expected_res,
-		"authentication $status_string for method $method, role $role");
-	return;
+	if ($expected_res eq 0)
+	{
+		$node->connect_ok($connstr, $testname);
+	}
+	else
+	{
+		# No match pattern checks are done here on errors, only the
+		# status code.
+		$node->connect_fails($connstr, $testname);
+	}
 }
 
 # Initialize primary node
diff --git a/src/test/authentication/t/002_saslprep.pl b/src/test/authentication/t/002_saslprep.pl
index 0aaab090ec..530344a5d6 100644
--- a/src/test/authentication/t/002_saslprep.pl
+++ b/src/test/authentication/t/002_saslprep.pl
@@ -41,12 +41,20 @@ sub test_login
 
 	$status_string = 'success' if ($expected_res eq 0);
 
+	my $connstr = "user=$role"

Re: [bug fix] ALTER TABLE SET LOGGED/UNLOGGED on a partitioned table does nothing silently

2021-04-01 Thread Bharath Rupireddy
On Thu, Mar 25, 2021 at 10:21 PM David Steele  wrote:
> This thread seems to be stalled. Since it represents a bug fix is there
> something we can do in the meantime while a real solution is worked out?
>
> Perhaps just inform the user that nothing was done? Or get something
> into the documentation?

Attaching a small patch that emits a warning when the persistence
setting of a partitioned table is changed, and also adds a note to
the docs. Please have a look at it.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


v1-0001-Emit-warning-when-partitioned-table-s-persistence.patch
Description: Binary data


Re: Replication slot stats misgivings

2021-04-01 Thread vignesh C
On Fri, Apr 2, 2021 at 9:29 AM Masahiko Sawada  wrote:
>
> On Fri, Apr 2, 2021 at 1:55 AM vignesh C  wrote:
> >
> > On Thu, Apr 1, 2021 at 5:58 PM Amit Kapila  wrote:
> > >
> > > On Thu, Apr 1, 2021 at 3:43 PM vignesh C  wrote:
> > > >
> > > > On Wed, Mar 31, 2021 at 11:32 AM vignesh C  wrote:
> > > > >
> > > > > On Tue, Mar 30, 2021 at 11:00 AM Andres Freund  
> > > > > wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > On 2021-03-30 10:13:29 +0530, vignesh C wrote:
> > > > > > > On Tue, Mar 30, 2021 at 6:28 AM Andres Freund 
> > > > > > >  wrote:
> > > > > > > > Any chance you could write a tap test exercising a few of these 
> > > > > > > > cases?
> > > > > > >
> > > > > > > I can try to write a patch for this if nobody objects.
> > > > > >
> > > > > > Cool!
> > > > > >
> > > > >
> > > > > Attached a patch which has the test for the first scenario.
> > > > >
> > > > > > > > E.g. things like:
> > > > > > > >
> > > > > > > > - create a few slots, drop one of them, shut down, start up, 
> > > > > > > > verify
> > > > > > > >   stats are still sane
> > > > > > > > - create a few slots, shut down, manually remove a slot, lower
> > > > > > > >   max_replication_slots, start up
> > > > > > >
> > > > > > > Here by "manually remove a slot", do you mean to remove the slot
> > > > > > > manually from the pg_replslot folder?
> > > > > >
> > > > > > Yep - thereby allowing max_replication_slots after the 
> > > > > > shutdown/start to
> > > > > > be lower than the number of slots-stats objects.
> > > > >
> > > > > I have not included the 2nd test in the patch as the test fails with
> > > > > following warnings and also displays the statistics of the removed
> > > > > slot:
> > > > > WARNING:  problem in alloc set Statistics snapshot: detected write
> > > > > past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438
> > > > > WARNING:  problem in alloc set Statistics snapshot: detected write
> > > > > past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438
> > > > >
> > > > > This happens because the statistics file has an additional slot
> > > > > present even though the replication slot was removed.  I felt this
> > > > > issue should be fixed. I will try to fix this issue and send the
> > > > > second test along with the fix.
> > > >
> > > > I felt from the statistics collector process, there is no way in which
> > > > we can identify if the replication slot is present or not because the
> > > > statistic collector process does not have access to shared memory.
> > > > Anything that the statistic collector process does independently by
> > > > traversing and removing the statistics of the replication slot
> > > > exceeding the max_replication_slot has its drawback of removing some
> > > > valid replication slot's statistics data.
> > > > Any thoughts on how we can identify the replication slot which has been 
> > > > dropped?
> > > > Can someone point me to the shared stats patch link with which message
> > > > loss can be avoided. I wanted to see a scenario where something like
> > > > the slot is dropped but the statistics are not updated because of an
> > > > immediate shutdown or server going down abruptly can occur or not with
> > > > the shared stats patch.
> > > >
> > >
> > > I don't think it is easy to simulate a scenario where the 'drop'
> > > message is dropped and I think that is why the test contains the step
> > > to manually remove the slot. At this stage, you can probably provide a
> > > test patch and a code-fix patch where it just drops the extra slots
> > > from the stats file. That will allow us to test it with a shared
> > > memory stats patch on which Andres and Horiguchi-San are working. If
> > > we still continue to pursue with current approach then as Andres
> > > suggested we might send additional information from
> > > RestoreSlotFromDisk to keep it in sync.
> >
> > Thanks for your comments, the attached patch has the fix for the same.
> > Also attached are a couple more patches which address the comments
> > which Andres had listed, i.e. changing char to NameData type and also to
> > display the unspilled/unstreamed transaction information in the
> > replication statistics.
> > Thoughts?
>
> Thank you for the patches!
>
> I've looked at those patches and here are some comments on 0001, 0002,
> and 0003 patch:
>
> 0001 patch:
>
> -   values[0] = PointerGetDatum(cstring_to_text(s->slotname));
> +   values[0] = PointerGetDatum(cstring_to_text(s->slotname.data));
>
> We can use NameGetDatum() instead.
>
> ---
> 0002 patch:
>
> The patch uses logical replication to test replication slots
> statistics but I don't think it's necessary. It would be simpler
> to use logical decoding. Maybe we can add TAP tests to
> contrib/test_decoding.
>
> ---
> 0003 patch:
>
>  void
>  pgstat_report_replslot(const char *slotname, int spilltxns, int spillcount,
> -  int spillbytes, int streamtxns, int
> streamcount, int streambytes)
> +  int sp

Re: Replication slot stats misgivings

2021-04-01 Thread Masahiko Sawada
On Fri, Apr 2, 2021 at 1:55 AM vignesh C  wrote:
>
> On Thu, Apr 1, 2021 at 5:58 PM Amit Kapila  wrote:
> >
> > On Thu, Apr 1, 2021 at 3:43 PM vignesh C  wrote:
> > >
> > > On Wed, Mar 31, 2021 at 11:32 AM vignesh C  wrote:
> > > >
> > > > On Tue, Mar 30, 2021 at 11:00 AM Andres Freund  
> > > > wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > On 2021-03-30 10:13:29 +0530, vignesh C wrote:
> > > > > > On Tue, Mar 30, 2021 at 6:28 AM Andres Freund  
> > > > > > wrote:
> > > > > > > Any chance you could write a tap test exercising a few of these 
> > > > > > > cases?
> > > > > >
> > > > > > I can try to write a patch for this if nobody objects.
> > > > >
> > > > > Cool!
> > > > >
> > > >
> > > > Attached a patch which has the test for the first scenario.
> > > >
> > > > > > > E.g. things like:
> > > > > > >
> > > > > > > - create a few slots, drop one of them, shut down, start up, 
> > > > > > > verify
> > > > > > >   stats are still sane
> > > > > > > - create a few slots, shut down, manually remove a slot, lower
> > > > > > >   max_replication_slots, start up
> > > > > >
> > > > > > Here by "manually remove a slot", do you mean to remove the slot
> > > > > > manually from the pg_replslot folder?
> > > > >
> > > > > Yep - thereby allowing max_replication_slots after the shutdown/start 
> > > > > to
> > > > > be lower than the number of slots-stats objects.
> > > >
> > > > I have not included the 2nd test in the patch as the test fails with
> > > > following warnings and also displays the statistics of the removed
> > > > slot:
> > > > WARNING:  problem in alloc set Statistics snapshot: detected write
> > > > past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438
> > > > WARNING:  problem in alloc set Statistics snapshot: detected write
> > > > past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438
> > > >
> > > > This happens because the statistics file has an additional slot
> > > > present even though the replication slot was removed.  I felt this
> > > > issue should be fixed. I will try to fix this issue and send the
> > > > second test along with the fix.
> > >
> > > I felt from the statistics collector process, there is no way in which
> > > we can identify if the replication slot is present or not because the
> > > statistic collector process does not have access to shared memory.
> > > Anything that the statistic collector process does independently by
> > > traversing and removing the statistics of the replication slot
> > > exceeding the max_replication_slot has its drawback of removing some
> > > valid replication slot's statistics data.
> > > Any thoughts on how we can identify the replication slot which has been 
> > > dropped?
> > > Can someone point me to the shared stats patch link with which message
> > > loss can be avoided. I wanted to see a scenario where something like
> > > the slot is dropped but the statistics are not updated because of an
> > > immediate shutdown or server going down abruptly can occur or not with
> > > the shared stats patch.
> > >
> >
> > I don't think it is easy to simulate a scenario where the 'drop'
> > message is dropped and I think that is why the test contains the step
> > to manually remove the slot. At this stage, you can probably provide a
> > test patch and a code-fix patch where it just drops the extra slots
> > from the stats file. That will allow us to test it with a shared
> > memory stats patch on which Andres and Horiguchi-San are working. If
> > we still continue to pursue with current approach then as Andres
> > suggested we might send additional information from
> > RestoreSlotFromDisk to keep it in sync.
>
> Thanks for your comments, the attached patch has the fix for the same.
> Also attached are a couple more patches which address the comments
> which Andres had listed, i.e. changing char to NameData type and also to
> display the unspilled/unstreamed transaction information in the
> replication statistics.
> Thoughts?

Thank you for the patches!

I've looked at those patches and here are some comments on 0001, 0002,
and 0003 patch:

0001 patch:

-   values[0] = PointerGetDatum(cstring_to_text(s->slotname));
+   values[0] = PointerGetDatum(cstring_to_text(s->slotname.data));

We can use NameGetDatum() instead.

---
0002 patch:

The patch uses logical replication to test replication slots
statistics but I don't think it's necessary. It would be simpler
to use logical decoding. Maybe we can add TAP tests to
contrib/test_decoding.

---
0003 patch:

 void
 pgstat_report_replslot(const char *slotname, int spilltxns, int spillcount,
-  int spillbytes, int streamtxns, int
streamcount, int streambytes)
+  int spillbytes, int streamtxns, int streamcount,
+  int streambytes, int totaltxns, int totalbytes)
 {

As Andres pointed out, we should use a struct of stats updates rather
than adding more arguments to pgstat_report_replslot().

---
+ 
+  
+tota

Re: New IndexAM API controlling index vacuum strategies

2021-04-01 Thread Peter Geoghegan
On Thu, Apr 1, 2021 at 6:14 AM Robert Haas  wrote:
> Without offering an opinion on this particular implementation choice,
> +1 for the idea of trying to make the table-with-indexes and the
> table-without-indexes cases work in ways that will feel similar to the
> user. Tables without indexes are probably rare in practice, but if
> some behaviors are implemented for one case and not the other, it will
> probably be confusing. One thought here is that it might help to try
> to write documentation for whatever behavior you choose. If it's hard
> to document without weasel-words, maybe it's not the best approach.

I have found a way to do this that isn't too painful, a little like
the VACUUM_FSM_EVERY_PAGES thing.

I've also found a way to further simplify the table-without-indexes
case: make it behave like a regular two-pass/has-indexes VACUUM with
regard to visibility map stuff when the page doesn't need a call to
lazy_vacuum_heap() (because there are no LP_DEAD items to set
LP_UNUSED on the page following pruning). But when it does call
lazy_vacuum_heap(), the call takes care of everything for
lazy_scan_heap(), which just continues to the next page due to
considering prunestate to have been "invalidated" by the call to
lazy_vacuum_heap(). So there is absolutely minimal special case code
for the table-without-indexes case now.

BTW I removed all of the lazy_scan_heap() utility functions from the
second patch in my working copy of the patch series. You were right
about that -- they weren't useful. We should just have the pruning
wrapper function I've called lazy_scan_prune(), not any of the others.
We only need one local variable in the lazy_vacuum_heap() that isn't
either the prune state set/returned by lazy_scan_prune(), or generic
stuff like a Buffer variable.

-- 
Peter Geoghegan




Re: Add client connection check during the execution of the query

2021-04-01 Thread Thomas Munro
On Fri, Apr 2, 2021 at 1:36 AM Bharath Rupireddy
 wrote:
> Here's a minor comment: it would be good if we have an extra line
> after variable assignments, before and after function calls/if
> clauses, something like

Done in v11.  Thanks.




Re: Add client connection check during the execution of the query

2021-04-01 Thread Thomas Munro
On Thu, Apr 1, 2021 at 10:16 PM tsunakawa.ta...@fujitsu.com
 wrote:
> From: Thomas Munro 
> > I changed my mind.  Let's commit the pleasingly simple Linux-only feature 
> > for
> > now, and extend to it to send some kind of no-op message in a later release.
> > So this is the version I'd like to go with.
> > Objections?
>
> +1, as some of our users experienced the problem that the server kept 
> processing (IIRC, a buggy PL/pgSQL procedure that loops indefinitely) after 
> they killed the client.

Cool.  Yeah, I have seen a few variants of that, and several other
complaints on the lists.

> TBH, Linux and Windows will be sufficient.  But I'm for providing a good 
> feature on a specific OS first.

I discovered that at least one other OS has adopted POLLRDHUP, so I
changed the language to something slightly more general:

+   
+This option is currently available only on systems that support the
+non-standard POLLRDHUP extension to the
+poll system call, including Linux.
+   

It seems like it must be quite easy for an OS to implement, since the
TCP stack surely has the information... it's just an API problem.
Hopefully that means that there aren't OSes that define the macro but
don't work the same way.  (I read somewhere that the POSIX compliance
test suite explicitly tests this half-shutdown case and fails any OS
that returns POLLHUP "prematurely".  Boo.)

> (1)
> +   rc = poll(&pollfd, 1, 0);
> +   if (rc < 0)
> +   {
> +   elog(COMMERROR, "could not poll socket: %m");
> +   return false;
>
> I think it's customary to use ereport() and errcode_for_socket_access().

Fixed.

> (2)
> pq_check_connection()
>
> Following PostmasterIsAlive(), maybe it's better to name it as 
> pq_connection_is_alive(), pq_client_is_alive(), or pq_frontend_is_alive() like 
> the pqcomm.c's head comment uses the word frontend?

I think it's OK, because it matches the name of the GUC.  I'm more
concerned about the name of the GUC.  Will we still be happy with this
name if a future release sends a heartbeat message?  I think that is
still OK, so I'm happy with these names for now, but if someone has a
better name, please speak up very soon.

> (3)
>  #include 
> +#include 
>  #ifndef WIN32
>  #include 
>  #endif
>
> poll.h should be included between #ifndef WIN32 and #endif?

Oops, I forgot to wrap that in #ifdef HAVE_POLL_H while moving stuff
around.  Fixed.

> (4)
> I think the new GUC works for walsender as well.  If so, how do we explain 
> the relationship between the new GUC and wal_receiver_timeout and recommend 
> the settings of them?

No, it only works while executing a query.  (Is there something in
logical decoding, perhaps, that I have failed to consider?)

PS The "from" headers in emails received from Fujitsu seem to have
the names stripped, somewhere in the tubes of the internet.  I see the
full version when people from Fujitsu quote other people from Fujitsu.
I copied one of those into the commit message, complete with its
magnificent kanji characters (perhaps these are the cause of the
filtering?), and I hope that's OK with you.
From 7c045e992469c88656655e2b460fb8622d7ac790 Mon Sep 17 00:00:00 2001
From: Maksim Milyutin 
Date: Fri, 26 Mar 2021 10:18:30 +0300
Subject: [PATCH v11] Detect POLLHUP/POLLRDHUP while running queries.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Provide a new GUC check_client_connection_interval that can be used to
check whether the client connection has gone away, while running very
long queries.  It is disabled by default.

For now this uses a non-standard Linux extension.  POLLRDHUP is not
defined by POSIX, and other OSes don't have a reliable way to know if a
connection was shut down without actually trying to read or write.  In
future we might extend this to other OSes by trying to send a
no-op/heartbeat message, but that may require protocol changes.

Author: Sergey Cherkashin 
Author: Thomas Munro 
Reviewed-by: Thomas Munro 
Reviewed-by: Tatsuo Ishii 
Reviewed-by: Konstantin Knizhnik 
Reviewed-by: Zhihong Yu 
Reviewed-by: Andres Freund 
Reviewed-by: Maksim Milyutin 
Reviewed-by: Tsunakawa, Takayuki/綱川 貴之 
Reviewed-by: Tom Lane  (much earlier version)
Discussion: https://postgr.es/m/77def86b27e41f0efcba411460e929ae%40postgrespro.ru
---
 doc/src/sgml/config.sgml  | 37 +
 src/backend/libpq/pqcomm.c| 40 +++
 src/backend/tcop/postgres.c   | 32 +++
 src/backend/utils/init/globals.c  |  1 +
 src/backend/utils/init/postinit.c | 10 +
 src/backend/utils/misc/guc.c  | 29 ++
 src/backend/utils/misc/postgresql.conf.sample |  3 ++
 src/include/libpq/libpq.h |  1 +
 src/include/miscadmin.h   |  1 +
 src/include/tcop/tcopprot.h   |  1 +
 src/include/utils/timeout.h   

Re: Flaky vacuum truncate test in reloptions.sql

2021-04-01 Thread Masahiko Sawada
On Fri, Apr 2, 2021 at 9:46 AM Michael Paquier  wrote:
>
> On Thu, Apr 01, 2021 at 10:54:29PM +0900, Masahiko Sawada wrote:
> > Looks good to me too.
>
> Okay, applied and back-patched down to 12 then.

Thank you!


Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




RE: libpq debug log

2021-04-01 Thread iwata....@fujitsu.com
Hi Alvaro san

Thank you for your fix of trace log code.

> From: 'alvhe...@alvh.no-ip.org' 
> Sent: Friday, April 2, 2021 11:30 AM
...
> It still didn't fix it!  Drongo is now reporting a difference in the expected 
> trace --
> and the differences all seem to be message lengths.
> Now that is pretty mysterious, because the messages themselves are printed
> identically.  Perl's output is pretty unhelpful, but I wrote them to a file 
> and diffed
> manually; it's attached.

Do ErrorResponse and NoticeResponse vary from test to test ...?
If so, it seems difficult to make the trace logs usable for the
regression tests.
I will also investigate the cause of this issue.

Regards,
Aya Iwata


Re: shared-memory based stats collector

2021-04-01 Thread Andres Freund
Hi,

I spent quite a bit more time working on the patch. There are large
changes:

- postmaster/pgstats.c (which is an incorrect location now that it's not
  a subprocess anymore) is split into utils/activity/pgstat.c and
  utils/activity/pgstat_kind.c. I don't love the _kind name, but I
  couldn't come up with anything better.

- Implemented a new GUC, stats_fetch_consistency = {none, cache,
  snapshot}. I think the code overhead of it is pretty ok - most of the
  handling is entirely generalized.

- Most of the "per stats kind" handling is done in pgstat_kind.c. Nearly
  all the rest is done through an array with per-stats-kind information
  (extending what was already done with pgstat_sharedentsize etc).

- There is no separate "pending stats" hash anymore. If there are
  pending stats, they are referenced from 'pgStatSharedRefHash' (which
  used to be the "lookup cache" hash). All the entries with pending
  stats are in double linked list pgStatPending.

- A stat's entry's lwlock, refcount, .. are moved into the dshash
  entry. There is no need for them to be separate anymore. Also allows
  to avoid doing some dsa lookups while holding dshash locks.

- The dshash entries are not deleted until the refcount has reached
  0. That's an important building block to avoid constantly re-creating
  stats when flushing pending stats for a dropped object.

- The reference to the shared entry is established the first time stats
  for an object are reported. Together with the previous item, that
  avoids nearly all the avenues for re-creating already dropped stats
  (see below for the hole).

- I added a bunch of pg_regress style tests, and a larger amount of
  isolationtester tests. The latter are possible due to a new
  pg_stat_force_next_flush() function, avoiding the need to wait until
  stats are submitted.

- 2PC support for "precise" dropping of stats has been added, the
  collect_oids() based approach removed.

- lots of bugfixes, comments, etc...


I know of one nontrivial issue that can lead to dropped stats being
revived:

Within a transaction, a function can be called even when another
transaction that dropped that function has already committed. I added a
spec test reproducing the issue:

# FIXME: this shows the bug that stats will be revived, because the
# shared stats in s2 is only referenced *after* the DROP FUNCTION
# committed. That's only possible because there is no locking (and
# thus no stats invalidation) around function calls.
permutation
  "s1_track_funcs_all" "s2_track_funcs_none"
  "s1_func_call" "s2_begin" "s2_func_call" "s1_func_drop" "s2_track_funcs_all" 
"s2_func_call" "s2_commit" "s2_ff" "s1_func_stats" "s2_func_stats"

I think the best fix here would be to associate an xid with the dropped
stats object, and only delete the dshash entry once there's no
transaction alive with a horizon from before that xid...


There's also a second issue (stats for newly created objects surviving
the transaction), but that's pretty simple to resolve.

Here's all the gory details of my changes happening incrementally:

https://github.com/anarazel/postgres/compare/master...shmstat

I'll squash and split tomorrow. Too tired for today.


I think this is starting to look a lot better than what we have now. But
I'm getting less confident that it's realistic to get any of this into
PG14, given the state of the release cycle.



> I'm impressed by the way you resolved "who should load stats". Using a
> static shared memory area to hold the pointer to the existing DSA memory
> resolves the "first attacher problem".  However, while I am somewhat
> doubtful about "who should write the stats file", I think it is
> reasonable in general.
>
> But the current place of calling pgstat_write_stats() is a bit too
> early.  Checkpointer reports some stats *after* calling
> ShutdownXLOG().  Perhaps we need to move it after pg_stat_report_*()
> calls in HandleCheckpointerInterrupts().

I now moved it into a before_shmem_exit(). I think that should avoid
that problem?



> https://github.com/anarazel/postgres/commit/03824a236597c87c99d07aa14b9af9d6fe04dd37
>
> +  * XXX: Why is this a good place to do this?
>
> Agreed. We don't need to be in such haste to clean up stats entries.  We
> could run that in pgstat_report_stat()?

I've not changed that yet, but I had the same thought.


> Agreed that it's better to move database stat entries to fixed
> pointers.

I actually ended up reverting that. My main motivation for it was that
it was problematic that new pending database stats entries could be
created at some random place in the hashtable. But with the linked list
of pending entries that's not a problem anymore. And I found it
nontrivial to manage the refcounts to the shared entry accurately this
way.

We could still add a cache for the two stats entries though...


> > - consider removing PgStatTypes and replacing it with the oid of the
> >   table the type of stats reside in. So PGSTAT_TYPE_DB would be
> >   DatabaseRelationId, PGSTAT_TYPE_

Re: TRUNCATE on foreign table

2021-04-01 Thread Fujii Masao




On 2021/04/02 9:37, Kohei KaiGai wrote:

It is fair enough for me to reverse the order of actual truncation.

How about the updated comments below?

 This is a multi-relation truncate.  We first open and grab exclusive
 lock on all relations involved, checking permissions (local database
 ACLs even if relations are foreign-tables) and otherwise verifying
 that the relation is OK for truncation. In CASCADE mode, ...(snip)...
 Finally all the relations are truncated and reindexed. If any foreign
 tables are involved, their callbacks are invoked prior to the truncation
 of regular tables.


LGTM.



BTW, the latest patch doesn't seem to be applied cleanly to the master
because of commit 27e1f14563. Could you rebase it?


Onishi-san, go ahead. :-)


+1

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: libpq debug log

2021-04-01 Thread 'alvhe...@alvh.no-ip.org'
On 2021-Apr-01, 'alvhe...@alvh.no-ip.org' wrote:

> Ooh, wow ... now that is a silly bug!  Thanks, I'll push the fix in a
> minute.

It still didn't fix it!  Drongo is now reporting a difference in the
expected trace -- and the differences all seem to be message lengths.
Now that is pretty mysterious, because the messages themselves are
printed identically.  Perl's output is pretty unhelpful, but I wrote them to a
file and diffed manually; it's attached.

So for example when we expect
! # B   123 ErrorResponseS "ERROR" V "ERROR" C "42601" M "cannot insert 
multiple commands into a prepared statement" F "" L "" R "" \\x00

we actually get
! # B   173 ErrorResponseS "ERROR" V "ERROR" C "42601" M "cannot insert 
multiple commands into a prepared statement" F "" L "" R "" \\x00

[slaps forehead] I suppose the difference must be that the message
length includes the redacted string fields.  So the file name in the F
field is 50 chars longer, *kaboom*.  (I'm actually very surprised that this
didn't blow up earlier.)

I'm not sure what to do about this. Maybe the message length for
ErrorResponse/NoticeResponse ought to be redacted too.

-- 
Álvaro Herrera39°49'30"S 73°17'W




Re: Data type correction in pgstat_report_replslot function parameters

2021-04-01 Thread vignesh C
On Thu, Apr 1, 2021 at 11:18 PM Fujii Masao  wrote:
>
>
>
> On 2021/04/02 2:18, Jeevan Ladhe wrote:
> >
> >
> > On Thu, Apr 1, 2021 at 10:20 PM vignesh C  > > wrote:
> >
> > Hi,
> >
> > While I was reviewing replication slot statistics code, I found one
> > issue in the data type used for pgstat_report_replslot function
> > parameters. We pass int64 variables to the function but the function
> > prototype uses int type. I felt the function parameters should be
> > int64. Attached patch fixes the same.
>
> Isn't it better to use PgStat_Counter instead of int64?
>

Thanks for your comment, the updated patch contains the changes for it.

Regards,
Vignesh
From 63f6df1470a2f1a3815213aad3690461f59efcba Mon Sep 17 00:00:00 2001
From: vignesh 
Date: Thu, 1 Apr 2021 22:11:09 +0530
Subject: [PATCH v2] Data type correction in pgstat_report_replslot.

Data type correction in pgstat_report_replslot.
---
 src/backend/postmaster/pgstat.c | 6 --
 src/include/pgstat.h| 6 --
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 4b9bcd2b41..2f3f378e63 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -1773,8 +1773,10 @@ pgstat_report_tempfile(size_t filesize)
  * --
  */
 void
-pgstat_report_replslot(const char *slotname, int spilltxns, int spillcount,
-	   int spillbytes, int streamtxns, int streamcount, int streambytes)
+pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
+	   PgStat_Counter spillcount, PgStat_Counter spillbytes,
+	   PgStat_Counter streamtxns, PgStat_Counter streamcount,
+	   PgStat_Counter streambytes)
 {
 	PgStat_MsgReplSlot msg;
 
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index d699502cd9..fe6683cf5c 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1450,8 +1450,10 @@ extern void pgstat_report_recovery_conflict(int reason);
 extern void pgstat_report_deadlock(void);
 extern void pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount);
 extern void pgstat_report_checksum_failure(void);
-extern void pgstat_report_replslot(const char *slotname, int spilltxns, int spillcount,
-   int spillbytes, int streamtxns, int streamcount, int streambytes);
+extern void pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
+   PgStat_Counter spillcount, PgStat_Counter spillbytes,
+   PgStat_Counter streamtxns, PgStat_Counter streamcount,
+   PgStat_Counter streambytes);
 extern void pgstat_report_replslot_drop(const char *slotname);
 
 extern void pgstat_initialize(void);
-- 
2.25.1



Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2021-04-01 Thread David Rowley
On Thu, 1 Apr 2021 at 23:41, houzj.f...@fujitsu.com
 wrote:
>
> > I've attached the updated patch.  I'll let the CFbot grab this to ensure 
> > it's
> > happy with it before I go looking to push it again.
>
> Hi,
>
> I took a look into the patch and noticed some minor things.
>
> 1.
> +   case T_ResultCache:
> +   ptype = "ResultCache";
> +   subpath = ((ResultCachePath *) path)->subpath;
> +   break;
> case T_UniquePath:
> ptype = "Unique";
> subpath = ((UniquePath *) path)->subpath;
> should we use "case T_ResultCachePath" here?
>
> 2.
> Is it better to add ResultCache's info to " src/backend/optimizer/README " ?
> Something like:
>   NestPath  - nested-loop joins
>   MergePath - merge joins
>   HashPath  - hash joins
> + ResultCachePath - Result cache

Thanks for pointing those two things out.

I've pushed the patch again with some updates to EXPLAIN to fix the
issue from yesterday.  I also disabled result cache in the
partition_prune tests as I suspect that the parallel tests there might
just be a bit too unstable in the buildfarm. The cache
hit/miss/eviction line might disappear if the main process does not
get a chance to do any work.

Well, it's now time to watch the buildfarm again...

David




Re: Flaky vacuum truncate test in reloptions.sql

2021-04-01 Thread Michael Paquier
On Thu, Apr 01, 2021 at 10:54:29PM +0900, Masahiko Sawada wrote:
> Looks good to me too.

Okay, applied and back-patched down to 12 then.
--
Michael


signature.asc
Description: PGP signature


Re: TRUNCATE on foreign table

2021-04-01 Thread Kohei KaiGai
2021年4月1日(木) 18:53 Fujii Masao :
>
> On 2021/04/01 0:09, Kohei KaiGai wrote:
> > What does the "permission checks" mean in this context?
> > The permission checks on the foreign tables involved are already checked
> > at truncate_check_rel(), by PostgreSQL's standard access control.
>
> I meant that's the permission check that happens in the remote server side.
> For example, when the foreign table is defined on postgres_fdw and truncated,
> TRUNCATE command is issued to the remote server via postgres_fdw and
> it checks the permission of the table before performing actual truncation.
>
>
> > Please assume an extreme example below:
> > 1. I defined a foreign table with file_fdw onto a local CSV file.
> > 2. Someone tries to scan the foreign table, and ACL allows it.
> > 3. I disallow the read permission of the CSV using chmod, after the above 
> > step,
> >  but prior to the query execution.
> > 4. Someone's query moved to the execution stage, then IterateForeignScan()
> >  raises an error due to OS level permission checks.
> >
> > FDW is a mechanism to convert from/to external data sources to/from 
> > PostgreSQL's
> > structured data, as literal. Once we checked the permissions of
> > foreign-tables by
> > database ACLs, any other permission checks handled by FDW driver are a part 
> > of
> > execution (like, OS permission check when file_fdw read(2) the
> > underlying CSV files).
> > And, we have no reliable way to check entire permissions preliminary,
> > like OS file
> > permission check or database permission checks by remote server. Even
> > if a checker
> > routine returned a "hint", it may be changed at the execution time.
> > Somebody might
> > change the "truncate" permission at the remote server.
> >
> > How about your opinions?
>
> I agree that something like checker routine might not be so useful and
> also be overkill. I was thinking that it's better to truncate the foreign 
> tables first
> and the local ones later. Otherwise it's a waste of cycles to truncate
> the local tables if the truncation on a foreign table causes a permission error
> on the remote server.
>
> But ISTM that the order of tables to truncate that the current patch provides
> doesn't cause any actual bugs. So if you think the current order is better,
> I'm ok with that for now. In this case, the comments for ExecuteTruncate()
> should be updated.
>
It is fair enough for me to reverse the order of actual truncation.

How about the updated comments below?

This is a multi-relation truncate.  We first open and grab exclusive
lock on all relations involved, checking permissions (local database
ACLs, even for foreign tables) and otherwise verifying that each
relation is OK for truncation. In CASCADE mode, ...(snip)...
Finally all the relations are truncated and reindexed. If any foreign
tables are involved, their callbacks are invoked prior to the truncation
of regular tables.

> BTW, the latest patch doesn't seem to be applied cleanly to the master
> because of commit 27e1f14563. Could you rebase it?
>
Onishi-san, go ahead. :-)

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Process initialization labyrinth

2021-04-01 Thread Andres Freund
Hi,

While working on the shared memory stats patch I ran into (not for the
first time) issues with our process initialization.

The concrete issue was that I noticed that some stats early in startup
weren't processed correctly - the stats system wasn't initialized yet. I
consequently added assertions ensuring that we don't try to report stats
before that. Which blew up.

Even in master we report stats well before the pgstat_initialize()
call. E.g. in autovac workers:
/*
 * Report autovac startup to the stats collector.  We deliberately do
 * this before InitPostgres, so that the last_autovac_time will get
 * updated even if the connection attempt fails.  This is to prevent
 * autovac from getting "stuck" repeatedly selecting an unopenable
 * database, rather than making any progress on stuff it can connect
 * to.
 */

That previously just didn't cause a problem, because we didn't really
need pgstat_initialize() to have happened for stats reporting to work.

In the shared memory stats patch there's no dependency on
pgstat_initialize() knowing MyBackendId anymore (broken out to a
separate function). So I tried moving the stats initialization to
somewhere earlier.


There currently is simply no way of doing that that doesn't cause
duplication, or weird conditions. We can't do it in:

- InitProcess()/InitAuxiliaryProcess(),
  CreateSharedMemoryAndSemaphores() hasn't yet run in EXEC_BACKEND
- below CreateSharedMemoryAndSemaphores(), as that isn't called for each
  backend in !EXEC_BACKEND
- InitPostgres(), because autovac workers report stats before that
- BaseInit(), because it's called before we have a PROC iff !EXEC_BACKEND
- ...

I have now worked around this by generous application of ugly, but I
think we really need to do something about this mazy mess. There's just
an insane amount of duplication, and it's too complicated to remember
for more than a few minutes.

I really would like to not see things like

/*
 * Create a per-backend PGPROC struct in shared memory, except in the
 * EXEC_BACKEND case where this was done in SubPostmasterMain. We must do
 * this before we can use LWLocks (and in the EXEC_BACKEND case we already
 * had to do some stuff with LWLocks).
 */
#ifdef EXEC_BACKEND
if (!IsUnderPostmaster)
InitProcess();
#else
InitProcess();
#endif

Similarly, codeflow like bootstrap.c being involved in bog standard
stuff like starting up wal writer etc, is just pointlessly
confusing. Note that bootstrap itself does *not* go through
AuxiliaryProcessMain(), and thus has yet another set of initialization
needs.


Greetings,

Andres Freund




Re: Proposal: Save user's original authenticated identity for logging

2021-04-01 Thread Jacob Champion
On Thu, 2021-04-01 at 10:21 +0900, Michael Paquier wrote:
> This stuff can take advantage of 0d1a3343, and I
> think that we should make the kerberos, ldap, authentication and SSL
> test suites just use connect_ok() and connect_fails() from
> PostgresNode.pm.  They just need to be extended a bit with a new
> argument for the log pattern check.

v16, attached, migrates all tests in those suites to connect_ok/fails
(in the first two patches), and also adds the log pattern matching (in
the final feature patch).

A since-v15 diff is attached, but it should be viewed with suspicion
since I've rebased on top of the new SSL tests at the same time.

--Jacob
diff --git a/src/test/authentication/t/001_password.pl 
b/src/test/authentication/t/001_password.pl
index f75d555ae5..f2cd7377d1 100644
--- a/src/test/authentication/t/001_password.pl
+++ b/src/test/authentication/t/001_password.pl
@@ -17,7 +17,7 @@ if (!$use_unix_sockets)
 }
 else
 {
-   plan tests => 22;
+   plan tests => 23;
 }
 
 
@@ -35,47 +35,28 @@ sub reset_pg_hba
return;
 }
 
-# Test access for a single role, useful to wrap all tests into one.
+# Test access for a single role, useful to wrap all tests into one.  Extra
+# named parameters are passed to connect_ok/fails as-is.
 sub test_role
 {
-   my $node = shift;
-   my $role = shift;
-   my $method   = shift;
-   my $expected_res = shift;
-   my $expected_log_msg = shift;
-   my $status_string= 'failed';
-
+   my ($node, $role, $method, $expected_res, %params) = @_;
+   my $status_string = 'failed';
$status_string = 'success' if ($expected_res eq 0);
 
local $Test::Builder::Level = $Test::Builder::Level + 1;
 
-   my $res =
- $node->psql('postgres', undef, extra_params => [ '-U', $role, '-w' ]);
-   my $testname =
- "authentication $status_string for method $method, role $role";
-   is($res, $expected_res, $testname);
+   my $connstr = "user=$role";
+   my $testname = "authentication $status_string for method $method, role 
$role";
 
-   # Check if any log generated by authentication matches or not.
-   if ($expected_log_msg ne '')
+   if ($expected_res eq 0)
{
-   my $log_contents = slurp_file($node->logfile);
-   if ($res == 0)
-   {
-   like($log_contents, $expected_log_msg,
-   "$testname: logs matching");
-   }
-   else
-   {
-   unlike($log_contents, $expected_log_msg,
-   "$testname: logs not matching");
-   }
-
-   # Clean up any existing contents in the node's log file so as
-   # future tests don't step on each other's generated contents.
-   truncate $node->logfile, 0;
+   $node->connect_ok($connstr, $testname, %params, extra_params => 
[ '-w' ]);
+   }
+   else
+   {
+   # Currently, we don't check the error message, just the code.
+   $node->connect_fails($connstr, undef, $testname, %params, 
extra_params => [ '-w' ]);
}
-
-   return;
 }
 
 # Initialize primary node
@@ -94,47 +75,41 @@ $node->safe_psql('postgres',
 );
 $ENV{"PGPASSWORD"} = 'pass';
 
-# For "trust" method, all users should be able to connect.
+# For "trust" method, all users should be able to connect. These users are not
+# considered to be authenticated.
 reset_pg_hba($node, 'trust');
-test_role($node, 'scram_role', 'trust', 0);
-test_role($node, 'md5_role',   'trust', 0);
-
-# Particular case here, the logs should have generated nothing related
-# to authenticated users.
-my $log_contents = slurp_file($node->logfile);
-unlike(
-   $log_contents,
-   qr/connection authenticated:/,
-   "trust method does not authenticate users");
-truncate $node->logfile, 0;
+test_role($node, 'scram_role', 'trust', 0,
+   log_unlike => [ qr/connection authenticated:/ ]);
+test_role($node, 'md5_role',   'trust', 0,
+   log_unlike => [ qr/connection authenticated:/ ]);
 
 # For plain "password" method, all users should also be able to connect.
 reset_pg_hba($node, 'password');
 test_role($node, 'scram_role', 'password', 0,
-   qr/connection authenticated: identity="scram_role" method=password/);
+   log_like => [ qr/connection authenticated: identity="scram_role" 
method=password/ ]);
 test_role($node, 'md5_role', 'password', 0,
-   qr/connection authenticated: identity="md5_role" method=password/);
+   log_like => [ qr/connection authenticated: identity="md5_role" 
method=password/ ]);
 
 # For "scram-sha-256" method, user "scram_role" should be able to connect.
 reset_pg_hba($node, 'scram-sha-256');
 test_role($node, 'scram_role', 'scram-sha-256', 0,
-   qr/connection authenticated: identity="scram_role" 
method=scram-sha-256/);
+   log_like => [ qr/connection authenti

Re: DROP INDEX docs - explicit lock naming

2021-04-01 Thread Michael Paquier
On Thu, Apr 01, 2021 at 08:26:50AM -0400, Greg Rychlewski wrote:
> Thanks! I apologize, I added a commitfest entry for this and failed to add
> it to my message: https://commitfest.postgresql.org/33/3053/.
> 
> This is my first time submitting a patch and I'm not sure if it needs to be
> deleted now or if you are supposed to add yourself as a committer.

Thanks, I did not notice that.  The entry has been updated as it
should.
--
Michael


signature.asc
Description: PGP signature


Re: WIP: WAL prefetch (another approach)

2021-04-01 Thread Thomas Munro
On Fri, Mar 19, 2021 at 2:29 PM Tomas Vondra
 wrote:
> On 3/18/21 1:54 AM, Thomas Munro wrote:
> > I'm now looking at Horiguchi-san and Heikki's patch[2] to remove
> > XLogReader's callbacks, to try to understand how these two patch sets
> > are related.  I don't really like the way those callbacks work, and
I'm afraid I had to make them more complicated.  But I don't yet know
> > very much about that other patch set.  More soon.
>
> OK. Do you think we should get both of those patches in, or do we need
> to commit them in a particular order? Or what is your concern?

I would like to commit the callback-removal patch first, and then the
WAL decoder and prefetcher patches become simpler and cleaner on top
of that.  I will post the rebase and explanation shortly.




Re: OpenBSD versus semaphores

2021-04-01 Thread Thomas Munro
On Fri, Apr 2, 2021 at 9:42 AM Tom Lane  wrote:
> Thomas Munro  writes:
> > On Tue, Jan 8, 2019 at 7:14 PM Tom Lane  wrote:
> >> So I looked around for an alternative, and found out that modern
> >> OpenBSD releases support named POSIX semaphores (though not unnamed
> >> ones, at least not shared unnamed ones).  What's more, it appears that
> >> in this implementation, named semaphores don't eat open file descriptors
> >> as they do on macOS, removing our major objection to using them.
>
> > No OpenBSD here, but I was curious enough to peek at their
> > implementation.  Like others, they create a tiny file under /tmp for
> > each one, mmap() and close the fd straight away.
>
> Oh, yeah, I can see a bunch of tiny mappings with procmap.  I wonder
> whether that scales any better than an open FD per semaphore, when
> it comes to forking a bunch of child processes that will inherit
> all those mappings or FDs.  I've not tried to benchmark child process
> launch as such --- as I said, I'm not running this on hardware that
> would support serious benchmarking.

I also have no ability to benchmark on a real OpenBSD system, but once
a year or so when I spin up a little OpenBSD VM to test some patch or
other, it annoys me that our tests fail out of the box and then I have
to look up how to change the sysctls, so here's a patch.  I also
checked the release notes to confirm that 5.5 is the right release to
look for[1]; by now that's EOL and probably not even worth bothering
with the test but doesn't cost much to be cautious about that.  4.x is
surely too old to waste electrons on.  I guess the question for
OpenBSD experts is whether having (say) a thousand tiny mappings is
bad.  On the plus side, we know from other OSes that having semas
spread out is good for reducing false sharing on large systems.

[1] https://www.openbsd.org/55.html
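For anyone unfamiliar with the API the patch switches to, the named-POSIX-semaphore pattern looks roughly like this in portable C (the semaphore name below is made up for illustration, and error handling is reduced to return codes):

```c
#include <assert.h>
#include <fcntl.h>
#include <semaphore.h>

/*
 * Create, acquire, release and destroy a named POSIX semaphore.  Unlike
 * System V semaphores, this needs no sysctl tuning; on OpenBSD 5.5+ the
 * implementation backs each semaphore with a tiny mmap()ed file under /tmp.
 */
static int
sem_roundtrip(const char *name)
{
	sem_t	   *sem;

	sem_unlink(name);			/* ignore failure if it doesn't exist yet */
	sem = sem_open(name, O_CREAT | O_EXCL, 0600, 1);
	if (sem == SEM_FAILED)
		return -1;
	if (sem_wait(sem) != 0)		/* acquire */
		return -1;
	if (sem_post(sem) != 0)		/* release */
		return -1;
	sem_close(sem);
	sem_unlink(name);
	return 0;
}
```

The open file descriptor is gone by the time sem_open() returns, which is the property that made named semaphores attractive here compared to macOS.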
From e7bcfa563e568ca8d3474432fd094a38205c78f0 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Fri, 2 Apr 2021 03:43:26 +1300
Subject: [PATCH] Use POSIX_NAMED_SEMAPHORES on OpenBSD.

For PostgreSQL to work out of the box without sysctl changes, let's use
POSIX semaphores instead of System V ones.  The "unnamed" kind aren't
supported in shared memory, so we use the "named" kind.

Discussion: https://postgr.es/m/27582.1546928073%40sss.pgh.pa.us
---
 doc/src/sgml/runtime.sgml |  6 +-
 src/template/openbsd  | 11 +++
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index bf877c0e0c..9aa3129aba 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -998,11 +998,7 @@ psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: No such

 The default shared memory settings are usually good enough, unless
 you have set shared_memory_type to sysv.
-You will usually want to
-increase kern.seminfo.semmni
-and kern.seminfo.semmns,
-as OpenBSD's default settings
-for these are uncomfortably small.
+System V semaphores are not used on this platform (since OpenBSD 5.5).

 

diff --git a/src/template/openbsd b/src/template/openbsd
index 365268c489..d3b6bf464a 100644
--- a/src/template/openbsd
+++ b/src/template/openbsd
@@ -2,3 +2,14 @@
 
 # Extra CFLAGS for code that will go into a shared library
 CFLAGS_SL="-fPIC -DPIC"
+
+# OpenBSD 5.5 (2014) gained named POSIX semaphores.  They work out of the box
+# without changing any sysctl settings, unlike System V semaphores.
+case $host_os in
+  openbsd5.[01234]*)
+USE_SYSV_SEMAPHORES=1
+;;
+  *)
+USE_NAMED_POSIX_SEMAPHORES=1
+;;
+esac
-- 
2.30.1



Re: Force lookahead in COPY FROM parsing

2021-04-01 Thread Heikki Linnakangas

On 18/03/2021 17:09, John Naylor wrote:
The cfbot thinks this patch no longer applies, but it works for me, so 
still set to RFC. It looks like it's because the thread to remove the v2 
FE/BE protocol was still attached to the commitfest entry. I've deleted 
that, so let's see if that helps.


It was now truly bitrotted by commit f82de5c46b (the encoding conversion 
changes). Rebase attached.


To answer the side question of whether it makes any difference in 
performance, I used the blackhole AM [1] to isolate the copy code path 
as much as possible. Forcing lookahead seems to not make a noticeable 
difference (min of 5 runs):


master:
306ms

force lookahead:
304ms

I forgot to mention the small detail of what the test was:

create extension blackhole_am;
create table blackhole_tab (a text) using blackhole_am ;
copy blackhole_tab from program 'for i in {1..100}; do cat /path/to/UTF-8\ 
Sampler.htm ;  done;' ;

...where the .htm file is at http://kermitproject.org/utf8.html


Ok, I wouldn't expect to see much difference in that test, it gets 
drowned in all the other parsing overhead. I tested this now with this:


copy (select repeat('x', 1) from generate_series(1, 10)) to 
'/tmp/copydata-x.txt'

create table blackhole_tab (a text);

copy blackhole_tab from '/tmp/copydata-x.txt' where false;

I took the min of 5 runs of that COPY FROM statement:

master:
4107 ms

v3-0001-Simplify-COPY-FROM-parsing-by-forcing-lookahead.patch:
3172 ms

I was actually surprised it was so effective on that test, I expected a 
small but noticeable gain. But I'll take it.
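To illustrate why the guaranteed pad pays off: with a few zero bytes always present past the end of the valid data, the parser can peek at the next character unconditionally instead of checking for a refill at every read. A standalone sketch of the idea (buffer size and escape handling are simplified, not the real copyfromparse.c logic):

```c
#include <assert.h>
#include <string.h>

#define LOOKAHEAD 4

/*
 * Count backslash-escaped characters in "data".  Because the working
 * buffer is zero-padded by LOOKAHEAD bytes past the valid data, peeking
 * at buf[i + 1] is always safe: at end of input it simply reads as '\0'.
 */
static int
count_escaped(const char *data, int len)
{
	char		buf[128 + LOOKAHEAD];
	int			n = 0;

	assert(len <= 128);
	memset(buf, 0, sizeof(buf));	/* the guaranteed zero pad */
	memcpy(buf, data, len);

	for (int i = 0; i < len; i++)
	{
		if (buf[i] == '\\')
		{
			char		c2 = buf[i + 1];	/* no refill check needed */

			if (c2 != '\0')
			{
				n++;
				i++;			/* consume the escaped character */
			}
		}
	}
	return n;
}
```

In the patch itself this is what lets the IF_NEED_REFILL_AND_NOT_EOF_CONTINUE dance be deleted.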


Replying to your earlier comments (sorry for the delay):


Looks good to me. Just a couple minor things:

+ * Look at the next character.  If we're at EOF, c2 will wind
+ * up as '\0' because of the guaranteed pad of raw_buf.
  */
- IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
-
- /* get next char */
  c = copy_raw_buf[raw_buf_ptr];

The new comment seems copy-pasted from the c2 statements further down.


Fixed.


- if (raw_buf_ptr >= copy_buf_len || need_data)
+#define COPY_READ_LINE_LOOKAHEAD 4
+ if (raw_buf_ptr + COPY_READ_LINE_LOOKAHEAD >= copy_buf_len)

Is this #define deliberately put down here rather than at the top of the file?


Yeah, it's only used right here locally. Matter of taste, but I'd prefer 
to keep it here.



- * of the buffer and then we load more data after that.  This case occurs only
- * when a multibyte character crosses a bufferload boundary.
+ * of the buffer and then we load more data after that.

Is the removed comment really invalidated by this patch? I figured it was 
something not affected until the patch to do the encoding conversion in larger 
chunks.


Not sure anymore, but this is moot now, since the other patch was committed.

Thanks for the review and the testing!

- Heikki
>From 463ac83e337340a78179502a23c2e775a44afcbd Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Thu, 1 Apr 2021 23:22:05 +0300
Subject: [PATCH v3 1/1] Simplify COPY FROM parsing by forcing lookahead.

Now that we don't support the old-style COPY protocol anymore, we can
force four bytes of lookahead like was suggested in an existing comment,
and simplify the loop in CopyReadLineText().

Reviewed-by: John Naylor
Discussion: https://www.postgresql.org/message-id/89627a2a-c123-a8aa-267e-5d168feb73dd%40iki.fi
---
 src/backend/commands/copyfromparse.c | 111 +--
 1 file changed, 35 insertions(+), 76 deletions(-)

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 0813424768..b52abc1520 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -90,21 +90,6 @@
  * empty statements.  See http://www.cit.gu.edu.au/~anthony/info/C/C.macros.
  */
 
-/*
- * This keeps the character read at the top of the loop in the buffer
- * even if there is more than one read-ahead.
- */
-#define IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(extralen) \
-if (1) \
-{ \
-	if (input_buf_ptr + (extralen) >= copy_buf_len && !hit_eof) \
-	{ \
-		input_buf_ptr = prev_raw_ptr; /* undo fetch */ \
-		need_data = true; \
-		continue; \
-	} \
-} else ((void) 0)
-
 /* This consumes the remainder of the buffer and breaks */
 #define IF_NEED_REFILL_AND_EOF_BREAK(extralen) \
 if (1) \
@@ -159,7 +144,7 @@ static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
 
 /* Low-level communications functions */
 static int	CopyGetData(CopyFromState cstate, void *databuf,
-		int minread, int maxread);
+		int maxread);
 static inline bool CopyGetInt32(CopyFromState cstate, int32 *val);
 static inline bool CopyGetInt16(CopyFromState cstate, int16 *val);
 static void CopyLoadInputBuf(CopyFromState cstate);
@@ -230,9 +215,8 @@ ReceiveCopyBinaryHeader(CopyFromState cstate)
 /*
  * CopyGetData reads data from the source (file or frontend)
  *
- * We attempt to read at least minread, and at most maxread, bytes from
- * the source.  The actual number of bytes read is returned

Re: pg_amcheck contrib application

2021-04-01 Thread Robert Haas
On Thu, Apr 1, 2021 at 1:41 PM Robert Haas  wrote:
> OK, committed. We still need to deal with what you had as 0003
> upthread, so I guess the next step is for me to spend some time
> reviewing that one.

I did this, and it was a bit depressing. It appears that we now have
duplicate checks for xmin and xmax being out of the valid range.
Somehow you have the removal of those duplicate checks in 0003, but
why in the world didn't you put them into one of the previous patches
and just make 0003 about fixing the
holding-buffer-lock-while-following-TOAST-pointers problem? (And, gah,
why did I not look carefully enough to notice that you hadn't done
that?)

Other than that, I notice a few other things:

- There are a total of two (2) calls in the current source code to
palloc0fast, and hundreds of calls to palloc0. So I think you should
forget about using the fast variant and just do what almost every
other caller does.

- If you want to make this code faster, a better idea would be to
avoid doing all of this allocate and free work and just allocate an
array that's guaranteed to be big enough, and then keep track of how
many elements of that array are actually in use.
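That allocate-once pattern can be sketched like so (the struct name and capacity here are invented for illustration; the real code would size the array to the maximum possible number of entries):

```c
#include <assert.h>

#define MAX_ITEMS 1600			/* illustrative worst-case capacity */

/*
 * Instead of allocating and freeing per tuple, keep one array that is
 * always big enough and track how many slots are currently in use.
 */
typedef struct Scratch
{
	int			values[MAX_ITEMS];
	int			nused;			/* number of valid entries right now */
} Scratch;

static void
scratch_reset(Scratch *s)
{
	s->nused = 0;				/* cheap: no free/alloc cycle */
}

static void
scratch_add(Scratch *s, int v)
{
	assert(s->nused < MAX_ITEMS);
	s->values[s->nused++] = v;
}
```

Resetting is a single store, which removes the per-tuple allocator traffic entirely.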

- #ifdef DECOMPRESSION_CORRUPTION_CHECKING is not a useful way of
introducing such a feature. Either we do it for real and expose it via
SQL and pg_amcheck as an optional behavior, or we rip it out and
revisit it later. Given the nearness of feature freeze, my vote is for
the latter.

- I'd remove the USE_LZ4 bit, too. Let's not define the presence of
LZ4 data in a non-LZ4-enabled cluster as corruption. If we do, then
people will expect to be able to use this to find places where they
are dependent on LZ4 if they want to move away from it -- and if we
don't recurse into composite datums, that will not actually work.

- check_toast_tuple() has an odd and slightly unclear calling
convention for which there are no comments. I wonder if it would be
better to reverse things and make bool *error the return value and
what is now the return value into a pointer argument, but whatever we
do I think it needs a few words in the comments. We don't need to
slavishly explain every argument -- I think toasttup and ctx and tctx
are reasonably clear -- but this is not.

- To me it would be more logical to reverse the order of the
toast_pointer.va_toastrelid != ctx->rel->rd_rel->reltoastrelid and
VARATT_EXTERNAL_GET_EXTSIZE(toast_pointer) > toast_pointer.va_rawsize
- VARHDRSZ checks. Whether we're pointing at the correct relation
feels more fundamental.

- If we moved the toplevel foreach loop in check_toasted_attributes()
out to the caller, say renaming the function to just
check_toasted_attribute(), we'd save a level of indentation in that
whole function and probably add a tad of clarity, too. You wouldn't
feel the need to Assert(ctx.toasted_attributes == NIL) in the caller
if the caller had just done list_free(ctx->toasted_attributes);
ctx->toasted_attributes = NIL.

- Is there a reason we need a cross-check on both the number of chunks
and on the total size? It seems to me that we should check that each
individual chunk has the size we expect, and that the total number of
chunks is what we expect. The overall size is then necessarily
correct.
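The redundancy argument can be made concrete with a small standalone check (CHUNK_SIZE here is an invented constant standing in for TOAST_MAX_CHUNK_SIZE): if every chunk except the last has the full size, the last chunk has the expected remainder, and the chunk count matches, the total is implied.

```c
/*
 * Verify per-chunk sizes and chunk count against an expected total.
 * Returns 1 if consistent, 0 otherwise.  If this passes, summing the
 * chunks necessarily yields total_size, so no separate total check is
 * needed.
 */
#define CHUNK_SIZE 2000

static int
toast_chunks_ok(const int *chunk_sizes, int nchunks, int total_size)
{
	int			expected_chunks = (total_size + CHUNK_SIZE - 1) / CHUNK_SIZE;

	if (nchunks != expected_chunks)
		return 0;
	for (int i = 0; i < nchunks; i++)
	{
		int			expected = (i < nchunks - 1) ? CHUNK_SIZE
			: total_size - (nchunks - 1) * CHUNK_SIZE;

		if (chunk_sizes[i] != expected)
			return 0;
	}
	return 1;
}
```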

- Why are all the changes to the tests in this patch? What do they
have to do with getting the TOAST checks out from under the buffer
lock? I really need you to structure the patch series so that each
patch is about one topic and, equally, so that each topic is only
covered by one patch. Otherwise it's just way too confusing.

- I think some of these messages need a bit of word-smithing, too, but
we can leave that for when we're closer to being done with this.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: New predefined roles- 'pg_read/write_all_data'

2021-04-01 Thread Stephen Frost
Greetings,

* gkokola...@pm.me (gkokola...@pm.me) wrote:
> On Monday, November 23, 2020 11:31 PM, Stephen Frost  
> wrote:
> > -   Anastasia Lubennikova (a.lubennik...@postgrespro.ru) wrote:
> >
> > > On 29.10.2020 17:19, Stephen Frost wrote:
> > >
> > > > -   Georgios Kokolatos (gkokola...@protonmail.com) wrote:
> > > >
> > > > > this patch is in "Ready for committer" state and the Cfbot is happy.
> > > > > Glad that's still the case. :)
> > > >
> > > > > Is there any committer that is available for taking a look at it?
> > > > > If there aren't any objections or further comments, I'll take another
> > > > > look through it and will commit it during the upcoming CF.
> > >
> > > CFM reminder. Just in case you forgot about this thread)
> > > The commitfest is heading to the end. And there was a plenty of time for
> > > anyone to object.
> >
> > Thanks, I've not forgotten, but it's a bit complicated given that I've
> > another patch in progress to rename default roles to be predefined
> > roles which conflicts with this one. Hopefully we'll have a few
> > comments on that and I can get it committed and this one updated with
> > the new naming. I'd rather not commit this one and then immediately
> > commit changes over top of it.
> 
> May I enquire about the status of the current?

The patch to rename default roles to predefined roles for v14 has gone
in, and so I've come back to this patch to add the
pg_read/write_all_data roles.

Having contemplated a bit further, I ended up deciding that it made more
sense for these predefined roles to *not* have BYPASSRLS, which gives
admins the flexibility to choose if they actually want RLS to be
bypassed, or not, on the roles who they GRANT these roles to (if we just
always had bypassrls set, then they wouldn't have any choice but to
accept that, which doesn't seem very kind).  I've updated the
documentation to make a note of that and to encourage admins who use
these roles to consider if they want to set BYPASSRLS on the actual
login role which they'll have to create in order to use these roles
(since they can't be used to login directly).

Updated patch attached.  Will be playing with it a bit more but
generally feel like it's in pretty good shape.  Unless there's anything
further on this, I'll likely commit it over the weekend.

Thanks!

Stephen
From 3032fcf79d162c8e170d648af26d9230609c7b29 Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Fri, 26 Mar 2021 17:24:18 -0400
Subject: [PATCH] Add pg_read_all_data and pg_write_all_data roles

A commonly requested use-case is to have a role who can run an
unfettered pg_dump without having to explicitly GRANT that user access
to all tables, schemas, et al, without that role being a superuser.
This addresses that by adding a "pg_read_all_data" role which implicitly
gives any member of this role SELECT rights on all tables, views and
sequences, and USAGE rights on all schemas.

As there may be cases where it's also useful to have a role who has
write access to all objects, pg_write_all_data is also introduced and
gives users implicit INSERT, UPDATE and DELETE rights on all tables,
views and sequences.

These roles can not be logged into directly but instead should be
GRANT'd to a role which is able to log in.  As noted in the
documentation, if RLS is being used then an administrator may (or may
not) wish to set BYPASSRLS on the login role which these predefined
roles are GRANT'd to.

Reviewed-by: Georgios Kokolatos
Discussion: https://postgr.es/m/20200828003023.gu29...@tamriel.snowman.net
---
 doc/src/sgml/user-manag.sgml | 18 +++
 src/backend/catalog/aclchk.c | 30 +++
 src/include/catalog/pg_authid.dat| 10 +++
 src/test/regress/expected/privileges.out | 38 +++-
 src/test/regress/sql/privileges.sql  | 20 +
 5 files changed, 115 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/user-manag.sgml b/doc/src/sgml/user-manag.sgml
index d171b13236..fe0bdb7599 100644
--- a/doc/src/sgml/user-manag.sgml
+++ b/doc/src/sgml/user-manag.sgml
@@ -518,6 +518,24 @@ DROP ROLE doomed_role;
   
  
  
+  
+   pg_read_all_data
+   Read all data (tables, views, sequences), as if having SELECT
+   rights on those objects, and USAGE rights on all schemas, even without
+   having it explicitly.  This role does not have the role attribute
+   BYPASSRLS set.  If RLS is being used, an administrator
+   may wish to set BYPASSRLS on roles which this role is
+   GRANTed to.
+  
+  
+   pg_write_all_data
+   Write all data (tables, views, sequences), as if having INSERT,
+   UPDATE, and DELETE rights on those objects, and USAGE rights on all
+   schemas, even without having it explicitly.  This role does not have the
+   role attribute BYPASSRLS set.  If RLS is being used,
+   an administrator may wish to set BYPASSRLS on roles
+   which this role is GRANTed to.
+  
   
   

Re: Default role -> Predefined role

2021-04-01 Thread Stephen Frost
Greetings,

* Stephen Frost (sfr...@snowman.net) wrote:
> Unless there's anything further on this, I'll plan to push in the next
> day or so.

... and done.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-01 Thread Bruce Momjian
On Fri, Apr  2, 2021 at 02:28:02AM +0800, Julien Rouhaud wrote:
> Unless I'm missing something, query_string isn't a global variable, it's a
> parameter passed to exec_simple_query() from PostgresMain().
> 
> It's then passed to the stats collector to be able to be displayed in
> pg_stat_activity through pgstat_report_activity() a bit like what I do for the
> queryid.
> 
> There's a global variable debug_query_string, but it's only for debugging
> purpose.
> 
> > > I wonder if the query hash
> > > should be a global variable too --- this would more clearly match how we
> > > handle top-level info like query_string.  Digging into the stats system
> > > to get top-level info does seem odd.
> 
> The main difference is that there's a single top level query_string,
> even if it contains multiple statements.  But there would be multiple queryid
> calculated in that case and we don't want to change it during a top level
> multi-statements execution, so we can't use the same approach.
> 
> Also, the query_string is directly logged from this code path, while the
> queryid is logged as a log_line_prefix, and almost all the code there also
> retrieve information from some shared structure.
> 
> And since it also has to be available in pg_stat_activity, having a single
> source of truth looked like a better approach.
> 
> > Also, if you go in that direction, make sure the hash it set in the same
> > places the query string is set, though I am unclear how extensions would
> > handle that.
> 
> It should be transparent for applications: it extracts the first queryid
> seen for each top-level statement and exports it.  The rest of the code still
> continues to see the queryid that corresponds to the actually executed single
> statement.

OK, I am happy with your design decisions, thanks.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: libpq debug log

2021-04-01 Thread 'alvhe...@alvh.no-ip.org'
On 2021-Apr-01, Tom Lane wrote:

> So drongo is still failing, and after a bit of looking around at
> other code I finally got hit with the clue hammer.  Per port.h:
> 
>  * On Windows, setvbuf() does not support _IOLBF mode, and interprets that
>  * as _IOFBF.  To add insult to injury, setvbuf(file, NULL, _IOFBF, 0)
>  * crashes outright if "parameter validation" is enabled.  Therefore, in
>  * places where we'd like to select line-buffered mode, we fall back to
>  * unbuffered mode instead on Windows.  Always use PG_IOLBF not _IOLBF
>  * directly in order to implement this behavior.
> 
> You want to do the honors?  And do something about that shift bug
> while at it.

Ooh, wow ... now that is a silly bug!  Thanks, I'll push the fix in a
minute.

Andrew also noted that the use of command_ok() in the test file eats all
the output, which is what was preventing us from seeing it in the first place.
I'll try to get rid of that together with the rest.

-- 
Álvaro Herrera   Valdivia, Chile




Re: libpq debug log

2021-04-01 Thread 'alvhe...@alvh.no-ip.org'
On 2021-Apr-01, Tom Lane wrote:

> "'alvhe...@alvh.no-ip.org'"  writes:
> > On 2021-Apr-01, Tom Lane wrote:
> >> BTW, what in the world is this supposed to accomplish?
> >> -(long long) rows_to_send);
> >> +(1L << 62) + (long long) rows_to_send);
> 
> > It makes the text representation wider, which means some buffer fills up
> > faster and the program switches from sending to receiving.
> 
> Hm.  If the actual field value isn't relevant, why bother including
> rows_to_send in it?  A constant string would be simpler and much
> less confusing as to its purpose.

Hah, yeah, we could as well do that I guess :-)

-- 
Álvaro Herrera   Valdivia, Chile
"I think my standards have lowered enough that now I think 'good design'
is when the page doesn't irritate the living f*ck out of me." (JWZ)




Re: Since '2001-09-09 01:46:40'::timestamp microseconds are lost when extracting epoch

2021-04-01 Thread Peter Eisentraut

On 19.03.21 21:06, Tom Lane wrote:

Yeah, I was wondering if we could do something like that, but I hadn't
got as far as figuring a way to deal with divisors not a multiple of
NBASE.

Looking at the proposed code, I wonder if it wouldn't be better to
define the function as taking the base-10-log of the divisor, so that
you'd have the number of digits to shift (and the dscale) immediately
instead of needing repeated integer divisions to get that.


done that way, much simpler now


Also, the
risk of intermediate overflow here seems annoying:

+   if (unlikely(pg_mul_s64_overflow(val1, NBASE/x, &val1)))
+   elog(ERROR, "overflow");

Maybe that's unreachable for the ranges of inputs the current patch could
create, but it seems like it makes the function distinctly less
general-purpose than one would think from its comment.  Maybe, if that
overflows, we could handle the failure by making that adjustment after
we've converted to numeric?


also done

I also figured out a way to combine the float8 and numeric 
implementations so that there is not so much duplication.  Added tests 
to cover all the edge and overflow cases.


I think this is solid now.

The extract(julian from timestamp) case is still somewhat slow, but 
as I previously stated, it's not documented and gives the wrong result, 
so it's not clear whether it should be fixed or what it should do.  I 
think I'll register that part as an open item in any case, to see what 
we should do about that.
From d6d28fc50107401ffd9d8ed0e7c641e6ef21fe3c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 1 Apr 2021 20:12:07 +0200
Subject: [PATCH v6] Change return type of EXTRACT to numeric

The previous implementation of EXTRACT mapped internally to
date_part(), which returned type double precision (since it was
implemented long before the numeric type existed).  This can lead to
imprecise output in some cases, so returning numeric would be
preferable.  Changing the return type of an existing function is a
bit risky, so instead we do the following:  We implement a new set of
functions, which are now called "extract", in parallel to the existing
date_part functions.  They work the same way internally but use
numeric instead of float8.  The EXTRACT construct is now mapped by the
parser to these new extract functions.  That way, dumps of views
etc. from old versions (which would use date_part) continue to work
unchanged, but new uses will map to the new extract functions.

Additionally, the reverse compilation of EXTRACT now reproduces the
original syntax, using the new mechanism introduced in
40c24bfef92530bd846e111c1742c2a54441c62c.

The following minor changes of behavior result from the new
implementation:

- The column name from an isolated EXTRACT call is now "extract"
  instead of "date_part".

- Extract from date now rejects inappropriate field names such as
  HOUR.  It was previously mapped internally to extract from
  timestamp, so it would silently accept everything appropriate for
  timestamp.

- Return values when extracting fields with possibly fractional
  values, such as second and epoch, now have the full scale that the
  value has internally (so, for example, '1.00' instead of just
  '1').

Discussion: 
https://www.postgresql.org/message-id/flat/42b73d2d-da12-ba9f-570a-420e0cce1...@phystech.edu
---
 doc/src/sgml/func.sgml  |  10 +-
 src/backend/parser/gram.y   |   2 +-
 src/backend/utils/adt/date.c| 308 +++--
 src/backend/utils/adt/numeric.c |  61 +++
 src/backend/utils/adt/ruleutils.c   |  21 +
 src/backend/utils/adt/timestamp.c   | 471 ++--
 src/include/catalog/pg_proc.dat |  18 +
 src/include/utils/numeric.h |   1 +
 src/test/regress/expected/create_view.out   |   2 +-
 src/test/regress/expected/date.out  | 452 +--
 src/test/regress/expected/interval.out  | 101 +++--
 src/test/regress/expected/psql_crosstab.out |  12 +-
 src/test/regress/expected/time.out  |  48 +-
 src/test/regress/expected/timestamp.out |  91 
 src/test/regress/expected/timestamptz.out   |  92 
 src/test/regress/expected/timetz.out|  60 ++-
 src/test/regress/sql/date.sql   |  22 +-
 src/test/regress/sql/interval.sql   |  12 +
 src/test/regress/sql/time.sql   |   7 +
 src/test/regress/sql/timestamp.sql  |  13 +
 src/test/regress/sql/timestamptz.sql|  13 +
 src/test/regress/sql/timetz.sql |   7 +
 22 files changed, 1329 insertions(+), 495 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 3cf243a16a..663f966b97 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -8872,7 +8872,7 @@ Date/Time Functions
   extract
  
  extract ( field 
from timestamp )
- double precision
+ numeric
 
 
 

Re: libpq debug log

2021-04-01 Thread Tom Lane
So drongo is still failing, and after a bit of looking around at
other code I finally got hit with the clue hammer.  Per port.h:

 * On Windows, setvbuf() does not support _IOLBF mode, and interprets that
 * as _IOFBF.  To add insult to injury, setvbuf(file, NULL, _IOFBF, 0)
 * crashes outright if "parameter validation" is enabled.  Therefore, in
 * places where we'd like to select line-buffered mode, we fall back to
 * unbuffered mode instead on Windows.  Always use PG_IOLBF not _IOLBF
 * directly in order to implement this behavior.

You want to do the honors?  And do something about that shift bug
while at it.

regards, tom lane




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-01 Thread Julien Rouhaud
On Thu, Apr 01, 2021 at 01:59:15PM -0400, Bruce Momjian wrote:
> On Thu, Apr  1, 2021 at 01:56:42PM -0400, Bruce Momjian wrote:
> > You are using:
> > 
> > /* --
> >  * pgstat_get_my_queryid() -
> >  *
> >  *  Return current backend's query identifier.
> >  */
> > uint64
> > pgstat_get_my_queryid(void)
> > {
> > if (!MyBEEntry)
> > return 0;
> > 
> > return MyBEEntry->st_queryid;
> > }
> > 
> > Looking at log_statement:
> > 
> > /* Log immediately if dictated by log_statement */
> > if (check_log_statement(parsetree_list))
> > {
> > ereport(LOG,
> > (errmsg("statement: %s", query_string),
> >  errhidestmt(true),
> >  errdetail_execute(parsetree_list)));
> > was_logged = true;
> > }
> > 
> > it uses the global variable query_string.

Unless I'm missing something, query_string isn't a global variable; it's a
parameter passed to exec_simple_query() from PostgresMain().

It's then passed to the stats collector so that it can be displayed in
pg_stat_activity through pgstat_report_activity(), a bit like what I do for
the queryid.

There's a global variable debug_query_string, but it's only for debugging
purposes.

> > I wonder if the query hash
> > should be a global variable too --- this would more clearly match how we
> > handle top-level info like query_string.  Digging into the stats system
> > to get top-level info does seem odd.

The main difference is that there's a single top-level query_string,
even if it contains multiple statements.  But there would be multiple queryids
calculated in that case, and we don't want to change it during a top-level
multi-statement execution, so we can't use the same approach.

Also, the query_string is directly logged from this code path, while the
queryid is logged via log_line_prefix, and almost all the code there also
retrieves information from some shared structure.

And since it also has to be available in pg_stat_activity, having a single
source of truth looked like a better approach.

> Also, if you go in that direction, make sure the hash it set in the same
> places the query string is set, though I am unclear how extensions would
> handle that.

It should be transparent for applications: it extracts the first queryid
seen for each top-level statement and exports it.  The rest of the code still
continues to see the queryid that corresponds to the actually executed single
statement.




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-01 Thread Bruce Momjian
On Thu, Apr  1, 2021 at 01:56:42PM -0400, Bruce Momjian wrote:
> You are using:
> 
>   /* --
>* pgstat_get_my_queryid() -
>*
>*  Return current backend's query identifier.
>*/
>   uint64
>   pgstat_get_my_queryid(void)
>   {
>   if (!MyBEEntry)
>   return 0;
>   
>   return MyBEEntry->st_queryid;
>   }
> 
> Looking at log_statement:
> 
>   /* Log immediately if dictated by log_statement */
>   if (check_log_statement(parsetree_list))
>   {
>   ereport(LOG,
>   (errmsg("statement: %s", query_string),
>errhidestmt(true),
>errdetail_execute(parsetree_list)));
>   was_logged = true;
>   }
> 
> it uses the global variable query_string.  I wonder if the query hash
> should be a global variable too --- this would more clearly match how we
> handle top-level info like query_string.  Digging into the stats system
> to get top-level info does seem odd.

Also, if you go in that direction, make sure the hash is set in the same
places the query string is set, though I am unclear how extensions would
handle that.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-01 Thread Bruce Momjian
On Thu, Apr  1, 2021 at 11:30:15PM +0800, Julien Rouhaud wrote:
> On Thu, Apr 01, 2021 at 11:05:24PM +0800, Julien Rouhaud wrote:
> > On Wed, Mar 31, 2021 at 11:18:45AM -0300, Alvaro Herrera wrote:
> > > On 2021-Mar-31, Bruce Momjian wrote:
> > > > 
> > > > I assume it is since Alvaro didn't reply.  I am planning to apply this
> > > > soon.
> > > 
> > > I'm afraid I don't know enough about how parallel query works to make a
> > > good assessment on this being a good approach or not -- and no time at
> > > present to figure it all out.
> > 
> > I'm far from being an expert either, but at the time I wrote it, looking
> > at the code around it, it probably seemed sensible.  We could directly call
> > pgstat_get_my_queryid() in ExecSerializePlan() rather than passing it from
> > the various callers, though; at least there would be a single source for it.
> 
> Here's a v21 that includes the mentioned change.

You are using:

/* --
 * pgstat_get_my_queryid() -
 *
 *  Return current backend's query identifier.
 */
uint64
pgstat_get_my_queryid(void)
{
if (!MyBEEntry)
return 0;

return MyBEEntry->st_queryid;
}

Looking at log_statement:

/* Log immediately if dictated by log_statement */
if (check_log_statement(parsetree_list))
{
ereport(LOG,
(errmsg("statement: %s", query_string),
 errhidestmt(true),
 errdetail_execute(parsetree_list)));
was_logged = true;
}

it uses the global variable query_string.  I wonder if the query hash
should be a global variable too --- this would more clearly match how we
handle top-level info like query_string.  Digging into the stats system
to get top-level info does seem odd.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Data type correction in pgstat_report_replslot function parameters

2021-04-01 Thread Fujii Masao




On 2021/04/02 2:18, Jeevan Ladhe wrote:



On Thu, Apr 1, 2021 at 10:20 PM vignesh C wrote:

Hi,

While I was reviewing replication slot statistics code, I found one
issue in the data type used for pgstat_report_replslot function
parameters. We pass int64 variables to the function but the function
prototype uses the int type. I felt the function parameters should be
int64. Attached patch fixes the same.


Isn't it better to use PgStat_Counter instead of int64?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: libpq debug log

2021-04-01 Thread Tom Lane
"'alvhe...@alvh.no-ip.org'"  writes:
> On 2021-Apr-01, Tom Lane wrote:
>> BTW, what in the world is this supposed to accomplish?
>> -(long long) rows_to_send);
>> +(1L << 62) + (long long) rows_to_send);

> It makes the text representation wider, which means some buffer fills up
> faster and the program switches from sending to receiving.

Hm.  If the actual field value isn't relevant, why bother including
rows_to_send in it?  A constant string would be simpler and much
less confusing as to its purpose.

regards, tom lane




Re: pg_amcheck contrib application

2021-04-01 Thread Robert Haas
On Thu, Apr 1, 2021 at 1:24 PM Mark Dilger  wrote:
> > On Apr 1, 2021, at 10:20 AM, Robert Haas  wrote:
> > OK, let's try that again.
> Looks good!

OK, committed. We still need to deal with what you had as 0003
upthread, so I guess the next step is for me to spend some time
reviewing that one.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_amcheck contrib application

2021-04-01 Thread Mark Dilger



> On Apr 1, 2021, at 10:20 AM, Robert Haas  wrote:
> 
> On Thu, Apr 1, 2021 at 1:06 PM Mark Dilger  
> wrote:
>> Seems fine other than the typo.
> 
> OK, let's try that again.

Looks good!

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-04-01 Thread Fujii Masao




On 2021/04/02 1:13, Bharath Rupireddy wrote:

On Thu, Apr 1, 2021 at 8:56 PM Fujii Masao  wrote:



Attaching v21 patch set, rebased onto the latest master.


I agree to add the server-level option. But I'm still not sure if it's a good 
idea to also expose that option as a GUC. Isn't the server-level option enough 
for most cases?

Also, isn't it strange to expose only this option as a GUC while there are many 
other postgres_fdw options?

With the v21-002 patch, even when the keep_connections GUC is disabled, the 
existing open connections are not closed immediately. Only the connections used 
in the transaction are closed at the end of that transaction. That is, existing 
connections that no transaction uses will never be closed. I'm not sure whether 
this behavior is intuitive for users.

Therefore, for now I'm thinking to support the server-level option first. 
Then, if we find it's not enough for most cases in practice, I'd like to 
consider exposing postgres_fdw options, including keep_connections, as GUCs.

Thought?


+1 to have only a server-level option for now and if the need arises
we could expose it as a GUC.


BTW these patches fail to apply to master because of commit 
27e1f14563. I updated and simplified the 003 patch. Patch attached.


Thanks for updating the patch. It looks good to me. Just a minor
change: instead of using "true" and "off" for the option, I used "on"
and "off" in the docs. Attaching v23.


Thanks a lot! Barring any objection, I will commit this version.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: pg_amcheck contrib application

2021-04-01 Thread Robert Haas
On Thu, Apr 1, 2021 at 1:06 PM Mark Dilger  wrote:
> Seems fine other than the typo.

OK, let's try that again.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v17-0001-amcheck-Fix-verify_heapam-s-tuple-visibility-che.patch
Description: Binary data


Re: Data type correction in pgstat_report_replslot function parameters

2021-04-01 Thread Jeevan Ladhe
On Thu, Apr 1, 2021 at 10:20 PM vignesh C  wrote:

> Hi,
>
> While I was reviewing replication slot statistics code, I found one
> issue in the data type used for pgstat_report_replslot function
> parameters. We pass int64 variables to the function but the function
> prototype uses the int type. I felt the function parameters should be
> int64. Attached patch fixes the same.


+1 for the change. The patch LGTM.

Regards,
Jeevan Ladhe


Re: pg_amcheck contrib application

2021-04-01 Thread Mark Dilger



> On Apr 1, 2021, at 9:56 AM, Robert Haas  wrote:
> 
> On Thu, Apr 1, 2021 at 12:32 PM Mark Dilger
>  wrote:
>>> - If xmax is a multi but seems to be garbled, I changed it to return
>>> true rather than false. The inserter is known to have committed by
>>> that point, so I think it's OK to try to deform the tuple. We just
>>> shouldn't try to check TOAST.
>> 
>> It is hard to know what to do when at least one tuple header field is 
>> corrupt.  You don't necessarily know which one it is.  For example, if 
>> HEAP_XMAX_IS_MULTI is set, we try to interpret the xmax as a mxid, and if it 
>> is out of bounds, we report it as corrupt.  But was the xmax corrupt?  Or 
>> was the HEAP_XMAX_IS_MULTI bit corrupt?  It's not clear.  I took the view 
>> that if either xmin or xmax appear to be corrupt when interpreted in light 
>> of the various tuple header bits, all we really know is that the set of 
>> fields/bits don't make sense as a whole, so we report corruption, don't 
>> trust any of them, and abort further checking of the tuple.  You have the 
>> burden of proof the other way around.  If the xmin appears fine, and xmax 
>> appears corrupt, then we only know that xmax is corrupt, so the tuple is 
>> checkable because according to the xmin it committed.
> 
> I agree that it's hard to be sure what's gone once we start finding
> corrupted data, but deciding that maybe xmin didn't really commit
> because we see that there's something wrong with xmax seems excessive
> to me. I thought about a related case: if xmax is a bad multi but is
> also hinted invalid, should we try to follow TOAST pointers? I think
> that's hard to say, because we don't know whether (1) the invalid
> marking is in error, (2) it's wrong to consider it a multi rather than
> an XID, (3) the stored multi got overwritten with a garbage value, or
> (4) the stored multi got removed before the tuple was frozen. Not
> knowing which of those is the case, how are we supposed to decide
> whether the TOAST tuples might have been (or be about to get) pruned?
> 
> But, in the case we're talking about here, I don't think it's a
> particularly close decision. All we need to say is that if xmax or the
> infomask bits pertaining to it are corrupted, we're still going to
> suppose that xmin and the infomask bits pertaining to it, which are
> all different bytes and bits, are OK. To me, the contrary decision,
> namely that a bogus xmax means xmin was probably lying about the
> transaction having been committed in the first place, seems like a
> serious overreaction. As you say:
> 
>> I don't think how you have it causes undue problems, since deforming the 
>> tuple when you shouldn't merely risks a bunch of extra not-so-helpful 
>> corruption messages.  And hey, maybe they're helpful to somebody clever 
>> enough to diagnose why that particular bit of noise was generated.
> 
> I agree. The biggest risk here is that we might omit >0 complaints
> when only 0 are justified. That will panic users. The possibility that
> we might omit >x complaints when only x are justified, for some x>0,
> is also a risk, but it's not nearly as bad, because there's definitely
> something wrong, and it's just a question of what it is exactly. So we
> have to be really conservative about saying that X is corruption if
> there's any possibility that it might be fine. But once we've
> complained about one thing, we can take a more balanced approach about
> whether to risk issuing more complaints. The possibility that
> suppressing the additional complaints might complicate resolution of
> the issue also needs to be considered.

This all seems fine to me.  The main thing is that we don't go on to check the 
toast, which we don't.

>>* If xmin_status happens to be XID_IN_PROGRESS, then in theory
>> 
>> Did you mean to say XID_IS_CURRENT_XID here?
> 
> Yes, I did, thanks.

Ouch.  You've got a typo:  s/XID_IN_CURRENT_XID/XID_IS_CURRENT_XID/

>> /* xmax is an MXID, not an MXID. Sanity check it. */
>> 
>> Is it an MXID or isn't it?
> 
> Good catch.
> 
> New patch attached.

Seems fine other than the typo.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-04-01 Thread Robert Haas
On Thu, Apr 1, 2021 at 12:32 PM Mark Dilger
 wrote:
> > - If xmax is a multi but seems to be garbled, I changed it to return
> > true rather than false. The inserter is known to have committed by
> > that point, so I think it's OK to try to deform the tuple. We just
> > shouldn't try to check TOAST.
>
> It is hard to know what to do when at least one tuple header field is 
> corrupt.  You don't necessarily know which one it is.  For example, if 
> HEAP_XMAX_IS_MULTI is set, we try to interpret the xmax as a mxid, and if it 
> is out of bounds, we report it as corrupt.  But was the xmax corrupt?  Or was 
> the HEAP_XMAX_IS_MULTI bit corrupt?  It's not clear.  I took the view that if 
> either xmin or xmax appear to be corrupt when interpreted in light of the 
> various tuple header bits, all we really know is that the set of fields/bits 
> don't make sense as a whole, so we report corruption, don't trust any of 
> them, and abort further checking of the tuple.  You have the burden of proof 
> the other way around.  If the xmin appears fine, and xmax appears corrupt, 
> then we only know that xmax is corrupt, so the tuple is checkable because 
> according to the xmin it committed.

I agree that it's hard to be sure what's gone once we start finding
corrupted data, but deciding that maybe xmin didn't really commit
because we see that there's something wrong with xmax seems excessive
to me. I thought about a related case: if xmax is a bad multi but is
also hinted invalid, should we try to follow TOAST pointers? I think
that's hard to say, because we don't know whether (1) the invalid
marking is in error, (2) it's wrong to consider it a multi rather than
an XID, (3) the stored multi got overwritten with a garbage value, or
(4) the stored multi got removed before the tuple was frozen. Not
knowing which of those is the case, how are we supposed to decide
whether the TOAST tuples might have been (or be about to get) pruned?

But, in the case we're talking about here, I don't think it's a
particularly close decision. All we need to say is that if xmax or the
infomask bits pertaining to it are corrupted, we're still going to
suppose that xmin and the infomask bits pertaining to it, which are
all different bytes and bits, are OK. To me, the contrary decision,
namely that a bogus xmax means xmin was probably lying about the
transaction having been committed in the first place, seems like a
serious overreaction. As you say:

> I don't think how you have it causes undue problems, since deforming the 
> tuple when you shouldn't merely risks a bunch of extra not-so-helpful 
> corruption messages.  And hey, maybe they're helpful to somebody clever 
> enough to diagnose why that particular bit of noise was generated.

I agree. The biggest risk here is that we might omit >0 complaints
when only 0 are justified. That will panic users. The possibility that
we might omit >x complaints when only x are justified, for some x>0,
is also a risk, but it's not nearly as bad, because there's definitely
something wrong, and it's just a question of what it is exactly. So we
have to be really conservative about saying that X is corruption if
there's any possibility that it might be fine. But once we've
complained about one thing, we can take a more balanced approach about
whether to risk issuing more complaints. The possibility that
suppressing the additional complaints might complicate resolution of
the issue also needs to be considered.

> * If xmin_status happens to be XID_IN_PROGRESS, then in theory
>
> Did you mean to say XID_IS_CURRENT_XID here?

Yes, I did, thanks.

> /* xmax is an MXID, not an MXID. Sanity check it. */
>
> Is it an MXID or isn't it?

Good catch.

New patch attached.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v16-0001-amcheck-Fix-verify_heapam-s-tuple-visibility-che.patch
Description: Binary data


Re: Replication slot stats misgivings

2021-04-01 Thread vignesh C
On Thu, Apr 1, 2021 at 5:58 PM Amit Kapila  wrote:
>
> On Thu, Apr 1, 2021 at 3:43 PM vignesh C  wrote:
> >
> > On Wed, Mar 31, 2021 at 11:32 AM vignesh C  wrote:
> > >
> > > On Tue, Mar 30, 2021 at 11:00 AM Andres Freund  wrote:
> > > >
> > > > Hi,
> > > >
> > > > On 2021-03-30 10:13:29 +0530, vignesh C wrote:
> > > > > On Tue, Mar 30, 2021 at 6:28 AM Andres Freund  
> > > > > wrote:
> > > > > > Any chance you could write a tap test exercising a few of these 
> > > > > > cases?
> > > > >
> > > > > I can try to write a patch for this if nobody objects.
> > > >
> > > > Cool!
> > > >
> > >
> > > Attached a patch which has the test for the first scenario.
> > >
> > > > > > E.g. things like:
> > > > > >
> > > > > > - create a few slots, drop one of them, shut down, start up, verify
> > > > > >   stats are still sane
> > > > > > - create a few slots, shut down, manually remove a slot, lower
> > > > > >   max_replication_slots, start up
> > > > >
> > > > > Here by "manually remove a slot", do you mean to remove the slot
> > > > > manually from the pg_replslot folder?
> > > >
> > > > Yep - thereby allowing max_replication_slots after the shutdown/start to
> > > > be lower than the number of slots-stats objects.
> > >
> > > I have not included the 2nd test in the patch as the test fails with
> > > the following warnings and also displays the statistics of the removed
> > > slot:
> > > WARNING:  problem in alloc set Statistics snapshot: detected write
> > > past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438
> > > WARNING:  problem in alloc set Statistics snapshot: detected write
> > > past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438
> > >
> > > This happens because the statistics file has an additional slot
> > > present even though the replication slot was removed.  I felt this
> > > issue should be fixed. I will try to fix this issue and send the
> > > second test along with the fix.
> >
> > I felt that from the statistics collector process there is no way to
> > identify whether a replication slot is present or not, because the
> > statistics collector process does not have access to shared memory.
> > Anything the statistics collector process does independently, by
> > traversing the statistics and removing replication slots exceeding
> > max_replication_slots, has the drawback of removing some valid
> > replication slot's statistics data.
> > Any thoughts on how we can identify the replication slot which has been
> > dropped?
> > Can someone point me to the shared stats patch link with which message
> > loss can be avoided? I wanted to see whether a scenario can occur with
> > the shared stats patch in which the slot is dropped but the statistics
> > are not updated because of an immediate shutdown or the server going
> > down abruptly.
> >
>
> I don't think it is easy to simulate a scenario where the 'drop'
> message is dropped and I think that is why the test contains the step
> to manually remove the slot. At this stage, you can probably provide a
> test patch and a code-fix patch where it just drops the extra slots
> from the stats file. That will allow us to test it with a shared
> memory stats patch on which Andres and Horiguchi-San are working. If
> we still continue to pursue the current approach then, as Andres
> suggested, we might send additional information from
> RestoreSlotFromDisk to keep it in sync.

Thanks for your comments. The attached patch has the fix for the same.
Also attached are a couple more patches which address the comments
Andres had listed, i.e. changing the char datatype to NameData and
displaying the unspilled/unstreamed transaction information in the
replication statistics.
Thoughts?

Regards,
Vignesh
From 53d58eeca5335485aba96b57bea68b5807c7bcc4 Mon Sep 17 00:00:00 2001
From: vignesh 
Date: Thu, 1 Apr 2021 21:15:47 +0530
Subject: [PATCH v1 1/4] Changed char datatype to NameData datatype for
 slotname.

Changed char datatype to NameData datatype for slotname.
---
 src/backend/postmaster/pgstat.c | 15 ---
 src/backend/utils/adt/pgstatfuncs.c |  2 +-
 src/include/pgstat.h|  6 +++---
 3 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 4b9bcd2b41..132fdef78c 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -64,6 +64,7 @@
 #include "storage/procsignal.h"
 #include "storage/sinvaladt.h"
 #include "utils/ascii.h"
+#include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -1555,7 +1556,7 @@ pgstat_reset_replslot_counter(const char *name)
 		if (SlotIsPhysical(slot))
 			return;
 
-		strlcpy(msg.m_slotname, name, NAMEDATALEN);
+		namestrcpy(&msg.m_slotname, name);
 		msg.clearall = false;
 	}
 	else
@@ -1782,7 +1783,7 @@ pgstat_report_replslot(const char *slotname, int spilltxns, int spillcount,
 	 * Prepare and send the message
 	 */
 	pgsta

Data type correction in pgstat_report_replslot function parameters

2021-04-01 Thread vignesh C
Hi,

While I was reviewing replication slot statistics code, I found one
issue in the data type used for pgstat_report_replslot function
parameters. We pass int64 variables to the function but the function
prototype uses the int type. I felt the function parameters should be
int64. Attached patch fixes the same.
Thoughts?

Regards,
Vignesh
From d202d38357c6629efff112fbd56619b4f7f5b00d Mon Sep 17 00:00:00 2001
From: vignesh 
Date: Thu, 1 Apr 2021 22:11:09 +0530
Subject: [PATCH v1] Data type correction in pgstat_report_replslot.

Data type correction in pgstat_report_replslot.
---
 src/backend/postmaster/pgstat.c | 5 +++--
 src/include/pgstat.h| 6 --
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 4b9bcd2b41..0b5affcf62 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -1773,8 +1773,9 @@ pgstat_report_tempfile(size_t filesize)
  * --
  */
 void
-pgstat_report_replslot(const char *slotname, int spilltxns, int spillcount,
-	   int spillbytes, int streamtxns, int streamcount, int streambytes)
+pgstat_report_replslot(const char *slotname, int64 spilltxns, int64 spillcount,
+	   int64 spillbytes, int64 streamtxns, int64 streamcount,
+	   int64 streambytes)
 {
 	PgStat_MsgReplSlot msg;
 
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index d699502cd9..1f87e668df 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1450,8 +1450,10 @@ extern void pgstat_report_recovery_conflict(int reason);
 extern void pgstat_report_deadlock(void);
 extern void pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount);
 extern void pgstat_report_checksum_failure(void);
-extern void pgstat_report_replslot(const char *slotname, int spilltxns, int spillcount,
-   int spillbytes, int streamtxns, int streamcount, int streambytes);
+extern void pgstat_report_replslot(const char *slotname, int64 spilltxns,
+   int64 spillcount, int64 spillbytes,
+   int64 streamtxns, int64 streamcount,
+   int64 streambytes);
 extern void pgstat_report_replslot_drop(const char *slotname);
 
 extern void pgstat_initialize(void);
-- 
2.25.1



Re: pg_amcheck contrib application

2021-04-01 Thread Mark Dilger



> On Apr 1, 2021, at 8:08 AM, Robert Haas  wrote:
> 
> On Wed, Mar 31, 2021 at 12:34 AM Mark Dilger
>  wrote:
>> These changes got lost between v11 and v12.  I've put them back, as well as 
>> updating to use your language.
> 
> Here's an updated patch that includes your 0001 and 0002 plus a bunch
> of changes by me. I intend to commit this version unless somebody
> spots a problem with it.
> 
> Here are the things I changed:
> 
> - I renamed tuple_can_be_pruned to tuple_could_be_pruned because I
> think it does a better job suggesting that we're uncertain about what
> will happen.

+1

> - I got rid of bool header_garbled and changed it to bool result,
> inverting the sense, because it didn't seem useful to have a function
> that ended with if (some_boolean) return false; return true when I
> could end the function with return some_other_boolean.

+1

> - I removed all the one-word comments that said /* corrupt */ or /*
> checkable */ because they seemed redundant.

Ok.

> - In the xmin_status section of check_tuple_visibility(), I got rid of
> the xmin_status == XID_IS_CURRENT_XID and xmin_status ==
> XID_IN_PROGRESS cases because they were redundant with the xmin_status
> != XID_COMMITTED case.

Ok.

> - If xmax is a multi but seems to be garbled, I changed it to return
> true rather than false. The inserter is known to have committed by
> that point, so I think it's OK to try to deform the tuple. We just
> shouldn't try to check TOAST.

It is hard to know what to do when at least one tuple header field is corrupt.
You don't necessarily know which one it is.  For example, if HEAP_XMAX_IS_MULTI
is set, we try to interpret the xmax as an mxid, and if it is out of bounds, we
report it as corrupt.  But was the xmax corrupt?  Or was the HEAP_XMAX_IS_MULTI
bit corrupt?  It's not clear.  I took the view that if either xmin or xmax
appears to be corrupt when interpreted in light of the various tuple header
bits, all we really know is that the set of fields/bits doesn't make sense as a
whole, so we report corruption, don't trust any of them, and abort further
checking of the tuple.  You have the burden of proof the other way around: if
the xmin appears fine and the xmax appears corrupt, then we only know that xmax
is corrupt, so the tuple is checkable because according to the xmin it
committed.

I don't think the way you have it causes undue problems, since deforming the
tuple when you shouldn't merely risks a bunch of extra not-so-helpful corruption
messages.  And hey, maybe they're helpful to somebody clever enough to diagnose
why that particular bit of noise was generated.

> - I changed both switches over xmax_status to break in each case and
> then return true after instead of returning true for each case. I
> think this is clearer.

Ok.

> - I changed get_xid_status() to perform the TransactionIdIs... checks
> in the same order that HeapTupleSatisfies...() does them. I believe
> that it's incorrect to conclude that the transaction must be in
> progress because it neither IsCurrentTransaction nor DidCommit nor
> DidAbort, because all of those things will be false for a transaction
> that is running at the time of a system crash. The correct rule is
> that if it neither IsCurrentTransaction nor IsInProgress nor DidCommit
> then it's aborted.

Ok.

> - I moved a few comments and rewrote some others, including some of
> the ones that you took from my earlier draft patch. The thing is, the
> comment needs to be adjusted based on where you put it. Like, I had a
> comment that says"It should be impossible for xvac to still be
> running, since we've removed all that code, but even if it were, it
> ought to be safe to read the tuple, since the original inserter must
> have committed.  But, if the xvac transaction committed, this tuple
> (and its associated TOAST tuples) could be pruned at any time." which
> in my version was right before a TransactionIdDidCommit() test, and
> explains why that test is there and why the code does what it does as
> a result. But in your version you've moved it to a place where we've
> already tested that the transaction has committed, and more
> importantly, where we've already tested that it's not still running.
> Saying that it "should" be impossible for it not to be running when
> we've *actually checked that* doesn't make nearly as much sense as it
> does when we haven't checked that and aren't going to do so.


* If xmin_status happens to be XID_IN_PROGRESS, then in theory

Did you mean to say XID_IS_CURRENT_XID here?

/* xmax is an MXID, not an MXID. Sanity check it. */ 

Is it an MXID or isn't it?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-04-01 Thread Bharath Rupireddy
On Thu, Apr 1, 2021 at 8:56 PM Fujii Masao  wrote:
>
> > Attaching v21 patch set, rebased onto the latest master.
>
> I agree to add the server-level option. But I'm still not sure if it's good 
> idea to also expose that option as GUC. Isn't the server-level option enough 
> for most cases?
>
> Also it's strange to expose only this option as GUC while there are other 
> many postgres_fdw options?
>
> With the v21-002 patch, even when the keep_connections GUC is disabled, the
> existing open connections are not closed immediately. Only connections used in the 
> transaction are closed at the end of that transaction. That is, the existing 
> connections that no transactions use will never be closed. I'm not sure if 
> this behavior is intuitive for users.
>
> Therefore for now I'm thinking to support the server-level option at first... 
> Then if we find it's not enough for most cases in practice, I'd like to 
> consider to expose postgres_fdw options including keep_connections as GUC.
>
> Thought?

+1 to have only a server-level option for now and if the need arises
we could expose it as a GUC.

> BTW these patches fail to be applied to the master because of commit 
> 27e1f14563. I updated and simplified the 003 patch. Patch attached.

Thanks for updating the patch. It looks good to me. Just a minor
change, instead of using "true" and "off" for the option, I used "on"
and "off" in the docs. Attaching v23.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


v23-0003-postgres_fdw-server-level-option-keep_connection.patch
Description: Binary data


Re: libpq debug log

2021-04-01 Thread 'alvhe...@alvh.no-ip.org'
On 2021-Apr-01, Tom Lane wrote:

> BTW, what in the world is this supposed to accomplish?
> 
> -(long long) rows_to_send);
> +(1L << 62) + (long long) rows_to_send);
> 
> Various buildfarm members are complaining that the shift distance
> is more than the width of "long", which I'd just fix with s/L/LL/
> except that the addition of the constant seems just wrong to
> begin with.

It makes the text representation wider, which means some buffer fills up
faster and the program switches from sending to receiving.

-- 
Álvaro Herrera   Valdivia, Chile
"Ni aún el genio muy grande llegaría muy lejos
si tuviera que sacarlo todo de su propio interior" (Goethe)




Re: truncating timestamps on arbitrary intervals

2021-04-01 Thread John Naylor
On Thu, Apr 1, 2021 at 9:11 AM Salek Talangi 
wrote:
>
> Hi all,
>
> it might be a bit late now, but do you know that TimescaleDB already has
a similar feature, named time_bucket?
> https://docs.timescale.com/latest/api#time_bucket
> Perhaps that can help with some design decisions.

Yes, thanks, I'm aware of it. It's a bit more feature-rich, and I wanted to
have something basic that users can have available without installing an
extension.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Support ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION ... syntax

2021-04-01 Thread Bharath Rupireddy
On Tue, Mar 23, 2021 at 8:39 PM Japin Li  wrote:
> I check the duplicates for newpublist in merge_publications(). The code is
> copied from publicationListToArray().

IMO, we can have the same code inside a function, probably named
"check_duplicates_in_publist" or some other better name:
static void
check_duplicates_in_publist(List *publist, Datum *datums)
{
    int         j = 0;
    ListCell   *cell;

    foreach(cell, publist)
    {
        char       *name = strVal(lfirst(cell));
        ListCell   *pcell;

        /* Check for duplicates. */
        foreach(pcell, publist)
        {
            char       *pname = strVal(lfirst(pcell));

            if (pcell == cell)
                break;

            if (strcmp(name, pname) == 0)
                ereport(ERROR,
                        (errcode(ERRCODE_SYNTAX_ERROR),
                         errmsg("publication name \"%s\" used more than once",
                                pname)));
        }

        if (datums)
            datums[j++] = CStringGetTextDatum(name);
    }
}

From publicationListToArray, call check_duplicates_in_publist(publist, datums);
From merge_publications, call check_duplicates_in_publist(newpublist, NULL);

> I do not check for all duplicates because it will make the code more complex.
> For example:
>
> ALTER SUBSCRIPTION mysub ADD PUBLICATION mypub2, mypub2, mypub2;

That's fine because we error out anyway.

0002:
The tests added in subscription.sql look fine to me and they cover
most of the syntax related code. But it will be good if we can add
tests to see if the data of the newly added/dropped publications
would/would not reflect on the subscriber, maybe you can consider
adding these tests into 001_rep_changes.pl, similar to ALTER
SUBSCRIPTION ... SET PUBLICATION test there.

0003:
I think it's not "set_publication_option", they are
"add_publication_option" and "drop_publication_option" for ADD and
DROP respectively. Please change it wherever "set_publication_option"
is used instead.
+ALTER SUBSCRIPTION name ADD PUBLICATION publication_name [, ...] [ WITH ( set_publication_option [= value] [, ... ] ) ]
+ALTER SUBSCRIPTION name DROP PUBLICATION publication_name [, ...] [ WITH ( set_publication_option [= value] [, ... ] ) ]

0004:
LGTM.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Asynchronous Append on postgres_fdw nodes.

2021-04-01 Thread Etsuro Fujita
On Fri, Apr 2, 2021 at 12:09 AM Tom Lane  wrote:
> The buildfarm points out that this fails under valgrind.
> I easily reproduced it here:

> Sorta looks like something is relying on a pointer into the relcache
> to be valid for longer than it can safely rely on that.  The
> CLOBBER_CACHE_ALWAYS animals will probably be unhappy too, but
> they are slower than valgrind.
>
> (Note that the test case appears to succeed, you have to notice that
> the backend crashed after exiting.)

Will look into this.

Best regards,
Etsuro Fujita




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-01 Thread Julien Rouhaud
On Thu, Apr 01, 2021 at 11:05:24PM +0800, Julien Rouhaud wrote:
> On Wed, Mar 31, 2021 at 11:18:45AM -0300, Alvaro Herrera wrote:
> > On 2021-Mar-31, Bruce Momjian wrote:
> > > 
> > > I assume it is since Alvaro didn't reply.  I am planning to apply this
> > > soon.
> > 
> > I'm afraid I don't know enough about how parallel query works to make a
> > good assessment on this being a good approach or not -- and no time at
> > present to figure it all out.
> 
> I'm far from being an expert either, but at the time I wrote it, looking
> at the code around it, it probably seemed sensible.  We could directly call
> pgstat_get_my_queryid() in ExecSerializePlan() rather than passing it from
> the various callers though; at least there would be a single source for it.

Here's a v21 that includes the mentioned change.
From 819a45faf520dfd60b4fe3e9aea71e3a2b69 Mon Sep 17 00:00:00 2001
From: Bruce Momjian 
Date: Mon, 22 Mar 2021 17:43:22 -0400
Subject: [PATCH v21 1/3] Move pg_stat_statements query jumbling to core.

A new compute_query_id GUC is also added, to control whether a query identifier
should be computed by the core.  It's therefore now possible to disable core
queryid computation and use pg_stat_statements with a different algorithm to
compute the query identifier by using a third-party module.

To ensure that a single source of query identifier can be used and is well
defined, modules that calculate a query identifier should throw an error if
compute_query_id is enabled or if a query identifier was already calculated.
---
 .../pg_stat_statements/pg_stat_statements.c   | 805 +
 .../pg_stat_statements.conf   |   1 +
 doc/src/sgml/config.sgml  |  25 +
 doc/src/sgml/pgstatstatements.sgml|  20 +-
 src/backend/parser/analyze.c  |  14 +-
 src/backend/tcop/postgres.c   |   6 +-
 src/backend/utils/misc/Makefile   |   1 +
 src/backend/utils/misc/guc.c  |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c  | 834 ++
 src/include/parser/analyze.h  |   4 +-
 src/include/utils/guc.h   |   1 +
 src/include/utils/queryjumble.h   |  58 ++
 13 files changed, 995 insertions(+), 785 deletions(-)
 create mode 100644 src/backend/utils/misc/queryjumble.c
 create mode 100644 src/include/utils/queryjumble.h

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 62cccbfa44..bd8c96728c 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -8,24 +8,9 @@
  * a shared hashtable.  (We track only as many distinct queries as will fit
  * in the designated amount of shared memory.)
  *
- * As of Postgres 9.2, this module normalizes query entries.  Normalization
- * is a process whereby similar queries, typically differing only in their
- * constants (though the exact rules are somewhat more subtle than that) are
- * recognized as equivalent, and are tracked as a single entry.  This is
- * particularly useful for non-prepared queries.
- *
- * Normalization is implemented by fingerprinting queries, selectively
- * serializing those fields of each query tree's nodes that are judged to be
- * essential to the query.  This is referred to as a query jumble.  This is
- * distinct from a regular serialization in that various extraneous
- * information is ignored as irrelevant or not essential to the query, such
- * as the collations of Vars and, most notably, the values of constants.
- *
- * This jumble is acquired at the end of parse analysis of each query, and
- * a 64-bit hash of it is stored into the query's Query.queryId field.
- * The server then copies this value around, making it available in plan
- * tree(s) generated from the query.  The executor can then use this value
- * to blame query costs on the proper queryId.
+ * Starting in Postgres 9.2, this module normalized query entries.  As of
+ * Postgres 14, the normalization is done by the core if compute_query_id is
+ * enabled, or optionally by third-party modules.
  *
  * To facilitate presenting entries to users, we create "representative" query
  * strings in which constants are replaced with parameter symbols ($n), to
@@ -114,8 +99,6 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
-#define JUMBLE_SIZE1024	/* query serialization buffer size */
-
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -235,40 +218,6 @@ typedef struct pgssSharedState
 	pgssGlobalStats stats;		/* global statistics for pgss */
 } pgssSharedState;
 
-/*
- * Struct for tracking locations/lengths of constants during normalization
- */
-typedef 

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-04-01 Thread Fujii Masao



On 2021/02/22 14:55, Bharath Rupireddy wrote:

On Thu, Feb 4, 2021 at 9:36 AM Bharath Rupireddy
 wrote:


On Wed, Feb 3, 2021 at 4:22 PM Fujii Masao  wrote:

Maybe my explanation in the previous email was unclear. What I think is: if the
server-level option is explicitly specified, its setting is used whatever the
GUC is. On the other hand, if the server-level option is NOT specified, the GUC
setting is used. For example, if we define the server as follows, the GUC
setting is used because the server-level option is NOT specified.

  CREATE SERVER loopback FOREIGN DATA WRAPPER postgres;

If we define the server as follows, the server-level setting is used.

  CREATE SERVER loopback FOREIGN DATA WRAPPER postgres OPTIONS 
(keep_connections 'on');


Attaching v20 patch set. Now, the server-level option, if provided,
overrides the GUC. The GUC will be used only if the server-level option is
not provided. Also, both the server-level option and the GUC are named the
same: "keep_connections".

Please have a look.


Attaching v21 patch set, rebased onto the latest master.


I agree to add the server-level option. But I'm still not sure if it's good 
idea to also expose that option as GUC. Isn't the server-level option enough 
for most cases?

Also it's strange to expose only this option as GUC while there are other many 
postgres_fdw options?

With the v21-002 patch, even when the keep_connections GUC is disabled, the
existing open connections are not closed immediately. Only connections used in the 
transaction are closed at the end of that transaction. That is, the existing 
connections that no transactions use will never be closed. I'm not sure if this 
behavior is intuitive for users.

Therefore for now I'm thinking to support the server-level option at first... 
Then if we find it's not enough for most cases in practice, I'd like to 
consider to expose postgres_fdw options including keep_connections as GUC.

Thought?

BTW these patches fail to be applied to the master because of commit 
27e1f14563. I updated and simplified the 003 patch. Patch attached.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 54ab8edfab..6a61d83862 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -59,6 +59,8 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	bool		keep_connections;	/* setting value of keep_connections
+									 * server option */
 	Oid			serverid;		/* foreign server OID used to get server name */
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
@@ -286,6 +288,7 @@ static void
 make_new_connection(ConnCacheEntry *entry, UserMapping *user)
 {
 	ForeignServer *server = GetForeignServer(user->serverid);
+	ListCell   *lc;
 
 	Assert(entry->conn == NULL);
 
@@ -304,6 +307,26 @@ make_new_connection(ConnCacheEntry *entry, UserMapping *user)
 							  ObjectIdGetDatum(user->umid));
 	memset(&entry->state, 0, sizeof(entry->state));
 
+	/*
+	 * Determine whether to keep the connection that we're about to make here
+	 * open even after the transaction using it ends, so that the subsequent
+	 * transactions can re-use it.
+	 *
+	 * It's enough to determine this only when making new connection because
+	 * all the connections to the foreign server whose keep_connections option
+	 * is changed will be closed and re-made later.
+	 *
+	 * By default, all the connections to any foreign servers are kept open.
+	 */
+	entry->keep_connections = true;
+	foreach(lc, server->options)
+	{
+		DefElem    *def = (DefElem *) lfirst(lc);
+
+		if (strcmp(def->defname, "keep_connections") == 0)
+			entry->keep_connections = defGetBoolean(def);
+	}
+
 	/* Now try to make the connection */
 	entry->conn = connect_pg_server(server, user);
 
@@ -970,14 +993,16 @@ pgfdw_xact_callback(XactEvent event, void *arg)
 			entry->xact_depth = 0;
 
 			/*
-			 * If the connection isn't in a good idle state or it is marked as
-			 * invalid, then discard it to recover. Next GetConnection will open a
-			 * new connection.
+			 * If the connection isn't in a good idle state, it is marked as
+			 * inv

Re: Asynchronous Append on postgres_fdw nodes.

2021-04-01 Thread Tom Lane
Etsuro Fujita  writes:
> Pushed.

The buildfarm points out that this fails under valgrind.
I easily reproduced it here:

==00:00:03:42.115 3410499== Syscall param epoll_wait(events) points to 
unaddressable byte(s)
==00:00:03:42.115 3410499==at 0x58E926B: epoll_wait (epoll_wait.c:30)
==00:00:03:42.115 3410499==by 0x7FC903: WaitEventSetWaitBlock (latch.c:1452)
==00:00:03:42.115 3410499==by 0x7FC903: WaitEventSetWait (latch.c:1398)
==00:00:03:42.115 3410499==by 0x6BF46C: ExecAppendAsyncEventWait 
(nodeAppend.c:1025)
==00:00:03:42.115 3410499==by 0x6BF667: ExecAppendAsyncGetNext 
(nodeAppend.c:915)
==00:00:03:42.115 3410499==by 0x6BF667: ExecAppend (nodeAppend.c:337)
==00:00:03:42.115 3410499==by 0x6D49E4: ExecProcNode (executor.h:257)
==00:00:03:42.115 3410499==by 0x6D49E4: ExecModifyTable 
(nodeModifyTable.c:)
==00:00:03:42.115 3410499==by 0x6A87F2: ExecProcNode (executor.h:257)
==00:00:03:42.115 3410499==by 0x6A87F2: ExecutePlan (execMain.c:1531)
==00:00:03:42.115 3410499==by 0x6A87F2: standard_ExecutorRun 
(execMain.c:350)
==00:00:03:42.115 3410499==by 0x82597F: ProcessQuery (pquery.c:160)
==00:00:03:42.115 3410499==by 0x825BE9: PortalRunMulti (pquery.c:1267)
==00:00:03:42.115 3410499==by 0x826826: PortalRun (pquery.c:779)
==00:00:03:42.115 3410499==by 0x82291E: exec_simple_query (postgres.c:1185)
==00:00:03:42.115 3410499==by 0x823F3E: PostgresMain (postgres.c:4415)
==00:00:03:42.115 3410499==by 0x79BAC1: BackendRun (postmaster.c:4483)
==00:00:03:42.115 3410499==by 0x79BAC1: BackendStartup (postmaster.c:4205)
==00:00:03:42.115 3410499==by 0x79BAC1: ServerLoop (postmaster.c:1737)
==00:00:03:42.115 3410499==  Address 0x10d10628 is 7,960 bytes inside a 
recently re-allocated block of size 8,192 alloc'd
==00:00:03:42.115 3410499==at 0x4C30F0B: malloc (vg_replace_malloc.c:307)
==00:00:03:42.115 3410499==by 0x94F9EA: AllocSetAlloc (aset.c:919)
==00:00:03:42.115 3410499==by 0x957BAF: MemoryContextAlloc (mcxt.c:809)
==00:00:03:42.115 3410499==by 0x958CC0: MemoryContextStrdup (mcxt.c:1179)
==00:00:03:42.115 3410499==by 0x516AE4: untransformRelOptions 
(reloptions.c:1336)
==00:00:03:42.115 3410499==by 0x6E6ADF: GetForeignTable (foreign.c:273)
==00:00:03:42.115 3410499==by 0xF3BD470: postgresBeginForeignScan 
(postgres_fdw.c:1479)
==00:00:03:42.115 3410499==by 0x6C2E83: ExecInitForeignScan 
(nodeForeignscan.c:236)
==00:00:03:42.115 3410499==by 0x6AF893: ExecInitNode (execProcnode.c:283)
==00:00:03:42.115 3410499==by 0x6C0007: ExecInitAppend (nodeAppend.c:232)
==00:00:03:42.115 3410499==by 0x6AFA37: ExecInitNode (execProcnode.c:180)
==00:00:03:42.115 3410499==by 0x6D533A: ExecInitModifyTable 
(nodeModifyTable.c:2575)

==00:00:03:44.907 3410499== Syscall param epoll_wait(events) points to 
unaddressable byte(s)
==00:00:03:44.907 3410499==at 0x58E926B: epoll_wait (epoll_wait.c:30)
==00:00:03:44.907 3410499==by 0x7FC903: WaitEventSetWaitBlock (latch.c:1452)
==00:00:03:44.907 3410499==by 0x7FC903: WaitEventSetWait (latch.c:1398)
==00:00:03:44.907 3410499==by 0x6BF46C: ExecAppendAsyncEventWait 
(nodeAppend.c:1025)
==00:00:03:44.907 3410499==by 0x6BF718: ExecAppend (nodeAppend.c:370)
==00:00:03:44.907 3410499==by 0x6D49E4: ExecProcNode (executor.h:257)
==00:00:03:44.907 3410499==by 0x6D49E4: ExecModifyTable 
(nodeModifyTable.c:)
==00:00:03:44.907 3410499==by 0x6A87F2: ExecProcNode (executor.h:257)
==00:00:03:44.907 3410499==by 0x6A87F2: ExecutePlan (execMain.c:1531)
==00:00:03:44.907 3410499==by 0x6A87F2: standard_ExecutorRun 
(execMain.c:350)
==00:00:03:44.907 3410499==by 0x82597F: ProcessQuery (pquery.c:160)
==00:00:03:44.907 3410499==by 0x825BE9: PortalRunMulti (pquery.c:1267)
==00:00:03:44.907 3410499==by 0x826826: PortalRun (pquery.c:779)
==00:00:03:44.907 3410499==by 0x82291E: exec_simple_query (postgres.c:1185)
==00:00:03:44.907 3410499==by 0x823F3E: PostgresMain (postgres.c:4415)
==00:00:03:44.907 3410499==by 0x79BAC1: BackendRun (postmaster.c:4483)
==00:00:03:44.907 3410499==by 0x79BAC1: BackendStartup (postmaster.c:4205)
==00:00:03:44.907 3410499==by 0x79BAC1: ServerLoop (postmaster.c:1737)
==00:00:03:44.907 3410499==  Address 0x1093fdd8 is 2,904 bytes inside a 
recently re-allocated block of size 16,384 alloc'd
==00:00:03:44.907 3410499==at 0x4C30F0B: malloc (vg_replace_malloc.c:307)
==00:00:03:44.907 3410499==by 0x94F9EA: AllocSetAlloc (aset.c:919)
==00:00:03:44.907 3410499==by 0x958233: palloc (mcxt.c:964)
==00:00:03:44.907 3410499==by 0x69C400: ExprEvalPushStep (execExpr.c:2310)
==00:00:03:44.907 3410499==by 0x69C541: ExecPushExprSlots (execExpr.c:2490)
==00:00:03:44.907 3410499==by 0x69C580: ExecInitExprSlots (execExpr.c:2445)
==00:00:03:44.907 3410499==by 0x69F0DD: ExecInitQual (execExpr.c:231)
==00:00:03:44.907 3410499==by 0x6D80EF: ExecInitSeqScan (nodeSeqscan.c:172)
==00:

Re: pg_amcheck contrib application

2021-04-01 Thread Robert Haas
On Wed, Mar 31, 2021 at 12:34 AM Mark Dilger
 wrote:
> These changes got lost between v11 and v12.  I've put them back, as well as 
> updating to use your language.

Here's an updated patch that includes your 0001 and 0002 plus a bunch
of changes by me. I intend to commit this version unless somebody
spots a problem with it.

Here are the things I changed:

- I renamed tuple_can_be_pruned to tuple_could_be_pruned because I
think it does a better job suggesting that we're uncertain about what
will happen.

- I got rid of bool header_garbled and changed it to bool result,
inverting the sense, because it didn't seem useful to have a function
that ended with if (some_boolean) return false; return true when I
could end the function with return some_other_boolean.

- I removed all the one-word comments that said /* corrupt */ or /*
checkable */ because they seemed redundant.

- In the xmin_status section of check_tuple_visibility(), I got rid of
the xmin_status == XID_IS_CURRENT_XID and xmin_status ==
XID_IN_PROGRESS cases because they were redundant with the xmin_status
!= XID_COMMITTED case.

- If xmax is a multi but seems to be garbled, I changed it to return
true rather than false. The inserter is known to have committed by
that point, so I think it's OK to try to deform the tuple. We just
shouldn't try to check TOAST.

- I changed both switches over xmax_status to break in each case and
then return true after instead of returning true for each case. I
think this is clearer.

- I changed get_xid_status() to perform the TransactionIdIs... checks
in the same order that HeapTupleSatisfies...() does them. I believe
that it's incorrect to conclude that the transaction must be in
progress because it neither IsCurrentTransaction nor DidCommit nor
DidAbort, because all of those things will be false for a transaction
that is running at the time of a system crash. The correct rule is
that if it neither IsCurrentTransaction nor IsInProgress nor DidCommit
then it's aborted.

- I moved a few comments and rewrote some others, including some of
the ones that you took from my earlier draft patch. The thing is, the
comment needs to be adjusted based on where you put it. Like, I had a
comment that says"It should be impossible for xvac to still be
running, since we've removed all that code, but even if it were, it
ought to be safe to read the tuple, since the original inserter must
have committed.  But, if the xvac transaction committed, this tuple
(and its associated TOAST tuples) could be pruned at any time." which
in my version was right before a TransactionIdDidCommit() test, and
explains why that test is there and why the code does what it does as
a result. But in your version you've moved it to a place where we've
already tested that the transaction has committed, and more
importantly, where we've already tested that it's not still running.
Saying that it "should" be impossible for it not to be running when
we've *actually checked that* doesn't make nearly as much sense as it
does when we haven't checked that and aren't going to do so.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v15-0001-amcheck-Fix-verify_heapam-s-tuple-visibility-che.patch
Description: Binary data


Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-01 Thread Julien Rouhaud
On Wed, Mar 31, 2021 at 11:18:45AM -0300, Alvaro Herrera wrote:
> On 2021-Mar-31, Bruce Momjian wrote:
> > 
> > I assume it is since Alvaro didn't reply.  I am planning to apply this
> > soon.
> 
> I'm afraid I don't know enough about how parallel query works to make a
> good assessment on this being a good approach or not -- and no time at
> present to figure it all out.

I'm far from being an expert either, but at the time I wrote it, looking at
the code around it, it probably seemed sensible.  We could directly call
pgstat_get_my_queryid() in ExecSerializePlan() rather than passing it from the
various callers though; at least there would be a single source for it.




Re: ModifyTable overheads in generic plans

2021-04-01 Thread Amit Langote
On Thu, Apr 1, 2021 at 10:12 PM Amit Langote  wrote:
>
> On Thu, Apr 1, 2021 at 3:12 AM Tom Lane  wrote:
> > Amit Langote  writes:
> > > [ v14-0002-Initialize-result-relation-information-lazily.patch ]
> > Needs YA rebase over 86dc90056.
>
> Done.  I will post the updated results for -Mprepared benchmarks I did
> in the other thread shortly.

Test details:

pgbench -n -T60 -Mprepared -f nojoin.sql

nojoin.sql:

\set a random(1, 100)
update test_table t set b = :a where a = :a;

* test_table has 40 columns and partitions as shown below
* plan_cache_mode = force_generic_plan

Results:

nparts  master  patched

64      6262    17118
128     3449    12082
256     1722    7643
1024    359     2099

* tps figures shown are the median of 3 runs.

So, a drastic speedup can be seen just by not creating ResultRelInfos for
child relations that are not updated, as the patch does.  I haven't yet
included any changes for the AcquireExecutorLocks() and ExecCheckRTPerms()
bottlenecks that still remain and cause the drop in tps as the partition
count increases.

--
Amit Langote
EDB: http://www.enterprisedb.com




Re: libpq debug log

2021-04-01 Thread Tom Lane
BTW, what in the world is this supposed to accomplish?

-(long long) rows_to_send);
+(1L << 62) + (long long) rows_to_send);

Various buildfarm members are complaining that the shift distance
is more than the width of "long", which I'd just fix with s/L/LL/
except that the addition of the constant seems just wrong to
begin with.

regards, tom lane




Re: Rename of triggers for partitioned tables

2021-04-01 Thread Arne Roland
Alvaro Herrera wrote:
> I think the child cannot not have a corresponding trigger.  If you try
> to drop the trigger on child, it should say that it cannot be dropped
> because the trigger on parent requires it.  So I don't think there's any
> case where altering name of trigger in parent would raise an user-facing
> error.  If you can't find the trigger in child, that's a case of catalog
> corruption [...]

Ah, I didn't know that. That changes my reasoning. I only knew that ALTER TABLE 
... DETACH PARTITION doesn't detach the child trigger from the partitioned 
trigger, while ATTACH PARTITION seems to add one. Maybe we should make sure that 
there is a one-to-one correspondence between child triggers and child 
partitions.

Currently, upon detaching, we keep the trigger within the partitioned trigger of 
the parent relation. (So the trigger remains in place and can only be dropped 
with the parent one.) Only when we try to attach the partition again are any 
partitioned triggers simply removed.

One idea would be to detach partitioned triggers there in a manner similar to 
how we detach partitioned indexes. That could give the above guarantee. (At 
least if one were willing to write a migration for that.) But then we couldn't 
tell anymore where the trigger stems from, which would break our attach/detach 
workflow.

I was annoyed by the inability to drop partitioned triggers from detached 
partitions, but it seems I just got a workaround. Ugh.

This is a bit of a sidetrack discussion, but it is related.

> Consider this.  You have table p and partition p1.  Add trigger t to p,
> and do
> ALTER TRIGGER t ON p1 RENAME TO q;

> What I'm saying is that if you later do
> ALTER TRIGGER t ON ONLY p RENAME TO r;
> then the trigger on parent is changed, and the trigger on child stays q.
> If you later do
> ALTER TRIGGER r ON p RENAME TO s;
> then triggers on both tables end up with name s.

If we get into the mindset of expecting one trigger on the child, we have the 
following edge cases:

- The trigger was renamed. You seem to favor the silent rename in that case over 
raising a notice (or even an error). I am not convinced a notice wouldn't be 
better there.
- The trigger is outside of the partitioned table. That means the trigger might 
still be in the inheritance tree, but likely isn't. Should we rename it anyway, 
or skip it? Should we raise a notice about what we are doing here? I think that 
would be helpful to the end user.

> Hmm, ok, maybe this is sufficient, I didn't actually test it.  I think
> you do need to add a few test cases to ensure everything is sane.  Make
> sure to verify what happens if you have multi-level partitioning
> (grandparent, parent, child) and you ALTER the one in the middle.  Also
> things like if parent has two children and one child has multiple
> children.

The test I posted somewhere upthread already covers that.

I am not sure how to add a test for pg_dump though. Can you point me to the 
location in pg_regress for pg_dump?

> Also, please make sure that it works correctly with INHERITS (legacy
> inheritance).  We probably don't want to cause recursion in that case.

That works, but I will add a test to make sure it stays that way. Thank you for 
your input!

Regards
Arne


Re: sepgsql logging

2021-04-01 Thread Dave Page
On Thu, Apr 1, 2021 at 3:23 PM Tom Lane  wrote:

> Andrew Dunstan  writes:
> > On 4/1/21 8:32 AM, Dave Page wrote:
> >> It seems to me that sepgsql should also log the denial, but flag that
> >> permissive mode is on.
>
> > +1 for doing what selinux does if possible.
>
> +1.  If selinux itself is doing that, it's hard to see a reason why
> we should not; and I concur that the info is useful.
>

Thanks both. I'll take a look at the code and see if I can whip up a patch
(it'll be a week or so as I'm taking some time off for Easter).

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com


Re: sepgsql logging

2021-04-01 Thread Tom Lane
Andrew Dunstan  writes:
> On 4/1/21 8:32 AM, Dave Page wrote:
>> It seems to me that sepgsql should also log the denial, but flag that
>> permissive mode is on.

> +1 for doing what selinux does if possible.

+1.  If selinux itself is doing that, it's hard to see a reason why
we should not; and I concur that the info is useful.

regards, tom lane




Re: [POC] verifying UTF-8 using SIMD instructions

2021-04-01 Thread John Naylor
v9 is just a rebase.

--
John Naylor
EDB: http://www.enterprisedb.com
From e876049ad3b153e8725ab23f65ae8f021a970470 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Thu, 1 Apr 2021 08:24:05 -0400
Subject: [PATCH v9] Replace pg_utf8_verifystr() with two faster
 implementations:

On x86-64, we use SSE 4.1 with a lookup algorithm based on "Validating
UTF-8 In Less Than One Instruction Per Byte" by John Keiser and Daniel
Lemire. Since configure already tests for SSE 4.2 for CRC, we piggy-back
on top of that. The lookup tables are taken from the simdjson library
(Apache 2.0 licensed), but the code is written from scratch using
simdjson as a reference.

On other platforms, we still get a performance boost by using a bespoke
fallback function, rather than one that relies on pg_utf8_verifychar()
and pg_utf8_isvalid(). This one is loosely based on the fallback that
is part of the simdjson library.
---
 config/c-compiler.m4 |  28 +-
 configure| 114 +++--
 configure.ac |  61 ++-
 src/Makefile.global.in   |   3 +
 src/common/wchar.c   |  27 +-
 src/include/pg_config.h.in   |   9 +
 src/include/port/pg_utf8.h   |  86 
 src/port/Makefile|   6 +
 src/port/pg_utf8_fallback.c  | 129 ++
 src/port/pg_utf8_sse42.c | 537 +++
 src/port/pg_utf8_sse42_choose.c  |  68 +++
 src/test/regress/expected/conversion.out |  52 +++
 src/test/regress/sql/conversion.sql  |  28 ++
 src/tools/msvc/Mkvcbuild.pm  |   4 +
 src/tools/msvc/Solution.pm   |   3 +
 15 files changed, 1088 insertions(+), 67 deletions(-)
 create mode 100644 src/include/port/pg_utf8.h
 create mode 100644 src/port/pg_utf8_fallback.c
 create mode 100644 src/port/pg_utf8_sse42.c
 create mode 100644 src/port/pg_utf8_sse42_choose.c

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 780e906ecc..b1604eac58 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -591,36 +591,46 @@ if test x"$pgac_cv_gcc_atomic_int64_cas" = x"yes"; then
   AC_DEFINE(HAVE_GCC__ATOMIC_INT64_CAS, 1, [Define to 1 if you have __atomic_compare_exchange_n(int64 *, int64 *, int64).])
 fi])# PGAC_HAVE_GCC__ATOMIC_INT64_CAS
 
-# PGAC_SSE42_CRC32_INTRINSICS
+# PGAC_SSE42_INTRINSICS
 # ---
 # Check if the compiler supports the x86 CRC instructions added in SSE 4.2,
 # using the _mm_crc32_u8 and _mm_crc32_u32 intrinsic functions. (We don't
 # test the 8-byte variant, _mm_crc32_u64, but it is assumed to be present if
 # the other ones are, on x86-64 platforms)
 #
+# While at it, check for support x86 instructions added in SSSE3 and SSE4.1,
+# in particular _mm_alignr_epi8, _mm_shuffle_epi8, and _mm_testz_si128.
+# We should be able to assume these are understood by the compiler if CRC
+# intrinsics are, but it's better to document our reliance on them here.
+#
+# We don't test for SSE2 intrinsics, as they are assumed to be present on
+# x86-64 platforms, which we can easily check at compile time.
+#
 # An optional compiler flag can be passed as argument (e.g. -msse4.2). If the
-# intrinsics are supported, sets pgac_sse42_crc32_intrinsics, and CFLAGS_SSE42.
-AC_DEFUN([PGAC_SSE42_CRC32_INTRINSICS],
-[define([Ac_cachevar], [AS_TR_SH([pgac_cv_sse42_crc32_intrinsics_$1])])dnl
-AC_CACHE_CHECK([for _mm_crc32_u8 and _mm_crc32_u32 with CFLAGS=$1], [Ac_cachevar],
+# intrinsics are supported, sets pgac_sse42_intrinsics, and CFLAGS_SSE42.
+AC_DEFUN([PGAC_SSE42_INTRINSICS],
+[define([Ac_cachevar], [AS_TR_SH([pgac_cv_sse42_intrinsics_$1])])dnl
+AC_CACHE_CHECK([for _mm_crc32_u8, _mm_crc32_u32, _mm_alignr_epi8, _mm_shuffle_epi8, and _mm_testz_si128 with CFLAGS=$1], [Ac_cachevar],
 [pgac_save_CFLAGS=$CFLAGS
 CFLAGS="$pgac_save_CFLAGS $1"
 AC_LINK_IFELSE([AC_LANG_PROGRAM([#include ],
   [unsigned int crc = 0;
crc = _mm_crc32_u8(crc, 0);
crc = _mm_crc32_u32(crc, 0);
+   __m128i vec = _mm_set1_epi8(crc);
+   vec = _mm_shuffle_epi8(vec,
+ _mm_alignr_epi8(vec, vec, 1));
/* return computed value, to prevent the above being optimized away */
-   return crc == 0;])],
+   return _mm_testz_si128(vec, vec);])],
   [Ac_cachevar=yes],
   [Ac_cachevar=no])
 CFLAGS="$pgac_save_CFLAGS"])
 if test x"$Ac_cachevar" = x"yes"; then
   CFLAGS_SSE42="$1"
-  pgac_sse42_crc32_intrinsics=yes
+  pgac_sse42_intrinsics=yes
 fi
 undefine([Ac_cachevar])dnl
-])# PGAC_SSE42_CRC32_INTRINSICS
-
+])# PGAC_SSE42_INTRINSICS
 
 # PGAC_ARMV8_CRC32C_INTRINSICS
 # 
diff --git a/configure b/configure
index 06ad9aeb71..4d70f10fab 100755
--- a/configure
+++ b/configure
@@ -645,6 +645,7 @@ XGETTEXT
 MSGMERGE
 MSGFMT_FLAGS
 MSGFMT
+PG_UTF8_OBJS
 PG_CRC32C_OBJS
 CFLAGS_ARMV8_CRC32C
 CFLAGS_SSE42
@@ -17963,14 +17964,14 @@ $as_echo "#define HAVE__CPUID 1" >>confdefs.h
 
 fi
 
-# Check for Intel SSE 4.2 intrinsics to do CRC calcul

Re: sepgsql logging

2021-04-01 Thread Andrew Dunstan


On 4/1/21 8:32 AM, Dave Page wrote:
> Hi
>
> I've been trying to figure out selinux with sepgsql (which is proving
> quite difficult as there is an almost total lack of
> documentation/blogs etc. on the topic) and ran into an issue. Whilst
> my system had selinux in enforcing mode, I mistakenly had sepgsql in
> permissive mode. I created a table and restricted access to one column
> to regular users using the label
> system_u:object_r:sepgsql_secret_table_t:s0. Because sepgsql was in
> permissive mode, my test user could still access the restricted column.
>
> Postgres logged this:
>
> 2021-03-31 17:12:29.713 BST [3917] LOG:  SELinux: allowed { select }
> scontext=user_u:user_r:user_t:s0
> tcontext=system_u:object_r:sepgsql_secret_table_t:s0 tclass=db_column
> name="column private of table t1"
>
> That's very confusing, because the norm in selinux is to log denials
> as if the system were in enforcing mode, but then allow the action to
> proceed anyway, when in permissive mode. For example, log entries such
> as this are created when my restricted user tries to run an executable
> from /tmp after running "setsebool -P user_exec_content off":
>
> type=AVC msg=audit(1617278924.917:484): avc:  denied  { execute } for
>  pid=53036 comm="bash" name="ls" dev="dm-0" ino=319727
> scontext=user_u:user_r:user_t:s0
> tcontext=user_u:object_r:user_tmp_t:s0 tclass=file permissive=1
>
> The point being to let the admin know what would fail if the system
> were switched to enforcing mode. Whilst that wasn't the point of what
> I was trying to do, such a message would have indicated to me that I
> was in permissive mode without realising.
>
> It seems to me that sepgsql should also log the denial, but flag that
> permissive mode is on.
>
> Any reason not to do that?


+1 for doing what selinux does if possible.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Support for NSS as a libpq TLS backend

2021-04-01 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Wed, Mar 31, 2021 at 10:15:15PM +, Jacob Champion wrote:
> > I think we're going to need some analogue to PQinitOpenSSL() to help
> > client applications cut through the mess, but I'm not sure what it
> > should look like, or how we would maintain any sort of API
> > compatibility between the two flavors. And does libpq already have some
> > notion of a "main thread" that I'm missing?
> 
> Nope as far as I recall.  With OpenSSL, the initialization of the SSL
> mutex lock and the crypto callback initialization is done by the first
> thread in.

Yeah, we haven't got any such concept in libpq.  I do think that some of
this can simply be documented as "if you do this, then you need to make
sure to do this".

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: libpq debug log

2021-04-01 Thread Tom Lane
"'alvhe...@alvh.no-ip.org'"  writes:
> Eh, so I forgot to strdup(optarg[optind]).  Apparently that works fine
> in glibc but other getopt implementations are likely not so friendly.

Hmm ... interesting theory, but I don't think I buy it, since the
program isn't doing anything that should damage argv[].  I guess
we'll soon find out.

regards, tom lane




Re: Flaky vacuum truncate test in reloptions.sql

2021-04-01 Thread Masahiko Sawada
On Thu, Apr 1, 2021 at 5:49 PM Michael Paquier  wrote:
>
> On Thu, Apr 01, 2021 at 10:58:25AM +0300, Arseny Sher wrote:
> > How about the attached?

Thank you for updating the patch!

> Sounds fine to me.  Sawada-san?

Looks good to me too.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Crash in BRIN minmax-multi indexes

2021-04-01 Thread Zhihong Yu
Hi, Tomas:
Thanks for the correction.

I think switching to interval_cmp_value() would be better (with a comment
explaining why).

Cheers

On Thu, Apr 1, 2021 at 6:23 AM Tomas Vondra 
wrote:

> On 4/1/21 3:09 PM, Zhihong Yu wrote:
> > Hi,
> > Can you try this patch ?
> >
> > Thanks
> >
> > diff --git a/src/backend/access/brin/brin_minmax_multi.c
> > b/src/backend/access/brin/brin_minmax_multi.c
> > index 70109960e8..25d6d2e274 100644
> > --- a/src/backend/access/brin/brin_minmax_multi.c
> > +++ b/src/backend/access/brin/brin_minmax_multi.c
> > @@ -2161,7 +2161,7 @@
> brin_minmax_multi_distance_interval(PG_FUNCTION_ARGS)
> >  delta = 24L * 3600L * delta;
> >
> >  /* and add the time part */
> > -delta += result->time / (float8) 100.0;
> > +delta += (result->time + result->zone * USECS_PER_SEC) / (float8)
> > 100.0;
> >
>
> That won't work, because Interval does not have a "zone" field, so this
> won't even compile.
>
> The problem is that interval comparisons convert the value using 30 days
> per month (see interval_cmp_value), but the formula in this function
> uses 31. So either we can tweak that (seems to fix it for me), or maybe
> just switch to interval_cmp_value directly.
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: Crash in BRIN minmax-multi indexes

2021-04-01 Thread Tomas Vondra
On 4/1/21 3:09 PM, Zhihong Yu wrote:
> Hi,
> Can you try this patch ?
> 
> Thanks
> 
> diff --git a/src/backend/access/brin/brin_minmax_multi.c
> b/src/backend/access/brin/brin_minmax_multi.c
> index 70109960e8..25d6d2e274 100644
> --- a/src/backend/access/brin/brin_minmax_multi.c
> +++ b/src/backend/access/brin/brin_minmax_multi.c
> @@ -2161,7 +2161,7 @@ brin_minmax_multi_distance_interval(PG_FUNCTION_ARGS)
>      delta = 24L * 3600L * delta;
> 
>      /* and add the time part */
> -    delta += result->time / (float8) 100.0;
> +    delta += (result->time + result->zone * USECS_PER_SEC) / (float8)
> 100.0;
> 

That won't work, because Interval does not have a "zone" field, so this
won't even compile.

The problem is that interval comparisons convert the value using 30 days
per month (see interval_cmp_value), but the formula in this function
uses 31. So either we can tweak that (seems to fix it for me), or maybe
just switch to interval_cmp_value directly.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: truncating timestamps on arbitrary intervals

2021-04-01 Thread Salek Talangi
Hi all,

it might be a bit late now, but do you know that TimescaleDB already has a
similar feature, named time_bucket?
https://docs.timescale.com/latest/api#time_bucket
Perhaps that can help with some design decisions.
I saw your feature on Depesz' "Waiting for PostgreSQL 14" and remembered
reading about it just two days ago.

Best regards
Salek Talangi

Am Do., 1. Apr. 2021 um 13:31 Uhr schrieb John Naylor <
john.nay...@enterprisedb.com>:

> On Sat, Mar 27, 2021 at 1:06 PM Justin Pryzby 
> wrote:
> >
> > The current docs seem to be missing a "synopsis", like
> >
> > +
> > +date_trunc(stride,
> timestamp, origin)
> > +
>
> The attached
> - adds a synopsis
> - adds a bit more description to the parameters similar to those in
> date_trunc
> - documents that negative intervals are treated the same as positive ones
>
> Note on the last point: This just falls out of the math, so was not
> deliberate, but it seems fine to me. We could ban negative intervals, but
> that would possibly just inconvenience some people unnecessarily. We could
> also treat negative strides differently somehow, but I don't immediately
> see a useful and/or intuitive change in behavior to come of that.
>
> --
> John Naylor
> EDB: http://www.enterprisedb.com
>


Re: New IndexAM API controlling index vacuum strategies

2021-04-01 Thread Robert Haas
On Wed, Mar 31, 2021 at 9:44 PM Masahiko Sawada  wrote:
> > But let me think about it...I suppose we could do it when one-pass
> > VACUUM considers vacuuming a range of FSM pages every
> > VACUUM_FSM_EVERY_PAGES. That's kind of similar to index vacuuming, in
> > a way -- it wouldn't be too bad to check for emergencies in the same
> > way there.
>
> Yeah, I also thought that would be a good place to check for
> emergencies. That sounds reasonable.

Without offering an opinion on this particular implementation choice,
+1 for the idea of trying to make the table-with-indexes and the
table-without-indexes cases work in ways that will feel similar to the
user. Tables without indexes are probably rare in practice, but if
some behaviors are implemented for one case and not the other, it will
probably be confusing. One thought here is that it might help to try
to write documentation for whatever behavior you choose. If it's hard
to document without weasel-words, maybe it's not the best approach.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: ModifyTable overheads in generic plans

2021-04-01 Thread Amit Langote
On Thu, Apr 1, 2021 at 3:12 AM Tom Lane  wrote:
> Amit Langote  writes:
> > [ v14-0002-Initialize-result-relation-information-lazily.patch ]
> Needs YA rebase over 86dc90056.

Done.  I will post the updated results for -Mprepared benchmarks I did
in the other thread shortly.

--
Amit Langote
EDB: http://www.enterprisedb.com


v15-0001-Set-ForeignScanState.resultRelInfo-lazily.patch
Description: Binary data


v15-0002-Initialize-result-relation-information-lazily.patch
Description: Binary data


Re: Crash in BRIN minmax-multi indexes

2021-04-01 Thread Zhihong Yu
Hi,
Can you try this patch ?

Thanks

diff --git a/src/backend/access/brin/brin_minmax_multi.c
b/src/backend/access/brin/brin_minmax_multi.c
index 70109960e8..25d6d2e274 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -2161,7 +2161,7 @@ brin_minmax_multi_distance_interval(PG_FUNCTION_ARGS)
 delta = 24L * 3600L * delta;

 /* and add the time part */
-delta += result->time / (float8) 100.0;
+delta += (result->time + result->zone * USECS_PER_SEC) / (float8)
100.0;

 Assert(delta >= 0);

On Wed, Mar 31, 2021 at 11:25 PM Jaime Casanova <
jcasa...@systemguards.com.ec> wrote:

> On Wed, Mar 31, 2021 at 6:19 PM Jaime Casanova
>  wrote:
> >
> > On Wed, Mar 31, 2021 at 5:25 PM Tomas Vondra
> >  wrote:
> > >
> > > Hi,
> > >
> > > I think I found the issue - it's kinda obvious, really. We need to
> > > consider the timezone, because the "time" parts alone may be sorted
> > > differently. The attached patch should fix this, and it also fixes a
> > > similar issue in the inet data type.
> > >
> >
> > ah! yeah! obvious... if you say so ;)
> >
> > > As for why the regression tests did not catch this, it's most likely
> > > because the data is likely generated in "nice" ordering, or something
> > > like that. I'll see if I can tweak the ordering to trigger these issues
> > > reliably, and I'll do a bit more randomized testing.
> > >
> > > There's also the question of rounding errors, which I think might cause
> > > random assert failures (but in practice it's harmless, in the worst
> case
> > > we'll merge the ranges a bit differently).
> > >
> > >
> >
> > I can confirm this fixes the crash in the query I showed and the
> original case.
> >
>
> But I found another, but similar issue.
>
> ```
> update public.brintest_multi set
>   intervalcol = (select pg_catalog.avg(intervalcol) from
> public.brintest_bloom)
> ;
> ```
>
> BTW, i can reproduce just by executing "make installcheck" and
> immediately execute that query
>
> --
> Jaime Casanova
> Director de Servicios Profesionales
> SYSTEMGUARDS - Consultores de PostgreSQL
>


Re: Replication slot stats misgivings

2021-04-01 Thread Masahiko Sawada
On Tue, Mar 30, 2021 at 9:58 AM Andres Freund  wrote:
>
> IMO, independent of the shutdown / startup issue, it'd be worth writing
> a patch tracking the bytes sent independently of the slot stats storage
> issues. That would also make the testing for the above cheaper...

Agreed.

I think the bytes sent should be recorded by the decoding plugin, not
by the core side. Given table filtering and row filtering, tracking
the bytes passed to the decoding plugin would not help gauge the
actual network I/O. In that sense, the description of stream_bytes
in the docs seems inaccurate:

---
This and other streaming counters for this slot can be used to gauge
the network I/O which occurred during logical decoding and allow
tuning logical_decoding_work_mem.
---

It can surely be used for tuning logical_decoding_work_mem, but it is
not necessarily accurate for gauging the network I/O that occurred
during logical decoding.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: libpq debug log

2021-04-01 Thread 'alvhe...@alvh.no-ip.org'
On 2021-Mar-31, Tom Lane wrote:

> I wrote:
> > That is weird - only test 4 (of 8) runs at all, the rest seem to
> > fail to connect.  What's different about pipelined_insert?
> 
> Oh ... there's a pretty obvious theory.  pipelined_insert is
> the only one that is not asked to write a trace file.
> So for some reason, opening the trace file fails.
> (I wonder why we don't see an error message for this though.)

Eh, so I forgot to strdup(optarg[optind]).  Apparently that works fine
in glibc but other getopt implementations are likely not so friendly.

-- 
Álvaro Herrera                39°49'30"S 73°17'W
"I am amazed at [the pgsql-sql] mailing list for the wonderful support, and
lack of hesitasion in answering a lost soul's question, I just wished the rest
of the mailing list could be like this."   (Fotis)
   (http://archives.postgresql.org/pgsql-sql/2006-06/msg00265.php)




Re: Add client connection check during the execution of the query

2021-04-01 Thread Bharath Rupireddy
On Thu, Apr 1, 2021 at 11:29 AM Thomas Munro  wrote:
>
> On Tue, Mar 30, 2021 at 10:00 AM Thomas Munro  wrote:
> > If we want to ship this in v14 we have to make a decision ASAP:
> >
> > 1.  Ship the POLLHUP patch (like v9) that only works reliably on
> > Linux.  Maybe disable the feature completely on other OSes?
> > 2.  Ship the patch that tries to read (like v7).  It should work on
> > all systems, but it can be fooled by pipelined commands (though it can
> > detect a pipelined 'X').
> >
> > Personally, I lean towards #2.
>
> I changed my mind.  Let's commit the pleasingly simple Linux-only
> feature for now, and extend to it to send some kind of no-op message
> in a later release.  So this is the version I'd like to go with.
> Objections?
>
> I moved the GUC into tcop/postgres.c and tcop/tcopprot.h, because it
> directly controls postgres.c's behaviour, not pqcomm.c's.  The latter
> only contains the code to perform the check.

Here's a minor comment: it would be good to have an extra line after
variable assignments, and before and after function calls/if clauses,
something like

+pollfd.revents = 0;
+
+rc = poll(&pollfd, 1, 0);
+
+if (rc < 0)

And also here
 }
+
+if (CheckClientConnectionPending)
+{
+CheckClientConnectionPending = false;

And
+}
+}
+
 if (ClientConnectionLost)

And
+0, 0, INT_MAX,
+check_client_connection_check_interval, NULL, NULL
+},
+
 /* End-of-list marker */

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




sepgsql logging

2021-04-01 Thread Dave Page
Hi

I've been trying to figure out selinux with sepgsql (which is proving quite
difficult as there is an almost total lack of documentation/blogs etc. on
the topic) and ran into an issue. Whilst my system had selinux in enforcing
mode, I mistakenly had sepgsql in permissive mode. I created a table and
restricted access to one column to regular users using the label
system_u:object_r:sepgsql_secret_table_t:s0. Because sepgsql was in
permissive mode, my test user could still access the restricted column.

Postgres logged this:

2021-03-31 17:12:29.713 BST [3917] LOG:  SELinux: allowed { select }
scontext=user_u:user_r:user_t:s0
tcontext=system_u:object_r:sepgsql_secret_table_t:s0 tclass=db_column
name="column private of table t1"

That's very confusing, because the norm in selinux is to log denials as if
the system were in enforcing mode, but then allow the action to proceed
anyway, when in permissive mode. For example, log entries such as this are
created when my restricted user tries to run an executable from /tmp after
running "setsebool -P user_exec_content off":

type=AVC msg=audit(1617278924.917:484): avc:  denied  { execute } for
 pid=53036 comm="bash" name="ls" dev="dm-0" ino=319727
scontext=user_u:user_r:user_t:s0 tcontext=user_u:object_r:user_tmp_t:s0
tclass=file permissive=1

The point being to let the admin know what would fail if the system were
switched to enforcing mode. Whilst that wasn't the point of what I was
trying to do, such a message would have indicated to me that I was in
permissive mode without realising.

It seems to me that sepgsql should also log the denial, but flag that
permissive mode is on.

Any reason not to do that?

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com


Re: Replication slot stats misgivings

2021-04-01 Thread Amit Kapila
On Thu, Apr 1, 2021 at 3:43 PM vignesh C  wrote:
>
> On Wed, Mar 31, 2021 at 11:32 AM vignesh C  wrote:
> >
> > On Tue, Mar 30, 2021 at 11:00 AM Andres Freund  wrote:
> > >
> > > Hi,
> > >
> > > On 2021-03-30 10:13:29 +0530, vignesh C wrote:
> > > > On Tue, Mar 30, 2021 at 6:28 AM Andres Freund  
> > > > wrote:
> > > > > Any chance you could write a tap test exercising a few of these cases?
> > > >
> > > > I can try to write a patch for this if nobody objects.
> > >
> > > Cool!
> > >
> >
> > Attached a patch which has the test for the first scenario.
> >
> > > > > E.g. things like:
> > > > >
> > > > > - create a few slots, drop one of them, shut down, start up, verify
> > > > >   stats are still sane
> > > > > - create a few slots, shut down, manually remove a slot, lower
> > > > >   max_replication_slots, start up
> > > >
> > > > Here by "manually remove a slot", do you mean to remove the slot
> > > > manually from the pg_replslot folder?
> > >
> > > Yep - thereby allowing max_replication_slots after the shutdown/start to
> > > be lower than the number of slots-stats objects.
> >
> > I have not included the 2nd test in the patch as the test fails with
> > following warnings and also displays the statistics of the removed
> > slot:
> > WARNING:  problem in alloc set Statistics snapshot: detected write
> > past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438
> > WARNING:  problem in alloc set Statistics snapshot: detected write
> > past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438
> >
> > This happens because the statistics file has an additional slot
> > present even though the replication slot was removed.  I felt this
> > issue should be fixed. I will try to fix this issue and send the
> > second test along with the fix.
>
> I felt that from the statistics collector process there is no way to
> identify whether a replication slot is present, because the statistics
> collector process does not have access to shared memory.
> Anything the statistics collector process does independently, by
> traversing and removing the statistics of replication slots exceeding
> max_replication_slots, has the drawback of possibly removing some
> valid replication slot's statistics.
> Any thoughts on how we can identify a replication slot that has been
> dropped?
> Can someone point me to the shared stats patch with which message loss
> can be avoided? I wanted to see whether a scenario where a slot is
> dropped but the statistics are not updated, because of an immediate
> shutdown or the server going down abruptly, can still occur with the
> shared stats patch.
>
>

I don't think it is easy to simulate a scenario where the 'drop'
message is dropped, and I think that is why the test contains the step
to manually remove the slot. At this stage, you can probably provide a
test patch and a code-fix patch where it just drops the extra slots
from the stats file. That will allow us to test it with the shared
memory stats patch on which Andres and Horiguchi-San are working. If
we still continue to pursue the current approach then, as Andres
suggested, we might send additional information from
RestoreSlotFromDisk to keep things in sync.

-- 
With Regards,
Amit Kapila.




Re: DROP INDEX docs - explicit lock naming

2021-04-01 Thread Greg Rychlewski
Thanks! I apologize, I added a commitfest entry for this and failed to add
it to my message: https://commitfest.postgresql.org/33/3053/.

This is my first time submitting a patch and I'm not sure if it needs to be
deleted now or if you are supposed to add yourself as a committer.

On Thu, Apr 1, 2021 at 2:32 AM Michael Paquier  wrote:

> On Tue, Mar 30, 2021 at 11:29:17PM -0400, Greg Rychlewski wrote:
> > Thanks for pointing that out. I've attached a new patch with several
> other
> > updates where I felt confident the docs were referring to an ACCESS
> > EXCLUSIVE lock.
>
> Thanks, applied!  I have reviewed the whole and there is one place in
> vacuum.sgml that could switch "exclusive lock" to "SHARE UPDATE
> EXCLUSIVE lock" but I have left that out as it does not bring more
> clarity in the text.  The change in indexam.sgml was partially wrong
> as REINDEX CONCURRENTLY does not take an access exclusive lock, and I
> have tweaked a bit the wording of pgrowlocks.sgml.
> --
> Michael
>


Re: Failed assertion on standby while shutdown

2021-04-01 Thread Fujii Masao



On 2021/03/31 19:51, Maxim Orlov wrote:

On 2021-03-30 20:44, Maxim Orlov wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:    not tested

All the tests passed successfully.

The new status of this patch is: Ready for Committer


The patch is good. One note: should we put a comment about 
ShutdownRecoveryTransactionEnvironment's non-reentrant behaviour? Or maybe 
rename it to ShutdownRecoveryTransactionEnvironmentOnce?


+1 to add more comments into ShutdownRecoveryTransactionEnvironment().
I did that. What about the attached patch?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 22135d5e07..69077bd207 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -62,6 +62,9 @@ static volatile sig_atomic_t in_restore_command = false;
 static void StartupProcTriggerHandler(SIGNAL_ARGS);
 static void StartupProcSigHupHandler(SIGNAL_ARGS);
 
+/* Callbacks */
+static void StartupProcExit(int code, Datum arg);
+
 
 /* 
  * signal handler routines
@@ -183,6 +186,19 @@ HandleStartupProcInterrupts(void)
 }
 
 
+/* 
+ * exit callback routines
+ * 
+ */
+static void
+StartupProcExit(int code, Datum arg)
+{
+   /* Shutdown the recovery environment */
+   if (standbyState != STANDBY_DISABLED)
+   ShutdownRecoveryTransactionEnvironment();
+}
+
+
 /* --
  * Startup Process main entry point
  * --
@@ -190,6 +206,9 @@ HandleStartupProcInterrupts(void)
 void
 StartupProcessMain(void)
 {
+   /* Arrange to clean up at startup process exit */
+   on_shmem_exit(StartupProcExit, 0);
+
/*
 * Properly accept or ignore signals the postmaster might send us.
 */
diff --git a/src/backend/storage/ipc/standby.c 
b/src/backend/storage/ipc/standby.c
index 17de5a6d0e..1465ee44a1 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -127,10 +127,25 @@ InitRecoveryTransactionEnvironment(void)
  *
  * Prepare to switch from hot standby mode to normal operation. Shut down
  * recovery-time transaction tracking.
+ *
+ * This must be called at the shutdown of the startup process if transaction
+ * tracking has been initialized. Otherwise some locks that the tracked
+ * transactions were holding will not be released, and may interfere with
+ * the processes still running (though they will exit soon) when the
+ * startup process exits.
  */
 void
 ShutdownRecoveryTransactionEnvironment(void)
 {
+   /*
+* Do nothing if RecoveryLockLists is NULL, which means that transaction
+* tracking has not yet been initialized or has already been shut down.
+* This prevents transaction tracking from being shut down unexpectedly
+* more than once.
+*/
+   if (RecoveryLockLists == NULL)
+   return;
+
/* Mark all tracked in-progress transactions as finished. */
ExpireAllKnownAssignedTransactionIds();
 
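The NULL-check guard above makes ShutdownRecoveryTransactionEnvironment() safe to
call more than once. As a stand-alone illustration of that idempotent-shutdown
pattern (every name below is a toy stand-in, not the actual PostgreSQL code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for the startup process's recovery-tracking state. */
static int *RecoveryLockLists = NULL;
static int shutdown_count = 0;

static void
InitRecoveryTransactionEnvironment(void)
{
    RecoveryLockLists = malloc(sizeof(int));
}

static void
ShutdownRecoveryTransactionEnvironment(void)
{
    /* Do nothing if tracking was never initialized or was already shut down. */
    if (RecoveryLockLists == NULL)
        return;

    free(RecoveryLockLists);
    RecoveryLockLists = NULL;
    shutdown_count++;
}
```

Calling the shutdown function before initialization, or a second time after
shutdown, is a no-op, which is exactly what an exit callback needs when the
startup process may die at any stage.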


Re: Autovacuum on partitioned table (autoanalyze)

2021-04-01 Thread yuzuko
Hi Tomas,

Thank you for reviewing the patch.

> Firstly, the patch propagates the changes_since_analyze values from
> do_analyze_rel, i.e. from the worker after it analyzes the relation.
> That may easily lead to cases with unnecessary analyzes - consider a
> partitioned with 4 child relations:
>  [ explanation ]
>
I didn't realize that till now.  Indeed, this approach increments the parent's
changes_since_analyze counter according to its leaf partition's counter
when the leaf partition is analyzed, so it will cause unnecessary ANALYZE
runs on partitioned tables as you described.


> I propose a different approach - instead of propagating the counts in
> do_analyze_rel for individual leaf tables, let's do that in bulk in
> relation_needs_vacanalyze. Before the (existing) first pass over
> pg_class, we can add a new one, propagating counts from leaf tables to
> parents.
>
Thank you for your suggestion.  I think it could solve all the issues
you mentioned.  I modified the patch based on this approach:

- Create a new counter, PgStat_Counter changes_since_analyze_reported,
  to track the changes_since_analyze we have already propagated to ancestors.
  This is used for internal processing and users may not need to know it,
  so this counter is not displayed in the pg_stat_all_tables view for now.

- Create a new function, pgstat_propagate_changes() which propagates
  changes_since_analyze counter to all ancestors and saves
  changes_since_analyze_reported.  This function is called in
  do_autovacuum() before relation_needs_vacanalyze().
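The propagation step described above can be sketched with toy structures
(TableStats, propagate_changes, and the parent pointer are hypothetical
stand-ins; the actual patch works over pg_class entries and pgstat messages):

```c
#include <assert.h>
#include <stddef.h>

/* Toy per-table stats, assuming the two counters described above. */
typedef struct TableStats
{
    long changes_since_analyze;
    long changes_since_analyze_reported;
    struct TableStats *parent;  /* NULL for a top-level table */
} TableStats;

/* Sketch of the idea behind pgstat_propagate_changes(): push only the
 * not-yet-reported delta up to every ancestor, then remember how much
 * has been reported so the same changes are never counted twice. */
static void
propagate_changes(TableStats *leaf)
{
    long delta = leaf->changes_since_analyze -
                 leaf->changes_since_analyze_reported;

    for (TableStats *p = leaf->parent; p != NULL; p = p->parent)
        p->changes_since_analyze += delta;

    leaf->changes_since_analyze_reported = leaf->changes_since_analyze;
}
```

Because only the delta is propagated, calling this repeatedly before the
leaf sees new changes leaves the ancestors' counters unchanged.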


> Note: I do have some ideas about how to improve that, I've started a
> separate thread about it [1].
>
I'm also interested in merging children's statistics for partitioned tables
because it will make ANALYZE on inheritance trees more efficient.
So I'll check it later.

> I forgot to mention one additional thing yesterday - I wonder if we need
> to do something similar after a partition is attached/detached. That can
> also change the parent's statistics significantly, so maybe we should
> handle all partition's rows as changes_since_analyze? Not necessarily
> something this patch has to handle, but might be related.
>
Regarding attached/detached partitions,  I think we should update statistics
of partitioned tables according to the new inheritance tree.  The latest patch
hasn't handled this case yet, but I'll give it a try soon.

Attach the v13 patch to this email.  Could you please check it again?

--
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center


v13_autovacuum_on_partitioned_table.patch
Description: Binary data


RE: Hybrid Hash/Nested Loop joins and caching results from subplans

2021-04-01 Thread houzj.f...@fujitsu.com
> I've attached the updated patch.  I'll let the CFbot grab this to ensure it's
> happy with it before I go looking to push it again.

Hi,

I took a look into the patch and noticed some minor things.

1.
+   case T_ResultCache:
+   ptype = "ResultCache";
+   subpath = ((ResultCachePath *) path)->subpath;
+   break;
case T_UniquePath:
ptype = "Unique";
subpath = ((UniquePath *) path)->subpath;
should we use "case T_ResultCachePath" here?
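For illustration, the corrected dispatch might look like this minimal sketch
(NodeTag, Path, and path_type_name are toy stand-ins for the real planner
types, kept only to show that the switch must use the *path* tag
T_ResultCachePath rather than the plan node tag T_ResultCache):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for the planner's node tags and path nodes. */
typedef enum NodeTag
{
    T_ResultCachePath,
    T_UniquePath
} NodeTag;

typedef struct Path
{
    NodeTag type;
    struct Path *subpath;
} Path;

/* Return the display name for a path and report its subpath. */
static const char *
path_type_name(Path *path, Path **subpath)
{
    const char *ptype = NULL;

    switch (path->type)
    {
        case T_ResultCachePath:
            ptype = "ResultCache";
            *subpath = path->subpath;
            break;
        case T_UniquePath:
            ptype = "Unique";
            *subpath = path->subpath;
            break;
    }
    return ptype;
}
```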

2.
Is it better to add ResultCache's info to "src/backend/optimizer/README"?
Something like:
  NestPath  - nested-loop joins
  MergePath - merge joins
  HashPath  - hash joins
+ ResultCachePath - Result cache

Best regards,
Hou zhijie


Re: Any objections to implementing LogicalDecodeMessageCB for pgoutput?

2021-04-01 Thread Amit Kapila
On Wed, Nov 25, 2020 at 8:58 AM Euler Taveira
 wrote:
>
> On Wed, 18 Nov 2020 at 03:04, David Pirotte  wrote:
>>
>> On Fri, Nov 6, 2020 at 7:05 AM Ashutosh Bapat  
>> wrote:
>>>
>>> +/*
>>> + * Write MESSAGE to stream
>>> + */
>>> +void
>>> +logicalrep_write_message(StringInfo out, ReorderBufferTXN *txn, XLogRecPtr 
>>> lsn,
>>> +bool transactional, const char *prefix, Size sz,
>>> +const char *message)
>>> +{
>>> +   uint8   flags = 0;
>>> +
>>> +   pq_sendbyte(out, LOGICAL_REP_MSG_MESSAGE);
>>> +
>>>
>>> Similar to the UPDATE/DELETE/INSERT records decoded when streaming is being
>>> used, we need to add transaction id for transactional messages. May be we 
>>> add
>>> that even in case of non-streaming case and use it to decide whether it's a
>>> transactional message or not. That might save us a byte when we are adding a
>>> transaction id.
>>
>>
> I also reviewed your patch. This feature would be really useful for 
> replication
> scenarios. Supporting this feature means that you don't need to use a table to
> pass messages from one node to another one. Here are a few comments/ideas.
>

Your ideas/suggestions look good to me. Don't we need to provide a
read function corresponding to logicalrep_write_message? We have it
for other write functions. Can you please combine all of your changes
into one patch?

-- 
With Regards,
Amit Kapila.




Re: Replication slot stats misgivings

2021-04-01 Thread vignesh C
On Wed, Mar 31, 2021 at 11:32 AM vignesh C  wrote:
>
> On Tue, Mar 30, 2021 at 11:00 AM Andres Freund  wrote:
> >
> > Hi,
> >
> > On 2021-03-30 10:13:29 +0530, vignesh C wrote:
> > > On Tue, Mar 30, 2021 at 6:28 AM Andres Freund  wrote:
> > > > Any chance you could write a tap test exercising a few of these cases?
> > >
> > > I can try to write a patch for this if nobody objects.
> >
> > Cool!
> >
>
> Attached a patch which has the test for the first scenario.
>
> > > > E.g. things like:
> > > >
> > > > - create a few slots, drop one of them, shut down, start up, verify
> > > >   stats are still sane
> > > > - create a few slots, shut down, manually remove a slot, lower
> > > >   max_replication_slots, start up
> > >
> > > Here by "manually remove a slot", do you mean to remove the slot
> > > manually from the pg_replslot folder?
> >
> > Yep - thereby allowing max_replication_slots after the shutdown/start to
> > be lower than the number of slots-stats objects.
>
> I have not included the 2nd test in the patch as the test fails with
> following warnings and also displays the statistics of the removed
> slot:
> WARNING:  problem in alloc set Statistics snapshot: detected write
> past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438
> WARNING:  problem in alloc set Statistics snapshot: detected write
> past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438
>
> This happens because the statistics file has an additional slot
> present even though the replication slot was removed.  I felt this
> issue should be fixed. I will try to fix this issue and send the
> second test along with the fix.

I felt there is no way for the statistics collector process to identify
whether a replication slot is present or not, because the statistics
collector process does not have access to shared memory. If the statistics
collector independently traverses the statistics and removes the entries
for replication slots exceeding max_replication_slots, it risks removing
some valid replication slot's statistics data.
Any thoughts on how we can identify a replication slot that has been dropped?
Also, can someone point me to the shared stats patch with which message
loss can be avoided? I wanted to check whether a scenario where a slot is
dropped but its statistics are not updated, because of an immediate
shutdown or the server going down abruptly, can still occur with the
shared stats patch.

Regards,
Vignesh




Re: Get memory contexts of an arbitrary backend process

2021-04-01 Thread Fujii Masao




On 2021/03/31 15:16, Kyotaro Horiguchi wrote:

+ The memory contexts will be logged based on the log configuration
set. For example:

What do you think?


How about "The memory contexts will be logged in the server log" ?
I think "server log" doesn't suggest any concrete target.


Or just using "logged" is enough?

Also I'd like to document that one message for each memory context is logged.
So what about the following?

One message for each memory context will be logged. For example,



+1 from me. It looks more compliant with the standard?


+1, too.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: TRUNCATE on foreign table

2021-04-01 Thread Fujii Masao




On 2021/04/01 0:09, Kohei KaiGai wrote:

What does the "permission checks" mean in this context?
The permission checks on the foreign tables involved are already checked
at truncate_check_rel(), by PostgreSQL's standard access control.


I meant the permission check that happens on the remote server side.
For example, when a foreign table is defined on postgres_fdw and truncated,
a TRUNCATE command is issued to the remote server via postgres_fdw, and
the remote server checks the permission of the table before performing the actual truncation.



Please assume an extreme example below:
1. I defined a foreign table with file_fdw onto a local CSV file.
2. Someone tries to scan the foreign table, and ACL allows it.
3. I disallow the read permission of the CSV using chmod, after the above step,
 but prior to the query execution.
4. Someone's query moved to the execution stage, then IterateForeignScan()
 raises an error due to OS level permission checks.

FDW is, literally, a mechanism to convert external data sources from/to
PostgreSQL's structured data. Once we have checked the permissions of the
foreign tables via database ACLs, any other permission checks handled by the
FDW driver are part of execution (like the OS-level permission check when
file_fdw read(2)s the underlying CSV files). And we have no reliable way to
check all the permissions in advance, whether OS file permission checks or
database permission checks by the remote server. Even if a checker routine
returned a "hint", the situation may change by execution time. Somebody
might change the "truncate" permission at the remote server.

What's your opinion?


I agree that something like a checker routine might not be so useful and
might also be overkill. I was thinking that it's better to truncate the foreign 
tables first
and the local ones later. Otherwise it's a waste of cycles to truncate
the local tables if the truncation of a foreign table causes a permission error
on the remote server.

But ISTM that the order of tables to truncate that the current patch provides
doesn't cause any actual bugs. So if you think the current order is better,
I'm ok with that for now. In this case, the comments for ExecuteTruncate()
should be updated.

BTW, the latest patch doesn't seem to apply cleanly to the master
branch because of commit 27e1f14563. Could you rebase it?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



