Re: [PATCH] Identify LWLocks in tracepoints

2021-01-07 Thread Craig Ringer
On Sat, 19 Dec 2020 at 13:00, Craig Ringer 
wrote:

> Hi all
>
> The attached patch set follows on from the discussion in [1] "Add LWLock
> blocker(s) information" by adding the actual LWLock* and the numeric
> tranche ID to each LWLock related TRACE_POSTGRESQL_foo tracepoint.
>
>
I've attached a systemtap script that makes use of the information exported
by the enhanced LWLock tracepoints. It offers something akin to dynamic
-DLWLOCK_STATS with automatic statistical aggregation and some selective
LWLOCK_DEBUG output.

The script isn't super pretty. I didn't try to separate event-data
collection from results output, and there's some duplication in places. But
it gives you an idea what's possible when we report lock pointers and
tranche IDs to tracepoints and add entry/exit tracepoints.

Key features:

* Collect statistical aggregates on lwlock hold and wait durations across
all processes. Stats are grouped by lockmode (shared or exclusive) and by
tranche name, as well as rollup stats across all tranches.
* Report lock wait and hold durations for each process when that process
exits. Again, reported by mode and tranche name.
* For long lock waits, print the waiter pid and blocker pid, along with
each process's backend type and application_name if known, the acquire
mode, and the acquire function

The output is intended to be human-readable, but it'd be quite simple to
convert it into raw tsv-style output suitable for ingestion into
statistical postprocessing or graphing tools.

It should be fairly easy to break down the stats by acquire function if
desired, so LWLockAcquire(), LWLockWaitForVar(), and LWLockAcquireOrWait()
are reported separately. They're combined in the current output.

Capturing the current query string is pretty simple if needed, but I didn't
think it was likely to be especially useful.

Sample output for a pg_regress run attached. Abridged version follows. Here
the !!W!! lines are "waited a long time", the !!H!! lines are "held a long
time". Then [pid]:MyBackendType tranche_name wait_time_us (wait_time) in
wait_func (application_name) => [blocker_pid] (blocker_application_name).
If the blocker pid wasn't identified it won't be reported - I know how to fix
that and will do so soon.

!!W!! [ 93030]:3  BufferContent  12993 (0m0.012993s) in lwlock__acquire__start (pg_regress/text)
!!W!! [ 93036]:3  LockManager    14540 (0m0.014540s) in lwlock__acquire__start (pg_regress/float8) => [ 93045] (pg_regress/regproc)
!!W!! [ 93035]:3  BufferContent  12608 (0m0.012608s) in lwlock__acquire__start (pg_regress/float4) => [ 93034] (pg_regress/oid)
!!W!! [ 93036]:3  LockManager    10301 (0m0.010301s) in lwlock__acquire__start (pg_regress/float8)
!!W!! [ 93043]:3  LockManager    10356 (0m0.010356s) in lwlock__acquire__start (pg_regress/pg_lsn)
!!H!! [ 93033]:3  BufferContent  20579 (0m0.020579s) (pg_regress/int8)
!!W!! [ 93027]:3  BufferContent  10766 (0m0.010766s) in lwlock__acquire__start (pg_regress/char) => [ 93037] (pg_regress/bit)
!!W!! [ 93036]:3  OidGen         12876 (0m0.012876s) in lwlock__acquire__start (pg_regress/float8)
...

Then the summary rollup at the end of the run. This can also be output
periodically during the run. Abbreviated for highlights:

wait locks: all procs   tranche         mode  count      total       avg  variance  min    max
  W LW_EXCLUSIVE        (all)           E     54185      14062734    259   1850265    1  44177
  W LW_SHARED           (all)           S      3668       1116022    304   1527261    2  18642

held locks: all procs   tranche         mode  count      total       avg  variance  min    max
  H LW_EXCLUSIVE        (all)           E     10438060   153077259    14    370351    1  95043
  H LW_SHARED           (all)           S     14199902    65466934     4      5318    1  44030

all procs by tranche    tranche         mode  count      total       avg  variance  min    max
  W tranche             (all)           S      3668       1116022    304   1527261    2  18642
  W tranche             (all)           E     54185      14062734    259   1850265    1  44177
  W tranche             WALInsert       E      9839       2393229    243   1180294    2  14209
  W tranche             BufferContent   E      3012       1726543    573   3869409    2  28186
  W tranche             BufferContent   S      1664        657855    395   2185694    2  18642
  W tranche             LockFastPath    E     28314       6327801    223   1278053    1  26133
  W tranche             LockFastPath    S        87         59401    682   3703217   19   9454
  W tranche             LockManager     E      7223       2764392    382   2514863    2  44177


Hope this is interesting to 

Re: patch: reduce overhead of execution of CALL statement in no atomic mode from PL/pgSQL

2021-01-07 Thread Pavel Stehule
On Fri, Jan 1, 2021 at 9:15 AM Pavel Stehule wrote:

> Hi
>
> only rebase
>

rebase

regards

Pavel


> Regards
>
> Pavel
>
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 653ef8e41a..6f8643cc4a 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -230,7 +230,7 @@ ExecuteQuery(ParseState *pstate,
 	   entry->plansource->query_string);
 
 	/* Replan if needed, and increment plan refcount for portal */
-	cplan = GetCachedPlan(entry->plansource, paramLI, false, NULL);
+	cplan = GetCachedPlan(entry->plansource, paramLI, false, NULL, NULL);
 	plan_list = cplan->stmt_list;
 
 	/*
@@ -651,7 +651,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 	}
 
 	/* Replan if needed, and acquire a transient refcount */
-	cplan = GetCachedPlan(entry->plansource, paramLI, true, queryEnv);
+	cplan = GetCachedPlan(entry->plansource, paramLI, true, CurrentResourceOwner, queryEnv);
 
 	INSTR_TIME_SET_CURRENT(planduration);
 	INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -687,7 +687,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 	if (estate)
 		FreeExecutorState(estate);
 
-	ReleaseCachedPlan(cplan, true);
+	ReleaseCachedPlan(cplan, true, CurrentResourceOwner);
 }
 
 /*
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index e28d242922..a4fe7eb2c3 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -67,7 +67,7 @@ static void _SPI_prepare_oneshot_plan(const char *src, SPIPlanPtr plan);
 static int	_SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
 			  Snapshot snapshot, Snapshot crosscheck_snapshot,
 			  bool read_only, bool fire_triggers, uint64 tcount,
-			  DestReceiver *caller_dest);
+			  DestReceiver *caller_dest, ResourceOwner owner);
 
 static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
 		 Datum *Values, const char *Nulls);
@@ -521,7 +521,8 @@ SPI_execute(const char *src, bool read_only, long tcount)
 
	res = _SPI_execute_plan(&plan, NULL,
 			InvalidSnapshot, InvalidSnapshot,
-			read_only, true, tcount, NULL);
+			read_only, true, tcount, NULL,
+			CurrentResourceOwner);
 
 	_SPI_end_call(true);
 	return res;
@@ -555,7 +556,8 @@ SPI_execute_plan(SPIPlanPtr plan, Datum *Values, const char *Nulls,
 			_SPI_convert_params(plan->nargs, plan->argtypes,
 Values, Nulls),
 			InvalidSnapshot, InvalidSnapshot,
-			read_only, true, tcount, NULL);
+			read_only, true, tcount, NULL,
+			CurrentResourceOwner);
 
 	_SPI_end_call(true);
 	return res;
@@ -584,7 +586,8 @@ SPI_execute_plan_with_paramlist(SPIPlanPtr plan, ParamListInfo params,
 
 	res = _SPI_execute_plan(plan, params,
 			InvalidSnapshot, InvalidSnapshot,
-			read_only, true, tcount, NULL);
+			read_only, true, tcount, NULL,
+			CurrentResourceOwner);
 
 	_SPI_end_call(true);
 	return res;
@@ -613,7 +616,37 @@ SPI_execute_plan_with_receiver(SPIPlanPtr plan,
 
 	res = _SPI_execute_plan(plan, params,
 			InvalidSnapshot, InvalidSnapshot,
-			read_only, true, tcount, dest);
+			read_only, true, tcount, dest,
+			CurrentResourceOwner);
+
+	_SPI_end_call(true);
+	return res;
+}
+
+
+/*
+ * Execute a previously prepared plan with possibility to
+ * specify resource owner
+ */
+int
+SPI_execute_plan_with_resowner(SPIPlanPtr plan,
+ParamListInfo params,
+bool read_only, long tcount,
+ResourceOwner owner)
+{
+	int			res;
+
+	if (plan == NULL || plan->magic != _SPI_PLAN_MAGIC || tcount < 0)
+		return SPI_ERROR_ARGUMENT;
+
+	res = _SPI_begin_call(true);
+	if (res < 0)
+		return res;
+
+	res = _SPI_execute_plan(plan, params,
+			InvalidSnapshot, InvalidSnapshot,
+			read_only, true, tcount, NULL,
+			owner);
 
 	_SPI_end_call(true);
 	return res;
@@ -654,7 +687,8 @@ SPI_execute_snapshot(SPIPlanPtr plan,
 			_SPI_convert_params(plan->nargs, plan->argtypes,
 Values, Nulls),
 			snapshot, crosscheck_snapshot,
-			read_only, fire_triggers, tcount, NULL);
+			read_only, fire_triggers, tcount, NULL,
+			CurrentResourceOwner);
 
 	_SPI_end_call(true);
 	return res;
@@ -702,7 +736,8 @@ SPI_execute_with_args(const char *src,
 
	res = _SPI_execute_plan(&plan, paramLI,
 			InvalidSnapshot, InvalidSnapshot,
-			read_only, true, tcount, NULL);
+			read_only, true, tcount, NULL,
+			CurrentResourceOwner);
 
 	_SPI_end_call(true);
 	return res;
@@ -746,7 +781,8 @@ SPI_execute_with_receiver(const char *src,
 
	res = _SPI_execute_plan(&plan, params,
 			InvalidSnapshot, InvalidSnapshot,
-			read_only, true, tcount, dest);
+			read_only, true, tcount, dest,
+			CurrentResourceOwner);
 
 	_SPI_end_call(true);
 	return res;
@@ -1554,7 +1590,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
 	 */
 
 	/* Replan if needed, and increment plan refcount for portal */
-	cplan = 

Re: WIP: System Versioned Temporal Table

2021-01-07 Thread Simon Riggs
On Thu, Jan 7, 2021 at 5:59 PM Simon Riggs  wrote:
>
> On Mon, Jan 4, 2021 at 2:24 PM Masahiko Sawada  wrote:
> > Please also note the v7 patch cannot be applied to the current HEAD. I'm 
> > switching the patch as Waiting on Author.
>
> Surafel, please say whether you are working on this or not. If you
> need help, let us know.

I've minimally rebased the patch to current head so that it compiles
and passes current make check.

From here, I will add further docs and tests to enhance review and
discover issues.

-- 
Simon Riggs    http://www.EnterpriseDB.com/


system-versioned-temporal-table_2021_v8.patch
Description: Binary data


Re: Refactoring HMAC in the core code

2021-01-07 Thread Michael Paquier
On Fri, Dec 18, 2020 at 03:46:42PM +0900, Michael Paquier wrote:
> This has been tested on Windows and Linux across all the versions of
> OpenSSL we support on HEAD.  I am also attaching a small module called
> hmacfuncs that I used as a way to validate this patch across all the
> versions of OpenSSL and the fallback implementation.  As a reference,
> this matches with the results from Wikipedia here:
> https://en.wikipedia.org/wiki/HMAC#Examples

Please find attached a rebased version.  I have simplified the
implementation to use an opaque pointer similar to the cryptohash
part, leading to a large cleanup of the allocation logic for both
implementations, with and without OpenSSL.
--
Michael
From 3620d422a6d9c7a664c2137d445bb7df15ac103d Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 8 Jan 2021 16:11:06 +0900
Subject: [PATCH v3] Refactor HMAC implementations

---
 src/include/common/hmac.h |  29 +++
 src/include/common/md5.h  |   2 +
 src/include/common/scram-common.h |  13 --
 src/include/pg_config.h.in|   6 +
 src/include/utils/resowner_private.h  |   7 +
 src/backend/libpq/auth-scram.c|  61 +++---
 src/backend/utils/resowner/resowner.c |  61 ++
 src/common/Makefile   |   4 +-
 src/common/hmac.c | 258 ++
 src/common/hmac_openssl.c | 222 ++
 src/common/scram-common.c | 158 
 src/interfaces/libpq/fe-auth-scram.c  |  70 ---
 configure |   2 +-
 configure.ac  |   2 +-
 src/tools/msvc/Mkvcbuild.pm   |   2 +
 src/tools/msvc/Solution.pm|   4 +
 src/tools/pgindent/typedefs.list  |   2 +-
 17 files changed, 705 insertions(+), 198 deletions(-)
 create mode 100644 src/include/common/hmac.h
 create mode 100644 src/common/hmac.c
 create mode 100644 src/common/hmac_openssl.c

diff --git a/src/include/common/hmac.h b/src/include/common/hmac.h
new file mode 100644
index 00..f31a57d324
--- /dev/null
+++ b/src/include/common/hmac.h
@@ -0,0 +1,29 @@
+/*-
+ *
+ * hmac.h
+ *	  Generic headers for HMAC
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *		  src/include/common/hmac.h
+ *
+ *-
+ */
+
+#ifndef PG_HMAC_H
+#define PG_HMAC_H
+
+#include "common/cryptohash.h"
+
+/* opaque context, private to each HMAC implementation */
+typedef struct pg_hmac_ctx pg_hmac_ctx;
+
+extern pg_hmac_ctx *pg_hmac_create(pg_cryptohash_type type);
+extern int	pg_hmac_init(pg_hmac_ctx *ctx, const uint8 *key, size_t len);
+extern int	pg_hmac_update(pg_hmac_ctx *ctx, const uint8 *data, size_t len);
+extern int	pg_hmac_final(pg_hmac_ctx *ctx, uint8 *dest);
+extern void pg_hmac_free(pg_hmac_ctx *ctx);
+
+#endif			/* PG_HMAC_H */
diff --git a/src/include/common/md5.h b/src/include/common/md5.h
index 6d100f5cfc..62a31e6ed4 100644
--- a/src/include/common/md5.h
+++ b/src/include/common/md5.h
@@ -18,6 +18,8 @@
 
 /* Size of result generated by MD5 computation */
 #define MD5_DIGEST_LENGTH 16
+/* Block size for MD5 */
+#define MD5_BLOCK_SIZE	64
 
 /* password-related data */
 #define MD5_PASSWD_CHARSET	"0123456789abcdef"
diff --git a/src/include/common/scram-common.h b/src/include/common/scram-common.h
index 9d684b41e8..5777ce9fe3 100644
--- a/src/include/common/scram-common.h
+++ b/src/include/common/scram-common.h
@@ -46,19 +46,6 @@
  */
 #define SCRAM_DEFAULT_ITERATIONS	4096
 
-/*
- * Context data for HMAC used in SCRAM authentication.
- */
-typedef struct
-{
-	pg_cryptohash_ctx *sha256ctx;
-	uint8		k_opad[SHA256_HMAC_B];
-} scram_HMAC_ctx;
-
-extern int	scram_HMAC_init(scram_HMAC_ctx *ctx, const uint8 *key, int keylen);
-extern int	scram_HMAC_update(scram_HMAC_ctx *ctx, const char *str, int slen);
-extern int	scram_HMAC_final(uint8 *result, scram_HMAC_ctx *ctx);
-
 extern int	scram_SaltedPassword(const char *password, const char *salt,
  int saltlen, int iterations, uint8 *result);
 extern int	scram_H(const uint8 *str, int len, uint8 *result);
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index ddaa9e8e18..b7d01a3b37 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -268,6 +268,12 @@
 /* Define to 1 if you have the `history_truncate_file' function. */
 #undef HAVE_HISTORY_TRUNCATE_FILE
 
+/* Define to 1 if you have the `HMAC_CTX_free' function. */
+#undef HAVE_HMAC_CTX_FREE
+
+/* Define to 1 if you have the `HMAC_CTX_new' function. */
+#undef HAVE_HMAC_CTX_NEW
+
 /* Define to 1 if you have the <ifaddrs.h> header file. */
 #undef HAVE_IFADDRS_H
 
diff --git a/src/include/utils/resowner_private.h b/src/include/utils/resowner_private.h
index c480a1a24b..6dafc87e28 100644

Re: Parallel INSERT (INTO ... SELECT ...)

2021-01-07 Thread Antonin Houska
Amit Kapila  wrote:

> On Wed, Jan 6, 2021 at 2:09 PM Antonin Houska  wrote:
> >
> > Greg Nancarrow  wrote:
> >
> > > Posting an updated set of patches to address recent feedback:
> >
> > Following is my review.
> >
> > v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch
> > --
> >
> > @@ -342,6 +343,18 @@ standard_planner(Query *parse, const char 
> > *query_string, int cursorOptions,
> > /* all the cheap tests pass, so scan the query tree */
> > glob->maxParallelHazard = max_parallel_hazard(parse);
> > glob->parallelModeOK = (glob->maxParallelHazard != 
> > PROPARALLEL_UNSAFE);
> > +
> > +   /*
> > +* Additional parallel-mode safety checks are required in 
> > order to
> > +* allow an underlying parallel query to be used for a
> > +* table-modification command that is supported in 
> > parallel-mode.
> > +*/
> > +   if (glob->parallelModeOK &&
> > +   IsModifySupportedInParallelMode(parse->commandType))
> > +   {
> > +   glob->maxParallelHazard = 
> > max_parallel_hazard_for_modify(parse, &glob->maxParallelHazard);
> > +   glob->parallelModeOK = (glob->maxParallelHazard != 
> > PROPARALLEL_UNSAFE);
> > +   }
> >
> > Is it really ok to allow PROPARALLEL_RESTRICTED? Per definition, these
> > functions should not be called by parallel worker.
> >
> 
> What in the above change indicates that the parallel_restricted will
> be allowed in parallel workers? This just sets parallelModeOK to allow
> parallel plans for Selects if the Insert can be performed safely in a
> leader backend.

Well, this is just the initial setting, while the distinction between "Gather
-> Insert -> ..." and "Insert -> Gather -> ..." is made later. So I withdraw
my objection.
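To make that distinction concrete, the two shapes correspond roughly to the
following (a sketch with hypothetical table names; the plan text is only
illustrative, not actual EXPLAIN output):

-- v11-0001 (parallel SELECT only): the leader performs the inserts
--   Insert on target
--     ->  Gather
--           ->  Parallel Seq Scan on source
--
-- v11-0003 (parallel INSERT): workers insert, so Insert sits below Gather
--   Gather
--     ->  Insert on target
--           ->  Parallel Seq Scan on source
EXPLAIN (COSTS OFF) INSERT INTO target SELECT * FROM source;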

> >
> > @@ -1015,6 +1016,27 @@ IsInParallelMode(void)
> >  }
> >
> >  /*
> > + * PrepareParallelMode
> > + *
> > + * Prepare for entering parallel mode, based on command-type.
> > + */
> > +void
> > +PrepareParallelMode(CmdType commandType)
> > +{
> > +   Assert(!IsInParallelMode() || force_parallel_mode != 
> > FORCE_PARALLEL_OFF);
> >
> > Isn't the test of force_parallel_mode just a hack to make regression tests
> > pass? When I removed this part and ran the regression tests with
> > force_parallel_mode=regress, the assertion fired when executing a subquery
> > because the executor was already in parallel mode due to the main query
> > execution.
> >
> 
> I think this Assert is bogus. We are allowed to enter parallel-mode
> if we are already in parallel-mode, see EnterParallelMode.

Right.

> But we shouldn't be allowed to allocate xid in parallel-mode. So the
> Assert(!IsInParallelMode()) should be moved inside the check if
> (IsModifySupportedInParallelMode(commandType)) in this function. Can you
> check if it still fails after such a modification?

Yes, this works.


> > As an alternative, have you considered allocation of the XID even in 
> > parallel
> > mode? I imagine that the first parallel worker that needs the XID for
> > insertions allocates it and shares it with the other workers as well as with
> > the leader process.
> >
> 
> As far as this patch
> (v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch) is
> concerned, we never need to allocate xids in workers because Insert is
> always performed by the leader backend.

When writing this comment, I was actually thinking of
v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch rather
than v11-0001, see below. On the other hand, if we allowed XID allocation in
the parallel mode (as a separate patch), even the 0001 patch would get a bit
simpler.

> Even if we want to do what you are suggesting, it would be tricky because
> currently we don't have an infrastructure to pass such
> information among workers.

How about barriers (storage/ipc/barrier.c)? What I imagine is that all the
workers "meet" at the barrier when they want to insert the first tuple. Then
one of them allocates the XID, makes it available to others (via shared
memory) and all the workers can continue.

> > One problem of the current patch version is that the "INSERT INTO ... SELECT
> > ..." statement consumes XID even if the SELECT eventually does not return 
> > any
> > row. However, if the same query is processed w/o parallelism, the XID is 
> > only
> > allocated if at least one tuple needs to be inserted.
> >

> Yeah, that is true but I think this can happen w/o parallelism for
> updates and deletes where by the time we try to modify the row, it got
> modified by a concurrent session and the first session will needlessly
> allocate XID.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: Huge memory consumption on partitioned table with FKs

2021-01-07 Thread Amit Langote
Kuroda-san,

On Fri, Jan 8, 2021 at 1:02 PM Keisuke Kuroda
 wrote:
> Hi Amit-san,
>
> I noticed that this patch
> (v3-0001-ri_triggers.c-Use-root-constraint-OID-as-key-to-r.patch)
> is not registered in the commitfest. I think it needs to be registered for
> commit, is that correct?
>
> I have confirmed that this patch can be applied to HEAD (master),
> and that check-world PASS.

Thanks for checking.  Indeed, it should have been added to the January
commit-fest.  I've added it to the March one:

https://commitfest.postgresql.org/32/2930/

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: Add Information during standby recovery conflicts

2021-01-07 Thread Fujii Masao




On 2021/01/08 14:02, Drouvot, Bertrand wrote:

Hi,

On 1/7/21 4:51 PM, Fujii Masao wrote:

Thanks for the review! I pushed the latest patch.


Thanks all of you for your precious help on this patch!

The original idea behind this thread has been split into 3 pieces.

Pieces 1 (9d0bd95fa90a7243047a74e29f265296a9fc556d) and 2 
(0650ff23038bc3eb8d8fd851744db837d921e285) have now been committed, the last 
one is to add more information regarding the canceled statements (if any), like:

* What was the blocker(s) doing?


This "canceled statement" is just one that's canceled by recovery conflict?
If so, the blocker is always the startup process? Sorry maybe I fail to
understand this idea well..



* When did the blocker(s) started their queries (if any)?


If the blocker is the startup process, it doesn't start any query at all?

Anyway if you post the patch, I'm happy to review that!

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: proposal: schema variables

2021-01-07 Thread Pavel Stehule
Hi

just rebase

Regards

Pavel


schema-variables-20200108.patch.gz
Description: application/gzip


Re: Disable WAL logging to speed up data loading

2021-01-07 Thread Kyotaro Horiguchi
At Fri, 8 Jan 2021 05:12:02 +, "tsunakawa.ta...@fujitsu.com" 
 wrote in 
> From: Robert Haas 
> > Were the issues that I mentioned regarding GIST (and maybe other AMs)
> > in the last paragraph of
> > http://postgr.es/m/CA+TgmoZEZ5RONS49C7mEpjhjndqMQtVrz_LCQUkpRW
> > dmrev...@mail.gmail.com
> > addressed in some way? That seems like a pretty hard engineering
> > problem to me, and I don't see that there's been any discussion of it.
> > Those are correctness concerns separate from any wal_level tracking we
> > might want to do to avoid accidental mistakes.
> 
> Thank you very much for reminding me of this.  I forgot I replied as follows:
> 
> 
> --
> Unlogged GiST indexes use fake LSNs that are instance-wide.  Unlogged 
> temporary GiST indexes use backend-local sequence values.  Other unlogged and 
> temporary relations don't set LSNs on pages.  So, I think it's enough to call 
> GetFakeLSNForUnloggedRel() when wal_level = none as well.
> --
> 
> 
> But this is not correct.  We have to allow (RM_GIST_ID,
> XLOG_GIST_ASSIGN_LSN) WAL records to be emitted (by tweaking the
> filter in XLogInsert()), just like those WAL records are emitted
> when wal_level = minimal and other WAL records are not emitted.

Yeah, although LOGGED and UNLOGGED use incompatible LSNs, LOGGED
tables always use real LSNs, maintained by that type of WAL record
while wal_level = minimal.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: In-placre persistance change of a relation

2021-01-07 Thread Kyotaro Horiguchi
At Fri, 25 Dec 2020 09:12:52 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> Hello.
> 
> At Thu, 24 Dec 2020 17:02:20 +0900 (JST), Kyotaro Horiguchi 
>  wrote in 
> > The patch is attached to the next message.
> 
> The reason for separating this message is that I modified this so that
> it could solve another issue.
> 
> There's a complaint about orphan files after a crash. [1]
> 
> 1: https://www.postgresql.org/message-id/16771-cbef7d97ba93f...@postgresql.org
> 
> That is, the case where a relation file is left behind after a server
> crash that happened before the end of the transaction that created
> the relation.  As I read this, I noticed this feature can solve the
> issue with a small change.
> 
> This version gets changes in RelationCreateStorage and
> smgrDoPendingDeletes.
> 
> Previously the inittmp fork was created only along with an init fork. This
> version always creates one when a relation storage file is created. As
> a result, ResetUnloggedRelationsInDbspaceDir removes all forks if the
> inittmp fork of a logged relation is found.  Now that pendingDeletes
> can contain multiple entries for the same relation, it has been
> modified not to close the same smgr multiple times.
> 
> - It might be better to split 0001 into two pieces.
> 
> - The function name ResetUnloggedRelationsInDbspaceDir no longer
>   represents the function correctly.

As pointed out by Robert in another thread [1], the persistence of (at least)
GiST indexes cannot be flipped in-place due to the incompatibility of fake
LSNs with real ones.

In this version, RelationChangePersistence() is changed not to choose the
in-place method for indexes other than btree. It seems to be usable
with all kinds of indexes other than GiST, but at the moment it applies
only to btrees.

1: 
https://www.postgresql.org/message-id/ca+tgmozez5rons49c7mepjhjndqmqtvrz_lcqukprwdmrev...@mail.gmail.com

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 1d47e7872d1e7ef18007f752e55cec9772373cc9 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 11 Nov 2020 21:51:11 +0900
Subject: [PATCH v3 1/2] In-place table persistence change

Even though ALTER TABLE SET LOGGED/UNLOGGED does not require data
rewriting, currently it runs a heap rewrite which causes a large amount of
file I/O.  This patch makes the command run without a heap rewrite.
In addition, SET LOGGED while wal_level > minimal emits WAL using
XLOG_FPI instead of a massive number of HEAP_INSERTs, which should be
smaller.

Also this allows for the cleanup of files left behind by the crash of
the transaction that created them.
---
 src/backend/access/rmgrdesc/smgrdesc.c |  23 ++
 src/backend/access/transam/README  |  10 +
 src/backend/catalog/storage.c  | 420 +++--
 src/backend/commands/tablecmds.c   | 246 ---
 src/backend/storage/buffer/bufmgr.c|  88 ++
 src/backend/storage/file/reinit.c  | 162 ++
 src/backend/storage/smgr/md.c  |   4 +-
 src/backend/storage/smgr/smgr.c|   6 +
 src/common/relpath.c   |   3 +-
 src/include/catalog/storage.h  |   2 +
 src/include/catalog/storage_xlog.h |  22 +-
 src/include/common/relpath.h   |   5 +-
 src/include/storage/bufmgr.h   |   2 +
 src/include/storage/smgr.h |   1 +
 14 files changed, 854 insertions(+), 140 deletions(-)

diff --git a/src/backend/access/rmgrdesc/smgrdesc.c b/src/backend/access/rmgrdesc/smgrdesc.c
index 773d57..2c109b8ca4 100644
--- a/src/backend/access/rmgrdesc/smgrdesc.c
+++ b/src/backend/access/rmgrdesc/smgrdesc.c
@@ -40,6 +40,23 @@ smgr_desc(StringInfo buf, XLogReaderState *record)
 		 xlrec->blkno, xlrec->flags);
 		pfree(path);
 	}
+	else if (info == XLOG_SMGR_UNLINK)
+	{
+		xl_smgr_unlink *xlrec = (xl_smgr_unlink *) rec;
+		char	   *path = relpathperm(xlrec->rnode, xlrec->forkNum);
+
+		appendStringInfoString(buf, path);
+		pfree(path);
+	}
+	else if (info == XLOG_SMGR_BUFPERSISTENCE)
+	{
+		xl_smgr_bufpersistence *xlrec = (xl_smgr_bufpersistence *) rec;
+		char	   *path = relpathperm(xlrec->rnode, MAIN_FORKNUM);
+
+		appendStringInfoString(buf, path);
+		appendStringInfo(buf, " persistence %d", xlrec->persistence);
+		pfree(path);
+	}
 }
 
 const char *
@@ -55,6 +72,12 @@ smgr_identify(uint8 info)
 		case XLOG_SMGR_TRUNCATE:
 			id = "TRUNCATE";
 			break;
+		case XLOG_SMGR_UNLINK:
+			id = "UNLINK";
+			break;
+		case XLOG_SMGR_BUFPERSISTENCE:
+			id = "BUFPERSISTENCE";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/README b/src/backend/access/transam/README
index 1edc8180c1..51616b2458 100644
--- a/src/backend/access/transam/README
+++ b/src/backend/access/transam/README
@@ -724,6 +724,16 @@ we must panic and abort recovery.  The DBA will have to manually clean up
 then restart recovery.  This is part of the reason for not writing a WAL
 entry until we've successfully done the original action.
 
+The INITTMP fork file
+
+
+An INITTMP 

RE: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-07 Thread tsunakawa.ta...@fujitsu.com
From: Stephen Frost 
> I hadn't intended to make an argument that enabling checksums was
> equivilant to enabling or disabling fsync- I said it was 'akin', by which I 
> meant it
> was similar in character, as in, as I said previously, a way for PG to hedge
> against certain external-to-PG risks (though, unfortunately, our checksums
> aren't able to actually mitigate any of the risks but merely to detect them, 
> but
> there is certainly value in that too).

I only skimmed a few recent mails in this thread, but I've been hoping that
checksums get enabled by default.  So +1.  As the storage layers get more complex,
including containers and virtual storage, we may increasingly be passed
bogus data due to software bugs.  I always tell my little kid to fasten a seatbelt,
although she dislikes it as she can't move well.

I've also felt it is sad that checksums are not mentioned in the database cluster
setup section of the manual [1].  If checksums are turned on by default, I think
we can mention on that page that they can be disabled for maximum speed.

It'd be desirable to lighten the overhead of checksumming, but I have no idea 
now.  IIRC, Oracle and SQL Server also scan the whole page to compute the 
checksum.  Maybe the disadvantage of Postgres is that it has to copy the whole 
page for fear of concurrent hint bit updates?


[1]
19.2. Creating a Database Cluster
https://www.postgresql.org/docs/devel/creating-cluster.html


Regards
Takayuki Tsunakawa





RE: Disable WAL logging to speed up data loading

2021-01-07 Thread tsunakawa.ta...@fujitsu.com
From: Robert Haas 
> Were the issues that I mentioned regarding GIST (and maybe other AMs)
> in the last paragraph of
> http://postgr.es/m/CA+TgmoZEZ5RONS49C7mEpjhjndqMQtVrz_LCQUkpRW
> dmrev...@mail.gmail.com
> addressed in some way? That seems like a pretty hard engineering
> problem to me, and I don't see that there's been any discussion of it.
> Those are correctness concerns separate from any wal_level tracking we
> might want to do to avoid accidental mistakes.

Thank you very much for reminding me of this.  I forgot I replied as follows:


--
Unlogged GiST indexes use fake LSNs that are instance-wide.  Unlogged temporary 
GiST indexes use backend-local sequence values.  Other unlogged and temporary 
relations don't set LSNs on pages.  So, I think it's enough to call 
GetFakeLSNForUnloggedRel() when wal_level = none as well.
--


But this is not correct.  We have to allow (RM_GIST_ID, XLOG_GIST_ASSIGN_LSN) 
WAL records to be emitted (by tweaking the filter in XLogInsert()), just like 
those WAL records are emitted when wal_level = minimal and other WAL 
records are not emitted.


Regards
Takayuki Tsunakawa



Re: [PATCH] Simple progress reporting for COPY command

2021-01-07 Thread Amit Kapila
On Fri, Jan 8, 2021 at 9:45 AM Josef Šimánek  wrote:
>
> On Fri, Jan 8, 2021 at 5:03 AM Amit Kapila wrote:
> >
> > On Fri, Jan 8, 2021 at 8:42 AM Josef Šimánek  
> > wrote:
> > >
> > > On Fri, Jan 8, 2021 at 3:55 AM Amit Kapila wrote:
> > > >
> > > >
> > > > Can't we display the entire COPY command? I checked that
> > > > pg_stat_statements display the query so there shouldn't be a problem
> > > > to display the entire command.
> > >
> > > In previous discussions there was mentioned it doesn't make sense
> > > since you can join with pg_stat_statements on the pid column if
> > > needed. What would be the reason to duplicate that info?
> > >
> >
> > But pg_stat_statements won't be present by default. Also, the same
> > argument could be applied for the command to be present in
> > stat_progress views. It occurred to me only when I was trying to
> > compare what we display in all the progress views. I think there is
> > some merit in being consistent.
>
> Sorry, I mean pg_stat_activity (not pg_stat_statements). That is
> included by default (at least in my installations).
>

Okay, that makes sense but I still wonder why we display it in other
stat_progress views?
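For reference, a sketch of the join being described, using the view name from
the committed COPY progress patch (pid is common to both views):

SELECT a.query, p.*
FROM pg_stat_progress_copy AS p
JOIN pg_stat_activity AS a USING (pid);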

-- 
With Regards,
Amit Kapila.




Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-07 Thread Bharath Rupireddy
On Fri, Jan 8, 2021 at 7:29 AM Fujii Masao  wrote:
> On 2021/01/07 17:21, Bharath Rupireddy wrote:
> > On Thu, Jan 7, 2021 at 9:49 AM Fujii Masao  
> > wrote:
> >> On 2021/01/05 16:56, Bharath Rupireddy wrote:
> >>> Attaching v8 patch set. Hopefully, cf bot will be happy with v8.
> >>>
> >>> Please consider the v8 patch set for further review.
> >> -DATA = postgres_fdw--1.0.sql
> >> +DATA = postgres_fdw--1.1.sql postgres_fdw--1.0--1.1.sql
> >>
> >> Shouldn't we leave 1.0.sql as it is and create 1.0--1.1.sql so that
> >> we can run the followings?
> >>
> >>   CREATE EXTENSION postgres_fdw VERSION "1.0";
> >>   ALTER EXTENSION postgres_fdw UPDATE TO "1.1";
> >
> > Yes we can. In that case, to use the new functions users have to
> > update postgres_fdw to 1.1. In that case, do we need to mention in the
> > documentation that to make use of the new functions, users should update
> > postgres_fdw to version 1.1?
>
> But since postgres_fdw.control indicates that the default version is 1.1,
> "CREATE EXTENSION postgres_fdw" installs v1.1. So basically the users
> don't need to update postgres_fdw from v1.0 to v1.1. Only the users of
> v1.0 need to update that to v1.1 to use new functions. No?

It works this way (sketched in SQL below):
scenario 1:
1) create extension postgres_fdw;   --> this is run before our feature,
i.e. default_version 1.0
2) after the feature, i.e. default_version 1.1, users can run alter
extension postgres_fdw update to "1.1"; which gets the new functions
from postgres_fdw--1.0--1.1.sql.

scenario 2:
1) create extension postgres_fdw;   --> this is run after our feature,
i.e. default_version 1.1, so the new functions will be installed by
create extension itself; there is no need to run alter update to get the
functions.
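A minimal SQL sketch of both scenarios, plus a way to check which version is
installed (the version numbers follow the upgrade-script naming above):

-- Scenario 1: installed as 1.0 before the feature, upgraded afterwards
CREATE EXTENSION postgres_fdw VERSION '1.0';
ALTER EXTENSION postgres_fdw UPDATE TO '1.1';  -- runs postgres_fdw--1.0--1.1.sql

-- Scenario 2: fresh install once 1.1 is the default_version
CREATE EXTENSION postgres_fdw;

-- Either way, the installed version can be checked with:
SELECT extname, extversion FROM pg_extension WHERE extname = 'postgres_fdw';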

I will make the changes and post a new patch set soon.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Add Information during standby recovery conflicts

2021-01-07 Thread Fujii Masao



On 2021/01/08 11:17, Kyotaro Horiguchi wrote:

At Fri, 8 Jan 2021 01:32:11 +0900, Fujii Masao  
wrote in


Attached is the updated version of the patch. This can be applied to
current master.

With the patch, for example, if the startup process waited longer than
deadlock_timeout for the recovery conflict on the lock, the latter log
message in the followings would be additionally output.

 LOG: recovery still waiting after 1001.223 ms: recovery conflict on
 lock
 LOG: recovery finished waiting after 19004.694 ms: recovery conflict
 on lock
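For reference, a sketch of the standby settings that lead to these messages
(assuming the GUC introduced by this patch series is named
log_recovery_conflict_waits; deadlock_timeout is the existing parameter):

ALTER SYSTEM SET log_recovery_conflict_waits = on;
ALTER SYSTEM SET deadlock_timeout = '1s';
SELECT pg_reload_conf();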


+   /*
+* Emit the log message if recovery conflict on buffer 
pin was resolved but
+* the startup process waited longer than 
deadlock_timeout for it.

The first line is beyond the 80th column.


Thanks for pointing this out! This happened because I forgot to run pgindent
for bufmgr.c. Attached is the updated version of the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7c0a673a8d..82864bbb24 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6961,7 +6961,7 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
   

 Controls whether a log message is produced when the startup process
-is waiting longer than deadlock_timeout
+waits longer than deadlock_timeout
 for recovery conflicts.  This is useful in determining if recovery
 conflicts prevent the recovery from applying WAL.

diff --git a/src/backend/storage/buffer/bufmgr.c 
b/src/backend/storage/buffer/bufmgr.c
index 71b5852224..c116e75ca1 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -3846,6 +3846,16 @@ LockBufferForCleanup(Buffer buffer)
/* Successfully acquired exclusive lock with pincount 1 
*/
UnlockBufHdr(bufHdr, buf_state);
 
+   /*
+* Emit the log message if recovery conflict on buffer 
pin was
+* resolved but the startup process waited longer than
+* deadlock_timeout for it.
+*/
+   if (logged_recovery_conflict)
+   
LogRecoveryConflict(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN,
+   
waitStart, GetCurrentTimestamp(),
+   NULL, 
false);
+
/* Report change to non-waiting status */
if (new_status)
{
@@ -3900,7 +3910,7 @@ LockBufferForCleanup(Buffer buffer)

   DeadlockTimeout))
{

LogRecoveryConflict(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN,
-   
waitStart, now, NULL);
+   
waitStart, now, NULL, true);
logged_recovery_conflict = true;
}
}
diff --git a/src/backend/storage/ipc/standby.c 
b/src/backend/storage/ipc/standby.c
index d4b0f65ba2..39a30c00f7 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -226,11 +226,14 @@ WaitExceedsMaxStandbyDelay(uint32 wait_event_info)
  * wait_start is the timestamp when the caller started to wait.
  * now is the timestamp when this function has been called.
  * wait_list is the list of virtual transaction ids assigned to
- * conflicting processes.
+ * conflicting processes. still_waiting indicates whether
+ * the startup process is still waiting for the recovery conflict
+ * to be resolved or not.
  */
 void
 LogRecoveryConflict(ProcSignalReason reason, TimestampTz wait_start,
-   TimestampTz now, VirtualTransactionId 
*wait_list)
+   TimestampTz now, VirtualTransactionId 
*wait_list,
+   bool still_waiting)
 {
longsecs;
int usecs;
@@ -238,6 +241,12 @@ LogRecoveryConflict(ProcSignalReason reason, TimestampTz 
wait_start,
StringInfoData buf;
int nprocs = 0;
 
+   /*
+* There must be no conflicting processes when the recovery conflict has
+* already been resolved.
+*/
+   Assert(still_waiting || wait_list == NULL);
+
	TimestampDifference(wait_start, now, &secs, &usecs);
msecs = secs * 1000 + usecs / 1000;
usecs = usecs % 1000;
@@ -275,12 +284,21 @@ 

Re: [PATCH] Simple progress reporting for COPY command

2021-01-07 Thread Josef Šimánek
On Fri, Jan 8, 2021 at 5:03 AM Amit Kapila wrote:
>
> On Fri, Jan 8, 2021 at 8:42 AM Josef Šimánek  wrote:
> >
> > On Fri, Jan 8, 2021 at 3:55 AM Amit Kapila wrote:
> > >
> > >
> > > Can't we display the entire COPY command? I checked that
> > > pg_stat_statements display the query so there shouldn't be a problem
> > > to display the entire command.
> >
> > In previous discussions there was mentioned it doesn't make sense
> > since you can join with pg_stat_statements on the pid column if
> > needed. What would be the reason to duplicate that info?
> >
>
> But pg_stat_statements won't be present by default. Also, the same
> argument could be applied for the command to be present in
> stat_progress views. It occurred to me only when I was trying to
> compare what we display in all the progress views. I think there is
> some merit in being consistent.

Sorry, I mean pg_stat_activity (not pg_stat_statements). That is
included by default (at least in my installations).

> --
> With Regards,
> Amit Kapila.




Re: [PATCH] Simple progress reporting for COPY command

2021-01-07 Thread Amit Kapila
On Fri, Jan 8, 2021 at 8:42 AM Josef Šimánek  wrote:
>
> On Fri, Jan 8, 2021 at 3:55 AM Amit Kapila wrote:
> >
> >
> > Can't we display the entire COPY command? I checked that
> > pg_stat_statements display the query so there shouldn't be a problem
> > to display the entire command.
>
> In previous discussions there was mentioned it doesn't make sense
> since you can join with pg_stat_statements on the pid column if
> needed. What would be the reason to duplicate that info?
>

But pg_stat_statements won't be present by default. Also, the same
argument could be applied for the command to be present in
stat_progress views. It occurred to me only when I was trying to
compare what we display in all the progress views. I think there is
some merit in being consistent.

-- 
With Regards,
Amit Kapila.




Re: Huge memory consumption on partitioned table with FKs

2021-01-07 Thread Keisuke Kuroda
Hi Amit-san,

I noticed that this patch
(v3-0001-ri_triggers.c-Use-root-constraint-OID-as-key-to-r.patch)
is not registered in the commitfest. I think it needs to be registered for
commit, is that correct?

I have confirmed that this patch can be applied to HEAD (master),
and that check-world PASS.

Best Regards,

-- 
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com




Re: [PATCH] Simple progress reporting for COPY command

2021-01-07 Thread Josef Šimánek
On Fri, Jan 8, 2021 at 3:55 AM Amit Kapila wrote:
>
> On Thu, Jan 7, 2021 at 7:02 PM Josef Šimánek  wrote:
> >
> > On Thu, Jan 7, 2021 at 2:08 PM Amit Kapila wrote:
> > >
> > > On Thu, Jan 7, 2021 at 3:15 AM Tomas Vondra
> > >  wrote:
> > > >
> > > > On 1/5/21 11:02 AM, Josef Šimánek wrote:
> > > > > I'm attaching the whole patch since commitfest failed to ingest the
> > > > > last incremental on CI.
> > > > >
> > > >
> > > > Yeah, the whole patch needs to be attached for the commitfest tester to
> > > > work correctly - it can't apply pieces from multiple messages, etc.
> > > >
> > > > Anyway, I pushed this last version of patch, after a couple more tweaks,
> > > > mainly to the docs - one place used pg_stat_copy_progress, the section
> > > > was not indexed properly, and so on.
> > > >
> > >
> > > How about including a column for command similar to
> > > pg_stat_progress_create_index and pg_stat_progress_cluster? It seems
> > > that command will be useful in the context of COPY as there are many
> > > variants of COPY.
> > >
> >
> > From pg_stat_progress_create_index docs:
> >
> > The command that is running: CREATE INDEX, CREATE INDEX CONCURRENTLY,
> > REINDEX, or REINDEX CONCURRENTLY.
> >
> > Which values should be present in copy progress? Just COPY FROM or COPY TO?
> >
>
> Can't we display the entire COPY command? I checked that
> pg_stat_statements display the query so there shouldn't be a problem
> to display the entire command.

In previous discussions it was mentioned that it doesn't make sense
since you can join with pg_stat_statements on the pid column if
needed. What would be the reason to duplicate that info?

> --
> With Regards,
> Amit Kapila.




Re: Add session statistics to pg_stat_database

2021-01-07 Thread Masahiro Ikeda

On 2021-01-08 00:47, Laurenz Albe wrote:

On Fri, 2020-12-25 at 20:28 +0900, Masahiro Ikeda wrote:

As a user, I want this feature to know whether
clients' session activities are as expected.

I have some comments about the patch.


Thanks you for the thorough review!


Thanks for updating the patch!


1. pg_proc.dat

The unit of "session time" and so on says "in seconds".
But, is "in milliseconds" right?


You are right.  Fixed.


2. monitoring.sgml

IIUC, "active_time" includes the time executes a fast-path function 
and

"idle in transaction" includes "idle in transaction(aborted)" time.

Why don't you reference pg_stat_activity's "state" column, and say that
"active_time" is the total time when the state is "active" or "fastpath
function call"?
"idle in transaction" is the same too.


Good idea; I have expanded the documentation like that.


BTW, is there any reason to merge the above statistics?
IIUC, the downside of separating the statistics is just two more columns, and
there is no performance penalty. So, I wonder whether there is a way to
separate them corresponding to the state column of pg_stat_activity.
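For reference, the pg_stat_activity states being referred to can be listed
with a query like this (a point-in-time snapshot, not the accumulated times
this patch adds to pg_stat_database):

SELECT datname, state, count(*)
FROM pg_stat_activity
WHERE state IN ('active', 'fastpath function call',
                'idle in transaction', 'idle in transaction (aborted)')
GROUP BY datname, state;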


3. pgstat.h

The comment of PgStat_MsgConn says "Sent by pgstat_connection".
I thought "pgstat_connection" is a function, but it doesn't exist.

Is "Sent by the backend" right?


The function was renamed and is now called "pgstat_send_connstats".

But you are right, I might as well match the surrounding code and
write "Sent by the backend".


Although this is a trivial thing, the following row has too many tabs.

Other structs have only one space.

// }Pgstat_MsgConn;


Yes, I messed that up during the pgindent run.  Fixed.

Patch version 11 is attached.


There is code like the following in pgstatfuncs.c:
int64 result = 0.0;

But I think the following is better:
int64 result = 0;

Although pg_stat_get_db_session_time now initializes "result" to zero
when it is declared, the other pg_stat_XXX functions don't.
Is it better to change it?


Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




Re: [PATCH] Simple progress reporting for COPY command

2021-01-07 Thread Amit Kapila
On Thu, Jan 7, 2021 at 7:02 PM Josef Šimánek  wrote:
>
> On Thu, Jan 7, 2021 at 2:08 PM Amit Kapila wrote:
> >
> > On Thu, Jan 7, 2021 at 3:15 AM Tomas Vondra
> >  wrote:
> > >
> > > On 1/5/21 11:02 AM, Josef Šimánek wrote:
> > > > I'm attaching the whole patch since commitfest failed to ingest the
> > > > last incremental on CI.
> > > >
> > >
> > > Yeah, the whole patch needs to be attached for the commitfest tester to
> > > work correctly - it can't apply pieces from multiple messages, etc.
> > >
> > > Anyway, I pushed this last version of patch, after a couple more tweaks,
> > > mainly to the docs - one place used pg_stat_copy_progress, the section
> > > was not indexed properly, and so on.
> > >
> >
> > How about including a column for command similar to
> > pg_stat_progress_create_index and pg_stat_progress_cluster? It seems
> > that command will be useful in the context of COPY as there are many
> > variants of COPY.
> >
>
> From pg_stat_progress_create_index docs:
>
> The command that is running: CREATE INDEX, CREATE INDEX CONCURRENTLY,
> REINDEX, or REINDEX CONCURRENTLY.
>
> Which values should be present in copy progress? Just COPY FROM or COPY TO?
>

Can't we display the entire COPY command? I checked that
pg_stat_statements display the query so there shouldn't be a problem
to display the entire command.

-- 
With Regards,
Amit Kapila.




Re: Single transaction in the tablesync worker?

2021-01-07 Thread Amit Kapila
On Fri, Jan 8, 2021 at 7:14 AM Peter Smith  wrote:
>
> On Thu, Jan 7, 2021 at 3:20 PM Amit Kapila  wrote:
> >
> > On Wed, Jan 6, 2021 at 3:39 PM Amit Kapila  wrote:
> > >
> > > On Wed, Jan 6, 2021 at 2:13 PM Peter Smith  wrote:
> > > >
> > > > I think it makes sense. If there can be a race between the tablesync
> > > > re-launching (after error), and the AlterSubscription_refresh removing
> > > > some table’s relid from the subscription then there could be lurking
> > > > slot/origin tablesync resources (of the removed table) which a
> > > > subsequent DROP SUBSCRIPTION cannot discover. I will think more about
> > > > how/if it is possible to make this happen. Anyway, I suppose I ought
> > > > to refactor/isolate some of the tablesync cleanup code in case it
> > > > needs to be commonly called from DropSubscription and/or from
> > > > AlterSubscription_refresh.
> > > >
> > >
> > > Fair enough.
> > >
> >
> > I think before implementing, we should once try to reproduce this
> > case. I understand this is a timing issue and can be reproduced only
> > with the help of debugger but we should do that.
>
> FYI, I was able to reproduce this case in debugger. PSA logs showing details.
>

Thanks for reproducing as I was worried about exactly this case. I
have one question related to logs:

##
## ALTER SUBSCRIPTION to REFRESH the publication

## This blocks on some latch until the tablesync worker dies, then it continues
##

Did you check which exact latch or lock blocks this? It is important
to retain this interlock as otherwise, even if we decide to drop the slot
(and/or origin), the tablesync worker might continue.

-- 
With Regards,
Amit Kapila.




Re: PoC/WIP: Extended statistics on expressions

2021-01-07 Thread Justin Pryzby
On Fri, Jan 08, 2021 at 01:57:29AM +0100, Tomas Vondra wrote:
> Attached is a patch fixing most of the issues. There are a couple
> exceptions:

In the docs:

+at the cost that its schema must be extended whenever the structure
+   of statistics pg_statistic changes.

should say "of statistics *IN* pg_statistics changes" ?

+   to an expression index. The full variant allows defining statistics objects
+   on multiple columns and expressions, and pick which statistics kinds will
+   be built. The per-expression statistics are built automatically when there

"and pick" is wrong - maybe say "and selecting which.."

+   and run a query using an expression on that column.  Without the

remove "the" ?

+   extended statistics, the planner has no information about data
+   distribution for reasults of those expression, and uses default

*results

+   estimates as illustrated by the first query.  The planner also does
+   not realize the value of the second column fully defines the value
+   of the other column, because date truncated to day still identifies
+   the month). Then expression and ndistinct statistics are built on

The ")" is unbalanced

+   /* all parts of thi expression are covered by this statistics */

this

+ * GrouExprInfos, but only if it's not known equal to any of the existing

Group

+* we don't allow specifying any statistis kinds.  The simple variant

statistics

+* If no statistic type was specified, build them all (but request

Say "kind" not "type" ?

+ * expression is a simple Var. OTOH we check that there's at least one
+ * statistics matching the expression.

one statistic (singular) ?

+* the future, we might consider
+*/

consider ???

+-- (not it fails, when there are no simple column references)

note?

There's some remaining copy/paste stuff from index expressions:

errmsg("statistics expressions and predicates can refer only to the table being indexed")));
left behind by evaluating the

Re: Incorrect allocation handling for cryptohash functions with OpenSSL

2021-01-07 Thread Michael Paquier
On Thu, Jan 07, 2021 at 09:51:00AM +0200, Heikki Linnakangas wrote:
> Hmm. Perhaps it would be best to change all the errors in mock
> authentication to COMMERROR. It'd be nice to have an accurate error message
> in the log, but no need to send it to the client.

Yeah, we could do that.  Still, this mode requires a hard
failure because COMMERROR is just a log, and if only COMMERROR is done
we still expect a salt to be generated to send a challenge back to the
client, which would require a fallback for the salt if one cannot be
generated from the mock nonce.  Need to think more about that.

>> Using separate fields looked cleaner to me if it came to debugging,
>> and the cleanup was rather minimal except if we use more switch/case
>> to set up the various variables.  What about something like the
>> attached?
> 
> +1

Thanks, I have committed this part.
--
Michael


signature.asc
Description: PGP signature


Re: Add Information during standby recovery conflicts

2021-01-07 Thread Kyotaro Horiguchi
At Fri, 8 Jan 2021 01:32:11 +0900, Fujii Masao  
wrote in 
> 
> Attached is the updated version of the patch. This can be applied to
> current master.
> 
> With the patch, for example, if the startup process waited longer than
> deadlock_timeout for the recovery conflict on the lock, the latter log
> message in the followings would be additionally output.
> 
> LOG: recovery still waiting after 1001.223 ms: recovery conflict on
> lock
> LOG: recovery finished waiting after 19004.694 ms: recovery conflict
> on lock

+   /*
+* Emit the log message if recovery conflict on buffer 
pin was resolved but
+* the startup process waited longer than 
deadlock_timeout for it.

The first line is beyond the 80th column.

LGTM other than the above.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: Single transaction in the tablesync worker?

2021-01-07 Thread Hou, Zhijie
> PSA the v12 patch for the Tablesync Solution1.
> 
> Differences from v11:
>   + Added PG docs to mention the tablesync slot
>   + Refactored tablesync slot drop (done by
> DropSubscription/process_syncing_tables_for_sync)
>   + Fixed PG docs mentioning wrong state code
>   + Fixed wrong code comment describing TCOPYDONE state
> 
Hi

I look into the new patch and have some comments.

1.
+   /* Setup replication origin tracking. */
①+  originid = replorigin_by_name(originname, true);
+   if (!OidIsValid(originid))
+   {

②+  originid = replorigin_by_name(originname, true);
+   if (originid != InvalidRepOriginId)
+   {

There are two different styles of code which check whether originid is valid.
Both are fine, but do you think it’s better to use the same style here?


2.
  *  state to SYNCDONE.  There might be zero changes applied between
  *  CATCHUP and SYNCDONE, because the sync worker might be ahead 
of the
  *  apply worker.
+ *- The sync worker has a intermediary state TCOPYDONE which comes 
after
+ * DATASYNC and before SYNCWAIT. This state indicates that the 
initial

This comment about TCOPYDONE is better placed at [1]*, which is between
DATASYNC and SYNCWAIT.

 * - Tablesync worker starts; changes table state from INIT to DATASYNC 
while
 *   copying.
 [1]*
 * - Tablesync worker finishes the copy and sets table state to 
SYNCWAIT;
 *   waits for state change.

3.
+   /*
+* To build a slot name for the sync work, we are limited to 
NAMEDATALEN -
+* 1 characters.
+*
+* The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0'). 
(It's
+* actually the NAMEDATALEN on the remote that matters, but this scheme
+* will also work reasonably if that is different.)
+*/
+   StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small");   /* for 
sanity */
+
+   syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);

The comment says syncslotname is limited to NAMEDATALEN - 1 characters.
But the actual size of it is (3 + 10 + 6 + 10 + '\0') = 30, which is not
NAMEDATALEN - 1.
Should we change the comment here?

Best regards,
houzj






Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-07 Thread Fujii Masao




On 2021/01/07 17:21, Bharath Rupireddy wrote:

On Thu, Jan 7, 2021 at 9:49 AM Fujii Masao  wrote:

On 2021/01/05 16:56, Bharath Rupireddy wrote:

Attaching v8 patch set. Hopefully, cf bot will be happy with v8.

Please consider the v8 patch set for further review.

-DATA = postgres_fdw--1.0.sql
+DATA = postgres_fdw--1.1.sql postgres_fdw--1.0--1.1.sql

Shouldn't we leave 1.0.sql as it is and create 1.0--1.1.sql so that
we can run the followings?

  CREATE EXTENSION postgres_fdw VERSION "1.0";
  ALTER EXTENSION postgres_fdw UPDATE TO "1.1";


Yes we can. In that case, to use the new functions users have to
update postgres_fdw to 1.1. In that case, do we need to mention in the
documentation that to make use of the new functions, users should update
postgres_fdw to version 1.1?


But since postgres_fdw.control indicates that the default version is 1.1,
"CREATE EXTENSION postgres_fdw" installs v1.1. So basically the users
don't need to update postgres_fdw from v1.0 to v1.1. Only the users of
v1.0 need to update that to v1.1 to use new functions. No?




With the above change, the contents of postgres_fdw--1.0.sql remain as
is and in postgres_fdw--1.0--1.1.sql we will have only the new
function statements:


Yes.




/* contrib/postgres_fdw/postgres_fdw--1.0--1.1.sql */

-- complain if script is sourced in psql, rather than via ALTER EXTENSION
\echo Use "ALTER EXTENSION postgres_fdw UPDATE TO '1.1'" to load this
file. \quit

CREATE FUNCTION postgres_fdw_get_connections ()
RETURNS text[]
AS 'MODULE_PATHNAME','postgres_fdw_get_connections'
LANGUAGE C STRICT PARALLEL RESTRICTED;

CREATE FUNCTION postgres_fdw_disconnect ()
RETURNS bool
AS 'MODULE_PATHNAME','postgres_fdw_disconnect'
LANGUAGE C STRICT PARALLEL RESTRICTED;

CREATE FUNCTION postgres_fdw_disconnect (text)
RETURNS bool
AS 'MODULE_PATHNAME','postgres_fdw_disconnect'
LANGUAGE C STRICT PARALLEL RESTRICTED;


+
+  Functions

The document format for functions should be consistent with
that in other contrib modules like pgstattuple?


pgstattuple has so many columns in its output that they have a table
listing all the output columns and their types. The new functions
introduced here have at most one input and one output. I think we
don't need a table listing the input and output names and types.

IMO, we can have something similar to what pg_visibility_map has for
functions, and also an example for each function showing how it can be
used. Thoughts?
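For instance, the examples could look something like this (a sketch based on
the function signatures in the 1.0--1.1 script above; the server name
'loopback' is just a placeholder):

-- list the foreign servers with open connections and whether each is valid
SELECT postgres_fdw_get_connections();

-- close the cached connection to one named foreign server
SELECT postgres_fdw_disconnect('loopback');

-- close all open connections of the current session
SELECT postgres_fdw_disconnect();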


Sounds good.





+   When called in the local session, it returns an array with each element as a
+   pair of the foreign server names of all the open connections that are
+   previously made to the foreign servers and true or
+   false to show whether or not the connection is valid.

We thought that the information about whether the connection is valid or
not was useful to, for example, identify and close explicitly the long-living
invalid connections because they were useless. But thanks to the recent
bug fix for the connection leak issue, that information would no longer be
so helpful for us? False is returned only when the connection is used in
this local transaction but it's marked as invalidated. In this case that
connection cannot be explicitly closed because it's used in this transaction.
It will be closed at the end of the transaction. Thoughts?


Yes, a connection's validity can be false only when the connection gets
invalidated and postgres_fdw_get_connections is called within an
explicit transaction, i.e. begin; ... commit;. In an implicit transaction,
since invalidated connections get closed either during the invalidation
callback or at the end of the transaction, postgres_fdw_get_connections
will always show valid connections. Having said that, I still feel we need
the true/false for valid/invalid in the output of
postgres_fdw_get_connections; otherwise we would miss giving connection
validity information to the user in the very narrow use case of an
explicit transaction.
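
A minimal sketch of that narrow case (the foreign table ft1 and server
loopback are hypothetical, and the array-returning signature follows the
v8 patch quoted above):

  BEGIN;
  SELECT * FROM ft1 LIMIT 1;   -- opens and caches a connection to server "loopback"
  -- (in another session) ALTER SERVER loopback OPTIONS (SET use_remote_estimate 'true');
  -- the cached connection is now marked invalidated, but cannot be dropped mid-transaction
  SELECT postgres_fdw_get_connections();
  COMMIT;                      -- the invalidated connection is discarded here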


Understood. I withdraw my suggestion and am fine to display
valid/invalid information.



If required, we can issue a warning whenever we see
an invalid connection saying "invalid connections are
discarded at the end of the remote transaction". Thoughts?


IMO it's overkill to emit such a warning message because that
situation is a normal one. OTOH, it seems worth documenting it.





I guess that you made postgres_fdw_get_connections() return an array
because the similar function dblink_get_connections() does that. But
isn't it more convenient to make it return a set of records?


Yes, for postgres_fdw_get_connections we can return a set of records
of (server_name, valid). To do so, I can refer to dblink_get_pkey.
Thoughts?


Yes.
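
As a rough sketch, the set-returning variant could be declared along
these lines at the SQL level (the OUT parameter names here are only an
assumption, not a final form):

  CREATE FUNCTION postgres_fdw_get_connections (OUT server_name text, OUT valid boolean)
  RETURNS SETOF record
  AS 'MODULE_PATHNAME', 'postgres_fdw_get_connections'
  LANGUAGE C STRICT PARALLEL RESTRICTED;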

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Single transaction in the tablesync worker?

2021-01-07 Thread Peter Smith
On Thu, Jan 7, 2021 at 3:20 PM Amit Kapila  wrote:
>
> On Wed, Jan 6, 2021 at 3:39 PM Amit Kapila  wrote:
> >
> > On Wed, Jan 6, 2021 at 2:13 PM Peter Smith  wrote:
> > >
> > > I think it makes sense. If there can be a race between the tablesync
> > > re-launching (after error), and the AlterSubscription_refresh removing
> > > some table’s relid from the subscription then there could be lurking
> > > slot/origin tablesync resources (of the removed table) which a
> > > subsequent DROP SUBSCRIPTION cannot discover. I will think more about
> > > how/if it is possible to make this happen. Anyway, I suppose I ought
> > > to refactor/isolate some of the tablesync cleanup code in case it
> > > needs to be commonly called from DropSubscription and/or from
> > > AlterSubscription_refresh.
> > >
> >
> > Fair enough.
> >
>
> I think before implementing, we should once try to reproduce this
> case. I understand this is a timing issue and can be reproduced only
> with the help of debugger but we should do that.

FYI, I was able to reproduce this case in the debugger. PSA logs showing details.


Kind Regards,
Peter Smith.
Fujitsu Australia
TEST SCENARIO

Purpose: See if using ALTER PUBLICATION/SUBSCRIPTION might expose a gap in the 
tablesync slot cleanups

Note: The "!!>>" is extra logging added for testing, not a normal part of PG.

Steps:
1.  CREATE PUBLICATION for some table T1
2.  CREATE SUBSCRIPTION for that publication
3.  Pause the tablesync worker in CATCHUP mode (eg after it has created the slot 
but before it has dropped that tablesync slot)
4.  Show the slots 
5.  ALTER PUBLICATION to DROP TABLE T1
6.  ALTER SUBSCRIPTION REFRESH PUBLICATION // will discover table T1 is no 
longer subscribed to.
7.  Show the slots
8.  DROP SUBSCRIPTION // since subscription no longer knows about T1 this 
would not drop the tablesync slot. So if step 6 does no cleanup of the T1 slot, 
maybe nobody will
9.  Show the slots // see if the T1 tablesync slot is still present or not

==

##
## Normal PUBLICATION of a table 
##

[postgres@CentOS7-x64 ~]$ psql -d test_pub -c "CREATE PUBLICATION tap_pub FOR 
TABLE test_tab;"
CREATE PUBLICATION

##
## Create subscription, and pause the tablesync in the debugger when it gets to 
CATCHUP state
##

[postgres@CentOS7-x64 ~]$ psql -d test_sub -p 54321 -c "CREATE SUBSCRIPTION 
tap_sub CONNECTION 'host=localhost dbname=test_pub application_name=tap_sub' 
PUBLICATION tap_pub;"
2021-01-08 11:36:38.418 AEDT [22468] LOG:  logical decoding found consistent 
point at 0/1639360
2021-01-08 11:36:38.418 AEDT [22468] DETAIL:  There are no running transactions.
2021-01-08 11:36:38.418 AEDT [22468] STATEMENT:  CREATE_REPLICATION_SLOT 
"tap_sub" LOGICAL pgoutput NOEXPORT_SNAPSHOT
NOTICE:  created replication slot "tap_sub" on publisher
CREATE SUBSCRIPTION
2021-01-08 11:36:38.428 AEDT [22469] LOG:  logical replication apply worker for 
subscription "tap_sub" has started
2021-01-08 11:36:38.428 AEDT [22469] LOG:  !!>> The apply worker process has 
PID = 22469
[postgres@CentOS7-x64 ~]$ 2021-01-08 11:36:38.445 AEDT [22474] LOG:  starting 
logical decoding for slot "tap_sub"
2021-01-08 11:36:38.445 AEDT [22474] DETAIL:  Streaming transactions committing 
after 0/1639398, reading WAL from 0/1639360.
2021-01-08 11:36:38.445 AEDT [22474] STATEMENT:  START_REPLICATION SLOT 
"tap_sub" LOGICAL 0/0 (proto_version '2', publication_names '"tap_pub"')
2021-01-08 11:36:38.445 AEDT [22474] LOG:  logical decoding found consistent 
point at 0/1639360
2021-01-08 11:36:38.445 AEDT [22474] DETAIL:  There are no running transactions.
2021-01-08 11:36:38.445 AEDT [22474] STATEMENT:  START_REPLICATION SLOT 
"tap_sub" LOGICAL 0/0 (proto_version '2', publication_names '"tap_pub"')
2021-01-08 11:36:38.446 AEDT [22469] LOG:  !!>> apply worker: 
LogicalRepApplyLoop
2021-01-08 11:36:38.446 AEDT [22469] LOG:  !!>> apply worker: called 
process_syncing_tables
2021-01-08 11:36:38.453 AEDT [22477] LOG:  logical replication table 
synchronization worker for subscription "tap_sub", table "test_tab" has started
2021-01-08 11:36:38.453 AEDT [22477] LOG:  !!>> The tablesync worker process 
has PID = 22477
2021-01-08 11:36:38.453 AEDT [22477] LOG:  !!>>


Sleeping 30 secs. For debugging, attach to process 22477 now!



2021-01-08 11:36:39.458 AEDT [22469] LOG:  !!>> apply worker: 
LogicalRepApplyLoop
2021-01-08 11:36:39.458 AEDT [22469] LOG:  !!>> apply worker: called 
process_syncing_tables
2021-01-08 11:36:40.461 AEDT [22469] LOG:  !!>> apply worker: 
LogicalRepApplyLoop
2021-01-08 11:36:40.461 AEDT [22469] LOG:  !!>> apply worker: called 
process_syncing_tables
2021-01-08 11:36:41.464 AEDT [22469] LOG:  !!>> apply worker: 
LogicalRepApplyLoop
2021-01-08 11:36:41.464 AEDT [22469] LOG:  !!>> apply worker: called 
process_syncing_tables
2021-01-08 11:36:42.467 AEDT [22469] LOG:  !!>> apply worker: 
LogicalRepApplyLoop
2021-01-08 11:36:42.467 AEDT [22469] LOG:  !!>> apply 

Re: [Patch] Optimize dropping of relation buffers using dlist

2021-01-07 Thread Kyotaro Horiguchi
At Thu, 7 Jan 2021 09:25:22 +, "k.jami...@fujitsu.com" 
 wrote in 
> On Thu, January 7, 2021 5:36 PM (JST), Amit Kapila wrote:
> > 
> > On Wed, Jan 6, 2021 at 6:43 PM k.jami...@fujitsu.com
> >  wrote:
> > >
> > > [Results for VACUUM on single relation]
> > > Average of 5 runs.
> > >
> > > 1. % REGRESSION
> > > % Regression: (patched - master)/master
> > >
> > > | rel_size | 128MB  | 1GB| 20GB   | 100GB|
> > > |--||||--|
> > > | NB/512   | 0.000% | 0.000% | 0.000% | -32.680% |
> > > | NB/256   | 0.000% | 0.000% | 0.000% | 0.000%   |
> > > | NB/128   | 0.000% | 0.000% | 0.000% | -16.502% |
> > > | NB/64| 0.000% | 0.000% | 0.000% | -9.841%  |
> > > | NB/32| 0.000% | 0.000% | 0.000% | -6.219%  |
> > > | NB/16| 0.000% | 0.000% | 0.000% | 3.323%   |
> > > | NB/8 | 0.000% | 0.000% | 0.000% | 8.178%   |
> > >
> > > For 100GB shared_buffers, we can observe regression
> > > beyond NBuffers/32. So with this, we can conclude
> > > that NBuffers/32 is the right threshold.
> > > For NBuffers/16 and beyond, the patched version performs
> > > worse than master. In other words, the cost of finding the
> > > buffers to be invalidated gets higher in the optimized path
> > > than in the traditional path.
> > >
> > > So in attached V39 patches, I have updated the threshold
> > > BUF_DROP_FULL_SCAN_THRESHOLD to NBuffers/32.
> > >
> > 
> > Thanks for the detailed tests. NBuffers/32 seems like an appropriate
> > value for the threshold based on these results. I would like to
> > slightly modify part of the commit message in the first patch as below
> > [1], otherwise, I am fine with the changes. Unless you or anyone else
> > has any more comments, I am planning to push the 0001 and 0002
> > sometime next week.
> > 
> > [1]
> > "The recovery path of DropRelFileNodeBuffers() is optimized so that
> > scanning of the whole buffer pool can be avoided when the number of
> > blocks to be truncated in a relation is below a certain threshold. For
> > such cases, we find the buffers by doing lookups in BufMapping table.
> > This improves the performance by more than 100 times in many cases
> > when several small tables (tested with 1000 relations) are truncated
> > and where the server is configured with a large value of shared
> > buffers (greater than 100GB)."
> 
> Thank you for taking a look at the results of the tests. They are also 
> consistent with the results from Tang.
> The commit message LGTM.

+1.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Default wal_sync_method on FreeBSD

2021-01-07 Thread Thomas Munro
Hi hackers,

My learn-me-some-operating-system-hacking project for the holidays was
to add O_DSYNC to FreeBSD 13 (due out end of Q1ish).  I was motivated
by my project to port some of Andres's Linux-only PostgreSQL AIO stuff
to POSIX interfaces, where you need O_DSYNC to initiate the
asynchronous equivalent of fdatasync(2).

The system header change has one interesting consequence for existing
releases of PostgreSQL, though: xlogdefs.h now sees that there is an
O_DSYNC macro that is distinct from O_SYNC, and defaults to
wal_sync_method=open_datasync.  That's not a great default setting,
because it gets you O_DIRECT | O_DSYNC, which performs terribly when
you're writing 8KB blocks on UFS's default 32KB logical block size (it
triggers read-before-write, quite visibly destroying performance with
eg pg_test_fsync), and for all I know, it might even not work at all
on some other file systems.  I suspect it might come out very slightly
ahead on a UFS filesystem created with 8KB blocks, but in any case,
that seems like something you should have to opt in to, as you do on
Linux.
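
For anyone who wants to see the effect locally, running pg_test_fsync
against a file on the UFS partition in question should make the
difference between the sync methods visible (the path here is just a
placeholder):

  pg_test_fsync -f /ufs_mount/pg_test_fsync.tmp -s 5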

One idle question I have is whether there is any platform on Earth
where it's a good idea to use open_datasync as the default,
considering the complications of those two flags.  I can't answer
that, and it'd be hard to justify unleashing a global change on the
world, so I think the right change would be to single out FreeBSD for
the exact same treatment we give Linux.  That is, I'd like to force
the default to fdatasync in all release branches on that platform.
Here is a patch to do that.

I wrapped it in #ifdef HAVE_FDATASYNC.  There are no supported
releases of FreeBSD that lack fdatasync(2), but older releases will be
out there (huh, there's an animal in our build farm that might
qualify), so in that case we should just fall back to the regular
decision logic that'll wind up using good old fsync().
From b5c7b0512ccbc1378803a8cc9a33fb9abf859eaf Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Fri, 8 Jan 2021 14:06:27 +1300
Subject: [PATCH] Default to wal_sync_method=fdatasync on FreeBSD.

FreeBSD 13 gained O_DSYNC, which would normally cause wal_sync_method to
default to open_datasync.  That wouldn't be a good default for UFS due
to a mismatch in the default block sizes of PostgreSQL and UFS.  Let's
make sure we keep using fdatasync by default, to preserve existing
behavior.

Back-patch to all releases.
---
 doc/src/sgml/config.sgml  |  2 +-
 src/backend/utils/misc/postgresql.conf.sample |  2 +-
 src/include/port/freebsd.h| 10 ++
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 4b60382778..e0711444db 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2920,7 +2920,7 @@ include_dir 'conf.d'
 Not all of these choices are available on all platforms.
 The default is the first method in the above list that is supported
 by the platform, except that fdatasync is the default on
-Linux.  The default is not necessarily ideal; it might be
+Linux and FreeBSD.  The default is not necessarily ideal; it might be
 necessary to change this setting or other aspects of your system
 configuration in order to create a crash-safe configuration or
 achieve optimal performance.
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index b7fb2ec1fe..b15d05e594 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -206,7 +206,7 @@
 #wal_sync_method = fsync		# the default is the first option
 	# supported by the operating system:
 	#   open_datasync
-	#   fdatasync (default on Linux)
+	#   fdatasync (default on Linux and FreeBSD)
 	#   fsync
 	#   fsync_writethrough
 	#   open_sync
diff --git a/src/include/port/freebsd.h b/src/include/port/freebsd.h
index 2e36d3da4f..567d2dbaf1 100644
--- a/src/include/port/freebsd.h
+++ b/src/include/port/freebsd.h
@@ -1 +1,11 @@
 /* src/include/port/freebsd.h */
+
+/*
+ * FreeBSD 13 gained O_DSYNC support.  Set the default wal_sync_method to
+ * fdatasync, because xlogdefs.h's normal rules would prefer open_datasync.
+ * That wouldn't be a good default, because 8KB direct writes interact badly
+ * with the default 32KB block size on UFS, requiring read-before-write.
+ */
+#ifdef HAVE_FDATASYNC
+#define PLATFORM_DEFAULT_SYNC_METHOD	SYNC_METHOD_FDATASYNC
+#endif
-- 
2.20.1



Re: Key management with tests

2021-01-07 Thread Bruce Momjian
On Thu, Jan  7, 2021 at 10:02:14AM -0500, Bruce Momjian wrote:
> My next step is to add the high-level tests.

Here is the high-level script, and the log output.  I used the
pg_upgrade test.sh as a model.

It uses "CFE DEBUG" lines that are already in the code to compare the
initdb encryption with the other initdb decryption and pg_ctl
decryption.  It was easier than I thought.

What it does not do is to test the file descriptor passing from
/dev/tty, or the sample scripts.  This seems acceptable to me since I
test them and they rarely change.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee



test.sh
Description: Bourne shell script
make[1]: Leaving directory '/usr/local/src/gen/pgsql/postgresql/src/common'
MAKE= PATH="/pgtop/tmp_install/usr/local/pgsql/bin:$PATH" 
LD_LIBRARY_PATH="/pgtop/tmp_install/usr/local/pgsql/lib"  
bindir=/pgtop/tmp_install//usr/local/pgsql/bin /bin/sh test.sh
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.
Cluster file encryption is enabled.

creating directory /pgtop/src/test/crypto/tmp_check/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... America/New_York
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok

Sync to disk skipped.
The data directory might become corrupt if the operating system crashes.

Success. You can now start the database server using:

/pgtop/tmp_install/usr/local/pgsql/bin/pg_ctl -D 
/pgtop/src/test/crypto/tmp_check/data -l logfile start

waiting for server to start done
server started
waiting for server to shut down done
server stopped
PASSED
rm -rf '/usr/local/src/gen/pgsql/postgresql/src/test/crypto'/tmp_check
/bin/mkdir -p '/usr/local/src/gen/pgsql/postgresql/src/test/crypto'/tmp_check
cd . && TESTDIR='/usr/local/src/gen/pgsql/postgresql/src/test/crypto' 
PATH="/pgtop/tmp_install/usr/local/pgsql/bin:$PATH" 
LD_LIBRARY_PATH="/pgtop/tmp_install/usr/local/pgsql/lib"  PGPORT='65432' 
PG_REGRESS='/usr/local/src/gen/pgsql/postgresql/src/test/crypto/../../../src/test/regress/pg_regress'
 REGRESS_SHLIB='/pgtop/src/test/regress/regress.so' /usr/bin/prove -I 
../../../src/test/perl/ -I .  t/*.pl
t/001_testcrypto.pl .. ok
t/002_testkw.pl .. ok
All tests successful.
Files=2, Tests=10296, 138 wallclock secs ( 1.81 usr  0.22 sys + 26.55 cusr 
24.02 csys = 52.60 CPU)
Result: PASS


Re: Key management with tests

2021-01-07 Thread Bruce Momjian
On Thu, Jan  7, 2021 at 04:08:49PM -0300, Álvaro Herrera wrote:
> On 2021-Jan-07, Bruce Momjian wrote:
> 
> > All the tests pass now.  The current src/test directory is 19MB, and
> > adding these tests takes it to 23MB, or a 20% increase.  That seems like
> > a lot.  It is testing 128-bit and 256-bit keys --- should we do fewer
> > tests, or just test 256, or use gzip to compress the tests by 50%? 
> > (Does every platform have gzip?)
> 
> So the tests are about 95% of the patch ... do we really need that many
> tests?

No, I don't think so.  Stephen imported the entire NIST test suite.  It
was so comperhensive, it detected several OpenSSL bugs for zero-length
strings, which I already reported, but we would never be encrypting
zero-length strings, so there wasn't a lot of value to it.

Anyway, I think we need to figure out how to trim.  The first part would
be to figure out whether we need 128 _and_ 256-bit tests, and then see
what items are really useful.  Stephen, do you have any ideas on that?
We currently have 10296 tests, and I think we could get away with 100.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: list of extended statistics on psql

2021-01-07 Thread Tomas Vondra




On 1/8/21 12:52 AM, Tatsuro Yamada wrote:

Hi,

On 2021/01/08 0:56, Tomas Vondra wrote:

On 1/7/21 3:47 PM, Alvaro Herrera wrote:

On 2021-Jan-07, Tomas Vondra wrote:


On 1/7/21 1:46 AM, Tatsuro Yamada wrote:



I overlooked the check for MCV in the logic building query
because I created the patch as a new feature on PG14.
I'm not sure whether we should do back patch or not. However, I'll
add the check on the next patch because it is useful if you decide to
do the back patch on PG10, 11, 12, and 13.


BTW perhaps a quick look at the other \d commands would show if there are
precedents. I didn't have time for that.


Yes, we do promise that new psql works with older servers.



Yeah, makes sense. That means we need to add the check for 12 / MCV.



Ah, I got it.
I fixed the patch to work with older servers by adding a version check, 
and I tested the \dX command on older servers (PG10 - 13).

These results look fine.

0001:
  Added the check code to handle pre-PG12 servers, which have neither MCV nor
   pg_statistic_ext_data.
0002:
  This patch is the same as the previous patch (not changed).

Please find the attached files.



OK, thanks. I'll take a look and probably push tomorrow. FWIW I plan to 
squash the patches into a single commit.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-07 Thread Stephen Frost
Greetings,

* Peter Geoghegan (p...@bowt.ie) wrote:
> On Thu, Jan 7, 2021 at 1:14 PM Stephen Frost  wrote:
> > Much of this line of discussion seems to be, incorrectly, focused on my
> > mere mention of viewing the use of fsync and checksums as mechanism for
> > addressing certain risks, but that doesn't seem to be a terribly
> > fruitful direction to be going in.  I'm not suggesting that we should go
> > turn off fsync by default simply because we don't have checksums on by
> > default, which seems to be the implication.
> 
> I admit that I saw red. This was a direct result of your bogus
> argument, which greatly overstated the case in favor of enabling
> checksums by default. I regret my role in that now, though. It would
> be good to debate the actual issue, but that isn't what I saw.
> Everyone knows the principles behind checksums and how they're useful
> -- it doesn't need to be a part of the discussion.

I hadn't intended to make an argument that enabling checksums was
equivalent to enabling or disabling fsync - I said it was 'akin', by
which I meant it was similar in character, as in, as I said previously,
a way for PG to hedge against certain external-to-PG risks (though,
unfortunately, our checksums aren't able to actually mitigate any of the
risks but merely to detect them, but there is certainly value in that
too).

I also now regret not being clearer as to what I meant with that comment.

> I think that it should be possible to make a much better case in favor
> of enabling checksums by default. On further reflection I actually
> don't think that the real-world VACUUM overhead is anything like 15x,
> though the details are complex. I might be willing to help with this
> analysis, but since you only seem to want to discuss the question in a
> narrow way (e.g. "I agree that improving compression performance would
> be good but I don't see that as relevant to the question of what our
> defaults should be"), I have to wonder if it's worth the trouble.

What I was attempting to get at with that comment is that while I don't
feel it's relevant, I wouldn't object to both being enabled by default
and if those changes combined helps to get others on board with having
checksums enabled by default then such an approach would also get my
vote.  I also doubt that VACUUM performance would be impacted as heavily
in real-world workloads, but I again point out that VACUUMs, in our
default configuration, are going to be run with the brakes on since they're
run by autovacuum with a non-zero vacuum cost delay.  While I've
advocated for having that cost delay reduced (or the cost limit
increased) in the past, I wouldn't support eliminating the delays
entirely as that would then impact foreground activity, which is
certainly where performance is more important.

I appreciate that VACUUM run by an administrator directly doesn't have
the brakes on, but that then is much more likely to impact foreground
activity and is generally discouraged because of that- instead it's
generally recommended to configure autovacuum to be more aggressive
while still having a delay.  Once you're past the point where you want
delays to be introduced during VACUUM runs, I'd certainly think it's
gone past the point where our standard defaults would be appropriate in
a number of ways and a user could then consider if they want to disable
checksums and accept the risk associated with doing so in favor of
making VACUUM go faster, or not.
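
Concretely, the sort of tuning being described is along these lines (the
particular numbers are purely illustrative, not recommendations):

  -- make autovacuum more aggressive while keeping a (small) cost delay
  ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;   -- default -1 falls back to vacuum_cost_limit (200)
  ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '2ms';  -- keep a non-zero delay
  SELECT pg_reload_conf();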

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Implement for window functions

2021-01-07 Thread Vik Fearing
On 1/1/21 10:21 PM, Zhihong Yu wrote:
> Krasiyan:
> Happy New Year.
> 
> For WinGetFuncArgInPartition():
> 
> +   if (target > 0)
> +   step = 1;
> +   else if (target < 0)
> +   step = -1;
> +   else
> +   step = 0;
> 
> When would the last else statement execute? Since the above code is
> for WINDOW_SEEK_CURRENT, I wonder why step should be 0.

Hi.

"lag(expr, 0) over w" is useless but valid.
-- 
Vik Fearing




Re: list of extended statistics on psql

2021-01-07 Thread Tatsuro Yamada

Hi,

On 2021/01/08 0:56, Tomas Vondra wrote:

On 1/7/21 3:47 PM, Alvaro Herrera wrote:

On 2021-Jan-07, Tomas Vondra wrote:


On 1/7/21 1:46 AM, Tatsuro Yamada wrote:



I overlooked the check for MCV in the logic building query
because I created the patch as a new feature on PG14.
I'm not sure whether we should do back patch or not. However, I'll
add the check on the next patch because it is useful if you decide to
do the back patch on PG10, 11, 12, and 13.


BTW perhaps a quick look at the other \d commands would show if there are
precedents. I didn't have time for that.


Yes, we do promise that new psql works with older servers.



Yeah, makes sense. That means we need to add the check for 12 / MCV.



Ah, I got it.
I fixed the patch to work with older servers by adding a version check, and 
I tested the \dX command on older servers (PG10 - 13).
These results look fine.

0001:
 Added the check code to handle pre-PG12 servers, which have neither MCV nor
  pg_statistic_ext_data.
0002:
 This patch is the same as the previous patch (not changed).

Please find the attached files.



I wonder whether the column names added by \dX+ are fine? For example,
"Ndistinct_size" and "Dependencies_size" look like rather long names;
are they acceptable?



Seems acceptable - I don't have a better idea. 


I see, thanks!
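
For reference, a minimal way to exercise the new command once the patch
is applied (the table and statistics object below are just examples):

  CREATE TABLE t1 (a int, b int);
  CREATE STATISTICS stts_1 (ndistinct, dependencies) ON a, b FROM t1;
  ANALYZE t1;
  -- then, in psql:
  --   \dX     lists the extended statistics
  --   \dX+    additionally shows the Ndistinct_size / Dependencies_size columns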


Thanks,
Tatsuro Yamada
From c8be51a52a381a6e9c7be62022f5fc48b5915bd0 Mon Sep 17 00:00:00 2001
From: Tatsuro Yamada 
Date: Thu, 7 Jan 2021 14:28:20 +0900
Subject: [PATCH] Add \dX command on psql

This patch includes the version check in the logic building query.
---
 doc/src/sgml/ref/psql-ref.sgml |  14 
 src/bin/psql/command.c |   3 +
 src/bin/psql/describe.c| 141 +
 src/bin/psql/describe.h|   3 +
 src/bin/psql/help.c|   1 +
 src/bin/psql/tab-complete.c|   4 +-
 6 files changed, 165 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 221a967bfe..fd860776af 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1918,6 +1918,20 @@ testdb=
 
 
   
+  
+  
+\dX [ pattern ]
+
+
+Lists extended statistics.
+If pattern
+is specified, only those extended statistics whose names match the 
pattern
+are listed.
+If + is appended to the command name, each extended 
statistics
+is listed with its size.
+
+
+  
 
   
 \dy[+] [ pattern ]
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 303e7c3ad8..c5ebc1c3f4 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -928,6 +928,9 @@ exec_command_d(PsqlScanState scan_state, bool 
active_branch, const char *cmd)
else
success = listExtensions(pattern);
break;
+   case 'X':   /* Extended Statistics 
*/
+   success = listExtendedStats(pattern, 
show_verbose);
+   break;
case 'y':   /* Event Triggers */
success = listEventTriggers(pattern, 
show_verbose);
break;
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 52c6de51b6..0ccd9ed286 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -4401,6 +4401,147 @@ listEventTriggers(const char *pattern, bool verbose)
return true;
 }
 
+/*
+ * \dX
+ *
+ * Briefly describes extended statistics.
+ */
+bool
+listExtendedStats(const char *pattern, bool verbose)
+{
+   PQExpBufferData buf;
+   PGresult   *res;
+   printQueryOpt myopt = pset.popt;
+
+   if (pset.sversion < 10)
+   {
+   charsverbuf[32];
+
+   pg_log_error("The server (version %s) does not support extended 
statistics.",
+formatPGVersionNumber(pset.sversion, 
false,
+   
   sverbuf, sizeof(sverbuf)));
+   return true;
+   }
+
+   initPQExpBuffer();
+   printfPQExpBuffer(,
+ "SELECT \n"
+ 
"es.stxnamespace::pg_catalog.regnamespace::text AS \"%s\", \n"
+ "es.stxname AS \"%s\", \n"
+ "pg_catalog.format('%%s FROM %%s', \n"
+ "  (SELECT 
pg_catalog.string_agg(pg_catalog.quote_ident(a.attname),', ') \n"
+ "   FROM 
pg_catalog.unnest(es.stxkeys) s(attnum) \n"
+ "   JOIN pg_catalog.pg_attribute a \n"
+ "   ON 

Re: Deleting older versions in unique indexes to avoid page splits

2021-01-07 Thread Peter Geoghegan
On Mon, Jan 4, 2021 at 8:28 AM Victor Yegorov  wrote:
> I've looked through the v12 patch.
>
> I like the new outline:
>
> - _bt_delete_or_dedup_one_page() is the main entry for the new code
> - first we try _bt_simpledel_pass() does improved cleanup of LP_DEAD entries
> - then (if necessary) _bt_bottomupdel_pass() for bottomup deletion
> - finally, we perform _bt_dedup_pass() for deduplication
>
> We split the leaf page only if all the actions above failed to provide enough 
> space.

I'm thinking of repeating the LP_DEAD enhancement within GiST and hash
shortly after this, as follow-up work. Their implementation of LP_DEAD
deletion was always based on the nbtree original, and I think that it
makes sense to keep everything in sync. The simple deletion
enhancement probably makes just as much sense in these other places.

> +* Only do this for key columns.  A change to a non-key column within an
> +* INCLUDE index should not be considered because that's just payload to
> +* the index (they're not unlike table TIDs to the index AM).
>
> The last part of it (in the parenthesis) is difficult to grasp due to
> the double negation (not unlike). I think it's better to rephrase it.

Okay, will do.

> 2. After reading the patch, I also think, that fact, that 
> index_unchanged_by_update()
> and index_unchanged_by_update_var_walker() return different bool states
> (i.e. when the latter returns true, the first one returns false) is a bit 
> misleading.

> Although logic as it is looks fine, maybe 
> index_unchanged_by_update_var_walker()
> can be renamed to avoid this confusion, to smth like 
> index_expression_changed_walker() ?

I agree. I'll use the name index_expression_changed_walker() in the
next revision.

> 2. I'm not sure the bottomup_delete_items parameter is very helpful. In order 
> to disable
> bottom-up deletion, the DBA needs somehow to measure its impact on a particular 
> index.
> Currently I do not see how to achieve this. Not sure if this is overly 
> important, though, as
> you have a similar parameter for the deduplication.

I'll take bottomup_delete_items out, since I think that you may be
right about that. It can be another "decision to recheck mid-beta"
thing (on the PostgreSQL 14 open items list), so we can delay making a
final decision on it until after it gets tested by the broadest
possible group of people.

> 3. It feels like indexUnchanged is better to make indexChanged and negate its 
> usage in the code.
> As !indexChanged reads more natural than !indexUnchanged, at least to me.

I don't want to do that because then we're negating "indexChanged" as
part of our gating condition to the bottom-up deletion pass code. To
me this feels wrong, since the hint only exists to be used to trigger
index tuple deletion. This is unlikely to ever change.

> First, I ran a 2-hour long case with the same setup as I used in my e-mail 
> from 15 November.
> I found no difference between patch and master whatsoever, which makes me 
> think that current
> master is quite good at keeping bloat under control (not sure if this is an 
> effect of
> commit 4228817449 or of deduplication).

Commit 4228817449 can't have improved the performance of the master
branch -- that commit was just about providing a *correct*
latestRemovedXid value. It cannot affect how many index tuples get
deleted per pass, or anything like that.

Not surprised that you didn't see many problems with the master branch
on your first attempt. It's normal for there to be non-linear effects
with this stuff, and these are very hard (maybe even impossible) to
model. For example, even with artificial test data you often see
distinct "waves" of page splits, which is a phenomenon pretty well
described by the "Waves of misery after index creation" paper [1].
It's likely that your original stress test case (the 15 November one)
would have shown significant bloat if you just kept it up for long
enough. One goal for this project is to make the performance
characteristics more predictable. The performance characteristics are
unpredictable precisely because they're pathological. B-Trees will
always have non-linear performance characteristics, but they can be
made a lot less chaotic in practice.

To be honest, I was slightly surprised that your more recent stress
test (the queue thing) worked out so well for the patch. But not too
surprised, because I don't necessarily expect to understand how the
patch can help for any given workload -- this is dynamic behavior
that's part of a complex system. I first thought that maybe the
presence of the INSERTs and DELETEs would mess things up. But I was
wrong -- everything still worked well, probably because bottom-up
index deletion "cooperated" with deduplication. The INSERTs could not
trigger a bottom-up deletion (which might have been useful for these
particular INSERTs), but that didn't actually matter because
deduplication was triggered instead, which "held the line" for long
enough for bottom-up 

Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-07 Thread Peter Geoghegan
On Thu, Jan 7, 2021 at 1:14 PM Stephen Frost  wrote:
> I expected there'd be some disagreement on this, but I do continue to
> feel that it's sensible to enable checksums by default.  I also don't
> think there's anything particularly wrong with such a difference of
> opinion, though it likely means that we're going to continue on with the
> status quo- where, certainly, very many deployments enable it even
> though the upstream default is to have it disabled.

I agree with all that.

> This certainly
> isn't the only place that's done, though we've been working to improve
> that situation with things like trying to get rid of 'trust' being used
> in our default pg_hba.conf.

That seems like an easier case to make to me.

> Short answer is 'yes', as mentioned down-thread and having checksums was
> a pre-requisite to deploying PG in RDS (or so folks very involved in RDS
> have told me previously- and I'll also note that it was 9.3 that was
> first deployed as part of RDS).  I don't think there's any question that
> they're using --data-checksums and that it is, in fact, the actual
> original PG checksum code (or at least was at 9.3, though I've further
> heard comments that they actively try to minimize the delta between RDS
> and PG).

I accept that.

> Nope, the risk from not having fsync was clearly understood, and still
> is, to be a larger risk than not having checksums.  That doesn't mean
> there's no risk to not having checksums or that we simply shouldn't
> consider checksums to be worthwhile or that we shouldn't have them on by
> default.  I outlined them together in that they're both there to address
> the risk that "something doesn't go right", but, as I said previously
> and again above, the level of risk between the two isn't the same.  That
> doesn't mean we shouldn't consider that checksums *do* address a risk
> and consider enabling them by default- even with the performance impact
> that they have today.

Fair.

> Much of this line of discussion seems to be, incorrectly, focused on my
> mere mention of viewing the use of fsync and checksums as mechanism for
> addressing certain risks, but that doesn't seem to be a terribly
> fruitful direction to be going in.  I'm not suggesting that we should go
> turn off fsync by default simply because we don't have checksums on by
> default, which seems to be the implication.

I admit that I saw red. This was a direct result of your bogus
argument, which greatly overstated the case in favor of enabling
checksums by default. I regret my role in that now, though. It would
be good to debate the actual issue, but that isn't what I saw.
Everyone knows the principles behind checksums and how they're useful
-- it doesn't need to be a part of the discussion.

I think that it should be possible to make a much better case in favor
of enabling checksums by default. On further reflection I actually
don't think that the real-world VACUUM overhead is anything like 15x,
though the details are complex. I might be willing to help with this
analysis, but since you only seem to want to discuss the question in a
narrow way (e.g. "I agree that improving compression performance would
be good but I don't see that as relevant to the question of what our
defaults should be"), I have to wonder if it's worth the trouble.

-- 
Peter Geoghegan




Re: [PATCH] Simple progress reporting for COPY command

2021-01-07 Thread Josef Šimánek
čt 7. 1. 2021 v 22:37 odesílatel Tomas Vondra
 napsal:
>
>
>
> On 1/7/21 7:56 PM, Josef Šimánek wrote:
> > čt 7. 1. 2021 v 19:51 odesílatel Matthias van de Meent
> >  napsal:
> >>
> >> On Wed, 6 Jan 2021 at 22:45, Tomas Vondra  
> >> wrote:
> >>>
> >>> On 1/5/21 11:02 AM, Josef Šimánek wrote:
>  I'm attaching the whole patch since commitfest failed to ingest the
>  last incremental on CI.
> 
> >>>
> >>> Yeah, the whole patch needs to be attached for the commitfest tester to
> >>> work correctly - it can't apply pieces from multiple messages, etc.
> >>>
> >>> Anyway, I pushed this last version of patch, after a couple more tweaks,
> >>> mainly to the docs - one place used pg_stat_copy_progress, the section
> >>> was not indexed properly, and so on.
> >>
> >> Thank you all, I'd love to use this in the future to keep track of
> >> (e.g.) my backup/restore progress.
> >>
> >> For my previously mentioned extension to keep track of filtered tuples
> >> that are excluded by the WHERE-clause, PFA patch 0001 that keeps track
> >> of that, in line with the current column name style of lines.
> >
> > If I understand it well, this column could be used on COPY TO to track
> > skipped lines because of BEFORE TRIGGER, right? I can include this in
> > my following patch keeping lines_processed incremented even for
> > skipped lines as well.
> >
>
> That's my understanding too.
>
> >> If so desired, I'll split this off into a different thread & CF entry.
> >>
> >>> I see Matthias proposed to change "lines" to "tuples" - I only saw the
> >>> message after pushing, but I probably wouldn't make that change anyway.
> >>> The CSV docs seem to talk about lines, newlines etc. so it seems fine.
> >>> If not, we can change that.
> >>
> >> The CSV docs, sure. But copy doesn't only process CSVs; it also has
> >> text (which does have a # lines = # tuples / rows guarantee) and
> >> binary (in which the 'line' vocabulary doesn't make sense, and in
> >> which the 'tuples' vocabulary is used). Additionally, most mentions of
> >> postgres' logical rows/tuples in the COPY documentation use the 'rows'
> >> terminology ('tuples' for FORMAT BINARY), and use 'line' only for
> >> external format's textual representation's strings delimited by
> >> newlines (which I believe is not exactly what we're counting).
> >>
> >> One common user of COPY is the pg_dump tool, and that uses binary
> >> format by default (iirc).
> >>
> >> Additionally, all comments surrounding the *LINES_PROCESSED updates
> >> only mention 'tuples', so I'd like to strongly suggest (a variant of)
> >> attached patch 0002 to keep the vocabulary consistent by using
> >> 'tuples' instead of 'lines'.
> >>
>
> I'm not particularly attached to the "lines" naming, it just seemed OK
> to me. So if there's consensus to rename this somehow, I'm OK with it.

The problem I do see here is that it depends on the direction of COPY. If
you're copying from a CSV file to a table, those are actually lines (since
1 line = 1 tuple). But copying from the DB to a file is copying tuples
(though 1 tuple = 1 file line). "Lines" works better here for me personally.

Once I fix the problem with triggers (and also other cases if found), I
think we can consider them lines. It will represent the number of lines
processed from the file on COPY FROM and the number of lines written to the
file on COPY TO (at least in CSV format). I'm not sure how BINARY
format works; I'll check.
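
For context, the just-pushed progress view can be watched from another
session roughly like this (column naming follows the "lines" terminology
being discussed, so it may still change):

  SELECT pid, relid::regclass AS relation, lines_processed
  FROM pg_stat_progress_copy;
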
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company




Re: Multiple hosts in connection string failed to failover in non-hot standby mode

2021-01-07 Thread Tom Lane
Hubert Zhang  writes:
> [ 0001-Enhance-libpq-to-support-multiple-host-for-non-hot-s.patch ]

I took a quick look at this.  TBH, I'd just drop the first three hunks,
as they've got nothing to do with any failure mode that there's evidence
for in this thread or the prior one, and I'm afraid they're more likely
to create trouble than fix it.

As for the last hunk, why is it after rather than before the SSL/GSS
checks?  I doubt that retrying with/without SSL is going to change
a CANNOT_CONNECT_NOW result, unless maybe by slowing things down to
the point where recovery has finished ;-)

The bigger picture though is

(1) what set of failures should we retry on?  I think CANNOT_CONNECT_NOW
is reasonable, but are there others?

(2) what does this do to the quality of the error messages in cases
where all the connection attempts fail?

I think that error message quality was not thought too much about
in the original development of the multi-host feature, so to some
extent I'm asking you to clean up someone else's mess.  Nonetheless,
I feel that we do need to clean it up before we do things that risk
making it even more confusing.

The problems that I see in this area are first that there's no
real standardization in libpq as to whether to append error messages
together or just flush preceding messages; and second that no effort
is made in multi-connection-attempt cases to separate the errors from
different attempts, much less identify which error goes with which
host or IP address.  I think we really need to put some work into
that.  In some cases you can infer what happened from breadcrumbs
we already put into the text, for example

$ psql -h localhost,/tmp -p 12345
psql: error: could not connect to server: Connection refused
Is the server running on host "localhost" (::1) and accepting
TCP/IP connections on port 12345?
could not connect to server: Connection refused
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 12345?
could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.12345"?

but this doesn't seem particularly helpfully laid out to me, and we don't
provide the breadcrumbs at all for a lot of other error cases.

I'm vaguely imagining that we could do something more like

could not connect to host "localhost" (::1), port 12345: Connection refused
could not connect to host "localhost" (127.0.0.1), port 12345: Connection 
refused
could not connect to socket "/tmp/.s.PGSQL.12345": No such file or directory

Not quite sure if the "Is the server running" hint is worth preserving.
We'd have to reword it quite a bit, and it'd be very duplicative.

The implementation of this might involve sticking the initial string
(up to the colon, in this example) into conn->errorMessage speculatively
as we try each host.  If we then append an error to it and go around
again, we're good.  If we successfully connect, then the contents of
conn->errorMessage don't matter.
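
Roughly, as a sketch of the idea only (not actual libpq code; host,
hostaddr and port stand in for fields of the current connhost entry):

  /* speculatively add the per-host prefix before attempting to connect */
  appendPQExpBuffer(&conn->errorMessage,
                    "could not connect to host \"%s\" (%s), port %s: ",
                    host, hostaddr, port);
  /*
   * On failure, append the specific error text and advance to the next
   * candidate, so prefixes and messages interleave; on success the
   * accumulated errorMessage is never displayed.
   */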

regards, tom lane




Re: [PATCH] Simple progress reporting for COPY command

2021-01-07 Thread Tomas Vondra




On 1/7/21 7:56 PM, Josef Šimánek wrote:

čt 7. 1. 2021 v 19:51 odesílatel Matthias van de Meent
 napsal:


On Wed, 6 Jan 2021 at 22:45, Tomas Vondra  wrote:


On 1/5/21 11:02 AM, Josef Šimánek wrote:

I'm attaching the whole patch since commitfest failed to ingest the
last incremental on CI.



Yeah, the whole patch needs to be attached for the commitfest tester to
work correctly - it can't apply pieces from multiple messages, etc.

Anyway, I pushed this last version of patch, after a couple more tweaks,
mainly to the docs - one place used pg_stat_copy_progress, the section
was not indexed properly, and so on.


Thank you all, I'd love to use this in the future to keep track of
(e.g.) my backup/restore progress.

For my previously mentioned extension to keep track of filtered tuples
that are excluded by the WHERE-clause, PFA patch 0001 that keeps track
of that, in line with the current column name style of lines.


If I understand it well, this column could be used on COPY TO to track
skipped lines because of BEFORE TRIGGER, right? I can include this in
my following patch keeping lines_processed incremented even for
skipped lines as well.



That's my understanding too.


If so desired, I'll split this off into a different thread & CF entry.


I see Matthias proposed to change "lines" to "tuples" - I only saw the
message after pushing, but I probably wouldn't make that change anyway.
The CSV docs seem to talk about lines, newlines etc. so it seems fine.
If not, we can change that.


The CSV docs, sure. But copy doesn't only process CSVs; it also has
text (which does have a # lines = # tuples / rows guarantee) and
binary (in which the 'line' vocabulary doesn't make sense, and in
which the 'tuples' vocabulary is used). Additionally, most mentions of
postgres' logical rows/tuples in the COPY documentation use the 'rows'
terminology ('tuples' for FORMAT BINARY), and use 'line' only for
external format's textual representation's strings delimited by
newlines (which I believe is not exactly what we're counting).

One common user of COPY is the pg_dump tool, and that uses binary
format by default (iirc).

Additionally, all comments surrounding the *LINES_PROCESSED updates
only mention 'tuples', so I'd like to strongly suggest (a variant of)
attached patch 0002 to keep the vocabulary consistent by using
'tuples' instead of 'lines'.



I'm not particularly attached to the "lines" naming, it just seemed OK 
to me. So if there's consensus to rename this somehow, I'm OK with it.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-07 Thread Stephen Frost
Greetings,

* Michael Banck (michael.ba...@credativ.de) wrote:
> Am Mittwoch, den 06.01.2021, 13:08 -0800 schrieb Peter Geoghegan:
> > On Wed, Jan 6, 2021 at 1:04 PM Michael Banck  
> > wrote:
> > > At least data_checksums=on for Azure Managed Postgres, Amazon RDS and
> > > Google Cloud SQL Postgres. It might not be on for all types (I just
> > > checked the basic one each earlier today), but I guess it is.
> > 
> > So you tested this using "show data_checksums;" in psql in each case?
> 
> Yes.
> 
> > What does "show full_page_writes;" show you?
> 
> It's also 'on' for all three (wal_log_hints is 'off' everywhere).

Well, given that data_checksums is 'on', then wal_log_hints is
implicitly 'on', of course.

Given the mention that maybe they've gone and modified PG underneath, it
might be fun to run your same tests against those platforms to see if
the amount of WAL generated is in line with what you're seeing for stock
PG with the same settings, assuming you can get to that level of
detailed information anyway (pretty sure some of them provide that, but
not sure if all of them do).

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-07 Thread Stephen Frost
Greetings,

* Peter Geoghegan (p...@bowt.ie) wrote:
> On Wed, Jan 6, 2021 at 12:30 PM Stephen Frost  wrote:
> > As already mentioned, it's also, at least today, far
> > simpler to disable checksums than to enable them, which is something
> > else to consider when thinking about what the default should be.
> 
> That is a valid concern. I just don't think that it's good enough on
> its own, given the overwhelming downside of enabling checksums given
> the WAL architecture that we have today.

I expected there'd be some disagreement on this, but I do continue to
feel that it's sensible to enable checksums by default.  I also don't
think there's anything particularly wrong with such a difference of
opinion, though it likely means that we're going to continue on with the
status quo- where, certainly, very many deployments enable it even
though the upstream default is to have it disabled.  This certainly
isn't the only place that's done, though we've been working to improve
that situation with things like trying to get rid of 'trust' being used
in our default pg_hba.conf.

> > That the major cloud providers all have checksums enabled (at least by
> > default, though I wonder if they would even let you turn them off..),
> > even when we don't have them on by default, strikes me as pretty telling
> > that this is something that we should have on by default.
> 
> Please provide supporting evidence. I know that EBS itself uses
> checksums at the block device level, so I'm sure that RDS "uses
> checksums" in some sense. But does RDS use --data-checksums during
> initdb?

Short answer is 'yes', as mentioned down-thread and having checksums was
a pre-requisite to deploying PG in RDS (or so folks very involved in RDS
have told me previously- and I'll also note that it was 9.3 that was
first deployed as part of RDS).  I don't think there's any question that
they're using --data-checksums and that it is, in fact, the actual
original PG checksum code (or at least was at 9.3, though I've further
heard comments that they actively try to minimize the delta between RDS
and PG).

> > Certainly there's a different risk profile between the two and there may
> > be times when someone is fine with running without fsync, or fine
> > running without checksums, but those are, in my view, exceptions made
> > once you understand exactly what risk you're willing to accept, and not
> > what the default or typical deployment should be.
> 
> Okay, I'll bite. Here is the important difference: Enabling checksums
> doesn't actually make data corruption less likely, it just makes it
> easier to detect. Whereas disabling fsync will reliably produce
> corruption before too long in almost any installation. It may
> occasionally be appropriate to disable fsync in a very controlled
> environment, but it's rare, and not much faster than disabling
> synchronous commits in any case. It barely ever happens.

I agree that it doesn't happen very often.  I'd say that it's also very
infrequent for users who are aware that data checksums are available,
and not enabled by default, to deploy non-checksummed systems.

> We added page-level checksums in 9.3. Can you imagine a counterfactual
> history in which Postgres had page checksums since the 1990s, but only
> added the fsync feature in 9.3? Please answer this non-rhetorical
> question.

Nope, the risk from not having fsync was clearly understood, and still
is, to be a larger risk than not having checksums.  That doesn't mean
there's no risk to not having checksums or that we simply shouldn't
consider checksums to be worthwhile or that we shouldn't have them on by
default.  I outlined them together in that they're both there to address
the risk that "something doesn't go right", but, as I said previously
and again above, the level of risk between the two isn't the same.  That
doesn't mean we shouldn't consider that checksums *do* address a risk
and consider enabling them by default- even with the performance impact
that they have today.

Much of this line of discussion seems to be, incorrectly, focused on my
mere mention of viewing the use of fsync and checksums as mechanism for
addressing certain risks, but that doesn't seem to be a terribly
fruitful direction to be going in.  I'm not suggesting that we should go
turn off fsync by default simply because we don't have checksums on by
default, which seems to be the implication.  I do think that fsync
addresses a large amount of the risks we face (random system reboots,
storage being disconnected from the server, etc), and I feel that
checksums address certain risks (latent bit flips at various levels,
from the physical medium through whatever path is taken to get from the
physical medium to the kernel and then to PG, random blocks being
swapped from other applications, people deciding to gzip their data
directory which I saw last week, etc...) and that all of those risks
amount to sufficient justification that they both be enabled by default,
but allowed to 

Re: Why latestRemovedXid|cuteoff_xid are always sent?

2021-01-07 Thread Peter Geoghegan
On Sat, Jan 2, 2021 at 3:22 PM Peter Geoghegan  wrote:
> Of course, it's possible that the question of whether or not it's
> worth it has been misjudged for any given case. And maybe these
> particular WAL records are one such case where somebody got it wrong,
> affecting a real workload (I am ignoring the complexity of making it
> work for latestRemovedXid in particular for now).

BTW, what I notice with xl_btree_delete records on the master branch
is that the latestRemovedXid value in the WAL record is almost always
InvalidTransactionId ("conflict definitely unnecessary"). And even
when it isn't, the actual xid is usually much older than what we see
for nearby pruning records.

However, with the bottom-up deletion patch that I plan on committing
soon, the situation changes quite a bit. We're now regularly in a
position to delete index tuples that became dead-to-all just moments
earlier, which in practice means that there is a very high chance that
there hasn't been a heap prune for at least one or two affected heap
tuples. Now the latestRemovedXid field in xl_btree_delete can be a
relatively recent XID, which is very similar to what we see in nearby
xl_heap_clean/XLOG_HEAP2_CLEAN records. In fact there are *hardly any*
InvalidTransactionId/0 values for the xl_btree_delete latestRemovedXid
field. They become very rare, having been very common.

In short: Your intuition that the xl_btree_delete record's
latestRemovedXid value is usually not needed anyway seems correct to
me. However, that won't be true for much longer, and ISTM that this
factor eliminates any opportunity for WAL space optimizations of the
kind you were contemplating.

-- 
Peter Geoghegan




Re: Cirrus CI (Windows help wanted)

2021-01-07 Thread Andrew Dunstan


On 1/5/21 5:48 PM, I wrote:
> However, sadly the vctools package above isn't installed with all its
> optional packages, so some crucial things are missing. I cured that by
> forcing a reinstall with the optional components enabled. Sadly, that
> takes a huge amount of time, over 20 minutes. We either need to find an
> image where this isn't necessary or find out how to make one to use,
> assuming that's possible. Or maybe we can ask cirrus to modify their
> buildscripts just a tad to include the required parameter.


For some unfathomable reason they actually removed this less than a
month ago:



I have identified the specific missing component as
Microsoft.VisualStudio.Component.VC.CLI.Support - I will submit a PR to
add it back in.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Reloptions for table access methods

2021-01-07 Thread Simon Riggs
On Sat, Jan 2, 2021 at 6:44 PM Jeff Davis  wrote:
>
> On Wed, 2020-12-30 at 21:23 +, Simon Riggs wrote:
> > But you cannot seriously argue that a one line patch that changes
> > user
> > visible behavior can be acceptable in Postgres core without tests or
> > docs or code comments.
>
> Hi Simon,
>
> Often, good documented APIs come about after a few extensions gain
> experience hacking things together using undocumented assumptions and
> internal APIs. But in this case, extension authors have no opportunity
> to hack in reloptions for a TableAM, because of this needless extra
> check that my patch would eliminate.
>
> The patch does not have any user-visible impact. To have any effect, an
> extension would need to use these internal APIs in a specific way that
> is not documented externally. I see my tiny patch as a tiny improvement
> to the backend code because it eliminates a useless extra check, and
> therefore doesn't need much justification. If others see it as a burden
> on the engine code, that's a different story.
>
> If we insist that this must be a fully-supported feature or nothing at
> all, then I'll withdraw the patch. But I doubt that will result in a
> proper feature for v14, it just means that when we do get around to
> supporting extensible reloptions, there will be no hacks in the wild to
> learn from.

Thanks for the reply. I'm trying to get my head around this before a
longer reply.

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: Key management with tests

2021-01-07 Thread Alvaro Herrera
On 2021-Jan-07, Bruce Momjian wrote:

> All the tests pass now.  The current src/test directory is 19MB, and
> adding these tests takes it to 23MB, or a 20% increase.  That seems like
> a lot.  It is testing 128-bit and 256-bit keys --- should we do fewer
> tests, or just test 256, or use gzip to compress the tests by 50%? 
> (Does every platform have gzip?)

So the tests are about 95% of the patch ... do we really need that many
tests?

-- 
Álvaro Herrera




Re: [PATCH] Simple progress reporting for COPY command

2021-01-07 Thread Josef Šimánek
čt 7. 1. 2021 v 19:51 odesílatel Matthias van de Meent
 napsal:
>
> On Wed, 6 Jan 2021 at 22:45, Tomas Vondra  
> wrote:
> >
> > On 1/5/21 11:02 AM, Josef Šimánek wrote:
> > > I'm attaching the whole patch since commitfest failed to ingest the
> > > last incremental on CI.
> > >
> >
> > Yeah, the whole patch needs to be attached for the commitfest tester to
> > work correctly - it can't apply pieces from multiple messages, etc.
> >
> > Anyway, I pushed this last version of patch, after a couple more tweaks,
> > mainly to the docs - one place used pg_stat_copy_progress, the section
> > was not indexed properly, and so on.
>
> Thank you all, I'd love to use this in the future to keep track of
> (e.g.) my backup/restore progress.
>
> For my previously mentioned extension to keep track of filtered tuples
> that are excluded by the WHERE-clause, PFA patch 0001 that keeps track
> of that, in line with the current column name style of lines.

If I understand it correctly, this column could be used on COPY TO to
track lines skipped because of a BEFORE trigger, right? I can include
this in my following patch, which keeps lines_processed incremented
for skipped lines as well.
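
For what it's worth, a minimal sketch of how these counters could be watched
from another session; lines_filtered here assumes your proposed 0001 patch is
applied on top of the committed view:

    SELECT relid::regclass AS relation,
           bytes_processed, bytes_total,
           lines_processed, lines_filtered
    FROM pg_stat_progress_copy;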

> If so desired, I'll split this off into a different thread & CF entry.
>
> > I see Matthias proposed to change "lines" to "tuples" - I only saw the
> > message after pushing, but I probably wouldn't make that change anyway.
> > The CSV docs seem to talk about lines, newlines etc. so it seems fine.
> > If not, we can change that.
>
> The CSV docs, sure. But copy doesn't only process CSVs; it also has
> text (which does have a # lines = # tuples / rows guarantee) and
> binary (in which the 'line' vocabulary doesn't make sense, and in
> which the 'tuples' vocabulary is used). Additionally, most mentions of
> postgres' logical rows/tuples in the COPY documentation use the 'rows'
> terminology ('tuples' for FORMAT BINARY), and use 'line' only for
> external format's textual representation's strings delimited by
> newlines (which I believe is not exactly what we're counting).
>
> One common user of COPY is the pg_dump tool, and that uses binary
> format by default (iirc).
>
> Additionally, all comments surrounding the *LINES_PROCESSED updates
> only mention 'tuples', so I'd like to strongly suggest (a variant of)
> attached patch 0002 to keep the vocabulary consistent by using
> 'tuples' instead of 'lines'.
>
>
> With regards,
>
> Matthias van de Meent




Re: [PATCH] Simple progress reporting for COPY command

2021-01-07 Thread Matthias van de Meent
On Wed, 6 Jan 2021 at 22:45, Tomas Vondra  wrote:
>
> On 1/5/21 11:02 AM, Josef Šimánek wrote:
> > I'm attaching the whole patch since commitfest failed to ingest the
> > last incremental on CI.
> >
>
> Yeah, the whole patch needs to be attached for the commitfest tester to
> work correctly - it can't apply pieces from multiple messages, etc.
>
> Anyway, I pushed this last version of patch, after a couple more tweaks,
> mainly to the docs - one place used pg_stat_copy_progress, the section
> was not indexed properly, and so on.

Thank you all, I'd love to use this in the future to keep track of
(e.g.) my backup/restore progress.

For my previously mentioned extension to keep track of filtered tuples
that are excluded by the WHERE-clause, PFA patch 0001 that keeps track
of that, in line with the current column name style of lines.

If so desired, I'll split this off into a different thread & CF entry.

> I see Matthias proposed to change "lines" to "tuples" - I only saw the
> message after pushing, but I probably wouldn't make that change anyway.
> The CSV docs seem to talk about lines, newlines etc. so it seems fine.
> If not, we can change that.

The CSV docs, sure. But copy doesn't only process CSVs; it also has
text (which does have a # lines = # tuples / rows guarantee) and
binary (in which the 'line' vocabulary doesn't make sense, and in
which the 'tuples' vocabulary is used). Additionally, most mentions of
postgres' logical rows/tuples in the COPY documentation use the 'rows'
terminology ('tuples' for FORMAT BINARY), and use 'line' only for
external format's textual representation's strings delimited by
newlines (which I believe is not exactly what we're counting).

One common user of COPY is the pg_dump tool, and that uses binary
format by default (iirc).

Additionally, all comments surrounding the *LINES_PROCESSED updates
only mention 'tuples', so I'd like to strongly suggest (a variant of)
attached patch 0002 to keep the vocabulary consistent by using
'tuples' instead of 'lines'.


With regards,

Matthias van de Meent
From d988c86d60d2b49b52e547b42c644493a377d44a Mon Sep 17 00:00:00 2001
From: Matthias van de Meent 
Date: Thu, 7 Jan 2021 16:39:57 +0100
Subject: [PATCH v1 1/2] Add progress reporting for filtered rows.

COPY ... FROM ... WHERE (condition) may drop rows, which we now register as
explicitly being dropped.
---
 doc/src/sgml/monitoring.sgml | 10 ++
 src/backend/catalog/system_views.sql |  3 ++-
 src/backend/commands/copyfrom.c  |  5 +
 src/include/commands/progress.h  |  1 +
 4 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 43fe8ae383..bf18ea23bd 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6498,6 +6498,16 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
Number of lines already processed by COPY command.
   
  
+
+ 
+  
+   lines_filtered bigint
+  
+  
+   Number of lines not processed due to being filtered by the
+   WHERE clause of the COPY command.
+  
+ 
 

   
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5d89e77dbe..83d62c5286 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1124,7 +1124,8 @@ CREATE VIEW pg_stat_progress_copy AS
 S.relid AS relid,
 S.param1 AS bytes_processed,
 S.param2 AS bytes_total,
-S.param3 AS lines_processed
+S.param3 AS lines_processed,
+S.param4 AS lines_filtered
 FROM pg_stat_get_progress_info('COPY') AS S
 LEFT JOIN pg_database D ON S.datid = D.oid;
 
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 08b6f782c7..212f4c67d6 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -540,6 +540,7 @@ CopyFrom(CopyFromState cstate)
 	CopyInsertMethod insertMethod;
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	uint64		processed = 0;
+	uint64		filtered = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
 	bool		leafpart_use_multi_insert = false;
@@ -868,7 +869,11 @@ CopyFrom(CopyFromState cstate)
 			econtext->ecxt_scantuple = myslot;
 			/* Skip items that don't match COPY's WHERE clause */
 			if (!ExecQual(cstate->qualexpr, econtext))
+			{
+/* Report that this tuple was filtered out by the WHERE clause */
+pgstat_progress_update_param(PROGRESS_COPY_LINES_FILTERED, ++filtered);
 continue;
+			}
 		}
 
 		/* Determine the partition to insert the tuple into */
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 95ec5d02e9..eb0f4e70d6 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -137,5 +137,6 @@
 #define PROGRESS_COPY_BYTES_PROCESSED 0
 #define PROGRESS_COPY_BYTES_TOTAL 1
 #define 

Re: new heapcheck contrib module

2021-01-07 Thread Mark Dilger



> On Nov 19, 2020, at 11:47 AM, Peter Geoghegan  wrote:
> 
> On Thu, Nov 19, 2020 at 9:06 AM Robert Haas  wrote:
>> I'm also not sure if these descriptions are clear enough, but it may
>> also be hard to do a good job in a brief space. Still, comparing this
>> to the documentation of heapallindexed makes me rather nervous. This
>> is only trying to verify that the index contains all the tuples in the
>> heap, not that the values in the heap and index tuples actually match.
> 
> That's a good point. As things stand, heapallindexed verification does
> not notice when there are extra index tuples in the index that are in
> some way inconsistent with the heap. Hopefully this isn't too much of
> a problem in practice because the presence of extra spurious tuples
> gets detected by the index structure verification process. But in
> general that might not happen.
> 
> Ideally heapallindex verification would verify 1:1 correspondence. It
> doesn't do that right now, but it could.
> 
> This could work by having two bloom filters -- one for the heap,
> another for the index. The implementation would look for the absence
> of index tuples that should be in the index initially, just like
> today. But at the end it would modify the index bloom filter by &= it
> with the complement of the heap bloom filter. If any bits are left set
> in the index bloom filter, we go back through the index once more and
> locate index tuples that have at least some matching bits in the index
> bloom filter (we cannot expect all of the bits from each of the hash
> functions used by the bloom filter to still be matches).
> 
> From here we can do some kind of lookup for maybe-not-matching index
> tuples that we locate. Make sure that they point to an LP_DEAD line
> item in the heap or something. Make sure that they have the same
> values as the heap tuple if they're still retrievable (i.e. if we
> haven't pruned the heap tuple away already).

This approach sounds very good to me, but beyond the scope of what I'm planning 
for this release cycle.

>> This to me seems too conservative. The result is that by default we
>> check only tables, not indexes. I don't think that's going to be what
>> users want. I don't know whether they want the heapallindexed or
>> rootdescend behaviors for index checks, but I think they want their
>> indexes checked. Happy to hear opinions from actual users on what they
>> want; this is just me guessing that you've guessed wrong. :-)
> 
> My thoughts on these two options:
> 
> * I don't think that users will ever want rootdescend verification.
> 
> That option exists now because I wanted to have something that relied
> on the uniqueness property of B-Tree indexes following the Postgres 12
> work. I didn't add retail index tuple deletion, so it seemed like a
> good idea to have something that makes the same assumptions that it
> would have to make. To validate the design.
> 
> Another factor is that Alexander Korotkov made the basic
> bt_index_parent_check() tests a lot better for Postgres 13. This
> undermined the practical argument for using rootdescend verification.

The latest version of the patch has rootdescend off by default, but a switch to 
turn it on.  The documentation for that switch in doc/src/sgml/pgamcheck.sgml 
summarizes your comments:

+   This form of verification was originally written to help in the
+   development of btree index features.  It may be of limited or even of no
+   use in helping detect the kinds of corruption that occur in practice.
+   In any event, it is known to be a rather expensive check to perform.

For my own self, I don't care if rootdescend is an option in pg_amcheck.  You 
and Robert expressed somewhat different opinions, and I tried to split the 
difference.  I'm happy to go a different direction if that's what the consensus 
is.

> Finally, note that bt_index_parent_check() was always supposed to be
> something that was to be used only when you already knew that you had
> big problems, and wanted absolutely thorough verification without
> regard for the costs. This isn't the common case at all. It would be
> reasonable to not expose anything from bt_index_parent_check() at all,
> or to give it much less prominence. Not really sure of what the right
> balance is here myself, so I'm not insisting on anything. Just telling
> you what I know about it.

This still needs work.  Currently, there is a switch to turn off index 
checking, with the checks on by default.  But there is no switch controlling 
which kind of check is performed (bt_index_check vs. bt_index_parent_check).  
Making matters more complicated, selecting both rootdescend and bt_index_check 
wouldn't make sense, as there is no rootdescend option on that function.  So 
users would need multiple flags to turn on various options, with some flag 
combinations drawing an error about the flags not being mutually compatible.  
That's doable, but people may not like that interface.

> * 

Re: WIP: System Versioned Temporal Table

2021-01-07 Thread Simon Riggs
On Mon, Jan 4, 2021 at 2:24 PM Masahiko Sawada  wrote:
> Please also note the v7 patch cannot be applied to the current HEAD. I'm 
> switching the patch as Waiting on Author.

Surafel, please say whether you are working on this or not. If you
need help, let us know.

On Tue, 7 Jan 2020 at 10:33, Kyotaro Horiguchi  wrote:
> I think that the start and end timestamps represent the period during
> which that version of the row was active. So UPDATE should set the
> start timestamp of the new version and the end timestamp of the old
> version to the same value, the update time. Thus, I don't think adding
> the start timestamp to the PK works as expected: it hinders us from
> rejecting rows with the same user-defined unique key, because the start
> timestamp is different for each insert. I think what Surafel is doing
> in the current patch is correct. Only end_timestamp = +infinity rejects
> another non-historical (= live) version with the same user-defined
> unique key.

The end_time needs to be updated when a row is updated, so it cannot
form part of the PK. If you try to force that to happen, then logical
replication will not work with system versioned tables, which would be
a bad thing. So *only* start_time should be added to the PK to make
this work. (A later comment also says the start_time needs to be
updated, which makes no sense!)

On Wed, 23 Oct 2019 at 21:03, Vik Fearing  wrote:
> I don't see any error handling for transaction anomalies.  In READ
> COMMITTED, you can easily end up with a case where the end time comes
> before the start time.  I don't even see anything constraining start
> time to be strictly inferior to the end time.  Such a constraint will be
> necessary for application-time periods (which your patch doesn't address
> at all but that's okay).

I don't see how it can have meaning to have an end_time earlier than a
start_time, so yes that should be checked. Having said that, if we use
a statement timestamp on row insertion then, yes, the end_time could
be earlier than start time, so that is just wrong. Ideally we would
use commit timestamp and fill the values in later. So the only thing
that makes sense for me is to use the dynamic time of insertion while
we hold the buffer lock, otherwise we will get anomalies.
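
For orientation, the SQL:2011 shape of a system-versioned table is roughly the
following (this is the standard's syntax, not necessarily the exact syntax of
the patch under review):

    CREATE TABLE accounts (
        id         integer PRIMARY KEY,
        balance    numeric,
        start_time timestamp GENERATED ALWAYS AS ROW START,
        end_time   timestamp GENERATED ALWAYS AS ROW END,
        PERIOD FOR SYSTEM_TIME (start_time, end_time)
    ) WITH SYSTEM VERSIONING;

The questions above are about which, if either, of start_time and end_time
gets folded into the primary key the user declares, and which timestamp source
keeps end_time from ever preceding start_time.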

The work looks interesting and I will be doing a longer review.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-01-07 Thread Julien Rouhaud
On Sun, Oct 18, 2020 at 4:12 PM Julien Rouhaud  wrote:
>
> On Sun, Oct 18, 2020 at 12:20 PM Tom Lane  wrote:
> >
> > Alvaro Herrera  writes:
> > > Wait ... what?  I've been thinking that this GUC is just to enable or
> > > disable the computation of query ID, not to change the algorithm to do
> > > so.  Do we really need to allow different algorithms in different
> > > sessions?
> >
> > We established that some time ago, no?
>
> I thought we established the need for allowing different algorithms,
> but I assumed globally, not per session.  Anyway, allowing enabling or
> disabling compute_queryid per session would technically allow that,
> assuming that you have another module loaded that computes a queryid
> only if none was already computed.  In that case pg_stat_statements
> works as you would expect: you will get a new entry, with a duplicated
> query text.
>
> With a bit more thinking, there's at least one use case where it's
> interesting to disable pg_stat_statements: queries using temporary
> tables.  In that case you're guaranteed to generate an endless stream
> of different queryids.  That doesn't really help since you're not
> aggregating anything anymore, and it also makes pg_stat_statements
> virtually unusable, as once you have a workload that needs frequent
> eviction, the overhead is so bad that you basically have to disable
> pg_stat_statements.  We could alternatively add a GUC to disable
> queryid computation when one of the tables is a temporary table, but
> that's yet one among many considerations that are probably best
> answered with a custom implementation.
>
> I'm also attaching an updated patch with an attempt to improve the
> documentation.  I mention that the in-core algorithm may not suit
> everyone's needs, but we don't actually document what the heuristics are.
> Should we give more details on them and on their most direct
> consequences?

v15 that fixes recent conflicts.
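
As a quick illustration of what patch 3/3 adds (a sketch, assuming
compute_queryid is enabled and the patch below is applied):

    SET compute_queryid = on;
    EXPLAIN (VERBOSE, COSTS OFF) SELECT 1;
    -- the verbose output should then include a line of the form
    --   Query Identifier: <computed 64-bit value>
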
From 952796fa1c65000948ed2a267f76676e354c989c Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Sun, 8 Mar 2020 14:34:44 +0100
Subject: [PATCH v15 3/3] Expose query identifier in verbose explain

If a query identifier has been computed, either by enabling compute_queryid or
using a third-party module, verbose explain will display it.

Author: Julien Rouhaud
Reviewed-by:
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=d4l...@mail.gmail.com
---
 doc/src/sgml/config.sgml  | 14 +++---
 doc/src/sgml/ref/explain.sgml |  6 --
 src/backend/commands/explain.c| 18 ++
 src/test/regress/expected/explain.out |  9 +
 src/test/regress/sql/explain.sql  |  3 +++
 5 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d9c85a1f80..1d2f7ba393 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7487,13 +7487,13 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
 Enables or disables in core query identifier computation.  A query
 identifier can be displayed in the pg_stat_activity
-view, or emitted in the log if configured via the  parameter.  The  extension also requires a query identifier
-to be computed.  Note that an external module can alternatively be used
-if the in core query identifier computation specification doesn't suit
-your need.  In this case, in core computation must be disabled.  The
-default is off.
+view, using EXPLAIN, or emitted in the log if
+configured via the  parameter.
+The  extension also requires a query
+identifier to be computed.  Note that an external module can
+alternatively be used if the in core query identifier computation
+specification doesn't suit your need.  In this case, in core
+computation must be disabled.  The default is off.

   
  
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index c4512332a0..105b069b41 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -136,8 +136,10 @@ ROLLBACK;
   the output column list for each node in the plan tree, schema-qualify
   table and function names, always label variables in expressions with
   their range table alias, and always print the name of each trigger for
-  which statistics are displayed.  This parameter defaults to
-  FALSE.
+  which statistics are displayed.  The query identifier will also be
+  displayed if one has been computed, see  for more details.  This parameter
+  defaults to FALSE.
  
 

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5d7eb3574c..2e1b4bf0bf 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -24,6 +24,7 @@
 #include "nodes/extensible.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "parser/analyze.h"
 

Re: Cache relation sizes?

2021-01-07 Thread 陈佳昕(步真)
Hi, Thomas

Are there other scenarios similar to dropdb for the smgr cache?
1. Drop tablespace on the master: should an smgrdroptbs function be added to
the DropTableSpace function?
2. Drop database on the slave: smgrdropdb in dbase_redo.
3. Drop tablespace on the slave: smgrdroptbs in tblspc_redo.

Buzhen


 -- Original message --
From: Thomas Munro 
Sent: Fri Jan 8 00:56:17 2021
To: 陈佳昕(步真) 
Cc: Amit Kapila , Konstantin Knizhnik 
, PostgreSQL Hackers 

Subject: Re: Cache relation sizes?
On Wed, Dec 30, 2020 at 4:13 AM 陈佳昕(步真)  wrote:
> I found some other problems; I want to share my changes with you for
> confirmation.
> <1> I changed the code in smgr_alloc_sr to avoid dead lock.
>
>   LWLockAcquire(mapping_lock, LW_EXCLUSIVE);
>   flags = smgr_lock_sr(sr);
>   Assert(flags & SR_SYNCING(forknum));
> + flags &= ~SR_SYNCING(forknum);
>   if (flags & SR_JUST_DIRTIED(forknum))
>   {
>       /*
>        * Someone else dirtied it while we were syncing, so we can't mark
>        * it clean.  Let's give up on this SR and go around again.
>        */
>       smgr_unlock_sr(sr, flags);
>       LWLockRelease(mapping_lock);
>       goto retry;
>   }
>   /* This fork is clean! */
> - flags &= ~SR_SYNCING(forknum);
>   flags &= ~SR_DIRTY(forknum);
> }
>
> In smgr_drop_sr, if the sr is SR_SYNCING, it will retry until the sr is not
> SR_SYNCING.  But in smgr_alloc_sr, if the sr is SR_JUST_DIRTIED, it will
> retry to get another sr; although it has been synced by smgrimmedsync, the
> SR_SYNCING flag is never cleared. This might cause a deadlock, so I changed
> your patch as above.

Right.  Thanks!

I also added smgrdropdb() to handle DROP DATABASE (the problem you
reported in your previous email).

While tweaking that code, I fixed it so that it uses a condition
variable to wait (instead of the silly sleep loop) when it needs to
drop an SR that is being sync'd.  Also, it now releases the mapping
lock while it's doing that, and reacquires it on retry.

> <2> I changed the code in smgr_drop_sr to avoid some corner cases
> /* Mark it invalid and drop the mapping. */
> smgr_unlock_sr(sr, ~SR_VALID);
> + for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
> +     sr->nblocks[forknum] = InvalidBlockNumber;
> hash_search_with_hash_value(sr_mapping_table,
>                             &sr->smgr_rnode,
>                             hash,
>                             HASH_REMOVE,
>                             NULL);
>
> smgr_drop_sr just removes the hash entry from sr_mapping_table, but doesn't
> clear the entry in sr_pool. But smgrnblocks_fast just gets the sr from
> sr_pool, so I added some code as above to prevent corner cases from getting
> an unexpected result from smgrnblocks_fast. Is it necessary? I'd also like
> your advice on this.

Hmm.  I think it might be better to increment sr->generation.  That
was already done in the "eviction" code path, where other processes
might still have references to the SR object, and I don't think it's
possible for anyone to access a dropped relation, but I suppose it's
more consistent to do that anyway.  Fixed.

Thanks for the review!


Re: plpgsql variable assignment with union is broken

2021-01-07 Thread Pavel Stehule
On Thu, Jan 7, 2021 at 17:29, Merlin Moncure  wrote:

> On Wed, Jan 6, 2021 at 11:07 PM Pavel Stehule 
> wrote:
>
> > On Thu, Jan 7, 2021 at 4:20, Merlin Moncure 
> wrote:
> >>
> >> On Tue, Jan 5, 2021 at 3:40 PM Tom Lane  wrote:
> >> >
> >> > easter...@verfriemelt.org writes:
> >> > > i found, that the behaviour of variable assignment in combination
> with union is not working anymore:
> >> > >   DO $$
> >> > >   DECLARE t bool;
> >> > >   begin
> >> > >   t := a FROM ( SELECT true WHERE false ) t(a) UNION SELECT
> true AS a;
> >> > >   END $$;
> >> >
> >> > > is this an intended change or is it a bug?
> >> >
> >> > It's an intended change, or at least I considered the case and thought
> >> > that it was useless because assignment will reject any result with
> more
> >> > than one row.  Do you have any non-toy example that wouldn't be as
> >> > clear or clearer without using UNION?  The above sure seems like an
> >> > example of awful SQL code.
> >>
> >> What is the definition of broken here?  What is the behavior of the
> >> query with the change and why?
> >>
> >> OP's query provably returns a single row and ought to always assign
> >> true as written.  A real world example might evaluate multiple
> >> condition branches so that the assignment resolves true if any branch
> >> is true. It could be rewritten with 'OR' of course.
> >>
> >> Is this also "broken"?
> >>   t := a FROM ( SELECT 'something' WHERE _Flag) t(a) UNION SELECT
> >> 'something else' AS a WHERE NOT _Flag;
> >>
> >> What about this?
> >> SELECT INTO t true WHERE false
> >> UNION select true;
> >
> >
> > ANSI SQL allows only SELECT INTO or var := SQL expression and SQL
> expression can be (subquery) too
>
> This is PLPGSQL not ansi SQL so that's irrelevant. If queries along
> the lines of:
> var := FROM (SELECT ..) UNION ..
>
> are narrowly broken, ok, but is that all that's going on here?  I
> guess I ought to test.
>
> I have a 300k line pl/pgsql project, and this thread is terrifying me.  I
> am going to be blunt here and say I am not comfortable with tightening
> pl/pgsql syntax without an impact assessment.  The point that this is
> undocumented behavior is weak, and it's already turning up problem
> reports.  IMO, the expectation has been clearly set that
> var := expression;
>
> is more or less interchangeable with
> SELECT INTO var expression;
>
> Again, if this is narrowly confined to assignment into set query
> operations, maybe this is not so bad. But is it?


 PLpgSQL_Expr: opt_target_list
            from_clause where_clause
            group_clause having_clause window_clause
            opt_sort_clause opt_select_limit opt_for_locking_clause
                {

So SELECT INTO and assignment are not fully compatible now.
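
A minimal sketch of a workaround (my own suggestion, not something settled in
this thread): push the set operation into a scalar subquery, which is still an
ordinary expression under the new assignment grammar:

    DO $$
    DECLARE t bool;
    BEGIN
        t := (SELECT a
              FROM (SELECT true WHERE false) s(a)
              UNION
              SELECT true);
    END $$;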

Regards

Pavel




> merlin
>


Re: [PATCH] Simple progress reporting for COPY command

2021-01-07 Thread Tomas Vondra

On 1/7/21 2:17 AM, Justin Pryzby wrote:

On Wed, Jan 06, 2021 at 10:44:49PM +0100, Tomas Vondra wrote:

On 1/5/21 11:02 AM, Josef Šimánek wrote:

I'm attaching the whole patch since commitfest failed to ingest the
last incremental on CI.



Yeah, the whole patch needs to be attached for the commitfest tester to work
correctly - it can't apply pieces from multiple messages, etc.

Anyway, I pushed this last version of patch, after a couple more tweaks,
mainly to the docs - one place used pg_stat_copy_progress, the section was
not indexed properly, and so on.


Some more doc fixes.


Thanks, pushed.

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Added schema level support for publication.

2021-01-07 Thread vignesh C
Hi,

This feature adds a schema option while creating a publication. Users will
be able to specify one or more schemas while creating a publication;
when the user specifies the schema option, the data changes for the
tables present in the specified schemas will be replicated
to the subscriber. A few examples are listed below:

Create a publication that publishes all changes for all the tables
present in production schema:
CREATE PUBLICATION production_publication FOR ALL TABLES SCHEMA production;

Create a publication that publishes all changes for all the tables
present in marketing and sales schemas:
CREATE PUBLICATION sales_publication FOR ALL TABLES SCHEMA marketing, sales;

Add some schemas to the publication:
ALTER PUBLICATION sales_publication ADD SCHEMA marketing_june, sales_june;

Drop some schema from the publication:
ALTER PUBLICATION production_quarterly_publication DROP SCHEMA production_july;
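
For completeness, a subscriber would then consume such a publication in the
usual way (standard CREATE SUBSCRIPTION syntax, nothing new in this patch; the
connection string is just a placeholder):

CREATE SUBSCRIPTION production_sub
    CONNECTION 'host=publisher dbname=production'
    PUBLICATION production_publication;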

Attached is a POC patch for the same. I felt this feature would be quite useful.
Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
From f9b5134182229f718bbc1a9162b6043f879a6410 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Thu, 7 Jan 2021 11:38:17 +0530
Subject: [PATCH v1] Added schema level support for publication.

This patch adds schema-level support for publications, used together with FOR
ALL TABLES. Users can specify multiple schemas with the schema option. When the
user specifies the schema option, the tables present in the specified schemas
will be selected by the publisher for sending the data to the subscriber.
---
 doc/src/sgml/ref/alter_publication.sgml   |  32 ++
 doc/src/sgml/ref/create_publication.sgml  |  33 +-
 src/backend/catalog/pg_publication.c  |  55 +-
 src/backend/commands/publicationcmds.c| 172 +-
 src/backend/parser/gram.y |  38 ++-
 src/bin/pg_dump/pg_dump.c |  22 +++-
 src/bin/pg_dump/pg_dump.h |   1 +
 src/bin/psql/describe.c   |   6 +-
 src/include/catalog/pg_publication.h  |   9 +-
 src/include/nodes/parsenodes.h|   2 +
 src/test/regress/expected/publication.out |  36 +++
 11 files changed, 376 insertions(+), 30 deletions(-)

diff --git a/doc/src/sgml/ref/alter_publication.sgml b/doc/src/sgml/ref/alter_publication.sgml
index c2946df..c00b8da 100644
--- a/doc/src/sgml/ref/alter_publication.sgml
+++ b/doc/src/sgml/ref/alter_publication.sgml
@@ -24,6 +24,9 @@ PostgreSQL documentation
 ALTER PUBLICATION name ADD TABLE [ ONLY ] table_name [ * ] [, ...]
 ALTER PUBLICATION name SET TABLE [ ONLY ] table_name [ * ] [, ...]
 ALTER PUBLICATION name DROP TABLE [ ONLY ] table_name [ * ] [, ...]
+ALTER PUBLICATION name ADD SCHEMA schema_name [, ...]
+ALTER PUBLICATION name SET SCHEMA schema_name [, ...]
+ALTER PUBLICATION name DROP SCHEMA schema_name [, ...]
 ALTER PUBLICATION name SET ( publication_parameter [= value] [, ... ] )
 ALTER PUBLICATION name OWNER TO { new_owner | CURRENT_ROLE | CURRENT_USER | SESSION_USER }
 ALTER PUBLICATION name RENAME TO new_name
@@ -97,6 +100,15 @@ ALTER PUBLICATION name RENAME TO 
 

+schema_name
+
+ 
+  Name of an existing schema.
+ 
+
+   
+
+   
 SET ( publication_parameter [= value] [, ... ] )
 
  
@@ -141,6 +153,26 @@ ALTER PUBLICATION noinsert SET (publish = 'update, delete');
 
 ALTER PUBLICATION mypublication ADD TABLE users, departments;
 
+
+  
+   Add some schemas to the publication:
+
+ALTER PUBLICATION sales_publication ADD SCHEMA marketing_june, sales_june;
+
+  
+
+  
+   Drop some schema from the publication:
+
+ALTER PUBLICATION production_quarterly_publication DROP SCHEMA production_july;
+
+  
+
+  
+   Set schema to the publication:
+
+ALTER PUBLICATION production_publication SET SCHEMA production_july;
+
  
 
  
diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml
index ff82fbc..c941643 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -22,8 +22,8 @@ PostgreSQL documentation
  
 
 CREATE PUBLICATION name
-[ FOR TABLE [ ONLY ] table_name [ * ] [, ...]
-  | FOR ALL TABLES ]
+[ FOR TABLE [ ONLY ] table_name [ * ] [, ... ]
+  | FOR ALL TABLES [ SCHEMA schema_name [, ... ] ] ]
 [ WITH ( publication_parameter [= value] [, ... ] ) ]
 
  
@@ -100,6 +100,19 @@ CREATE PUBLICATION name

 

+SCHEMA
+
+ 
+  Specifies the list of schemas that should be added to the publication.
+  If SCHEMA is specified, then the tables present in the
+  specified schemas are selected and added to the publication.  The rest
+  of the tables will be skipped. This option should be specified
+  together with the FOR ALL TABLES option.
+ 
+
+   
+
+   
 WITH ( publication_parameter [= value] [, ... ] )
 
  
@@ -222,6 +235,22 @@ CREATE PUBLICATION alltables FOR ALL TABLES;
 
 CREATE PUBLICATION insert_only FOR TABLE 

Re: Add Information during standby recovery conflicts

2021-01-07 Thread Fujii Masao



On 2020/12/15 2:00, Fujii Masao wrote:



On 2020/12/15 0:49, Drouvot, Bertrand wrote:

Hi,

On 12/14/20 4:20 PM, Fujii Masao wrote:




On 2020/12/14 21:31, Fujii Masao wrote:



On 2020/12/05 12:38, Masahiko Sawada wrote:

On Fri, Dec 4, 2020 at 7:22 PM Drouvot, Bertrand  wrote:


Hi,

On 12/4/20 2:21 AM, Fujii Masao wrote:


On 2020/12/04 9:28, Masahiko Sawada wrote:

On Fri, Dec 4, 2020 at 2:54 AM Fujii Masao
 wrote:




On 2020/12/01 17:29, Drouvot, Bertrand wrote:

Hi,

On 12/1/20 12:35 AM, Masahiko Sawada wrote:




On Tue, Dec 1, 2020 at 3:25 AM Alvaro Herrera
 wrote:

On 2020-Dec-01, Fujii Masao wrote:


+ if (proc)
+ {
+ if (nprocs == 0)
+ appendStringInfo(&buf, "%d", proc->pid);
+ else
+ appendStringInfo(&buf, ", %d", proc->pid);
+
+ nprocs++;

What happens if all the backends in wait_list have gone? In
other words,
how should we handle the case where nprocs == 0 (i.e., nprocs
has not been
incrmented at all)? This would very rarely happen, but can happen.
In this case, since buf.data is empty, at least there seems no
need to log
the list of conflicting processes in detail message.

Yes, I noticed this too; this can be simplified by changing the
condition in the ereport() call to be "nprocs > 0" (rather than
wait_list being null), otherwise not print the errdetail.  (You
could
test buf.data or buf.len instead, but that seems uglier to me.)

+1

Maybe we can also improve the comment of this function from:

+ * This function also reports the details about the conflicting
+ * process ids if *wait_list is not NULL.

to " This function also reports the details about the conflicting
process ids if exist" or something.


Thank you all for the review/remarks.

They have been addressed in the new attached patch version.


Thanks for updating the patch! I read through the patch again
and applied the following chages to it. Attached is the updated
version of the patch. Could you review this version? If there is
no issue in it, I'm thinking to commit this version.


Thank you for updating the patch! I have one question.



+   timeouts[cnt].id = STANDBY_TIMEOUT;
+   timeouts[cnt].type = TMPARAM_AFTER;
+   timeouts[cnt].delay_ms = DeadlockTimeout;

Maybe STANDBY_TIMEOUT should be STANDBY_DEADLOCK_TIMEOUT here?
I changed the code that way.


As the comment of ResolveRecoveryConflictWithLock() says the
following, a deadlock is detected by the ordinary backend process:

   * Deadlocks involving the Startup process and an ordinary backend process
   * will be detected by the deadlock detector within the ordinary backend.

If we use STANDBY_DEADLOCK_TIMEOUT,
SendRecoveryConflictWithBufferPin() will be called after
DeadlockTimeout passed, but I think it's not necessary for the startup
process in this case.


Thanks for pointing this! You are right.



If we want to just wake up the startup process
maybe we can use STANDBY_TIMEOUT here?



Thanks for the patch updates! Except what we are still discussing below,
it looks good to me.


When STANDBY_TIMEOUT happens, a request to release conflicting buffer
pins is sent. Right? If so, we should not also use STANDBY_TIMEOUT there?


Agree



Or, first of all, we don't need to enable the deadlock timer at all?
Since what we'd like to do is to wake up after deadlock_timeout
passes, we can do that by changing ProcWaitForSignal() so that it can
accept the timeout and giving the deadlock_timeout to it. If we do
this, maybe we can get rid of STANDBY_LOCK_TIMEOUT from
ResolveRecoveryConflictWithLock(). Thought?


Where do we enable deadlock timeout in hot standby case? You meant to
enable it in ProcWaitForSignal() or where we set a timer for not hot
standby case, in ProcSleep()?


No, what I tried to say is to change ResolveRecoveryConflictWithLock() so that 
it does

1. calculate the "minimum" timeout from deadlock_timeout and 
max_standby_xxx_delay
2. give the calculated timeout value to ProcWaitForSignal()
3. wait for signal and timeout on ProcWaitForSignal()

Since ProcWaitForSignal() calls WaitLatch(), seems it's not so difficult to 
make ProcWaitForSignal() handle the timeout. If we do this, I was thinking that 
we can get rid of enable_timeouts() from ResolveRecoveryConflictWithLock().






Why not simply use (again) the STANDBY_LOCK_TIMEOUT one? (as it triggers
a call to StandbyLockTimeoutHandler() which does nothing, except waking
up. That's what we want, right?)


Right, what I wanted to mean is STANDBY_LOCK_TIMEOUT. The startup
process can wake up and do 

Re: plpgsql variable assignment with union is broken

2021-01-07 Thread Merlin Moncure
On Wed, Jan 6, 2021 at 11:07 PM Pavel Stehule  wrote:

> On Thu, Jan 7, 2021 at 4:20, Merlin Moncure  wrote:
>>
>> On Tue, Jan 5, 2021 at 3:40 PM Tom Lane  wrote:
>> >
>> > easter...@verfriemelt.org writes:
>> > > i found, that the behaviour of variable assignment in combination with 
>> > > union is not working anymore:
>> > >   DO $$
>> > >   DECLARE t bool;
>> > >   begin
>> > >   t := a FROM ( SELECT true WHERE false ) t(a) UNION SELECT true AS 
>> > > a;
>> > >   END $$;
>> >
>> > > is this an intended change or is it a bug?
>> >
>> > It's an intended change, or at least I considered the case and thought
>> > that it was useless because assignment will reject any result with more
>> > than one row.  Do you have any non-toy example that wouldn't be as
>> > clear or clearer without using UNION?  The above sure seems like an
>> > example of awful SQL code.
>>
>> What is the definition of broken here?  What is the behavior of the
>> query with the change and why?
>>
>> OP's query provably returns a single row and ought to always assign
>> true as written.  A real world example might evaluate multiple
>> condition branches so that the assignment resolves true if any branch
>> is true. It could be rewritten with 'OR' of course.
>>
>> Is this also "broken"?
>>   t := a FROM ( SELECT 'something' WHERE _Flag) t(a) UNION SELECT
>> 'something else' AS a WHERE NOT _Flag;
>>
>> What about this?
>> SELECT INTO t true WHERE false
>> UNION select true;
>
>
> ANSI SQL allows only SELECT INTO or var := SQL expression and SQL expression 
> can be (subquery) too

This is PLPGSQL not ansi SQL so that's irrelevant. If queries along
the lines of:
var := FROM (SELECT ..) UNION ..

are narrowly broken, ok, but is that all that's going on here?  I
guess I ought to test.

I have a 300k line pl/pgsql project, and this thread is terrifying me.  I
am going to be blunt here and say I am not comfortable with tightening
pl/pgsql syntax without an impact assessment.  The point that this is
undocumented behavior is weak, and it's already turning up problem
reports.  IMO, the expectation has been clearly set that
var := expression;

is more or less interchangeable with
SELECT INTO var expression;

Again, if this is narrowly confined to assignment into set query
operations, maybe this is not so bad. But is it?

merlin




Re: Cirrus CI (Windows help wanted)

2021-01-07 Thread Andrew Dunstan


On 1/6/21 12:36 AM, Andrew Dunstan wrote:
> On 1/5/21 11:19 PM, Thomas Munro wrote:
>> Thanks!  I hacked on this a bit more and got as far as:
>>
>> C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build>call
>> perl buildsetup.pl
>> Unable to determine Visual Studio version: The nmake version could not
>> be determined. at src/tools/msvc/Mkvcbuild.pm line 92.
>>
>> That's from https://cirrus-ci.com/task/5842523031601152.  I guess PATH
>> is wrong or nmake it not present, but it's so painful to do a test
>> cycle that I gave up for today...
>
> Hmm, weird. I'll play some more tomorrow and see what I can find.


I have it building and testing OK, but it's horribly fragile. I doubt
this is acceptable for the cfbot; you'll get far too many spurious failures.


There are some build warnings we don't usually see. I haven't delved
into that.


See 


The yml file is:


task:
  name: Windows
  windows_container:
    image: cirrusci/windowsservercore:cmake
  install_script:
    - choco feature disable --name=showDownloadProgress
    - choco install -y winflexbison diffutils
    - rename c:\ProgramData\chocolatey\bin\win_flex.exe flex.exe
    - rename c:\ProgramData\chocolatey\bin\win_bison.exe bison.exe
    - choco install -y strawberryperl
    - choco install --force -y visualstudio2019-workload-vctools --package-parameters "--includeOptional"
    - choco list --localonly
  build_script:
    - cd "c:/Program Files (x86)/Microsoft Visual 
Studio/2019/BuildTools/VC/Auxiliary/Build"
    - vcvarsall x64
    - echo on
    - cd C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build
    - set PATH=C:\strawberry\perl\bin;%PATH%
    - set
    - perl src/tools/msvc/mkvcbuild.pl
    - msbuild pgsql.sln
  test_script:
    - set PATH=C:\strawberry\perl\bin;%PATH%
    - set
    - perl src/tools/msvc/vcregress.pl check


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Zedstore - compressed in-core columnar storage

2021-01-07 Thread Robert Eckhardt


On Dec 31, 2020, at 9:22 AM, Simon Riggs  wrote:

On Wed, 18 Nov 2020 at 00:31, Jacob Champion  wrote:

So that's in need of resolution. I'd expect gin and gist to be pretty
flaky until we fix that.

Jacob and Soumyadeep,

Thanks for submitting this. I think a fix is still outstanding? and
the patch fails to apply on HEAD in two places.
Please can you submit the next version?

Do you mind if we add this for review to the Jan CF?

It is a lot of code and I think there is significant difficulty for
the community to accept that as-is, even though it looks to be a very
high quality submission. So I would like to suggest a strategy for
commit: we accept Zedstore as "Beta" or "Experimental" in PG14,
perhaps with a WARNING/Caution similar to the one that used to be
given by Postgres in earlier versions when you created a Hash index.
We keep Zedstore in "Beta" mode until a later release, PG15 or later
when we can declare Zedstore fully safe. That approach allows us to
get this into the repo asap, and then be fixed and improved
incrementally from here.

The goal for Zedstore is to get a column store into Postgres, but not
necessarily Zedstore (though Zedstore itself would be nice). When designing
Zedstore, success for us would be:
- significantly more performant on OLAP-type queries
- performant enough to not be terrible with OLTP-type queries
- must support compression
- cannot be append-only; this was the case initially with the Greenplum Column
Store and it was a mistake. Customers want to update and delete
- it needs to be feature-complete compared to HEAP, unless that doesn't make
sense

Our initial goal is to get the TableAM and executor molded into a state where 
the above is possible for anyone wanting a column store implementation.

Given the goal of addressing API/Executor issues generically first, we have 
been trying to peel off and work on the parts that are not tightly linked to 
Zedstore.  Specifically I don’t think it would be ok to merge Zedstore into 
core when it might affect the performance of HEAP relations.

Instead of focusing on the larger, more difficult to review Zedstore patch, we 
are trying to peel off the touch points where Zedstore and the current server 
interact. Note this isn’t intended to be an exhaustive list, rather a list of 
the most immediate issues. Some of these issues are critical for Zedstore to 
work, i.e. column projection, while some of these issues point more towards 
ensuring the various layers in the code are clean so that folks leveraging the 
TableAM don’t need to write their own bits from whole cloth but rather can 
leverage appropriately generic primitives, i.e. DBsize or page inspect.

As such, an incomplete list of things currently on our radar:

1) Column Projection — We have a patch [1] that is a demonstration of what we 
would like to do. There are several TODOs in the email that can/will be 
addressed if the general method is acceptable

2) DBSize —Georgios has a patch [2] that begins to make DBSize less HEAP 
specific

3) Reloptions —Jeff Davis has a patch [3] that begins to make these more 
flexible, having spoken with him we think additional work needs to be done here

4) PageInspect —needs to be less HEAP specific but no work has been done here 
that I’m aware of

5) bitmapHeapScan —currently scans both the index and the relation, there are 
code comments to address this and we need to look into what a fix would mean

6) Bulk insertion —Justin Pryzby has a patch [4] we are following along with.

7) analyze — Denis has a patch which starts to address this [5]

Ideally we can peel out anything that is useful to any column store. Once those 
have been discussed and committed the general code should be in better shape as 
well.

— Rob


[1] 
https://www.postgresql.org/message-id/flat/cae-ml+9rmtnzkcntzpqf8o3b-ujhwgfbsoxpqa3wvuc8ybb...@mail.gmail.com
[2] 
https://www.postgresql.org/message-id/flat/svffVJPtfDYEIISNS-3FQs64CauSul3RjF7idXOfy4H40YBVwB3TMumHb6WoAElJpHOsN-j8fjxYohEt4VxcsJ0Qd9gizwzsY3rjgtjj440=@pm.me
[3] 
https://www.postgresql.org/message-id/flat/429fb58fa3218221bb17c7bf9e70e1aa6cfc6b5d.ca...@j-davis.com
[4] 
https://www.postgresql.org/message-id/flat/20200508072545.ga9...@telsasoft.com
[5] 
https://www.postgresql.org/message-id/flat/c7cfe16b-f192-4124-bebb-7864285e0...@arenadata.io




e.g.

"NOTICE:  Caution: Zedstore is an experimental feature in PostgreSQL14
intended for robustness and performance testing only. Your data and/or
query accuracy may be at risk if you rely on this."

--
Simon Riggs    http://www.EnterpriseDB.com/





Re: [PATCH] Simple progress reporting for COPY command

2021-01-07 Thread Josef Šimánek
On Thu, Jan 7, 2021 at 16:54, Tomas Vondra
 wrote:
>
>
>
> On 1/7/21 12:06 PM, Josef Šimánek wrote:
> > On Wed, Jan 6, 2021 at 22:44, Tomas Vondra
> >  wrote:
> >>
> >> On 1/5/21 11:02 AM, Josef Šimánek wrote:
> >>> I'm attaching the whole patch since commitfest failed to ingest the
> >>> last incremental on CI.
> >>>
> >>
> >> Yeah, the whole patch needs to be attached for the commitfest tester to
> >> work correctly - it can't apply pieces from multiple messages, etc.
> >>
> >> Anyway, I pushed this last version of patch, after a couple more tweaks,
> >> mainly to the docs - one place used pg_stat_copy_progress, the section
> >> was not indexed properly, and so on.
> >>
> >> I see Matthias proposed to change "lines" to "tuples" - I only saw the
> >> message after pushing, but I probably wouldn't make that change anyway.
> >> The CSV docs seem to talk about lines, newlines etc. so it seems fine.
> >> If not, we can change that.
> >>
> >> One more question, though - I now realize the lines_processed ignores
> >> rows skipped because of BEFORE INSERT triggers. I wonder if that's the
> >> right thing to do? Imagine you know the number of lines in a file. You
> >> can't really use (lines_processed / total_lines) to measure progress,
> >> because that may ignore many "skipped" rows. So maybe this should be
> >> changed to count all rows. OTOH we still have bytes_processed.
> >
> > I think that should be fixed. It is called "lines_processed" not
> > "lines_inserted". I'll take a look.
> >
>
> So we may either rename the column to "lines_inserted", or tweak the
> code to count all processed lines. Or track both and have two columns.

First I'll ensure lines_processed represents the actual number of
processed lines. If we are reading from a file and some lines are
skipped due to a BEFORE INSERT trigger, I consider those processed as
well, even if they are not inserted. If welcome, I can add
lines_inserted later as well, but I'm not sure about the use case.

Also, thanks for mentioning triggers; I think those could be used to
test the COPY progress (at least some variants). I'll check whether I
can add some test cases as well.
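
A rough sketch of what such a test could look like (my assumption of the
shape, not something from the committed patch): a BEFORE INSERT trigger that
drops every row, so lines_processed should keep advancing while nothing is
inserted:

    CREATE TABLE copy_target (a int);
    CREATE FUNCTION skip_row() RETURNS trigger LANGUAGE plpgsql AS
    $$ BEGIN RETURN NULL; END $$;
    CREATE TRIGGER skip_all BEFORE INSERT ON copy_target
        FOR EACH ROW EXECUTE FUNCTION skip_row();
    COPY copy_target FROM '/tmp/copy_input.csv' (FORMAT csv);  -- placeholder path
    -- meanwhile, from another session:
    SELECT relid::regclass, lines_processed FROM pg_stat_progress_copy;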

> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company




Re: list of extended statistics on psql

2021-01-07 Thread Tomas Vondra




On 1/7/21 3:47 PM, Alvaro Herrera wrote:

On 2021-Jan-07, Tomas Vondra wrote:


On 1/7/21 1:46 AM, Tatsuro Yamada wrote:



I overlooked the check for MCV in the query-building logic
because I created the patch as a new feature for PG 14.
I'm not sure whether we should back-patch or not. However, I'll
add the check in the next patch because it is useful if you decide to
back-patch to PG 10, 11, 12, and 13.


BTW perhaps a quick look at the other \d commands would show if there are
precedents. I didn't have time for that.


Yes, we do promise that new psql works with older servers.



Yeah, makes sense. That means we need to add the check for 12 / MCV.


I think we would not backpatch any of this, though.


I wasn't really planning to backpatch any of this, of course.
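
For illustration, the kind of catalog query the new psql command needs to
issue, with the MCV column only requested when the server is v12 or later (a
sketch, not the exact query from the patch):

    SELECT stxnamespace::regnamespace AS "Schema",
           stxname                    AS "Name",
           'd' = ANY (stxkind)        AS "Ndistinct",
           'f' = ANY (stxkind)        AS "Dependencies",
           'm' = ANY (stxkind)        AS "MCV"   -- omit this column before v12
    FROM pg_statistic_ext
    ORDER BY 1, 2;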


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PATCH] Simple progress reporting for COPY command

2021-01-07 Thread Tomas Vondra




On 1/7/21 12:06 PM, Josef Šimánek wrote:

On Wed, Jan 6, 2021 at 22:44, Tomas Vondra
 wrote:


On 1/5/21 11:02 AM, Josef Šimánek wrote:

I'm attaching the whole patch since commitfest failed to ingest the
last incremental on CI.



Yeah, the whole patch needs to be attached for the commitfest tester to
work correctly - it can't apply pieces from multiple messages, etc.

Anyway, I pushed this last version of patch, after a couple more tweaks,
mainly to the docs - one place used pg_stat_copy_progress, the section
was not indexed properly, and so on.

I see Matthias proposed to change "lines" to "tuples" - I only saw the
message after pushing, but I probably wouldn't make that change anyway.
The CSV docs seem to talk about lines, newlines etc. so it seems fine.
If not, we can change that.

One more question, though - I now realize the lines_processed ignores
rows skipped because of BEFORE INSERT triggers. I wonder if that's the
right thing to do? Imagine you know the number of lines in a file. You
can't really use (lines_processed / total_lines) to measure progress,
because that may ignore many "skipped" rows. So maybe this should be
changed to count all rows. OTOH we still have bytes_processed.


I think that should be fixed. It is called "lines_processed" not
"lines_inserted". I'll take a look.



So we may either rename the column to "lines_inserted", or tweak the 
code to count all processed lines. Or track both and have two columns.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add Information during standby recovery conflicts

2021-01-07 Thread Fujii Masao




On 2021/01/07 22:39, Drouvot, Bertrand wrote:

Hi,

On 1/6/21 6:31 PM, Fujii Masao wrote:




On 2020/12/15 0:20, Fujii Masao wrote:



On 2020/12/14 21:31, Fujii Masao wrote:



On 2020/12/05 12:38, Masahiko Sawada wrote:

On Fri, Dec 4, 2020 at 7:22 PM Drouvot, Bertrand  wrote:


Hi,

On 12/4/20 2:21 AM, Fujii Masao wrote:


On 2020/12/04 9:28, Masahiko Sawada wrote:

On Fri, Dec 4, 2020 at 2:54 AM Fujii Masao
 wrote:




On 2020/12/01 17:29, Drouvot, Bertrand wrote:

Hi,

On 12/1/20 12:35 AM, Masahiko Sawada wrote:




On Tue, Dec 1, 2020 at 3:25 AM Alvaro Herrera
 wrote:

On 2020-Dec-01, Fujii Masao wrote:


+ if (proc)
+ {
+ if (nprocs == 0)
+ appendStringInfo(&buf, "%d", proc->pid);
+ else
+ appendStringInfo(&buf, ", %d", proc->pid);
+
+ nprocs++;

What happens if all the backends in wait_list have gone? In
other words,
how should we handle the case where nprocs == 0 (i.e., nprocs
has not been
incrmented at all)? This would very rarely happen, but can happen.
In this case, since buf.data is empty, at least there seems no
need to log
the list of conflicting processes in detail message.

Yes, I noticed this too; this can be simplified by changing the
condition in the ereport() call to be "nprocs > 0" (rather than
wait_list being null), otherwise not print the errdetail.  (You
could
test buf.data or buf.len instead, but that seems uglier to me.)

+1

Maybe we can also improve the comment of this function from:

+ * This function also reports the details about the conflicting
+ * process ids if *wait_list is not NULL.

to " This function also reports the details about the conflicting
process ids if exist" or something.


Thank you all for the review/remarks.

They have been addressed in the new attached patch version.


Thanks for updating the patch! I read through the patch again
and applied the following chages to it. Attached is the updated
version of the patch. Could you review this version? If there is
no issue in it, I'm thinking to commit this version.


Thank you for updating the patch! I have one question.



+   timeouts[cnt].id = STANDBY_TIMEOUT;
+   timeouts[cnt].type = TMPARAM_AFTER;
+   timeouts[cnt].delay_ms = DeadlockTimeout;

Maybe STANDBY_TIMEOUT should be STANDBY_DEADLOCK_TIMEOUT here?
I changed the code that way.


As the comment of ResolveRecoveryConflictWithLock() says the
following, a deadlock is detected by the ordinary backend process:

   * Deadlocks involving the Startup process and an ordinary backend process
   * will be detected by the deadlock detector within the ordinary backend.

If we use STANDBY_DEADLOCK_TIMEOUT,
SendRecoveryConflictWithBufferPin() will be called after
DeadlockTimeout passed, but I think it's not necessary for the startup
process in this case.


Thanks for pointing this! You are right.



If we want to just wake up the startup process
maybe we can use STANDBY_TIMEOUT here?



Thanks for the patch updates! Except what we are still discussing below,
it looks good to me.


When STANDBY_TIMEOUT happens, a request to release conflicting buffer
pins is sent. Right? If so, we should not also use STANDBY_TIMEOUT there?


Agree



Or, first of all, we don't need to enable the deadlock timer at all?
Since what we'd like to do is to wake up after deadlock_timeout
passes, we can do that by changing ProcWaitForSignal() so that it can
accept the timeout and giving the deadlock_timeout to it. If we do
this, maybe we can get rid of STANDBY_LOCK_TIMEOUT from
ResolveRecoveryConflictWithLock(). Thought?


Where do we enable deadlock timeout in hot standby case? You meant to
enable it in ProcWaitForSignal() or where we set a timer for not hot
standby case, in ProcSleep()?


No, what I tried to say is to change ResolveRecoveryConflictWithLock() so that 
it does

1. calculate the "minimum" timeout from deadlock_timeout and 
max_standby_xxx_delay
2. give the calculated timeout value to ProcWaitForSignal()
3. wait for signal and timeout on ProcWaitForSignal()

Since ProcWaitForSignal() calls WaitLatch(), seems it's not so difficult to 
make ProcWaitForSignal() handle the timeout. If we do this, I was thinking that 
we can get rid of enable_timeouts() from ResolveRecoveryConflictWithLock().






Why not simply use (again) the STANDBY_LOCK_TIMEOUT one? (as it triggers
a call to StandbyLockTimeoutHandler() which does nothing, except waking
up. That's what we want, right?)


Right, what I wanted to mean is STANDBY_LOCK_TIMEOUT. The startup
process can wake up and do nothing. Thank you for 

Re: Add session statistics to pg_stat_database

2021-01-07 Thread Laurenz Albe
On Fri, 2020-12-25 at 20:28 +0900, Masahiro Ikeda wrote:
> As a user, I want this feature to know whether
> clients' session activities are as expected.
> 
> I have some comments about the patch.

Thank you for the thorough review!

> 1. pg_proc.dat
> 
> The unit of "session time" and so on says "in seconds".
> But, is "in milliseconds" right?

You are right.  Fixed.

> 2. monitoring.sgml
> 
> IIUC, "active_time" includes the time spent executing a fast-path function,
> and "idle in transaction" includes "idle in transaction (aborted)" time.
>
> Why don't you reference pg_stat_activity's "state" column, so that
> "active_time" is the total time spent in the states "active" and
> "fast path"?
> The same applies to "idle in transaction".

Good idea; I have expanded the documentation like that.

> 3. pgstat.h
> 
> The comment of PgStat_MsgConn says "Sent by pgstat_connection".
> I thought "pgstat_connection" is a function, but it doesn't exist.
>
> Is "Sent by the backend" right?

The function was renamed and is now called "pgstat_send_connstats".

But you are right, I might as well match the surrounding code and
write "Sent by the backend".

> Although this is a trivial thing, the following row has too many tabs.
> 
> Other structs have only one space.
> 
> // }Pgstat_MsgConn;

Yes, I messed that up during the pgindent run.  Fixed.

Patch version 11 is attached.

Yours,
Laurenz Albe
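
As a usage illustration (a sketch that assumes the patch below is applied),
the new columns make it easy to see how much of the total session time is
spent working versus idling in a transaction, which is the number you want
when sizing a connection pool:

    SELECT datname,
           sessions,
           round((100 * active_time / nullif(session_time, 0))::numeric, 1)
               AS pct_active,
           round((100 * idle_in_transaction_time
                        / nullif(session_time, 0))::numeric, 1)
               AS pct_idle_in_xact
    FROM pg_stat_database
    ORDER BY datname;
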
From 324847353f5d9e5b2899dd93d43fb345df1dcdb8 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Thu, 7 Jan 2021 16:33:45 +0100
Subject: [PATCH] Add session statistics to pg_stat_database

If "track_counts" is active, track the following per database:
- total number of connections
- number of sessions that ended by loss of network connection,
  fatal errors and operator intervention
- total time spent in database sessions
- total time spent executing queries
- total idle in transaction time

This is useful to check if connection pooling is working.
It also helps to estimate the size of the connection pool
required to keep the database busy, which depends on the
percentage of the transaction time that is spent idling.

Discussion: https://postgr.es/m/b07e1f9953701b90c66ed368656f2aef40cac4fb.ca...@cybertec.at
Reviewed-By: Soumyadeep Chakraborty, Justin Pryzby, Masahiro Ikeda, Magnus Hagander

(This requires a catversion bump, as well as an update to PGSTAT_FILE_FORMAT_ID)
---
 doc/src/sgml/monitoring.sgml |  77 +++
 src/backend/catalog/system_views.sql |   7 ++
 src/backend/postmaster/pgstat.c  | 134 ++-
 src/backend/tcop/postgres.c  |  11 ++-
 src/backend/utils/adt/pgstatfuncs.c  |  94 +++
 src/backend/utils/error/elog.c   |   8 ++
 src/include/catalog/pg_proc.dat  |  32 +++
 src/include/pgstat.h |  39 
 src/test/regress/expected/rules.out  |   7 ++
 9 files changed, 404 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 43fe8ae383..59622173da 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3737,6 +3737,83 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
   
  
 
+ 
+  
+   session_time double precision
+  
+  
+   Time spent by database sessions in this database, in milliseconds
+   (note that statistics are only updated when the state of a session
+   changes, so if sessions have been idle for a long time, this idle time
+   won't be included)
+  
+ 
+
+ 
+  
+   active_time double precision
+  
+  
+   Time spent executing SQL statements in this database, in milliseconds
+   (this corresponds to the states active and
+   fastpath function call in
+   
+   pg_stat_activity)
+  
+ 
+
+ 
+  
+   idle_in_transaction_time double precision
+  
+  
+   Time spent idling while in a transaction in this database, in milliseconds
+   (this corresponds to the states idle in transaction and
+   idle in transaction (aborted) in
+   
+   pg_stat_activity)
+  
+ 
+
+ 
+  
+   sessions bigint
+  
+  
+   Total number of sessions established to this database
+  
+ 
+
+ 
+  
+   sessions_abandoned bigint
+  
+  
+   Number of database sessions to this database that were terminated
+   because connection to the client was lost
+  
+ 
+
+ 
+  
+   sessions_fatal bigint
+  
+  
+   Number of database sessions to this database that were terminated
+   by fatal errors
+  
+ 
+
+ 
+  
+   sessions_killed bigint
+  
+  
+   Number of database sessions to this database that were terminated
+   by operator intervention
+  
+ 
+
  
   
stats_reset timestamp with time zone
diff --git a/src/backend/catalog/system_views.sql 

Re: Disable WAL logging to speed up data loading

2021-01-07 Thread Robert Haas
On Wed, Dec 2, 2020 at 10:53 PM tsunakawa.ta...@fujitsu.com
 wrote:
> The code looks good, and the performance seems to be nice, so I marked this 
> ready for committer.

Were the issues that I mentioned regarding GIST (and maybe other AMs)
in the last paragraph of
http://postgr.es/m/ca+tgmozez5rons49c7mepjhjndqmqtvrz_lcqukprwdmrev...@mail.gmail.com
addressed in some way? That seems like a pretty hard engineering
problem to me, and I don't see that there's been any discussion of it.
Those are correctness concerns separate from any wal_level tracking we
might want to do to avoid accidental mistakes.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Online checksums patch - once again

2021-01-07 Thread Magnus Hagander
On Thu, Jan 7, 2021 at 3:05 PM Daniel Gustafsson  wrote:
>
> > On 5 Jan 2021, at 18:19, Michael Banck  wrote:
>
> >> diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
> >> index 385ac25150..e3b0048806 100644
> >> --- a/doc/src/sgml/ref/initdb.sgml
> >> +++ b/doc/src/sgml/ref/initdb.sgml
> >> @@ -219,6 +219,7 @@ PostgreSQL documentation
> >> failures will be reported in the
> >> 
> >> pg_stat_database view.
> >> +See  for details.
> >>
> >>   
> >>  
> >
> >> diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
> >> index f4bc147b10..5dcfcdd2ff 100644
> >> --- a/doc/src/sgml/wal.sgml
> >> +++ b/doc/src/sgml/wal.sgml
> >> @@ -230,6 +230,103 @@
> >>   
> >>  
> >>
> >> + 
> >> +  Data Checksums
> >> +  
> >> +   checksums
> >> +  
> >> +
> >> +  
> >> +   Data pages are not checksum protected by default, but this can 
> >> optionally be
> >> +   enabled for a cluster.  When enabled, each data page will be assigned a
> >> +   checksum that is updated when the page is written and verified every 
> >> time
> >> +   the page is read. Only data pages are protected by checksums, internal 
> >> data
> >> +   structures and temporary files are not.
> >> +  
> >> +
> >> +  
> >> +   Checksums are normally enabled when the cluster is initialized using 
> >>  >> +   
> >> linkend="app-initdb-data-checksums">initdb.
> >> +   They can also be enabled or disabled at a later time, either as an 
> >> offline
> >> +   operation or in a running cluster. In all cases, checksums are enabled 
> >> or
> >> +   disabled at the full cluster level, and cannot be specified 
> >> individually for
> >> +   databases or tables.
> >> +  
> >> +
> >> +  
> >> +   The current state of checksums in the cluster can be verified by 
> >> viewing the
> >> +   value of the read-only configuration variable  >> +   linkend="guc-data-checksums" /> by issuing the command SHOW
> >> +   data_checksums.
> >> +  
> >> +
> >> +  
> >> +   When attempting to recover from corrupt data it may be necessary to 
> >> bypass
> >> +   the checksum protection in order to recover data. To do this, 
> >> temporarily
> >> +   set the configuration parameter  >> linkend="guc-ignore-checksum-failure" />.
> >> +  
> >
> > I think the above is rather informative about checksums in general and
> > not specific to online activation of checksums, so could pretty much be
> > committed verbatim right now, except for the "either as an offline
> > operation or in a running cluster" bit which would have to be rewritten.
>
> That might indeed be useful regardless of the state of this patch.  Robert and
> Heikki: being committers who have both reviewed recent versions of the patch,
> would you prefer these sections be broken out into a separate patch?

I think it would be ;)

Obviously in that case it would be adapted to say that checksums have to be
changed offline, and the patch would then change that to say they can be
changed online.
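
(As an aside, the cluster-wide state described in the quoted docs can
already be checked on a running server:)

-- Read-only; reports whether data checksums are enabled cluster-wide.
SHOW data_checksums;
-- or, equivalently:
SELECT current_setting('data_checksums');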


-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: Proposal: Global Index

2021-01-07 Thread Robert Haas
On Thu, Jan 7, 2021 at 4:44 AM 曾文旌  wrote:
> I've been following this topic for a long time. It's been a year since the 
> last response.
> It was clear that our customers wanted this feature as well, and a large 
> number of them mentioned it.
>
> So, I wish the whole feature to mature as soon as possible.
> I summarized the scheme mentioned in the email and completed the POC 
> patch (based on PG_13).

You need to summarize the basic design choices you've made here. Like,
what did you do about the fact that TIDs have to be 6 bytes?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Key management with tests

2021-01-07 Thread Bruce Momjian
On Fri, Jan  1, 2021 at 01:07:50AM -0500, Bruce Momjian wrote:
> On Thu, Dec 31, 2020 at 11:50:47PM -0500, Bruce Momjian wrote:
> > I have completed the key management patch with tests created by Stephen
> > Frost.  Original patch by Masahiko Sawada.  It requires the hex
> > reorganization patch first.  The key patch is now 2.1MB because of the
> > tests, so attaching it here seems unwise:
> > 
> > https://github.com/postgres/postgres/compare/master...bmomjian:hex.diff
> > https://github.com/postgres/postgres/compare/master...bmomjian:key.diff
> > 
> > I will add it to the commitfest.  I think we need to figure out how much
> > of the tests we want to add.
> 
> I am getting regression test errors using OpenSSL 1.1.1d  10 Sep 2019
> with zero-length input data (no -p), while Stephen is able to get those
> tests to pass.  This needs more research, plus I think higher-level
> tests.

I have found the cause of the failure, which I added as a C comment:

/*
 * OpenSSL 1.1.1d and earlier crashes on some zero-length plaintext
 * and ciphertext strings.  It crashes on an encryption call to
 * EVP_EncryptFinal_ex() in GCM mode on zero-length strings if
 * plaintext is NULL, even though plaintext_len is zero.  Setting
 * plaintext to non-NULL allows it to work.  In KW/KWP mode,
 * zero-length strings fail if plaintext_len = 0 and plaintext is
 * non-NULL (the opposite).  OpenSSL 1.1.1e+ is fine with all options.
 */
else if (cipher == PG_CIPHER_AES_GCM)
{
plaintext_len = 0;
plaintext = pg_malloc0(1);
}

All the tests pass now.  The current src/test directory is 19MB, and
adding these tests takes it to 23MB, or a 20% increase.  That seems like
a lot.  It is testing 128-bit and 256-bit keys --- should we do fewer
tests, or just test 256, or use gzip to compress the tests by 50%? 
(Does every platform have gzip?)

My next step is to add the high-level tests.
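
For anyone who wants to reproduce the quirk outside the patch, here is a
minimal standalone sketch of encrypting a zero-length input with AES-GCM
using the non-NULL-pointer workaround described above (illustration only,
not code from the patch; on an affected OpenSSL version the same call with
a NULL plaintext pointer reportedly crashes):

/* cc gcm_empty.c -lcrypto */
#include <stdio.h>
#include <stdlib.h>
#include <openssl/evp.h>

int
main(void)
{
    unsigned char   key[32] = {0};
    unsigned char   iv[12] = {0};
    unsigned char   tag[16];
    unsigned char   out[16];
    int             len = 0;
    /* Workaround: pass a non-NULL pointer even though the length is zero. */
    unsigned char  *plaintext = malloc(1);
    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();

    if (ctx == NULL ||
        EVP_EncryptInit_ex(ctx, EVP_aes_256_gcm(), NULL, key, iv) != 1 ||
        EVP_EncryptUpdate(ctx, out, &len, plaintext, 0) != 1 ||
        EVP_EncryptFinal_ex(ctx, out + len, &len) != 1 ||
        EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_GET_TAG, sizeof(tag), tag) != 1)
    {
        fprintf(stderr, "GCM encryption of empty input failed\n");
        return 1;
    }

    printf("empty-input GCM tag computed, first byte 0x%02x\n", tag[0]);
    EVP_CIPHER_CTX_free(ctx);
    free(plaintext);
    return 0;
}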

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: list of extended statistics on psql

2021-01-07 Thread Alvaro Herrera
On 2021-Jan-07, Tomas Vondra wrote:

> On 1/7/21 1:46 AM, Tatsuro Yamada wrote:

> > I overlooked the check for MCV in the query-building logic
> > because I created the patch as a new feature on PG14.
> > I'm not sure whether we should back-patch or not. However, I'll
> > add the check in the next patch because it is useful if you decide to
> > back-patch to PG10, 11, 12, and 13.
> 
> BTW perhaps a quick look at the other \d commands would show if there are
> precedents. I didn't have time for that.

Yes, we do promise that new psql works with older servers.

I think we would not backpatch any of this, though.

-- 
Álvaro Herrera




Re: Error on failed COMMIT

2021-01-07 Thread Dave Cramer
I could if someone wants to commit to reviewing it.
I've updated it a number of times but it seems nobody wants to review it.

Dave Cramer
www.postgres.rocks


On Thu, 7 Jan 2021 at 09:27, Masahiko Sawada  wrote:

> Hi Dave,
>
> On Tue, Dec 1, 2020 at 6:49 PM Georgios Kokolatos
>  wrote:
> >
> > Hi,
> >
> > this patch fails on the cfbot yet it has received an update during the
> current CF.
> >
> > I will move it to the next CF and mark it there as Waiting on Author.
> >
>
> This patch has not been updated for almost 2 months. According to
> cfbot test, the patch conflicts on only src/include/utils/elog.h.
> Could you submit the rebased patch?
>
> Regards,
>
> --
> Masahiko Sawada
> EnterpriseDB:  https://www.enterprisedb.com/
>


Re: Error on failed COMMIT

2021-01-07 Thread Masahiko Sawada
Hi Dave,

On Tue, Dec 1, 2020 at 6:49 PM Georgios Kokolatos
 wrote:
>
> Hi,
>
> this patch fails on the cfbot yet it has received an update during the 
> current CF.
>
> I will move it to the next CF and mark it there as Waiting on Author.
>

This patch has not been updated for almost 2 months. According to
cfbot test, the patch conflicts on only src/include/utils/elog.h.
Could you submit the rebased patch?

Regards,

-- 
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/




Re: Proposal: Global Index

2021-01-07 Thread Bruce Momjian
On Thu, Jan  7, 2021 at 05:44:01PM +0800, 曾文旌 wrote:
> I've been following this topic for a long time. It's been a year since the 
> last response.
> It was clear that our customers wanted this feature as well, and a large 
> number of them mentioned it.
> 
> So, I wish the whole feature to mature as soon as possible.
> I summarized the scheme mentioned in the email and completed the POC 
> patch (based on PG_13).

I think you need to address the items mentioned in this blog, and the
email link it mentions:

https://momjian.us/main/blogs/pgblog/2020.html#July_1_2020

I am not clear this is a feature we will want.  Yes, people ask for it,
but if the experience will be bad for them and they will regret using
it, I am not sure we want it.  Of course, if you code it up and we get
a good user experience, we would want it --- I am just saying it is not
clear right now.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Online checksums patch - once again

2021-01-07 Thread Michael Banck
Hi,

On Thu, Jan 07, 2021 at 03:05:44PM +0100, Daniel Gustafsson wrote:
> > On 5 Jan 2021, at 18:19, Michael Banck  wrote:
> >> +  
> >> +   Data pages are not checksum protected by default, but this can 
> >> optionally be
> >> +   enabled for a cluster. 

This would also have to be rewritten to clarify that it "can optionally
be enabled during cluster creation or when the cluster is offline later
on" I'd say.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz




Re: Removing unneeded self joins

2021-01-07 Thread Masahiko Sawada
Hi Andrey,

On Mon, Nov 30, 2020 at 2:51 PM Andrey V. Lepikhov
 wrote:
>
> On 11/29/20 10:10 PM, Heikki Linnakangas wrote:
> > On 28/11/2020 19:21, Andrey Lepikhov wrote:
> >> On 27.11.2020 21:49, Heikki Linnakangas wrote:
> >> CREATE TABLE a(x int, y int);
> >> CREATE UNIQUE INDEX ON a(x);
> >> SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.x;  -- self-join
> >> CREATE UNIQUE INDEX ON a(y);
> >> SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y;  -- self-join too
> >
> > The latter join is not "useless". The patch is returning incorrect
> > result for that query:
> >
> >> postgres=# insert into a values (1, 2);
> >> INSERT 0 1
> >> postgres=# insert into a values (2, 1);
> >> INSERT 0 1
> >> postgres=# SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y; -- WRONG RESULT
> >>  x | y ---+---
> >> (0 rows)
> >>
> >> postgres=# set enable_self_join_removal=off;
> >> SET
> >> postgres=# SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y; -- CORRECT
> >> RESULT
> >>  x | y ---+---
> >>  1 | 2
> >>  2 | 1
> >> (2 rows)
>
> Thanks, it is my fault. I tried to extend this patch with foreign key
> references and made a mistake.
> Currently I have rolled back this new option (see patch in attachment), but I will
> be working for a while to simplify this patch.

Are you working to simplify the patch? This patch has been "Waiting on
Author" for 1 month and doesn't seem to pass cfbot tests. Please
update the patch.

Regards,

-- 
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/




Re: Online checksums patch - once again

2021-01-07 Thread Daniel Gustafsson
> On 5 Jan 2021, at 18:19, Michael Banck  wrote:

>> diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
>> index 385ac25150..e3b0048806 100644
>> --- a/doc/src/sgml/ref/initdb.sgml
>> +++ b/doc/src/sgml/ref/initdb.sgml
>> @@ -219,6 +219,7 @@ PostgreSQL documentation
>> failures will be reported in the
>> 
>> pg_stat_database view.
>> +See  for details.
>>
>>   
>>  
> 
>> diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
>> index f4bc147b10..5dcfcdd2ff 100644
>> --- a/doc/src/sgml/wal.sgml
>> +++ b/doc/src/sgml/wal.sgml
>> @@ -230,6 +230,103 @@
>>   
>>  
>> 
>> + 
>> +  Data Checksums
>> +  
>> +   checksums
>> +  
>> +
>> +  
>> +   Data pages are not checksum protected by default, but this can 
>> optionally be
>> +   enabled for a cluster.  When enabled, each data page will be assigned a
>> +   checksum that is updated when the page is written and verified every time
>> +   the page is read. Only data pages are protected by checksums, internal 
>> data
>> +   structures and temporary files are not.
>> +  
>> +
>> +  
>> +   Checksums are normally enabled when the cluster is initialized using 
>> > +   
>> linkend="app-initdb-data-checksums">initdb.
>> +   They can also be enabled or disabled at a later time, either as an 
>> offline
>> +   operation or in a running cluster. In all cases, checksums are enabled or
>> +   disabled at the full cluster level, and cannot be specified individually 
>> for
>> +   databases or tables.
>> +  
>> +
>> +  
>> +   The current state of checksums in the cluster can be verified by viewing 
>> the
>> +   value of the read-only configuration variable > +   linkend="guc-data-checksums" /> by issuing the command SHOW
>> +   data_checksums.
>> +  
>> +
>> +  
>> +   When attempting to recover from corrupt data it may be necessary to 
>> bypass
>> +   the checksum protection in order to recover data. To do this, temporarily
>> +   set the configuration parameter > linkend="guc-ignore-checksum-failure" />.
>> +  
> 
> I think the above is rather informative about checksums in general and
> not specific to online activation of checksums, so could pretty much be
> committed verbatim right now, except for the "either as an offline
> operation or in a running cluster" bit which would have to be rewritten.

That might indeed be useful regardless of the state of this patch.  Robert and
Heikki: being committers who have both reviewed recent versions of the patch,
would you prefer these sections be broken out into a separate patch?

cheers ./daniel



Re: PoC/WIP: Extended statistics on expressions

2021-01-07 Thread Dean Rasheed
Starting to look at the planner code, I found an oversight in the way
expression stats are read at the start of planning -- it is necessary
to call ChangeVarNodes() on any expressions if the relid isn't 1,
otherwise the stats expressions may contain Var nodes referring to the
wrong relation. Possibly the easiest place to do that would be in
get_relation_statistics(), if rel->relid != 1.
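
A hedged sketch of that fix, assuming the stored statistics expressions are
built with varno 1 ("exprs" here stands for the expression list read for the
statistics object):

/* Hypothetical fix inside get_relation_statistics(), for illustration only. */
if (rel->relid != 1)
    ChangeVarNodes((Node *) exprs, 1, rel->relid, 0);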

Here's a simple test case:

CREATE TABLE foo AS SELECT x FROM generate_series(1,10) g(x);
CREATE STATISTICS foo_s ON (x%10) FROM foo;
ANALYSE foo;

EXPLAIN SELECT * FROM foo WHERE x%10 = 0;
EXPLAIN SELECT * FROM (SELECT 1) t, foo WHERE x%10 = 0;

(in the second query, the stats don't get applied).

Regards,
Dean




Re: pgbench stopped supporting large number of client connections on Windows

2021-01-07 Thread Masahiko Sawada
 Hi Marina,

On Sun, Nov 8, 2020 at 11:59 PM Marina Polyakova
 wrote:
>
> On 2020-11-07 01:01, Fabien COELHO wrote:
> > Hello Marina,
>
> Hello, Fabien!
>
> Thank you for your comments!
>
> >> While trying to test a patch that adds a synchronization barrier in
> >> pgbench [1] on Windows,
> >
> > Thanks for trying that, I do not have a windows setup for testing, and
> > the sync code I wrote for Windows is basically blind coding:-(
>
> FYI:
>
> 1) It looks like pgbench will no longer support Windows XP due to the
> function DeleteSynchronizationBarrier. From
> https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-deletesynchronizationbarrier
> :
>
> Minimum supported client: Windows 8 [desktop apps only]
> Minimum supported server: Windows Server 2012 [desktop apps only]
>
> On Windows Server 2008 R2 (MSVC 2013) the 6-th version of the patch [1]
> has compiled without (new) warnings, but when running pgbench I got the
> following error:
>
> The procedure entry point DeleteSynchronizationBarrier could not be
> located in the dynamic link library KERNEL32.dll.
>
> 2) On Windows Server 2019 (MSVC 2019) the 6-th version of the patch [1]
> with fix_max_client_conn_on_Windows.patch has compiled without (new)
> warnings. I made a few runs (-M prepared -c 100 -j 10 -T 10 -P1 -S) with
> and without your patches. On Linux (-M prepared -c 1000 -j 500 -T 10 -P1
> -S) your patches fix problems with progress reports as in [2], but on
> Windows I did not notice such changes, see attached
> pgbench_runs_linux_vs_windows.zip.
>
> >> The almost same thing happens with reindexdb and vacuumdb (build on
> >> commit [3]):
> >
> > The Windows fd implementation is somewhat buggy because it does not return
> > the smallest number available, and the code assumed that select() uses a
> > dense array indexed by fd (true on Linux, less so on Windows, which
> > probably uses a sparse array), so the fd number can get over the limit
> > even if fewer are actually used, hence the catch, as you noted.
>
> I agree with you. It looks like the structure fd_set just contains used
> sockets by this application on Windows, and the macro FD_SETSIZE is used
> only here.
>
>  From
> https://docs.microsoft.com/en-us/windows/win32/api/winsock/ns-winsock-fd_set
> :
>
> typedef struct fd_set {
>u_int  fd_count;
>SOCKET fd_array[FD_SETSIZE];
> } fd_set, FD_SET, *PFD_SET, *LPFD_SET;
>
>  From
> https://docs.microsoft.com/en-us/windows/win32/winsock/maximum-number-of-sockets-supported-2
> :
>
> The maximum number of sockets that a Windows Sockets application can use
> is not affected by the manifest constant FD_SETSIZE. This value defined
> in the Winsock2.h header file is used in constructing the FD_SET
> structures used with select function.
>
> >> IIUC the checks below are not correct on Windows, since on this system
> >> sockets can have values equal to or greater than FD_SETSIZE (see
> >> Windows documentation [4] and pgbench debug output in attached
> >> pgbench_debug.txt).
> >
> > Okay.
> >
> > But then, how may one detect that there are too many fds in the set?
> >
> > I think that an earlier version of the code needed to make assumptions
> > about the internal implementation of Windows (there is a counter
> > somewhere in the Windows fd_set struct), which was rejected because it was
> > breaking the interface. Now your patch is basically resurrecting that.
>
> I tried to keep the behaviour "we check if the socket value can be used
> in select() at runtime", but now I will also read that thread...
>
> > Why not, if there is no other solution, but this is quite depressing,
> > and because it breaks the interface it would break if Windows
> > changed its internals for some reason :-(
>
> It looks like if the internals of the structure fd_set are changed, we
> will also have problems with the function pgwin32_select from
> src/backend/port/win32/socket.c, because it uses fd_set.fd_count too?..
>
> (I'm writing responses to the rest of your comments but it takes
> time...)
>
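
For readers following along, the fd_set difference Marina describes above
can be sketched roughly like this (illustration only, not from any patch;
input_mask and sock are placeholders, and reading fd_count is exactly the
interface-peeking Fabien mentions):

#ifdef WIN32
    /*
     * On Windows, fd_set is a counted array of SOCKET values, so the socket
     * value itself may legitimately exceed FD_SETSIZE; what is limited is
     * the number of sockets added to one set.
     */
    if (input_mask.fd_count >= FD_SETSIZE)
    {
        fprintf(stderr, "too many client sockets for select()\n");
        exit(1);
    }
#else
    /*
     * On POSIX systems, fd_set is a bitmask indexed by the descriptor value,
     * so the descriptor itself must be smaller than FD_SETSIZE.
     */
    if (sock >= FD_SETSIZE)
    {
        fprintf(stderr, "socket file descriptor out of range for select()\n");
        exit(1);
    }
#endif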

This patch on Commitfest has been "Waiting on Author" for almost 2
months. Could you share the current status? Are you updating the
patch?

Regards,

-- 
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/




Re: [PATCH] Simple progress reporting for COPY command

2021-01-07 Thread Josef Šimánek
On Thu, 7 Jan 2021 at 14:08, Amit Kapila  wrote:
>
> On Thu, Jan 7, 2021 at 3:15 AM Tomas Vondra
>  wrote:
> >
> > On 1/5/21 11:02 AM, Josef Šimánek wrote:
> > > I'm attaching the whole patch since commitfest failed to ingest the
> > > last incremental on CI.
> > >
> >
> > Yeah, the whole patch needs to be attached for the commitfest tester to
> > work correctly - it can't apply pieces from multiple messages, etc.
> >
> > Anyway, I pushed this last version of patch, after a couple more tweaks,
> > mainly to the docs - one place used pg_stat_copy_progress, the section
> > was not indexed properly, and so on.
> >
>
> How about including a column for command similar to
> pg_stat_progress_create_index and pg_stat_progress_cluster? It seems
> that command will be useful in the context of COPY as there are many
> variants of COPY.
>

From pg_stat_progress_create_index docs:

The command that is running: CREATE INDEX, CREATE INDEX CONCURRENTLY,
REINDEX, or REINDEX CONCURRENTLY.

Which values should be present in copy progress? Just COPY FROM or COPY TO?

> With Regards,
> Amit Kapila.




Re: [PATCH] Simple progress reporting for COPY command

2021-01-07 Thread Amit Kapila
On Thu, Jan 7, 2021 at 3:15 AM Tomas Vondra
 wrote:
>
> On 1/5/21 11:02 AM, Josef Šimánek wrote:
> > I'm attaching the whole patch since commitfest failed to ingest the
> > last incremental on CI.
> >
>
> Yeah, the whole patch needs to be attached for the commitfest tester to
> work correctly - it can't apply pieces from multiple messages, etc.
>
> Anyway, I pushed this last version of patch, after a couple more tweaks,
> mainly to the docs - one place used pg_stat_copy_progress, the section
> was not indexed properly, and so on.
>

How about including a column for command similar to
pg_stat_progress_create_index and pg_stat_progress_cluster? It seems
that command will be useful in the context of COPY as there are many
variants of COPY.

-- 
With Regards,
Amit Kapila.




Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion

2021-01-07 Thread Fujii Masao



On 2021/01/07 17:28, shinya11.k...@nttdata.com wrote:

On Thu, Jan 7, 2021 at 1:30 PM Fujii Masao 
 wrote:


On 2021/01/07 12:42, Masahiko Sawada wrote:

On Thu, Jan 7, 2021 at 10:59 AM Fujii Masao 
 wrote:


On 2021/01/07 10:01, Masahiko Sawada wrote:

On Wed, Jan 6, 2021 at 3:37 PM  wrote:



+#define Query_for_list_of_cursors \
+" SELECT name FROM pg_cursors"\

This query should be the following?

" SELECT pg_catalog.quote_ident(name) "\
" FROM pg_catalog.pg_cursors "\
" WHERE substring(pg_catalog.quote_ident(name),1,%d)='%s'"

+/* CLOSE */
+ else if (Matches("CLOSE"))
+ COMPLETE_WITH_QUERY(Query_for_list_of_cursors
+ " UNION ALL SELECT 'ALL'");

"UNION ALL" should be "UNION"?


Thank you for your advice, and I corrected them.


+ COMPLETE_WITH_QUERY(Query_for_list_of_cursors
+ " UNION SELECT 'ABSOLUTE'"
+ " UNION SELECT 'BACKWARD'"
+ " UNION SELECT 'FORWARD'"
+ " UNION SELECT 'RELATIVE'"
+ " UNION SELECT 'ALL'"
+ " UNION SELECT 'NEXT'"
+ " UNION SELECT 'PRIOR'"
+ " UNION SELECT 'FIRST'"
+ " UNION SELECT 'LAST'"
+ " UNION SELECT 'FROM'"
+ " UNION SELECT 'IN'");

This change makes psql unexpectedly output "FROM" and "IN" just after "FETCH".


I think "FROM" and "IN" can be placed just after "FETCH". According to
the documentation, the direction can be empty. For instance, we can do
like:


Thank you!


I've attached the patch improving the tab completion for DECLARE as an
example. What do you think?

BTW according to the documentation, the options of DECLARE statement
(BINARY, INSENSITIVE, SCROLL, and NO SCROLL) are order-sensitive.

DECLARE name [ BINARY ] [ INSENSITIVE ] [ [ NO ] SCROLL ]
CURSOR [ { WITH | WITHOUT } HOLD ] FOR query

But I realized that these options are actually order-insensitive. For
instance, we can declare a cursor like:

=# declare abc scroll binary cursor for select * from pg_class;
DECLARE CURSOR

The both parser code and documentation has been unchanged from 2003.
Is it a documentation bug?


Thank you for your patch; it is good.
I cannot find the description "(BINARY, INSENSITIVE, SCROLL, and NO SCROLL) are
order-sensitive."
I saw "The key words BINARY, INSENSITIVE, and SCROLL can appear in any order."
in the documentation.


Thanks, you're right. I missed that sentence. I still think the
syntax of the DECLARE statement in the documentation doesn't match its
implementation, but I agree that it's order-insensitive.


I made a new patch, but the amount of code was large due to the order-insensitivity.


Thank you for updating the patch!

Yeah, I'm also a bit afraid that the conditions will exponentially
increase when a new option is added to the DECLARE statement in the
future. Looking at the parser code for the DECLARE statement, we can put
the same options multiple times (that's also why I don't think it
matches). For instance,

postgres(1:44758)=# begin;
BEGIN
postgres(1:44758)=# declare test binary binary binary cursor for
select * from pg_class;
DECLARE CURSOR

So how about simplify the above code as follows?

@@ -3005,8 +3014,23 @@ psql_completion(const char *text, int start, int end)
else if (Matches("DECLARE", MatchAny))
COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL",
"CURSOR");
+ /*
+ * Complete DECLARE  with one of BINARY, INSENSITIVE, SCROLL,
+ * NO SCROLL, and CURSOR. The tail doesn't match any keywords for
+ * DECLARE, assume we want options.
+ */
+ else if (HeadMatches("DECLARE", MatchAny, "*") &&
+ TailMatches(MatchAnyExcept("CURSOR|WITH|WITHOUT|HOLD|FOR")))
+ COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL",
+ "CURSOR");


This change seems to cause "DECLARE ... CURSOR FOR SELECT " to
unexpectedly output BINARY, INSENSITIVE, etc.


Indeed. The following is not complete, but is it much better?


Yes, I think that's better.



@@ -3002,11 +3011,18 @@ psql_completion(const char *text, int start, int end)
" UNION SELECT 'ALL'");

/* DECLARE */
- else if (Matches("DECLARE", MatchAny))
- COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL",
- "CURSOR");
+ /*
+ * Complete DECLARE  with one of BINARY, INSENSITIVE, SCROLL,
+ * NO SCROLL, and CURSOR. If the tail is any one of options, assume we
+ * still want options.
+ */
+ else if (Matches("DECLARE", MatchAny) ||
+ TailMatches("BINARY|INSENSITIVE|SCROLL|NO"))
+ COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL", "CURSOR");


This change seems to cause "DECLARE ... NO " to unexpectedly output
"NO SCROLL". Also this change seems to cause "COPY ... (FORMAT BINARY "
to unexpectedly output BINARY, CURSOR, etc.


Oops, I missed the HeadMatches(). Thank you for pointing this out.

I've attached the updated patch including Kato-san's changes. I
think the tab completion support for DECLARE added by this patch
works better.


Thanks!

+   /* Complete with more options */
+   else if (HeadMatches("DECLARE", MatchAny, "BINARY|INSENSITIVE|SCROLL|NO") 
&&
+TailMatches("BINARY|INSENSITIVE|SCROLL|NO"))

Seems "MatchAny, "BINARY|INSENSITIVE|SCROLL|NO"" is not 

Re: pg_rewind restore_command issue in PG12

2021-01-07 Thread Amine Tengilimoglu
You're welcome Michael!



On Thu, 7 Jan 2021 at 14:54, Michael Paquier wrote:

> On Tue, Jan 05, 2021 at 11:54:42AM +0300, Amine Tengilimoglu wrote:
> > Thank you Michael.  I agree with you. Relevant part can be removed from
> the
> > document and  eliminate the confusion at least.
>
> Okay, I got around this stuff, and committed a fix for 9.6~12.  Thanks
> for the report, Amine!
> --
> Michael
>


Re: Proposal: Global Index

2021-01-07 Thread 曾文旌
I've been following this topic for a long time. It's been a year since the
last response. It was clear that our customers wanted this feature as well,
and a large number of them mentioned it.

So, I wish for the whole feature to mature as soon as possible.
I summarized the scheme mentioned in the email and completed the POC patch
(based on PG_13). Next, I encountered some difficulties when implementing the
DDL of the partition table with global index, and I hope to get some help
from the community.

Here are some details of what has been implemented:

1. Definition of global index
   Using the INCLUDE keyword to include the tableoid of the partitioned table.
2. Maintenance of the global index by partition table DML
   Both INSERT and UPDATE on a partitioned table maintain the global index.
3. Global index scan
   Planner: processes predicate conditions on the primary partition,
   generating paths and plans for the global index.
   Executor: the index scan gets the index tuple, gets the tableoid from it,
   and verifies the visibility of the data in the partition.
4. Vacuum of the partition table maintains the global index
   Each partitioned table's VACUUM cleans its own garbage data in the
   global index.

After the above function points are completed, the global index can be used
without partition table DDL.

Demo:

-- Use pgbench to create the test partition table
pgbench -i -s 1000 --partitions=6 --partition-method=range

-- create global index on bid; bid is not the partition key
CREATE INDEX idx_pgbench_accounts_bid on pgbench_accounts(bid) global;

-- check global index status
select *, sum(alivetup) over() as sum_alivetup, sum(deadtup) over() as sum_deadtup
  from bt_get_global_index_status('idx_pgbench_accounts_bid');
      relname       | alivetup | deadtup | sum_alivetup | sum_deadtup
--------------------+----------+---------+--------------+-------------
 pgbench_accounts_1 |     1667 |       0 |            1 |           0
 pgbench_accounts_2 |     1667 |       0 |            1 |           0
 pgbench_accounts_3 |     1667 |       0 |            1 |           0
 pgbench_accounts_4 |     1667 |       0 |            1 |           0
 pgbench_accounts_5 |     1667 |       0 |            1 |           0
 pgbench_accounts_6 |     1665 |       0 |            1 |           0
(6 rows)

-- run pgbench for a while
pgbench -M prepared -j 32 -c 32 -T 60 -P1

-- check global index; the index has bloated
postgres=# select *, sum(alivetup) over() as sum_alivetup, sum(deadtup) over() as sum_deadtup
             from bt_get_global_index_status('idx_pgbench_accounts_bid');
      relname       | alivetup | deadtup | sum_alivetup | sum_deadtup
--------------------+----------+---------+--------------+-------------
 pgbench_accounts_1 | 16717733 |       0 |    100306102 |           0
 pgbench_accounts_2 | 16717409 |       0 |    100306102 |           0
 pgbench_accounts_3 | 16717540 |       0 |    100306102 |           0
 pgbench_accounts_4 | 16717972 |       0 |    100306102 |           0
 pgbench_accounts_5 | 16717578 |       0 |    100306102 |           0
 pgbench_accounts_6 | 16717870 |       0 |    100306102 |           0
(6 rows)

-- vacuum partition table
vacuum pgbench_accounts;

-- Garbage is collected; the global index still looks correct and valid.
postgres=# select *, sum(alivetup) over() as sum_alivetup, sum(deadtup) over() as sum_deadtup
             from bt_get_global_index_status('idx_pgbench_accounts_bid');
      relname       | alivetup | deadtup | sum_alivetup | sum_deadtup
--------------------+----------+---------+--------------+-------------
 pgbench_accounts_1 |     1667 |       0 |            1 |           0
 pgbench_accounts_2 |     1667 |       0 |            1 |           0
 pgbench_accounts_3 |     1667 |       0 |            1 |           0
 pgbench_accounts_4 |     1667 |       0 |            1 |           0
 pgbench_accounts_5 |     1667 |       0 |            1 |           0
 pgbench_accounts_6 |     1665 |       0 |            1 |           0
(6 rows)

-- global index scan works well
postgres=# select tableoid, count(*) from pgbench_accounts where bid = 834 group by tableoid;
 tableoid | count
----------+-------
    16455 |     5
    16458 |     5
(2 rows)

postgres=# explain select tableoid, count(*) from pgbench_accounts where bid = 834 group by tableoid;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=2945.23..2945.24 rows=1 width=12)
   Group Key: pgbench_accounts.tableoid
   ->  Global Index Scan using idx_pgbench_accounts_bid on pgbench_accounts  (cost=0.50..10.18 rows=587011 width=4)
         Index Cond: (bid = 834)
(4 rows)

The following is how to implement DDL of the global index, i.e. how to
maintain the global index across DDL on the partitioned table. This seems to
be more difficult than the previous work. I understand there are four main
parts:

1. Build global index or reindex, especially in concurrent mode
2. Detach partition
Would it be a good idea to make a flag to global

Re: pg_rewind restore_command issue in PG12

2021-01-07 Thread Michael Paquier
On Tue, Jan 05, 2021 at 11:54:42AM +0300, Amine Tengilimoglu wrote:
> Thank you Michael.  I agree with you. Relevant part can be removed from the
> document and  eliminate the confusion at least.

Okay, I got around this stuff, and committed a fix for 9.6~12.  Thanks
for the report, Amine!
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Simple progress reporting for COPY command

2021-01-07 Thread Josef Šimánek
On Wed, 6 Jan 2021 at 22:44, Tomas Vondra wrote:
>
> On 1/5/21 11:02 AM, Josef Šimánek wrote:
> > I'm attaching the whole patch since commitfest failed to ingest the
> > last incremental on CI.
> >
>
> Yeah, the whole patch needs to be attached for the commitfest tester to
> work correctly - it can't apply pieces from multiple messages, etc.
>
> Anyway, I pushed this last version of patch, after a couple more tweaks,
> mainly to the docs - one place used pg_stat_copy_progress, the section
> was not indexed properly, and so on.
>
> I see Matthias proposed to change "lines" to "tuples" - I only saw the
> message after pushing, but I probably wouldn't make that change anyway.
> The CSV docs seem to talk about lines, newlines etc. so it seems fine.
> If not, we can change that.
>
> One more question, though - I now realize the lines_processed ignores
> rows skipped because of BEFORE INSERT triggers. I wonder if that's the
> right thing to do? Imagine you know the number of lines in a file. You
> can't really use (lines_processed / total_lines) to measure progress,
> because that may ignore many "skipped" rows. So maybe this should be
> changed to count all rows. OTOH we still have bytes_processed.

I think that should be fixed. It is called "lines_processed" not
"lines_inserted". I'll take a look.

>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company




Re: Prevent printing "next step instructions" in initdb and pg_upgrade

2021-01-07 Thread Peter Eisentraut
After pondering this again, I think we can go with initdb 
--no-instructions, as in your patch.


As a minor nitpick, I would leave out the

else
printf(_("\nSuccess.\n"));

in the --no-instructions case.

(I don't know where the pg_upgrade part of this discussion is right now.)

--
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/




Re: Parallel INSERT (INTO ... SELECT ...)

2021-01-07 Thread Amit Kapila
On Wed, Jan 6, 2021 at 2:09 PM Antonin Houska  wrote:
>
> Greg Nancarrow  wrote:
>
> > Posting an updated set of patches to address recent feedback:
>
> Following is my review.
>
> v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch
> --
>
> @@ -342,6 +343,18 @@ standard_planner(Query *parse, const char *query_string, 
> int cursorOptions,
> /* all the cheap tests pass, so scan the query tree */
> glob->maxParallelHazard = max_parallel_hazard(parse);
> glob->parallelModeOK = (glob->maxParallelHazard != 
> PROPARALLEL_UNSAFE);
> +
> +   /*
> +* Additional parallel-mode safety checks are required in 
> order to
> +* allow an underlying parallel query to be used for a
> +* table-modification command that is supported in 
> parallel-mode.
> +*/
> +   if (glob->parallelModeOK &&
> +   IsModifySupportedInParallelMode(parse->commandType))
> +   {
> +   glob->maxParallelHazard = 
> max_parallel_hazard_for_modify(parse, >maxParallelHazard);
> +   glob->parallelModeOK = (glob->maxParallelHazard != 
> PROPARALLEL_UNSAFE);
> +   }
>
> Is it really ok to allow PROPARALLEL_RESTRICTED? Per definition, these
> functions should not be called by parallel worker.
>

What in the above change indicates that parallel-restricted functions will
be allowed in parallel workers? This just sets parallelModeOK to allow
parallel plans for SELECTs if the INSERT can be performed safely in the
leader backend.

>
> @@ -1015,6 +1016,27 @@ IsInParallelMode(void)
>  }
>
>  /*
> + * PrepareParallelMode
> + *
> + * Prepare for entering parallel mode, based on command-type.
> + */
> +void
> +PrepareParallelMode(CmdType commandType)
> +{
> +   Assert(!IsInParallelMode() || force_parallel_mode != 
> FORCE_PARALLEL_OFF);
>
> Isn't the test of force_parallel_mode just a hack to make regression tests
> pass? When I removed this part and ran the regression tests with
> force_parallel_mode=regress, the assertion fired when executing a subquery
> because the executor was already in parallel mode due to the main query
> execution.
>

I think this Assert is bogus. We are allowed to enter parallel mode
if we are already in parallel mode, see EnterParallelMode. But we
shouldn't be allowed to allocate an XID in parallel mode. So the
Assert(!IsInParallelMode()) should be moved inside the check if
(IsModifySupportedInParallelMode(commandType)) in this function. Can
you check if it still fails after such a modification?
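
To make that concrete, a hypothetical sketch of the restructured function
(the pre-allocation calls are illustrative; IsModifySupportedInParallelMode()
is the helper added by the patch):

/* Hypothetical sketch of the suggested restructuring, not the patch itself. */
void
PrepareParallelMode(CmdType commandType)
{
    if (IsModifySupportedInParallelMode(commandType))
    {
        /*
         * XIDs and command IDs cannot be allocated once we are in parallel
         * mode, so insist that we are not there yet and allocate them up
         * front in the leader.
         */
        Assert(!IsInParallelMode());

        (void) GetCurrentTransactionId();
        (void) GetCurrentCommandId(true);
    }
}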

> As an alternative, have you considered allocation of the XID even in parallel
> mode? I imagine that the first parallel worker that needs the XID for
> insertions allocates it and shares it with the other workers as well as with
> the leader process.
>

As far as this patch
(v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch) is concerned,
we never need workers to allocate XIDs because the INSERT is always
performed by the leader backend. Even if we wanted to do what you are
suggesting, it would be tricky because currently we don't have the
infrastructure to pass such information among workers.

> One problem of the current patch version is that the "INSERT INTO ... SELECT
> ..." statement consumes XID even if the SELECT eventually does not return any
> row. However, if the same query is processed w/o parallelism, the XID is only
> allocated if at least one tuple needs to be inserted.
>

Yeah, that is true, but I think this can happen w/o parallelism for
updates and deletes too: by the time we try to modify the row, it may have
been modified by a concurrent session, and the first session will have
needlessly allocated an XID.

-- 
With Regards,
Amit Kapila.




Re: EXPLAIN/EXPLAIN ANALYZE REFRESH MATERIALIZED VIEW

2021-01-07 Thread Bharath Rupireddy
On Mon, Dec 28, 2020 at 5:56 PM Bharath Rupireddy
 wrote:
>
> On Tue, Dec 22, 2020 at 7:01 PM Bharath Rupireddy
>  wrote:
> > Currently, $subject is not allowed. We do plan the mat view query
> > before every refresh. I propose to show the explain/explain analyze of
> > the select part of the mat view in case of Refresh Mat View(RMV). It
> > will be useful for the user to know what exactly is being planned and
> > executed as part of RMV. Please note that we already have
> > explain/explain analyze CTAS/Create Mat View(CMV), where we show the
> > explain/explain analyze of the select part. This proposal will do the
> > same thing.
> >
> > The behaviour can be like this:
> > EXPLAIN REFRESH MATERIALIZED VIEW mv1;   --> will not refresh the mat
> > view, but shows the select part's plan of mat view.
> > EXPLAIN ANALYZE REFRESH MATERIALIZED VIEW mv1;   --> will refresh the
> > mat view and shows the select part's plan of mat view.
> >
> > Thoughts? If okay, I will post a patch later.
>
> Attaching below patches:
>
> 0001 - Rearrange Refresh Mat View Code - Currently, the function
> ExecRefreshMatView in matview.c is having many lines of code which is
> not at all good from readability and maintainability perspectives.
> This patch adds a few functions and moves the code from
> ExecRefreshMatView to them making the code look better.
>
> 0002 - EXPLAIN/EXPLAIN ANALYZE REFRESH MATERIALIZED VIEW support and tests.
>
> If this proposal is useful, I have a few open points - 1) In the patch I
> have added a new mat view info parameter to ExplainOneQuery(), do we
> also need to add it to ExplainOneQuery_hook_type? 2) Do we document
> (under respective command pages or somewhere else) that we allow
> explain/explain analyze for a command?
>
> Thoughts?

Attaching v2 patch set rebased on the latest master f7a1a805cb. And
also added an entry for upcoming commitfest -
https://commitfest.postgresql.org/32/2928/

Please consider the v2 patches for further review.
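
For anyone trying the patches, the intended behaviour described upthread
would look like this (proposed syntax, not yet committed):

-- Proposed syntax; behaviour as described above.
CREATE MATERIALIZED VIEW mv1 AS
    SELECT relkind, count(*) FROM pg_class GROUP BY relkind;

EXPLAIN REFRESH MATERIALIZED VIEW mv1;          -- plans the SELECT, does not refresh
EXPLAIN ANALYZE REFRESH MATERIALIZED VIEW mv1;  -- refreshes and reports the executed plan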

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


v2-0001-Rearrange-Refresh-Mat-View-Code.patch
Description: Binary data


v2-0002-EXPLAIN-EXPLAIN-ANALYZE-REFRESH-MATERIALIZED-VIEW.patch
Description: Binary data


RE: [Patch] Optimize dropping of relation buffers using dlist

2021-01-07 Thread k.jami...@fujitsu.com
On Thu, January 7, 2021 5:36 PM (JST), Amit Kapila wrote:
> 
> On Wed, Jan 6, 2021 at 6:43 PM k.jami...@fujitsu.com
>  wrote:
> >
> > [Results for VACUUM on single relation]
> > Average of 5 runs.
> >
> > 1. % REGRESSION
> > % Regression: (patched - master)/master
> >
> > | rel_size | 128MB  | 1GB| 20GB   | 100GB|
> > |--||||--|
> > | NB/512   | 0.000% | 0.000% | 0.000% | -32.680% |
> > | NB/256   | 0.000% | 0.000% | 0.000% | 0.000%   |
> > | NB/128   | 0.000% | 0.000% | 0.000% | -16.502% |
> > | NB/64| 0.000% | 0.000% | 0.000% | -9.841%  |
> > | NB/32| 0.000% | 0.000% | 0.000% | -6.219%  |
> > | NB/16| 0.000% | 0.000% | 0.000% | 3.323%   |
> > | NB/8 | 0.000% | 0.000% | 0.000% | 8.178%   |
> >
> > For 100GB shared_buffers, we can observe regression
> > beyond NBuffers/32. So with this, we can conclude
> > that NBuffers/32 is the right threshold.
> > For NBuffers/16 and beyond, the patched version performs
> > worse than master. In other words, the cost of finding the
> > buffers to be invalidated gets higher in the optimized path
> > than in the traditional path.
> >
> > So in attached V39 patches, I have updated the threshold
> > BUF_DROP_FULL_SCAN_THRESHOLD to NBuffers/32.
> >
> 
> Thanks for the detailed tests. NBuffers/32 seems like an appropriate
> value for the threshold based on these results. I would like to
> slightly modify part of the commit message in the first patch as below
> [1], otherwise, I am fine with the changes. Unless you or anyone else
> has any more comments, I am planning to push the 0001 and 0002
> sometime next week.
> 
> [1]
> "The recovery path of DropRelFileNodeBuffers() is optimized so that
> scanning of the whole buffer pool can be avoided when the number of
> blocks to be truncated in a relation is below a certain threshold. For
> such cases, we find the buffers by doing lookups in BufMapping table.
> This improves the performance by more than 100 times in many cases
> when several small tables (tested with 1000 relations) are truncated
> and where the server is configured with a large value of shared
> buffers (greater than 100GB)."

Thank you for taking a look at the results of the tests. They are also
consistent with the results from Tang.
The commit message LGTM.
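
For readers following the thread, a hedged sketch of the shape of the
optimization (the helper and variable names here are illustrative; the
macro and the NBuffers/32 threshold are as discussed above):

/* Sketch only; see the actual patch for the complete conditions. */
#define BUF_DROP_FULL_SCAN_THRESHOLD  (uint64) (NBuffers / 32)

if (cached && nBlocksToInvalidate < BUF_DROP_FULL_SCAN_THRESHOLD)
{
    /*
     * Few enough blocks to drop: look each one up in the buffer mapping
     * table instead of scanning every buffer in shared_buffers.
     */
    for (i = 0; i < nforks; i++)
        DropRelFileNodeBuffersByLookup(rnode.node, forkNum[i],
                                       nForkBlock[i], firstDelBlock[i]);
    return;
}

/* Otherwise fall back to the traditional full scan of the buffer pool. */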

Regards,
Kirk Jamison


Re: plpgsql variable assignment with union is broken

2021-01-07 Thread easteregg
To be clear, that was existing code within our codebase; it was used as a very
bad alternative to an if/else clause. Think of it as a ternary operator. That's
just straight up bad code, as Tom said. But even so, it's a change and I wanted
to check whether it was intended. So thank you for your time and consideration!




Re: CheckpointLock needed in CreateCheckPoint()?

2021-01-07 Thread Bharath Rupireddy
On Thu, Jan 7, 2021 at 1:22 PM Amul Sul  wrote:
 > I am not sure I understood your point completely.  Do you mean there could be
> multiple bootstrap or standalone backends existing at the same time that can
> perform a checkpoint?

Actually, my understanding of the standalone backend was wrong
initially. Now I'm clear on that.

> > I don't think we can completely get rid of CheckpointLock in
> > CreateCheckPoint given the fact that it can get called from multiple
> > processes.
> >
>
> How?

When the server is run as a standalone backend, it seems like there
can be only one process running in single-user mode, as pointed out by
Dilip upthread, so there will be no simultaneous calls to
CreateCheckPoint.

> > How about serving that ASRO barrier request alone for the checkpointer
> > process in ProcessInterrupts, even though the CheckpointLock is held by
> > the checkpointer process? And also, while the checkpointing is
> > happening, maybe we should call CHECK_FOR_INTERRUPTS to see if the
> > ASRO barrier has arrived. This may sound a bit hacky, but it's just a
> > thought. I'm thinking that in ProcessInterrupts we can get to know the
> > checkpointer process type via MyAuxProcType, and whether the lock is
> > held or not using CheckpointLock, and then we can handle the ASRO
> > global barrier request.
> >
>
> I am afraid that this will easily get rejected by the community.  Note that 
> this
> is a very crucial code path of the database server. There are other options
> too, but I don't want to propose them until there is clarity on the CheckpointLock
> necessity.

I understand that.

I'm sorry, if I have sidetracked the main point here.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: [Patch] Optimize dropping of relation buffers using dlist

2021-01-07 Thread Amit Kapila
On Wed, Jan 6, 2021 at 6:43 PM k.jami...@fujitsu.com
 wrote:
>
> [Results for VACUUM on single relation]
> Average of 5 runs.
>
> 1. % REGRESSION
> % Regression: (patched - master)/master
>
> | rel_size | 128MB  | 1GB| 20GB   | 100GB|
> |--||||--|
> | NB/512   | 0.000% | 0.000% | 0.000% | -32.680% |
> | NB/256   | 0.000% | 0.000% | 0.000% | 0.000%   |
> | NB/128   | 0.000% | 0.000% | 0.000% | -16.502% |
> | NB/64| 0.000% | 0.000% | 0.000% | -9.841%  |
> | NB/32| 0.000% | 0.000% | 0.000% | -6.219%  |
> | NB/16| 0.000% | 0.000% | 0.000% | 3.323%   |
> | NB/8 | 0.000% | 0.000% | 0.000% | 8.178%   |
>
> For 100GB shared_buffers, we can observe regression
> beyond NBuffers/32. So with this, we can conclude
> that NBuffers/32 is the right threshold.
> For NBuffers/16 and beyond, the patched version performs
> worse than master. In other words, the cost of finding the
> buffers to be invalidated gets higher in the optimized path
> than in the traditional path.
>
> So in attached V39 patches, I have updated the threshold
> BUF_DROP_FULL_SCAN_THRESHOLD to NBuffers/32.
>

Thanks for the detailed tests. NBuffers/32 seems like an appropriate
value for the threshold based on these results. I would like to
slightly modify part of the commit message in the first patch as below
[1], otherwise, I am fine with the changes. Unless you or anyone else
has any more comments, I am planning to push the 0001 and 0002
sometime next week.

[1]
"The recovery path of DropRelFileNodeBuffers() is optimized so that
scanning of the whole buffer pool can be avoided when the number of
blocks to be truncated in a relation is below a certain threshold. For
such cases, we find the buffers by doing lookups in BufMapping table.
This improves the performance by more than 100 times in many cases
when several small tables (tested with 1000 relations) are truncated
and where the server is configured with a large value of shared
buffers (greater than 100GB)."

-- 
With Regards,
Amit Kapila.




RE: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion

2021-01-07 Thread Shinya11.Kato
>On Thu, Jan 7, 2021 at 1:30 PM Fujii Masao 
> wrote:
>>
>> On 2021/01/07 12:42, Masahiko Sawada wrote:
>> > On Thu, Jan 7, 2021 at 10:59 AM Fujii Masao 
>> >  wrote:
>> >>
>> >> On 2021/01/07 10:01, Masahiko Sawada wrote:
>> >>> On Wed, Jan 6, 2021 at 3:37 PM  
>> >>> wrote:
>> 
>> > +#define Query_for_list_of_cursors \
>> > +" SELECT name FROM pg_cursors"\
>> >
>> > This query should be the following?
>> >
>> > " SELECT pg_catalog.quote_ident(name) "\
>> > " FROM pg_catalog.pg_cursors "\
>> > " WHERE substring(pg_catalog.quote_ident(name),1,%d)='%s'"
>> >
>> > +/* CLOSE */
>> > + else if (Matches("CLOSE"))
>> > + COMPLETE_WITH_QUERY(Query_for_list_of_cursors
>> > + " UNION ALL SELECT 'ALL'");
>> >
>> > "UNION ALL" should be "UNION"?
>> 
>>  Thank you for your advice, and I corrected them.
>> 
>> >> + COMPLETE_WITH_QUERY(Query_for_list_of_cursors
>> >> + " UNION SELECT 'ABSOLUTE'"
>> >> + " UNION SELECT 'BACKWARD'"
>> >> + " UNION SELECT 'FORWARD'"
>> >> + " UNION SELECT 'RELATIVE'"
>> >> + " UNION SELECT 'ALL'"
>> >> + " UNION SELECT 'NEXT'"
>> >> + " UNION SELECT 'PRIOR'"
>> >> + " UNION SELECT 'FIRST'"
>> >> + " UNION SELECT 'LAST'"
>> >> + " UNION SELECT 'FROM'"
>> >> + " UNION SELECT 'IN'");
>> >>
>> >> This change makes psql unexpectedly output "FROM" and "IN" just after 
>> >> "FETCH".
>> >
>> > I think "FROM" and "IN" can be placed just after "FETCH". According to
>> > the documentation, the direction can be empty. For instance, we can do
>> > like:
>> 
>>  Thank you!
>> 
>> > I've attached the patch improving the tab completion for DECLARE as an
>> > example. What do you think?
>> >
>> > BTW according to the documentation, the options of DECLARE statement
>> > (BINARY, INSENSITIVE, SCROLL, and NO SCROLL) are order-sensitive.
>> >
>> > DECLARE name [ BINARY ] [ INSENSITIVE ] [ [ NO ] SCROLL ]
>> > CURSOR [ { WITH | WITHOUT } HOLD ] FOR query
>> >
>> > But I realized that these options are actually order-insensitive. For
>> > instance, we can declare a cursor like:
>> >
>> > =# declare abc scroll binary cursor for select * from pg_class;
>> > DECLARE CURSOR
>> >
>> > The both parser code and documentation has been unchanged from 2003.
>> > Is it a documentation bug?
>> 
>>  Thank you for your patch, and it is good.
>>  I cannot find the description "(BINARY, INSENSITIVE, SCROLL, and NO 
>>  SCROLL) are order-sensitive."
>>  I saw "The key words BINARY, INSENSITIVE, and SCROLL can appear in any 
>>  order." , according to the documentation.
>> >>>
>> >>> Thanks, you're right. I missed that sentence. I still think the
>> >>> syntax of the DECLARE statement in the documentation doesn't match its
>> >>> implementation, but I agree that it's order-insensitive.
>> >>>
>>  I made a new patch, but the amount of codes was large due to 
>>  order-insensitive.
>> >>>
>> >>> Thank you for updating the patch!
>> >>>
>> >>> Yeah, I'm also afraid a bit that conditions will exponentially
>> >>> increase when a new option is added to DECLARE statement in the
>> >>> future. Looking at the parser code for DECLARE statement, we can put
>> >>> the same options multiple times (that's also why I don't think it
>> >>> matches). For instance,
>> >>>
>> >>> postgres(1:44758)=# begin;
>> >>> BEGIN
>> >>> postgres(1:44758)=# declare test binary binary binary cursor for
>> >>> select * from pg_class;
>> >>> DECLARE CURSOR
>> >>>
>> >>> So how about simplify the above code as follows?
>> >>>
>> >>> @@ -3005,8 +3014,23 @@ psql_completion(const char *text, int start, int 
>> >>> end)
>> >>> else if (Matches("DECLARE", MatchAny))
>> >>> COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL",
>> >>> "CURSOR");
>> >>> + /*
>> >>> + * Complete DECLARE  with one of BINARY, INSENSITIVE, SCROLL,
>> >>> + * NO SCROLL, and CURSOR. The tail doesn't match any keywords for
>> >>> + * DECLARE, assume we want options.
>> >>> + */
>> >>> + else if (HeadMatches("DECLARE", MatchAny, "*") &&
>> >>> + TailMatches(MatchAnyExcept("CURSOR|WITH|WITHOUT|HOLD|FOR")))
>> >>> + COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL",
>> >>> + "CURSOR");
>> >>
>> >> This change seems to cause "DECLARE ... CURSOR FOR SELECT " to
>> >> unexpectedly output BINARY, INSENSITIVE, etc.
>> >
>> > Indeed. Is the following not complete but much better?
>>
>> Yes, I think that's better.
>>
>> >
>> > @@ -3002,11 +3011,18 @@ psql_completion(const char *text, int start, int 
>> > end)
>> > " UNION SELECT 'ALL'");
>> >
>> > /* DECLARE */
>> > - else if (Matches("DECLARE", MatchAny))
>> > - COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL",
>> > - "CURSOR");
>> > + /*
>> > + * Complete DECLARE  with one of BINARY, INSENSITIVE, SCROLL,
>> > + * NO SCROLL, and CURSOR. If the tail is any one 

Re: [HACKERS] [PATCH] Generic type subscripting

2021-01-07 Thread Pavel Stehule
On Thu, 7 Jan 2021 at 9:15, Dmitry Dolgov <9erthali...@gmail.com> wrote:

> > On Wed, Jan 06, 2021 at 09:22:53PM +0100, Pavel Stehule wrote:
> >
> > this case should to raise exception - the value should be changed or
> error
> > should be raised
> >
> > postgres=# insert into foo values('{}');
> > postgres=# update foo set a['a'] = '100';
> > postgres=# update foo set a['a'][1] = '-1';
> > postgres=# select * from foo;
> > ┌┐
> > │ a  │
> > ╞╡
> > │ {"a": 100} │
> > └┘
>
> I was expecting this question, as I've left this like that intentionally
> because of two reasons:
>
> * Opposite to other changes, to implement this one we need to introduce
>   a condition more interfering with normal processing, which raises
>   performance issues for already existing functionality in jsonb_set.
>
> * I vaguely recall there was a similar discussion about jsonb_set with
>   the similar solution.
>

ok.

In this case I have a strong opinion that the current behavior is wrong. It can
mask errors. There are two correct possibilities:

1. raise an error, because the update tries to apply an array index to a scalar value

2. replace the scalar with an array: a = {NULL, -1}

But silently ignoring the assignment is not an acceptable option.

What do people think about it?
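
To make the two possibilities concrete (outcomes sketched for illustration
only):

-- Starting point, as in the example above:
INSERT INTO foo VALUES ('{}');
UPDATE foo SET a['a'] = '100';

UPDATE foo SET a['a'][1] = '-1';
-- current behaviour: the assignment is silently ignored, a stays {"a": 100}
-- option 1: raise an ERROR, because an array index is applied to a scalar
-- option 2: replace the scalar with an array, e.g. {"a": [null, -1]}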




> For the references what I mean I've attached the third patch, which does
> this. My opinion would be to not consider it, but I'm fine leaving this
> decision to committer.
>


RE: Enhance traceability of wal_level changes for backup management

2021-01-07 Thread osumi.takami...@fujitsu.com
Hi Stephen


Thank you so much for replying!
On Thursday, January 7, 2021 2:40 AM Stephen Frost  wrote:
> * osumi.takami...@fujitsu.com (osumi.takami...@fujitsu.com) wrote:
> > You said
> > > The use case I imagined is that the user temporarily changes
> > > wal_level to 'none' from 'replica' or 'logical' to speed up loading
> > > and changes back to the normal. In this case, the backups taken
> > > before the wal_level change cannot be used to restore the database
> > > to the point after the wal_level change. So I think backup
> > > management tools would want to recognize the time or LSN when/where
> > > wal_level is changed to ‘none’ in order to do some actions such as
> > > invalidating older backups, re-calculating backup redundancy etc.
> > > Actually the same is true when the user changes to ‘minimal’. The
> > > tools would need to recognize the time or LSN in this case too.
> > > Since temporarily changing wal_level has been an uncommon use case
> > > some tools perhaps are not aware of that yet. But since introducing
> > > wal_level = ’none’ could make the change common, I think we need to
> > > provide a way for the tools.
> 
> I continue to be against the idea of introducing another wal_level.  If
> there's additional things we can do to reduce WAL traffic while we continue
> to use it to keep the system in a consistent state then we should implement
> those for the 'minimal' case and be done with it.
The new wal_level=none patch clearly achieves something that cannot be done
with, or implemented on top of, wal_level='minimal'.
Did you have a look at the performance evaluation results I posted in [1]?
They showed that data loading with 'none' is much faster than with 'minimal'.


> Adding another wal_level is just going to be confusing and increase
> complexity, and the chances that someone will end up in a bad state.
Even if we committed the other idea, that is "ALTER TABLE tbl SET
UNLOGGED/LOGGED without copying relation data", the complexity would not
change: the user still has to take a full backup before bulk data loading,
and if any accident happens while WAL logging is skipped for a specific
table, the user has to start over from that backup again. This looks the
same to me.
Additionally, the patch itself in that thread is big and more complicated.
Is the complexity you mean the wal_level's impact on backup management
tools, or something else?

> > I wondered, couldn't backup management tools utilize the information
> > in the backup that is needed to be made when wal_level is changed to "none"
> > for example?
> 
> What information is that, exactly?  If there's a way to detect that the
> wal_level has been flipped to 'minimal' and then back to 'replica', other
> than scanning the WAL, we'd certainly like to hear of it, so we can
> implement logic in pgbackrest to detect that happening.  I'll admit that
> I've not gone hunting for such of late, but I don't recall seeing anything
> that would change that either.
Excuse me for the confusion.
I was thinking of the control file in the backup as that information.

I'm not familiar with the internals of those backup management tools, but do
they monitor the control file and its values on the running server at short
intervals?
If they don't, and if we want an accurate time or LSN indicating a wal_level
change, I thought we could pick up the exact information from the control
file of the cluster directory, or of its backup, while the server is stopped
(because changing wal_level requires a server restart).
That's all. Sorry for the noise.

> The idea proposed above about having the LSN and time recorded when a
> wal_level change to minimal happens, presumably in pg_control, seems like it
> could work as a way to allow external tools to more easily figure out if the
> wal_level's been changed to minimal.  Perhaps there's a reason to track
> changes between replica and logical but I can't think of any offhand and
> backup tools would still need to know if the wal level was set to *minimal*
> independently of changes between replica and logical.
Here, we are probably working from different assumptions.
What you say makes sense if we commit neither the patch for
ALTER TABLE SET UNLOGGED/LOGGED without copy (under wal_level=minimal)
nor the patch for the new wal_level 'none', doesn't it?
We were talking about a case where fast bulk data loading is needed:
the user temporarily changes wal_level from a higher level to 'none' or
'minimal' to speed up the loading and then changes it back to the higher
level again.
After this operation, we need to invalidate the old backups taken before the
data loading.
In this case, we have to track the change to 'minimal'. Doesn't that make sense?

> Then again, once we add support for scanning the WAL to pgbackrest, we'll
> almost certainly track it that way since that'll work for older and released
> versions of PG, so I'm not really sure it's worth it to add this to pg_control
> unless there's other reasons to.
> 
> > As I said 

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-07 Thread Bharath Rupireddy
On Thu, Jan 7, 2021 at 9:49 AM Fujii Masao  wrote:
> On 2021/01/05 16:56, Bharath Rupireddy wrote:
> > Attaching v8 patch set. Hopefully, cf bot will be happy with v8.
> >
> > Please consider the v8 patch set for further review.
> -DATA = postgres_fdw--1.0.sql
> +DATA = postgres_fdw--1.1.sql postgres_fdw--1.0--1.1.sql
>
> Shouldn't we leave 1.0.sql as it is and create 1.0--1.1.sql so that
> we can run the followings?
>
>  CREATE EXTENSION postgres_fdw VERSION "1.0";
>  ALTER EXTENSION postgres_fdw UPDATE TO "1.1";

Yes, we can. In that case, to use the new functions users have to update
postgres_fdw to 1.1. Do we need to mention in the documentation that, to
make use of the new functions, users should update postgres_fdw to version
1.1?

With the above change, the contents of postgres_fdw--1.0.sql remain as
is and in postgres_fdw--1.0--1.1.sql we will have only the new
function statements:

/* contrib/postgres_fdw/postgres_fdw--1.0--1.1.sql */

-- complain if script is sourced in psql, rather than via ALTER EXTENSION
\echo Use "ALTER EXTENSION postgres_fdw UPDATE TO '1.1'" to load this
file. \quit

CREATE FUNCTION postgres_fdw_get_connections ()
RETURNS text[]
AS 'MODULE_PATHNAME','postgres_fdw_get_connections'
LANGUAGE C STRICT PARALLEL RESTRICTED;

CREATE FUNCTION postgres_fdw_disconnect ()
RETURNS bool
AS 'MODULE_PATHNAME','postgres_fdw_disconnect'
LANGUAGE C STRICT PARALLEL RESTRICTED;

CREATE FUNCTION postgres_fdw_disconnect (text)
RETURNS bool
AS 'MODULE_PATHNAME','postgres_fdw_disconnect'
LANGUAGE C STRICT PARALLEL RESTRICTED;
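
For what it's worth, a minimal sketch of that upgrade path from psql,
assuming the 1.1 script above ships with the module (the server name passed
to postgres_fdw_disconnect is a placeholder):

CREATE EXTENSION postgres_fdw VERSION '1.0';
ALTER EXTENSION postgres_fdw UPDATE TO '1.1';
-- after the update the new functions become available
SELECT postgres_fdw_get_connections();
SELECT postgres_fdw_disconnect('loopback');  -- 'loopback' is a placeholder server name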

> +
> +  Functions
>
> The document format for functions should be consistent with
> that in other contrib module like pgstattuple?

pgstattuple has so many output columns that it needs a table listing all of
them and their types. The new functions introduced here have at most one
input and one output, so I don't think we need a table listing the input and
output names and types.

IMO, we can have something similar to what pg_visibility_map has for its
functions, plus an example for each function showing how it can be used.
Thoughts?

> +   When called in the local session, it returns an array with each element
> +   as a pair of the foreign server names of all the open connections that
> +   are previously made to the foreign servers and true or
> +   false to show whether or not the connection is valid.
>
> We thought that the information about whether the connection is valid or
> not was useful to, for example, identify and close explicitly the long-living
> invalid connections because they were useless. But thanks to the recent
> bug fix for connection leak issue, that information would be no longer
> so helpful for us? False is returned only when the connection is used in
> this local transaction but it's marked as invalidated. In this case that
> connection cannot be explicitly closed because it's used in this transaction.
> It will be closed at the end of transaction. Thought?

Yes, a connection's validity can be false only when the connection gets
invalidated and postgres_fdw_get_connections is called within an explicit
txn, i.e. begin; ... commit;. In an implicit txn, since the invalidated
connections get closed either during the invalidation callback or at the
end of the txn, postgres_fdw_get_connections will always show valid
connections. Having said that, I still feel we need the true/false
valid/invalid flag in the output of postgres_fdw_get_connections, otherwise
we would miss giving connection validity information to the user in the
very narrow use case of an explicit txn. If required, we can issue a warning
whenever we see an invalid connection, saying "invalid connections are
discarded at the end of the remote transaction". Thoughts?
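
To make that narrow case concrete, here is a sketch; the foreign server name
(loopback), the foreign table (foreign_tbl), and the idea that the ALTER
SERVER invalidates the cached entry while the open transaction keeps it
pinned are assumptions for illustration:

BEGIN;
SELECT count(*) FROM foreign_tbl;                      -- caches a connection to server 'loopback'
ALTER SERVER loopback OPTIONS (ADD fetch_size '101');  -- marks the cached entry invalidated
SELECT postgres_fdw_get_connections();                 -- 'loopback' would be reported as invalid here
COMMIT;                                                -- the invalidated connection is discarded at txn end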

> I guess that you made postgres_fdw_get_connections() return the array
> because the similar function dblink_get_connections() does that. But
> isn't it more convenient to make that return the set of records?

Yes, for postgres_fdw_get_connections we can return a set of records
of (server_name, valid). To do so, I can refer to dblink_get_pkey.
Thoughts?
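
For reference, a hypothetical shape for such a set-returning variant in
postgres_fdw--1.0--1.1.sql; the OUT parameter names and types are my
assumptions, not a settled interface:

CREATE FUNCTION postgres_fdw_get_connections (OUT server_name text, OUT valid boolean)
RETURNS SETOF record
AS 'MODULE_PATHNAME', 'postgres_fdw_get_connections'
LANGUAGE C STRICT PARALLEL RESTRICTED;
-- (this would replace the text[] version shown earlier)

-- which would allow, for example:
-- SELECT * FROM postgres_fdw_get_connections() WHERE NOT valid;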

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [PATCH] Generic type subscripting

2021-01-07 Thread Dmitry Dolgov
> On Wed, Jan 06, 2021 at 09:22:53PM +0100, Pavel Stehule wrote:
>
> this case should raise an exception - the value should be changed or an
> error should be raised
>
> postgres=# insert into foo values('{}');
> postgres=# update foo set a['a'] = '100';
> postgres=# update foo set a['a'][1] = '-1';
> postgres=# select * from foo;
> ┌────────────┐
> │     a      │
> ╞════════════╡
> │ {"a": 100} │
> └────────────┘

I was expecting this question, as I've left it like that intentionally,
for two reasons:

* Unlike the other changes, implementing this one requires introducing
  a condition that interferes more with normal processing, which raises
  performance issues for already existing functionality in jsonb_set.

* I vaguely recall there was a similar discussion about jsonb_set, with
  a similar solution.

For reference, to show what I mean, I've attached a third patch which does
this. My opinion would be to not consider it, but I'm fine leaving this
decision to the committer.
From c9143a620497dac5615c4de1d9349684e9af95b5 Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthali...@gmail.com>
Date: Fri, 18 Dec 2020 17:19:51 +0100
Subject: [PATCH v44 1/3] Subscripting for jsonb

Subscripting implementation for jsonb. It does not support slices, does not
limit the number of subscripts, and for assignment expects the replacement
value to be of jsonb type. There is also one functional difference from
jsonb_set in assignment via subscripting: when the original jsonb container
is NULL, subscripting replaces it with an empty jsonb and proceeds with the
assignment.

For the sake of code reuse, some parts of the jsonb functionality were
rearranged to allow the same functions to be used for jsonb_set and the
subscripting assignment operation.

The original idea belongs to Oleg Bartunov.

Reviewed-by: Tom Lane, Arthur Zakirov, Pavel Stehule
---
 doc/src/sgml/json.sgml  |  48 
 src/backend/utils/adt/Makefile  |   1 +
 src/backend/utils/adt/jsonb_util.c  |  76 -
 src/backend/utils/adt/jsonbsubs.c   | 413 
 src/backend/utils/adt/jsonfuncs.c   | 180 ++--
 src/include/catalog/pg_proc.dat |   4 +
 src/include/catalog/pg_type.dat |   3 +-
 src/include/utils/jsonb.h   |   6 +-
 src/test/regress/expected/jsonb.out | 272 +-
 src/test/regress/sql/jsonb.sql  |  84 +-
 10 files changed, 982 insertions(+), 105 deletions(-)
 create mode 100644 src/backend/utils/adt/jsonbsubs.c

diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml
index 5b9a5557a4..100d1a60f4 100644
--- a/doc/src/sgml/json.sgml
+++ b/doc/src/sgml/json.sgml
@@ -602,6 +602,54 @@ SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"tags": ["qu
   
  
 
+ 
+  jsonb Subscripting
+  
+   jsonb data type supports array-style subscripting expressions
+   to extract or update particular elements. It's possible to use multiple
+   subscripting expressions to extract nested values. In this case, a chain of
+   subscripting expressions follows the same rules as the
+   path argument in jsonb_set function,
+   e.g. in case of arrays it is a 0-based operation or that negative integers
+   that appear in path count from the end of JSON arrays.
+   The result of subscripting expressions is always jsonb data type. An
+   example of subscripting syntax:
+
+-- Extract value by key
+SELECT ('{"a": 1}'::jsonb)['a'];
+
+-- Extract nested value by key path
+SELECT ('{"a": {"b": {"c": 1}}}'::jsonb)['a']['b']['c'];
+
+-- Extract element by index
+SELECT ('[1, "2", null]'::jsonb)[1];
+
+-- Update value by key, note the single quotes - the assigned value
+-- needs to be of jsonb type as well
+UPDATE table_name SET jsonb_field['key'] = '1';
+
+-- Select records using where clause with subscripting. Since the result of
+-- subscripting is jsonb and we basically want to compare two jsonb objects, we
+-- need to put the value in double quotes to be able to convert it to jsonb.
+SELECT * FROM table_name WHERE jsonb_field['key'] = '"value"';
+
+
+  Subscripting for jsonb does not support slice expressions,
+  even if it contains an array.
+
+  In case if source jsonb is NULL, assignment
+  via subscripting will proceed as if it was an empty JSON object:
+
+-- If jsonb_field here is NULL, the result is {"a": 1}
+UPDATE table_name SET jsonb_field['a'] = '1';
+
+-- If jsonb_field here is NULL, the result is [1]
+UPDATE table_name SET jsonb_field[0] = '1';
+
+
+  
+ 
+
  
   Transforms
 
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 82732146d3..279ff15ade 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -50,6 +50,7 @@ OBJS = \
 	jsonb_op.o \
 	jsonb_util.o \
 	jsonfuncs.o \
+	jsonbsubs.o \
 	jsonpath.o \
 	jsonpath_exec.o \
 	jsonpath_gram.o \
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 4eeffa1424..41a1c1f9bb 100644
--- a/src/backend/utils/adt/jsonb_util.c
+++ b/src/backend/utils/adt/jsonb_util.c
@@ 

Re: Single transaction in the tablesync worker?

2021-01-07 Thread Peter Smith
On Wed, Jan 6, 2021 at 2:07 PM Amit Kapila  wrote:
>

> Okay, if we want to go that way then we should add some documentation
> about it. Currently, the slot name used by apply worker is known to
> the user because either it is specified by the user or the default is
> subscription name, so the user can manually remove that slot later but
> that is not true for tablesync slots. I think we need to update both
> the Drop Subscription page [1] and logical-replication-subscription
> page [2] where we have mentioned temporary slots and in the end "Here
> are some scenarios: .." to mention about these slots and probably how
> their names are generated so that in such special situations users can
> drop them manually.
>
> [1] - https://www.postgresql.org/docs/devel/sql-dropsubscription.html
> [2] - 
> https://www.postgresql.org/docs/devel/logical-replication-subscription.html
>

PG docs updated in latest patch [v12]


[v12] = 
https://www.postgresql.org/message-id/CAHut%2BPsonJzarxSBWkOM%3DMjoEpaq53ShBJoTT9LHJskwP3OvZA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Single transaction in the tablesync worker?

2021-01-07 Thread Peter Smith
On Tue, Jan 5, 2021 at 10:41 PM Amit Kapila  wrote:
>
> > > 3.
> > > +#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */
> > >
> > > You can mention in the comments that sublsn will be NULL for this
> > > state as it is mentioned for other similar states. Can we think of
> > > using any letter in lower case for this as all other states are in
> > > lower-case except for this which makes it a look bit odd? We can use
> > > 'f' or 'e' and describe it as 'copy finished' or 'copy end'. I am fine
> > > if you have any better ideas.
> > >
> >
> > Fixed in latest patch [v11]
> >
>
> It is still not reflected in the docs. See below:
> --- a/doc/src/sgml/catalogs.sgml
> +++ b/doc/src/sgml/catalogs.sgml
> @@ -7651,6 +7651,7 @@ SCRAM-SHA-256$iteration count:
> State code:
> i = initialize,
> d = data is being copied,
> +   C = table data has been copied,
> s = synchronized,
>

Fixed in latest patch [v12]


[v12] = 
https://www.postgresql.org/message-id/CAHut%2BPsonJzarxSBWkOM%3DMjoEpaq53ShBJoTT9LHJskwP3OvZA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Single transaction in the tablesync worker?

2021-01-07 Thread Peter Smith
On Mon, Jan 4, 2021 at 8:06 PM Amit Kapila  wrote:
>
> On Wed, Dec 30, 2020 at 11:51 AM Peter Smith  wrote:
> >
> > On Wed, Dec 23, 2020 at 8:43 PM Amit Kapila  wrote:
> > >
> > > 1.
> > > + * Rarely, the DropSubscription may be issued when a tablesync still
> > > + * is in SYNCDONE but not yet in READY state. If this happens then
> > > + * the drop slot could fail because it is already dropped.
> > > + * In this case suppress and drop slot error.
> > > + *
> > > + * FIXME - Is there a better way than this?
> > > + */
> > > + if (rstate->state != SUBREL_STATE_SYNCDONE)
> > > + PG_RE_THROW();
> > >
> > > So, does this situation happen when we try to drop a subscription after
> > > the state is changed to syncdone but not syncready? If so, then can't
> > > we write a function GetSubscriptionNotDoneRelations similar to
> > > GetSubscriptionNotReadyRelations where we get a list of relations that
> > > are not in done stage. I think this should be safe because once we are
> > > here we shouldn't be allowed to start a new worker and old workers are
> > > already stopped by this function.
> >
> > Yes, but I don't see how adding such a function is an improvement over
> > the existing code:
> >
>
> The advantage is that we don't need to use try..catch to deal with
> such conditions which I don't think is a good way to deal with such
> cases. Also, even after using try...catch, still, we can leak the
> slots because the patch drops the slot after changing the state to
> syncdone and if there is any error while dropping the slot, it simply
> skips it. So, it is possible that the rel state is syncdone but the
> slot still exists and we get an error due to some different reason,
> and then we will silently skip it again and allow the subscription to
> be dropped.
>
> I think instead what we should do is to drop the slot before we change
> the rel state to syncdone. Also, if the apply workers fail to drop the
> slot, it should try to drop it again after restart. In
> DropSubscription, we can then check whether the rel state is not SYNC or
> READY, and if so drop the corresponding slots.
>

Fixed as suggested in latest patch [v12]



[v12] = 
https://www.postgresql.org/message-id/CAHut%2BPsonJzarxSBWkOM%3DMjoEpaq53ShBJoTT9LHJskwP3OvZA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



