Re: Don't wake up to check trigger file if none is configured

2019-01-30 Thread Michael Paquier
On Wed, Dec 26, 2018 at 03:24:01PM +0900, Michael Paquier wrote:
> It seems to me that the comment on top of WaitLatch should be clearer
> about that, and that the current state leads to confusion.  Another
> thing coming to my mind is that I think it would be useful to make the
> timeout configurable so as instances can react more quickly in the
> case of a sudden death of the WAL receiver (or to check faster for a
> trigger file if the HA application is too lazy to send a signal to the
> standby host).
> 
> Attached is a patch to improve the comment for now.
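
(To illustrate the mechanism being discussed: the startup process sleeps on
its latch with a hard-coded timeout so that it can periodically re-check for
a trigger file.  The following is a rough, simplified sketch, not the actual
xlog.c code; names are simplified and the wait-event argument is left at 0.)

    for (;;)
    {
        if (CheckForStandbyTrigger())
            break;
        /* wake up at the fixed interval whose value is being discussed */
        (void) WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT, 5000L, 0);
        ResetLatch(MyLatch);
    }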

So, does somebody have an objection if I apply the comment patch?  Per
the reasons above, the proposed patch is not correct, but the code can
be more descriptive.
--
Michael




Re: Cache lookup errors with functions manipulation object addresses

2019-01-30 Thread Michael Paquier
On Tue, Dec 25, 2018 at 01:34:38PM +0900, Michael Paquier wrote:
> With that all the regression tests show a consistent behavior for all
> object types when those are undefined.  Thoughts?

End-of-commit-fest rebase and moved to next CF.
--
Michael
From 6514d737cef100ea8146b1d6d10fa93e40c6ee9c Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 25 Dec 2018 12:21:24 +0900
Subject: [PATCH 1/3] Add flag to format_type_extended to enforce NULL-ness

If a scanned type is undefined, the type format routines have two behaviors
depending on whether FORMAT_TYPE_ALLOW_INVALID is passed by the caller:
- Generate an error
- Return an undefined type name, "???" or "-".

The current interface is unhelpful for callers that want to format a type
name properly but still make sure that the type is defined, as there could
be actual types matching the undefined strings.  To address that, add a new
flag called FORMAT_TYPE_FORCE_NULL which returns a NULL result instead of
"???" or "-", without generating an error.  This will be used by future
patches to improve the SQL interface for object addresses.
---
 src/backend/utils/adt/format_type.c | 20 
 src/include/utils/builtins.h|  1 +
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/adt/format_type.c b/src/backend/utils/adt/format_type.c
index 84a1b3c72b..6fc8f55668 100644
--- a/src/backend/utils/adt/format_type.c
+++ b/src/backend/utils/adt/format_type.c
@@ -98,6 +98,9 @@ format_type(PG_FUNCTION_ARGS)
  * - FORMAT_TYPE_ALLOW_INVALID
  *			if the type OID is invalid or unknown, return ??? or such instead
  *			of failing
+ * - FORMAT_TYPE_FORCE_NULL
+ *			if the type OID is invalid or unknown, return NULL instead of ???
+ *			or such
  * - FORMAT_TYPE_FORCE_QUALIFY
  *			always schema-qualify type names, regardless of search_path
  *
@@ -116,13 +119,20 @@ format_type_extended(Oid type_oid, int32 typemod, bits16 flags)
 	char	   *buf;
 	bool		with_typemod;
 
-	if (type_oid == InvalidOid && (flags & FORMAT_TYPE_ALLOW_INVALID) != 0)
-		return pstrdup("-");
+	if (type_oid == InvalidOid)
+	{
+		if ((flags & FORMAT_TYPE_FORCE_NULL) != 0)
+			return NULL;
+		else if ((flags & FORMAT_TYPE_ALLOW_INVALID) != 0)
+			return pstrdup("-");
+	}
 
 	tuple = SearchSysCache1(TYPEOID, ObjectIdGetDatum(type_oid));
 	if (!HeapTupleIsValid(tuple))
 	{
-		if ((flags & FORMAT_TYPE_ALLOW_INVALID) != 0)
+		if ((flags & FORMAT_TYPE_FORCE_NULL) != 0)
+			return NULL;
+		else if ((flags & FORMAT_TYPE_ALLOW_INVALID) != 0)
 			return pstrdup("???");
 		else
 			elog(ERROR, "cache lookup failed for type %u", type_oid);
@@ -145,7 +155,9 @@ format_type_extended(Oid type_oid, int32 typemod, bits16 flags)
 		tuple = SearchSysCache1(TYPEOID, ObjectIdGetDatum(array_base_type));
 		if (!HeapTupleIsValid(tuple))
 		{
-			if ((flags & FORMAT_TYPE_ALLOW_INVALID) != 0)
+			if ((flags & FORMAT_TYPE_FORCE_NULL) != 0)
+return NULL;
+			else if ((flags & FORMAT_TYPE_ALLOW_INVALID) != 0)
 return pstrdup("???[]");
 			else
 elog(ERROR, "cache lookup failed for type %u", type_oid);
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 03d0ee2766..43e7ef471c 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -109,6 +109,7 @@ extern Datum numeric_float8_no_overflow(PG_FUNCTION_ARGS);
 #define FORMAT_TYPE_TYPEMOD_GIVEN	0x01	/* typemod defined by caller */
 #define FORMAT_TYPE_ALLOW_INVALID	0x02	/* allow invalid types */
 #define FORMAT_TYPE_FORCE_QUALIFY	0x04	/* force qualification of type */
+#define FORMAT_TYPE_FORCE_NULL		0x08	/* force NULL if undefined */
 extern char *format_type_extended(Oid type_oid, int32 typemod, bits16 flags);
 
 extern char *format_type_be(Oid type_oid);
-- 
2.20.1
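
To illustrate the intended caller behavior described in the commit message
above, here is a minimal sketch (hypothetical caller code, not part of the
patch; the values/nulls arrays are just placeholders):

    char   *typname;

    /* With FORMAT_TYPE_FORCE_NULL, an undefined type yields NULL, not an error */
    typname = format_type_extended(type_oid, -1, FORMAT_TYPE_FORCE_NULL);
    if (typname == NULL)
        nulls[0] = true;        /* expose the undefined type as SQL NULL */
    else
        values[0] = CStringGetTextDatum(typname);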

From 3eaf6833c367549b9309b9df7ef7d56f8e596cc3 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 25 Dec 2018 13:13:28 +0900
Subject: [PATCH 2/3] Refactor format procedure and operator APIs to be more
 modular

This introduces a new set of extended routines for procedure and operator
lookups, with a flags bitmask argument that can modify the default
behavior:
- Force schema qualification
- Force NULL as result instead of a numeric OID for an undefined
object.
---
 src/backend/utils/adt/regproc.c | 61 +++--
 src/include/utils/regproc.h | 10 ++
 2 files changed, 52 insertions(+), 19 deletions(-)

diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index 09cf0e1abc..f85b47821a 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -40,8 +40,6 @@
 #include "utils/regproc.h"
 #include "utils/varlena.h"
 
-static char *format_operator_internal(Oid operator_oid, bool force_qualify);
-static char *format_procedure_internal(Oid procedure_oid, bool force_qualify);
 static void parseNameAndArgTypes(const char *string, bool allowNone,
 	 List **names, int *nargs, Oid *argtypes);
 
@@ -322,24 +320,27 @@ 

Re: Add pg_partition_root to get top-most parent of a partition tree

2019-01-30 Thread Michael Paquier
On Sat, Dec 15, 2018 at 10:16:08AM +0900, Michael Paquier wrote:
> Changed in this sense.  Please find attached an updated patch.

Rebased as per the attached, and moved to next CF.
--
Michael
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4930ec17f6..86ff4e5c9e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -20274,6 +20274,17 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 their partitions, and so on.

   
+  
+   
+pg_partition_root
+pg_partition_root(regclass)
+   
+   regclass
+   
+Return the top-most parent of a partition tree to which the given
+relation belongs.
+   
+  
  
 

diff --git a/src/backend/utils/adt/partitionfuncs.c b/src/backend/utils/adt/partitionfuncs.c
index 5cdf4a4524..6d731493d9 100644
--- a/src/backend/utils/adt/partitionfuncs.c
+++ b/src/backend/utils/adt/partitionfuncs.c
@@ -25,6 +25,34 @@
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
 
+/*
+ * Perform several checks on a relation on which is extracted some
+ * information related to its partition tree.  Returns false if the
+ * relation cannot be processed, in which case it is up to the caller
+ * to decide what to do, by either raising an error or doing something
+ * else.
+ */
+static bool
+check_rel_can_be_partition(Oid relid)
+{
+	char		relkind;
+
+	/* Check if relation exists */
+	if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(relid)))
+		return false;
+
+	relkind = get_rel_relkind(relid);
+
+	/* Only allow relation types that can appear in partition trees. */
+	if (relkind != RELKIND_RELATION &&
+		relkind != RELKIND_FOREIGN_TABLE &&
+		relkind != RELKIND_INDEX &&
+		relkind != RELKIND_PARTITIONED_TABLE &&
+		relkind != RELKIND_PARTITIONED_INDEX)
+		return false;
+
+	return true;
+}
 
 /*
  * pg_partition_tree
@@ -39,19 +67,10 @@ pg_partition_tree(PG_FUNCTION_ARGS)
 {
 #define PG_PARTITION_TREE_COLS	4
 	Oid			rootrelid = PG_GETARG_OID(0);
-	char		relkind = get_rel_relkind(rootrelid);
 	FuncCallContext *funcctx;
 	ListCell  **next;
 
-	if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(rootrelid)))
-		PG_RETURN_NULL();
-
-	/* Return NULL for relation types that cannot appear in partition trees */
-	if (relkind != RELKIND_RELATION &&
-		relkind != RELKIND_FOREIGN_TABLE &&
-		relkind != RELKIND_INDEX &&
-		relkind != RELKIND_PARTITIONED_TABLE &&
-		relkind != RELKIND_PARTITIONED_INDEX)
+	if (!check_rel_can_be_partition(rootrelid))
 		PG_RETURN_NULL();
 
 	/* stuff done only on the first call of the function */
@@ -153,3 +172,39 @@ pg_partition_tree(PG_FUNCTION_ARGS)
 	/* done when there are no more elements left */
 	SRF_RETURN_DONE(funcctx);
 }
+
+/*
+ * pg_partition_root
+ *
+ * For the given relation part of a partition tree, return its top-most
+ * root parent.
+ */
+Datum
+pg_partition_root(PG_FUNCTION_ARGS)
+{
+	Oid		relid = PG_GETARG_OID(0);
+	Oid		rootrelid;
+	List   *ancestors;
+
+	if (!check_rel_can_be_partition(relid))
+		PG_RETURN_NULL();
+
+	/*
+	 * If the relation is not a partition (it may be the partition parent),
+	 * return itself as a result.
+	 */
+	if (!get_rel_relispartition(relid))
+		PG_RETURN_OID(relid);
+
+	/* Fetch the top-most parent */
+	ancestors = get_partition_ancestors(relid);
+	rootrelid = llast_oid(ancestors);
+	list_free(ancestors);
+
+	/*
+	 * "rootrelid" must contain a valid OID, given that the input relation
+	 * is a valid partition tree member as checked above.
+	 */
+	Assert(OidIsValid(rootrelid));
+	PG_RETURN_OID(rootrelid);
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 3ecc2e12c3..b1efd9c49c 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10509,4 +10509,9 @@
   proargnames => '{rootrelid,relid,parentrelid,isleaf,level}',
   prosrc => 'pg_partition_tree' },
 
+# function to get the top-most partition root parent
+{ oid => '3424', descr => 'get top-most partition root parent',
+  proname => 'pg_partition_root', prorettype => 'regclass',
+  proargtypes => 'regclass', prosrc => 'pg_partition_root' },
+
 ]
diff --git a/src/test/regress/expected/partition_info.out b/src/test/regress/expected/partition_info.out
index 3e15e02f8d..a884df976f 100644
--- a/src/test/regress/expected/partition_info.out
+++ b/src/test/regress/expected/partition_info.out
@@ -12,6 +12,18 @@ SELECT * FROM pg_partition_tree(0);
| ||  
 (1 row)
 
+SELECT pg_partition_root(NULL);
+ pg_partition_root 
+---
+ 
+(1 row)
+
+SELECT pg_partition_root(0);
+ pg_partition_root 
+---
+ 
+(1 row)
+
 -- Test table partition trees
 CREATE TABLE ptif_test (a int, b int) PARTITION BY range (a);
 CREATE TABLE ptif_test0 PARTITION OF ptif_test
@@ -66,6 +78,20 @@ SELECT relid, parentrelid, level, isleaf
  ptif_test01 | ptif_test0  | 0 | t
 (1 row)
 
+-- List all members using pg_partition_root with leaf table 

Re: Adding a TAP test checking data consistency on standby with minRecoveryPoint

2019-01-30 Thread Michael Paquier
On Mon, Nov 12, 2018 at 01:08:38PM +0900, Michael Paquier wrote:
> Thanks!  I am stealing that stuff, and I have added an offline check by
> comparing the value of minRecoveryPoint in pg_controldata.  Again, if
> you revert c186ba13 and run the tests, both online and offline failures
> are showing up.
> 
> What do you think?

I think that this test has some value, and it did not get any reviews,
so I am moving it to the next CF.
--
Michael




Re: Unused parameters & co in code

2019-01-30 Thread Michael Paquier
On Wed, Jan 30, 2019 at 08:14:05AM -0300, Alvaro Herrera wrote:
> On 2019-Jan-30, Michael Paquier wrote:
>> ReindexPartitionedIndex => parentIdx
> 
> I'd not change this one, as we will surely want to do something useful
> eventually.  Not that it matters, but it'd be useless churn.

A lot of these things assume that the unused arguments may be useful
in the future.

>> _bt_relbuf => rel (Quite some cleanup here for the btree code)
> 
> If this is a worthwhile cleanup, let's see a patch.

This cleanup is disappointing, so it can be discarded.

I looked at all the references I have spotted yesterday, and most of
them are not really worth the trouble, still there are some which
could be cleaned up.  Attached is a set of patches which do some
cleanup here and there, I don't propose all of them for commit, some
of them are attached just for reference:
- 0001 cleans up port in SendAuthRequest.  This one is disappointing,
so I am fine to discard it.
- 0002 works on _bt_relbuf, whose callers don't actually benefit from
the cleanup as the relation worked on is always used for different
reasons, so it can be discarded.
- 0003 works on the code of GIN, which simplifies at least the code,
so it could be applied.  This removes more than changed.
- 0004 also cleans some code for clause parsing, with a negative line
output.
- 0005 is for pg_rewind, which is some stuff I introduced, so I'd like
to clean up my mess and the change is recent :)
- 0006 is for tablecmds.c, and something I would like to apply because
it reduces some confusion with some recursion arguments which are not
needed for constraint handling and inheritance.  Most of the complains
come from lockmode not being used but all the AtPrep and AtExec
routines are rather symmetric so I am not bothering about that.
--
Michael
From 2b86907b218798545e17e5f6cbc985f8368387b5 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 31 Jan 2019 14:20:33 +0900
Subject: [PATCH 1/6] Simplify SendAuthRequest

---
 src/backend/libpq/auth.c | 34 +-
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index 20fe98fb82..f92f78e17a 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -43,8 +43,8 @@
  * Global authentication functions
  *
  */
-static void sendAuthRequest(Port *port, AuthRequest areq, const char *extradata,
-int extralen);
+static void sendAuthRequest(AuthRequest areq, const char *extradata,
+			int extralen);
 static void auth_failed(Port *port, int status, char *logdetail);
 static char *recv_password_packet(Port *port);
 
@@ -523,7 +523,7 @@ ClientAuthentication(Port *port)
 
 		case uaGSS:
 #ifdef ENABLE_GSS
-			sendAuthRequest(port, AUTH_REQ_GSS, NULL, 0);
+			sendAuthRequest(AUTH_REQ_GSS, NULL, 0);
 			status = pg_GSS_recvauth(port);
 #else
 			Assert(false);
@@ -532,7 +532,7 @@ ClientAuthentication(Port *port)
 
 		case uaSSPI:
 #ifdef ENABLE_SSPI
-			sendAuthRequest(port, AUTH_REQ_SSPI, NULL, 0);
+			sendAuthRequest(AUTH_REQ_SSPI, NULL, 0);
 			status = pg_SSPI_recvauth(port);
 #else
 			Assert(false);
@@ -603,7 +603,7 @@ ClientAuthentication(Port *port)
 		(*ClientAuthentication_hook) (port, status);
 
 	if (status == STATUS_OK)
-		sendAuthRequest(port, AUTH_REQ_OK, NULL, 0);
+		sendAuthRequest(AUTH_REQ_OK, NULL, 0);
 	else
 		auth_failed(port, status, logdetail);
 }
@@ -613,7 +613,7 @@ ClientAuthentication(Port *port)
  * Send an authentication request packet to the frontend.
  */
 static void
-sendAuthRequest(Port *port, AuthRequest areq, const char *extradata, int extralen)
+sendAuthRequest(AuthRequest areq, const char *extradata, int extralen)
 {
 	StringInfoData buf;
 
@@ -740,7 +740,7 @@ CheckPasswordAuth(Port *port, char **logdetail)
 	int			result;
 	char	   *shadow_pass;
 
-	sendAuthRequest(port, AUTH_REQ_PASSWORD, NULL, 0);
+	sendAuthRequest(AUTH_REQ_PASSWORD, NULL, 0);
 
 	passwd = recv_password_packet(port);
 	if (passwd == NULL)
@@ -841,7 +841,7 @@ CheckMD5Auth(Port *port, char *shadow_pass, char **logdetail)
 		return STATUS_ERROR;
 	}
 
-	sendAuthRequest(port, AUTH_REQ_MD5, md5Salt, 4);
+	sendAuthRequest(AUTH_REQ_MD5, md5Salt, 4);
 
 	passwd = recv_password_packet(port);
 	if (passwd == NULL)
@@ -895,7 +895,7 @@ CheckSCRAMAuth(Port *port, char *shadow_pass, char **logdetail)
 	/* Put another '\0' to mark that list is finished. */
 	appendStringInfoChar(_mechs, '\0');
 
-	sendAuthRequest(port, AUTH_REQ_SASL, sasl_mechs.data, sasl_mechs.len);
+	sendAuthRequest(AUTH_REQ_SASL, sasl_mechs.data, sasl_mechs.len);
 	pfree(sasl_mechs.data);
 
 	/*
@@ -1000,9 +1000,9 @@ CheckSCRAMAuth(Port *port, char *shadow_pass, char **logdetail)
 			elog(DEBUG4, "sending SASL challenge of length %u", outputlen);
 
 			if (result == SASL_EXCHANGE_SUCCESS)
-sendAuthRequest(port, AUTH_REQ_SASL_FIN, output, outputlen);
+sendAuthRequest(AUTH_REQ_SASL_FIN, output, 

Re: Index Skip Scan

2019-01-30 Thread Kyotaro HORIGUCHI
Hello.

At Wed, 30 Jan 2019 18:19:05 +0100, Dmitry Dolgov <9erthali...@gmail.com> wrote 
in 
> A bit of adjustment after nodes/relation -> nodes/pathnodes.

I had a look on this.

The name "index skip scan" is a different feature from the
feature with the name on other prodcuts, which means "index scan
with postfix key (of mainly of multi column key) that scans
ignoring the prefixing part" As Thomas suggested I'd suggest that
we call it "index hop scan". (I can accept Hopscotch, either:p)

Also, as mentioned upthread by Peter Geoghegan, this could easily
produce a worse plan due to underestimation.  So I also suggest
that this have a dynamic fallback mechanism.  From that perspective
it is not suitable as an AM API level feature.

If all leaf pages are in shared buffers and the average hopping
distance is less than about 4 pages (roughly the height of the
tree), the skip scan will lose.  If almost all leaf pages are
still on disk, we could win even with a 2-page step (skipping
over just one page).

=
As I was writing the above, I came to think that it would be
better to implement this as a pure executor optimization.

Specifically, let _bt_steppage count the ratio of skipped pages so
far; if the ratio exceeds some threshold (maybe around 3/4), it
enters hopscotching mode, where it uses an index descent to find
the next page (rather than traversing right links).  As mentioned
above, I think using a skip only when it reaches beyond the next
page is a good bet.  If the success ratio of skip scans drops
below some threshold (I'm not sure what yet), we should fall back
to traversing.
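
A very rough sketch of that idea (all field and threshold names here are
hypothetical; none of this exists in the btree code today):

    /* imaginary counters maintained while the scan runs */
    if (so->pages_seen > 0)
    {
        double  skip_ratio = (double) so->pages_skippable / so->pages_seen;

        if (!so->hopscotching && skip_ratio > 0.75)
            so->hopscotching = true;    /* re-descend from the root to hop */
        else if (so->hopscotching && skip_ratio < LOWER_THRESHOLD)
            so->hopscotching = false;   /* back to right-link traversal */
    }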

Any opinions?



Some comments on the patch below.

+ skip scan approach, which is based on the idea of
+ https://wiki.postgresql.org/wiki/Free_Space_Map_Problems;>
+ Loose index scan. Rather than scanning all equal values of a key,
+ as soon as a new value is found, it will search for a larger value on the

I'm not sure it is a good idea to put a pointer to such a rather
unstable source in the documentation.


This adds a new AM method, but it seems available only for ordered
indexes, specifically btree.  And it seems that the feature could be
implemented in btgettuple, since btskip apparently does the same
thing.  (I agree with Robert's point in [1].)

[1] 
https://www.postgresql.org/message-id/CA%2BTgmobb3uN0xDqTRu7f7WdjGRAXpSFxeAQnvNr%3DOK5_kC_SSg%40mail.gmail.com


Related to the above, it seems better for the path generation of
skip scans to be part of index scan path generation.  Whether to
skip or not is a matter for the index scan itself.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Too rigorous assert in reorderbuffer.c

2019-01-30 Thread Arseny Sher
Hi,

My colleague Alexander Lakhin has noticed an assertion failure in
reorderbuffer.c:1330. Here is a simple snippet reproducing it:

SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 
'test_decoding');

create table t(k int);
begin;
savepoint a;
alter table t alter column k type text;
rollback to savepoint a;
alter table t alter column k type bigint;
commit;

SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 
'include-xids', '0', 'skip-empty-xacts', '1');

It is indeed too rigorous, since the cmax of a tuple is not stable; it can
be rewritten if a subxact that tried to delete it later aborts (the same
holds for xmax).  The attached patch removes the assertion.  While here, I
also thought it worthwhile to add a test involving DDL in an aborted
subxact, as that case is interesting anyway and wasn't covered before.

From c5cd30f9e23c96390cafad82f832c918dfd3397f Mon Sep 17 00:00:00 2001
From: Arseny Sher 
Date: Wed, 30 Jan 2019 23:31:47 +0300
Subject: [PATCH] Remove assertion in reorderbuffer that cmax is stable.

cmax can be rewritten an arbitrary number of times if subxact(s) that tried
to delete some tuple version abort later.  Also add a small test involving
DDL in an aborted subxact, as this case is interesting on its own and wasn't
covered before.
---
 contrib/test_decoding/expected/ddl.out  | 20 
 contrib/test_decoding/sql/ddl.sql   | 14 ++
 src/backend/replication/logical/reorderbuffer.c | 10 ++
 3 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/contrib/test_decoding/expected/ddl.out b/contrib/test_decoding/expected/ddl.out
index b7c76469fc..69dd74676c 100644
--- a/contrib/test_decoding/expected/ddl.out
+++ b/contrib/test_decoding/expected/ddl.out
@@ -409,6 +409,26 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'inc
  COMMIT
 (6 rows)
 
+-- check that DDL in aborted subtransactions handled correctly
+CREATE TABLE tr_sub_ddl(data int);
+BEGIN;
+INSERT INTO tr_sub_ddl VALUES (42);
+SAVEPOINT a;
+ALTER TABLE tr_sub_ddl ALTER COLUMN data TYPE text;
+INSERT INTO tr_sub_ddl VALUES ('blah-blah');
+ROLLBACK TO SAVEPOINT a;
+ALTER TABLE tr_sub_ddl ALTER COLUMN data TYPE bigint;
+INSERT INTO tr_sub_ddl VALUES(43);
+COMMIT;
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+   data
+---
+ BEGIN
+ table public.tr_sub_ddl: INSERT: data[integer]:42
+ table public.tr_sub_ddl: INSERT: data[bigint]:43
+ COMMIT
+(4 rows)
+
 /*
  * Check whether treating a table as a catalog table works somewhat
  */
diff --git a/contrib/test_decoding/sql/ddl.sql b/contrib/test_decoding/sql/ddl.sql
index c4b10a4cf9..f47b4ad716 100644
--- a/contrib/test_decoding/sql/ddl.sql
+++ b/contrib/test_decoding/sql/ddl.sql
@@ -236,6 +236,20 @@ COMMIT;
 
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
+-- check that DDL in aborted subtransactions handled correctly
+CREATE TABLE tr_sub_ddl(data int);
+BEGIN;
+INSERT INTO tr_sub_ddl VALUES (42);
+SAVEPOINT a;
+ALTER TABLE tr_sub_ddl ALTER COLUMN data TYPE text;
+INSERT INTO tr_sub_ddl VALUES ('blah-blah');
+ROLLBACK TO SAVEPOINT a;
+ALTER TABLE tr_sub_ddl ALTER COLUMN data TYPE bigint;
+INSERT INTO tr_sub_ddl VALUES(43);
+COMMIT;
+
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+
 
 /*
  * Check whether treating a table as a catalog table works somewhat
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index a49e226967..0ace0cfb87 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -1327,13 +1327,15 @@ ReorderBufferBuildTupleCidHash(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		else
 		{
 			Assert(ent->cmin == change->data.tuplecid.cmin);
-			Assert(ent->cmax == InvalidCommandId ||
-   ent->cmax == change->data.tuplecid.cmax);
 
 			/*
 			 * if the tuple got valid in this transaction and now got deleted
-			 * we already have a valid cmin stored. The cmax will be
-			 * InvalidCommandId though.
+			 * we already have a valid cmin stored. The cmax is still
+			 * InvalidCommandId though, so stamp it. Moreover, cmax might
+			 * be rewritten arbitrary number of times if subxacts who tried to
+			 * delete this version abort later. Pick up the latest since it is
+			 * (belonging to potentially committed subxact) the only one which
+			 * matters.
 			 */
 			ent->cmax = change->data.tuplecid.cmax;
 		}
-- 
2.11.0


--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: WIP: Avoid creation of the free space map for small tables

2019-01-30 Thread Amit Kapila
On Wed, Jan 30, 2019 at 10:41 PM Masahiko Sawada  wrote:
>
> On Wed, Jan 30, 2019 at 4:33 AM Amit Kapila  wrote:
> >
> > On Tue, Jan 29, 2019 at 8:12 PM John Naylor  
> > wrote:
> > >
> > > On Tue, Jan 29, 2019 at 11:56 AM Amit Kapila  
> > > wrote:
> > >
> > > > You can find this change in attached patch.  Then, I ran the make
> > > > check in src/bin/pgbench multiple times using test_conc_insert.sh.
> > > > You can vary the number of times the test should run, if you are not
> > > > able to reproduce it with this.
> > > >
> > > > The attached patch (clear_local_map_if_exists_1.patch) atop the main
> > > > patch fixes the issue for me.  Kindly verify the same.
> > >
> > > I got one failure in 50 runs. With the new patch, I didn't get any
> > > failures in 300 runs.
> > >
> >
> > Thanks for verification.  I have included it in the attached patch and
> > I have also modified the page.sql test to have enough number of pages
> > in relation so that FSM will get created irrespective of alignment
> > boundaries.  Masahiko San, can you verify if this now works for you?
> >
>
> Thank you for updating the patch!
>
> The modified page.sql test could fail if the block size is more than
> 8kB?

That's right, but I don't think the current regression tests work for
block sizes greater than 8KB anyway.  I have tried with 16 and 32 as the
block size, and there were a few failures on HEAD itself.

> We can ensure the number of pages is more than 4 by checking it
> and adding more data if there is not enough, but I'm really not sure
> we should care about the bigger-block-size cases.
>

Yeah, I am not sure either.  I think as this is an existing test, we
should not try to change it too much.  However, if both you and John
feel it is better to change, we can go with that.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-01-30 Thread Amit Kapila
On Wed, Jan 30, 2019 at 8:11 PM John Naylor  wrote:
>
> On Wed, Jan 30, 2019 at 2:11 PM Amit Kapila  wrote:
> > This is much better than the earlier version of test and there is no
> > dependency on the vacuum.  However, I feel still there is some
> > dependency on how the rows will fit in a page and we have seen some
> > related failures due to alignment stuff.  By looking at the test, I
> > can't envision any such problem, but how about if we just write some
> > simple tests where we can check that the FSM won't be created for very
> > small number of records say one or two and then when we increase the
> > records FSM gets created, here if we want, we can even use vacuum to
> > ensure FSM gets created.  Once we are sure that the main patch passes
> > all the buildfarm tests, we can extend the test to something advanced
> > as you are proposing now.  I think that will reduce the chances of
> > failure, what do you think?
>
> That's probably a good idea to limit risk. I have just very basic tests
> now, and vacuum before every relation size check to make sure any FSM
> extension (whether desired or not) is invoked. Also, in my last patch
> I forgot to implement explicit checks of the block number instead of
> assuming how many rows will fit on a page. I've used a plpgsql code
> block to do this.
>

-- Extend table with enough blocks to exceed the FSM threshold
 -- FSM is created and extended to 3 blocks

The second comment line seems redundant to me, so I have removed that
and integrated it in the main patch.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v20-0001-Avoid-creation-of-the-free-space-map-for-small-heap-.patch
Description: Binary data


Re: "could not reattach to shared memory" on buildfarm member dory

2019-01-30 Thread Noah Misch
On Tue, Jan 29, 2019 at 11:28:56AM -0500, Heath Lord wrote:
> On Thu, Jan 17, 2019 at 3:27 AM Noah Misch  wrote:
> > On Sun, Dec 02, 2018 at 09:35:06PM -0800, Noah Misch wrote:
> > > Could one of you having a dory login use
> > > https://live.sysinternals.com/Procmon.exe to capture process events
> > during
> > > backend startup?
> > > The ideal would be one capture where startup failed reattach
> > > and another where it succeeded, but having the successful run alone
> > would be a
> > > good start.  The procedure is roughly this:
> > >
> > > - Install PostgreSQL w/ debug symbols.
> > > - Start a postmaster.
> > > - procmon /nomonitor
> > > - procmon "Filter" menu -> Enable Advanced Output
> > > - Ctrl-l, add filter for "Process Name" is "postgres.exe"
> > > - Ctrl-e (starts collecting data)
> > > - psql (leave it running)
> > > - After ~60s, Ctrl-e again in procmon (stops collecting data)
> > > - File -> Save -> PML
> > > - File -> Save -> XML, include stack traces, resolve stack symbols
> > > - Compress the PML and XML files, and mail them here
> 
>    I apologize for not responding earlier, but we are attempting to capture
> the information you requested.  However, where would you like us to pull the
> install for PostgreSQL with the debug symbols in it?  We are aware of the
> BigSQL and EDB installers, but are unsure whether these contain the debug
> symbols.

Please use a locally-built PostgreSQL as similar as possible to what dory's
buildfarm runs produce, except building with debug symbols.  A
potentially-convenient way to achieve this would be to add CONFIG=>'Debug' to
build_env of your buildfarm configuration, then run the buildfarm client with
--keepall.  By the end of the run, $buildroot/$branch/inst will contain an
installation.  (I haven't actually tested that.)  You can copy that installation
to any convenient place (e.g. your home directory) and use that copy for the
remaining steps.

> Currently this machine has nothing on it other than the necessary
> dependencies to build postgres for the community.  If you could please give
> us some more information on what you would like us to install to gather
> this information to help debug the issue we are seeing we would really
> appreciate it.

At the moment, I think the only other thing you'll need is to download
https://live.sysinternals.com/Procmon.exe.

> Also, if there is any additional information on what we
> have installed on the server or how we are configured please just let us
> know and we will get you that as soon as possible.

Thanks.  I don't have specific questions about that at this time, but feel free
to share any lists or config dumps that seem relevant.



Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2019-01-30 Thread David Rowley
On Sat, 26 Jan 2019 at 13:51, Tom Lane  wrote:
>
> David Rowley  writes:
> > On Thu, 3 Jan 2019 at 08:01, Tomas Vondra  
> > wrote:
> >> AFAICS the patch essentially does two things: (a) identifies Append
> >> paths with a single member and (b) ensures the Vars are properly mapped
> >> when the Append node is skipped when creating the plan.
> >> I agree doing (a) in setrefs.c is too late, as mentioned earlier. But
> >> perhaps we could do at least (b) to setrefs.c?
>
> > The problem I found with doing var translations in setrefs.c was that
> > the plan tree is traversed there breadth-first and in createplan.c we
> > build the plan depth-first.  The problem I didn't manage to overcome
> > with that is that when we replace a "ProxyPath" (actually an Append
> > path marked as is_proxy=true) it only alter targetlists, etc of nodes
> > above that in the plan tree. The nodes below it and their targetlists
> > don't care, as these don't fetch Vars from nodes above them.  If we do
> > the Var translation in setrefs then we've not yet looked at the nodes
> > that don't care and we've already done the nodes that do care. So the
> > tree traversal is backwards.
>
> I don't quite buy this argument, because you haven't given a reason
> why it doesn't apply with equal force to setrefs.c's elimination of
> no-op SubqueryScan nodes.  Yet that does work.

I assume you're talking about the code that's in
set_subqueryscan_references() inside the trivial_subqueryscan()
condition?

If so, then that seems pretty different from what I'm doing here
because trivial_subqueryscan() only ever returns true when the
Vars/Exprs from the subquery's target list match the scan's targetlist
exactly, in the same order.  It's not quite that simple with the
single-subpath Append/MergeAppend case since the Vars/Exprs won't be
the same since they're from different relations and possibly in some
different order.

> Stepping back for a minute: even if we get this to work, how often
> is it going to do anything useful?  It seems like most of the other
> development that's going on is trying to postpone pruning till later,
> so I wonder how often we'll really know at the beginning of planning
> that only one child will survive.

It'll do something useful every time the planner would otherwise
produce an Append or MergeAppend with a single subpath. Pruning down
to just 1 relation is pretty common, so it seems worth optimising for.
There's no other development that postpones pruning until later. If
you're talking about run-time pruning, then nothing is "postponed". We
still attempt to prune during planning; it's just that in some cases we
might not be able to prune until execution, so we may do both.
You make it sound like we're trying to do away with plan-time
pruning, but that's not happening; in fact, we're working pretty hard
to speed up the planner when plan-time pruning *can* be done. So far
there are no patches to speed up the planner for partitioned tables when
the planner can't prune any partitions, although it is something that
I've been giving some thought to.

I've also attached a rebased patch.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v11-0001-Forgo-generating-single-subpath-Append-and-Merge.patch
Description: Binary data


Re: A few new options for vacuumdb

2019-01-30 Thread Michael Paquier
On Thu, Jan 31, 2019 at 02:28:05AM +, Bossart, Nathan wrote:
> It looks good to me.  The only thing I noticed is the use of
> relfrozenxid instead of relminmxid here:

Fixed and committed.  So we are finally done here, except for the
debate with the relation size.
--
Michael




Re: psql exit status with multiple -c or -f

2019-01-30 Thread Justin Pryzby
At Tue, 18 Dec 2018 10:24:39 -0600, Justin Pryzby  wrote:
> I think ON_ERROR_STOP would control whether the script stops, but it should
> fail the exit status should reflect any error in the last command.  The
> shell does that even without set -e.

Let me correct my own language:

| I think ON_ERROR_STOP would control whether the script stops, but 
| the exit status should reflect any error in the last command.  The
| shell does that even without set -e.

What I mean is that "the exit status of the shell reflects the status of the
last command".  That's why you'll see at the bottom of shell scripts an "exit 0"
which might seem to be implied.  Actually, falling off the end of the script is
the same as "exit", which is the same as "exit $?", which may be nonzero if the
preceding command was a conditional/test, like:

for ...; do
  if ...; then
    ...
  fi
done
# without this, the script will variously "fail" due to the conditional
# evaluating to false:
exit 0

So I'm pointing out a serious concern about psql returning 0 if the last
command failed, a case where it currently exits with status 1.

Justin



Re: psql exit status with multiple -c or -f

2019-01-30 Thread David G. Johnston
On Wed, Jan 30, 2019 at 6:38 PM Kyotaro HORIGUCHI
 wrote:
> I guess the reason is that psql is widely used with just a single
> -c command, and the fix actually breaks such cases. So it doesn't
> seem back-patchable, but the current behavior apparently contradicts
> the documentation, which seems perfectly reasonable.
>
> So I propose to fix the behavior for 12 and back-patch
> documentation fix.
>
> | Exit Status
> |
> | psql returns 0 to the shell if it finished normally, 1 if a fatal
> | error of its own occurs (e.g. out of memory, file not found), 2
> | if the connection to the server went bad and the session was not
> | interactive, and 3 if an error occurred in a script and the
> | variable ON_ERROR_STOP was set.
> + As the only exception, irrespective of the ON_ERROR_STOP setting,
> + psql returns 1 if the last executed command failed and it was
> + given by the -c option.
>

In HEAD, can we just turn ON_ERROR_STOP on by default when more than
one -c/-f is encountered, return 3, and call it a day?  Then, if the
user unsets ON_ERROR_STOP and does the same, have psql always return 0
unless a file specified with "-f" cannot be found (or some other
application error...).

If so can we maybe reconsider having ON_ERROR_STOP off by default generally...

I don't like saying -c "is the same as a line in a script" since -c
requires complete statements (and doesn't share transactions with
other -c lines).  Each -c simply provides psql with a single
auto-commit statement to execute - in the supplied order.  If we need
a word here "procedure" might be a good choice.

David J.



Re: Ordered Partitioned Table Scans

2019-01-30 Thread David Rowley
I've attached a rebased patch which fixes up the recent conflicts. No
other changes were made.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v7-0001-Allow-Append-to-be-used-in-place-of-MergeAppend-f.patch
Description: Binary data


Re: Proposed refactoring of planner header files

2019-01-30 Thread Tom Lane
I wrote:
> ... I didn't work on tightening selfuncs.c's dependencies.
> While I don't have a big problem with considering selfuncs.c to be
> in bed with the planner, that's risky in that whatever dependencies
> selfuncs.c has may well apply to extensions' selectivity estimators too.
> What I'm thinking about doing there is trying to split selfuncs.c into
> two parts, one being infrastructure that can be tightly tied to the
> core planner (and, likely, get moved under backend/optimizer/) and the
> other being estimators that use a limited API and can serve as models
> for extension code.

I spent some time looking at this, wondering whether it'd be practical
to write selectivity-estimating code that hasn't #included pathnodes.h
(nee relation.h).  It seems not: even pretty high-level functions
such as eqjoinsel() need access to fields like RelOptInfo.rows and
SpecialJoinInfo.jointype.  Now, there are only a few such fields, so
conceivably we could provide accessor functions in optimizer.h for
the commonly-used fields and keep the struct types opaque.   I'm not
excited about that though; it's unlike how we do things elsewhere in
Postgres and the mere savings of one #include dependency doesn't seem
to justify it.  So I'm thinking that planner support functions that
want to do selectivity estimation will still end up pulling in
pathnodes.h via selfuncs.h, and we'll just live with that.
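
For clarity, the accessor-function approach dismissed above would have
looked roughly like this (hypothetical names, not something I'm proposing):

    /* optimizer.h: struct types stay opaque to estimators */
    struct RelOptInfo;
    struct SpecialJoinInfo;

    extern double rel_get_rows(struct RelOptInfo *rel);
    extern JoinType sjinfo_get_jointype(struct SpecialJoinInfo *sjinfo);

    /* eqjoinsel() and friends could then skip #include "nodes/pathnodes.h" */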

However ... there are three pretty clearly identifiable groups of
functions in selfuncs.c: operator-specific estimators, support
functions, and AM-specific indexscan cost estimation functions.
There's a case to be made for splitting them into three separate files.
One point is that selfuncs.c is just too large; at over 8K lines,
it's currently the 7th-largest hand-maintained C file in our tree.
Another point is that as things stand, there's a strong temptation
to bury useful functionality in static functions that can't be
gotten at by extension estimators.  Separating the support functions
might help keep us more honest on that point.  (Or not.)

I'm not sure whether those arguments are strong enough to justify
the added pain-in-the-neck factor from moving a bunch of code
around.  That complicates back-patching bug fixes and it makes it
harder to trace the git history of the code.  So I've got mixed
emotions about whether it's worth doing anything.

Thoughts?

regards, tom lane



Re: A few new options for vacuumdb

2019-01-30 Thread Bossart, Nathan
On 1/30/19, 6:04 PM, "Michael Paquier"  wrote:
> Something which was not correct in the patch is the compatibility of
> the query.  xid <> xid has been added in 9.6, so the new options will
> not be able to work with older versions.  The versions marked as
> compatible in the last patch came from the age-ing functions, but you
> added direct comparisons with relfrozenxid and relminmxid in the
> latest versions of the patch.  This implementation goes down a couple
> of released versions, which is useful enough in my opinion, so I would
> keep it as-is.

Agreed.  Thanks for catching this.

> I have added as well some markups around "PostgreSQL" in the docs, and
> extra casts for the integer/xid values of the query.  The test
> patterns are also simplified, and I added tests for incorrect values
> of --min-xid-age and --min-mxid-age.  Does that look correct to you?

It looks good to me.  The only thing I noticed is the use of
relfrozenxid instead of relminmxid here:

+   appendPQExpBuffer(_query,
+ " %s GREATEST(pg_catalog.mxid_age(c.relminmxid),"
+ " pg_catalog.mxid_age(t.relminmxid)) OPERATOR(pg_catalog.>=)"
+ " '%d'::pg_catalog.int4\n"
+ " AND c.relfrozenxid OPERATOR(pg_catalog.!=)"
+ " '0'::pg_catalog.xid\n",
+ has_where ? "AND" : "WHERE", vacopts->min_mxid_age);

However, that may still work as intended.

Nathan



Re: [PATCH] Pass COPT and PROFILE to CXXFLAGS as well

2019-01-30 Thread Michael Paquier
On Wed, Jan 30, 2019 at 08:18:31PM -0500, Tom Lane wrote:
> This looks a bit copy-and-paste-y to me, in particular no thought
> has been taken for the order of flags.  We found in configure that
> it's better to add user-specified CFLAGS at the *end*, even though
> injecting user-specified CPPFLAGS at the beginning is the right
> thing.  This is because in, eg, "-O2 -O0" the last flag wins.
> Presumably the same goes for CXXFLAGS.  I think it's right to
> put user LDFLAGS first, though.  (The argument for CPPFLAGS and
> LDFLAGS is that you want the user's -I and -L flags to go first.)

Ah yes, good point about CFLAGS and LDFLAGS.  It would be better to
add a comment about that and document the difference, aka "prepend" or
"append" the flag values.

CXXFLAGS applies to compiler options like -g -O2 which you would like
to enforce for the C++ compiler, so it seems to me that like CFLAGS
the custom values should be added at the end and not at the beginning,
no?
--
Michael




Re: A few new options for vacuumdb

2019-01-30 Thread Michael Paquier
On Wed, Jan 30, 2019 at 05:45:58PM +, Bossart, Nathan wrote:
> On 1/29/19, 4:47 PM, "Michael Paquier"  wrote:
>> Oh, OK.  This makes sense.  It would be nice to add a comment in the
>> patch and to document this calculation method in the docs of
>> vacuumdb.
> 
> Sure, this is added in v8.

Thanks, Nathan.

Something which was not correct in the patch is the compatibility of
the query.  xid <> xid has been added in 9.6, so the new options will
not be able to work with older versions.  The versions marked as
compatible in the last patch came from the age-ing functions, but you
added direct comparisons with relfrozenxid and relminmxid in the
latest versions of the patch.  This implementation goes down a couple
of released versions, which is useful enough in my opinion, so I would
keep it as-is.

I have added as well some markups around "PostgreSQL" in the docs, and
extra casts for the integer/xid values of the query.  The test
patterns are also simplified, and I added tests for incorrect values
of --min-xid-age and --min-mxid-age.  Does that look correct to you?

> Thanks.  Something else I noticed is that we do not retrieve foreign
> tables and partitioned tables for --analyze and --analyze-only.
> However, that has long been the case for parallel mode, and this issue
> should probably get its own thread.

Good point, this goes down a couple of releases, and statistics on
both may be useful to compile for a system-wide operation.  Spawning a
separate thread sounds appropriate to me.
--
Michael
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index f304627802..41c7f3df79 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -172,6 +172,60 @@ PostgreSQL documentation
   
  
 
+ 
+  --min-mxid-age mxid_age
+  
+   
+Only execute the vacuum or analyze commands on tables with a multixact
+ID age of at least mxid_age.
+This setting is useful for prioritizing tables to process to prevent
+multixact ID wraparound (see
+).
+   
+   
+For the purposes of this option, the multixact ID age of a relation is
+the greatest of the ages of the main relation and its associated
+TOAST table, if one exists.  Since the commands
+issued by vacuumdb will also process the
+TOAST table for the relation if necessary, it does
+not need to be considered separately.
+   
+   
+
+ This option is only available for servers running
+ PostgreSQL 9.6 and later.
+
+   
+  
+ 
+
+ 
+  --min-xid-age xid_age
+  
+   
+Only execute the vacuum or analyze commands on tables with a
+transaction ID age of at least
+xid_age.  This setting
+is useful for prioritizing tables to process to prevent transaction
+ID wraparound (see ).
+   
+   
+For the purposes of this option, the transaction ID age of a relation
+is the greatest of the ages of the main relation and its associated
+TOAST table, if one exists.  Since the commands
+issued by vacuumdb will also process the
+TOAST table for the relation if necessary, it does
+not need to be considered separately.
+   
+   
+
+ This option is only available for servers running
+ PostgreSQL 9.6 and later.
+
+   
+  
+ 
+
  
   -q
   --quiet
diff --git a/src/bin/scripts/t/100_vacuumdb.pl b/src/bin/scripts/t/100_vacuumdb.pl
index 5e87af2d51..7f3a9b14a9 100644
--- a/src/bin/scripts/t/100_vacuumdb.pl
+++ b/src/bin/scripts/t/100_vacuumdb.pl
@@ -3,7 +3,7 @@ use warnings;
 
 use PostgresNode;
 use TestLib;
-use Test::More tests => 38;
+use Test::More tests => 44;
 
 program_help_ok('vacuumdb');
 program_version_ok('vacuumdb');
@@ -95,3 +95,20 @@ $node->command_checks_all(
 	[qr/^.*vacuuming database "postgres"/],
 	[qr/^WARNING.*cannot vacuum non-tables or special system tables/s],
 	'vacuumdb with view');
+$node->command_fails(
+	[ 'vacuumdb', '--table', 'vactable', '--min-mxid-age', '0',
+	  'postgres'],
+	'vacuumdb --min-mxid-age with incorrect value');
+$node->command_fails(
+	[ 'vacuumdb', '--table', 'vactable', '--min-xid-age', '0',
+	  'postgres'],
+	'vacuumdb --min-xid-age with incorrect value');
+$node->issues_sql_like(
+	[ 'vacuumdb', '--table', 'vactable', '--min-mxid-age', '2147483000',
+	  'postgres'],
+	qr/GREATEST.*relminmxid.*2147483000/,
+	'vacuumdb --table --min-mxid-age');
+$node->issues_sql_like(
+	[ 'vacuumdb', '--min-xid-age', '2147483001', 'postgres' ],
+	qr/GREATEST.*relfrozenxid.*2147483001/,
+	'vacuumdb --table --min-xid-age');
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 40ba8283a2..cb503569bb 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -43,6 +43,8 @@ typedef struct vacuumingOptions
 	bool		freeze;
 	bool		disable_page_skipping;
 	

Re: psql exit status with multiple -c or -f

2019-01-30 Thread Kyotaro HORIGUCHI
At Thu, 31 Jan 2019 10:37:28 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20190131.103728.153290385.horiguchi.kyot...@lab.ntt.co.jp>
> > I think ON_ERROR_STOP would control whether the script stops, but it should
> > fail the exit status should reflect any error in the last command.  The 
> > shell
> > does that even without set -e.
> 
> At least bash behaves exactly the same way as psql for me. (Just
> so there's no confusion, set -e works the opposite of what you think. I
> always use an explicit exit for that, though).
> 
> ===t.sh:
> hoge
> hage
> echo Ho-Ho-Ho
> ===
> $ bash t.sh ; echo $?
> test.sh: line 1: hoge: command not found
> test.sh: line 2: hage: command not found
> Ho-Ho-Ho
> 0
> 
> ===t.sh:
> set -e
> hoge
> hage
> echo Ho-Ho-Ho
> ===
> $ bash t.sh ; echo $?
> test.sh: line 2: hage: command not found
> 127

Sorry, that was not so good an example.

=== t.sh
set -e
ls -impossible
echo HoHoHoHo
===
$ bash ~/test.sh; echo $?
ls: invalid option -- 'e'
Try 'ls --help' for more information.
2

=== without set -e
$ bash ~/test.sh; echo $?
ls: invalid option -- 'e'
Try 'ls --help' for more information.
HoHoHoHo
0


# Wow. ls -impossibl works!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: psql exit status with multiple -c or -f

2019-01-30 Thread Kyotaro HORIGUCHI
Hello.

At Tue, 18 Dec 2018 10:24:39 -0600, Justin Pryzby  wrote 
in <20181218162439.gb8...@telsasoft.com>
> On Tue, Dec 18, 2018 at 05:13:40PM +0900, Kyotaro HORIGUCHI wrote:
> > $  psql postgres -v ON_ERROR_STOP=0 -f ~/work/y.txt ; echo $?
> > $  psql postgres -v ON_ERROR_STOP=0 < ~/work/y.txt ; echo $?
> 
> > c) psql postgres -v ON_ERROR_STOP=0 -c foo -c 'select 1'; echo $?
> > d) psql postgres -v ON_ERROR_STOP=0 -c 'select 1' -c foo; echo $?
> > 
> > (c) returns 0 and (d) returns 1, but both return 1 when
> > ON_ERROR_STOP=1.
> 
> > The attached second patch lets (c) and (d) behave the same as (a)
> > and (b).
> 
> I don't think the behavior in the single command case should be changed:

I guess the reason is that psql is widely used with just a single
-c command, and the fix actually breaks such cases. So it doesn't
seem back-patchable, but the current behavior apparently contradicts
the documentation, which seems perfectly reasonable.

So I propose to fix the behavior for 12 and back-patch
documentation fix.

| Exit Status
| 
| psql returns 0 to the shell if it finished normally, 1 if a fatal
| error of its own occurs (e.g. out of memory, file not found), 2
| if the connection to the server went bad and the session was not
| interactive, and 3 if an error occurred in a script and the
| variable ON_ERROR_STOP was set.
+ As the only exception, irrespective of the ON_ERROR_STOP setting,
+ psql returns 1 if the last executed command failed and it was
+ given by the -c option.

# What a ...


> |[pryzbyj@database postgresql]$ ./src/bin/psql/psql ts -c FOO 2>/dev/null ; 
> echo $? 
> |1
> |[pryzbyj@database postgresql]$ patch -p1 
>  |patching file src/bin/psql/startup.c
> |[pryzbyj@database postgresql]$ make >/dev/null
> |[pryzbyj@database postgresql]$ ./src/bin/psql/psql ts -c FOO 2>/dev/null ; 
> echo $? 
> |0
> 
> Also, unpatched v11 psql returns status of last command
> |[pryzbyj@database postgresql]$ psql ts -xtc 'SELECT 1' -c FOO 2>/dev/null ; 
> echo $? 
> |?column? | 1
> |
> |1
> 
> patched psql returns 0:
> |[pryzbyj@database postgresql]$ ./src/bin/psql/psql ts -xtc 'SELECT 1' -c FOO 
> 2>/dev/null ; echo $? 
> |?column? | 1
> |
> |0
> 
> I'd prefer to see the exit status of the -f scripts (your cases a and b)
> changed to 3 if the last command failed.  I realized that's not what the
> documentation says so possibly not backpatchable.

It doesn't make sense for psql to exit with an error status when the
last command happens to fail while errors that occurred earlier were
ignored. That behavior is better achieved with ON_ERROR_STOP=1.

> |3 if an error occurred in a script and the variable ON_ERROR_STOP was set.
> 
> Think of:
> 
> $ cat /tmp/sql 
> begin;
> foo;
> select 1;
> 
> $ psql ts -f /tmp/sql ; echo ret=$?
> BEGIN
> psql:/tmp/sql:2: ERROR:  syntax error at or near "foo"
> LINE 1: foo;
> ^
> psql:/tmp/sql:3: ERROR:  current transaction is aborted, commands ignored 
> until end of transaction block
> ret=0

As you wrote below, ON_ERROR_STOP=1 would give the desired
behavior.

> I think ON_ERROR_STOP would control whether the script stops, but it should
> fail the exit status should reflect any error in the last command.  The shell
> does that even without set -e.

At least bash behaves exactly the same way as psql for me. (Just
so there's no confusion, set -e works the opposite of what you think. I
always use an explicit exit for that, though).

===t.sh:
hoge
hage
echo Ho-Ho-Ho
===
$ bash t.sh ; echo $?
test.sh: line 1: hoge: command not found
test.sh: line 2: hage: command not found
Ho-Ho-Ho
0

===t.sh:
set -e
hoge
hage
echo Ho-Ho-Ho
===
$ bash t.sh ; echo $?
test.sh: line 2: hage: command not found
127

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [PATCH] Pass COPT and PROFILE to CXXFLAGS as well

2019-01-30 Thread Tom Lane
Michael Paquier  writes:
> On Wed, Jan 30, 2019 at 02:41:01PM +0100, Christoph Berg wrote:
>> Do we still want some CXXOPT flag for the server build? I can write a
>> patch, but someone else would need to do the bikeshedding how to name
>> it, and which of the existing knobs would set CXXFLAGS along. I don't
>> think I need that feature.

> If we don't directly need it, let's not add it now but let's revisit
> the need if it proves necessary.

+1

> I propose to just commit the last
> patch you sent, and back-patch to ease integration with existing
> extensions.  Any objections?

A thought here:

 ifdef PG_CPPFLAGS
 override CPPFLAGS := $(PG_CPPFLAGS) $(CPPFLAGS)
 endif
+ifdef PG_CFLAGS
+override CFLAGS := $(PG_CFLAGS) $(CFLAGS)
+endif
+ifdef PG_CXXFLAGS
+override CXXFLAGS := $(PG_CXXFLAGS) $(CXXFLAGS)
+endif
+ifdef PG_LDFLAGS
+override LDFLAGS := $(PG_LDFLAGS) $(LDFLAGS)
+endif

This looks a bit copy-and-paste-y to me, in particular no thought
has been taken for the order of flags.  We found in configure that
it's better to add user-specified CFLAGS at the *end*, even though
injecting user-specified CPPFLAGS at the beginning is the right
thing.  This is because in, eg, "-O2 -O0" the last flag wins.
Presumably the same goes for CXXFLAGS.  I think it's right to
put user LDFLAGS first, though.  (The argument for CPPFLAGS and
LDFLAGS is that you want the user's -I and -L flags to go first.)

regards, tom lane



RE: pg_upgrade: Pass -j down to vacuumdb

2019-01-30 Thread Jamison, Kirk
On January 29, 2019 8:19 PM +, Jesper Pedersen wrote:
>On 1/29/19 12:08 AM, Michael Paquier wrote:
>> I would document the optional VACUUM_OPTS on the page of pg_upgrade.
>> If Peter thinks it is fine to not do so, that's fine for me as well.
>> 
..
>I added most of the documentation back, as requested by Kirk.

Actually, I find it useful to have this documented. However, the major
contributors have more experience, so it's alright with me not to include the
doc part if that is what they ultimately decide.

>> It seems to me that the latest patch sent is incorrect for multiple
>> reasons:
>> 1) You still enforce -j to use the number of jobs that the caller of 
>> pg_upgrade provides, and we agreed that both things are separate 
>> concepts upthread, no?  What has been suggested by Alvaro is to add a 
>> comment so as one can use VACUUM_OPTS with -j optionally, instead of 
>> suggesting a full-fledged vacuumdb command which depends on what 
>> pg_upgrade uses.  So there is no actual need for the if/else 
>> complication business.

> I think it is ok for the echo command to highlight to the user that running 
> --analyze-only using the same amount of jobs will give a faster result.

I missed that part.
IIUC, using the $VACUUMDB_OPTS variable would simplify and correct the usage of
jobs for vacuumdb (the pg_upgrade job count is not necessarily the right job
count for vacuumdb). Plus, it might simplify the script and reduce the number
of additional lines.
Tom Lane also suggested above making the script absorb the value from the
environment. Would that address your concern about getting a faster result?


Regards,
Kirk Jamison




Re: [PATCH] Pass COPT and PROFILE to CXXFLAGS as well

2019-01-30 Thread Michael Paquier
On Wed, Jan 30, 2019 at 02:41:01PM +0100, Christoph Berg wrote:
> Do we still want some CXXOPT flag for the server build? I can write a
> patch, but someone else would need to do the bikeshedding how to name
> it, and which of the existing knobs would set CXXFLAGS along. I don't
> think I need that feature.

If we don't directly need it, let's not add it now but let's revisit
the need if it proves necessary.  I propose to just commit the last
patch you sent, and back-patch to ease integration with existing
extensions.  Any objections?
--
Michael




Re: backslash-dot quoting in COPY CSV

2019-01-30 Thread Michael Paquier
On Wed, Jan 30, 2019 at 01:20:59PM -0500, Tom Lane wrote:
> Given how long we've had COPY CSV support, and the tiny number of
> complaints to date, I do not think it's something to panic over.
> Disallowing the functionality altogether is surely an overreaction.
>
> A documentation warning might be the appropriate response.  I don't
> see any plausible way for psql to actually fix the problem, short
> of a protocol change to allow the backend to tell it how the data
> stream is going to be parsed.

Yes, agreed.  I looked at this problem a couple of months (year(s)?)
ago and gave up on designing a clear portable solution after a couple
of hours over it, and that's quite a corner case.
--
Michael




Re: Unused parameters & co in code

2019-01-30 Thread Michael Paquier
On Wed, Jan 30, 2019 at 12:33:47PM +0100, Peter Eisentraut wrote:
> On 30/01/2019 08:33, Michael Paquier wrote:
>> I just got a run with CFLAGS with the following options:
>> -Wunused-function -Wunused-variable -Wunused-const-variable
> 
> These are part of -Wall.

The options selected already generate a lot of junk, so increasing the
output is not really a good idea.  What I wanted to find out is where we
could simplify the code around unused parameters.  As you mention, some
parameters are there for symmetry in the declarations, which makes sense in
some cases, but in other cases I think we may be able to reduce the logic
complexity, and this report gives hints about that.

>>  -Wno-unused-result
> 
> What is the purpose of this?

Not really useful actually as we don't mark anything with
warn_unused_result.

>> -Wunused-macros
> 
> I have looked into that in the past.  There are a few that were
> genuinely left behind accidentally, but most are there for completeness
> or symmetry and don't look like they should be removed.  Also you get
> junk from flex and perl and the like.  Needs to be addressed case by
> case.  I can dig up my old branch and make some proposals.

Thanks.  Maybe I missed some of them.  Some macros, like the xml one,
are here for documentation purposes, so removing such things does not
make sense either.
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Allow anonymous rowtypes in function return column definition

2019-01-30 Thread Tom Lane
Elvis Pranskevichus  writes:
> On Wednesday, January 30, 2019 5:59:41 PM EST Tom Lane wrote:
>> I still found this pretty disjointed.  After a bit of thought I
>> propose the attached reformulation, which has the callers just tell
>> the routines which types to allow explicitly, with the justification
>> comments living at the call sites instead of within the routines.

> This is a much better formulation, thank you!

OK, pushed.

regards, tom lane



Re: Unused parameters & co in code

2019-01-30 Thread Michael Paquier
On Wed, Jan 30, 2019 at 09:41:04AM -0500, Tom Lane wrote:
> I'd definitely take this on a case-by-case basis.  In the planner
> functions you mentioned, for instance, I'd be pretty hesitant to remove
> the "root" parameter even if it happens not to be needed today.
> We'd probably just end up putting it back in the future, because almost
> everything in the planner needs that.  I'd only consider removing it in
> cases where there was a solid reason to require the function not to need
> it ever (as for instance what I just did to flatten_join_alias_vars).

Definitely agreed, this is case-by-case.  For the callbacks and hooks it
makes no sense; the planner ones and a couple of layers like the shared
memory size calculations are just there for symmetry, so most of the report
is really noise (I deliberately discarded anything related to bison and
generated code, of course).

> In cases where we can get simplifications of calling layers, and
> it doesn't seem likely that we'd have to undo it in future, then
> probably it's worth the trouble to change.

Some of these are in tablecmds.c, and those are visibly worth the trouble.
The ones in the btree, gin, and gist code also look like they could use
some cleanup at first glance, and those are places where we complain a lot
about the complexity of the code.
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Allow anonymous rowtypes in function return column definition

2019-01-30 Thread Elvis Pranskevichus
On Wednesday, January 30, 2019 5:59:41 PM EST Tom Lane wrote:
> I still found this pretty disjointed.  After a bit of thought I
> propose the attached reformulation, which has the callers just tell
> the routines which types to allow explicitly, with the justification
> comments living at the call sites instead of within the routines.

This is a much better formulation, thank you!

  Elvis





Re: Delay locking partitions during query execution

2019-01-30 Thread David Rowley
On Tue, 29 Jan 2019 at 19:42, Amit Langote
 wrote:
> However, I tried the example as you described and the plan *doesn't*
> change due to concurrent update of reloptions with master (without the
> patch) either.

Well, I didn't think to try that.  I just assumed I had broken it.
It could well be related to lowering the lock levels for setting these
reloptions.  Once upon a time, they required an AEL, but that was changed
in e5550d5fec6, so prior to that commit you'd have been blocked from making
the concurrent change while the other session was in a transaction.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [PATCH] Allow anonymous rowtypes in function return column definition

2019-01-30 Thread Tom Lane
Elvis Pranskevichus  writes:
> On Sunday, January 13, 2019 4:57:48 PM EST Tom Lane wrote:
>> I feel a bit uncomfortable about defining the new flag to
>> CheckAttributeNamesTypes as "allow_anonymous_records"; that
>> seems pretty short-sighted and single-purpose, especially in
>> view of there being no obvious reason why it shouldn't accept
>> RECORDARRAY too.  Maybe with a clearer explanation of the
>> issues above, we could define that flag in a more on-point way.

> I renamed it to "in_coldeflist", which seems like a clearer indication 
> that we are validating that, and not a regular table definition.

I still found this pretty disjointed.  After a bit of thought I propose
the attached reformulation, which has the callers just tell the routines
which types to allow explicitly, with the justification comments living
at the call sites instead of within the routines.

One point that we hadn't really talked about is what happens when
CheckArrayTypes recurses.  The existing logic is that it just passes
down the same restrictions it was told at the top level, and I kept
that here.  Right now it hardly matters what we pass down, because
the base type of a domain or array can't be a pseudotype, and we
don't really expect to find one in a named composite type either.
The one case where it could matter is if someone tries to use
"pg_statistic" as a field's composite type: that would be allowed if
allow_system_table_mods and otherwise not.  But it's not really
hard to imagine future additions where it would matter and it'd
be important to restrict things as we recurse down.  I think this
formulation makes it easier to see what to do in such cases.

regards, tom lane

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 910f651..06d18a1 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -445,11 +445,14 @@ heap_create(const char *relname,
  *		this is used to make certain the tuple descriptor contains a
  *		valid set of attribute names and datatypes.  a problem simply
  *		generates ereport(ERROR) which aborts the current transaction.
+ *
+ *		relkind is the relkind of the relation to be created.
+ *		flags controls which datatypes are allowed, cf CheckAttributeType.
  * 
  */
 void
 CheckAttributeNamesTypes(TupleDesc tupdesc, char relkind,
-		 bool allow_system_table_mods)
+		 int flags)
 {
 	int			i;
 	int			j;
@@ -507,7 +510,7 @@ CheckAttributeNamesTypes(TupleDesc tupdesc, char relkind,
 		   TupleDescAttr(tupdesc, i)->atttypid,
 		   TupleDescAttr(tupdesc, i)->attcollation,
 		   NIL, /* assume we're creating a new rowtype */
-		   allow_system_table_mods);
+		   flags);
 	}
 }
 
@@ -524,13 +527,22 @@ CheckAttributeNamesTypes(TupleDesc tupdesc, char relkind,
  * containing_rowtypes.  When checking a to-be-created rowtype, it's
  * sufficient to pass NIL, because there could not be any recursive reference
  * to a not-yet-existing rowtype.
+ *
+ * flags is a bitmask controlling which datatypes we allow.  For the most
+ * part, pseudo-types are disallowed as attribute types, but there are some
+ * exceptions: ANYARRAYOID, RECORDOID, and RECORDARRAYOID can be allowed
+ * in some cases.  (This works because values of those type classes are
+ * self-identifying to some extent.  However, RECORDOID and RECORDARRAYOID
+ * are reliably identifiable only within a session, since the identity info
+ * may use a typmod that is only locally assigned.  The caller is expected
+ * to know whether these cases are safe.)
  * 
  */
 void
 CheckAttributeType(const char *attname,
    Oid atttypid, Oid attcollation,
    List *containing_rowtypes,
-   bool allow_system_table_mods)
+   int flags)
 {
 	char		att_typtype = get_typtype(atttypid);
 	Oid			att_typelem;
@@ -538,12 +550,18 @@ CheckAttributeType(const char *attname,
 	if (att_typtype == TYPTYPE_PSEUDO)
 	{
 		/*
-		 * Refuse any attempt to create a pseudo-type column, except for a
-		 * special hack for pg_statistic: allow ANYARRAY when modifying system
-		 * catalogs (this allows creating pg_statistic and cloning it during
-		 * VACUUM FULL)
+		 * We disallow pseudo-type columns, with the exception of ANYARRAY,
+		 * RECORD, and RECORD[] when the caller says that those are OK.
+		 *
+		 * We don't need to worry about recursive containment for RECORD and
+		 * RECORD[] because (a) no named composite type should be allowed to
+		 * contain those, and (b) two "anonymous" record types couldn't be
+		 * considered to be the same type, so infinite recursion isn't
+		 * possible.
 		 */
-		if (atttypid != ANYARRAYOID || !allow_system_table_mods)
+		if (!((atttypid == ANYARRAYOID && (flags & CHKATYPE_ANYARRAY)) ||
+			  (atttypid == RECORDOID && (flags & CHKATYPE_ANYRECORD)) ||
+			  (atttypid == RECORDARRAYOID && (flags & CHKATYPE_ANYRECORD
 			ereport(ERROR,
 	

Re: pg_dump multi VALUES INSERT

2019-01-30 Thread David Rowley
On Thu, 24 Jan 2019 at 04:45, Fabien COELHO  wrote:
> >> The feature is not tested anywhere. I still think that there should be a
> >> test on empty/small/larger-than-rows-per-insert tables, possibly added to
> >> existing TAP-tests.
> >
> > I was hoping to get away with not having to do that... mainly because
> > I've no idea how.
>
> Hmmm. That is another question! Maybe someone will help.

Here's another version, same as before but with tests this time.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


pg_dump-rows-per-insert-option_v13.patch
Description: Binary data


Re: [PATCH] Log PostgreSQL version number on startup

2019-01-30 Thread Peter Eisentraut
On 29/01/2019 16:46, Christoph Berg wrote:
> Re: Peter Eisentraut 2019-01-16 
> <92bfdfdf-4164-aec5-4e32-c26e67821...@2ndquadrant.com>
>>> Why don't we start the logging collector before opening the sockets?
>>
>> Specifically, something like the attached.
>>
>> This keeps the dynamic module loading before the logging collector
>> start, so we see those error messages on stderr, but then the setting up
>> of the sockets would get logged.
> 
> This works nicely, so +1.
> 
> I'm attaching your patch as 0001 and my rebased one on top of it as
> 0002.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: speeding up planning with partitions

2019-01-30 Thread David Rowley
On Tue, 29 Jan 2019 at 22:32, Amit Langote
 wrote:
> Attached updated patches.

I think we could make the 0001 patch a bit smaller if we were to apply
the attached first.

Details in the commit message.

What do you think?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


0001-Move-building-of-child-base-quals-out-into-a-new-fun.patch
Description: Binary data


Re: backslash-dot quoting in COPY CSV

2019-01-30 Thread Tom Lane
Pavel Stehule  writes:
> st 30. 1. 2019 18:51 odesílatel Bruce Momjian  napsal:
>> I am wondering if we should just disallow CSV from STDIN, on security
>> grounds.  How big a problem would that be for people?  Would we have to
>> disable to STDOUT as well since it could not be restored?  Should we
>> issue some kind of security warning in such cases?  Should we document
>> this?

> it is pretty common pattern for etl, copy from stdin. I am thinking it can
> be big problem

Given how long we've had COPY CSV support, and the tiny number of
complaints to date, I do not think it's something to panic over.
Disallowing the functionality altogether is surely an overreaction.

I don't really see an argument for calling it a security problem,
given that pg_dump doesn't use CSV and it isn't the default for
anything else either.  Sure, you can imagine some bad actor hoping
to cause problems by putting crafted data into a table, but how
does that data end up in a script that's using COPY CSV FROM STDIN
(as opposed to copying out-of-line data)?  It's a bit far-fetched.

A documentation warning might be the appropriate response.  I don't
see any plausible way for psql to actually fix the problem, short
of a protocol change to allow the backend to tell it how the data
stream is going to be parsed.

regards, tom lane



Re: backslash-dot quoting in COPY CSV

2019-01-30 Thread Pavel Stehule
st 30. 1. 2019 18:51 odesílatel Bruce Momjian  napsal:

> On Wed, Jan 30, 2019 at 06:32:11PM +0100, Daniel Verite wrote:
> >   Bruce Momjian wrote:
> > > Well, these all kind of require a change to the COPY format, which
> > > hasn't changed in many years.
> >
> > Not for the first two. As an example of solution #2, it could look like
> this:
> >
> > =# \set INLINE_COPY_BOUNDARY ==JuQW3gc2mQjXuvmJ32TlOLhJ3F2Eh2LcsBup0oH7==
> > =# COPY table FROM STDIN CSV;
> > somevalue,"foo
> > \.
> > bar"
> > ==JuQW3gc2mQjXuvmJ32TlOLhJ3F2Eh2LcsBup0oH7==
> >
> > Instead of looking for \. on a line by itself, psql would look for the
> > boundary to know where the data ends.
> > The boundary is not transmitted to the server, it has no need to know
> > about it.
>
> Wow, that is an odd API, as you stated below.
>
> > > > - psql could be told somehow that the next piece of inline data is in
> > > > the CSV format, and then pass it through a CSV parser.
> > >
> > > That might be the cleanest solution, but how would we actually input
> > > multi-line data in CSV mode with \. alone on a line?
> >
> > With this solution, the content doesn't change at all.
> > The weird part would be the user interface, because the information
> > psql needs is not only "CSV", it's also the options DELIMITER, QUOTE,
> > ESCAPE and possibly ENCODING. Currently it doesn't know any of these,
> > they're passed to the server in an opaque, unparsed form within
> > the COPY command.
> >
> > Personally, the solution I find cleaner is the server side not having
> > any end-of-data marker for CSV. So backslash-dot would never be
> > special. psql could allow for a custom ending boundary for in-script
> > data, and users could set that to backslash-dot if they want, but that
> > would be their choice.
> > That would be clearly not backward compatible, and I believe it wouldn't
> > work with the v2 protocol, so I'm not sure it would have much chance of
> > approval.
>
> I had forgotten that the DELIMITER and QUOTE can be changed --- that
> kills the idea of adding a simple CSV parser into psql because we would
> have to parse the COPY SQL command as well.
>
> I am wondering if we should just disallow CSV from STDIN, on security
> grounds.  How big a problem would that be for people?  Would we have to
> disable to STDOUT as well since it could not be restored?  Should we
> issue some kind of security warning in such cases?  Should we document
> this?
>

it is pretty common pattern for etl, copy from stdin. I am thinking it can
be big problem



> In hindsight, I am not sure how we could have designed this more
> securly.  I guess we could have required some special text to start all
> CSV continuation lines that were not end-of-file, but that would have
> been very unportable, which is the goal of CSV.
>
> --
>   Bruce Momjian  http://momjian.us
>   EnterpriseDB http://enterprisedb.com
>
> + As you are, so once was I.  As I am, so you will be. +
> +  Ancient Roman grave inscription +
>
>


Re: backslash-dot quoting in COPY CSV

2019-01-30 Thread Bruce Momjian
On Wed, Jan 30, 2019 at 06:32:11PM +0100, Daniel Verite wrote:
>   Bruce Momjian wrote:
> > Well, these all kind of require a change to the COPY format, which
> > hasn't changed in many years.
> 
> Not for the first two. As an example of solution #2, it could look like this:
> 
> =# \set INLINE_COPY_BOUNDARY ==JuQW3gc2mQjXuvmJ32TlOLhJ3F2Eh2LcsBup0oH7==
> =# COPY table FROM STDIN CSV;
> somevalue,"foo
> \.
> bar"
> ==JuQW3gc2mQjXuvmJ32TlOLhJ3F2Eh2LcsBup0oH7==
> 
> Instead of looking for \. on a line by itself, psql would look for the
> boundary to know where the data ends.
> The boundary is not transmitted to the server, it has no need to know
> about it.

Wow, that is an odd API, as you stated below.

> > > - psql could be told somehow that the next piece of inline data is in
> > > the CSV format, and then pass it through a CSV parser.
> > 
> > That might be the cleanest solution, but how would we actually input
> > multi-line data in CSV mode with \. alone on a line?
> 
> With this solution, the content doesn't change at all.
> The weird part would be the user interface, because the information
> psql needs is not only "CSV", it's also the options DELIMITER, QUOTE,
> ESCAPE and possibly ENCODING. Currently it doesn't know any of these,
> they're passed to the server in an opaque, unparsed form within
> the COPY command.
> 
> Personally, the solution I find cleaner is the server side not having
> any end-of-data marker for CSV. So backslash-dot would never be
> special. psql could allow for a custom ending boundary for in-script
> data, and users could set that to backslash-dot if they want, but that
> would be their choice.
> That would be clearly not backward compatible, and I believe it wouldn't
> work with the v2 protocol, so I'm not sure it would have much chance of
> approval.

I had forgotten that the DELIMITER and QUOTE can be changed --- that
kills the idea of adding a simple CSV parser into psql because we would
have to parse the COPY SQL command as well.

I am wondering if we should just disallow CSV from STDIN, on security
grounds.  How big a problem would that be for people?  Would we have to
disable CSV to STDOUT as well, since its output could not be restored?
Should we issue some kind of security warning in such cases?  Should we
document this?

In hindsight, I am not sure how we could have designed this more securely.
I guess we could have required some special text to start all CSV
continuation lines that were not end-of-file, but that would have been
very unportable, and portability is the goal of CSV.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: speeding up planning with partitions

2019-01-30 Thread Amit Langote
On Wed, Jan 30, 2019 at 3:20 AM David Rowley
 wrote:
>
> On Tue, 29 Jan 2019 at 22:32, Amit Langote
>  wrote:
> >
> > On 2019/01/29 11:23, David Rowley wrote:
> > > 7. In set_inherit_target_rel_sizes() I still don't really like the way
> > > you're adjusting the EquivalenceClasses. Would it not be better to
> > > invent a function similar to adjust_appendrel_attrs(), or just use
> > > that function?
> >
> > OK, I added a function copy_eq_classes_for_child_root() that simply makes
> > a copy of eq_classes from the source root (deep-copying where applicable)
> > and returns the list.
>
> hmm, but you've added a case for SpecialJoinInfo, is there a good
> reason not just to do the translation by just adding an
> EquivalenceClass case to adjust_appendrel_attrs_mutator() then just
> get rid of the new "replace" flag in add_child_rel_equivalences()?
> That way you'd also remove the churn in the couple of places you've
> had to modify the existing calls to add_child_rel_equivalences().

Sounds like something worth trying to make work.  Will do, thanks for the idea.

Thanks,
Amit



Re: backslash-dot quoting in COPY CSV

2019-01-30 Thread Daniel Verite
Bruce Momjian wrote:

> > - the end of data could be expressed as a length (in number of lines
> > for instance) instead of an in-data marker.
> > 
> > - the end of data could be configurable, as in the MIME structure of
> > multipart mail messages, where a part is ended by a "boundary",
> > line, generally a long randomly generated string. This boundary
> > would have to be known to psql through setting a dedicated
> > variable or command.
> > 
> > - COPY as the SQL command could have the boundary option
> > for data fed through its STDIN. This could neutralize the
> > special role of backslash-dot in general, not just in quoted fields,
> > since the necessity to quote backslash-dot is a wart anyway.
> 
> Well, these all kind of require a change to the COPY format, which
> hasn't changed in many years.

Not for the first two. As an example of solution #2, it could look like this:

=# \set INLINE_COPY_BOUNDARY ==JuQW3gc2mQjXuvmJ32TlOLhJ3F2Eh2LcsBup0oH7==
=# COPY table FROM STDIN CSV;
somevalue,"foo
\.
bar"
==JuQW3gc2mQjXuvmJ32TlOLhJ3F2Eh2LcsBup0oH7==

Instead of looking for \. on a line by itself, psql would look for the
boundary to know where the data ends.
The boundary is not transmitted to the server, it has no need to know
about it.

> > - psql could be told somehow that the next piece of inline data is in
> > the CSV format, and then pass it through a CSV parser.
> 
> That might be the cleanest solution, but how would we actually input
> multi-line data in CSV mode with \. alone on a line?

With this solution, the content doesn't change at all.
The weird part would be the user interface, because the information
psql needs is not only "CSV", it's also the options DELIMITER, QUOTE,
ESCAPE and possibly ENCODING. Currently it doesn't know any of these,
they're passed to the server in an opaque, unparsed form within
the COPY command.

Personally, the solution I find cleaner is the server side not having
any end-of-data marker for CSV. So backslash-dot would never be
special. psql could allow for a custom ending boundary for in-script
data, and users could set that to backslash-dot if they want, but that
would be their choice.
That would be clearly not backward compatible, and I believe it wouldn't
work with the v2 protocol, so I'm not sure it would have much chance of
approval.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: strange error sequence on parallel btree creation

2019-01-30 Thread Peter Geoghegan
Hi Álvaro,

On Wed, Jan 30, 2019 at 5:40 AM Alvaro Herrera  wrote:
> Note that those come from the same create index: the one on process
> 46299 must evidently be a parallel worker.  It's weird that two
> processes report the index building error.  But even if it were correct,
> the CONTEXT line in the other process is not okay ... precisely because
> it's the parent.
>
> What I did was
>
> create table a as select * from generate_series(1, 100);
> insert into a select * from generate_series(1, 8000);
> create index on a (generate_series);

I can see why you'd find that slightly confusing. I'm not sure what
can be done about this scenario specifically, though. It seems to come
down to how the parallel infrastructure works, which is not something
that I gave much input on.

Fundamentally, the parallel infrastructure wants to propagate all
errors that it received before parallel workers were shut down. I
think that that's probably the right thing to do. I'm not sure what
you mean by "But even if it were correct, the CONTEXT line in the
other process is not okay ... precisely because it's the parent".
Perhaps you can go into more detail on that. The CONTEXT looks like it
would look regardless of this race.

In any case, I think that the chances of this happening in production
are pretty slim. The error messages each refer to specific, distinct
pairs of duplicates (duplicated values). It's probably necessary to
have an enormous number of duplicates for things to work out this way.
That's hardly typical.

-- 
Peter Geoghegan



Re: Index Skip Scan

2019-01-30 Thread Dmitry Dolgov
> On Sun, Jan 27, 2019 at 6:17 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> > On Sat, Jan 26, 2019 at 6:45 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> >
> > Rebased version after rd_amroutine was renamed.
>
> And one more to fix the documentation. Also I've noticed few TODOs in the 
> patch
> about the missing docs, and replaced them with a required explanation of the
> feature.

A bit of adjustment after nodes/relation -> nodes/pathnodes.


0001-Index-skip-scan-v8.patch
Description: Binary data


Re: WIP: Avoid creation of the free space map for small tables

2019-01-30 Thread Masahiko Sawada
On Wed, Jan 30, 2019 at 4:33 AM Amit Kapila  wrote:
>
> On Tue, Jan 29, 2019 at 8:12 PM John Naylor  
> wrote:
> >
> > On Tue, Jan 29, 2019 at 11:56 AM Amit Kapila  
> > wrote:
> >
> > > You can find this change in attached patch.  Then, I ran the make
> > > check in src/bin/pgbench multiple times using test_conc_insert.sh.
> > > You can vary the number of times the test should run, if you are not
> > > able to reproduce it with this.
> > >
> > > The attached patch (clear_local_map_if_exists_1.patch) atop the main
> > > patch fixes the issue for me.  Kindly verify the same.
> >
> > I got one failure in 50 runs. With the new patch, I didn't get any
> > failures in 300 runs.
> >
>
> Thanks for verification.  I have included it in the attached patch and
> I have also modified the page.sql test to have enough number of pages
> in relation so that FSM will get created irrespective of alignment
> boundaries.  Masahiko San, can you verify if this now works for you?
>

Thank you for updating the patch!

Could the modified page.sql test fail if the block size is more than 8kB?
We could ensure that the number of pages is more than 4 by checking it and
adding more data if there are not enough, but I'm really not sure we should
care about the bigger-block-size cases.  However, maybe it's good to check
the number of pages after the insertion, so that we can break the issue
down in case the test fails again.
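For example, a check along these lines would make a block-size-related
failure easier to diagnose (just a sketch; 'test_rel' stands in for
whatever table the page.sql test actually uses):

```sql
-- Sketch: report how many main-fork pages the test table ends up with
-- after the inserts, so nonstandard block sizes are easy to spot.
SELECT pg_relation_size('test_rel', 'main') /
       current_setting('block_size')::bigint AS main_pages;
```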

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] proposal: schema variables

2019-01-30 Thread Pavel Stehule
Hi

just rebase

Regards

Pavel


schema-variables-20190130.patch.gz
Description: application/gzip


Re: proposal: new polymorphic types - commontype and commontypearray

2019-01-30 Thread Pavel Stehule
st 30. 1. 2019 v 17:00 odesílatel Pavel Stehule 
napsal:

>
>
> po 28. 1. 2019 v 20:47 odesílatel Robert Haas 
> napsal:
>
>> On Fri, Jan 25, 2019 at 7:21 PM Tom Lane  wrote:
>> > Anyway I think the names need to be any-something.
>>
>> To me, that seems unnecessarily rigid.  Not a bad idea if we can come
>> up with something that is otherwise acceptable.  But all of your
>> suggestions sound worse than Pavel's proposal, so...
>>
>
> I implemented commontypenonarray, and commontyperange types. Now, a SQL
> functions are supported too.
>
> The naming is same - I had not a better idea. But it can be changed
> without any problems, if somebody come with some more acceptable.
>
> I don't think so the name is too important. The polymorphic types are
> important, interesting for extension's developers what is small group of
> Postgres users.
>
> And personally, I think so commontype and commontypearray are good enough
> for not native speakers like me. But I am opened any variant - I think so
> this functionality is interesting
> and partially coverage one gap in our implementation of polymorphic types.
>

maybe "supertype". It is one char shorter .. somewhere is term
"supperclass, ..."

In Czech language this term is short, "nadtyp", but probably it is not
acceptable :)



> Regards
>
> Pavel
>
>
>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>


Re: proposal: new polymorphic types - commontype and commontypearray

2019-01-30 Thread Pavel Stehule
po 28. 1. 2019 v 20:47 odesílatel Robert Haas 
napsal:

> On Fri, Jan 25, 2019 at 7:21 PM Tom Lane  wrote:
> > Anyway I think the names need to be any-something.
>
> To me, that seems unnecessarily rigid.  Not a bad idea if we can come
> up with something that is otherwise acceptable.  But all of your
> suggestions sound worse than Pavel's proposal, so...
>

I implemented the commontypenonarray and commontyperange types.  SQL
functions are now supported too.

The naming is the same - I did not have a better idea.  But it can be
changed without any problem if somebody comes up with something more
acceptable.

I don't think the name is too important.  The polymorphic types themselves
are what matter; they are interesting for extension developers, which is a
small group of Postgres users.

Personally, I think commontype and commontypearray are good enough for
non-native speakers like me.  But I am open to any variant - I think this
functionality is interesting and partially covers a gap in our
implementation of polymorphic types.

Regards

Pavel



> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index ed0ee584c9..f9d10204e3 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4761,6 +4761,14 @@ SELECT * FROM pg_attribute
 anyrange

 
+   
+commontype
+   
+
+   
+commontypearray
+   
+

 void

@@ -4870,6 +4878,34 @@ SELECT * FROM pg_attribute
 ).

 
+   
+commontype
+Indicates that a function accepts any data type. Values
+are converted to real common type.
+(see ).
+   
+
+   
+commontypearray
+Indicates that a function accepts any array data type. The
+elements of array are converted to common type of these values.
+(see ).
+   
+
+   
+commontypenonarray
+Indicates that a function accepts any non-array data type
+(see ).
+   
+
+   
+commontyperange
+Indicates that a function accepts any range data type
+(see  and
+). The subtype can be used for
+deduction of common type.
+   
+

 cstring
 Indicates that a function accepts or returns a null-terminated C string.
diff --git a/doc/src/sgml/extend.sgml b/doc/src/sgml/extend.sgml
index a6b77c1cfe..dece346c03 100644
--- a/doc/src/sgml/extend.sgml
+++ b/doc/src/sgml/extend.sgml
@@ -231,7 +231,7 @@
 
  Five pseudo-types of special interest are anyelement,
  anyarray, anynonarray, anyenum,
- and anyrange,
+ anyrange, commontype and commontypearray.
  which are collectively called polymorphic types.
  Any function declared using these types is said to be
  a polymorphic function.  A polymorphic function can
@@ -267,6 +267,15 @@
  be an enum type.
 
 
+
+ A second family of polymorphic types consists of commontype and
+ commontypearray. These types are similar to
+ anyelement and anyarray. Arguments declared
+ as anyelement require the same real type for all passed values. For
+ commontype arguments a common type is selected, and later
+ all these arguments are cast to this common type.
+
+
 
  Thus, when more than one argument position is declared with a polymorphic
  type, the net effect is that only certain combinations of actual argument
diff --git a/src/backend/catalog/pg_aggregate.c b/src/backend/catalog/pg_aggregate.c
index cc3806e85d..d5013ec12d 100644
--- a/src/backend/catalog/pg_aggregate.c
+++ b/src/backend/catalog/pg_aggregate.c
@@ -133,7 +133,7 @@ AggregateCreate(const char *aggName,
 	hasInternalArg = false;
 	for (i = 0; i < numArgs; i++)
 	{
-		if (IsPolymorphicType(aggArgTypes[i]))
+		if (IsPolymorphicTypeAny(aggArgTypes[i]))
 			hasPolyArg = true;
 		else if (aggArgTypes[i] == INTERNALOID)
 			hasInternalArg = true;
@@ -143,7 +143,7 @@ AggregateCreate(const char *aggName,
 	 * If transtype is polymorphic, must have polymorphic argument also; else
 	 * we will have no way to deduce the actual transtype.
 	 */
-	if (IsPolymorphicType(aggTransType) && !hasPolyArg)
+	if (IsPolymorphicTypeAny(aggTransType) && !hasPolyArg)
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
  errmsg("cannot determine transition data type"),
@@ -153,7 +153,7 @@ AggregateCreate(const char *aggName,
 	 * Likewise for moving-aggregate transtype, if any
 	 */
 	if (OidIsValid(aggmTransType) &&
-		IsPolymorphicType(aggmTransType) && !hasPolyArg)
+		IsPolymorphicTypeAny(aggmTransType) && !hasPolyArg)
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
  errmsg("cannot determine transition data type"),
@@ -489,7 +489,7 @@ AggregateCreate(const char *aggName,
 	 * that itself violates the rule against polymorphic result with no
 	 * polymorphic input.)
 	 */
-	if (IsPolymorphicType(finaltype) && !hasPolyArg)
+	if 

Fwd: Re: BUG #15589: Due to missing wal, restore ends prematurely and opens database for read/write

2019-01-30 Thread leif
Hi
I have reported a bug via the PostgreSQL bug report form, but haven't got
any response so far.  This might not be a bug, but rather a feature not
implemented yet.  I suggest making a small addition to StartupXLOG to solve
the issue.



git diff src/backend/access/transam/xlog.c
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 2ab7d804f0..d0e5bb3f84 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7277,6 +7277,19 @@ StartupXLOG(void)
 
 			case RECOVERY_TARGET_ACTION_PROMOTE:
 				break;
+			}
+		} else if (recoveryTarget == RECOVERY_TARGET_TIME)
+		{
+			/*
+			 * Stop point not reached but next WAL could not be read
+			 * Some explanation and warning should be logged
+			 */
+			switch (recoveryTargetAction)
+			{
+				case RECOVERY_TARGET_ACTION_PAUSE:
+					SetRecoveryPause(true);
+					recoveryPausesHere();
+					break;
 			}
 		}





The scenario I want to solve is:
Need to restore a backup to another server.
 Restore the pg_basebackup files
 Restore some WAL files
 Extract the pg_basebackup files
 Create recovery.conf with a point-in-time target
 Start PostgreSQL
 Recovery ends before the target time due to missing WAL files
 The database opens read/write

I think the database should have paused recovery; then I could restore
additional WAL files and restart PostgreSQL to continue the recovery.

With large databases and a lot of WAL files it is time-consuming to repeat
parts of the process.

Best regards
Leif Gunnar Erlandsen



Re: insensitive collations

2019-01-30 Thread Daniel Verite
Peter Eisentraut wrote:

> Another patch. 

+ks key), in order for such such collations to act in
a

s/such such/such/

+   
+The pattern matching operators of all three kinds do not support
+nondeterministic collations.  If required, apply a different collation
to
+the expression to work around this limitation.
+   

It's an important point of comparison between CI collations and
contrib/citext, since the latter diverts a bunch of functions/operators
to make them do case-insensitive pattern matching.
The citext documentation explains the rationale for using it versus text;
maybe it now needs to be expanded a bit with the pros and cons of choosing
citext versus nondeterministic collations.

The current patch doesn't alter a few string functions that could
potentially implement collation-aware string search, such as
replace(), strpos(), starts_with().
ISTM that we should not let these functions ignore the collation: they
ought to error out until we get their implementation to use the ICU
collation-aware string search.
FWIW I've been experimenting with usearch_openFromCollator() and
other usearch_* functions, and it looks doable to implement at least the
3 above functions based on that, even though the UTF16-ness of the API
does not favor us.
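For reference, a bare-bones sketch of that approach (standalone C against
ICU, not patch code; error handling and any PostgreSQL integration are
omitted, and the locale, strings, and function name are only illustrative).
Note that the match position comes back as a UTF-16 offset, which is part
of the impedance mismatch mentioned above:

```c
/* Sketch only: collation-aware substring search with ICU's usearch API. */
#include <stdio.h>
#include <unicode/ucol.h>
#include <unicode/usearch.h>
#include <unicode/ustring.h>

/*
 * Return the UTF-16 offset of "needle" in "haystack" under the given
 * collator, or -1 if not found.  (A real strpos() replacement would still
 * have to map this back to an offset in the server encoding.)
 */
static int32_t
collation_aware_strpos(UCollator *coll, const char *haystack, const char *needle)
{
	UChar		uhay[256];
	UChar		uneedle[256];
	int32_t		hay_len, needle_len, pos;
	UErrorCode	status = U_ZERO_ERROR;
	UStringSearch *search;

	/* Convert both UTF-8 strings to the UTF-16 form the API requires. */
	u_strFromUTF8(uhay, 256, &hay_len, haystack, -1, &status);
	u_strFromUTF8(uneedle, 256, &needle_len, needle, -1, &status);
	if (U_FAILURE(status))
		return -1;

	search = usearch_openFromCollator(uneedle, needle_len,
									  uhay, hay_len,
									  coll, NULL, &status);
	if (U_FAILURE(status))
		return -1;
	pos = usearch_first(search, &status);	/* USEARCH_DONE (-1) if absent */
	usearch_close(search);

	return U_SUCCESS(status) ? pos : -1;
}

int
main(void)
{
	UErrorCode	status = U_ZERO_ERROR;
	UCollator  *coll = ucol_open("und", &status);

	/* Primary strength: ignore case and accents, like a CI/AI collation. */
	ucol_setStrength(coll, UCOL_PRIMARY);
	printf("%d\n", (int) collation_aware_strpos(coll, "Grüße aus Berlin", "GRUSSE"));
	ucol_close(coll);
	return 0;
}
```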

ICU also provides regexp matching, but not collation-aware, since
character-based patterns don't play well with the concept of collation.
About a potential collation-aware LIKE, it looks hard to implement,
since the algorithm currently used in like_match.c seems purely
character-based. AFAICS there's no way to plug calls to usearch_*
functions into it, it would need a separate redesign from scratch.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: WIP: Avoid creation of the free space map for small tables

2019-01-30 Thread John Naylor
On Wed, Jan 30, 2019 at 2:11 PM Amit Kapila  wrote:
> This is much better than the earlier version of test and there is no
> dependency on the vacuum.  However, I feel still there is some
> dependency on how the rows will fit in a page and we have seen some
> related failures due to alignment stuff.  By looking at the test, I
> can't envision any such problem, but how about if we just write some
> simple tests where we can check that the FSM won't be created for very
> small number of records say one or two and then when we increase the
> records FSM gets created, here if we want, we can even use vacuum to
> ensure FSM gets created.  Once we are sure that the main patch passes
> all the buildfarm tests, we can extend the test to something advanced
> as you are proposing now.  I think that will reduce the chances of
> failure, what do you think?

That's probably a good idea to limit risk.  I have just very basic tests
now, and I vacuum before every relation size check to make sure any FSM
extension (whether desired or not) is invoked.  Also, in my last patch I
forgot to implement explicit checks of the block number instead of assuming
how many rows will fit on a page; I've used a plpgsql code block to do this.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/test/regress/expected/fsm.out b/src/test/regress/expected/fsm.out
index df6b15b9d2..eb2c3842e2 100644
--- a/src/test/regress/expected/fsm.out
+++ b/src/test/regress/expected/fsm.out
@@ -2,46 +2,30 @@
 -- Free Space Map test
 --
 CREATE TABLE fsm_check_size (num int, str text);
--- Fill 3 blocks with as many large records as will fit
--- No FSM
-INSERT INTO fsm_check_size SELECT i, rpad('', 1024, 'a')
-FROM generate_series(1,7*3) i;
+-- With one block, there should be no FSM
+INSERT INTO fsm_check_size VALUES(1, 'a');
 VACUUM fsm_check_size;
 SELECT pg_relation_size('fsm_check_size', 'main') AS heap_size,
 pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
  heap_size | fsm_size 
 ---+--
- 24576 |0
-(1 row)
-
--- Clear some space on block 0
-DELETE FROM fsm_check_size WHERE num <= 5;
-VACUUM fsm_check_size;
--- Insert small record in block 2 to set the cached smgr targetBlock
-INSERT INTO fsm_check_size VALUES(99, 'b');
--- Insert large record and make sure it goes in block 0 rather than
--- causing the relation to extend
-INSERT INTO fsm_check_size VALUES (101, rpad('', 1024, 'a'));
-SELECT pg_relation_size('fsm_check_size', 'main') AS heap_size,
-pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
- heap_size | fsm_size 
+--
- 24576 |0
+  8192 |0
 (1 row)
 
 -- Extend table with enough blocks to exceed the FSM threshold
 -- FSM is created and extended to 3 blocks
-INSERT INTO fsm_check_size SELECT i, 'c' FROM generate_series(200,1200) i;
-VACUUM fsm_check_size;
-SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
- fsm_size 
---
-24576
-(1 row)
-
--- Truncate heap to 1 block
--- No change in FSM
-DELETE FROM fsm_check_size WHERE num > 7;
+DO $$
+DECLARE curtid tid;
+num int;
+BEGIN
+num = 11;
+  LOOP
+INSERT INTO fsm_check_size VALUES (num, 'b') RETURNING ctid INTO curtid;
+EXIT WHEN curtid >= tid '(4, 0)';
+num = num + 1;
+  END LOOP;
+END;
+$$;
 VACUUM fsm_check_size;
 SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
  fsm_size 
@@ -49,16 +33,6 @@ SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
 24576
 (1 row)
 
--- Truncate heap to 0 blocks
--- FSM now truncated to 2 blocks
-DELETE FROM fsm_check_size;
-VACUUM fsm_check_size;
-SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
- fsm_size 
---
-16384
-(1 row)
-
 -- Add long random string to extend TOAST table to 1 block
 INSERT INTO fsm_check_size
 VALUES(0, (SELECT string_agg(md5(chr(i)), '')
diff --git a/src/test/regress/sql/fsm.sql b/src/test/regress/sql/fsm.sql
index 07f505591a..b1debedea1 100644
--- a/src/test/regress/sql/fsm.sql
+++ b/src/test/regress/sql/fsm.sql
@@ -4,42 +4,28 @@
 
 CREATE TABLE fsm_check_size (num int, str text);
 
--- Fill 3 blocks with as many large records as will fit
--- No FSM
-INSERT INTO fsm_check_size SELECT i, rpad('', 1024, 'a')
-FROM generate_series(1,7*3) i;
-VACUUM fsm_check_size;
-SELECT pg_relation_size('fsm_check_size', 'main') AS heap_size,
-pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
+-- With one block, there should be no FSM
+INSERT INTO fsm_check_size VALUES(1, 'a');
 
--- Clear some space on block 0
-DELETE FROM fsm_check_size WHERE num <= 5;
 VACUUM fsm_check_size;
-
--- Insert small record in block 2 to set the cached smgr targetBlock
-INSERT INTO fsm_check_size VALUES(99, 'b');
-
--- Insert large record and make sure it goes in block 0 rather than
--- causing the relation to extend
-INSERT INTO fsm_check_size VALUES (101, rpad('', 1024, 'a'));
 SELECT pg_relation_size('fsm_check_size', 

Re: Unused parameters & co in code

2019-01-30 Thread Tom Lane
Michael Paquier  writes:
> ... while this generates a lot of noise for callbacks and depending on
> the environment used, this is also pointing to things which could be
> simplified:

I'd definitely take this on a case-by-case basis.  In the planner
functions you mentioned, for instance, I'd be pretty hesitant to remove
the "root" parameter even if it happens not to be needed today.
We'd probably just end up putting it back in the future, because almost
everything in the planner needs that.  I'd only consider removing it in
cases where there was a solid reason to require the function not to need
it ever (as for instance what I just did to flatten_join_alias_vars).

In cases where we can get simplifications of calling layers, and
it doesn't seem likely that we'd have to undo it in future, then
probably it's worth the trouble to change.

regards, tom lane



Re: Replication & recovery_min_apply_delay

2019-01-30 Thread Alvaro Herrera
Hi

On 2019-Jan-30, Konstantin Knizhnik wrote:

> One of our customers was faced with the following problem: he has
> setup  physical primary-slave replication but for some reasons
> specified very large (~12 hours) recovery_min_apply_delay.

We also came across this exact same problem some time ago.  It's pretty
nasty.  I wrote a quick TAP reproducer, attached (needed a quick patch
for PostgresNode itself too.)

I tried several failed strategies:
1. setting lastSourceFailed just before sleeping for apply delay, with
   the idea that for the next fetch we would try stream.  But this
   doesn't work because WaitForWalToBecomeAvailable is not executed.

2. split WaitForWalToBecomeAvailable in two pieces, so that we can call
   the first half in the restore loop.  But this causes 1s of wait
   between segments (error recovery) and we never actually catch up.

What I thought back then was the *real* solution, but didn't get around to
implementing, is the idea you describe: starting a walreceiver at an
earlier point.

> I wonder if it can be considered as acceptable solution of the problem or
> there can be some better approach?

I didn't find one.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 5d16db9a8308692f66b2836432fe84fbbec3e81f Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 17 Aug 2018 14:20:47 -0300
Subject: [PATCH] Support pg_basebackup -S in PostgresNode->backup()

---
 src/test/perl/PostgresNode.pm | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index d9aeb277d9..2442251683 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -488,6 +488,9 @@ Create a hot backup with B in subdirectory B of
 B<< $node->backup_dir >>, including the WAL. WAL files
 fetched at the end of the backup, not streamed.
 
+The keyword parameter replication_slot => 'myslot' can be used for the B<-S>
+argument to B.
+
 You'll have to configure a suitable B on the
 target server since it isn't done by default.
 
@@ -495,14 +498,17 @@ target server since it isn't done by default.
 
 sub backup
 {
-	my ($self, $backup_name) = @_;
+	my ($self, $backup_name, %params) = @_;
 	my $backup_path = $self->backup_dir . '/' . $backup_name;
 	my $port= $self->port;
 	my $name= $self->name;
 
+	my @cmd = ("pg_basebackup", "-D", $backup_path, "-p", $port, "--no-sync");
+	push @cmd, '-S', $params{replication_slot}
+	  if defined $params{replication_slot};
+
 	print "# Taking pg_basebackup $backup_name from node \"$name\"\n";
-	TestLib::system_or_bail('pg_basebackup', '-D', $backup_path, '-p', $port,
-		'--no-sync');
+	TestLib::system_or_bail(@cmd);
 	print "# Backup finished\n";
 }
 
-- 
2.11.0

# Test streaming with replication delay

use strict;
use warnings;

use PostgresNode;
use TestLib;
use IPC::Run;
use Test::More;

my $golf = get_new_node('golf');

my $foxtrot = get_new_node('foxtrot');
$foxtrot->init(allows_streaming => 1, has_archiving => 1);
$foxtrot->append_conf('postgresql.conf', 'log_connections=on');
$foxtrot->start();

$foxtrot->safe_psql('postgres', 'select pg_create_physical_replication_slot(\'golf\')');

$foxtrot->backup('backup', replication_slot => 'golf');

$golf->init_from_backup($foxtrot, 'backup', has_streaming => 1, has_restoring => 1);
$golf->append_conf('recovery.conf', 'recovery_min_apply_delay = 1min');
$golf->append_conf('recovery.conf', 'primary_slot_name = \'golf\'');

system("pgbench", "-is10", "-p", $foxtrot->port);

my ($stat, $slots);
note("insert lsn: ". $foxtrot->safe_psql('postgres', 'select pg_current_wal_insert_lsn()'));
note("repl slot restart: ". $foxtrot->safe_psql('postgres', 'select restart_lsn from pg_replication_slots'));

$slots = $foxtrot->safe_psql('postgres', 'select slot_name,slot_type from pg_replication_slots');
ok($slots eq "golf|physical", "replication slot looks good");

# 8 transactions should be enough to fill just over 3 segments

my $h = IPC::Run::start(['pgbench', '-R100', '-P1', '-T1', $foxtrot->connstr('postgres')]);

sleep 62;

note("insert lsn: ". $foxtrot->safe_psql('postgres', 'select pg_current_wal_insert_lsn()'));
note("repl slot restart: ". $foxtrot->safe_psql('postgres', 'select restart_lsn from pg_replication_slots'));

$golf->start;

while (1) {
	$stat = $foxtrot->safe_psql('postgres', 'select * from pg_stat_replication');
	$slots = $foxtrot->safe_psql('postgres', 'select * from pg_replication_slots');
	ok(1, "stat: $stat repl: $slots");

	if ($stat ne '') {
		done_testing();
		exit 0;
	}

	sleep(10);
}


Re: Synchronous replay take III

2019-01-30 Thread Michail Nikolaev
Hello,

Sorry, I missed this email.

>> In our case we have implemented some kind of "replication barrier"
functionality based on table with counters (one counter per application
backend in simple case).
>> Each application backend have dedicated connection to each replica. And
it selects its counter value few times (2-100) per second from each replica
in background process (depending on how often replication barrier is used).

> Interesting approach. Why don't you sample pg_last_wal_replay_lsn()
> on all the standbys instead, so you don't have to generate extra write
> traffic?

Replay LSN was the first approach I tried. I was sampling 'select
replay_lsn from pg_stat_replication' on the master to get the replay
position on the replicas.
However, for some unknown reason I was not able to get it to work: even
after replay_lsn was reached, the standby was unable to see the data.
I know it should not happen. I spent a few days on debugging... And… since
I was required to ping the replicas anyway (to check whether one has become
the master, to monitor ping, locks, connections, etc.), I decided to
introduce the table for now.

>> Once application have committed transaction it may want join replication
barrier before return new data to a user. So, it increments counter in the
table and waits until all replicas have replayed that value according to
background monitoring process. Of course timeout, replicas health checks
and few optimizations and circuit breakers are used.

> I'm interested in how you handle failure (taking too long to respond
> or to see the new counter value, connectivity failure etc).
> Specifically, if the writer decides to give up on a certain standby
> (timeout, circuit breaker etc), how should a client that is connected
> directly to that standby now or soon afterwards know that this standby
> has been 'dropped' from the replication barrier and it's now at risk
> of seeing stale data?

Each standby has some health flags attached to it. Health is "red" when:
* we can't connect to the replica, or all connections are in use
* the replica lag according to pg_last_xact_replay_timestamp is more than 3000ms
* the replica lag according to pg_last_xact_replay_timestamp was more than
3000ms some time ago (1ms)
* the replica is the new master now
* etc.

In the case of a replication barrier, we wait only for "green" replicas,
and for at most 5000ms. If we are still not able to see the new counter
value on some replicas, it is up to the client to decide how to handle it.
In our case it means the replica is lagging by more than 3000ms, so it is
"red" now, and the next client request will be dispatched to another
"green" replica. This is done by a special connection pool with a balancer
inside.
Not sure it is all 100% correct, but we could just proceed in our case.
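To make that concrete, a rough sketch of the checks involved (the table and
column names, backend id, and counter value below are only illustrative of
our setup, not anything in the patch):

```sql
-- Health check sampled on each replica: how far behind is replay?
SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;

-- Barrier: after commit, the backend bumps its counter on the master ...
UPDATE replication_barrier SET counter = counter + 1
 WHERE backend_id = 42 RETURNING counter;

-- ... and the background process polls each replica until it sees it.
SELECT counter >= 123 AS caught_up
  FROM replication_barrier WHERE backend_id = 42;
```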

> someone can successfully execute queries on a standby
> that hasn't applied a transaction that you know to be committed on the
> primary.

>> Nice thing here - constant number of connection involved. Even if lot of
threads joining replication barrier in the moment. Even if some replicas
are lagging.
>>
>> Because 2-5 seconds lag of some replica will lead to out of connections
issue in few milliseconds in case of implementation described in this
thread.

> Right, if a standby is lagging more than the allowed amount, in my
> patch the lease is cancelled and it will refuse to handle requests if
> the GUC is on, with a special new error code, and then it's up to the
> client to decide what to do. Probably find another node.
> In case of high loded

> Could you please elaborate? What could you do that would be better?
> If the answer is that you just want to know that you might be seeing
> stale data but for some reason you don't want to have to find a new
> node, the reader is welcome to turn synchronous_standby off and try
> again (giving up data freshness guarantees). Not sure when that would
> be useful though.

The main problem I see here is master connection usage. We have about
10,000 RPS on the master. So a small lag on some replica (we have six of
them) will lead to all master connections waiting for replay on the stale
replica until the timeout. That is an outage for the whole system, even if
the replica lagged for only 200-300ms (in the real world it can lag for
seconds on a regular basis).

If we set synchronous_replay_max_lag to 10-20ms, standbys will be cancelled
all the time.
In our case, we use a constant number of connections. In addition, client
requests wait for standby replay inside the application backend thread
without blocking a master connection. This is the main difference, I think.

> I've attached a small shell script that starts up a primary and N
> replicas with synchronous_replay configured, in the hope of
> encouraging you to try it out.

Thanks - will try and report.


Re: [PATCH] Pass COPT and PROFILE to CXXFLAGS as well

2019-01-30 Thread Christoph Berg
Re: Andres Freund 2019-01-30 <20190130015127.hciz36lpmu7pr...@alap3.anarazel.de>
> I'm confused - that doesn't allow to inject flags to all in-core built
> files? So how does that fix your problem fully? Say
> e.g. llvmjit_inline.cpp won't get the flag, and thus not be
> reproducible?

The original itch being scratched here is extensions written in C++,
which need -ffile-prefix-map (or -fdebug-prefix-map) injected into
CXXFLAGS on top of the flags encoded in pgxs.mk. I originally picked
COPT because that works for injecting flags into pgxs.mk as well, but
PG_*FLAGS is better.

For the server build itself, it shouldn't be necessary to inject flags
because they are passed directly to ./configure. Admittedly I haven't
looked at reproducibility of PG11 yet as clang is still missing the
prefix-map feature, but https://reviews.llvm.org/D49466 is making good
progress.

Do we still want some CXXOPT flag for the server build? I can write a
patch, but someone else would need to do the bikeshedding how to name
it, and which of the existing knobs would set CXXFLAGS along. I don't
think I need that feature.

Christoph
-- 
Senior Berater, Tel.: +49 2166 9901 187
credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz



strange error sequence on parallel btree creation

2019-01-30 Thread Alvaro Herrera
Hi

While trying out the progress report mechanism for btrees, I noticed
this strange chain of errors:

2019-01-29 15:51:55.928 -03 [43789] ERROR:  no se pudo crear el índice único 
«a_generate_series_idx»
2019-01-29 15:51:55.928 -03 [43789] DETALLE:  La llave (generate_series)=(152) 
está duplicada.
2019-01-29 15:51:55.928 -03 [43789] SENTENCIA:  create unique index 
concurrently on a (generate_series);
2019-01-29 15:51:55.928 -03 [44634] ERROR:  no se pudo crear el índice único 
«a_generate_series_idx»
2019-01-29 15:51:55.928 -03 [44634] DETALLE:  La llave 
(generate_series)=(31339) está duplicada.
2019-01-29 15:51:55.928 -03 [44634] SENTENCIA:  create unique index 
concurrently on a (generate_series);
2019-01-29 15:51:55.985 -03 [43670] LOG:  background worker "parallel worker" 
(PID 44634) terminó con código de salida 1


Note that those come from the same create index: the one on process
46299 must evidently be a parallel worker.  It's weird that two
processes report the index building error.  But even if it were correct,
the CONTEXT line in the other process is not okay ... precisely because
it's the parent.

What I did was

create table a as select * from generate_series(1, 100);
insert into a select * from generate_series(1, 8000);
create index on a (generate_series);

The last command used the laptop disk, excessive use of which caused the
whole thing to stall for a few dozen seconds (I think it's because of the
encryption, but I'm not sure).  I then changed lc_messages to C, for
pasting here, and repeated with an external USB drive -- result: it fails
cleanly (only one ERROR).

-- 
Álvaro Herrera



Re: Unused parameters & co in code

2019-01-30 Thread Alvaro Herrera
On 2019-Jan-30, Michael Paquier wrote:

> ReindexPartitionedIndex => parentIdx

I'd not change this one, as we will surely want to do something useful
eventually.  Not that it matters, but it'd be useless churn.

> blackholeNoticeProcessor => arg

This one's signature is enforced by PQsetNoticeProcessor.

> _bt_relbuf => rel (Quite some cleanup here for the btree code)

If this is a worthwhile cleanup, let's see a patch.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-01-30 Thread Amit Kapila
On Wed, Jan 30, 2019 at 3:26 PM John Naylor  wrote:
>
> On Wed, Jan 30, 2019 at 4:33 AM Amit Kapila  wrote:
> > There are two more failures which we need to something about.
> > 1. Make fsm.sql independent of vacuum without much losing on coverage
> > of newly added code.  John, I guess you have an idea, see if you can
> > take care of it, otherwise, I will see what I can do for it.
>
> I've attached a patch that applies on top of v19 that uses Andrew
> Gierth's idea to use fillfactor to control free space. I've also
> removed tests that relied on truncation and weren't very useful to
> begin with.
>

This is much better than the earlier version of the test, and there is no
dependency on vacuum.  However, I feel there is still some dependency on
how the rows fit on a page, and we have seen some related failures due to
alignment.  Looking at the test, I can't envision any such problem, but how
about writing some simple tests for now: check that the FSM is not created
for a very small number of records, say one or two, and that it is created
once we increase the number of records (here, if we want, we can even use
vacuum to ensure the FSM gets created).  Once we are sure that the main
patch passes all the buildfarm tests, we can extend the test to something
more advanced, as you are proposing now.  I think that will reduce the
chances of failure, what do you think?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Control your disk usage in PG: Introduction to Disk Quota Extension

2019-01-30 Thread Hubert Zhang
Hi all,

Currently we add hooks in SMGR to detect which table is being modified
(i.e. whose disk size changes).
Inserting rows into an existing page with free space does not add a new
block, so it should not mark the table as an active table; smgrextend is a
good place to capture exactly this behavior.
We welcome suggestions on other, better hook positions!

Besides, supposing we use smgr as the hook position, I want to discuss the
API passed to the hook function.
Take smgrextend as an example. Its function signature is:
```
void smgrextend
(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum, char *buffer,
bool skipFsync);
```
So the hook API for smgrextend has two main options.
Hook API Option1
```
typedef void (*smgrextend_hook_type)
(RelFileNode smgr_rnode,ForkNumber forknum);
```
Hook API Option 2
```
typedef void (*smgrextend_hook_type)
(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,char *buffer,
bool skipFsync);
```

As for Option 1: since the diskquota extension only needs the relfilenode
information to detect active tables, Option 1 just passes the RelFileNode to
the hook function. It is clearer and carries the semantic meaning directly.

Option 2, on the other hand, passes the original parameters to the hook
function without any filtering. This is more general and lets different hook
implementations decide how to use these parameters.
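
To make the difference concrete, here is a rough sketch of what an
extension-side callback could look like with the Option 1 signature (the
hash table and function names are purely illustrative assumptions, not part
of the patch):

```
#include "postgres.h"

#include "common/relpath.h"
#include "storage/relfilenode.h"
#include "utils/hsearch.h"

/* extension-private hash of relfilenodes considered "active" (assumed) */
static HTAB *active_tables = NULL;

/* Matches the Option 1 signature: only the relfilenode and fork number. */
static void
diskquota_smgrextend_hook(RelFileNode smgr_rnode, ForkNumber forknum)
{
	bool		found;

	/* Only extensions of the main fork change the user-visible table size. */
	if (forknum != MAIN_FORKNUM || active_tables == NULL)
		return;

	/* Remember the relation; the bgworker recalculates only active tables. */
	(void) hash_search(active_tables, &smgr_rnode, HASH_ENTER, &found);
}
```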

Option 1 also needs some additional work to handle the smgrdounlinkall case,
whose input parameter is a list of SMgrRelations. We may need to palloc a
RelFileNode list and pfree it manually, as sketched after the signature
below.
The smgrdounlinkall function signature is:
```
smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo, char
*relstorages)
```
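
As a rough illustration of that extra work, the caller side could build a
temporary RelFileNode array before invoking a hypothetical Option-1-style
hook (the hook type, hook variable, and function name below are assumptions
made up for this sketch):

```
#include "postgres.h"

#include "storage/relfilenode.h"
#include "storage/smgr.h"

/* Hypothetical hook type and variable, analogous to the Option 1 hook. */
typedef void (*smgrdounlinkall_hook_type) (RelFileNode *rnodes, int nrels);
static smgrdounlinkall_hook_type smgrdounlinkall_hook = NULL;

/* Sketch of how smgrdounlinkall could report the affected relations. */
static void
report_unlinked_relations(SMgrRelation *rels, int nrels)
{
	RelFileNode *rnodes;
	int			i;

	if (smgrdounlinkall_hook == NULL || nrels == 0)
		return;

	/* Strip the SMgrRelation list down to a plain RelFileNode array. */
	rnodes = (RelFileNode *) palloc(nrels * sizeof(RelFileNode));
	for (i = 0; i < nrels; i++)
		rnodes[i] = rels[i]->smgr_rnode.node;

	(*smgrdounlinkall_hook) (rnodes, nrels);

	pfree(rnodes);
}
```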

Assuming we use smgr as the hook position, which hook API is better,
Option 1 or Option 2?
Or could we find some balanced API in between?

Again comments on other better hook positions are appreciated!

Thanks
Hubert


On Wed, Jan 30, 2019 at 10:26 AM Hubert Zhang  wrote:

> Hi Michael, Robert
> For your question about the hook position, I want to explain more about the
> background why we want to introduce these hooks.
> We wrote a diskquota extension 
> for Postgresql(which is inspired by Heikki's pg_quota
> ). Diskquota extension is used to
> control the disk usage in Postgresql in a fine-grained way, which means:
> 1. You could set disk quota limit at schema level or role level.
> 2. A background worker will gather the current disk usage for each
> schema/role in realtime.
> 3. A background worker will generate the blacklist for schema/role whose
> quota limit is exceeded.
> 4. New transactions that want to insert data into a schema/role on the
> blacklist will be cancelled.
>
> In step 2, gathering the current disk usage for each schema requires summing
> the disk size of all the tables in that schema. This is a time-consuming
> operation. We want to use hooks in SMGR to detect the Active Tables, and
> only recalculate the disk size of those Active Tables.
> For example, the smgrextend hook indicates that a new block was allocated
> and the table needs to be treated as an Active Table.
>
> Do you have some better hook positions to recommend for the above use
> case?
> Thanks in advance.
>
> Hubert
>
>
>
>
>
> On Tue, Jan 22, 2019 at 12:08 PM Hubert Zhang  wrote:
>
>> > For this particular purpose, I don't immediately see why you need a
>>> > hook in both places.  If ReadBuffer is called with P_NEW, aren't we
>>> > guaranteed to end up in smgrextend()?
>>> Yes, that's a bit awkward.
>>
>>
>>  Hi Michael, we revisited the ReadBuffer hook and removed it in the latest
>> patch.
>> The ReadBuffer hook was originally used to do enforcement (e.g. exceeding
>> the diskquota limit) while a query is loading data.
>> We plan to move the enforcement work for running queries to a separate
>> diskquota worker process, and let that worker process detect the backends
>> to be cancelled and send SIGINT to them.
>> So there is no need for the ReadBuffer hook anymore.
>>
>> Our patch currently only contains smgr related hooks to catch the file
>> change and get the Active Table list for diskquota extension.
>>
>> Thanks Hubert.
>>
>>
>> On Mon, Jan 7, 2019 at 6:56 PM Haozhou Wang  wrote:
>>
>>> Thanks very much for your comments.
>>>
>>> To the best of my knowledge, smgr is a layer that abstracts the storage
>>> operations. Therefore, it is a good place to control or collect information
>>> about storage operations without touching the physical storage layer.
>>> Moreover, smgr deals with the actual disk IO operations (not considering
>>> the OS cache) in postgres, so we do not need to worry about buffer
>>> management in postgres.
>>> That keeps the purpose of the hook pure: a hook for actual disk IO.
>>>
>>> Regards,
>>> Haozhou
>>>
>>> On Wed, Dec 26, 2018 at 1:56 PM Michael Paquier 
>>> wrote:
>>>
 On Wed, Nov 21, 2018 at 09:47:44AM -0500, Robert Haas wrote:
 > +1 for adding some hooks to support this kind of 

Replication & recovery_min_apply_delay

2019-01-30 Thread Konstantin Knizhnik

Hi hackers,

One of our customers was faced with the following problem:
he has set up physical primary-standby replication but, for some reason,
specified a very large (~12 hours) recovery_min_apply_delay. I do not know
the precise reasons for such a large gap between master and replica.
Everything works normally until the replica is restarted. Then it starts
to apply WAL, reaches the point where the record timestamp is less than 12
hours old, and ... suspends recovery. No WAL receiver is launched, so nobody
is fetching changes from the master.
This may cause the master's WAL space to overflow (if there is a replication
slot) and loss of data in case of a master crash.


It looks like the right behavior is to be able to launch the WAL receiver
before the replica reaches the end of WAL.

For example, we can launch it before going to sleep in recoveryApplyDelay().
We need to specify a start LSN for the WAL sender; I didn't find a better
solution than iterating over the local WAL until I reach the last valid
record.


I attach a small patch which implements this approach.
I wonder whether it can be considered an acceptable solution to the problem,
or whether there is some better approach?


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2ab7d80..ef6433f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -802,6 +802,7 @@ static XLogSource readSource = 0;	/* XLOG_FROM_* code */
  */
 static XLogSource currentSource = 0;	/* XLOG_FROM_* code */
 static bool lastSourceFailed = false;
+static bool stopOnError = false;
 
 typedef struct XLogPageReadPrivate
 {
@@ -3971,6 +3972,49 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr RedoRecPtr, XLogRecPtr endptr)
 }
 
 /*
+ * Find latest WAL LSN
+ */
+static XLogRecPtr
+GetLastLSN(XLogRecPtr lsn)
+{
+	XLogReaderState *xlogreader;
+	char	   *errormsg;
+	XLogPageReadPrivate private;
+	MemSet(&private, 0, sizeof(XLogPageReadPrivate));
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, &XLogPageRead, &private);
+
+	stopOnError = true;
+	while (XLogReadRecord(xlogreader, lsn, &errormsg) != NULL)
+	{
+		lsn = InvalidXLogRecPtr;
+	}
+	stopOnError = false;
+	lsn = xlogreader->EndRecPtr;
+	XLogReaderFree(xlogreader);
+
+	return lsn;
+}
+
+/*
+ * Launch WalReceiver starting from last LSN if not started yet.
+ */
+static void
+StartWalRcv(XLogRecPtr currLsn)
+{
+	if (!WalRcvStreaming() && PrimaryConnInfo && strcmp(PrimaryConnInfo, "") != 0)
+	{
+		XLogRecPtr lastLSN = GetLastLSN(currLsn);
+		if (lastLSN != InvalidXLogRecPtr)
+		{
+			curFileTLI = ThisTimeLineID;
+			RequestXLogStreaming(ThisTimeLineID, lastLSN, PrimaryConnInfo,
+ PrimarySlotName);
+		}
+	}
+}
+
+/*
  * Remove WAL files that are not part of the given timeline's history.
  *
  * This is called during recovery, whenever we switch to follow a new
@@ -6004,6 +6048,12 @@ recoveryApplyDelay(XLogReaderState *record)
 	if (secs <= 0 && microsecs <= 0)
 		return false;
 
+	/*
+	 * Start WAL receiver if not started yet, to avoid WALs overflow at primary node
+	 * or large gap between primary and replica when apply delay is specified.
+	 */
+	StartWalRcv(record->EndRecPtr);
+
 	while (true)
 	{
 		ResetLatch(&XLogCtl->recoveryWakeupLatch);
@@ -11821,6 +11871,13 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 		return false;
 
 	/*
+	 * If the WAL receiver was already started because of apply delay,
+	 * then restart it.
+	 */
+	if (WalRcvStreaming())
+		ShutdownWalRcv();
+
+	/*
 	 * If primary_conninfo is set, launch walreceiver to try
 	 * to stream the missing WAL.
 	 *
@@ -11990,6 +12047,9 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 if (readFile >= 0)
 	return true;	/* success! */
 
+if (stopOnError)
+	return false;
+
 /*
  * Nope, not found in archive or pg_wal.
  */


Re: Unused parameters & co in code

2019-01-30 Thread Peter Eisentraut
On 30/01/2019 08:33, Michael Paquier wrote:
> I just got a run with CFLAGS with the following options:
> -Wunused-function -Wunused-variable -Wunused-const-variable

These are part of -Wall.

>  -Wno-unused-result

What is the purpose of this?

> -Wunused-parameter

I think it's worth cleaning this up a bit, but it needs to be done by
hand, unless you want to add a large number of pg_attribute_unused()
decorations.
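
For what it's worth, a minimal sketch of what such a decoration looks like
(the function and parameter names here are invented purely for
illustration):

```
#include "postgres.h"

/*
 * pg_attribute_unused() (defined in c.h) marks a parameter that is kept
 * only for signature compatibility as intentionally unused, silencing
 * -Wunused-parameter for that spot.
 */
static void
example_callback(Datum main_arg, void *extra pg_attribute_unused())
{
	elog(LOG, "called with %d", DatumGetInt32(main_arg));
}
```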

> -Wunused-macros

I have looked into that in the past.  There are a few that were
genuinely left behind accidentally, but most are there for completeness
or symmetry and don't look like they should be removed.  Also you get
junk from flex and perl and the like.  Needs to be addressed case by
case.  I can dig up my old branch and make some proposals.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-01-30 Thread John Naylor
On Wed, Jan 30, 2019 at 4:33 AM Amit Kapila  wrote:
> There are two more failures which we need to do something about.
> 1. Make fsm.sql independent of vacuum without losing much coverage
> of the newly added code.  John, I guess you have an idea, see if you can
> take care of it, otherwise, I will see what I can do for it.

I've attached a patch that applies on top of v19 that uses Andrew
Gierth's idea to use fillfactor to control free space. I've also
removed tests that relied on truncation and weren't very useful to
begin with.

--
John Naylor                   https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/test/regress/expected/fsm.out b/src/test/regress/expected/fsm.out
index df6b15b9d2..f92d454042 100644
--- a/src/test/regress/expected/fsm.out
+++ b/src/test/regress/expected/fsm.out
@@ -2,11 +2,10 @@
 -- Free Space Map test
 --
 CREATE TABLE fsm_check_size (num int, str text);
--- Fill 3 blocks with as many large records as will fit
--- No FSM
+-- Fill 3 blocks with one record each
+ALTER TABLE fsm_check_size SET (fillfactor=15);
 INSERT INTO fsm_check_size SELECT i, rpad('', 1024, 'a')
-FROM generate_series(1,7*3) i;
-VACUUM fsm_check_size;
+FROM generate_series(1,3) i;
 SELECT pg_relation_size('fsm_check_size', 'main') AS heap_size,
 pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
  heap_size | fsm_size 
@@ -14,11 +13,12 @@ pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
  24576 |0
 (1 row)
 
--- Clear some space on block 0
-DELETE FROM fsm_check_size WHERE num <= 5;
-VACUUM fsm_check_size;
--- Insert small record in block 2 to set the cached smgr targetBlock
-INSERT INTO fsm_check_size VALUES(99, 'b');
+-- Fill most of the last block
+ALTER TABLE fsm_check_size SET (fillfactor=100);
+INSERT INTO fsm_check_size SELECT i, rpad('', 1024, 'a')
+FROM generate_series(11,15) i;
+-- Make sure records can go into any block but the last one
+ALTER TABLE fsm_check_size SET (fillfactor=30);
 -- Insert large record and make sure it goes in block 0 rather than
 -- causing the relation to extend
 INSERT INTO fsm_check_size VALUES (101, rpad('', 1024, 'a'));
@@ -32,38 +32,16 @@ pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
 -- Extend table with enough blocks to exceed the FSM threshold
 -- FSM is created and extended to 3 blocks
 INSERT INTO fsm_check_size SELECT i, 'c' FROM generate_series(200,1200) i;
-VACUUM fsm_check_size;
-SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
- fsm_size 
---
-24576
-(1 row)
-
--- Truncate heap to 1 block
--- No change in FSM
-DELETE FROM fsm_check_size WHERE num > 7;
-VACUUM fsm_check_size;
 SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
  fsm_size 
 --
 24576
 (1 row)
 
--- Truncate heap to 0 blocks
--- FSM now truncated to 2 blocks
-DELETE FROM fsm_check_size;
-VACUUM fsm_check_size;
-SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
- fsm_size 
---
-16384
-(1 row)
-
 -- Add long random string to extend TOAST table to 1 block
 INSERT INTO fsm_check_size
 VALUES(0, (SELECT string_agg(md5(chr(i)), '')
 		   FROM generate_series(1,100) i));
-VACUUM fsm_check_size;
 SELECT pg_relation_size(reltoastrelid, 'main') AS toast_size,
 pg_relation_size(reltoastrelid, 'fsm') AS toast_fsm_size
 FROM pg_class WHERE relname = 'fsm_check_size';
diff --git a/src/test/regress/sql/fsm.sql b/src/test/regress/sql/fsm.sql
index 07f505591a..542bd6e203 100644
--- a/src/test/regress/sql/fsm.sql
+++ b/src/test/regress/sql/fsm.sql
@@ -4,20 +4,21 @@
 
 CREATE TABLE fsm_check_size (num int, str text);
 
--- Fill 3 blocks with as many large records as will fit
--- No FSM
+-- Fill 3 blocks with one record each
+ALTER TABLE fsm_check_size SET (fillfactor=15);
 INSERT INTO fsm_check_size SELECT i, rpad('', 1024, 'a')
-FROM generate_series(1,7*3) i;
-VACUUM fsm_check_size;
+FROM generate_series(1,3) i;
+
 SELECT pg_relation_size('fsm_check_size', 'main') AS heap_size,
 pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
 
--- Clear some space on block 0
-DELETE FROM fsm_check_size WHERE num <= 5;
-VACUUM fsm_check_size;
+-- Fill most of the last block
+ALTER TABLE fsm_check_size SET (fillfactor=100);
+INSERT INTO fsm_check_size SELECT i, rpad('', 1024, 'a')
+FROM generate_series(11,15) i;
 
--- Insert small record in block 2 to set the cached smgr targetBlock
-INSERT INTO fsm_check_size VALUES(99, 'b');
+-- Make sure records can go into any block but the last one
+ALTER TABLE fsm_check_size SET (fillfactor=30);
 
 -- Insert large record and make sure it goes in block 0 rather than
 -- causing the relation to extend
@@ -28,26 +29,12 @@ pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
 -- Extend table with enough blocks to exceed the FSM threshold
 -- FSM is created and extended to 3 blocks
 INSERT INTO fsm_check_size SELECT i, 'c' FROM generate_series(200,1200) i;
-VACUUM fsm_check_size;
-SELECT pg_relation_size('fsm_check_size', 

Re: jsonpath

2019-01-30 Thread Alexander Korotkov
On Wed, Jan 30, 2019 at 9:36 AM Andres Freund  wrote:
> On 2019-01-30 07:34:00 +0300, Alexander Korotkov wrote:
> > Thank you for your review!  Let me answer some points of your review
> > while others will be answered later.
> >
> > On Wed, Jan 30, 2019 at 5:28 AM Andres Freund  wrote:
> > > On 2019-01-29 04:00:19 +0300, Alexander Korotkov wrote:
> > > > +/*INPUT/OUTPUT*/
> > >
> > > Why do we need this much code to serialize data that we initially got in
> > > serialized manner? Couldn't we just keep the original around?
> >
> > As I get it, your concern relates to the fact that we have jsonpath in text
> > form (serialized) and convert it to the binary form (also serialized).
> > Yes, we should do so.  Otherwise, we would have to parse jsonpath for
> > each evaluation.  Basically, for the same reason we have binary
> > representation for jsonb.
>
> But why can't we keep the text around, instead of having all of that
> code for converting back?

Yeah, that's possible.  But right now, converting back to string is one of
the ways to test that jsonpath parsing is correct.  If we removed the
conversion back to string, that possibility would also be removed.  We would
also lose the ability to normalize jsonpath, which is probably not that
important.  Besides, keeping the original text around is a generally ugly
redundancy.  So, I can't say I like the idea of saving a few hundred lines
of code at this price.

> > > > +++ b/src/backend/utils/adt/jsonpath_exec.c
> > >
> > > Gotta say, I'm unenthusiastic about yet another execution engine in
some
> > > PG subsystem.
> >
> > Better ideas?  I can imagine we can evade introduction of jsonpath
> > datatype and turn jsonpath into executor nodes.  But I hardly can
> > imagine you would be more enthusiastic about that :)
>
> Not executor nodes, I think it could be possible to put it into the
> expression evaluation codepaths, but it's probably too different to fit
> well (would get you JIT compilation of the whole thing tho).

Consider the given example: we need to check some predicate for each JSON
item, where the predicate is given by one expression and the set of items is
produced by another expression.  In order to fit this into expression
evaluation, we would probably need some kind of lambda functions there.  It
seems unlikely to me that we want to implement that.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Protect syscache from bloating with negative cache entries

2019-01-30 Thread Kyotaro HORIGUCHI
At Wed, 30 Jan 2019 05:06:30 +, "Ideriha, Takeshi" 
 wrote in 
<4E72940DA2BF16479384A86D54D0988A6F4156D4@G01JPEXMBKW04>
> >You don't have direct control over syscache memory usage. When you find a
> >query slowed by the default cache expiration, you can set
> >cache_memory_target to keep the entries for intermittent execution of a
> >query, or you can increase syscache_prune_min_age to allow the cache to
> >live for a longer time.
> 
> In the current ver8 patch there is a stats view representing the age class
> distribution.
> https://www.postgresql.org/message-id/20181019.173457.68080786.horiguchi.kyotaro%40lab.ntt.co.jp
> Does it help DBAs with tuning cache_prune_age and/or cache_prune_target?

Definitely.  At the moment a DBA can see nothing at all about cache usage.

> If the number of cache entries in the older age classes is large, are
> people supposed to lower prune_age and leave cache_prune_target unchanged?
> (I am a little bit confused.)

This feature just removes cache entries that have not been accessed
for a certain time.

If older entries occupy the major portion, it means that the
syscache is used effectively (in other words, most of the entries
are accessed frequently enough), and in that case I believe the
syscache doesn't put pressure on memory usage. If the total
memory usage still exceeds expectations in that case, reducing the
pruning age may reduce it, but not necessarily. An extremely short
pruning age will work, in exchange for performance degradation.

If newer entries occupy the major portion, it means that the
syscache may not be used effectively. The total amount of memory
usage will be limited by the pruning feature, so tuning won't be
needed.

In both cases, if pruning causes slowdown of intermittent large
queries, cache_memory_target will alleviate the slowdown.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [BDR] Question on compatibility of Postgres with Open JDK

2019-01-30 Thread Craig Ringer
On Wed, 30 Jan 2019 at 15:28, Pratheej Chandrika <
pratheej.chandr...@motorolasolutions.com> wrote:

> Craig,
> Thanks a lot for the prompt reply.
>
> From what you mentioned, I assume that other than the client-side JDBC there
> is no Java dependency on the server side. What I mean is I assume there is no
> JDK/Java dependency for the Postgres server to work. Request your input.
>

Correct.

Please see the PostgreSQL and PgJDBC manuals for more details.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise