Re: dropdb --force

2019-09-19 Thread Dilip Kumar
On Thu, Sep 19, 2019 at 12:14 AM Pavel Stehule  wrote:
>
> Hi
>
> I am sending an updated version - there are two changes against the last
> patch. I use pg_terminate_backend for terminating the other sessions, as Tom
> proposed. I am not sure if it is 100% practical - on the other hand, the
> necessary right to kill other sessions is almost available - and consistency
> with pg_terminate_backend makes sense. Moreover, a native implementation is
> significantly better than a solution implemented externally. I fixed the docs
> a little bit - the timeout for logout (as a reaction to SIGTERM) is only 5 sec.
>

A few comments on the patch.

@@ -598,10 +598,20 @@ standard_ProcessUtility(PlannedStmt *pstmt,
  case T_DropdbStmt:
  {
  DropdbStmt *stmt = (DropdbStmt *) parsetree;
+ bool force = false;
+ ListCell   *lc;
+
+ foreach(lc, stmt->options)
+ {
+ DefElem*item = (DefElem *) lfirst(lc);
+
+ if (strcmp(item->defname, "force") == 0)
+ force = true;
+ }

  /* no event triggers for global objects */
  PreventInTransactionBlock(isTopLevel, "DROP DATABASE");
- dropdb(stmt->dbname, stmt->missing_ok);
+ dropdb(stmt->dbname, stmt->missing_ok, force);

If you look at createdb, the options are processed inside the createdb
function, but here we are processing them outside the function.  Wouldn't it
be good to keep them similar?  That would also be expandable in case we add
more options in the future.
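To illustrate, here is a rough sketch of that shape (a hypothetical wrapper
next to dropdb() in dbcommands.c; the names and the error wording are mine,
not the patch's):

    /*
     * Hypothetical sketch: parse DROP DATABASE options beside dropdb(),
     * mirroring how createdb() walks its own DefElem list.
     */
    void
    DropDatabase(DropdbStmt *stmt)
    {
        bool        force = false;
        ListCell   *lc;

        foreach(lc, stmt->options)
        {
            DefElem    *opt = (DefElem *) lfirst(lc);

            if (strcmp(opt->defname, "force") == 0)
                force = true;
            else
                ereport(ERROR,
                        (errcode(ERRCODE_SYNTAX_ERROR),
                         errmsg("unrecognized DROP DATABASE option \"%s\"",
                                opt->defname)));
        }

        dropdb(stmt->dbname, stmt->missing_ok, force);
    }

standard_ProcessUtility() would then only call this wrapper, so any option
added later needs changes in just one place.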


initPQExpBuffer(&sql);

- appendPQExpBuffer(&sql, "DROP DATABASE %s%s;",
-   (if_exists ? "IF EXISTS " : ""), fmtId(dbname));
-
  /* Avoid trying to drop postgres db while we are connected to it. */
  if (maintenance_db == NULL && strcmp(dbname, "postgres") == 0)
  maintenance_db = "template1";
@@ -134,6 +136,10 @@ main(int argc, char *argv[])
host, port, username, prompt_password,
progname, echo);
+ appendPQExpBuffer(&sql, "DROP DATABASE %s%s%s;",
+   (force ? " (FORCE) " : ""),
+   (if_exists ? "IF EXISTS " : ""), fmtId(dbname));
+

I did not understand why you have moved this code after the
connectMaintenanceDatabase function.  I don't see any problem, but in the
earlier code initPQExpBuffer and appendPQExpBuffer were together; now you
have moved appendPQExpBuffer after connectMaintenanceDatabase, so if there is
no reason to do that, it would be better to keep it as it was earlier.


-extern bool CountOtherDBBackends(Oid databaseId,
- int *nbackends, int *nprepared);
+extern bool CountOtherDBBackends(Oid databaseId, int *nbackends, int
*nprepared);

Unrelated change

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Usage of the system truststore for SSL certificate validation

2019-09-19 Thread Ashutosh Sharma
This certainly looks like a good addition to me that can be
implemented on both the client and server side. It is always good to have
a common location where the list of all the certificates from various
CAs can be placed for validation.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

On Thu, Sep 19, 2019 at 8:24 PM Thomas Berger  wrote:
>
> Hi,
>
> currently, libpq does SSL certificate validation only against the defined
> `PGSSLROOTCERT` file.
>
> Is there any specific reason why the system truststore (at least under
> unixoid systems) is not considered for the validation?
>
> We would like to contribute a patch to allow certificate validation against
> the system truststore. Are there any opinions against it?
>
>
> A little bit background for this:
>
> Internally we sign the certificates for our systems with our own CA. The CA
> root certificates and revocation lists are distributed via puppet and/or
> packages on all of our internal systems.
>
> Validating the certificate against this CA requires either overriding the
> PGSSLROOTCERT location via the environment or providing a copy of the file
> for each user that connects with libpq or libpq-like connectors.
>
> We would like to simplify this.
>
>
> --
> Thomas Berger
>
> PostgreSQL DBA
> Database Operations
>
> 1&1 Telecommunication SE | Ernst-Frey-Straße 10 | 76135 Karlsruhe | Germany
>
>
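For illustration, a minimal sketch of what such a fallback could look like on
the libpq side, assuming OpenSSL; SSL_context and ssl_ca_file are placeholder
names, not libpq's actual variables.  SSL_CTX_set_default_verify_paths() is
the OpenSSL call that loads the system's default CA locations:

    /*
     * Hypothetical sketch only: if no sslrootcert file is configured, fall
     * back to the system trust store instead of skipping CA verification.
     */
    if (ssl_ca_file != NULL && ssl_ca_file[0] != '\0')
    {
        /* current behavior: verify only against the configured root file */
        if (SSL_CTX_load_verify_locations(SSL_context, ssl_ca_file, NULL) != 1)
            return -1;          /* report the error as libpq does today */
    }
    else
    {
        /* proposed fallback: use the system's default CA store */
        if (SSL_CTX_set_default_verify_paths(SSL_context) != 1)
            return -1;
    }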




Re: pgbench - allow to create partitioned tables

2019-09-19 Thread Fabien COELHO



The behavior is not actually changed, but I had to move fillfactor away
because it cannot be declared on partitioned tables, it must be declared
on partitions only. Once there is a function to handle that it is pretty
easy to add the test.

I can remove it but frankly there are only benefits: the default is now
tested by pgbench, the create query is smaller, and it would work with
older versions of pg, which does not matter but is good on principle.


I am not saying that it is a bad check on its own, rather it might be
good, but let's not do any unrelated change as that will delay the
main patch.  Once, we are done with the main patch, you can propose
these as improvements.


I would not bother to create a patch for so small an improvement. This 
makes sense in passing because the created function makes it very easy, 
but otherwise I'll just drop it.



The user could do a -i with a version of pgbench and bench with another
one. I do that often while developing…


I am not following what you want to say here especially ("pgbench and
bench with another one").  Can you explain with some example?


While developing, I often run the pgbench client under development against an
already created set of tables on an already created cluster, and usually 
the server side on my laptop is the last major release from pgdg (ie 11.5) 
while the pgbench I'm testing is from sources (ie 12dev). If I type 
"pgbench" I run 11.5, and in the sources "./pgbench" runs the dev version.


--
Fabien.

Re: pgbench - allow to create partitioned tables

2019-09-19 Thread Fabien COELHO



v12:
  - fixes NULL vs NULL
  - works correctly with a partitioned table without partitions attached
  - generates an error if the partition method is unknown
  - adds an assert


You seem to have attached some previous version (v2) of this patch.  I
could see old issues in the patch which we have sorted out in the
review.


Indeed. This is a change from forgetting the attachment.

Here is v12. Hopefully.

--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index c857aa3cba..e3a0abb4c7 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -306,6 +306,31 @@ pgbench  options  d
   
  
 
+ 
+  --partitions=NUM
+  
+   
+Create a partitioned pgbench_accounts table with
+NUM partitions of nearly equal size for
+the scaled number of accounts.
+Default is 0, meaning no partitioning.
+   
+  
+ 
+
+ 
+  --partition-method=NAME
+  
+   
+Create a partitioned pgbench_accounts table with
+NAME method.
+Expected values are range or hash.
+This option requires that --partitions is set to non-zero.
+If unspecified, default is range.
+   
+  
+ 
+
  
   --tablespace=tablespace
   
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index ed7652bfbf..c17df93dbe 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -186,6 +186,15 @@ int64		latency_limit = 0;
 char	   *tablespace = NULL;
 char	   *index_tablespace = NULL;
 
+/* partitioning for pgbench_accounts table, 0 for no partitioning, -1 for bad */
+static int 		partitions = 0;
+
+typedef enum { PART_NONE, PART_RANGE, PART_HASH }
+  partition_method_t;
+
+static partition_method_t partition_method = PART_NONE;
+static const char *PARTITION_METHOD[] = { "none", "range", "hash" };
+
 /* random seed used to initialize base_random_sequence */
 int64		random_seed = -1;
 
@@ -617,6 +626,9 @@ usage(void)
 		   "  --foreign-keys   create foreign key constraints between tables\n"
 		   "  --index-tablespace=TABLESPACE\n"
 		   "   create indexes in the specified tablespace\n"
+		   "  --partitions=NUM partition pgbench_accounts in NUM parts (default: 0)\n"
+		   "  --partition-method=(range|hash)\n"
+		   "   partition pgbench_accounts with this method (default: range)\n"
 		   "  --tablespace=TABLESPACE  create tables in the specified tablespace\n"
 		   "  --unlogged-tablescreate tables as unlogged tables\n"
 		   "\nOptions to select what to run:\n"
@@ -3601,6 +3613,17 @@ initDropTables(PGconn *con)
 	 "pgbench_tellers");
 }
 
+/*
+ * add fillfactor percent option if not 100.
+ */
+static void
+append_fillfactor(char *opts, int len)
+{
+	if (fillfactor < 100)
+		snprintf(opts + strlen(opts), len - strlen(opts),
+ " with (fillfactor=%d)", fillfactor);
+}
+
 /*
  * Create pgbench's standard tables
  */
@@ -3664,9 +3687,15 @@ initCreateTables(PGconn *con)
 
 		/* Construct new create table statement. */
 		opts[0] = '\0';
-		if (ddl->declare_fillfactor)
+
+		/* Partition pgbench_accounts table */
+		if (partitions >= 1 && strcmp(ddl->table, "pgbench_accounts") == 0)
 			snprintf(opts + strlen(opts), sizeof(opts) - strlen(opts),
-	 " with (fillfactor=%d)", fillfactor);
+	 " partition by %s (aid)", PARTITION_METHOD[partition_method]);
+		else if (ddl->declare_fillfactor)
+			/* fillfactor is only expected on actual tables */
+			append_fillfactor(opts, sizeof(opts));
+
 		if (tablespace != NULL)
 		{
 			char	   *escape_tablespace;
@@ -3686,6 +3715,57 @@ initCreateTables(PGconn *con)
 
 		executeStatement(con, buffer);
 	}
+
+	/* if needed, pgbench_accounts partitions must be created manually */
+	if (partitions >= 1)
+	{
+		char		ff[64];
+
+		ff[0] = '\0';
+		append_fillfactor(ff, sizeof(ff));
+
+		fprintf(stderr, "creating %d partitions...\n", partitions);
+
+		for (int p = 1; p <= partitions; p++)
+		{
+			char		query[256];
+
+			if (partition_method == PART_RANGE)
+			{
+int64		part_size = (naccounts * (int64) scale + partitions - 1) / partitions;
+char		minvalue[32], maxvalue[32];
+
+/* For RANGE, we use open-ended partitions at the beginning and end */
+if (p == 1)
+	sprintf(minvalue, "minvalue");
+else
+	sprintf(minvalue, INT64_FORMAT, (p - 1) * part_size + 1);
+
+if (p < partitions)
+	sprintf(maxvalue, INT64_FORMAT, p * part_size + 1);
+else
+	sprintf(maxvalue, "maxvalue");
+
+snprintf(query, sizeof(query),
+		 "create%s table pgbench_accounts_%d\n"
+		 "  partition of pgbench_accounts\n"
+		 "  for values from (%s) to (%s)%s\n",
+		 unlogged_tables ? " unlogged" : "", p,
+		 minvalue, maxvalue, ff);
+			}
+			else if (partition_method == PART_HASH)
+snprintf(query, sizeof(query),
+		 "create%s table pgbench_accounts_%d\n"
+		 "  partition of pgbench_accounts\n"

Re: A problem about partitionwise join

2019-09-19 Thread Dilip Kumar
On Thu, Aug 29, 2019 at 3:15 PM Richard Guo  wrote:
>
>
> Attached is a patch as an attempt to address this issue. The idea is
> quite straightforward. When building partition info for joinrel, we
> generate any possible EC-derived joinclauses of form 'outer_em =
> inner_em', which will be used together with the original restrictlist to
> check if there exists an equi-join condition for each pair of partition
> keys.
>
> Any comments are welcome!
 /*
+ * generate_join_implied_equalities_for_all
+ *   Create any EC-derived joinclauses of form 'outer_em = inner_em'.
+ *
+ * This is used when building partition info for joinrel.
+ */
+List *
+generate_join_implied_equalities_for_all(PlannerInfo *root,
+ Relids join_relids,
+ Relids outer_relids,
+ Relids inner_relids)

I think we need more detailed comments about why we need this
separate function.  We can also explain that the
generate_join_implied_equalities function will avoid
the join clause if the EC has a constant, but for partition-wise join we
need that clause too.
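Something along these lines, perhaps (just a sketch of possible wording based
on the above, not the patch's text):

    /*
     * generate_join_implied_equalities_for_all
     *    Create any EC-derived joinclauses of form 'outer_em = inner_em'.
     *
     * Unlike generate_join_implied_equalities(), this does not skip ECs that
     * contain a constant: for such ECs the regular path produces 'em = const'
     * restriction clauses rather than 'outer_em = inner_em' join clauses, but
     * the partition-wise join code still needs the latter form to verify that
     * there is an equi-join condition on each pair of partition keys.
     */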


+ while ((i = bms_next_member(matching_ecs, i)) >= 0)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
+ List*outer_members = NIL;
+ List*inner_members = NIL;
+ ListCell   *lc1;
+
+ /* Do not consider this EC if it's ec_broken */
+ if (ec->ec_broken)
+ continue;
+
+ /* Single-member ECs won't generate any deductions */
+ if (list_length(ec->ec_members) <= 1)
+ continue;
+

I am wondering, isn't it possible to just process the missing join
clauses?  I mean, 'generate_join_implied_equalities' has only skipped
the ECs which have a const, so
can't we create join clauses only for those ECs and append them to the
restrictlist we already have?  I might be missing something.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Add "password_protocol" connection parameter to libpq

2019-09-19 Thread Michael Paquier
On Thu, Sep 19, 2019 at 05:40:15PM -0700, Jeff Davis wrote:
> On Tue, 2019-09-17 at 16:04 +0900, Michael Paquier wrote:
>> What do you think?  There is no need to save down the connection
>> parameter value into fe_scram_state.
> 
> I'm not sure I understand this comment, but I removed the extra boolean
> flags.

Thanks for considering it.  I was just asking about removing those
flags and your thoughts about my thoughts from upthread.

>> is required".  Or you have in mind that this error message is better?
> 
> I felt it would be a more useful error message.

Okay, fine by me.

>> I think that pgindent would complain with the comment block in
>> check_expected_areq(). 
> 
> Changed.

A trick to make pgindent ignore some comment blocks is to use
/*- at the top and bottom, FWIW.
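For example (illustrative only):

    /*----------
     * pgindent leaves a comment block delimited like this alone, so manual
     * formatting inside it survives reindents, e.g.:
     *   - item one
     *   - item two
     *----------
     */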

+$ENV{PGUSER} = "ssltestuser";
 $ENV{PGPASSWORD} = "pass";
test_connect_ok() can use a complementary string, so I would use that
in the SSL test part instead of relying too much on the environment
for readability, particularly for the last test added with md5testuser.
Using the environment variable in src/test/authentication/ makes
sense.  Maybe that's just a matter of taste :)

+   return (state != NULL && state->state == FE_SCRAM_FINISHED &&
+   strcmp(state->sasl_mechanism, SCRAM_SHA_256_PLUS_NAME) == 0);
I think that we should document in the code why those reasons are
chosen.

I would also add a test for an invalid value of channel_binding.

A comment update is forgotten in libpq-int.h.

+# using the password authentication method; channel binding can't work
+reset_pg_hba($node, 'password');
+$ENV{"PGCHANNELBINDING"} = 'require';
+test_login($node, 'saslpreptest4a_role', "a", 2);
+
+# SSL not in use; channel binding still can't work
+reset_pg_hba($node, 'scram-sha-256');
+$ENV{"PGCHANNELBINDING"} = 'require';
+test_login($node, 'saslpreptest4a_role', "a", 2);
Those two tests are in the test suite dedicated to SASLprep.  I think
that it would be more consistent to just move them to
001_password.pl.  And this does not impact the error coverage.

Missing some indentation in the perl scripts (per pgperltidy).

Those are mainly nits, and attached are the changes I would do to your
patch.  Please feel free to consider those changes as you see fit.
Anyway, the base logic looks good to me, so I am switching the patch
as ready for committer.
--
Michael
diff --git a/src/interfaces/libpq/fe-auth-scram.c b/src/interfaces/libpq/fe-auth-scram.c
index 3b40c4b083..61f9512544 100644
--- a/src/interfaces/libpq/fe-auth-scram.c
+++ b/src/interfaces/libpq/fe-auth-scram.c
@@ -127,8 +127,20 @@ pg_fe_scram_channel_bound(void *opaq)
 {
 	fe_scram_state *state = (fe_scram_state *) opaq;
 
-	return (state != NULL && state->state == FE_SCRAM_FINISHED &&
-			strcmp(state->sasl_mechanism, SCRAM_SHA_256_PLUS_NAME) == 0);
+	/* no SCRAM exchange done */
+	if (state == NULL)
+		return false;
+
+	/* SCRAM exchange not completed */
+	if (state->state != FE_SCRAM_FINISHED)
+		return false;
+
+	/* channel binding mechanism not used */
+	if (strcmp(state->sasl_mechanism, SCRAM_SHA_256_PLUS_NAME) != 0)
+		return false;
+
+	/* all clear! */
+	return true;
 }
 
 /*
@@ -368,7 +380,8 @@ build_client_first_message(fe_scram_state *state)
 		appendPQExpBufferStr(, "p=tls-server-end-point");
 	}
 #ifdef HAVE_PGTLS_GET_PEER_CERTIFICATE_HASH
-	else if (conn->ssl_in_use && conn->channel_binding[0] != 'd')
+	else if (conn->channel_binding[0] != 'd' && /* disable */
+			 conn->ssl_in_use)
 	{
 		/*
 		 * Client supports channel binding, but thinks the server does not.
@@ -508,7 +521,8 @@ build_client_final_message(fe_scram_state *state)
 #endif			/* HAVE_PGTLS_GET_PEER_CERTIFICATE_HASH */
 	}
 #ifdef HAVE_PGTLS_GET_PEER_CERTIFICATE_HASH
-	else if (conn->ssl_in_use && conn->channel_binding[0] != 'd')
+	else if (conn->channel_binding[0] != 'd' && /* disable */
+			 conn->ssl_in_use)
 		appendPQExpBufferStr(, "c=eSws");	/* base64 of "y,," */
 #endif
 	else
diff --git a/src/interfaces/libpq/fe-auth.c b/src/interfaces/libpq/fe-auth.c
index 0ef8fa89d0..0a70e0ea4a 100644
--- a/src/interfaces/libpq/fe-auth.c
+++ b/src/interfaces/libpq/fe-auth.c
@@ -423,7 +423,7 @@ pg_SASL_init(PGconn *conn, int payloadlen)
 
 	initPQExpBuffer(&mechanism_buf);
 
-	if (conn->channel_binding[0] == 'r' && /* require */
+	if (conn->channel_binding[0] == 'r' &&	/* require */
 		!conn->ssl_in_use)
 	{
 		printfPQExpBuffer(&conn->errorMessage,
@@ -462,8 +462,8 @@ pg_SASL_init(PGconn *conn, int payloadlen)
 
 		/*
 		 * Select the mechanism to use.  Pick SCRAM-SHA-256-PLUS over anything
-		 * else if a channel binding type is set and if the client supports
-		 * it (and did not set channel_binding=disable). Pick SCRAM-SHA-256 if
+		 * else if a channel binding type is set and if the client supports it
+		 * (and did not set channel_binding=disable). Pick SCRAM-SHA-256 if
 		 * nothing else has already been picked.  If we add more mechanisms, a
 		 * more refined priority mechanism might become necessary.

Re: psql - add SHOW_ALL_RESULTS option

2019-09-19 Thread vignesh C
On Fri, Sep 13, 2019 at 1:01 AM Alvaro Herrera  wrote:
>
> This v6 is just Fabien's v5, rebased over a very minor conflict, and
> pgindented.  No further changes.  I've marked this Ready for Committer.
>
Should we add function headers for the functions below to maintain the
common standard of this file:
+
+static void
+AppendNoticeMessage(void *arg, const char *msg)
+{
+ t_notice_messages *notes = (t_notice_messages *) arg;
+
+ appendPQExpBufferStr(notes->in_flip ? &notes->flip : &notes->flop, msg);
+}
+
+static void
+ShowNoticeMessage(t_notice_messages *notes)
+{
+ PQExpBufferData *current = notes->in_flip ? &notes->flip : &notes->flop;
+
+ if (current->data != NULL && *current->data != '\0')
+ pg_log_info("%s", current->data);
+ resetPQExpBuffer(current);
+}
+
+/*
+ * SendQueryAndProcessResults: utility function for use by SendQuery() only
+ *

+static void
+ShowErrorMessage(const PGresult *result)
+{
+ const char *error = PQerrorMessage(pset.db);
+
+ if (strlen(error))
+ pg_log_info("%s", error);
+
+ CheckConnection();
+}
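For instance, something like this (the wording is only a suggestion based on
what the functions do):

    /*
     * AppendNoticeMessage -- notice processor that spools a notice into the
     * currently active buffer of a t_notice_messages pair.
     */

    /*
     * ShowNoticeMessage -- log and reset the currently active notice buffer.
     */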

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: pgbench - allow to create partitioned tables

2019-09-19 Thread Amit Kapila
On Fri, Sep 20, 2019 at 12:41 AM Fabien COELHO  wrote:
>
>
> Hello Amit,
>
> > [...] 'ps' itself won't be NULL in that case, the value it contains is
> > NULL. I have debugged this case as well.  'ps' itself can be NULL only
> > when you pass wrong column number or something like that to PQgetvalue.
>
> Argh, you are right! I mixed up C NULL and SQL NULL:-(
>
> >>> If we don't find pgbench_accounts, let's give error here itself rather
> >>> than later unless you have a valid case in mind.
> >>
> >> I thought of it, but decided not to: Someone could add a builtin script
> >> which does not use pgbench_accounts, or a parallel running script could
> >> create a table dynamically, whatever, so I prefer the error to be raised
> >> by the script itself, rather than deciding that it will fail before even
> >> trying.
> >
> > I think this is not a possibility today and I don't know of the
> > future.  I don't think it is a good idea to add code which we can't
> > reach today.  You can probably add Assert if required.
>
> I added a fail on an unexpected partition method, i.e. not 'r' or 'h',
> and an Assert if PQgetvalue returns NULL.
>
> I fixed the query so that it counts actual partitions, otherwise I was
> getting one for a partitioned table without partitions attached, which
> does not generate an error by the way. I just figured out that pgbench
> does not check that UPDATE updates anything. Hmmm.
>
> > Hmm, why you need to change the fill factor behavior?  If it is not
> > specifically required for the functionality of this patch, then I
> > suggest keeping that behavior as it is.
>
> The behavior is not actually changed, but I had to move fillfactor away
> because it cannot be declared on partitioned tables, it must be declared
> on partitions only. Once there is a function to handle that it is pretty
> easy to add the test.
>
> I can remove it but frankly there are only benefits: the default is now
> tested by pgbench, the create query is smaller, and it would work with
> older versions of pg, which does not matter but is good on principle.
>

I am not saying that it is a bad check on its own, rather it might be
good, but let's not do any unrelated change as that will delay the
main patch.  Once, we are done with the main patch, you can propose
these as improvements.

> >> added that pgbench does not know about, the failure mode will be that
> >> nothing is printed rather than printing something strange like
> >> "method none with 2 partitions".
> >
> > but how will that new partition method be associated with a table
> > created via pgbench?
>
> The user could do a -i with a version of pgbench and bench with another
> one. I do that often while developing…
>

I am not following what you want to say here especially ("pgbench and
bench with another one").  Can you explain with some example?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Zedstore - compressed in-core columnar storage

2019-09-19 Thread Ashutosh Sharma
On Fri, Sep 20, 2019 at 5:48 AM Taylor Vesely  wrote:
>
> > When doing update operation, for each tuple being modified,
> > *tuplebuffers_insert()* says that there is no entry for the relation
> > being modified in the hash table although it was already added when
> > the first tuple in the table was updated. Why is it so?
>
> Currently, when doing an update, it will actually flush the tuple
> buffers every time we update a tuple. As a result, we only ever spool
> up one tuple at a time. This is a good place to put in an optimization
> like was implemented for insert, but I haven't gotten around to
> looking into that yet.
>

Okay. So, that's the root cause. Spooling just one tuple at a time where at
least 60 tuples could be spooled, and then not freeing it at all, is
altogether the reason for this extensive memory leak.

> The memory leak is actually happening because it isn't freeing the
> attbuffers after flushing. Alexandra Wang and I have a working
> branch[1] where we tried to plug the leak by freeing the attbuffers,
> but it has exposed an issue with triggers that I need to understand
> before I push the fix into the main zedstore branch.
>
> I don't like our solution of freeing the buffers either, because they
> could easily be reused. I'm going to take a stab at making that better
> before merging in the fix.
>

That's right - why do we need to free the memory after flushing the data in
attbuffers? We can simply reuse it for the next set of data to be updated.

> [1] https://github.com/l-wang/postgres-1/tree/zedstore-fix-memory-issues

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




Re: dropdb --force

2019-09-19 Thread vignesh C
On Thu, Sep 19, 2019 at 11:41 PM Pavel Stehule  wrote:
>
>
>
> On Thu, Sep 19, 2019 at 13:52, vignesh C  wrote:
>>
>> On Thu, Sep 19, 2019 at 12:14 AM Pavel Stehule  
>> wrote:
>> >
>> > Hi
>> >
>> > I am sending an updated version - there are two changes against the last
>> > patch. I use pg_terminate_backend for terminating the other sessions, as
>> > Tom proposed. I am not sure if it is 100% practical - on the other hand,
>> > the necessary right to kill other sessions is almost available - and
>> > consistency with pg_terminate_backend makes sense. Moreover, a native
>> > implementation is significantly better than a solution implemented
>> > externally. I fixed the docs a little bit - the timeout for logout (as a
>> > reaction to SIGTERM) is only 5 sec.
>> >
>> > Regards
>> >
>> > Pavel
>> >
>> >
>> I observed one thing with the latest patch:
>> There is a possibility that in case of drop database failure some
>> sessions may be terminated.
>>
>> It can happen in the following scenario:
>> 1) create database test; /* super user creates test database */
>> 2) create user test password 'test@123'; /* super user creates test user */
>> 3) alter user test nosuperuser; /* super user changes test user to
>> non-superuser */
>> 4) alter database test owner to test; /* super user set test as test
>> database owner */
>>
>> 5) psql -d test /* connect to test database as super user - Session 1 */
>> 6) psql -d test -U test /* connect to test database as test user -
>> Session 2 */
>> 7) psql -d postgres -U test /* connect to postgres database as test user -
>> Session 3 */
>>
>> 8) drop database (force) test; /* Executed from Session 3 */
>>
>> Drop database fails in Session 3 with:
>> ERROR:  must be a superuser to terminate superuser process
>>
>> Session 2 is terminated, Session 1 is left as it is.
>>
>> Is the above behaviour ok to terminate session 2 in case of drop
>> database failure?
>> Thoughts?
>
>
> I agree, it's not nice behavior. Again, there are two possible solutions:
>
> 1. strategy - the owner can do everything - and don't check rights
> 2. implement our own check of rights instead of using the checks from
> pg_terminate_backend.
>
> It's easily fixable once we find an agreement on what the preferred behavior is.
>
> Pavel
>
For the above, I felt that if we could identify that we don't have
permission to terminate any of the connected sessions, throwing the
error at that point would be appropriate.
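A rough sketch of that idea (names are illustrative only; it assumes a List
*pids with the PIDs of the backends connected to the target database and a
ListCell *lc in scope):

    /*
     * Sketch only: verify that we may terminate every backend connected to
     * the target database before signalling any of them, so that a
     * permission failure cannot leave some sessions already terminated.
     */
    foreach(lc, pids)
    {
        int         pid = lfirst_int(lc);
        PGPROC     *proc = BackendPidGetProc(pid);

        if (proc == NULL)
            continue;           /* session exited already */

        if (!superuser() && superuser_arg(proc->roleId))
            ereport(ERROR,
                    (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
                     errmsg("must be a superuser to terminate superuser process")));

        if (!superuser() && !has_privs_of_role(GetUserId(), proc->roleId))
            ereport(ERROR,
                    (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
                     errmsg("permission denied to terminate process %d", pid)));
    }

    /* only after all checks pass, loop again and terminate the sessions */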

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: backup manifests

2019-09-19 Thread David Steele

On 9/19/19 11:00 AM, Robert Haas wrote:


On Thu, Sep 19, 2019 at 9:51 AM Robert Haas  wrote:

I intend that we should be able to support incremental backups based
either on a previous full backup or based on a previous incremental
backup. I am not aware of a technical reason why we need to identify
the specific backup that must be used. If incremental backup B is
taken based on a pre-existing backup A, then I think that B can be
restored using either A or *any other backup taken after A and before
B*. In the normal case, there probably wouldn't be any such backup,
but AFAICS the start-LSNs are a sufficient cross-check that the chosen
base backup is legal.


Scratch that: there can be overlapping backups, so you have to
cross-check both start and stop LSNs.


Overall we have found it's much simpler to label each backup and 
cross-check that against the pg version and system id.  Start LSN is 
pretty unique, but backup labels work really well and are more widely 
understood.



(3) Cross-check a manifest against a backup and complain about extra
files, missing files, size differences, or checksum mismatches.


Verification is the best part of the manifest.  Plus, you can do
verification pretty cheaply on restore.  We also restore pg_control last
so clusters that have a restore error won't start.


There's no "restore" operation here, really. A backup taken by
pg_basebackup can be "restored" by copying the whole thing, but it can
also be used just where it is. If we were going to build something
into some in-core tool to copy backups around, this would be a smart
way to implement said tool, but I'm not planning on that myself.


Scratch that: incremental backups need a restore tool, so we can use
this technique there. And it can work for full backups too, because
why not?


Agreed, once we have a restore tool, use it for everything.

--
-David
da...@pgmasters.net




Re: backup manifests

2019-09-19 Thread David Steele

Hi Robert,

On 9/19/19 9:51 AM, Robert Haas wrote:

On Wed, Sep 18, 2019 at 9:11 PM David Steele  wrote:

Also consider adding the timestamp.


Sounds reasonable, even if only for the benefit of humans who might
look at the file.  We can decide later whether to use it for anything
else (and third-party tools could make different decisions from core).
I assume we're talking about file mtime here, not file ctime or file
atime or the time the manifest was generated, but let me know if I'm
wrong.


In my experience only mtime is useful.


Based on my original calculations (which sadly I don't have anymore),
the combination of SHA1, size, and file name is *extremely* unlikely to
generate a collision.  As in, unlikely to happen before the end of the
universe kind of unlikely.  Though, I guess it depends on your
expectations for the lifetime of the universe.
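As a back-of-the-envelope sketch (my numbers, not the original calculation):
with a 160-bit digest the birthday bound gives roughly

    P_{collision} \approx n^2 / 2^{161}, \qquad
    n = 10^9 \Rightarrow P \approx 10^{18} / (2.9 \times 10^{48})
      \approx 3 \times 10^{-31}

so even a billion files gives a collision probability far below that of an
undetected disk error, which matches the "end of the universe" intuition.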



What I'd say is: if
the probability of getting a collision is demonstrably many orders of
magnitude less than the probability of the disk writing the block
incorrectly, then I think we're probably reasonably OK. Somebody might
differ, which is perhaps a mild point in favor of LSN-based
approaches, but as a practical matter, if a bad block is a billion
times more likely to be the result of a disk error than a checksum
mismatch, then it's a negligible risk.


Agreed.


We include the version/sysid of the cluster to avoid mixups.  It's a
great extra check on top of references to be sure everything is kosher.


I don't think it's a good idea to duplicate the information that's
already in the backup_label. Storing two copies of the same
information is just an invitation to having to worry about what
happens if they don't agree.


OK, but now we have backup_label, tablespace_map, the .backup history file
(in the WAL), and now perhaps a backup.manifest file.  I feel like we may be
drowning in backup info files.



I'd
recommend JSON for the format since it is so ubiquitous and easily
handles escaping, which can be a gotcha in a home-grown format.  We
currently have a format that is a combination of Windows INI and JSON
(for human-readability in theory) and we have become painfully aware of
escaping issues.  Really, why would you drop files with '=' in their
name in PGDATA?  And yet it happens.


I am not crazy about JSON because it requires that I get a json parser
into src/common, which I could do, but given the possibly-imminent end
of the universe, I'm not sure it's the greatest use of time. You're
right that if we pick an ad-hoc format, we've got to worry about
escaping, which isn't lovely.


My experience is that JSON is simple to implement and has already dealt 
with escaping and data structure considerations.  A home-grown solution 
will be at least as complex but have the disadvantage of being non-standard.



One thing I'm not quite sure about is where to store the backup
manifest. If you take a base backup in tar format, you get base.tar,
pg_wal.tar (unless -Xnone), and an additional tar file per tablespace.
Does the backup manifest go into base.tar? Get written into a separate
file outside of any tar archive? Something else? And what about a
plain-format backup? I suppose then we should just write the manifest
into the top level of the main data directory, but perhaps someone has
another idea.


We do:

[backup_label]/
 backup.manifest
 pg_data/
 pg_tblspc/

In general, having the manifest easily accessible is ideal.


That's a fine choice for a tool, but a I'm talking about something
that is part of the actual backup format supported by PostgreSQL, not
what a tool might wrap around it. The choice is whether, for a
tar-format backup, the manifest goes inside a tar file or as a
separate file. To put that another way, a patch adding backup
manifests does not get to redesign where pg_basebackup puts anything
else; it only gets to decide where to put the manifest.


Fair enough.  The point is to make the manifest easily accessible.

I'd keep it in the data directory for file-based backups and as a 
separate file for tar-based backups.  The advantage here is that we can 
pick a file name that becomes reserved which a tool can't do.


Regards,
--
-David
da...@pgmasters.net




Re: pgbench - allow to create partitioned tables

2019-09-19 Thread Amit Kapila
On Fri, Sep 20, 2019 at 12:41 AM Fabien COELHO  wrote:
>
> This case now generates a fail.
>
> v12:
>   - fixes NULL vs NULL
>   - works correctly with a partitioned table without partitions attached
>   - generates an error if the partition method is unknown
>   - adds an assert
>

You seem to have attached some previous version (v2) of this patch.  I
could see old issues in the patch which we have sorted out in the
review.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2019-09-19 Thread Michael Paquier
On Thu, Sep 19, 2019 at 08:22:04AM +0530, Amit Kapila wrote:
> Thanks.  I was about to have a look today, but anyway I checked the
> committed patch and it looks fine.

Thanks Amit for double-checking.
--
Michael




Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2019-09-19 Thread Michael Paquier
On Thu, Sep 19, 2019 at 05:40:41PM +0300, Alexey Kondratov wrote:
> On 19.09.2019 16:21, Robert Haas wrote:
>> So, earlier in this thread, I suggested making this part of ALTER
>> TABLE, and several people seemed to like that idea. Did we have a
>> reason for dropping that approach?

Personally, I don't find this idea very attractive as ALTER TABLE is
already complicated enough with all the subcommands we already support
in the command, all the logic we need to maintain to make combinations
of those subcommands work in a minimum number of steps, and also the number
of bugs we have seen because of the amount of complication present.

> If we add this option to REINDEX, then for 'ALTER TABLE tb_name action1,
> REINDEX SET TABLESPACE tbsp_name, action3' action2 will be just a direct
> alias to 'REINDEX TABLE tb_name SET TABLESPACE tbsp_name'. So it seems
> practical to do this for REINDEX first.
> 
> The only one concern I have against adding REINDEX to ALTER TABLE in this
> context is that it will allow user to write such a chimera:
>
> ALTER TABLE tb_name REINDEX SET TABLESPACE tbsp_name, SET TABLESPACE
> tbsp_name;
>
> when they want to move both table and all the indexes. Because simple
> ALTER TABLE tb_name REINDEX, SET TABLESPACE tbsp_name;
> looks ambiguous. Should it change tablespace of table, indexes or both?

Tricky question, but we don't change the tablespace of indexes when
using an ALTER TABLE, so I would say no on compatibility grounds.
ALTER TABLE has never touched the tablespace of indexes, and I don't
think that we should begin to do so.
--
Michael




Re: Custom reloptions and lock modes

2019-09-19 Thread Michael Paquier
On Fri, Sep 20, 2019 at 10:38:31AM +0900, Michael Paquier wrote:
> Hi all,
> 
> This is a new thread related to the bug analyzed here:
> https://www.postgresql.org/message-id/20190919083203.gc21...@paquier.xyz
> 
> And in short, if you attempt to do an ALTER TABLE with a custom
> reloption, the command blows up, as for example with this sequence:
> create extension bloom;
> create table aa (a int);
> create index aa_bloom ON aa USING bloom (a);
> alter index aa_bloom set (length = 20);
> 
> Which results in the following error:
> - 0002 is a patch which we could use to extend the existing reloption
> APIs to set the lock mode used.  I am aware of the recent work done by
> Nikolay in CC to rework this API set, but I am unsure where we are
> going there, and the resulting patch is actually very short, being
> 20-line long with the current infrastructure.  That could go into
> HEAD.  Table AMs have been added in v12 so custom reloptions could
> gain more in popularity, but as we are very close to the release it
> would not be cool to break those APIs.  The patch simplicity could
> also be a reason sufficient for a back-patch, and I don't think that
> there are many users of them yet.

I mean a back-patch down to v12 for this part, but not further down.
Sorry for the possible confusion.
--
Michael




Re: Syntax highlighting for Postgres spec files

2019-09-19 Thread Michael Paquier
On Thu, Sep 19, 2019 at 05:14:20PM -0700, Ashwin Agrawal wrote:
> I didn't try it, as I am waiting to see if something for emacs shows up as
> well :-) Do we want to get these in src/tools/editors?

A full complex plugin may be hard to justify, especially as I suspect
that there are very few hackers able to create their own isolation
tests in Visual.  But my take is that if you can have something
which can be directly plugged into emacs.samples and vim.samples then
there is room for it.
--
Michael




Custom reloptions and lock modes

2019-09-19 Thread Michael Paquier
Hi all,

This is a new thread related to the bug analyzed here:
https://www.postgresql.org/message-id/20190919083203.gc21...@paquier.xyz

And in short, if you attempt to do an ALTER TABLE with a custom
reloption, the command blows up, as for example with this sequence:
create extension bloom;
create table aa (a int);
create index aa_bloom ON aa USING bloom (a);
alter index aa_bloom set (length = 20);

Which results in the following error:
ERROR:  XX000: unrecognized lock mode: 2139062143
LOCATION:  LockAcquireExtended, lock.c:756

The root of the problem is that the set of relation options loaded
properly finds the custom options when looking for the lock mode
to use in AlterTableGetRelOptionsLockLevel(), but we never set the
lock mode this option should use when allocating it, resulting in a
failure.  The current set of APIs does not allow setting the lock mode
associated with a custom reloption either.

Hence attached is a patch set to address those issues:
- 0001 makes sure that any existing module creating a custom reloption
has the lock mode set to AccessExclusiveLock, which would be a sane
default anyway.  I think that we should just back-patch that and close
any holes.
- 0002 is a patch which we could use to extend the existing reloption
APIs to set the lock mode used.  I am aware of the recent work done by
Nikolay in CC to rework this API set, but I am unsure where we are
going there, and the resulting patch is actually very short, being
20-line long with the current infrastructure.  That could go into
HEAD.  Table AMs have been added in v12 so custom reloptions could
gain more in popularity, but as we are very close to the release it
would not be cool to break those APIs.  The patch simplicity could
also be a reason sufficient for a back-patch, and I don't think that
there are many users of them yet.
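With 0002, a module declaring a custom reloption would pass the intended lock
mode explicitly, as the bloom change in the attached patch does:

    add_int_reloption(bl_relopt_kind, "length",
                      "Length of signature in bits",
                      DEFAULT_BLOOM_LENGTH, 1, MAX_BLOOM_LENGTH,
                      AccessExclusiveLock);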

My take would be to use 0001 on all branches (or am I missing
something related to custom relopts manipulation?), and consider 0002
on HEAD.

Thoughts?
--
Michael
From af2541d08965369f1b1fa4d8a0750e798420b3a5 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 20 Sep 2019 10:10:18 +0900
Subject: [PATCH 1/2] Fix failure for lock mode used for custom relation
 options

---
 contrib/bloom/expected/bloom.out   | 1 +
 contrib/bloom/sql/bloom.sql| 1 +
 src/backend/access/common/reloptions.c | 7 +++
 3 files changed, 9 insertions(+)

diff --git a/contrib/bloom/expected/bloom.out b/contrib/bloom/expected/bloom.out
index 5ab9e34f82..dae12a7d3e 100644
--- a/contrib/bloom/expected/bloom.out
+++ b/contrib/bloom/expected/bloom.out
@@ -5,6 +5,7 @@ CREATE TABLE tst (
 );
 INSERT INTO tst SELECT i%10, substr(md5(i::text), 1, 1) FROM generate_series(1,2000) i;
 CREATE INDEX bloomidx ON tst USING bloom (i, t) WITH (col1 = 3);
+ALTER INDEX bloomidx SET (length=80);
 SET enable_seqscan=on;
 SET enable_bitmapscan=off;
 SET enable_indexscan=off;
diff --git a/contrib/bloom/sql/bloom.sql b/contrib/bloom/sql/bloom.sql
index 32755f2b1a..4733e1e705 100644
--- a/contrib/bloom/sql/bloom.sql
+++ b/contrib/bloom/sql/bloom.sql
@@ -7,6 +7,7 @@ CREATE TABLE tst (
 
 INSERT INTO tst SELECT i%10, substr(md5(i::text), 1, 1) FROM generate_series(1,2000) i;
 CREATE INDEX bloomidx ON tst USING bloom (i, t) WITH (col1 = 3);
+ALTER INDEX bloomidx SET (length=80);
 
 SET enable_seqscan=on;
 SET enable_bitmapscan=off;
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 20f4ed3c38..b59e606771 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -659,6 +659,13 @@ allocate_reloption(bits32 kinds, int type, const char *name, const char *desc)
 	newoption->namelen = strlen(name);
 	newoption->type = type;
 
+	/*
+	 * Set the default lock mode for this option.  There is no actual way
+	 * for a module to enforce it when declaring a custom relation option,
+	 * so just use the highest level, which is safe for all cases.
+	 */
+	newoption->lockmode = AccessExclusiveLock;
+
 	MemoryContextSwitchTo(oldcxt);
 
 	return newoption;
-- 
2.23.0

From 8d3f95edea77ea71eeade8e2a977e8404573665f Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 20 Sep 2019 10:17:54 +0900
Subject: [PATCH 2/2] Allow definition of lock mode for custom reloptions

---
 contrib/bloom/blutils.c|  6 --
 src/backend/access/common/reloptions.c | 28 +++---
 src/include/access/reloptions.h| 11 ++
 3 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index cc1670934f..dbb24cb5b2 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -60,7 +60,8 @@ _PG_init(void)
 	/* Option for length of signature */
 	add_int_reloption(bl_relopt_kind, "length",
 	  "Length of signature in bits",
-	  DEFAULT_BLOOM_LENGTH, 1, MAX_BLOOM_LENGTH);
+	  DEFAULT_BLOOM_LENGTH, 1, MAX_BLOOM_LENGTH,
+	  AccessExclusiveLock);
 	

Re: subscriptionCheck failures on nightjar

2019-09-19 Thread Michael Paquier
On Thu, Sep 19, 2019 at 05:20:15PM +0530, Kuntal Ghosh wrote:
> When subscription 3 is created, it eventually reaches a consistent
> snapshot point and prints the WAL location corresponding to it. It
> seems sub1/sub2 immediately fail to serialize the snapshot to the
> .snap file having the same WAL location.
> 
> Is this helpful?

It looks like you are pointing to something here.
--
Michael




Re: Add "password_protocol" connection parameter to libpq

2019-09-19 Thread Jeff Davis
On Tue, 2019-09-17 at 16:04 +0900, Michael Paquier wrote:
> basically maps into checking after FE_SCRAM_FINISHED.  Instead of
> those two flags, wouldn't it be cleaner to add an extra routine to
> fe-auth-scram.c which does the same sanity checks, say
> pg_fe_scram_check_state()?  This needs to be careful about three
> things, taking in input an opaque pointer to fe_scram_state if
> channel
> binding is required:
> - The data is not NULL.
> - The sasl mechanism selected is SCRAM-SHA-256-PLUS.
> - The state is FE_SCRAM_FINISHED.

Yes, I think this does come out a bit cleaner, thank you.

> What do you think?  There is no need to save down the connection
> parameter value into fe_scram_state.

I'm not sure I understand this comment, but I removed the extra boolean
flags.

> Nit here as there are only two mechanisms handled: I would rather
> cause the error if the selected mechanism does not match
> SCRAM-SHA-256-PLUS, instead of complaining if the selected mechanism
> matches SCRAM-SHA-256.  Either way is actually fine :)

Done.

> +printfPQExpBuffer(&conn->errorMessage,
> +  libpq_gettext("Channel binding required but not offered by server\n"));
> +   result = false;
> Should that be "completed by server" instead?

Done.

> is required".  Or you have in mind that this error message is better?

I felt it would be a more useful error message.

> I think that pgindent would complain with the comment block in
> check_expected_areq(). 

Changed.

Regards,
Jeff Davis

From 3fa2ab55929803280416454548ec97a960ec0389 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Tue, 20 Aug 2019 18:59:23 -0700
Subject: [PATCH] Add libpq parameter 'channel_binding'.

Allow clients to require channel binding to enhance security against
untrusted servers.

Author: Jeff Davis
Discussion: https://postgr.es/m/227015d8417f2b4fef03f8966dbfa5cbcc4f44da.camel%40j-davis.com
---
 doc/src/sgml/libpq.sgml   | 32 ++
 src/interfaces/libpq/fe-auth-scram.c  | 22 +--
 src/interfaces/libpq/fe-auth.c| 74 +--
 src/interfaces/libpq/fe-auth.h|  1 +
 src/interfaces/libpq/fe-connect.c | 34 +++
 src/interfaces/libpq/libpq-int.h  |  1 +
 src/test/authentication/t/002_saslprep.pl | 12 +++-
 src/test/ssl/t/002_scram.pl   | 15 -
 src/test/ssl/t/SSLServer.pm   |  9 ++-
 9 files changed, 185 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 1189341ca15..c58527b0c3b 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1122,6 +1122,28 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
  
 
+ 
+  channel_binding
+  
+  
+This option controls the client's use of channel binding. A setting
+of require means that the connection must employ
+channel binding, prefer means that the client will
+choose channel binding if available, and disable
+prevents the use of channel binding. The default
+is prefer if
+PostgreSQL is compiled with SSL support;
+otherwise the default is disable.
+  
+  
+Channel binding is a method for the server to authenticate itself to
+the client. It is only supported over SSL connections
+with PostgreSQL 11 or later servers using
+the SCRAM authentication method.
+  
+  
+ 
+
  
   connect_timeout
   
@@ -6864,6 +6886,16 @@ myEventProc(PGEventId evtId, void *evtInfo, void *passThrough)
  
 
 
+
+ 
+  
+   PGCHANNELBINDING
+  
+  PGCHANNELBINDING behaves the same as the channel_binding connection parameter.
+ 
+
+
 
  
   
diff --git a/src/interfaces/libpq/fe-auth-scram.c b/src/interfaces/libpq/fe-auth-scram.c
index 7a8335bf9f8..3b40c4b083c 100644
--- a/src/interfaces/libpq/fe-auth-scram.c
+++ b/src/interfaces/libpq/fe-auth-scram.c
@@ -119,6 +119,18 @@ pg_fe_scram_init(PGconn *conn,
 	return state;
 }
 
+/*
+ * Return true if channel binding was successful; false otherwise.
+ */
+bool
+pg_fe_scram_channel_bound(void *opaq)
+{
+	fe_scram_state *state = (fe_scram_state *) opaq;
+
+	return (state != NULL && state->state == FE_SCRAM_FINISHED &&
+			strcmp(state->sasl_mechanism, SCRAM_SHA_256_PLUS_NAME) == 0);
+}
+
 /*
  * Free SCRAM exchange status
  */
@@ -225,9 +237,7 @@ pg_fe_scram_exchange(void *opaq, char *input, int inputlen,
 
 			/*
 			 * Verify server signature, to make sure we're talking to the
-			 * genuine server.  XXX: A fake server could simply not require
-			 * authentication, though.  There is currently no option in libpq
-			 * to reject a connection, if SCRAM authentication did not happen.
+			 * genuine server.
 			 */
 			if (verify_server_signature(state))
 *success = true;
@@ -358,7 +368,7 @@ build_client_first_message(fe_scram_state *state)
 		appendPQExpBufferStr(, "p=tls-server-end-point");
 	}
 

Re: Zedstore - compressed in-core columnar storage

2019-09-19 Thread Taylor Vesely
> When doing update operation, for each tuple being modified,
> *tuplebuffers_insert()* says that there is no entry for the relation
> being modified in the hash table although it was already added when
> the first tuple in the table was updated. Why is it so?

Currently, when doing an update, it will actually flush the tuple
buffers every time we update a tuple. As a result, we only ever spool
up one tuple at a time. This is a good place to put in an optimization
like was implemented for insert, but I haven't gotten around to
looking into that yet.

The memory leak is actually happening because it isn't freeing the
attbuffers after flushing. Alexandra Wang and I have a working
branch[1] where we tried to plug the leak by freeing the attbuffers,
but it has exposed an issue with triggers that I need to understand
before I push the fix into the main zedstore branch.

I don't like our solution of freeing the buffers either, because they
could easily be reused. I'm going to take a stab at making that better
before merging in the fix.

[1] https://github.com/l-wang/postgres-1/tree/zedstore-fix-memory-issues


Re: [PATCH] src/test/modules/dummy_index -- way to test reloptions from inside of access method

2019-09-19 Thread Michael Paquier
On Thu, Sep 19, 2019 at 02:13:23PM +0300, Nikolay Shaplov wrote:
> What a good catch! dummy_index already proved to be useful ;-)

Yes, the testing around custom reloptions is rather poor now, so this
module has value.  I still don't much like its shape though, so I
began hacking on it for integration, and I wanted to get that part
done in this CF :)

There may be other issues, but let's sort out that later if anything
shows up.

> Yes, I think AccessExclusiveLock is quite a good default.  Especially in
> the case when these options are not really used in the real world ;-)

I guess so, but with table AMs introduced in 12, I would suspect that
we are going to have many more use cases popping up, and that these
use cases would be very happy to have the possibility to lower the
lock level needed to set a custom reloption.  I would like to get that
fixed and back-patched separately.  As it is not especially clear to
everybody here, in a thread dedicated to a test module, that we are
discussing a backend-side bug, I am going to spawn a new thread
with a proper patch.  Perhaps I missed something as well, so it would
be good to get more input on that.

> As you know I have plans for rewriting the options engine, so that there
> would be the same options code both for core access methods and for AM
> options from extensions. So there would be an API for setting the lockmode...
> But the way it is going right now, I am not sure it will be reviewed in time
> to reach 13...

Well, another thing would be to extend the existing routines so that
they take an extra argument to be able to enforce the lockmode, which
is something that can be done without a large rewrite of the whole
facility, and the change is less invasive so it would have better
chances to get into core.  I don't mind changing those APIs on HEAD by
the way as long as the breakage involves a clean compilation failure
and I don't think they are the most popular extension APIs ever.
Perhaps others don't have the same line of thoughts, but let's see.
--
Michael




Re: Avoiding possible future conformance headaches in JSON work

2019-09-19 Thread Tom Lane
Chapman Flack  writes:
> Sure, against *every* non-spec feature we have or ever will have, someone
> /could/ raise a generic "what if SQL committee might add something pretty
> similar in future".
> But what we have in this case are specific non-spec features (array, map,
> and sequence constructors, lambdas, map/fold/reduce, user-defined
> functions) that are flat-out already present in the current version of
> the language that the SQL committee is clearly modeling jsonpath on.

Sure.  But we also modeled those features on the same language that the
committee is looking at (or at least I sure hope we did).  So it's
reasonable to assume that they would come out at the same spot without
any prompting.  And we can prompt them ;-).

regards, tom lane




Re: Avoiding possible future conformance headaches in JSON work

2019-09-19 Thread Tom Lane
Chapman Flack  writes:
> But is that even the point? It's already noted in [1] that the standard
> calls for one style of regexps and we're providing another.

> It's relatively uncomplicated now to add some kind of distinguishing
> syntax for our posix flavor of like_regex. Yes, it would be a change
> between beta1 and final release, but that doesn't seem unheard-of.

> In contrast, if such a distinction is not added now, we know that will
> be a headache for any future effort to more closely conform to the
> standard. Whether such a future effort seems near-term or far off, it
> doesn't seem strategic to make current choices that avoidably make it
> harder.

Just to not leave this thread hanging --- the discussion was picked up
in this other thread:

https://www.postgresql.org/message-id/flat/CAPpHfdvDci4iqNF9fhRkTqhe-5_8HmzeLt56drH%2B_Rv2rNRqfg%40mail.gmail.com

and I think we've come to the conclusion that the only really awful regex
compatibility problem is differing interpretations of the 'x' flag, which
we solved temporarily by treating that as unimplemented in jsonpath.
There are some other unimplemented features that we can consider adding
later, too.  (Fortunately, Spencer's engine throws errors for all of
those, so adding them won't create new compatibility issues.)  And we did
add some documentation:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=0a97edb12ec44f8d2d8828cbca6dd7639408ac88

There remains the question of whether we should do something like
requiring a "pg" prefix to allow access to the other nonstandard
features we added to jsonpath.  I see the point that the SQL committee
might well add something pretty similar in future.  But I'm not too
concerned about that, on two grounds: (1) the same argument could be
raised against *every* non-spec feature we have or ever will have;
(2) now that Peter's in on SQL committee deliberations, we have a
chance to push for any future spec changes to not be unnecessarily
incompatible.  So my inclination is to close this open item as
sufficiently done, once the minor lexer issues raised in the other
thread are done.

regards, tom lane




Re: Define jsonpath functions as stable

2019-09-19 Thread Jonathan S. Katz
On 9/19/19 6:18 PM, Tom Lane wrote:
> "Jonathan S. Katz"  writes:
>> On 9/19/19 3:48 PM, Tom Lane wrote:
>>> Seems like
>>> the error handling in jsonpath_gram.y could use some cleanup too
>>> ... although I don't think it's a task to tackle while we're
>>> rushing to get v12 shippable.
> 
>> IIRC if we want to change the contents of an error message we wait until
>> major releases. Is there anything we can do before 12 to avoid messages
>> like "unexpected IDENT_P" coming to a user? Would that be something
>> acceptable to fix as a 12.1 or would it have to wait until 13?
> 
> I think these messages are sufficiently confusing that we could call
> it a bug fix to improve them.  As long as we don't change the SQLSTATE
> that's thrown, it's hard to claim that there's any real application
> compatibility hazard from changing them.

Great. +1 on that.

> I just don't want to call this point a release blocker.  It's not
> about changing any semantics or the set of things that work.

+100 on that.

Jonathan





Re: Define jsonpath functions as stable

2019-09-19 Thread Tom Lane
"Jonathan S. Katz"  writes:
> On 9/19/19 3:48 PM, Tom Lane wrote:
>> Seems like
>> the error handling in jsonpath_gram.y could use some cleanup too
>> ... although I don't think it's a task to tackle while we're
>> rushing to get v12 shippable.

> IIRC if we want to change the contents of an error message we wait until
> major releases. Is there anything we can do before 12 to avoid messages
> like "unexpected IDENT_P" coming to a user? Would that be something
> acceptable to fix as a 12.1 or would it have to wait until 13?

I think these messages are sufficiently confusing that we could call
it a bug fix to improve them.  As long as we don't change the SQLSTATE
that's thrown, it's hard to claim that there's any real application
compatibility hazard from changing them.

I just don't want to call this point a release blocker.  It's not
about changing any semantics or the set of things that work.

regards, tom lane




Re: Define jsonpath functions as stable

2019-09-19 Thread Jonathan S. Katz
On 9/19/19 3:48 PM, Tom Lane wrote:
> "Jonathan S. Katz"  writes:
>> I looked at the patch, but did not test it. From what I can see, it
>> looks good, but perhaps we add a test in it to show that single-quoted
>> literals are unsupported?
> 
> I thought about that, but it seems like it'd be memorializing some
> other weird behavior:
> 
> regression=# select '''foo'''::jsonpath;
> ERROR:  syntax error, unexpected IDENT_P at end of jsonpath input
> LINE 1: select '''foo'''::jsonpath;
>^
> 
> regression=# select '''foo'' <= ''bar'''::jsonpath;
> ERROR:  syntax error, unexpected IDENT_P at or near " " of jsonpath input
> LINE 1: select '''foo'' <= ''bar'''::jsonpath;
>^

Ah yeah, those are some interesting errors.

> There isn't anything I like about these error messages.

Agreed. It would be nice to have tests around it, but yes, I think
looking at the regression outpout one may scratch their head.

>  Seems like
> the error handling in jsonpath_gram.y could use some cleanup too
> ... although I don't think it's a task to tackle while we're
> rushing to get v12 shippable.

IIRC if we want to change the contents of an error message we wait until
major releases. Is there anything we can do before 12 to avoid messages
like "unexpected IDENT_P" coming to a user? Would that be something
acceptable to fix as a 12.1 or would it have to wait until 13?

Jonathan





Re: Bug in GiST paring heap comparator

2019-09-19 Thread Nikita Glukhov

On 19.09.2019 22:14, Alexander Korotkov wrote:

Pushed.


Attached patch fixes premature xs_orderbynulls[] assignment.  The old value of
NULL flag, not the new, should be checked before pfree()ing the old value.


--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
>From bf4ffb0c066dd55e027ed156790277c597f44e45 Mon Sep 17 00:00:00 2001
From: Nikita Glukhov 
Date: Fri, 20 Sep 2019 00:07:30 +0300
Subject: [PATCH] Fix pfree()ing of index order-by distances

---
 src/backend/access/index/indexam.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 442f414..3a061e5 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -869,11 +869,6 @@ index_store_float8_orderby_distances(IndexScanDesc scan, Oid *orderByTypes,
 
 	for (i = 0; i < scan->numberOfOrderBys; i++)
 	{
-		scan->xs_orderbynulls[i] = distances[i].isnull;
-
-		if (scan->xs_orderbynulls[i])
-			scan->xs_orderbyvals[i] = (Datum) 0;
-
 		if (orderByTypes[i] == FLOAT8OID)
 		{
 #ifndef USE_FLOAT8_BYVAL
@@ -881,7 +876,11 @@ index_store_float8_orderby_distances(IndexScanDesc scan, Oid *orderByTypes,
 			if (!scan->xs_orderbynulls[i])
 				pfree(DatumGetPointer(scan->xs_orderbyvals[i]));
 #endif
-			if (!scan->xs_orderbynulls[i])
+			scan->xs_orderbynulls[i] = distances[i].isnull;
+
+			if (scan->xs_orderbynulls[i])
+				scan->xs_orderbyvals[i] = (Datum) 0;
+			else
 				scan->xs_orderbyvals[i] = Float8GetDatum(distances[i].value);
 		}
 		else if (orderByTypes[i] == FLOAT4OID)
@@ -892,7 +891,11 @@ index_store_float8_orderby_distances(IndexScanDesc scan, Oid *orderByTypes,
 			if (!scan->xs_orderbynulls[i])
 				pfree(DatumGetPointer(scan->xs_orderbyvals[i]));
 #endif
-			if (!scan->xs_orderbynulls[i])
+			scan->xs_orderbynulls[i] = distances[i].isnull;
+
+			if (scan->xs_orderbynulls[i])
+				scan->xs_orderbyvals[i] = (Datum) 0;
+			else
 				scan->xs_orderbyvals[i] = Float4GetDatum((float4) distances[i].value);
 		}
 		else
@@ -908,6 +911,7 @@ index_store_float8_orderby_distances(IndexScanDesc scan, Oid *orderByTypes,
 			if (scan->xs_recheckorderby)
 				elog(ERROR, "ORDER BY operator must return float8 or float4 if the distance function is lossy");
 			scan->xs_orderbynulls[i] = true;
+			scan->xs_orderbyvals[i] = (Datum) 0;
 		}
 	}
 }
-- 
2.7.4



Re: Define jsonpath functions as stable

2019-09-19 Thread Tom Lane
"Jonathan S. Katz"  writes:
> I looked at the patch, but did not test it. From what I can see, it
> looks good, but perhaps we add a test in it to show that single-quoted
> literals are unsupported?

I thought about that, but it seems like it'd be memorializing some
other weird behavior:

regression=# select '''foo'''::jsonpath;
ERROR:  syntax error, unexpected IDENT_P at end of jsonpath input
LINE 1: select '''foo'''::jsonpath;
   ^

regression=# select '''foo'' <= ''bar'''::jsonpath;
ERROR:  syntax error, unexpected IDENT_P at or near " " of jsonpath input
LINE 1: select '''foo'' <= ''bar'''::jsonpath;
   ^

There isn't anything I like about these error messages.  Seems like
the error handling in jsonpath_gram.y could use some cleanup too
... although I don't think it's a task to tackle while we're
rushing to get v12 shippable.
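
(For contrast, the double-quoted form -- the JSON string syntax that the
jsonpath grammar actually expects -- parses cleanly; a quick check, output
abbreviated:

regression=# select '"foo"'::jsonpath;
 jsonpath
----------
 "foo"
(1 row)

so the complaint is purely about how the single-quote cases are reported.)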

regards, tom lane




Re: Bug in GiST paring heap comparator

2019-09-19 Thread Alexander Korotkov
On Mon, Sep 16, 2019 at 10:30 PM Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Mon, Sep 16, 2019 at 3:47 PM Nikita Glukhov 
> wrote:
> > On 13.09.2019 20:17, Alexander Korotkov wrote:
> >
> > On Fri, Sep 13, 2019 at 5:23 PM Nikita Glukhov 
> wrote:
> >
> > I have moved handling of NULL ordering keys from opclasses to the common
> > SP-GiST code, but really I don't like how it is implemented now. Maybe
> it's
> > worth to move handling of NULL order-by keys to the even more higher
> > level so,
> > that AM don't have to worry about NULLs?
> >
> > Yes, optimizer could remove away "col op NULL" clauses from ORDER BY
> > if op is strict operator.  And then such clauses wouldn't be passed to
> > AM.  But I see this as future improvement.  For backpatching we should
> > solve this at AM side.
> >
> > Also I leaved usages of IndexOrderByDistance in opclasses. I think, that
> > may help to minimize opclass changes in the future.
> >
> > Could you please extract this as a separate patch.  We can consider
> > this for master, but we shouldn't backpatch this.
> >
> > Propagation of IndexOrderByDistance in SP-GiST was extracted into a
> separate
> > patch #2.
>
> First patch is minor editing from me and commit message is attached.
> I'm going to push it if no objections.
>

Pushed.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: pgbench - allow to create partitioned tables

2019-09-19 Thread Fabien COELHO


Hello Amit,

[...] 'ps' itself won't be NULL in that case, the value it contains is 
NULL. I have debugged this case as well.  'ps' itself can be NULL only 
when you pass wrong column number or something like that to PQgetvalue.


Argh, you are right! I mixed up C NULL and SQL NULL:-(


If we don't find pgbench_accounts, let's give error here itself rather
than later unless you have a valid case in mind.


I thought of it, but decided not to: Someone could add a builtin script
which does not use pgbench_accounts, or a parallel running script could
create a table dynamically, whatever, so I prefer the error to be raised
by the script itself, rather than deciding that it will fail before even
trying.


I think this is not a possibility today and I don't know of the
future.  I don't think it is a good idea to add code which we can't
reach today.  You can probably add Assert if required.


I added a fail on an unexpected partition method, i.e. not 'r' or 'h',
and an Assert in case PQgetvalue returns NULL.

I fixed the query so that it counts actual partitions, otherwise I was 
getting one for a partitioned table without partitions attached, which 
does not generate an error by the way. I just figured out that pgbench 
does not check that UPDATE updates anything. Hmmm.



Hmm, why you need to change the fill factor behavior?  If it is not
specifically required for the functionality of this patch, then I
suggest keeping that behavior as it is.


The behavior is not actually changed, but I had to move fillfactor away 
because it cannot be declared on partitioned tables, it must be declared 
on partitions only. Once there is a function to handle that it is pretty 
easy to add the test.


I can remove it but frankly there are only benefits: the default is now 
tested by pgbench, the create query is smaller, and it would work with 
older versions of pg, which does not matter but is good on principle.



added that pgbench does not know about, the failure mode will be that
nothing is printed rather than printing something strange like
"method none with 2 partitions".


but how will that new partition method will be associated with a table
created via pgbench?


The user could do a -i with a version of pgbench and bench with another 
one. I do that often while developing…


I think the previous check was good because it makes partition checking 
consistent throughout the patch.


This case now generates a fail.

v12:
 - fixes NULL vs NULL
 - works correctly with a partitioned table without partitions attached
 - generates an error if the partition method is unknown
 - adds an assert
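
For the record, with these options an initialization run reads like this
(scale factor and partition count below are arbitrary):

  pgbench -i -s 100 --partitions=16 --partition-method=hash bench
  pgbench -i -s 100 --partitions=16 bench   # range partitioning, the default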

--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 816f9cc4c7..3e8e292e39 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -306,6 +306,32 @@ pgbench  options  d
   
  
 
+ 
+  --partitions=NUM
+  
+   
+Create a partitioned pgbench_accounts table with
+NUM partitions of nearly equal size for
+the scaled number of accounts.
+Default is 0, meaning no partitioning.
+   
+  
+ 
+
+ 
+  --partition-method=NAME
+  
+   
+Create a partitioned pgbench_accounts table with
+NAME method.
+Expected values are range or hash.
+This option is only taken into account if
+--partitions is non-zero.
+Default is range.
+   
+  
+ 
+
  
   --tablespace=tablespace
   
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 570cf3306a..6819b4e433 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -186,6 +186,11 @@ int64		latency_limit = 0;
 char	   *tablespace = NULL;
 char	   *index_tablespace = NULL;
 
+/* partitioning for pgbench_accounts table, 0 for no partitioning */
+int 		partitions = 0;
+enum { PART_RANGE, PART_HASH }
+			partition_method = PART_RANGE;
+
 /* random seed used to initialize base_random_sequence */
 int64		random_seed = -1;
 
@@ -617,6 +622,9 @@ usage(void)
 		   "  --foreign-keys   create foreign key constraints between tables\n"
 		   "  --index-tablespace=TABLESPACE\n"
 		   "   create indexes in the specified tablespace\n"
+		   "  --partitions=NUM partition account table in NUM parts (defaults: 0)\n"
+		   "  --partition-method=(range|hash)\n"
+		   "   partition account table with this method (default: range)\n"
 		   "  --tablespace=TABLESPACE  create tables in the specified tablespace\n"
 		   "  --unlogged-tablescreate tables as unlogged tables\n"
 		   "\nOptions to select what to run:\n"
@@ -3601,6 +3609,17 @@ initDropTables(PGconn *con)
 	 "pgbench_tellers");
 }
 
+/*
+ * add fillfactor percent option if not 100.
+ */
+static void
+append_fillfactor(char *opts, int len)
+{
+	if (fillfactor < 100)
+		snprintf(opts + strlen(opts), len - strlen(opts),
+ " with 

Re: Memory Accounting

2019-09-19 Thread Jeff Davis
On Wed, 2019-09-18 at 13:50 -0700, Soumyadeep Chakraborty wrote:
> Hi Jeff,

Hi Soumyadeep and Melanie,

Thank you for the review!

> max_stack_depth   max level   lazy (ms)    eager (ms)   (eager/lazy)
> 2MB               82          302.715      427.554      1.4123978
> 3MB               3474        567.829      896.143      1.578191674
> 7.67MB            8694        2657.972     4903.063     1.844663149

Thank you for collecting data on this. Were you able to find any
regression when compared to no memory accounting at all?

It looks like you agree with the approach and the results. Did you find
any other issues with the patch?

I am also including Robert in this thread. He had some concerns the
last time around due to a small regression on POWER.

Regards,
Jeff Davis





Syntax highlighting for Postgres spec files

2019-09-19 Thread Ekin Dursun
Hi people,

I have written language plugins for .spec files used in isolation tests. They 
are available for Vim and Visual Studio Code. I hope they will make reading the 
tests easier for you. If you find a problem, please open an issue!



https://github.com/onlined/pgspec.vim

https://github.com/onlined/pgspec-vsc



Re: Usage of the system truststore for SSL certificate validation

2019-09-19 Thread Isaac Morland
If we're going to open this up, can we add an option to say "this key is
allowed to log in to this account", SSH style?

I like the idea of using keys rather than .pgpass, but I like the
~/.ssh/authorized_keys model and don't like the "set up an entire
certificate infrastructure" approach.

On Thu, 19 Sep 2019 at 10:54, Thomas Berger  wrote:

> Hi,
>
> currently, libpq does SSL cerificate validation only against the defined
> `PGSSLROOTCERT` file.
>
> Is there any specific reason, why the system truststore ( at least under
> unixoid systems) is not considered for the validation?
>
> We would like to contribute a patch to allow certificate validation
> against
> the system truststore. Are there any opinions against it?
>
>
> A little bit background for this:
>
> Internally we sign the certificates for our systems with our own CA. The
> CA
> root certificates and revocation lists are distributed via puppet and/or
> packages on all of our internal systems.
>
> Validating the certificate against this CA requires to either override the
> PGSSLROOTCERT location via the environment or provide a copy of the file
> for
> each user that connects with libpq or libpq-like connectors.
>
> We would like to simplify this.
>
>
> --
> Thomas Berger
>
> PostgreSQL DBA
> Database Operations
>
> 1&1 Telecommunication SE | Ernst-Frey-Straße 10 | 76135 Karlsruhe | Germany
>
>
>


Usage of the system truststore for SSL certificate validation

2019-09-19 Thread Thomas Berger
Hi,

currently, libpq does SSL cerificate validation only against the defined 
`PGSSLROOTCERT` file.

Is there any specific reason, why the system truststore ( at least under 
unixoid systems) is not considered for the validation?

We would like to contribute a patch to allow certificate validation against 
the system truststore. Are there any opinions against it?


A little bit background for this:

Internally we sign the certificates for our systems with our own CA. The CA 
root certificates and revocation lists are distributed via puppet and/or 
packages on all of our internal systems.

Validating the certificate against this CA requires to either override the 
PGSSLROOTCERT location via the environment or provide a copy of the file for 
each user that connects with libpq or libpq-like connectors.

We would like to simplify this.
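
To make the pain point concrete, today every client has to be pointed at the
CA bundle explicitly, for example (the file path and host name below are just
placeholders):

  export PGSSLROOTCERT=/etc/ssl/certs/internal-ca.pem
  psql "host=db.internal.example dbname=postgres sslmode=verify-full"

With a system truststore option the first line could simply go away on hosts
where the CA is already installed.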


-- 
Thomas Berger

PostgreSQL DBA
Database Operations

1&1 Telecommunication SE | Ernst-Frey-Straße 10 | 76135 Karlsruhe | Germany




Re: allow online change primary_conninfo

2019-09-19 Thread Sergei Kornilov
Hello

Thank you for review!

> - This parameter can only be set at server start.
> + This parameter can only be set in the postgresql.conf
> + file or on the server command line.
>
> I'm not sure it's good to change the context of this
> description. This was mentioning that changing of this parameter
> requires server (re)start. So if we want to be on the same
> context after rewriting, it would be like "This parameter can be
> set any time and causes WAL receiver restart with the new setting
> if the server is in standby mode."

Was written such way after this review: 
https://www.postgresql.org/message-id/20181125214313.lydvmrraqjfrb3s2%40alap3.anarazel.de

> And If I'm not missing something, I don't find an (explict)
> paramter of postmaster for setting primary_conninfo.

Well, we have common -c option: -c name=value

> Couldn't we do the same thing by just skipping the wait and
> setting lastSourceFailed to true in the case of intentional
> walreceiver restart?

Yes, it's possible. Let's see... Done in the attached variant.
We need to check pendingWalRcvRestart before the rescanLatestTimeLine lines.
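
For reference, with the patch the new value is picked up through the normal
reload path, so switching a standby to another primary is just (the connection
string below is illustrative):

  ALTER SYSTEM SET primary_conninfo = 'host=new-primary port=5432 user=replicator';
  SELECT pg_reload_conf();

after which the startup process asks the WAL receiver to restart with the new
setting.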

regards, Sergei

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6612f95f9f..1fa48058e8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3929,9 +3929,15 @@ ANY num_sync ( 
@@ -3946,9 +3952,13 @@ ANY num_sync ( ).
-  This parameter can only be set at server start.
+  This parameter can only be set in the postgresql.conf
+  file or on the server command line.
   This setting has no effect if primary_conninfo is not
-  set.
+  set or the server is not in standby mode.
+ 
+ 
+  The WAL receiver is restarted after an update of primary_slot_name.
  
 

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b7ff004234..f0e47cc663 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -803,6 +803,12 @@ static XLogSource readSource = 0;	/* XLOG_FROM_* code */
 static XLogSource currentSource = 0;	/* XLOG_FROM_* code */
 static bool lastSourceFailed = false;
 
+/*
+ * Need for restart running WalReceiver due the configuration change.
+ * Suitable only for XLOG_FROM_STREAM source
+ */
+static bool pendingWalRcvRestart = false;
+
 typedef struct XLogPageReadPrivate
 {
 	int			emode;
@@ -11756,12 +11762,13 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 		int			oldSource = currentSource;
 
 		/*
-		 * First check if we failed to read from the current source, and
+		 * First check if we failed to read from the current source or if
+		 * we want to restart wal receiver, and
 		 * advance the state machine if so. The failure to read might've
 		 * happened outside this function, e.g when a CRC check fails on a
 		 * record, or within this loop.
 		 */
-		if (lastSourceFailed)
+		if (lastSourceFailed || pendingWalRcvRestart)
 		{
 			switch (currentSource)
 			{
@@ -11862,6 +11869,17 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	if (WalRcvStreaming())
 		ShutdownWalRcv();
 
+	/*
+	 * If wal receiver is requested to restart, we skip the
+	 * next XLOG_FROM_ARCHIVE to immediately starting it.
+	 */
+	if (pendingWalRcvRestart)
+	{
+		lastSourceFailed = true;
+		currentSource = XLOG_FROM_ARCHIVE;
+		continue;
+	}
+
 	/*
 	 * Before we sleep, re-scan for possible new timelines if
 	 * we were requested to recover to the latest timeline.
@@ -11881,7 +11899,8 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	 * obtaining the requested WAL. We're going to loop back
 	 * and retry from the archive, but if it hasn't been long
 	 * since last attempt, sleep wal_retrieve_retry_interval
-	 * milliseconds to avoid busy-waiting.
+	 * milliseconds to avoid busy-waiting. We don't wait if
+	 * explicitly requested to restart.
 	 */
 	now = GetCurrentTimestamp();
 	if (!TimestampDifferenceExceeds(last_fail_time, now,
@@ -11922,16 +11941,17 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 currentSource = XLOG_FROM_ARCHIVE;
 		}
 
-		if (currentSource != oldSource)
+		if (currentSource != oldSource && !pendingWalRcvRestart)
 			elog(DEBUG2, "switched WAL source from %s to %s after %s",
  xlogSourceNames[oldSource], xlogSourceNames[currentSource],
  lastSourceFailed ? "failure" : "success");
 
 		/*
-		 * We've now handled possible failure. Try to read from the chosen
-		 * source.
+		 * We've now handled possible failure and configuration change. Try to
+		 * read from the chosen source.
 		 */
 		lastSourceFailed = false;
+		pendingWalRcvRestart = false;
 
 		switch (currentSource)
 		{
@@ -12096,6 +12116,45 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	return false;/* not reached */
 }
 
+/*
+ * Re-read config file and plan to 

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2019-09-19 Thread Alexey Kondratov

On 19.09.2019 16:21, Robert Haas wrote:

On Thu, Sep 19, 2019 at 12:43 AM Michael Paquier  wrote:

It seems to me that it would be good to keep the patch as simple as
possible for its first version, and split it into two if you would
like to add this new option instead of bundling both together.  This
makes the review of one and the other more simple.  Anyway, regarding
the grammar, is SET TABLESPACE really our best choice here?  What
about:
- TABLESPACE = foo, in parenthesis only?
- Only using TABLESPACE, without SET at the end of the query?

SET is used in ALTER TABLE per the set of subqueries available there,
but that's not the case of REINDEX.

So, earlier in this thread, I suggested making this part of ALTER
TABLE, and several people seemed to like that idea. Did we have a
reason for dropping that approach?


If we add this option to REINDEX, then in 'ALTER TABLE tb_name action1, 
REINDEX SET TABLESPACE tbsp_name, action3' the second action will simply be a 
direct alias for 'REINDEX TABLE tb_name SET TABLESPACE tbsp_name'. So it seems 
practical to do this for REINDEX first.


The only concern I have against adding REINDEX to ALTER TABLE in 
this context is that it would allow the user to write such a chimera:


ALTER TABLE tb_name REINDEX SET TABLESPACE tbsp_name, SET TABLESPACE 
tbsp_name;


when they want to move both the table and all of its indexes. Because the simple

ALTER TABLE tb_name REINDEX, SET TABLESPACE tbsp_name;

looks ambiguous. Should it change the tablespace of the table, of the indexes, or of both?


--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company





Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2019-09-19 Thread Robert Haas
On Thu, Sep 19, 2019 at 12:43 AM Michael Paquier  wrote:
> It seems to me that it would be good to keep the patch as simple as
> possible for its first version, and split it into two if you would
> like to add this new option instead of bundling both together.  This
> makes the review of one and the other more simple.  Anyway, regarding
> the grammar, is SET TABLESPACE really our best choice here?  What
> about:
> - TABLESPACE = foo, in parenthesis only?
> - Only using TABLESPACE, without SET at the end of the query?
>
> SET is used in ALTER TABLE per the set of subqueries available there,
> but that's not the case of REINDEX.

So, earlier in this thread, I suggested making this part of ALTER
TABLE, and several people seemed to like that idea. Did we have a
reason for dropping that approach?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)

2019-09-19 Thread Paul Guo
I've updated the patch series following the suggestions. Thanks.


>
>


v6-0001-Extact-common-functions-from-pg_basebackup-into-s.patch
Description: Binary data


v6-0003-Ensure-target-clean-shutdown-at-the-beginning-of-.patch
Description: Binary data


v6-0002-Add-option-to-write-recovery-configuration-inform.patch
Description: Binary data


Re: FETCH FIRST clause PERCENT option

2019-09-19 Thread Surafel Temesgen
Hi Tom,
In the attached patch I have addressed the comments given.
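
For anyone who wants to try it quickly, the syntax added by the patch reads
like this (the table comes from the regression database, the percentage is
arbitrary):

  -- return the first 10 percent of the ordered rows, rounded up
  SELECT * FROM tenk1 ORDER BY unique1 FETCH FIRST 10 PERCENT ROWS ONLY;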

regards
Surafel
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 82d8140ba2..692d6492bd 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -3035,7 +3035,8 @@ estimate_path_cost_size(PlannerInfo *root,
 		if (fpextra && fpextra->has_limit)
 		{
 			adjust_limit_rows_costs(, _cost, _cost,
-	fpextra->offset_est, fpextra->count_est);
+	fpextra->offset_est, fpextra->count_est,
+	LIMIT_OPTION_COUNT);
 			retrieved_rows = rows;
 		}
 	}
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 06d611b64c..4ef42df51d 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] [ PERCENT ] { ROW | ROWS } ONLY ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
@@ -1430,7 +1430,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] [ PERCENT ] { ROW | ROWS } ONLY
 
 In this syntax, the start
 or count value is required by
@@ -1440,8 +1440,10 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity.
 If count is
 omitted in a FETCH clause, it defaults to 1.
-ROW
-and ROWS as well as FIRST
+With PERCENT count specifies the maximum number of rows to return 
+in percentage round up to the nearest integer. Using PERCENT
+is best suited to returning single-digit percentages of the query's total row count.
+ROW and ROWS as well as FIRST
 and NEXT are noise words that don't influence
 the effects of these clauses.
 According to the standard, the OFFSET clause must come
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 059ec02cd0..1adc312f7a 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -348,7 +348,7 @@ F862	 in subqueries			YES
 F863	Nested  in 			YES	
 F864	Top-level  in views			YES	
 F865	 in 			YES	
-F866	FETCH FIRST clause: PERCENT option			NO	
+F866	FETCH FIRST clause: PERCENT option			YES	
 F867	FETCH FIRST clause: WITH TIES option			NO	
 R010	Row pattern recognition: FROM clause			NO	
 R020	Row pattern recognition: WINDOW clause			NO	
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index baa669abe8..c03bb2c971 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -21,6 +21,8 @@
 
 #include "postgres.h"
 
+#include 
+
 #include "executor/executor.h"
 #include "executor/nodeLimit.h"
 #include "miscadmin.h"
@@ -52,6 +54,7 @@ ExecLimit(PlanState *pstate)
 	 */
 	direction = node->ps.state->es_direction;
 	outerPlan = outerPlanState(node);
+	slot = node->subSlot;
 
 	/*
 	 * The main logic is a simple state machine.
@@ -81,7 +84,15 @@ ExecLimit(PlanState *pstate)
 			/*
 			 * Check for empty window; if so, treat like empty subplan.
 			 */
-			if (node->count <= 0 && !node->noCount)
+			if (node->limitOption == LIMIT_OPTION_PERCENT)
+			{
+if (node->percent == 0.0)
+{
+	node->lstate = LIMIT_EMPTY;
+	return NULL;
+}
+			}
+			else if (node->count <= 0 && !node->noCount)
 			{
 node->lstate = LIMIT_EMPTY;
 return NULL;
@@ -107,6 +118,16 @@ ExecLimit(PlanState *pstate)
 	break;
 			}
 
+			/*
+			 * We may needed this tuple in subsequent scan so put it into
+			 * tuplestore.
+			 */
+			if (node->limitOption == LIMIT_OPTION_PERCENT)
+			{
+tuplestore_puttupleslot(node->tupleStore, slot);
+tuplestore_advance(node->tupleStore, true);
+			}
+
 			/*
 			 * Okay, we have the first tuple of the window.
 			 */
@@ -124,6 +145,82 @@ ExecLimit(PlanState *pstate)
 		case LIMIT_INWINDOW:
 			if (ScanDirectionIsForward(direction))
 			{
+/*
+ * In case of coming back from backward scan the tuple is
+ * already in tuple store.
+ */
+if (node->limitOption == LIMIT_OPTION_PERCENT && node->backwardPosition > 0)
+{
+	if (tuplestore_gettupleslot_heaptuple(node->tupleStore, true, true, slot))
+	{
+		node->subSlot = slot;
+		node->position++;
+		node->backwardPosition--;
+		return slot;
+	}
+	else
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+}
+
+/*
+ * In LIMIT_OPTION_PERCENT case no need of executing outerPlan multiple
+ * times.
+ */
+if (node->limitOption == LIMIT_OPTION_PERCENT && node->reachEnd)
+{
+	node->lstate = LIMIT_WINDOWEND;
+
+	

Re: standby recovery fails (tablespace related) (tentative patch and discussion)

2019-09-19 Thread Asim R P
On Thu, Aug 22, 2019 at 6:44 PM Paul Guo  wrote:
>
> Thanks. I updated the patch to v5. It passes install-check testing and
recovery testing.
>

This patch contains one more bug, in addition to what Anastasia has found.
If the test case in the patch is tweaked slightly, as follows, the standby
crashes due to PANIC.

--- a/src/test/recovery/t/011_crash_recovery.pl
+++ b/src/test/recovery/t/011_crash_recovery.pl
@@ -147,8 +147,6 @@ $node_standby->start;
 $node_master->poll_query_until(
'postgres', 'SELECT count(*) = 1 FROM pg_stat_replication');

-$node_master->safe_psql('postgres', "CREATE DATABASE db1 TABLESPACE ts1");
-
 # Make sure to perform restartpoint after tablespace creation
 $node_master->wait_for_catchup($node_standby, 'replay',

 $node_master->lsn('replay'));
@@ -156,7 +154,8 @@ $node_standby->safe_psql('postgres', 'CHECKPOINT');

 # Do immediate shutdown ...
 $node_master->safe_psql('postgres',
-   q[ALTER DATABASE db1 SET TABLESPACE ts2;
+   q[CREATE DATABASE db1 TABLESPACE ts1;
+ ALTER DATABASE db1 SET TABLESPACE ts2;
  DROP TABLESPACE ts1;]);
 $node_master->wait_for_catchup($node_standby, 'replay',

 $node_master->lsn('replay'));

Notice the additional CREATE DATABASE in the above change.  That
causes the same tablespace directory (ts1) to be logged twice in the list of
missing directories.  At the end of crash recovery, there is one unmatched
entry in the missing dirs list and the standby PANICs.

Please find attached a couple of tests that are built on top of what was
already written by Paul, Kyotaro.  The patch includes a test to demonstrate
the above mentioned failure and a test case that my friend Alexandra wrote
to implement the archive recovery scenario noted by Anastasia.

In order to fix the test failures, we need to distinguish between a missing
database directory and a missing tablespace directory.  And also add logic
to forget missing directories during tablespace drop.  I am working on it.

Asim


0001-Tests-for-replay-of-create-database-operation-on-sta.patch
Description: Binary data


Re: dropdb --force

2019-09-19 Thread vignesh C
On Thu, Sep 19, 2019 at 12:14 AM Pavel Stehule  wrote:
>
> Hi
>
> I am sending updated version - the changes against last patch are two. I use 
> pg_terminate_backed for killing other terminates like Tom proposed. I am not 
> sure if it is 100% practical - on second hand the necessary right to kill 
> other sessions is  almost available - and consistency with 
> pg_terminal_backend has sense. More - native implementation is significantly 
> better then solution implemented outer. I fixed docs little bit - the timeout 
> for logout (as reaction on SIGTERM is only 5 sec).
>
> Regards
>
> Pavel
>
>
I observed one thing with the latest patch:
There is a possibility that in case of drop database failure some
sessions may be terminated.

It can happen in the following scenario:
1) create database test; /* super user creates test database */
2) create user test password 'test@123'; /* super user creates test user */
3) alter user test nosuperuser; /* super user changes test use to non
super user */
4) alter database test owner to test; /* super user set test as test
database owner */

5) psql -d test /* connect to test database as super user - Session 1 */
6) psql -d postgres -U test /* connect to test database as test user -
Session 2 */
7) psql -d postgres -U test /* connect to test database as test user -
Session 3 */

8) drop database (force) test; /* Executed from Session 3 */

Drop database fails in Session 3 with:
ERROR:  must be a superuser to terminate superuser process

Session 2 is terminated, Session 1 is left as it is.

Is it acceptable that session 2 gets terminated even though the drop
database itself failed?
Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: subscriptionCheck failures on nightjar

2019-09-19 Thread Kuntal Ghosh
Hello hackers,

It seems there is a pattern in how the error occurs on different
systems. Following are the relevant log snippets:

nightjar:
sub3 LOG:  received replication command: CREATE_REPLICATION_SLOT
"sub3_16414_sync_16394" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
sub3 LOG:  logical decoding found consistent point at 0/160B578
sub1 PANIC:  could not open file
"pg_logical/snapshots/0-160B578.snap": No such file or directory

dromedary scenario 1:
sub3_16414_sync_16399 LOG:  received replication command:
CREATE_REPLICATION_SLOT "sub3_16414_sync_16399" TEMPORARY LOGICAL
pgoutput USE_SNAPSHOT
sub3_16414_sync_16399 LOG:  logical decoding found consistent point at 0/15EA694
sub2 PANIC:  could not open file
"pg_logical/snapshots/0-15EA694.snap": No such file or directory


dromedary scenario 2:
sub3_16414_sync_16399 LOG:  received replication command:
CREATE_REPLICATION_SLOT "sub3_16414_sync_16399" TEMPORARY LOGICAL
pgoutput USE_SNAPSHOT
sub3_16414_sync_16399 LOG:  logical decoding found consistent point at 0/15EA694
sub1 PANIC:  could not open file
"pg_logical/snapshots/0-15EA694.snap": No such file or directory

While subscription 3 is being created, it eventually reaches a consistent
snapshot point and prints the WAL location corresponding to it.  It
seems sub1/sub2 then immediately fail to serialize the snapshot to the
.snap file with the same WAL location.

Is this helpful?

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2019-09-19 Thread Alexey Kondratov

Hi Michael,

Thank you for your comments.

On 19.09.2019 7:43, Michael Paquier wrote:

On Wed, Sep 18, 2019 at 03:46:20PM +0300, Alexey Kondratov wrote:

Currently in Postgres SET TABLESPACE always comes with [ NOWAIT ] option, so
I hope it worth adding this option here for convenience. Added in the new
version.

It seems to me that it would be good to keep the patch as simple as
possible for its first version, and split it into two if you would
like to add this new option instead of bundling both together.  This
makes the review of one and the other more simple.


OK, it makes sense. I would also prefer the first patch to be as simple as 
possible, but adding this NOWAIT option required only a few dozen 
lines, so I just bundled everything together. Anyway, I will split the 
patches if we decide to keep the [ SET TABLESPACE ... [NOWAIT] ] grammar.
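
To be concrete, the grammar the patch currently implements is along these
lines (object and tablespace names are made up):

  REINDEX TABLE tb_name SET TABLESPACE tbsp_name;
  REINDEX INDEX idx_name SET TABLESPACE tbsp_name NOWAIT;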



Anyway, regarding
the grammar, is SET TABLESPACE really our best choice here?  What
about:
- TABLESPACE = foo, in parenthesis only?
- Only using TABLESPACE, without SET at the end of the query?

SET is used in ALTER TABLE per the set of subqueries available there,
but that's not the case of REINDEX.


I like the SET TABLESPACE grammar, because it already exists and is used both 
in ALTER TABLE and ALTER INDEX. Thus, if we later add 'ALTER INDEX 
index_name REINDEX SET TABLESPACE' (as was proposed earlier in the 
thread), it will be consistent with 'REINDEX index_name SET 
TABLESPACE'. If we use just plain TABLESPACE, it may be misleading 
in the following cases:


- REINDEX TABLE table_name TABLESPACE tablespace_name
- REINDEX (TABLESPACE = tablespace_name) TABLE table_name

since it may be read as 'reindex all indexes of table_name that are stored in 
tablespace_name', doesn't it?


However, I have rather limited experience with Postgres, so I don't 
insist.



+-- check that all relations moved to new tablespace
+SELECT relname FROM pg_class
+WHERE reltablespace=(SELECT oid FROM pg_tablespace WHERE
spcname='regress_tblspace')
+AND relname IN ('regress_tblspace_test_tbl_idx');
+relname
+---
+ regress_tblspace_test_tbl_idx
+(1 row)
Just to check one relation you could use \d with the relation (index
or table) name.


Yes, \d outputs the tablespace name if it differs from pg_default, but it 
also shows other information that is not necessary here. Its output is 
also more likely to change later, which could lead to failing tests. This 
query's output is more or less stable, and new relations can easily be 
added to the tests once we add tablespace changes to 
CLUSTER/VACUUM FULL. I can change the test to use \d, but I am not sure 
it would reduce the test output length or be helpful for supporting 
future tests.



-   if (RELATION_IS_OTHER_TEMP(iRel))
-   ereport(ERROR,
-   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-errmsg("cannot reindex temporary tables of other
-   sessions")))
I would keep the order of this operation in order with
CheckTableNotInUse().


Sure, I haven't noticed that reordered these operations, thanks.


--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company





Re: Zedstore - compressed in-core columnar storage

2019-09-19 Thread Ashutosh Sharma
On Thu, Sep 19, 2019 at 11:35 AM Ashutosh Sharma  wrote:
>
> On Thu, Sep 19, 2019 at 8:10 AM Alexandra Wang  wrote:
> >
> > On Tue, Sep 17, 2019 at 4:15 AM Ashutosh Sharma  
> > wrote:
> >>
> >> create table t1(a int, b int) using zedstore;
> >> insert into t1 select i, i+10 from generate_series(1, 100) i;
> >> postgres=# update t1 set b = 200;
> >> server closed the connection unexpectedly
> >> This probably means the server terminated abnormally
> >> before or while processing the request.
> >> The connection to the server was lost. Attempting reset: Failed.
> >>
> >> Above update statement crashed due to some extensive memory leak.
> >
> >
> > Thank you for reporting! We have located the memory leak and also
> > noticed some other memory related bugs. We are working on the fixes
> > please stay tuned!
> >
>
> Cool. As I suspected earlier, it's basically "ZedstoreAMTupleBuffers"
> context that is completely exhausting the memory and it is being used
> to spool the tuples.
>

Some more updates on top of this:

When doing an update operation, for each tuple being modified,
*tuplebuffers_insert()* says that there is no entry for the relation
being modified in the hash table, although one was already added when
the first tuple in the table was updated. Why is that? I mean, if I
have added an entry in the *tuplebuffers* hash table for, let's say,
table t1, then subsequent calls to tuplebuffers_insert() shouldn't say
that there is no entry for table t1 in *tuplebuffers*; that should
only happen once all the tuples in tupbuffer->attbuffers have been
flushed. Because of this, tupbuffer->attbuffers is allocated for each
tuple, resulting in a lot of memory consumption. OTOH, if an insert is
performed on the same table, tuplebuffers_insert() says there is no
entry for table t1 only for the first tuple; from the second tuple
onwards that doesn't happen. I think this is why the memory leak
happens in the update case. Please let me know if I'm missing something
here, as I didn't get a chance to spend much time on this. Thank you.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




Re: [PATCH] src/test/modules/dummy_index -- way to test reloptions from inside of access method

2019-09-19 Thread Nikolay Shaplov
В письме от четверг, 19 сентября 2019 г. 17:32:03 MSK пользователь Michael 
Paquier написал:

> > src/test/modules/dummy_index_am directory check" I see a similar
> > failure with "ERROR:  unrecognized lock mode: 2139062143".  I changed
> 
> > that to a PANIC and got a core file containing this stack:
> A simple make check on the module reproduces the failure, so that's
> hard to miss.
For some reason it does not reproduce in my dev environment, but that is not really 
important, since the core of the problem has been found.
> 
> From what I can see it is not a problem caused directly by this
> module, specifically knowing that this test module is actually copying
> what bloom is doing when declaring its reloptions.  Take this example:
> CREATE EXTENSION bloom;
> CREATE TABLE tbloom AS
> SELECT
>   (random() * 100)::int as i1,
>   (random() * 100)::int as i2,
>   (random() * 100)::int as i3,
>   (random() * 100)::int as i4,
>   (random() * 100)::int as i5,
>   (random() * 100)::int as i6
> FROM
>generate_series(1,100);
> CREATE INDEX bloomidx ON tbloom USING bloom (i1,i2,i3)
>WITH (length=80, col1=2, col2=2, col3=4);
> ALTER INDEX bloomidx SET (length=100);
> 
> And then you get that:
> ERROR:  XX000: unrecognized lock mode: 2139062143
> LOCATION:  LockAcquireExtended, lock.c:756
> 
> So the options are registered in the relOpts array managed by
> reloptions.c but the data is not properly initialized.  Hence when
> looking at the lock needed we have an option match, but the lock
> number is incorrect, causing the failure.  It looks like there is no
> direct way to enforce the lockmode used for a reloption added via
> add_int_reloption which does the allocation to add the option to
> add_reloption(), but enforcing the value to be initialized fixes the
> issue:
> --- a/src/backend/access/common/reloptions.c
> +++ b/src/backend/access/common/reloptions.c
> @@ -658,6 +658,7 @@ allocate_reloption(bits32 kinds, int type, const
> char *name, const char *desc)
> newoption->kinds = kinds;
> newoption->namelen = strlen(name);
> newoption->type = type;
> +   newoption->lockmode = AccessExclusiveLock;
> MemoryContextSwitchTo(oldcxt);

What a good catch! dummy_index already proved to be useful ;-)


> I would think that initializing that to a sane default is something
> that we should do anyway, or is there some trick allowing the
> manipulation of relOpts I am missing? 

Yes, I think AccessExclusiveLock is quite a good default, especially since 
these options are not really used in the real world ;-)

> Changing the relopts APIs in
> back-branches is a no-go of course, but we could improve that for
> 13~.

As you know, I have plans for rewriting the options engine so that there would 
be the same options code both for core access methods and for AM options from 
extensions. So there would be an API for setting the lockmode...
But the way it is going right now, I am not sure it will be reviewed in time to 
reach 13...


PS. Michael, who will submit this lock mode patch? I hope you will. It 
should go in separately from dummy_index, for sure...




Re: pgbench - allow to create partitioned tables

2019-09-19 Thread Dilip Kumar
On Thu, Sep 19, 2019 at 11:47 AM Amit Kapila  wrote:
>
> On Thu, Sep 19, 2019 at 10:25 AM Fabien COELHO  wrote:
> > Hello Amit,
> >
> > >> Yes, on -i it will fail because the syntax will not be recognized.
> > >
> > > Maybe we should be checking the server version, which would allow to
> > > produce more useful error messages when these options are used against
> > > older servers, like
> > >
> > > if (sversion < 1)
> > >fprintf(stderr, "cannot use --partitions/--partitions-method
> > > against servers older than 10");
> > >
> > > We would also have to check that partition-method=hash is not used 
> > > against v10.
> > >
> > > Maybe overkill?
> >
> > Yes, I think so: the error detection and messages would be more or less
> > replicated from the server and would vary from version to version.
> >
>
> Yeah, but I think Amit L's point is worth considering.  I think it
> would be good if a few other people can also share their suggestion on
> this point.  Alvaro, Dilip, anybody else following this thread, would
> like to comment?   It is important to know others opinion on this
> because this will change how pgbench behaves with prior versions.

IMHO, we don't need to invent this error handling in pgbench;
instead we can rely on the server's error.
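
(For what it's worth, the alternative being discussed would only be a few
lines in pgbench, roughly the sketch below using libpq's PQserverVersion();
but I still think the server's own error is good enough.)

  /* sketch only: refuse early if the server cannot do what -i was asked */
  if (partitions > 0 && PQserverVersion(con) < 100000)
  {
      fprintf(stderr, "--partitions requires a server of version 10 or newer\n");
      exit(1);
  }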

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Option to dump foreign data in pg_dump

2019-09-19 Thread vignesh C
On Thu, Sep 19, 2019 at 3:08 PM vignesh C  wrote:
>
> On Mon, Jul 15, 2019 at 6:09 PM Luis Carril  wrote:
> >
> > On 15.07.19 12:06, Daniel Gustafsson wrote:
> >
> Few comments:
>
> As you have specified required_argument in above:
> + {"include-foreign-data", required_argument, NULL, 11},
>
> The below check may not be required:
> + if (strcmp(optarg, "") == 0)
> + {
> + pg_log_error("empty string is not a valid pattern in 
> --include-foreign-data");
> + exit_nicely(1);
> + }
>
> + if (foreign_servers_include_patterns.head != NULL)
> + {
> + expand_foreign_server_name_patterns(fout, _servers_include_patterns,
> + _servers_include_oids);
> + if (foreign_servers_include_oids.head == NULL)
> + fatal("no matching foreign servers were found");
> + }
> +
>
> The above check if (foreign_servers_include_oids.head == NULL) may not
> be required, as there is a check present inside
> expand_foreign_server_name_patterns to handle this error:
> +
> + res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
> + if (PQntuples(res) == 0)
> + fatal("no matching foreign servers were found for pattern \"%s\"", 
> cell->val);
> +
>
> +static void
> +expand_foreign_server_name_patterns(Archive *fout,
> + SimpleStringList *patterns,
> + SimpleOidList *oids)
> +{
> + PQExpBuffer query;
> + PGresult   *res;
> + SimpleStringListCell *cell;
> + int i;
> +
> + if (patterns->head == NULL)
> + return; /* nothing to do */
> +
>
> The above check for patterns->head may not be required as similar
> check exists before this function is called:
> + if (foreign_servers_include_patterns.head != NULL)
> + {
> + expand_foreign_server_name_patterns(fout, _servers_include_patterns,
> + _servers_include_oids);
> + if (foreign_servers_include_oids.head == NULL)
> + fatal("no matching foreign servers were found");
> + }
> +
>
> + /* Skip FOREIGN TABLEs (no data to dump) if not requested explicitly */
> + if (tbinfo->relkind == RELKIND_FOREIGN_TABLE &&
> + (foreign_servers_include_oids.head == NULL ||
> + !simple_oid_list_member(_servers_include_oids,
> tbinfo->foreign_server_oid)))
> simple_oid_list_member can be split into two lines
>
Also, can we include a few tests for this feature?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Option to dump foreign data in pg_dump

2019-09-19 Thread vignesh C
On Mon, Jul 15, 2019 at 6:09 PM Luis Carril  wrote:
>
> On 15.07.19 12:06, Daniel Gustafsson wrote:
>
Few comments:

As you have specified required_argument in above:
+ {"include-foreign-data", required_argument, NULL, 11},

The below check may not be required:
+ if (strcmp(optarg, "") == 0)
+ {
+ pg_log_error("empty string is not a valid pattern in --include-foreign-data");
+ exit_nicely(1);
+ }

+ if (foreign_servers_include_patterns.head != NULL)
+ {
+ expand_foreign_server_name_patterns(fout, _servers_include_patterns,
+ _servers_include_oids);
+ if (foreign_servers_include_oids.head == NULL)
+ fatal("no matching foreign servers were found");
+ }
+

The above check if (foreign_servers_include_oids.head == NULL) may not
be required, as there is a check present inside
expand_foreign_server_name_patterns to handle this error:
+
+ res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ if (PQntuples(res) == 0)
+ fatal("no matching foreign servers were found for pattern \"%s\"", cell->val);
+

+static void
+expand_foreign_server_name_patterns(Archive *fout,
+ SimpleStringList *patterns,
+ SimpleOidList *oids)
+{
+ PQExpBuffer query;
+ PGresult   *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+

The above check for patterns->head may not be required as similar
check exists before this function is called:
+ if (foreign_servers_include_patterns.head != NULL)
+ {
+ expand_foreign_server_name_patterns(fout, _servers_include_patterns,
+ _servers_include_oids);
+ if (foreign_servers_include_oids.head == NULL)
+ fatal("no matching foreign servers were found");
+ }
+

+ /* Skip FOREIGN TABLEs (no data to dump) if not requested explicitly */
+ if (tbinfo->relkind == RELKIND_FOREIGN_TABLE &&
+ (foreign_servers_include_oids.head == NULL ||
+ !simple_oid_list_member(_servers_include_oids,
tbinfo->foreign_server_oid)))
simple_oid_list_member can be split into two lines
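
As a quick usage reference while reviewing, the proposed option takes a
foreign-server name pattern, for example (server and database names are made
up):

  pg_dump --include-foreign-data=my_foreign_server -d mydb -f mydb.sql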

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: [DOC] Document auto vacuum interruption

2019-09-19 Thread Amit Kapila
On Wed, Sep 18, 2019 at 10:25 AM Amit Kapila  wrote:
>
> On Tue, Sep 17, 2019 at 5:48 PM James Coleman  wrote:
> >
> > On Tue, Sep 17, 2019 at 2:21 AM Amit Kapila  wrote:
> > >
> > >
> > > Let me know what you think of attached?  I think we can back-patch
> > > this patch.  What do you think?  Does anyone else have an opinion on
> > > this patch especially if we see any problem in back-patching this?
> >
> > The attached looks great!
> >
> > I was working on HEAD for the patch, but this concern has been an
> > issue for quite a long time. We were running into it on 9.6 in
> > production, for example. And given how frequently it seems like there
> > are large-scale production issues related to auto vacuum, I think any
> > amount of back patching we can do to make that footgun less likely
> > would be a good thing.
> >
>
> Okay, I will commit this tomorrow unless someone has any comments or 
> objections.
>

Pushed with minor changes.  There was one extra space in a few lines
and the tag for back-branches (from 10~9.4) was slightly different.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: one line doc patch for v12

2019-09-19 Thread Amit Kapila
On Thu, Sep 19, 2019 at 2:15 PM Amit Kapila  wrote:
>
> On Thu, Sep 19, 2019 at 1:40 PM Filip Rembiałkowski
>  wrote:
> >
> > There is a small but eye catching glitch in the v12 (and master) docs
> > for "CREATE TABLE AS".
> >
> > https://www.postgresql.org/docs/12/sql-createtableas.html
> >
> > index b5c4ce6959..56d06838f1 100644
> > --- a/doc/src/sgml/ref/create_table_as.sgml
> > +++ b/doc/src/sgml/ref/create_table_as.sgml
> > @@ -146,7 +146,6 @@
> >clause for a table can also include OIDS=FALSE to
> >specify that rows of the new table should contain no OIDs (object
> >identifiers), OIDS=TRUE is not supported anymore.
> > -  OIDs.
> >   
> >  
> > 
> >
>
> Looks good to me, will take care of pushing this change in some time
> unless someone else takes care of it before me.
>

Pushed.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: allow online change primary_conninfo

2019-09-19 Thread Kyotaro Horiguchi
Hello. I looked at this and have some comments.

At Wed, 28 Aug 2019 12:49:46 +0300, Sergei Kornilov  wrote in 
<34084371566985...@sas1-0a6c2e2b59d7.qloud-c.yandex.net>
sk> Hello
sk> 
sk> Updated patch attached. (also I merged into one file)
sk> 
sk> > + 
sk> > + WAL receiver will be restarted after 
primary_slot_name
sk> > + was changed.
sk> >   
sk> > The sentence sounds strange. Here is a suggestion:
sk> > The WAL receiver is restarted after an update of primary_slot_name (or
sk> > primary_conninfo).
sk> 
sk> Changed.

-  This parameter can only be set at server start.
+  This parameter can only be set in the 
postgresql.conf
+  file or on the server command line.

I'm not sure it's good to change the context of this
description. This was mentioning that changing of this parameter
requires server (re)start. So if we want to be on the same
context after rewriting, it would be like "This parameter can be
set any time and causes WAL receiver restart with the new setting
if the server is in standby mode."

And if I'm not missing something, I don't find an (explicit)
parameter of postmaster for setting primary_conninfo.



sk> > The comment at the top of the call of ProcessStartupSigHup() in
sk> > HandleStartupProcInterrupts() needs to be updated as it mentions a
sk> > configuration file re-read, but that's not the case anymore.
sk> 
sk> Changed to "Process any requests or signals received recently." like in 
other places, e.g. syslogger
sk>
sk> > pendingRestartSource's name does not represent what it does, as it is
sk> > only associated with the restart of a WAL receiver when in streaming
sk> > state, and that's a no-op for the archive mode and the local mode.
sk> 
sk> I renamed to pendingWalRcvRestart and replaced switch with simple condition.
sk> 
sk> > So when shutting down the WAL receiver after a parameter change, what
sk> > happens is that the startup process waits for retrieve_retry_interval
sk> > before moving back to the archive mode. Only after scanning again the
sk> > archives do we restart a new WAL receiver.
sk> 
sk> As I answered earlier, here is no switch to archive or 
wal_retrieve_retry_interval waiting in my patch. I recheck on current revision 
too:

@@ -11787,48 +11793,6 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool 
randAccess,
if (!StandbyMode)
return false;
 
-/*
- * If primary_conninfo is set, launch walreceiver to try
- * to stream the missing WAL.
- *
- * If fetching_ckpt is true, RecPtr points to the initial
- * checkpoint location. In that case, we use RedoStartLSN

Speaking of the code, the movement of the code block makes the
surrounding switch almost useless, and the resulting structure
looks somewhat strange.

Couldn't we do the same thing by just skipping the wait and
setting lastSourceFailed to true in the case of intentional
walreceiver restart?

The attached is an incomplete (partial) patch to sketch what is
in my mind. Something like this, plus making
ProcessStartupSigHup kill the walreceiver, would work.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6876537b62..61b235f8f7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11881,10 +11881,12 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	 * obtaining the requested WAL. We're going to loop back
 	 * and retry from the archive, but if it hasn't been long
 	 * since last attempt, sleep wal_retrieve_retry_interval
-	 * milliseconds to avoid busy-waiting.
+	 * milliseconds to avoid busy-waiting. We don't wait if
+	 * explicitly requested to restart.
 	 */
 	now = GetCurrentTimestamp();
-	if (!TimestampDifferenceExceeds(last_fail_time, now,
+	if (!pendingWalRcvRestart &&
+		!TimestampDifferenceExceeds(last_fail_time, now,
 	wal_retrieve_retry_interval))
 	{
 		long		secs,
@@ -11905,6 +11907,17 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	}
 	last_fail_time = now;
 	currentSource = XLOG_FROM_ARCHIVE;
+
+	/*
+	 * If wal receiver is requested to restart, we skip the
+	 * next XLOG_FROM_ARCHIVE to immediately starting it.
+	 */
+	if (pendingWalRcvRestart)
+	{
+		lastSourceFailed = true;
+		continue;
+	}
+
 	break;
 
 default:


Re: one line doc patch for v12

2019-09-19 Thread Amit Kapila
On Thu, Sep 19, 2019 at 1:40 PM Filip Rembiałkowski
 wrote:
>
> There is a small but eye catching glitch in the v12 (and master) docs
> for "CREATE TABLE AS".
>
> https://www.postgresql.org/docs/12/sql-createtableas.html
>
> index b5c4ce6959..56d06838f1 100644
> --- a/doc/src/sgml/ref/create_table_as.sgml
> +++ b/doc/src/sgml/ref/create_table_as.sgml
> @@ -146,7 +146,6 @@
>clause for a table can also include OIDS=FALSE to
>specify that rows of the new table should contain no OIDs (object
>identifiers), OIDS=TRUE is not supported anymore.
> -  OIDs.
>   
>  
> 
>

Looks good to me, will take care of pushing this change in some time
unless someone else takes care of it before me.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: pgbench - allow to create partitioned tables

2019-09-19 Thread Amit Langote
Hi Fabien,

On Thu, Sep 19, 2019 at 1:55 PM Fabien COELHO  wrote:
> Hello Amit,
>
> >> Yes, on -i it will fail because the syntax will not be recognized.
> >
> > Maybe we should be checking the server version, which would allow to
> > produce more useful error messages when these options are used against
> > older servers, like
> >
> > if (sversion < 1)
> >fprintf(stderr, "cannot use --partitions/--partitions-method
> > against servers older than 10");
> >
> > We would also have to check that partition-method=hash is not used against 
> > v10.
> >
> > Maybe overkill?
>
> Yes, I think so: the error detection and messages would be more or less
> replicated from the server and would vary from version to version.
>
> I do not think that it is worth going this path because the use case is
> virtually void as people in 99.9% of cases would use a pgbench matching
> the server version. For those who do not, the error message should be
> clear enough to let them guess what the issue is. Also, it would be
> untestable.

Okay, I can understand the desire to not add code for rarely occurring
situations where the server's error is a good enough clue.

> One thing we could eventually do is just to check pgbench version against
> the server version like psql does and output a generic warning if they
> differ, but franckly I do not think it is worth the effort: ISTM that
> nobody ever complained about such issues.

Agree.

Thanks,
Amit




Re: [PATCH] src/test/modules/dummy_index -- way to test reloptions from inside of access method

2019-09-19 Thread Michael Paquier
On Thu, Sep 19, 2019 at 10:51:09AM +1200, Thomas Munro wrote:
> Let's see if I can see this on my Mac... yep, with "make -C
> src/test/modules/dummy_index_am directory check" I see a similar
> failure with "ERROR:  unrecognized lock mode: 2139062143".  I changed
> that to a PANIC and got a core file containing this stack:

A simple make check on the module reproduces the failure, so that's
hard to miss.

From what I can see it is not a problem caused directly by this
module, specifically knowing that this test module is actually copying
what bloom is doing when declaring its reloptions.  Take this example:
CREATE EXTENSION bloom;
CREATE TABLE tbloom AS
SELECT
  (random() * 100)::int as i1,
  (random() * 100)::int as i2,
  (random() * 100)::int as i3,
  (random() * 100)::int as i4,
  (random() * 100)::int as i5,
  (random() * 100)::int as i6
FROM
   generate_series(1,100);
CREATE INDEX bloomidx ON tbloom USING bloom (i1,i2,i3)
   WITH (length=80, col1=2, col2=2, col3=4);
ALTER INDEX bloomidx SET (length=100);

And then you get that:
ERROR:  XX000: unrecognized lock mode: 2139062143
LOCATION:  LockAcquireExtended, lock.c:756

So the options are registered in the relOpts array managed by
reloptions.c, but the data is not properly initialized.  Hence, when
looking up the lock level needed, we find a matching option, but its
lock mode is garbage, causing the failure.  It looks like there is no
direct way to enforce the lockmode used for a reloption added via
add_int_reloption(), which allocates the option and registers it
through add_reloption(), but initializing the value there fixes the
issue:
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -658,6 +658,7 @@ allocate_reloption(bits32 kinds, int type, const
char *name, const char *desc)
newoption->kinds = kinds;
newoption->namelen = strlen(name);
newoption->type = type;
+   newoption->lockmode = AccessExclusiveLock;
MemoryContextSwitchTo(oldcxt);

I would think that initializing that to a sane default is something
that we should do anyway, or is there some trick allowing the
manipulation of relOpts I am missing?  Changing the relopts APIs in
back-branches is a no-go of course, but we could improve that for
13~.
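
For reference, here is a minimal sketch of the registration path in
question.  The names are hypothetical, but the calls mirror what
contrib/bloom and the proposed dummy_index_am do; note that nothing in
this API lets the module pick a lockmode, which is why
allocate_reloption() has to choose a sane default itself:

#include "postgres.h"
#include "fmgr.h"
#include "access/reloptions.h"

PG_MODULE_MAGIC;

static relopt_kind di_relopt_kind;

void
_PG_init(void)
{
	/* register a custom reloption kind and one int option attached to it */
	di_relopt_kind = add_reloption_kind();
	add_int_reloption(di_relopt_kind, "option_int",
					  "An integer reloption for the dummy index AM",
					  10, -10, 100);
}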

While reading through the code, I found some extra issues...  Here are
some comments about them.

+++ b/src/test/modules/dummy_index_am/Makefile
@@ -0,0 +1,21 @@
+# contrib/bloom/Makefile
Incorrect copy-paste here.

+extern IndexBulkDeleteResult *dibulkdelete(IndexVacuumInfo *info,
+IndexBulkDeleteResult *stats, IndexBulkDeleteCallback callback,
+void *callback_state);
All the routines defining the index AM can just be static, so there is
no point in complicating dummy_index.h with most of its contents.

Some routines that take no arguments are missing a (void) in their
declarations, which can cause compiler warnings.
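
A made-up illustration of that point (these function names are not from
the patch):

/* old style: the parameter list is left unspecified, which can trigger
 * -Wstrict-prototypes warnings */
static void
di_reset_counters()
{
}

/* prototype style: explicitly takes no arguments, which is what is
 * expected here */
static void
di_reset_counters_ok(void)
{
}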

Not sure I see the point of the various GUCs that use WARNING
messages to check the sanity of the parameters.  I find that awkward.
--
Michael




one line doc patch for v12

2019-09-19 Thread Filip Rembiałkowski
There is a small but eye-catching glitch in the v12 (and master) docs
for "CREATE TABLE AS".

https://www.postgresql.org/docs/12/sql-createtableas.html

index b5c4ce6959..56d06838f1 100644
--- a/doc/src/sgml/ref/create_table_as.sgml
+++ b/doc/src/sgml/ref/create_table_as.sgml
@@ -146,7 +146,6 @@
   clause for a table can also include OIDS=FALSE to
   specify that rows of the new table should contain no OIDs (object
   identifiers), OIDS=TRUE is not supported anymore.
-  OIDs.
  
 



Sincerely,
Filip




Re: allow online change primary_conninfo

2019-09-19 Thread Sergei Kornilov
Hello

Thank you for review!

> ISTM that you need to update the above parts in postgresql.conf.sample.

Good catch, I forgot about conf sample.

> ISTM that you need to update the above comment in walreceiver.c.

Changed

> If primary_conninfo is set to an empty string, walreceiver just shuts down,
> not restarts. The above description in the doc is not always true.
> So I'm thinking something like the following description is better.
> Thought?
>
> If primary_conninfo is changed while WAL receiver is running,
> the WAL receiver shuts down and then restarts with new setting,
> except when primary_conninfo is an empty string.

Ok, changed.
I left the primary_slot_name description as before.

> There is the case where walreceiver doesn't shut down immediately
> after the change of primary_conninfo. If the change happens while
> the startup process in paused state (e.g., by pg_wal_replay_pause(),
> recovery_min_apply_delay, recovery conflict, etc), the startup
> process tries to terminate walreceiver after it gets out of such state.
> Shouldn't this fact be documented as a note?

Hmm. Is it documented anywhere that the walreceiver will keep receiving WAL while the
startup process is in a paused state?
(I didn't add such a note in the current version.)

regards, Sergei

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6612f95f9f..1fa48058e8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3929,9 +3929,15 @@ ANY num_sync ( 
@@ -3946,9 +3952,13 @@ ANY num_sync ( ).
-  This parameter can only be set at server start.
+  This parameter can only be set in the postgresql.conf
+  file or on the server command line.
   This setting has no effect if primary_conninfo is not
-  set.
+  set or the server is not in standby mode.
+ 
+ 
+  The WAL receiver is restarted after an update of primary_slot_name.
  
 

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b7ff004234..743cae079e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -803,6 +803,12 @@ static XLogSource readSource = 0;	/* XLOG_FROM_* code */
 static XLogSource currentSource = 0;	/* XLOG_FROM_* code */
 static bool lastSourceFailed = false;
 
+/*
+ * Need to restart the running WalReceiver due to a configuration change.
+ * Suitable only for the XLOG_FROM_STREAM source.
+ */
+static bool pendingWalRcvRestart = false;
+
 typedef struct XLogPageReadPrivate
 {
 	int			emode;
@@ -11787,48 +11793,6 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	if (!StandbyMode)
 		return false;
 
-	/*
-	 * If primary_conninfo is set, launch walreceiver to try
-	 * to stream the missing WAL.
-	 *
-	 * If fetching_ckpt is true, RecPtr points to the initial
-	 * checkpoint location. In that case, we use RedoStartLSN
-	 * as the streaming start position instead of RecPtr, so
-	 * that when we later jump backwards to start redo at
-	 * RedoStartLSN, we will have the logs streamed already.
-	 */
-	if (PrimaryConnInfo && strcmp(PrimaryConnInfo, "") != 0)
-	{
-		XLogRecPtr	ptr;
-		TimeLineID	tli;
-
-		if (fetching_ckpt)
-		{
-			ptr = RedoStartLSN;
-			tli = ControlFile->checkPointCopy.ThisTimeLineID;
-		}
-		else
-		{
-			ptr = RecPtr;
-
-			/*
-			 * Use the record begin position to determine the
-			 * TLI, rather than the position we're reading.
-			 */
-			tli = tliOfPointInHistory(tliRecPtr, expectedTLEs);
-
-			if (curFileTLI > 0 && tli < curFileTLI)
-elog(ERROR, "according to history file, WAL location %X/%X belongs to timeline %u, but previous recovered WAL file came from timeline %u",
-	 (uint32) (tliRecPtr >> 32),
-	 (uint32) tliRecPtr,
-	 tli, curFileTLI);
-		}
-		curFileTLI = tli;
-		RequestXLogStreaming(tli, ptr, PrimaryConnInfo,
-			 PrimarySlotName);
-		receivedUpto = 0;
-	}
-
 	/*
 	 * Move to XLOG_FROM_STREAM state in either case. We'll
 	 * get immediate failure if we didn't launch walreceiver,
@@ -11928,10 +11892,66 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
  lastSourceFailed ? "failure" : "success");
 
 		/*
-		 * We've now handled possible failure. Try to read from the chosen
-		 * source.
+		 * Request walreceiver to start if we switch from another source or if
+		 * we need to change walreceiver connection configuration.
+		 */
+		if (currentSource == XLOG_FROM_STREAM && (lastSourceFailed || pendingWalRcvRestart))
+		{
+			/*
+			 * Ensure walreceiver is not running
+			 */
+			if (WalRcvRunning())
+ShutdownWalRcv();
+
+			/*
+			 * If primary_conninfo is set, launch walreceiver to try to stream
+			 * the missing WAL.
+			 *
+			 * If fetching_ckpt is true, RecPtr points to the initial
+			 * checkpoint location. In that case, we 

Re: ecpglib major version changed

2019-09-19 Thread Peter Eisentraut
On 2019-09-16 15:41, Tom Lane wrote:
> Peter Eisentraut  writes:
>> The ecpglib major version (SO_MAJOR_VERSION) was changed in
>> bd7c95f0c1a38becffceb3ea7234d57167f6d4bf (Add DECLARE STATEMENT support
>> to ECPG.), but I don't see anything in that patch that would warrant
>> that.  I think we should undo that change.
> 
> Ouch.  Yeah, that's a Big Deal from a packaging standpoint.
> Why would it be necessary?

I double-checked this with abidiff and there was nothing changed except
a few added functions.  So I've reverted it now.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pgbench - allow to create partitioned tables

2019-09-19 Thread Amit Kapila
On Thu, Sep 19, 2019 at 10:25 AM Fabien COELHO  wrote:
> Hello Amit,
>
> >> Yes, on -i it will fail because the syntax will not be recognized.
> >
> > Maybe we should be checking the server version, which would allow to
> > produce more useful error messages when these options are used against
> > older servers, like
> >
> > if (sversion < 1)
> >fprintf(stderr, "cannot use --partitions/--partitions-method
> > against servers older than 10");
> >
> > We would also have to check that partition-method=hash is not used against 
> > v10.
> >
> > Maybe overkill?
>
> Yes, I think so: the error detection and messages would be more or less
> replicated from the server and would vary from version to version.
>

Yeah, but I think Amit L's point is worth considering.  I think it
would be good if a few other people could also share their suggestions on
this point.  Alvaro, Dilip, anybody else following this thread, would you
like to comment?  It is important to know others' opinions on this
because it will change how pgbench behaves with prior server versions.

> I do not think that it is worth going this path because the use case is
> virtually void as people in 99.9% of cases would use a pgbench matching
> the server version.

Fair enough, but there is no restriction on using it with prior
versions.  In fact, some people might want to use this with v11, where
partitioning was already present.  So we shouldn't ignore this point.


> One thing we could eventually do is just to check pgbench version against
> the server version like psql does and output a generic warning if they
> differ, but frankly I do not think it is worth the effort:
>

Yeah and even if we want to do something like that, it should not be
part of this patch.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: pgbench - allow to create partitioned tables

2019-09-19 Thread Amit Kapila
On Wed, Sep 18, 2019 at 10:33 PM Fabien COELHO  wrote:
>
>
> Hello Amit,
>
> >>   - use search_path to find at most one pgbench_accounts
> >> It still uses left join because I still think that it is appropriate.
> >> I added a lateral to avoid repeating the array_position call
> >> to manage the search_path, and use explicit pg_catalog everywhere.
> >
> > It would be good if you can add some more comments to explain the
> > intent of query.
>
> Indeed, I put too few comments on the query.
>
> > + if (ps == NULL)
> > + partition_method = PART_NONE;
> >
> > When can we expect ps as NULL?  If this is not a valid case, then
> > probably and Assert would be better.
>
> No, ps is really NULL if there is no partitioning, because of the LEFT
> JOIN and pg_partitioned_table is just empty in that case.
>

'ps' itself won't be NULL in that case; it is the SQL value it points to
that is NULL.  I have debugged this case as well.  'ps' itself can be NULL
only when you pass a wrong column number or something like that to
PQgetvalue.
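
To spell out the libpq behavior in question: PQgetvalue() never returns a
NULL pointer for a SQL NULL; it returns an empty string.  The NULL-ness of
the partition strategy therefore has to be tested with PQgetisnull().
A rough sketch, with the enum restated here only to keep the fragment
self-contained (the posted patch uses its own names and query layout):

#include "libpq-fe.h"

typedef enum { PART_NONE, PART_RANGE, PART_HASH } partition_method_t;
static partition_method_t partition_method = PART_NONE;

static void
read_partition_strategy(PGresult *res)
{
	if (PQgetisnull(res, 0, 0))
		partition_method = PART_NONE;	/* pgbench_accounts is not partitioned */
	else if (*PQgetvalue(res, 0, 0) == 'h')
		partition_method = PART_HASH;
	else if (*PQgetvalue(res, 0, 0) == 'r')
		partition_method = PART_RANGE;
}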

> The last else where there is an unexpected entry is different, see
> comments about v11 below.
>
> > + else if (PQntuples(res) == 0)
> > + {
> > + /* no pgbench_accounts found, builtin script should fail later */
> > + partition_method = PART_NONE;
> > + partitions = -1;
> >
> > If we don't find pgbench_accounts, let's give error here itself rather
> > than later unless you have a valid case in mind.
>
> I thought of it, but decided not to: Someone could add a builtin script
> which does not use pgbench_accounts, or a parallel running script could
> create a table dynamically, whatever, so I prefer the error to be raised
> by the script itself, rather than deciding that it will fail before even
> trying.
>

I think this is not a possibility today, and I can't speak for the
future.  I don't think it is a good idea to add code that we can't
reach today.  You can probably add an Assert if required.

> > + /*
> > + * Partition information. Assume no partitioning on any failure, so as
> > + * to avoid failing on an older version.
> > + */
> > ..
> > + if (PQresultStatus(res) != PGRES_TUPLES_OK)
> > + {
> > + /* probably an older version, coldly assume no partitioning */
> > + partition_method = PART_NONE;
> > + partitions = 0;
> > + }
> >
> > So, here we are silently absorbing the error when pgbench is executed
> > against older server version which doesn't support partitioning.
>
> Yes, exactly.
>
> > If that is the case, then I think if user gives --partitions for the old
> > server version, it will also give an error?
>
> Yes, on -i it will fail because the syntax will not be recognized.
>
> > It is not clear in documentation whether we support or not using pgbench
> > with older server versions.
>
> Indeed. We more or less do in practice. Command "psql" works back to 8
> AFAICR, and pgbench as well.
>
> > I guess it didn't matter, but with this feature, it can matter.  Do we
> > need to document this?
>
> This has been discussed in the past, and the conclusion was that it was
> not worth the effort. We just try not to break things if it is avoidable.
> In this regard, the patch slightly changes the FILLFACTOR output, which is
> omitted if the value is 100 (%), the default, which means that
> table creation would work on very old versions that did not support
> fillfactor, unless you specify a lower percentage.
>

Hmm, why do you need to change the fillfactor behavior?  If it is not
specifically required for the functionality of this patch, then I
suggest keeping that behavior as it is.
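
For reference, the behaviour Fabien describes amounts to something like
the following hypothetical helper (the name and signature are illustrative,
not the patch's):

#include <stdio.h>
#include <stddef.h>

/* Emit a storage clause only when a non-default fillfactor was requested,
 * so that table creation still works on very old servers that predate
 * fillfactor. */
static void
append_fillfactor_clause(char *buf, size_t buflen, int fillfactor)
{
	if (fillfactor != 100)
		snprintf(buf, buflen, " with (fillfactor=%d)", fillfactor);
	else
		buf[0] = '\0';
}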

> Attached v11:
>
>   - add quite a few comments on the pg_catalog query
>
>   - reverts the partitions >= 1 test; If some new partition method is
> added that pgbench does not know about, the failure mode will be that
> nothing is printed rather than printing something strange like
> "method none with 2 partitions".
>

But how will that new partition method be associated with a table
created via pgbench?  I think the previous check was good because it
makes the partition checking consistent throughout the patch.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com