Re: Ryu floating point output patch

2018-12-13 Thread Andrew Gierth
> "Andrew" == Andrew Gierth  writes:

 Andrew> And again to fix the Windows build

And again to see if it actually compiles now...

-- 
Andrew (irc:RhodiumToad)
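
For readers skimming the patch below, the user-visible change is that any
extra_float_digits setting above zero (now the default) switches float4 and
float8 output to Ryu's shortest round-trip representation rather than a
fixed number of significant digits.  A hypothetical psql session, assuming
IEEE-754 doubles and the patch as posted:

  SET extra_float_digits = 0;        -- legacy: DBL_DIG (15) digits
  SELECT 0.1::float8 + 0.2::float8;  -- prints 0.3 (not round-trip safe)

  SET extra_float_digits = 1;        -- new default: shortest round-trip
  SELECT 0.1::float8 + 0.2::float8;  -- prints 0.30000000000000004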

diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index cf9327f885..24d41c2e89 100644
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -21,6 +21,7 @@
 
 #include "catalog/pg_type.h"
 #include "common/int.h"
+#include "common/ryu.h"
 #include "libpq/pqformat.h"
 #include "utils/array.h"
 #include "utils/float.h"
@@ -29,7 +30,7 @@
 
 
 /* Configurable GUC parameter */
-int			extra_float_digits = 0; /* Added to DBL_DIG or FLT_DIG */
+int			extra_float_digits = 1; /* Added to DBL_DIG or FLT_DIG */
 
 /* Cached constants for degree-based trig functions */
 static bool degree_consts_set = false;
@@ -246,6 +247,12 @@ float4out(PG_FUNCTION_ARGS)
 	char	   *ascii = (char *) palloc(32);
 	int			ndig = FLT_DIG + extra_float_digits;
 
+	if (extra_float_digits > 0)
+	{
+		ryu_f2s_buffered(num, ascii);
+		PG_RETURN_CSTRING(ascii);
+	}
+
 	(void) pg_strfromd(ascii, 32, ndig, num);
 	PG_RETURN_CSTRING(ascii);
 }
@@ -462,6 +469,12 @@ float8out_internal(double num)
 	char	   *ascii = (char *) palloc(32);
 	int			ndig = DBL_DIG + extra_float_digits;
 
+	if (extra_float_digits > 0)
+	{
+		ryu_d2s_buffered(num, ascii);
+		return ascii;
+	}
+
 	(void) pg_strfromd(ascii, 32, ndig, num);
 	return ascii;
 }
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 6fe1939881..6e223335bc 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2631,11 +2631,12 @@ static struct config_int ConfigureNamesInt[] =
 		{"extra_float_digits", PGC_USERSET, CLIENT_CONN_LOCALE,
 			gettext_noop("Sets the number of digits displayed for floating-point values."),
 			gettext_noop("This affects real, double precision, and geometric data types. "
-		 "The parameter value is added to the standard number of digits "
-		 "(FLT_DIG or DBL_DIG as appropriate).")
+		 "A zero or negative parameter value is added to the standard "
+		 "number of digits (FLT_DIG or DBL_DIG as appropriate). "
+		 "Any value greater than zero selects round-trip-safe output.")
 		},
 		&extra_float_digits,
-		0, -15, 3,
+		1, -15, 3,
 		NULL, NULL, NULL
 	},
 
diff --git a/src/common/Makefile b/src/common/Makefile
index ec8139f014..02622f8c48 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -44,8 +44,10 @@ override CPPFLAGS += -DVAL_LIBS="\"$(LIBS)\""
 override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
 LIBS += $(PTHREAD_LIBS)
 
-OBJS_COMMON = base64.o config_info.o controldata_utils.o exec.o file_perm.o \
-	ip.o keywords.o link-canary.o md5.o pg_lzcompress.o \
+# If you add objects here, see also src/tools/msvc/Mkvcbuild.pm
+
+OBJS_COMMON = base64.o config_info.o controldata_utils.o d2s.o exec.o f2s.o \
+	file_perm.o ip.o keywords.o link-canary.o md5.o pg_lzcompress.o \
 	pgfnames.o psprintf.o relpath.o \
 	rmtree.o saslprep.o scram-common.o string.o unicode_norm.o \
 	username.o wait_error.o
diff --git a/src/common/d2s.c b/src/common/d2s.c
new file mode 100644
index 0000000000..131b762b1e
--- /dev/null
+++ b/src/common/d2s.c
@@ -0,0 +1,961 @@
+/*-------------------------------------------------------------------------
+ *
+ * Ryu floating-point output for double precision.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/common/d2s.c
+ *
+ * This is a modification of code taken from github.com/ulfjack/ryu under the
+ * terms of the Boost license (not the Apache license). The original copyright
+ * notice follows:
+ *
+ * Copyright 2018 Ulf Adams
+ *
+ * The contents of this file may be used under the terms of the Apache
+ * License, Version 2.0.
+ *
+ * (See accompanying file LICENSE-Apache or copy at
+ *  http://www.apache.org/licenses/LICENSE-2.0)
+ *
+ * Alternatively, the contents of this file may be used under the terms of the
+ * Boost Software License, Version 1.0.
+ *
+ * (See accompanying file LICENSE-Boost or copy at
+ *  https://www.boost.org/LICENSE_1_0.txt)
+ *
+ * Unless required by applicable law or agreed to in writing, this software is
+ * distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.
+ *
+ *-------------------------------------------------------------------------
+ */
+
+/*
+ *  Runtime compiler options:
+ *
+ *  -DRYU_ONLY_64_BIT_OPS Avoid using uint128 or 64-bit intrinsics. Slower,
+ *  depending on your compiler.
+ */
+
+#ifndef FRONTEND
+#include "postgres.h"
+#else
+#include "postgres_fe.h"
+#endif
+
+#include "common/ryu.h"
+
+/*
+ * For consistency, we use 128-bit types if and only if the rest of PG also
+ * does, even though we could use them here without worrying about the
+ * alignment concerns that apply elsewhere.
+ */
+#if !defined(HAVE_INT128) && defined(_MSC_VER) \
+	&& !defined(RYU_ONLY_64_BIT_OPS) && defined(_M_X64

Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2018-12-13 Thread Peter Eisentraut
On 14/12/2018 01:23, Michael Paquier wrote:
> On Thu, Dec 13, 2018 at 07:14:57PM -0500, Stephen Frost wrote:
>> Based on at least a quick look around, the actual grammar rule seems
>> to match my recollection[1], adverbs should typically go AFTER the
>> verb + object, and the adverb shouldn't ever be placed between the verb
>> and the object.
> 
> This part has been a long debate already in 2012-2013 when I sent the
> first iterations of the patch, and my memories on the matter are that
> the grammar you are showing here matches with the past agreement.

Do you happen to have a link for that?  I didn't find anything.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2018-12-13 Thread Peter Eisentraut
On 14/12/2018 01:14, Stephen Frost wrote:
>>> reindex table CONCURRENTLY test;
>>
>> By the way, does this syntax make sense?  I haven't seen a discussion on
>> this anywhere in the various threads.  I keep thinking that
>>
>> reindex concurrently table test;
>>
>> would make more sense.  How about in combination with (verbose)?
> 
> I don't think it's a mistake that we have 'create index concurrently'
> and it certainly would seem odd to me for 'create index' and 'reindex
> table' to be different.
> 
> Certainly, from my recollection of English, you'd say "I am going to
> reindex the table concurrently", you wouldn't say "I am going to
> reindex concurrently the table."
> 
> Based on at least a quick look around, the actual grammar rule seems
> to match my recollection[1], adverbs should typically go AFTER the
> verb + object, and the adverb shouldn't ever be placed between the verb
> and the object.

So it would be grammatical to say

reindex table test concurrently

or in a pinch

reindex concurrently table test

but I don't see anything grammatical about

reindex table concurrently test

(given that the object is "table test").

Where this gets really messy is stuff like this:

reindex (verbose) database concurrently postgres

Why would "concurrently" not be part of the options next to "verbose"?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: allow online change primary_conninfo

2018-12-13 Thread Michael Paquier
On Thu, Dec 13, 2018 at 01:51:48PM +0300, Sergei Kornilov wrote:
> Depends on this discussion:
> https://www.postgresql.org/message-id/20181212053042.gk17...@paquier.xyz
> If the walreceiver will use the GUC conninfo at startup, then yes, we can
> remove such static variables.  If the walreceiver will still use the
> conninfo from WalRcv, we have a race condition between
> RequestXLogStreaming from the startup process, config reload, and
> walreceiver start.  In that case I need to save the conninfo from WalRcv
> and compare the new GUC value with it.

I would recommend waiting for the conclusion of the other thread before
moving on with this one.  There are arguments both for letting the startup
process decide which connection string the WAL receiver should use and
for letting the WAL receiver use the connection string from the GUC
directly.

> Hmm. Do we have infrastructure to hide such values?
> Should I implement something like a GUC_HIDE_VALUE_FROM_LOG flag near
> GUC_SUPERUSER_ONLY in a separate thread? The log message would be changed
> to "LOG: parameter "primary_conninfo" changed" with this flag.

Good point.  Passwords in logs can be considered a security issue.
Having such an option may be interesting for other things, though I am
not sure if just having an option to hide logs related to a given
parameter is the correct way to shape it.  We could also consider using
the show hook function of a parameter to print its value in such logs.
--
Michael


signature.asc
Description: PGP signature


Re: tab-completion debug print

2018-12-13 Thread Peter Eisentraut
On 14/12/2018 06:41, David Fetter wrote:
> There are problems tab completion, e.g. its catalog queries, could
> cause in production that would be difficult to simulate elsewhere.

Where are all these reports of tab completion query problems that we are
building elaborate infrastructure to diagnose?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PROPOSAL]a new data type 'bytea' for ECPG

2018-12-13 Thread Michael Meskes
Matsumura-san,

> > I meant that existing applications that receive data of a bytea column
> > using sqlda will encounter an unknown type (=ECPG.bytea) in
> > sqlvar_struct.sqltype.
> > 
> > You mean if they are not recompiled? If so, yes, how else could that
> > be handled?
> 
> Even if they are recompiled, they will fail.
> 
>   switch (sqlvar_struct.sqltype)
>   {
> case ECPG.int:  break;
> case ECPG.char: break;
>   /* There is no case for ECPG.bytea */
> default:  abort();

Sorry, I should have been more precise. I meant if they are not
recompiled against the new ecpglib which has a case for ECPG.bytea.

> There is an idea as follows, but it seems ugly.
> 
>   Implement a parameter for ecpglib.
>   The parameter means whether the application wants to receive
>   bytea data in binary format or not.  Default is "not".
>   # I don't know of any existing ecpglib parameter like it.
> 
> In other words, if the application uses a "bytea" type host variable,
> ecpglib can know its intent, but in the case of sqlda ecpglib
> cannot know it.

I'm at a loss here. I don't understand what you are trying to say. 

An incompatible change is ok if we change the version number of the
library accordingly. 

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL




Re: Introducing SNI in TLS handshake for SSL connections

2018-12-13 Thread Pablo Iranzo Gómez

Hi,

+++ Andreas Karlsson [13/12/18 01:30 +0100]:

On 12/11/18 3:52 PM, Pablo Iranzo Gómez wrote:
I came to this old thread while trying to figure out how to set up
postgres replication behind OpenShift/Kubernetes behind a route
(which only forwards 80 or 443 traffic), but it could work if SNI is
supported on the client using it.


Hm ... while hacking at a patch for this I gave your specific problem 
some more thought.


I am not familiar with OpenShift or Kubernetes, but I want you to be
aware that whatever proxy you are going to use will still need to
be aware of, at least a subset of, the PostgreSQL protocol, since 
similar to SMTP's STARTTLS command the PostgreSQL client will start 
out using the plain text PostgreSQL protocol and then request the 
server to switch over to SSL[1]. So it would be necessary to add 
support for this to whatever proxy you intend to use.


Do you know if adding such custom protocol support is easy to do to 
the proxies you refer to? And do you have any links to documentation 
for these solutions?


I saw that they did incorporate some changes like SPDY support and other
HTTP-related things.

Let me try to find an answer (not sure how long it will take) and come
back.

I've done some basic searching at
https://git.haproxy.org/?p=haproxy.git;a=summary but found nothing evident
(to me).

I'll keep you updated.
Pablo




Notes

1. https://www.postgresql.org/docs/11/protocol-flow.html#id-1.10.5.7.11

Andreas


--

Pablo Iranzo Gómez (pablo.ira...@redhat.com)  GnuPG: 0x5BD8E1E4
Senior Software Engineer - Solutions Engineering   iranzo @ IRC
RHC{A,SS,DS,VA,E,SA,SP,AOSP}, JBCAA#110-215-852RHCA Level V

Blog: https://iranzo.github.io https://citellus.org


signature.asc
Description: PGP signature


Re: Hitting CheckRelationLockedByMe() ASSERT with force_generic_plan

2018-12-13 Thread Rushabh Lathia
While looking at the code further around this, I realized that we need a
similar fix for bitmap scans as well as index-only scans.

Here is the patch, which applies the same fix for bitmap and index-only
scans.

Thanks,


On Fri, Nov 23, 2018 at 6:47 PM Rushabh Lathia 
wrote:

>
>
> On Fri, Nov 23, 2018 at 3:33 AM David Rowley 
> wrote:
>
>> On Thu, 22 Nov 2018 at 22:33, Rushabh Lathia 
>> wrote:
>> > CREATE TABLE foo (x int primary key);
>> > INSERT INTO foo VALUES (1), (2), (3), (4), (5);
>> >
>> > CREATE OR REPLACE FUNCTION f1(a int) RETURNS int
>> > AS $$
>> > BEGIN
>> >  DELETE FROM foo where x = a;
>> >  return 0;
>> > END;
>> > $$ LANGUAGE plpgsql;
>> >
>> > postgres@100858=#set plan_cache_mode = force_generic_plan;
>> > SET
>> > postgres@100858=#select f1(4);
>> >  f1
>> > ----
>> >   0
>> > (1 row)
>> >
>> > postgres@100858=#select f1(4);
>> > server closed the connection unexpectedly
>>
>>
>> > Now, to fix this issue either we need to hold proper lock before
>> reaching
>> > to ExecInitIndexScan() or teach ExecInitIndexScan() to take
>> AccessShareLock
>> > on the scan coming from CMD_DELETE.
>>
>> I'd say the following comment and code in nodeIndexscan.c is outdated:
>>
>> /*
>> * Open the index relation.
>> *
>> * If the parent table is one of the target relations of the query, then
>> * InitPlan already opened and write-locked the index, so we can avoid
>> * taking another lock here.  Otherwise we need a normal reader's lock.
>> */
>> relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
>> indexstate->iss_RelationDesc = index_open(node->indexid,
>>   relistarget ? NoLock : AccessShareLock);
>>
>> Despite what the above comment claims, these indexes have not been
>> opened in InitPlan since 389af951552ff2. As you mentioned, they're
>> opened in nodeModifyTable.c for non-delete statements.
>>
>>
> +1.
>
> I tried the same approach and with further testing I haven't found
> any issues.
>
>
>> I think we either need to update the above code to align it to what's
>> now in nodeModifyTable.c, or just obtain an AccessShareLock
>> regardless. I kinda think we should just take the lock regardless as
>> determining if the relation is a result relation may be much more
>> expensive than just taking another lower-level lock on the index.
>>
>> Anyway. I've attached a small patch to update the above fragment.
>>
>> --
>>  David Rowley   http://www.2ndQuadrant.com/
>>  PostgreSQL Development, 24x7 Support, Training & Services
>>
>
>
> --
> Rushabh Lathia
>


-- 
Rushabh Lathia
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index d04f490..c615316 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -210,7 +210,7 @@ BitmapIndexScanState *
 ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
 {
 	BitmapIndexScanState *indexstate;
-	bool		relistarget;
+	bool		indexlocked;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -266,9 +266,17 @@ ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
 	 * InitPlan already opened and write-locked the index, so we can avoid
 	 * taking another lock here.  Otherwise we need a normal reader's lock.
 	 */
-	relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
+	/*
+	 * Open the index relation.
+	 *
+	 * For non-DELETE statements, when the parent table is one of the target
+	 * relations of the query, ExecInitModifyTable will already have obtained
+	 * a lock on the index, otherwise we must obtain a read lock here.
+	 */
+	indexlocked = estate->es_plannedstmt->commandType != CMD_DELETE &&
+		ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
 	indexstate->biss_RelationDesc = index_open(node->indexid,
-			   relistarget ? NoLock : AccessShareLock);
+			   indexlocked ? NoLock : AccessShareLock);
 
 	/*
 	 * Initialize index-specific scan state
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index d1201a1..d9caf00 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -493,8 +493,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 {
 	IndexOnlyScanState *indexstate;
 	Relation	currentRelation;
-	bool		relistarget;
 	TupleDesc	tupDesc;
+	bool		indexlocked;
 
 	/*
 	 * create state structure
@@ -558,13 +558,14 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
 	/*
 	 * Open the index relation.
 	 *
-	 * If the parent table is one of the target relations of the query, then
-	 * InitPlan already opened and write-locked the index, so we can avoid
-	 * taking another lock here.  Otherwise we need a normal reader's lock.
+	 * For non-DELETE statements, when the parent table is one of the target
+	 * relations of the query, ExecInitModifyTable will already have obtained

Re: tab-completion debug print

2018-12-13 Thread David Fetter
On Fri, Dec 14, 2018 at 12:23:17AM -0500, Tom Lane wrote:
> David Fetter  writes:
> > On Thu, Dec 13, 2018 at 07:43:52PM +0100, Peter Eisentraut wrote:
> >> I'm not sure that this should be a run-time option.
> 
> > Given the often onerous givens of installing new software on
> > production machines, I'm thinking it actually should be.
> 
> What's that have to do with it?  This is a development option,
> not a production option.

There are problems that tab completion, e.g. its catalog queries, could
cause in production that would be difficult to simulate elsewhere.

> I don't think it should be in production builds at all, let alone be
> invokable via a simple command-line switch.  The notion that
> ordinary users will want to use it is hogwash --- we'd be much more
> likely to get complaints about psql spewing unexpected output.

With utmost respect, by the time someone is turning on logging for
psql, they're trying to diagnose a problem. If they're lucky, it's a
problem that's happening during development of psql itself and gets
caught during that process. I don't see how "if they're lucky" is a
helpful approach to observability.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Row Visibility and Table Access Methods

2018-12-13 Thread Pavel Stehule
On Thu, Dec 13, 2018 at 10:23 AM Simon Riggs 
wrote:

> Currently, tables provide MVCC access semantics as the only option.
>
> A more complete list of desirable semantics in applications are
>
> * MVCC - provide latest consistent view
> * Historical - keep all old row versions so that queries can access data
> as it used to be
> * TTL=Duration - keep committed rows around for at most Duration seconds
> * Latest - show only the most current committed row version, at the cost
> of inconsistency
> There might be others
>
> Since I see these options as semantics rather than physical, I think we
> should separate these operations away from Table Access Methods. This
> allows those semantics to be implemented in different ways for different
> storage types.
>
> "Historical" access has been discussed many times, so no need to revisit
> here. Clearly, it is a very popular idea, just not easily feasible with
> the current heap access layer. We might want an option for row_visibility
> retention. For tables with this option set, we would in later releases
> allow historical queries according to the SQL Standard.
>
> "TTL" or "Time To Live" -  time-limited access to data is available in
> many other databases. It is simple to implement and we could easily have
> this in PG12. Implementation is 1) adding the option, 2) adding a
> time-component into the visibility check for scan and vacuum. This option
> implies an option exists to specify row_visibility retention.
>
> "Latest" is similar to the way EvalPlanQual works, allowing UPDATE to see
> the latest version of a row before update, and similar to the way catalog
> scans work in that any access to a catalog entry sees the latest row based
> upon an immediate snapshot, not the one taken at the start of a query. It
> makes sense to allow this as an explicit table-level option, so any data
> viewed can see the latest version, just as UPDATEs already do. This also
> means that juggling bloat and old row versions becomes much less of an
> issue for very high volume update applications such as booking systems or
> stock tickers. (Clearly, better table access methods would improve on this
> further and even they would benefit from this simplification of the main
> issue around MVCC).
> Logic for this type of visibility already exists in PG
> via HeapTupleSatisfiesSelf(), so we are just exposing what is already there
> to the user; no need to discuss semantics at length.
> Again, patches to implement this option are simple and viable for PG12.
>
> User interface are 2 new table-level options
> * row_visibility = MVCC (default), TTL, Latest, Historical
> * row_visibility_retention_interval = 'system' (default)
> For MVCC, the only valid setting would be system, i.e. current MVCC
> behavior (but this might be altered by specific storage plugin parameters)
> For Latest, the only valid setting would be system
> For TTL, the interval to retain data for, setting of 0 is not valid
> For Historical, the interval to retain old row versions for, 0 means
> forever
>
> Implementation summary
> 1. Add new table-level option for row_visibility and
> row_visibility_retention_interval
> 2. Add option to heap_beginscan
> 3. Add option handling in heap prune
> 4. Add option to tqual
>
> Thoughts?
>

looks great

Pavel


> --
> Simon Riggs    http://www.2ndQuadrant.com/
> 
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
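
As a concrete sketch of the proposed user interface (hypothetical syntax;
neither reloption exists today, and the names are taken from the proposal
above):

  CREATE TABLE ticks (sym text, price numeric, ts timestamptz)
  WITH (row_visibility = 'TTL',
        row_visibility_retention_interval = '1 hour');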


Re: tab-completion debug print

2018-12-13 Thread Tom Lane
David Fetter  writes:
> On Thu, Dec 13, 2018 at 07:43:52PM +0100, Peter Eisentraut wrote:
>> I'm not sure that this should be a run-time option.

> Given the often onerous givens of installing new software on
> production machines, I'm thinking it actually should be.

What's that have to do with it?  This is a development option,
not a production option.  I don't think it should be in production
builds at all, let alone be invokable via a simple command-line switch.
The notion that ordinary users will want to use it is hogwash ---
we'd be much more likely to get complaints about psql spewing unexpected
output.

regards, tom lane



Re: Add pg_partition_root to get top-most parent of a partition tree

2018-12-13 Thread Amit Langote
Hi,

On 2018/12/12 10:48, Michael Paquier wrote:
> On Fri, Dec 07, 2018 at 11:46:05AM +0900, Michael Paquier wrote:
>> On Thu, Dec 06, 2018 at 10:48:59PM -0300, Alvaro Herrera wrote:
>>> I think adding a pg_partition_root() call to as many pg_partition_tree
>>> tests as you modified is overkill ... OTOH I'd have one test that
>>> invokes pg_partition_tree(pg_partition_root(some-partition)) to verify
>>> that starting from any point in the tree you get the whole tree.
>>
>> Good idea, thanks for the input.
> 
> The recent commit cc53123 has fixed a couple of issues with
> pg_partition_tree, so attached is a rebased patch which similarly makes
> pg_partition_root return NULL for unsupported relkinds and undefined
> relations.  I have also simplified the tests based on Alvaro's
> suggestion to use pg_partition_tree(pg_partition_root(partfoo)).

Thanks for working on this.  I have looked at this patch and here are some
comments.

+Return the top-most parent of a partition tree for the given
+partitioned table or partitioned index.

Given that pg_partition_root will return a valid result for any relation
that can be part of a partition tree, it seems strange that the above
sentence says "for the given partitioned table or partitioned index".  It
should perhaps say:

Return the top-most parent of the partition tree to which the given
relation belongs
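
For example, a sketch of the intended behavior (hypothetical table names;
pg_partition_root is the function this patch adds):

  CREATE TABLE ptab (a int) PARTITION BY LIST (a);
  CREATE TABLE ptab_1 PARTITION OF ptab
    FOR VALUES IN (1) PARTITION BY LIST (a);
  CREATE TABLE ptab_1_1 PARTITION OF ptab_1 FOR VALUES IN (1);

  SELECT pg_partition_root('ptab_1_1');  -- ptab, from any tree member
  -- composing with pg_partition_tree() gives back the whole tree:
  SELECT relid FROM pg_partition_tree(pg_partition_root('ptab_1_1'));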

+/*
+ * Perform several checks on a relation on which is extracted some
+ * information related to its partition tree.  Returns false if the
+ * relation cannot be processed, in which case it is up to the caller
+ * to decide what to do, by either raising an error or doing something
+ * else.
+ */
+static bool
+check_rel_for_partition_info(Oid relid)
+{
+charrelkind;
+
+/* Check if relation exists */
+if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(relid)))
+return false;

This should be checked in the caller imho.

+
+relkind = get_rel_relkind(relid);
+
+/* Only allow relation types that can appear in partition trees. */
+if (relkind != RELKIND_RELATION &&
+relkind != RELKIND_FOREIGN_TABLE &&
+relkind != RELKIND_INDEX &&
+relkind != RELKIND_PARTITIONED_TABLE &&
+relkind != RELKIND_PARTITIONED_INDEX)
+return false;
+
+return true;
+}

I can't imagine this function growing more code to perform additional
checks besides just checking the relkind, so the name of this function may
be a bit too ambitious.  How about calling it
check_rel_can_be_partition()?  The comment above the function could be a
much simpler sentence too.  I know I may be just bikeshedding here though.

+/*
+ * If the relation is not a partition, return itself as a result.
+ */
+if (!get_rel_relispartition(relid))
+PG_RETURN_OID(relid);

Maybe the comment here could say "The relation itself may be the root parent".

+/*
+ * If the relation is actually a partition, 'rootrelid' has been set to
+ * the OID of the root table in the partition tree.  It should be a valid
+ * valid per the previous check for partition leaf above.
+ */
+Assert(OidIsValid(rootrelid));

"valid" is duplicated in the last sentence in the comment.  Anyway, what's
being Asserted can be described simply as:

/*
 * 'rootrelid' must contain a valid OID, given that the input relation is
 * a valid partition tree member as checked above.
 */

Thanks,
Amit




Re: Valgrind failures in Apply Launcher's bgworker_quickdie() exit

2018-12-13 Thread Tom Lane
Andres Freund  writes:
> On December 13, 2018 6:01:04 PM PST, Tom Lane  wrote:
>> Has anyone tried to reproduce this on other platforms?

> I recently also hit this locally, but since that's also Debian unstable... 
> Note that removing openssl "fixed" the issue for me.

FWIW, I tried to reproduce this on Fedora 28 and RHEL6, without success.
It's possible that there's some significant detail of your configuration
that I didn't match, but on the whole "bug in Debian unstable" seems
like the most probable theory right now.

regards, tom lane



Re: tab-completion debug print

2018-12-13 Thread David Fetter
On Thu, Dec 13, 2018 at 07:43:52PM +0100, Peter Eisentraut wrote:
> On 13/12/2018 12:07, Kyotaro HORIGUCHI wrote:
> > - v6-0002-Add-psql-g-option-to-control-debug-print.patch
> > 
> >   Applies on top of 0001. Code is always active, -g addition to
> >   -L activates debug print into the log file. If only -g is
> >   specified it is forcibly turned off.
> > 
> >   > $ psql postgres -g
> >   > psql: no session log file, turn off debug print
> >   > psql (12devel)
> >   > Type "help" for help.
> > 
> >   The -g option shows up in the help message. (Perhaps arguable.)
> 
> I'm not sure that this should be a run-time option.

Given the often onerous givens of installing new software on
production machines, I'm thinking it actually should be.

> But we surely don't want to use up a good short option letter for
> this.

Would it make sense to roll this into the -L option?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: BUG #15212: Default values in partition tables don't work as expected and allow NOT NULL violation

2018-12-13 Thread Amit Langote
On 2018/11/09 14:04, Amit Langote wrote:
> On 2018/11/09 4:39, Alvaro Herrera wrote:
>> I included the test case for collations to the three branches, but no
>> code changes.  We can patch master for the handling of collations per
>> your patch,
> 
> Okay, but should we back-patch it by adding WARNING to back-branches as
> you suggest?

I was looking at the pending patches that I'd sent and noticed this one,
which throws an error when a partition specifies a collation for a column
that doesn't match the parent's.  Do we want to apply the attached rebased
patch to HEAD and leave the back-branches (10 and 11) alone?

Thanks,
Amit
From b802e96a78f1d52571c1a83dcfd9053ad32f81b8 Mon Sep 17 00:00:00 2001
From: amit 
Date: Wed, 7 Nov 2018 10:32:14 +0900
Subject: [PATCH] Disallow creating partitions with mismatching collations for
 columns

---
 src/backend/commands/tablecmds.c           | 21 +++++++++++++++++++++
 src/test/regress/expected/create_table.out |  4 ++++
 2 files changed, 25 insertions(+)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index d6d0de1b01..ec583bb74c 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -2413,6 +2413,10 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
 
 			if (strcmp(coldef->colname, restdef->colname) == 0)
 			{
+				Oid			defTypeId;
+				int32		deftypmod;
+				Oid			newCollId;
+
 				found = true;
 				coldef->is_not_null |= restdef->is_not_null;
 
@@ -2432,6 +2436,23 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
 					coldef->raw_default = restdef->raw_default;
 					coldef->cooked_default = NULL;
 				}
+
+				/*
+				 * Collation must be the same, so error out if a different
+				 * one is specified for the partition.
+				 */
+				typenameTypeIdAndMod(NULL, coldef->typeName,
+									 &defTypeId, &deftypmod);
+				newCollId = GetColumnDefCollation(NULL, restdef,
+												  defTypeId);
+				if (newCollId != coldef->collOid)
+					ereport(ERROR,
+							(errcode(ERRCODE_COLLATION_MISMATCH),
+							 errmsg("column \"%s\" has a collation conflict",
+									coldef->colname),
+							 errdetail("\"%s\" versus \"%s\"",
+									   get_collation_name(newCollId),
+									   get_collation_name(coldef->collOid))));
 			}
 		}
 
diff --git a/src/test/regress/expected/create_table.out b/src/test/regress/expected/create_table.out
index 7e52c27e3f..f3dd9d964f 100644
--- a/src/test/regress/expected/create_table.out
+++ b/src/test/regress/expected/create_table.out
@@ -770,9 +770,13 @@ create table parted_collate_must_match (a text collate "C", b text collate "C")
 -- on the partition key
 create table parted_collate_must_match1 partition of parted_collate_must_match
   (a collate "POSIX") for values from ('a') to ('m');
+ERROR:  column "a" has a collation conflict
+DETAIL:  "POSIX" versus "C"
 -- on another column
 create table parted_collate_must_match2 partition of parted_collate_must_match
   (b collate "POSIX") for values from ('m') to ('z');
+ERROR:  column "b" has a collation conflict
+DETAIL:  "POSIX" versus "C"
 drop table parted_collate_must_match;
 -- Partition bound in describe output
 \d+ part_b
-- 
2.11.0



Re: valgrind issues on Fedora 28

2018-12-13 Thread Tom Lane
Tomas Vondra  writes:
> On 11/8/18 1:07 PM, Tomas Vondra wrote:
>> Not sure what to do about this - maybe we should wildcard this first
>> frame, to accept all wcsnlen variants. Opinions?

> I've committed this with the first frame wildcarded, and backpatched it
> all the way back to 9.4.

So I just got around to trying to do some valgrind testing for the first
time in a couple of months, and what I find is that this patch completely
breaks valgrind for me: it exits immediately with complaints like

==00:00:00:00.222 14821== FATAL: in suppressions file 
"/home/postgres/pgsql/src/tools/valgrind.supp" near line 227:
==00:00:00:00.222 14821==unknown tool suppression type
==00:00:00:00.222 14821== exiting now.

After a bit of hair-pulling I found that the line number report is
a lie, and what it's really unhappy about is the "Memcheck:Addr32"
specification on line 236.  This is not terribly surprising perhaps
considering that "Addr32" isn't even documented on what should be
the authoritative page:
http://valgrind.org/docs/manual/mc-manual.html#mc-manual.suppfiles

This is valgrind 3.8.1 (RHEL6's version), so not bleeding edge exactly,
but we have very little evidence about how far back the committed patch
works.  I'm inclined to think we can't use this solution.

regards, tom lane



Computing the conflict xid for index page-level-vacuum on primary

2018-12-13 Thread Andres Freund
Hi,

Currently nbtree and hash indexes (and soon gist, where it's missing due
to oversight) compute an xid that is used to resolve recovery conflicts.
They do so by visiting all the heap pages
xl_btree_delete/xl_hash_vacuum_one_page point to item-by-item.

That's problematic for two projects I'm working on:

1) For pluggable storage it's problematic, because obviously we can't
   have the nbtree/hash/gist code just assume that the index points to a
   heap.  There's no way we can do catalog lookups to determine the AM,
   and additionally AMs are database specific, so the startup process
   couldn't handle that, even if we somehow managed to get the oid or
   such of the AM handler.

   The zheap patchset currently solves that by having a separate flag
   indicating that the index is over zheap, but that obviously doesn't
   work with table storage intended to be pluggable.


2) For logical decoding on the standby [3], at least in my approach, we
   need to be able to recognize snapshot conflicts not just when the
   database is consistent, but also when it's not yet.  But
   unfortunately, the current approach requires that a consistent state
   has been reached:

	/*
	 * In what follows, we have to examine the previous state of the index
	 * page, as well as the heap page(s) it points to.  This is only valid if
	 * WAL replay has reached a consistent database state; which means that
	 * the preceding check is not just an optimization, but is *necessary*.  We
	 * won't have let in any user sessions before we reach consistency.
	 */
	if (!reachedConsistency)
		elog(PANIC, "btree_xlog_delete_get_latestRemovedXid: cannot operate with inconsistent data");


I think there's also the tertiary issue that doing substantial work
during recovery that's not performed on the primary is a bad idea,
because it makes it much more likely for the standby to start lagging
behind. Recovery is single-threaded, whereas records like
xl_btree_delete can be emitted concurrently by every single backend. In
the current way there's no backpressure whatsoever.  This also means
that we do that work in multiple replicas.


This leads me to think that we possibly should move computation of the
last removed xid from recovery to the primary, during the generation of
the xl_btree_delete WAL record.


That obviously has the downside that we're doing the additional work
whenever wal_level >= replica, which previously only was done during
recovery.  I don't really see a good way around that for pluggable
storage, unfortunately - I'd be delighted to hear a good one.  Both
during recovery and on the primary that works happens while the index
page is locked.

In the prototype I've attached, I've attempted to address that by making
the routine less inefficient than before:
- Previously it fetched a buffer for every single tuple, in the order of
  items on the index page. It's pretty common for there to be multiple
  tuples on the same page, and it's more efficient to access pages in
  the on-disk order.
- We didn't do any prefetching, which seems a waste given how easily it
  can be done in this case. I've for now just added unconditional
  prefetching, but I guess that ought to instead be something respecting
  effective_io_concurrency :/
- It seems to me that it's more likely on the primary that the relevant
  heap pages are already cached: The kill_prior_tuple logic needs to
  have kicked in to start the page level deletion after all, and that
  doesn't WAL log any changes to the heap page.

While on my laptop I wasn't able to observe a performance regression due
to these changes, but was able to see recovery performance increase, I
suspect it's possible to see performance regressions on disks that are
less fast than my SSD, in particular on rotational disks.


Talking with Peter Geoghegan we were pondering a few ideas to address
that:
- we could stop doing page level deletion after the tuples on the first
  heap page, if that frees sufficient space on the index page, to stave
  of a page split for longer.  That has the downside that it's possible
  that we'd end up WAL logging more, because we'd repeatedly emit
  xl_btree_delete records.
- We could use the visits to the heap pages to be *more* aggressive
  about deleting the heap page. If there's some tuples on a heap page
  that are deletable, and we know about that due to the kill_prior_tuple
  logic, it's fairly likely that other pointers to the same page
  actually are also dead: We could re-check that for all index entries
  pointing to pages that we'd visit anyway.
- We could invent a 'exclusive read' lwlock level, and downgrade the
  index buffer lock while we check the heap pages, and then upgrade
  afterwards.  A cursory look makes that look safe, but it'd obviously
  be a nontrivial change.


Comments? Better ideas? Pitchforks?


Minor aside:

The code currently has these two comments:
 * If there'

Re: slight tweaks to documentation about runtime pruning

2018-12-13 Thread Amit Langote
On 2018/12/10 0:57, Amit Langote wrote:
> On Sun, Dec 9, 2018 at 8:13 PM David Rowley
>  wrote:
>> listp1 was scanned twice (loops=2), listp2 was scanned just once.
>>
>> Now it is true that if the subplan was executed 0 times that it will
>> appear as "(never executed)", but do we really need to explain in this
>> area that "(never executed)" means loops=0?
> 
> Ah, I see what you mean.  So, "(never executed)" is not the only sign
> of run-time pruning occurring.  The value of loops can be different
> for different subplans / partitions, and it being lower for a given
> subplan means that the subplan has been pruned more times.
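
For instance, a hypothetical, abridged EXPLAIN ANALYZE fragment:

  ->  Seq Scan on listp1 (actual ... loops=2)
  ->  Seq Scan on listp2 (actual ... loops=1)
  ->  Seq Scan on listp3 (never executed)

Here listp3 was pruned on every execution (loops=0), while listp2 was
pruned during one of the two executions that reached listp1.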

I updated the patch.  Regarding whether we should mention "(never
executed)", it wouldn't hurt to mention it imho, exactly because it's
shown in place of loops=0.  How about the attached?

Thanks,
Amit
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 1925ff4550..0d9353fc3b 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -4421,8 +4421,12 @@ EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01';
query, partition pruning is performed whenever one of the
execution parameters being used by partition pruning changes.
Determining if partitions were pruned during this phase requires
-   careful inspection of the nloops property in
-   the EXPLAIN ANALYZE output.
+   careful inspection of the loops property in
+   the EXPLAIN ANALYZE output.  Subplans corresponding
+   to different partitions may have different values for it depending on
+   how many times each of them was pruned during execution.  Some may be
+   shown as (never executed) if they were pruned every
+   time.
   
  
 


Re: Valgrind failures in Apply Launcher's bgworker_quickdie() exit

2018-12-13 Thread Thomas Munro
On Fri, Dec 14, 2018 at 1:15 PM Andres Freund  wrote:
> On December 13, 2018 6:01:04 PM PST, Tom Lane  wrote:
> >Thomas Munro  writes:
> >> Since libcrypto.so is implicated, Andres asked me off-list if my
> >> changes to random number state initialisation might be linked to
> >> skink's failures beginning 12 or 15 days ago.  It appears not, as it
> >> was green for several runs after that commit.
> >> ...
> >> It's Debian unstable, which could be a factor.  Bugs in glibc?
>
> Note it's only failing on master...

Here are the commits between the last green run and the first Valgrind failure:

7d4524aed3 Fri Nov 30 22:53:18 2018 UTC  Fix tablespace path TAP test
of pg_verify_checksums for msys
9dc1225855 Fri Nov 30 13:20:49 2018 UTC  Silence compiler warning
dcfdf56e89 Fri Nov 30 06:20:43 2018 UTC  Fix typo.
5c99513975 Fri Nov 30 01:34:45 2018 UTC  Fix various checksum check
problems for pg_verify_checksums and base backups
a1c91dd110 Fri Nov 30 01:14:58 2018 UTC  Switch pg_verify_checksums
back to a blacklist
d328991578 Thu Nov 29 23:28:10 2018 UTC  Document handling of
invalid/ambiguous timestamp input near DST boundaries.
88bdbd3f74 Thu Nov 29 21:42:53 2018 UTC  Add log_statement_sample_rate parameter
826eff57c4 Thu Nov 29 20:53:44 2018 UTC  Ensure static libraries have
correct mod time even if ranlib messes it up.
68120427f4 Thu Nov 29 13:01:11 2018 UTC  doc: Add appendix detailing
some limits of PostgreSQL
44e22647f8 Thu Nov 29 09:00:08 2018 UTC  Add pg_partition_tree to
documentation index
431f1599a2 Thu Nov 29 01:31:12 2018 UTC  Add support for
NO_INSTALLCHECK in MSVC scripts
2ac180c286 Thu Nov 29 01:14:26 2018 UTC  Fix minor typo in dsa.c.
d79fb5d237 Thu Nov 29 00:39:07 2018 UTC  Add missing NO_INSTALLCHECK
in commit_ts and test_rls_hooks
4c703369af Thu Nov 29 00:12:19 2018 UTC  Fix handling of synchronous
replication for stopping WAL senders
1a990b207b Wed Nov 28 22:42:54 2018 UTC  Have BufFileSize() ereport()
on FileSize() failure.

f2cbffc7a6 Wed Nov 28 12:55:54 2018 UTC  Only allow one recovery target setting
eae9143d9a Wed Nov 28 12:34:10 2018 UTC  C comment:  remove extra '*'
0f9cdd7dca Wed Nov 28 01:12:30 2018 UTC  Don't set PAM_RHOST for Unix sockets.
f69c959df0 Wed Nov 28 00:43:08 2018 UTC  Do not decode TOAST data for
table rewrites
d1ce4ed2d5 Tue Nov 27 23:48:51 2018 UTC  Use wildcard to match parens
after CREATE STATISTICS
d67dae036b Tue Nov 27 22:58:10 2018 UTC  Don't count zero-filled
buffers as 'read' in EXPLAIN.
471a7af585 Tue Nov 27 20:16:55 2018 UTC  Ensure consistent sort order
of large objects in pg_dump.
b238527664 Tue Nov 27 18:07:03 2018 UTC  Fix jit compilation bug on wide tables.
f17889b221 Tue Nov 27 14:16:14 2018 UTC  Update ssl test certificates and keys
4c8750a9cc Tue Nov 27 07:26:05 2018 UTC  Fix ac218aa4f6 to work on
versions before 9.5.
ac218aa4f6 Tue Nov 27 01:00:43 2018 UTC  Update pg_upgrade test for
reg* to include regrole and regnamespace.
7a9d6779d9 Tue Nov 27 00:41:29 2018 UTC  doc:  fix wording for
plpgsql, add "and"
54bb22f66a Mon Nov 26 23:27:34 2018 UTC  Fix typo introduced in 578b229718.
12a53c732c Mon Nov 26 22:37:08 2018 UTC  Fix pg_upgrade for oid removal.
70d7e507ef Mon Nov 26 22:32:51 2018 UTC  Fix translation of special
characters in psql's LaTeX output modes.
95dcb8fc05 Mon Nov 26 20:30:24 2018 UTC  Avoid locale-dependent output
in numericlocale check.
67ed3b9d73 Mon Nov 26 20:24:14 2018 UTC  Fix sample output for
hash_metapage_info query
aa2ba50c2c Mon Nov 26 20:18:55 2018 UTC  Add CSV table output mode in psql.
9a98984f49 Mon Nov 26 17:41:42 2018 UTC  Improve regression test
coverage for psql output formats.
a7eece4fc9 Mon Nov 26 17:31:20 2018 UTC  Fix breakage of "\pset format latex".
36d442a25a Mon Nov 26 15:38:19 2018 UTC  Clarify that cross-row
constraints are unsupported
664f01b613 Mon Nov 26 07:43:19 2018 UTC  Revert "Fix typo in
documentation of toast storage"
058ef3a1a8 Mon Nov 26 06:49:23 2018 UTC  Fix typo in documentation of
toast storage
1d7dd18686 Mon Nov 26 02:12:11 2018 UTC  Revert all new recent changes
to add PGXS options for TAP and isolation
3955cae0c5 Mon Nov 26 01:49:49 2018 UTC  Fix regression test handling
of test_decoding with MSVC
b0b1f4183a Mon Nov 26 00:42:21 2018 UTC  Disable temporarily TAP tests
for contrib/bloom/
03faa4a8dd Sun Nov 25 23:39:19 2018 UTC  Add PGXS options to control
TAP and isolation tests

Maybe you could try reverting f17889b221 (change of key size) on the
superstition that it has something to do with libcrypto.so, but I have
no clue why Apply Launcher in particular would be affected by that.
Otherwise... assuming 2dedf4d9 doesn't show the problem, you could do
the world's slowest bisect 2dedf4d9 -> 7d4524aed3 in ~5.4 steps...

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Valgrind failures in Apply Launcher's bgworker_quickdie() exit

2018-12-13 Thread Andres Freund



On December 13, 2018 6:01:04 PM PST, Tom Lane  wrote:
>Thomas Munro  writes:
>> Since libcrypto.so is implicated, Andres asked me off-list if my
>> changes to random number state initialisation might be linked to
>> skink's failures beginning 12 or 15 days ago.  It appears not, as it
>> was green for several runs after that commit.
>> ...
>> It's Debian unstable, which could be a factor.  Bugs in glibc?

Note it's only failing on master...


>Has anyone tried to reproduce this on other platforms?

I recently also hit this locally, but since that's also Debian unstable... Note 
that removing openssl "fixed" the issue for me.


>It'd be interesting to know which updates were applied to skink's
>host before the first failing run.

I'll check the logs when I'm back at a real computer.

Andres

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Valgrind failures in Apply Launcher's bgworker_quickdie() exit

2018-12-13 Thread Tom Lane
Thomas Munro  writes:
> Since libcrypto.so is implicated, Andres asked me off-list if my
> changes to random number state initialisation might be linked to
> skink's failures beginning 12 or 15 days ago.  It appears not, as it
> was green for several runs after that commit.
> ...
> It's Debian unstable, which could be a factor.  Bugs in glibc?

Has anyone tried to reproduce this on other platforms?

It'd be interesting to know which updates were applied to skink's
host before the first failing run.

regards, tom lane



Valgrind failures in Apply Launcher's bgworker_quickdie() exit

2018-12-13 Thread Thomas Munro
Hello,

Since libcrypto.so is implicated, Andres asked me off-list if my
changes to random number state initialisation might be linked to
skink's failures beginning 12 or 15 days ago.  It appears not, as it
was green for several runs after that commit.  Looking at the report:

==2802== VALGRINDERROR-BEGIN
==2802== Invalid read of size 8
==2802==at 0x4DED5A5: check_free (dlerror.c:188)
==2802==by 0x4DEDAB1: free_key_mem (dlerror.c:221)
==2802==by 0x4DEDAB1: __dlerror_main_freeres (dlerror.c:239)
==2802==by 0x55DCF81: __libc_freeres (in /lib/x86_64-linux-gnu/libc-2.28.so)
==2802==by 0x482D19E: _vgnU_freeres (vg_preloaded.c:77)
==2802==by 0x478AD3: bgworker_quickdie (bgworker.c:661)
==2802==by 0x48626AF: ??? (in /lib/x86_64-linux-gnu/libpthread-2.28.so)
==2802==by 0x556DB76: epoll_wait (epoll_wait.c:30)
==2802==by 0x4E25B9: WaitEventSetWaitBlock (latch.c:1078)
==2802==by 0x4E25B9: WaitEventSetWait (latch.c:1030)
==2802==by 0x4E28C1: WaitLatchOrSocket (latch.c:407)
==2802==by 0x4E29A6: WaitLatch (latch.c:347)
==2802==by 0x49E03E: ApplyLauncherMain (launcher.c:1062)
==2802==by 0x479831: StartBackgroundWorker (bgworker.c:834)
==2802==  Address 0x7fd7e28 is 12 bytes after a block of size 12 alloc'd
==2802==at 0x483577F: malloc (vg_replace_malloc.c:299)
==2802==by 0x4C3BD38: CRYPTO_zalloc (in
/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1)
==2802==by 0x4C37F8D: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1)
==2802==by 0x4C615B9: RAND_DRBG_get0_public (in
/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1)
==2802==by 0x4C615EF: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1)
==2802==by 0x675D75: pg_strong_random (pg_strong_random.c:135)
==2802==by 0x4848EB: RandomCancelKey (postmaster.c:5251)
==2802==by 0x484909: assign_backendlist_entry (postmaster.c:5822)
==2802==by 0x4873BA: do_start_bgworker (postmaster.c:5692)
==2802==by 0x487701: maybe_start_bgworkers (postmaster.c:5955)
==2802==by 0x4878C2: reaper (postmaster.c:2940)
==2802==by 0x48626AF: ??? (in /lib/x86_64-linux-gnu/libpthread-2.28.so)
==2802==
==2802== VALGRINDERROR-END

The function __libc_freeres is a special glibc entry point provided
for leak checkers to call explicitly if they want glibc to clean up
after itself (normally it doesn't bother).  The specific thing being
cleaned up here is a piece of thread local storage that belongs to the
dynamic linker support code:

https://github.com/lattera/glibc/blob/master/dlfcn/dlerror.c#L228

Since we don't see strcmp() or free() at the top of the stack (and
assuming they aren't inlined), I think the line numbers must line up
with current glibc HEAD as of today, and it must be failing on
accessing rec->errstring at line 188, meaning that rec (the value
stored as a thread specific key) is a bad pointer.  That's quite
strange and I don't have an explanation; if libcrypto overran its
buffer, for example, that would perhaps trash rec->errstring but we'd
still be able to read the pointer itself.  So I wonder if libcrypto.so
is a red herring here.

It's Debian unstable, which could be a factor.  Bugs in glibc?
That's 2.28, out for 3 months now, but then why only in Apply
Launcher? Did we trash 'key', or the thread specific pointer table, or
is my assessment above wrong, and somehow it's really errstring that
is a bad pointer (which would allow for a more mundane explanation,
like someone trashed a bit of heap memory by overrunning a buffer)?

-- 
Thomas Munro
http://www.enterprisedb.com



RE: [PROPOSAL]a new data type 'bytea' for ECPG

2018-12-13 Thread Matsumura, Ryo
Meskes-san

> > > > The patch does not support ECPG.bytea in sqltype of "struct sqlvar_struct"
> > > > because of compatibility.
> > 
> > Sorry I do not really understand what you mean. Could you please
> > explain?
> 
> I meant that existing applications that receive data of a bytea column
> using sqlda will encounter an unknown type (=ECPG.bytea) in
> sqlvar_struct.sqltype.
> 
> You mean if they are not recompiled? If so, yes, how else could that be
> handled?

Even if they are recompiled, they will fail.

  switch (sqlvar_struct.sqltype)
  {
case ECPG.int:  break;
case ECPG.char: break;
  /* There is no case for ECPG.bytea */
default:  abort();

There is an idea as follows, but it seems ugly.

  Implement a parameter for ecpglib.
  The parameter means whether the application wants to receive
  bytea data in binary format or not.  Default is "not".
  # I don't know of any existing ecpglib parameter like it.

In other words, if the application uses a "bytea" type host variable,
ecpglib can know its intent, but in the case of sqlda ecpglib
cannot know it.

Regards
Ryo Matsumura


Re: Introducing SNI in TLS handshake for SSL connections

2018-12-13 Thread Chapman Flack
On 12/13/18 08:07, Andreas Karlsson wrote:
> But I will attach my small patch for this, which I am now opposed to, anyway
> so the code exists if a use case turns up in the future (or if it turns out
> my reasoning above is incorrect).

Here's the same patch with one small copy-pasto fixed.

-Chap
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index d2e5b08541e..528757f775d 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1460,6 +1460,23 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
  
 
+ 
+  sslsni
+  
+   
+If set to 1, the host name is sent to the server using SSL's
+SNI (Server Name Indication) extension.  If set
+to 0, no SNI extension will be sent.  The default is
+0.  This parameter is ignored if a connection without SSL is made.
+   
+
+   
+The PostgreSQL server ignores the SNI extension,
+but it can be used by SSL-aware proxy software.
+   
+  
+ 
+
  
   sslcert
   
@@ -7373,6 +7390,16 @@ myEventProc(PGEventId evtId, void *evtInfo, void *passThrough)
  
 
 
+
+ 
+  
+   PGSSLSNI
+  
+  PGSSLSNI behaves the same as the  connection parameter.
+ 
+
+
 
  
   
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index bc456fec0c2..4587e5ebb5a 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -278,6 +278,10 @@ static const internalPQconninfoOption PQconninfoOptions[] = {
 		"SSL-Compression", "", 1,
 	offsetof(struct pg_conn, sslcompression)},
 
+	{"sslsni", "PGSSLSNI", "0", NULL,
+		"SSL-SNI", "", 1,
+	offsetof(struct pg_conn, sslsni)},
+
 	{"sslcert", "PGSSLCERT", NULL, NULL,
 		"SSL-Client-Cert", "", 64,
 	offsetof(struct pg_conn, sslcert)},
@@ -3690,6 +3694,8 @@ freePGconn(PGconn *conn)
 		free(conn->sslcrl);
 	if (conn->sslcompression)
 		free(conn->sslcompression);
+	if (conn->sslsni)
+		free(conn->sslsni);
 	if (conn->requirepeer)
 		free(conn->requirepeer);
 	if (conn->connip)
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index beca3492e8d..fdae2eac74f 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -781,6 +781,7 @@ initialize_SSL(PGconn *conn)
 	char		homedir[MAXPGPATH];
 	char		fnbuf[MAXPGPATH];
 	char		sebuf[PG_STRERROR_R_BUFLEN];
+	char	   *host;
 	bool		have_homedir;
 	bool		have_cert;
 	bool		have_rootcert;
@@ -1183,6 +1184,11 @@ initialize_SSL(PGconn *conn)
 #endif
 #endif
 
+	host = conn->connhost[conn->whichhost].host;
+
+	if (conn->sslsni && conn->sslsni[0] == '1' && host)
+		SSL_set_tlsext_host_name(conn->ssl, host);
+
 	return 0;
 }
 
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 66fd317b949..9f69fbdf5fc 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -353,6 +353,7 @@ struct pg_conn
 	 * retransmits */
 	char	   *sslmode;		/* SSL mode (require,prefer,allow,disable) */
 	char	   *sslcompression; /* SSL compression (0 or 1) */
+	char	   *sslsni;			/* SSL SNI extension (0 or 1) */
 	char	   *sslkey;			/* client key filename */
 	char	   *sslcert;		/* client certificate filename */
 	char	   *sslrootcert;	/* root certificate filename */


Re: automatically assigning catalog toast oids

2018-12-13 Thread Tom Lane
Andres Freund  writes:
> [ separation of FirstBootstrapObjectId and FirstGenbkiObjectId ]

It seems like we should also add a check to genbki.pl that the ending
value of GenbkiNextOid is <= FirstBootstrapObjectId, so that we'll
certainly notice when it's necessary to adjust those boundaries.

(There's not a similar check for whether initdb runs past
FirstNormalObjectId, but I think that's fine: the system should cope
with hitting those OIDs again after a wraparound.  Also, because of our
uncertainty about just how many OIDs will be consumed for collations,
it'd be a bad idea to ship code with a hard constraint on that boundary
not being overrun.)

>> I did *not* change record_plan_function_dependency(), it seems correct
>> that it doesn't track genbki assigned oids, they certainly can't change
>> while a server is running.  But I'm not entirely clear to why that's not
>> using FirstNormalObjectId as the cutoff, so perhaps I'm missing
>> something.  Similar with logic in predicate.c.

It's not about whether they can change while the server is running,
it's about whether they're pinned.  OIDs above 10000 might or might
not be pinned; we shouldn't hard-wire assumptions about that.  So
I suspect you made the wrong choice there.

BTW, this ties into something that was bugging me this afternoon while
looking at dependencies on the default collation: there's a bunch of
places that special-case DEFAULT_COLLATION_OID to skip adding dependencies
on it, for instance this in dependency.c:

/*
 * We must also depend on the constant's collation: it could be
 * different from the datatype's, if a CollateExpr was const-folded to
 * a simple constant.  However we can save work in the most common
 * case where the collation is "default", since we know that's pinned.
 */
    if (OidIsValid(con->constcollid) &&
        con->constcollid != DEFAULT_COLLATION_OID)
        add_object_address(OCLASS_COLLATION, con->constcollid, 0,
                           context->addrs);

I'm pretty sure that that special case is my fault, added in perhaps
over-eagerness to minimize the cost added by the collations feature.
Looking at it now, it seems like mostly a wart.  But perhaps there is
a more general principle here: why don't we just hard-wire an assumption
that all OIDs below FirstGenbkiObjectId are pinned?  I'm thinking of
getting rid of the DEFAULT_COLLATION_OID special cases in dependency-
recording logic and instead having someplace central like isObjectPinned
just assume that such OIDs are pinned, so that we don't bother to
consult or update pg_depend for them.

We could take it a bit further and not make the 'p' entries for such
OIDs in the first place, greatly reducing the size of pg_depend in
typical installations.  Perhaps that'd confuse clients that look at
that catalog, though.

Looking at the existing entries, it seems like we'd have to have
one special case: schema public has OID 2200 but is intentionally
not pinned.  I wouldn't have a hard time with teaching isObjectPinned
about that; though if it turns out that many places need to know
about it, maybe it's not workable.

Thoughts?

regards, tom lane



Re: Minimal logical decoding on standbys

2018-12-13 Thread Andres Freund
Hi,

On 2018-12-13 19:32:19 -0500, Robert Haas wrote:
> On Wed, Dec 12, 2018 at 3:41 PM Andres Freund  wrote:
> > I don't like the approach of managing the catalog horizon via those
> > periodically logged catalog xmin announcements.  I think we instead
> > should build ontop of the records we already have and use to compute
> > snapshot conflicts.  As of HEAD we don't know whether such tables are
> > catalog tables, but that's just a bool that we need to include in the
> > records, a basically immeasurable overhead given the size of those
> > records.
> 
> To me, this paragraph appears to say that you don't like Craig's
> approach without quite explaining why you don't like it.  Could you be
> a bit more explicit about that?

I think the conflict system introduced in Craig's patch is quite
complicated: it relies on logging new WAL records on a regular basis and
needs to be more conservative about the xmin horizon, which is obviously
not great for performance.

If you look at Craig's patch, it currently relies on blocking out
concurrent checkpoints:
	/*
	 * We must prevent a concurrent checkpoint, otherwise the catalog xmin
	 * advance xlog record with the new value might be written before the
	 * checkpoint but the checkpoint may still see the old
	 * oldestCatalogXmin value.
	 */
	if (!LWLockConditionalAcquire(CheckpointLock, LW_SHARED))
		/* Couldn't get checkpointer lock; will retry later */
		return;
which on its own seems unacceptable, given that CheckpointLock can be
held by checkpointer for a very long time. While that's ongoing the
catalog xmin horizon doesn't advance.

Looking at the code it seems hard, to me, to make that approach work
nicely. But I might just be tired.


> > I'm wondering if it's time to move the latestRemovedXid computation for
> > this type of record to the primary - it's likely to be cheaper there and
> > avoids this kind of complication. Secondarily, it'd have the advantage
> > of making pluggable storage integration easier - there we have the
> > problem that we don't know which type of relation we're dealing with
> > during recovery, so such lookups make pluggability harder (zheap just
> > adds extra flags to signal that, but that's not extensible).
> 
> That doesn't look trivial.  It seems like _bt_delitems_delete() would
> need to get an array of XIDs, but that gets called from
> _bt_vacuum_one_page(), which doesn't have that information available.
> It doesn't look like there is a particularly cheap way of getting it,
> either.  What do you have in mind?

I've a prototype attached, but let's discuss the details in a separate
thread. This also needs to be changed for pluggable storage, as we don't
know about table access methods in the startup process, so we can't
determine which AM the heap is from during
btree_xlog_delete_get_latestRemovedXid() (and sibling routines).

Writing that message right now.


> > - enforce hot-standby to be enabled on the standby when logical slots
> >   are created, and at startup if a logical slot exists
> 
> Why do we need this?

Currently the conflict routines are only called when hot standby is
on. There's also no way to use logical decoding (including just advancing the
slot), without hot-standby being enabled, so I think that'd be a pretty
harmless restriction.


> > - Have a nicer conflict handling than what I implemented here.  Craig's
> >   approach deleted the slots, but I'm not sure I like that.  Blocking
> >   seems more appropriate here; after all, it's likely that the
> >   replication topology would be broken afterwards.
> 
> I guess the viable options are approximately --

> (1) drop the slot

Doable.


> (2)  advance the slot

That's not realistically possible, I think. We'd need to be able to use
most of the logical decoding infrastructure in that context, and we
don't have that available.  It's also possible to deadlock, where
advancing the slot's xmin horizon would need further WAL, but WAL replay
is blocked on advancing the slot.


> (3) mark the slot as "failed" but leave it in existence as a tombstone

We currently don't have that, but it'd be doable, I think.

> (4) wait until something changes.
> (4) seems pretty unfortunate unless there's some other system for
> having the slot advance automatically.  Seems like a way for
> replication to hang indefinitely without anybody understanding why
> it's happened (or, maybe, noticing).

On the other hand, it would often allow whatever is using the slot to
continue using it until the conflict is "resolved". To me, physical
replication being blocked seems about as easy to debug as the slot
somehow being magically deleted or marked as invalid.


Thanks for looking,

Andres Freund
diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index ab5aaff1566..2f13a0fd2ad 100644
--- a/src/backend/acc

Re: Minimal logical decoding on standbys

2018-12-13 Thread Robert Haas
On Wed, Dec 12, 2018 at 3:41 PM Andres Freund  wrote:
> I don't like the approach of managing the catalog horizon via those
> periodically logged catalog xmin announcements.  I think we instead
> should build ontop of the records we already have and use to compute
> snapshot conflicts.  As of HEAD we don't know whether such tables are
> catalog tables, but that's just a bool that we need to include in the
> records, a basically immeasurable overhead given the size of those
> records.

To me, this paragraph appears to say that you don't like Craig's
approach without quite explaining why you don't like it.  Could you be
a bit more explicit about that?

> I also don't think we should actually enforce hot_standby_feedback being
> enabled - there's use-cases where that's not convenient, and it's not
> bullet proof anyway (can be enabled/disabled without using logical
> decoding in between).  I think when there's a conflict we should have the
> HINT mention that hs_feedback can be used to prevent such conflicts,
> that ought to be enough.

If we can make that work, +1 from me.

> I'm wondering if it's time to move the latestRemovedXid computation for
> this type of record to the primary - it's likely to be cheaper there and
> avoids this kind of complication. Secondarily, it'd have the advantage
> of making pluggable storage integration easier - there we have the
> problem that we don't know which type of relation we're dealing with
> during recovery, so such lookups make pluggability harder (zheap just
> adds extra flags to signal that, but that's not extensible).

That doesn't look trivial.  It seems like _bt_delitems_delete() would
need to get an array of XIDs, but that gets called from
_bt_vacuum_one_page(), which doesn't have that information available.
It doesn't look like there is a particularly cheap way of getting it,
either.  What do you have in mind?

> Another alternative would be to just prevent such index deletions for
> catalog tables when wal_level = logical.

That doesn't sound like a very nice idea.

> If we were to go with this approach, there'd be at least the following
> tasks:
> - adapt tests from [2]

OK.

> - enforce hot-standby to be enabled on the standby when logical slots
>   are created, and at startup if a logical slot exists

Why do we need this?

> - fix issue around btree_xlog_delete_get_latestRemovedXid etc mentioned
>   above.

OK.

> - Have a nicer conflict handling than what I implemented here.  Craig's
>   approach deleted the slots, but I'm not sure I like that.  Blocking
>   seems more appropriate here; after all, it's likely that the
>   replication topology would be broken afterwards.

I guess the viable options are approximately -- (1) drop the slot, (2)
advance the slot, (3) mark the slot as "failed" but leave it in
existence as a tombstone, (4) wait until something changes.  I like
(3) better than (1).  (4) seems pretty unfortunate unless there's some
other system for having the slot advance automatically.  Seems like a
way for replication to hang indefinitely without anybody understanding
why it's happened (or, maybe, noticing).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pg_partition_tree crashes for a non-defined relation

2018-12-13 Thread Michael Paquier
On Thu, Dec 13, 2018 at 03:34:56PM +0900, Amit Langote wrote:
> Thank you Michael for taking care of this.  I agree with the consensus
> that instead of throwing an error for unsupported relkinds,
> pg_partition_tree should rather return a single row containing NULL for
> all columns.

Cool, thanks for confirming!  If there are complaints on the matter,
let's of course tune it again if necessary before the release.  There is
still time for that.
--
Michael


signature.asc
Description: PGP signature


Re: Can we remove the #ifdef UNUSED code from nbtxlog.c

2018-12-13 Thread Michael Paquier
On Thu, Dec 13, 2018 at 03:28:15PM -0800, Andres Freund wrote:
> nbtxlog.c has a fairly large section of code commented out with #ifdef
> UNUSED, with a note justifying it as follows:
> 
> +* This section of code is thought to be no longer needed, after
> +* analysis of the calling paths. It is retained to allow the code
> +* to be reinstated if a flaw is revealed in that thinking.

This comes from 3e4b7d87.

> That seems much better addressed by git revert/git log.  Having a
> comment explaining facts that aren't true anymore is just plain
> confusing.

+1.  It seems to me as well that this would be better just deleted.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2018-12-13 Thread Michael Paquier
On Thu, Dec 13, 2018 at 07:14:57PM -0500, Stephen Frost wrote:
> Based on at least a quick looking around, the actual grammar rule seems
> to match my recollection[1], adverbs should typically go AFTER the
> verb + object, and the adverb shouldn't ever be placed between the verb
> and the object.

This part was already debated at length in 2012-2013 when I sent the
first iterations of the patch, and my recollection is that the grammar
you are showing here matches the past agreement.
--
Michael


signature.asc
Description: PGP signature


Re: Change pgarch_readyXlog() to return .history files first

2018-12-13 Thread Michael Paquier
On Thu, Dec 13, 2018 at 11:53:53AM -0500, David Steele wrote:
> I also think we should consider back-patching this change.  It's hard to
> imagine that archive commands would have trouble with this reordering
> and the current ordering causes real pain in HA clusters.

I would like to hear opinions from others, though, on whether we should
consider this an improvement or an actual bug fix.  Changing the order of
the files to match what the startup process does when promoting does not
sound like a bug fix to me; still, this is not really invasive, so we
could consider it worth back-patching to reduce common pain for users
when it comes to timeline handling.

> -		if (!found)
> +		/* Is this a history file? */
> +		bool		history = basenamelen >= sizeof(".history") &&
> +			strcmp(rlde->d_name + (basenamelen - sizeof(".history") + 1),
> +				   ".history.ready") == 0;

Or you could just use IsTLHistoryFileName here?
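
For example, a sketch of that (assuming basenamelen already excludes the
".ready" suffix, as in pgarch_readyXlog(), and MAX_XFN_CHARS from pgarch.c):

	char		basename[MAX_XFN_CHARS + 1];
	bool		history;

	memcpy(basename, rlde->d_name, basenamelen);
	basename[basenamelen] = '\0';
	history = IsTLHistoryFileName(basename);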

If you want to simply check this code, you can just create dummy orphan
files in archive_status and see in which order they get removed.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2018-12-13 Thread Stephen Frost
Greetings,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 09/12/2018 19:55, Sergei Kornilov wrote:
> >> reindex table CONCURRENTLY test;
> 
> By the way, does this syntax make sense?  I haven't seen a discussion on
> this anywhere in the various threads.  I keep thinking that
> 
> reindex concurrently table test;
> 
> would make more sense.  How about in combination with (verbose)?

I don't think it's a mistake that we have 'create index concurrently'
and it certainly would seem odd to me for 'create index' and 'reindex
table' to be different.

Certainly, from my recollection of English, you'd say "I am going to
reindex the table concurrently"; you wouldn't say "I am going to
reindex concurrently the table."

Based on at least a quick looking around, the actual grammar rule seems
to match my recollection[1], adverbs should typically go AFTER the
verb + object, and the adverb shouldn't ever be placed between the verb
and the object.

Thanks!

Stephen

[1]: http://www.grammar.cl/Notes/Adverbs.htm


signature.asc
Description: PGP signature


Re: Cache lookup errors with functions manipulation object addresses

2018-12-13 Thread Michael Paquier
On Thu, Dec 13, 2018 at 02:58:02PM -0300, Alvaro Herrera wrote:
> On 2018-Dec-13, Michael Paquier wrote:
>> Attached is an updated version for that as 0001.  Thanks for the
>> review.  Does that part look good to you now?
> 
> +1.

Thanks for the review, I have applied this part.

> Hmm ... "routine"?

That's even better.

> I'm not sure if NULLs are better than empty arrays, but I agree that we
> should pick one representation for undefined object and use it
> consistently for all object types.

Okay, thanks.

>> There is some more refactoring work still needed for constraints, large
>> objects and functions, in a way similar to a26116c6.  I am pretty happy
>> with the shape of 0001, so this could be applied, 0002 still needs to be
>> reworked so as all undefined object types behave as described above in a
>> consistent manner.  Do those definitions make sense?
> 
> I think so, yes.
> 
> Thanks for taking care of this.

Thanks again for looking at what was proposed.  I'll see if I can
finish the refactoring part for the next CF, and be done with this
stuff.
--
Michael


signature.asc
Description: PGP signature


Re: Remove Deprecated Exclusive Backup Mode

2018-12-13 Thread Robert Haas
On Thu, Dec 13, 2018 at 2:29 PM David Steele  wrote:
> Good question.
>
> We could leave the third parameter (changing the default to false) and
> error if it has any value but false.  It's a bit ugly but it does
> maintain compatibility with the current non-exclusive backup interface.
>  And, the third parameter would never need to be specified unless we add
> a fourth.
>
> Or we could add a new function and deprecate this one -- but what to
> call it?
>
> I agree it's not very cool to break code for users who actually *did*
> migrate to non-exclusive backups.

Uh, yeah.  If you do that, you're not just forcing people off the old
interface -- you're making *everybody* update their stuff again.  And
having to have conditional logic to handle different releases presents
its own set of dangers. IMHO, the biggest enemy in this area BY FAR is
human error, not the details of how the interface works. No amount of
providing a better-designed interface will compensate for the
confusion factor involved in changing it.

My vote is ... when we actually deprecate this, change nothing at the
SQL level initially, but just throw an error if the "exclusive"
argument isn't false:

ERROR: exclusive backup mode is no longer supported

That will require everyone to use the three-argument form of
pg_start_backup(), but I think that's good, because if we accept the
one and two argument forms, how do we actually know that the user
updated their code?  If you want to actually force people to switch to
the non-exclusive mode, you have to make sure that anything that might
be a residual request for exclusive backup mode errors out.  If the
same SQL just does something different, the user might fail to notice.
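
A minimal sketch of that check, as it might appear in pg_start_backup()'s
backend function (the variable name is assumed):

	if (exclusive)
		ereport(ERROR,
				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
				 errmsg("exclusive backup mode is no longer supported")));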

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Can we remove the #ifdef UNUSED code from nbtxlog.c

2018-12-13 Thread Andres Freund
Hi,

nbtxlog.c has a fairly large section of code commented out with #ifdef
UNUSED, with a note justifying it as follows:

+* This section of code is thought to be no longer needed, after
+* analysis of the calling paths. It is retained to allow the code
+* to be reinstated if a flaw is revealed in that thinking.

That seems much better addressed by git revert/git log.  Having a
comment explaining facts that aren't true anymore is just plain
confusing.

Greetings,

Andres Freund



Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2018-12-13 Thread Peter Eisentraut
On 09/12/2018 19:55, Sergei Kornilov wrote:
>> reindex table CONCURRENTLY test;

By the way, does this syntax make sense?  I haven't seen a discussion on
this anywhere in the various threads.  I keep thinking that

reindex concurrently table test;

would make more sense.  How about in combination with (verbose)?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: automatically assigning catalog toast oids

2018-12-13 Thread Andres Freund
On 2018-12-11 15:08:02 -0800, Andres Freund wrote:
> Hi,
> 
> On 2018-12-09 18:43:14 -0500, Tom Lane wrote:
> > Andres Freund  writes:
> > > On 2018-12-09 17:14:42 -0500, Tom Lane wrote:
> > >> Well, that's just a different very-easily-broken assumption.  There are
> > >> a lot of things that make auto-assigned OIDs unstable, and I do not think
> > >> that we want to guarantee that they'll hold still across a release 
> > >> series.
> > 
> > > Why wouldn't they be for genbki- (rather than initdb-) assigned oids?  I
> > > don't think it's reasonable to add new functions or such post-release
> > > that would move oid assignments for other objects?
> > 
> > As you've got this set up, we couldn't change *anything* for fear of
> > it moving auto-assignments; there's no isolation between catalogs.
> 
> But there wasn't any previously either?
> 
> 
> > Another thing I seriously dislike is that this allows people to omit OIDs
> > from .dat entries in catalogs where we traditionally hand-assign OIDs.
> 
> That's not new, is it?  Sure, now genbki.pl assigns the oid, but
> previously it'd just have been heap_insert()? bootparse.y/bootstrap.c
> never enforced that oids are assigned for tables that have oids.
> 
> 
> > That's not a good idea because it would mean those entries don't have
> > stable OIDs, whereas the whole point of hand assignment is to ensure
> > all built-in objects of a particular type have stable OIDs.  Now, you
> > could argue about the usefulness of that policy for any given catalog;
> > but if we decide that catalog X doesn't need stable OIDs then that should
> > be an intentional policy change, not something that can happen because
> > one lazy hacker didn't follow the policy.
> 
> I think we should change that policy, but I also think that there wasn't
> any meaningful "assignment policy" change in what I did. So that just
> seems like a separate argument.
> 
> Note that changing that for "prominent" catalogs would be a bit more
> work than just changing the policy, as we'd need to assign oids before
> the lookup tables are built - although the current behaviour would kind
> of allow us to implement the "not crazy" policy of allowing
> auto-assignment as long as the object isn't referenced; but via an imo
> fairly opaque mechanism.
> 
> 
> > > I'm fine with adding a distinct range, the earlier version of the patch
> > > had that. I'd asked for comments if anybody felt a need to keep that,
> > > nobody replied...  I alternatively proposed that we could just start at
> > > FirstNormalObjectId for those and update the server's oid start value to
> > > the maximum genbki assigned oid.  Do you have preferences around that?
> > 
> > Yeah, I thought about the latter as well.  But it adds complexity to the
> > bootstrap process and makes it harder to tell what assigned a particular
> > OID, so I'd rather go with the former, at least until the OID situation
> > gets too tight to allow for daylight between the ranges.
> 
> Yea, it doesn't seem perfect, that's basically why I didn't go for it
> last time.
> 
> 
> > It looks to me like as of HEAD, genbki.pl is auto-assigning about 1470
> > OIDs.  Meanwhile, on my RHEL6 machine, initdb is auto-assigning about
> > 1740 OIDs (what a coincidence); of those, 872 are collation entries
> > that are absorbed from the system environment.  So the second number is
> > likely to vary a lot from platform to platform.  (I don't have ICU
> > enabled; I wonder how many that typically adds.)
> > 
> > I'd be inclined to allow say 2000 OIDs for genbki.pl, with 4384 therefore
> > available for initdb.  We could expect to have to raise the boundary
> > from time to time, but not very often.
> 
> I've attached a patch implementing that.  I'm not particularly in love
> with FirstGenbkiObjectId as the symbol, but I couldn't think of
> something more descriptive.
> 
> I changed the length of fmgr_builtin_oid_index to FirstGenbkiObjectId -
> until we allow pg_proc oids to be auto-assigned that'd just be wasted
> memory otherwise?
> 
> I did *not* change record_plan_function_dependency(), it seems correct
> that it doesn't track genbki assigned oids, they certainly can't change
> while a server is running.  But I'm not entirely clear as to why that's not
> using FirstNormalObjectId as the cutoff, so perhaps I'm missing
> something.  Similar with logic in predicate.c.
> 
> I did however change postgres_fdw's is_builtin(), as that says:
>  /*
>   * Return true if given object is one of PostgreSQL's built-in objects.
>   *
> - * We use FirstBootstrapObjectId as the cutoff, so that we only consider
> + * We use FirstGenbkiObjectId as the cutoff, so that we only consider
>   * objects with hand-assigned OIDs to be "built in", not for instance any
>   * function or type defined in the information_schema.
>   *
> @@ -154,7 +154,7 @@ lookup_shippable(Oid objectId, Oid classId, PgFdwRelationInfo *fpinfo)
> 
> and >= FirstGenbkiObjectId would not be manually assigned.
> 
> 
> I added a th

Re: Ryu floating point output patch

2018-12-13 Thread Alvaro Herrera
On 2018-Dec-13, Andrew Gierth wrote:

> And again to fix the windows build - why does Mkvcbuild.pm have its own
> private copy of the file list for src/common?

I think at some point the Makefile parsing code was too stupid to deal
with the src/port Makefile, so it was hardcoded; later I probably
cargo-culted that into the src/common makefile.  Maybe the parser has
already improved (or the makefile been simplified) enough that the msvc
tooling can extract the files from the makefile?  Dunno.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Ryu floating point output patch

2018-12-13 Thread Andrew Gierth
> "Andrew" == Andrew Gierth  writes:

 Andrew> This code in particular was very heavily into using mixed
 Andrew> declarations and code throughout. Removing those was moderately
 Andrew> painful.

 Andrew> And it turns out I missed one, sigh.

 Andrew> Updated patch.

And again to fix the windows build - why does Mkvcbuild.pm have its own
private copy of the file list for src/common?

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index cf9327f885..24d41c2e89 100644
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -21,6 +21,7 @@
 
 #include "catalog/pg_type.h"
 #include "common/int.h"
+#include "common/ryu.h"
 #include "libpq/pqformat.h"
 #include "utils/array.h"
 #include "utils/float.h"
@@ -29,7 +30,7 @@
 
 
 /* Configurable GUC parameter */
-int			extra_float_digits = 0; /* Added to DBL_DIG or FLT_DIG */
+int			extra_float_digits = 1; /* Added to DBL_DIG or FLT_DIG */
 
 /* Cached constants for degree-based trig functions */
 static bool degree_consts_set = false;
@@ -246,6 +247,12 @@ float4out(PG_FUNCTION_ARGS)
 	char	   *ascii = (char *) palloc(32);
 	int			ndig = FLT_DIG + extra_float_digits;
 
+	if (extra_float_digits > 0)
+	{
+		ryu_f2s_buffered(num, ascii);
+		PG_RETURN_CSTRING(ascii);
+	}
+
 	(void) pg_strfromd(ascii, 32, ndig, num);
 	PG_RETURN_CSTRING(ascii);
 }
@@ -462,6 +469,12 @@ float8out_internal(double num)
 	char	   *ascii = (char *) palloc(32);
 	int			ndig = DBL_DIG + extra_float_digits;
 
+	if (extra_float_digits > 0)
+	{
+		ryu_d2s_buffered(num, ascii);
+		return ascii;
+	}
+
 	(void) pg_strfromd(ascii, 32, ndig, num);
 	return ascii;
 }
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 6fe1939881..6e223335bc 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2631,11 +2631,12 @@ static struct config_int ConfigureNamesInt[] =
 		{"extra_float_digits", PGC_USERSET, CLIENT_CONN_LOCALE,
 			gettext_noop("Sets the number of digits displayed for floating-point values."),
 			gettext_noop("This affects real, double precision, and geometric data types. "
-		 "The parameter value is added to the standard number of digits "
-		 "(FLT_DIG or DBL_DIG as appropriate).")
+		 "A zero or negative parameter value is added to the standard "
+		 "number of digits (FLT_DIG or DBL_DIG as appropriate). "
+		 "Any value greater than zero selects round-trip-safe output.")
 		},
 		&extra_float_digits,
-		0, -15, 3,
+		1, -15, 3,
 		NULL, NULL, NULL
 	},
 
diff --git a/src/common/Makefile b/src/common/Makefile
index ec8139f014..02622f8c48 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -44,8 +44,10 @@ override CPPFLAGS += -DVAL_LIBS="\"$(LIBS)\""
 override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
 LIBS += $(PTHREAD_LIBS)
 
-OBJS_COMMON = base64.o config_info.o controldata_utils.o exec.o file_perm.o \
-	ip.o keywords.o link-canary.o md5.o pg_lzcompress.o \
+# If you add objects here, see also src/tools/msvc/Mkvcbuild.pm
+
+OBJS_COMMON = base64.o config_info.o controldata_utils.o d2s.o exec.o f2s.o \
+	file_perm.o ip.o keywords.o link-canary.o md5.o pg_lzcompress.o \
 	pgfnames.o psprintf.o relpath.o \
 	rmtree.o saslprep.o scram-common.o string.o unicode_norm.o \
 	username.o wait_error.o
diff --git a/src/common/d2s.c b/src/common/d2s.c
new file mode 100644
index 00..5155dceeea
--- /dev/null
+++ b/src/common/d2s.c
@@ -0,0 +1,961 @@
+/*---
+ *
+ * Ryu floating-point output for double precision.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/common/d2s.c
+ *
+ * This is a modification of code taken from github.com/ulfjack/ryu under the
+ * terms of the Boost license (not the Apache license). The original copyright
+ * notice follows:
+ *
+ * Copyright 2018 Ulf Adams
+ *
+ * The contents of this file may be used under the terms of the Apache
+ * License, Version 2.0.
+ *
+ * (See accompanying file LICENSE-Apache or copy at
+ *  http://www.apache.org/licenses/LICENSE-2.0)
+ *
+ * Alternatively, the contents of this file may be used under the terms of the
+ * Boost Software License, Version 1.0.
+ *
+ * (See accompanying file LICENSE-Boost or copy at
+ *  https://www.boost.org/LICENSE_1_0.txt)
+ *
+ * Unless required by applicable law or agreed to in writing, this software is
+ * distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.
+ *
+ *---
+ */
+
+/*
+ *  Runtime compiler options:
+ *
+ *  -DRYU_ONLY_64_BIT_OPS Avoid using uint128 or 64-bit intrinsics. Slower,
+ *  depending on your compiler.
+ */
+
+#ifndef FRONTEND
+#include "postgres.h"
+#else
+#include "postgres_fe.h"
+#endif
+
+#include "common/ryu.h"
+
+/*
+ * For consistency, we use 128-bit types if 

Re: Ryu floating point output patch

2018-12-13 Thread Thomas Munro
On Fri, Dec 14, 2018 at 8:00 AM Andres Freund  wrote:
> Hi,
>
> On 2018-12-13 20:53:23 +, Andrew Gierth wrote:
> > > "Andres" == Andres Freund  writes:
> >
> >  >> This code in particular was very heavily into using mixed
> >  >> declarations and code throughout. Removing those was moderately
> >  >> painful.
> >
>  Andres> I wonder if we should instead relax those restrictions for the
>  Andres> largely foreign pieces of code?
> >
> > Do we have any reason for the restriction beyond stylistic preference?
>
> No. Robert and Tom were against allowing mixing declarations and code, a
> few more against // comments. I don't quite agree with either, but
> getting to the rest of C99 seemed more important than fighting those out
> at that moment.

-1 for making superficial changes to code that we are "vendoring",
unless it is known to be dead/abandoned and we're definitively forking
it.  It just makes it harder to track upstream's bug fixes and
improvements, surely?

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Ryu floating point output patch

2018-12-13 Thread Andrew Gierth
> "Andrew" == Andrew Gierth  writes:

 Andrew> This code in particular was very heavily into using mixed
 Andrew> declarations and code throughout. Removing those was moderately
 Andrew> painful.

 Andrew> And it turns out I missed one, sigh.

Updated patch.

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index cf9327f885..24d41c2e89 100644
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -21,6 +21,7 @@
 
 #include "catalog/pg_type.h"
 #include "common/int.h"
+#include "common/ryu.h"
 #include "libpq/pqformat.h"
 #include "utils/array.h"
 #include "utils/float.h"
@@ -29,7 +30,7 @@
 
 
 /* Configurable GUC parameter */
-int			extra_float_digits = 0; /* Added to DBL_DIG or FLT_DIG */
+int			extra_float_digits = 1; /* Added to DBL_DIG or FLT_DIG */
 
 /* Cached constants for degree-based trig functions */
 static bool degree_consts_set = false;
@@ -246,6 +247,12 @@ float4out(PG_FUNCTION_ARGS)
 	char	   *ascii = (char *) palloc(32);
 	int			ndig = FLT_DIG + extra_float_digits;
 
+	if (extra_float_digits > 0)
+	{
+		ryu_f2s_buffered(num, ascii);
+		PG_RETURN_CSTRING(ascii);
+	}
+
 	(void) pg_strfromd(ascii, 32, ndig, num);
 	PG_RETURN_CSTRING(ascii);
 }
@@ -462,6 +469,12 @@ float8out_internal(double num)
 	char	   *ascii = (char *) palloc(32);
 	int			ndig = DBL_DIG + extra_float_digits;
 
+	if (extra_float_digits > 0)
+	{
+		ryu_d2s_buffered(num, ascii);
+		return ascii;
+	}
+
 	(void) pg_strfromd(ascii, 32, ndig, num);
 	return ascii;
 }
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 6fe1939881..6e223335bc 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2631,11 +2631,12 @@ static struct config_int ConfigureNamesInt[] =
 		{"extra_float_digits", PGC_USERSET, CLIENT_CONN_LOCALE,
 			gettext_noop("Sets the number of digits displayed for floating-point values."),
 			gettext_noop("This affects real, double precision, and geometric data types. "
-		 "The parameter value is added to the standard number of digits "
-		 "(FLT_DIG or DBL_DIG as appropriate).")
+		 "A zero or negative parameter value is added to the standard "
+		 "number of digits (FLT_DIG or DBL_DIG as appropriate). "
+		 "Any value greater than zero selects round-trip-safe output.")
 		},
 		&extra_float_digits,
-		0, -15, 3,
+		1, -15, 3,
 		NULL, NULL, NULL
 	},
 
diff --git a/src/common/Makefile b/src/common/Makefile
index ec8139f014..ae89dd8c5e 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -44,8 +44,8 @@ override CPPFLAGS += -DVAL_LIBS="\"$(LIBS)\""
 override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
 LIBS += $(PTHREAD_LIBS)
 
-OBJS_COMMON = base64.o config_info.o controldata_utils.o exec.o file_perm.o \
-	ip.o keywords.o link-canary.o md5.o pg_lzcompress.o \
+OBJS_COMMON = base64.o config_info.o controldata_utils.o d2s.o exec.o f2s.o \
+	file_perm.o ip.o keywords.o link-canary.o md5.o pg_lzcompress.o \
 	pgfnames.o psprintf.o relpath.o \
 	rmtree.o saslprep.o scram-common.o string.o unicode_norm.o \
 	username.o wait_error.o
diff --git a/src/common/d2s.c b/src/common/d2s.c
new file mode 100644
index 00..5155dceeea
--- /dev/null
+++ b/src/common/d2s.c
@@ -0,0 +1,961 @@
+/*---
+ *
+ * Ryu floating-point output for double precision.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/common/d2s.c
+ *
+ * This is a modification of code taken from github.com/ulfjack/ryu under the
+ * terms of the Boost license (not the Apache license). The original copyright
+ * notice follows:
+ *
+ * Copyright 2018 Ulf Adams
+ *
+ * The contents of this file may be used under the terms of the Apache
+ * License, Version 2.0.
+ *
+ * (See accompanying file LICENSE-Apache or copy at
+ *  http://www.apache.org/licenses/LICENSE-2.0)
+ *
+ * Alternatively, the contents of this file may be used under the terms of the
+ * Boost Software License, Version 1.0.
+ *
+ * (See accompanying file LICENSE-Boost or copy at
+ *  https://www.boost.org/LICENSE_1_0.txt)
+ *
+ * Unless required by applicable law or agreed to in writing, this software is
+ * distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.
+ *
+ *---
+ */
+
+/*
+ *  Runtime compiler options:
+ *
+ *  -DRYU_ONLY_64_BIT_OPS Avoid using uint128 or 64-bit intrinsics. Slower,
+ *  depending on your compiler.
+ */
+
+#ifndef FRONTEND
+#include "postgres.h"
+#else
+#include "postgres_fe.h"
+#endif
+
+#include "common/ryu.h"
+
+/*
+ * For consistency, we use 128-bit types if and only if the rest of PG also
+ * does, even though we could use them here without worrying about the
+ * alignment concerns that apply elsewhere.
+ */
+#if !defined(HAVE_INT128) && defined(_MSC

Re: Ryu floating point output patch

2018-12-13 Thread Andrew Gierth
> "Andrew" == Andrew Gierth  writes:

 Andrew> This code in particular was very heavily into using mixed
 Andrew> declarations and code throughout. Removing those was moderately
 Andrew> painful.

And it turns out I missed one, sigh.

-- 
Andrew (irc:RhodiumToad)



Re: Ryu floating point output patch

2018-12-13 Thread Andres Freund
Hi,

On 2018-12-13 20:53:23 +, Andrew Gierth wrote:
> > "Andres" == Andres Freund  writes:
> 
>  >> This code in particular was very heavily into using mixed
>  >> declarations and code throughout. Removing those was moderately
>  >> painful.
> 
>  Andres> I wonder if we should instead relax those restrictions for the
>  Andres> largely foreign pieces of code?
> 
> Do we have any reason for the restriction beyond stylistic preference?

No. Robert and Tom were against allowing mixing declarations and code, a
few more against // comments. I don't quite agree with either, but
getting to the rest of C99 seemed more important than fighting those out
at that moment.

Greetings,

Andres Freund



Re: Ryu floating point output patch

2018-12-13 Thread Andrew Gierth
> "Andres" == Andres Freund  writes:

 >> This code in particular was very heavily into using mixed
 >> declarations and code throughout. Removing those was moderately
 >> painful.

 Andres> I wonder if we should instead relax those restrictions for the
 Andres> largely foreign pieces of code?

Do we have any reason for the restriction beyond stylistic preference?

 Andres> Right it should definitely be explained. But I do think
 Andres> pointing at the source, would be good. I kind of wonder if we
 Andres> should, if we were to accept something like this patch, clone
 Andres> the original ryu repository to either git.pg.org, or at least
 Andres> into pg's github repository? It'd be annoying if the original
 Andres> authors moved on and deleted the repository.

Keeping a copy on git.pg.org seems like a reasonable thing to do.

-- 
Andrew (irc:RhodiumToad)



Where to save data used by extension ?

2018-12-13 Thread Hao Wu
Hi pgsql-hackers,

I have an extension for PostgreSQL. The extension works across several
databases and needs to save some data. The data might be modified
dynamically by the extension, and all the changed results must be saved.
I have considered some methods:
1. Use GUC
I find no easy way to write the GUC values back to postgres.conf, and the
data might be a bit complex.
2. Create a table under a special database, like postgres
The weakness is the strong assumption that the database postgres exists
and will not be dropped (see the sketch after this list).
3. Create a table under a database created by the extension
A little heavy, but without dependencies.
4. Write to a native file.
The file cannot be synced to a standby.
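
For option 2, a minimal guard sketch (get_database_oid() is the backend
lookup from commands/dbcommands.h; the error handling is illustrative):

	Oid			dbid = get_database_oid("postgres", true);

	if (!OidIsValid(dbid))
		ereport(ERROR,
				(errmsg("database \"postgres\" does not exist")));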

Currently, I use #2. Is there any better way to do this?

Thank you very much
Best Regards


Re: Ryu floating point output patch

2018-12-13 Thread Andres Freund
Hi,

On 2018-12-13 20:23:51 +, Andrew Gierth wrote:
> > "Andres" == Andres Freund  writes:
> 
>  >> From the upstream, I've taken only specific files which are
>  >> Boost-licensed, removed code not of interest to us, removed
>  >> C99-isms, applied project style for things like type names and use
>  >> of INT64CONST, removed some ad-hoc configuration #ifs in favour of
>  >> using values established by pg_config.h, and ran the whole thing
>  >> through pgindent and fixed up the resulting wreckage.
> 
>  Andres> Removed C99isms? Why, we allow that these days? Did you mean
>  Andres> C11?
> 
> My understanding is that we do not allow // comments or mixed
> declarations and code (other than loop control variables).

Ah, that makes sense.


> This code in particular was very heavily into using mixed declarations
> and code throughout. Removing those was moderately painful.

I wonder if we should instead relax those restrictions for the largely
foreign pieces of code?



>  >> +/*  These tables are generated by PrintDoubleLookupTable. */
> 
>  Andres> This kind of reference is essentially dangling now, so perhaps
>  Andres> we should rewrite that?
> 
> Well, it's probably still useful to point out that they're generated
> (though not by us); I guess it should make clear where the generator is.

Right it should definitely be explained. But I do think pointing at the
source, would be good. I kind of wonder if we should, if we were to
accept something like this patch, clone the original ryu repository to
either git.pg.org, or at least into pg's github repository?  It'd be
annoying if the original authors moved on and deleted the repository.

Greetings,

Andres Freund



Re: Ryu floating point output patch

2018-12-13 Thread Andrew Gierth
> "Andres" == Andres Freund  writes:

 >> From the upstream, I've taken only specific files which are
 >> Boost-licensed, removed code not of interest to us, removed
 >> C99-isms, applied project style for things like type names and use
 >> of INT64CONST, removed some ad-hoc configuration #ifs in favour of
 >> using values established by pg_config.h, and ran the whole thing
 >> through pgindent and fixed up the resulting wreckage.

 Andres> Removed C99isms? Why, we allow that these days? Did you mean
 Andres> C11?

My understanding is that we do not allow // comments or mixed
declarations and code (other than loop control variables).

This code in particular was very heavily into using mixed declarations
and code throughout. Removing those was moderately painful.

 Andres> What's with all the commented out code?

Just stuff from upstream I didn't take out.

 Andres> I'm not sure that keeping close alignment with the original
 Andres> codebase with things like that has much value given the extent
 Andres> of the reformatting and functional changes?

Probably not, but otherwise I did not remove much in the way of upstream
comments or edit them except for formatting.

 >> +/*  These tables are generated by PrintDoubleLookupTable. */

 Andres> This kind of reference is essentially dangling now, so perhaps
 Andres> we should rewrite that?

Well, it's probably still useful to point out that they're generated
(though not by us); I guess it should make clear where the generator is.

 >> +++ b/src/common/d2s_intrinsics.h

 >> +static inline uint64
 >> +umul128(const uint64 a, const uint64 b, uint64 *const productHi)
 >> +{
 >> +   return _umul128(a, b, productHi);
 >> +}

 Andres> These are fairly generic names, perhaps we ought to prefix
 Andres> them?

Maybe, but presumably nothing else is going to include this file.

-- 
Andrew (irc:RhodiumToad)



Re: Reorganize collation lookup time and place

2018-12-13 Thread Andres Freund
On 2018-12-13 14:50:53 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-12-13 15:05:40 +0100, Peter Eisentraut wrote:
> >> On 12/12/2018 18:56, Andres Freund wrote:
> >>> Why isn't this integrated into fmgr_info()?
> 
> >> fmgr_info() looks up stuff in pg_proc.  pg_newlocale_...() looks up
> >> stuff in pg_collation.  There is no overlap between them.
> 
> It looks up stuff necessary for calling a function, and that doesn't fit
> looking up the collation necessary to do so too badly. A lot of the
> changes you made are rote changes to each caller, taking the collation
> oid and expanding it with pg_newlocale_from_collation().
> 
> I'm not very thrilled with the idea of changing every single caller of
> InitFunctionCallInfoData and related functions, especially when exactly
> zero functional change ensues.  We should work harder on avoiding that
> code churn; extension developers will thank us.

I am thinking of moving to commit 0001 (but not 0002) of
https://www.postgresql.org/message-id/20181009191802.ppt6lqcvkpjvkm76%40alap3.anarazel.de
which'd break some (but fewer) of the same place that'd be affected by
this.  The amount of memory saved for plpgsql and other places with lots
of expressoins is quite noticable, and it's a prerequisite for better
code generation and caching for JIT.  As that patch doesn't affect
DirectFunctionCall* etc, I'm not sure it's a sufficient argument to go
full breakage here, however.

Independent of that, it still seems like putting stuff onto every
function caller that they don't need to deal with.

Greetings,

Andres Freund



Re: Ryu floating point output patch

2018-12-13 Thread Andres Freund
Hi,

On 2018-12-13 19:41:55 +, Andrew Gierth wrote:
> This is a mostly cleaned-up version of the test patch I posted
> previously for floating-point output using the Ryu algorithm.

Nice!


> From the upstream, I've taken only specific files which are
> Boost-licensed, removed code not of interest to us, removed C99-isms,
> applied project style for things like type names and use of INT64CONST,
> removed some ad-hoc configuration #ifs in favour of using values
> established by pg_config.h, and ran the whole thing through pgindent and
> fixed up the resulting wreckage.

Removed C99isms? Why, we allow that these days? Did you mean C11?



> +static inline uint64
> +mulShiftAll(const uint64 m, const uint64 *const mul, const int32 j,
> +			uint64 *const vp, uint64 *const vm, const uint32 mmShift)
> +{
> +/*	m <<= 2; */
> +/*	uint128 b0 = ((uint128) m) * mul[0]; // 0 */
> +/*	uint128 b2 = ((uint128) m) * mul[1]; // 64 */
> +/*  */
> +/*	uint128 hi = (b0 >> 64) + b2; */
> +/*	uint128 lo = b0 & 0xffffffffffffffffull; */
> +/*	uint128 factor = (((uint128) mul[1]) << 64) + mul[0]; */
> +/*	uint128 vpLo = lo + (factor << 1); */
> +/*	*vp = (uint64) ((hi + (vpLo >> 64)) >> (j - 64)); */
> +/*	uint128 vmLo = lo - (factor << mmShift); */
> +/*	*vm = (uint64) ((hi + (vmLo >> 64) - (((uint128) 1ull) << 64)) >> (j - 64)); */
> +/*	return (uint64) (hi >> (j - 64)); */
> +	*vp = mulShift(4 * m + 2, mul, j);
> +	*vm = mulShift(4 * m - 1 - mmShift, mul, j);
> +	return mulShift(4 * m, mul, j);
> +}

What's with all the commented out code?  I'm not sure that keeping close
alignment with the original codebase with things like that has much
value given the extent of the reformatting and functional changes?


> +/*  These tables are generated by PrintDoubleLookupTable. */

This kind of reference is essentially dangling now, so perhaps we should
rewrite that?


> +++ b/src/common/d2s_intrinsics.h

> +static inline uint64
> +umul128(const uint64 a, const uint64 b, uint64 *const productHi)
> +{
> + return _umul128(a, b, productHi);
> +}

These are fairly generic names, perhaps we ought to prefix them?



Greetings,

Andres Freund



Re: Reorganize collation lookup time and place

2018-12-13 Thread Tom Lane
Andres Freund  writes:
> On 2018-12-13 15:05:40 +0100, Peter Eisentraut wrote:
>> On 12/12/2018 18:56, Andres Freund wrote:
>>> Why isn't this integrated into fmgr_info()?

>> fmgr_info() looks up stuff in pg_proc.  pg_newlocale_...() looks up
>> stuff in pg_collation.  There is no overlap between them.

> It looks up stuff necessary for calling a function, that doesn't fit
> looking up the collation necessary to do so too badly. A lot of the the
> changes you made are rote changes to each caller, taking the collation
> oid and expanding it with pg_newlocale_from_collation().

I'm not very thrilled with the idea of changing every single caller of
InitFunctionCallInfoData and related functions, especially when exactly
zero functional change ensues.  We should work harder on avoiding that
code churn; extension developers will thank us.

>> There is also not necessarily a one-to-one correspondence between
>> function and collation lookup calls.  For example, in some cases you
>> need to look up a sorting and a hashing function, but only one
>> collation for both.

> That seems rare enough not to matter, performancewise.

I think it potentially does matter, but I also agree that it's not
the common case.  Could we perhaps keep the API for the existing
functions the same, and introduce new functions alongside them
to be used by the small number of places where it matters?

(I've not looked at Peter's patch yet, so maybe I'm talking through
my hat.  But we should set a pretty high value on avoiding code
churn in stuff as widely used as the fmgr interfaces.)

regards, tom lane



Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock

2018-12-13 Thread Alexander Korotkov
On Thu, Dec 13, 2018 at 10:46 PM Andres Freund  wrote:
> On 2018-12-13 22:40:59 +0300, Alexander Korotkov wrote:
> > It doesn't matter, because we release all locks on every buffer at one
> > time.  The unlock order can have effect on what waiter will acquire
> > the lock next after ginRedoDeletePage().  However, I don't see why one
> > unlock order is better than another.  Thus, I just used the rule of
> > thumb to not change code when it's not necessary for bug fix.
>
> I think it's right to not change unlock order at the same time as a
> bugfix here.  More generally I think it can often be useful to default
> to release locks in the inverse order they've been acquired - if there's
> any likelihood that somebody will acquire them in the same order, that
> ensures that such a party would only need to wait for a lock once,
> instead of being woken up for one lock, and then immediately having to
> wait for the next one.

Good point, thank you!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock

2018-12-13 Thread Andres Freund
Hi,

On 2018-12-13 22:40:59 +0300, Alexander Korotkov wrote:
> It doesn't matter, because we release all locks on every buffer at one
> time.  The unlock order can have effect on what waiter will acquire
> the lock next after ginRedoDeletePage().  However, I don't see why one
> unlock order is better than another.  Thus, I just used the rule of
> thumb to not change code when it's not necessary for bug fix.

I think it's right to not change unlock order at the same time as a
bugfix here.  More generally I think it can often be useful to default
to release locks in the inverse order they've been acquired - if there's
any likelihood that somebody will acquire them in the same order, that
ensures that such a party would only need to wait for a lock once,
instead of being woken up for one lock, and then immediately having to
wait for the next one.
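
An illustrative fragment of that pattern (lock names are placeholders):

	LWLockAcquire(lockA, LW_EXCLUSIVE);
	LWLockAcquire(lockB, LW_EXCLUSIVE);
	/* ... modify shared state ... */
	LWLockRelease(lockB);		/* inverse (LIFO) order, so a waiter that */
	LWLockRelease(lockA);		/* locks A then B blocks at most once */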

Greetings,

Andres Freund



Ryu floating point output patch

2018-12-13 Thread Andrew Gierth
This is a mostly cleaned-up version of the test patch I posted
previously for floating-point output using the Ryu algorithm.

Upstream source: github.com/ulfjack/ryu/ryu at commit 795c8b57a

From the upstream, I've taken only specific files which are
Boost-licensed, removed code not of interest to us, removed C99-isms,
applied project style for things like type names and use of INT64CONST,
removed some ad-hoc configuration #ifs in favour of using values
established by pg_config.h, and ran the whole thing through pgindent and
fixed up the resulting wreckage.

On top of that I made these functional changes:

1. Added an alternative fixed-point output format so that values with
exponents not less than -4 and less than the default precision for the
type are formatted in fixed-point, as sprintf does.

2. Exponents now always have a sign and at least two digits, also as per
sprintf formatting rules.
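
For reference, C's printf("%g") applies the same two rules; a
self-contained illustration:

	#include <stdio.h>

	int
	main(void)
	{
		printf("%g\n", 1.5e-4);		/* 0.00015: exponent >= -4, fixed-point */
		printf("%g\n", 1.5e-5);		/* 1.5e-05: scientific, signed two-digit exponent */
		printf("%g\n", 123456.0);	/* 123456: exponent below default precision (6) */
		printf("%g\n", 1234567.0);	/* 1.23457e+06 */
		return 0;
	}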

The fixed-point output hasn't been micro-optimized to the same extent as
the rest of the code, and there is a measurable performance effect - but
it's not that significant compared to the speedup over the existing code.

As for the extent of that speedup: in my tests, float8 output is now as
fast or marginally faster than bigint output, and roughly one order of
magnitude faster than the existing float8 output.

This version of the patch continues to use extra_float_digits to select
whether Ryu output is used - but I've also changed the default, so that
Ryu is used unless you explicitly set extra_float_digits to 0 or less.

This does mean that at the moment, about 13 regression tests fail on
this patch due to the higher precision of float output. I have
intentionally not yet adjusted those results, pending discussion.

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index cf9327f885..24d41c2e89 100644
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -21,6 +21,7 @@
 
 #include "catalog/pg_type.h"
 #include "common/int.h"
+#include "common/ryu.h"
 #include "libpq/pqformat.h"
 #include "utils/array.h"
 #include "utils/float.h"
@@ -29,7 +30,7 @@
 
 
 /* Configurable GUC parameter */
-int			extra_float_digits = 0; /* Added to DBL_DIG or FLT_DIG */
+int			extra_float_digits = 1; /* Added to DBL_DIG or FLT_DIG */
 
 /* Cached constants for degree-based trig functions */
 static bool degree_consts_set = false;
@@ -246,6 +247,12 @@ float4out(PG_FUNCTION_ARGS)
 	char	   *ascii = (char *) palloc(32);
 	int			ndig = FLT_DIG + extra_float_digits;
 
+	if (extra_float_digits > 0)
+	{
+		ryu_f2s_buffered(num, ascii);
+		PG_RETURN_CSTRING(ascii);
+	}
+
 	(void) pg_strfromd(ascii, 32, ndig, num);
 	PG_RETURN_CSTRING(ascii);
 }
@@ -462,6 +469,12 @@ float8out_internal(double num)
 	char	   *ascii = (char *) palloc(32);
 	int			ndig = DBL_DIG + extra_float_digits;
 
+	if (extra_float_digits > 0)
+	{
+		ryu_d2s_buffered(num, ascii);
+		return ascii;
+	}
+
 	(void) pg_strfromd(ascii, 32, ndig, num);
 	return ascii;
 }
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 6fe1939881..6e223335bc 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2631,11 +2631,12 @@ static struct config_int ConfigureNamesInt[] =
 		{"extra_float_digits", PGC_USERSET, CLIENT_CONN_LOCALE,
 			gettext_noop("Sets the number of digits displayed for floating-point values."),
 			gettext_noop("This affects real, double precision, and geometric data types. "
-		 "The parameter value is added to the standard number of digits "
-		 "(FLT_DIG or DBL_DIG as appropriate).")
+		 "A zero or negative parameter value is added to the standard "
+		 "number of digits (FLT_DIG or DBL_DIG as appropriate). "
+		 "Any value greater than zero selects round-trip-safe output.")
 		},
 		&extra_float_digits,
-		0, -15, 3,
+		1, -15, 3,
 		NULL, NULL, NULL
 	},
 
diff --git a/src/common/Makefile b/src/common/Makefile
index ec8139f014..ae89dd8c5e 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -44,8 +44,8 @@ override CPPFLAGS += -DVAL_LIBS="\"$(LIBS)\""
 override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
 LIBS += $(PTHREAD_LIBS)
 
-OBJS_COMMON = base64.o config_info.o controldata_utils.o exec.o file_perm.o \
-	ip.o keywords.o link-canary.o md5.o pg_lzcompress.o \
+OBJS_COMMON = base64.o config_info.o controldata_utils.o d2s.o exec.o f2s.o \
+	file_perm.o ip.o keywords.o link-canary.o md5.o pg_lzcompress.o \
 	pgfnames.o psprintf.o relpath.o \
 	rmtree.o saslprep.o scram-common.o string.o unicode_norm.o \
 	username.o wait_error.o
diff --git a/src/common/d2s.c b/src/common/d2s.c
new file mode 100644
index 00..e7d7f2a14c
--- /dev/null
+++ b/src/common/d2s.c
@@ -0,0 +1,974 @@
+/*---
+ *
+ * Ryu floating-point output for double precision.
+ *
+ * Portions Copyright (c) 2018, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/common/

Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock

2018-12-13 Thread Alexander Korotkov
On Thu, Dec 13, 2018 at 8:06 PM Andrey Borodin  wrote:
> > if (BufferIsValid(lbuffer))
> > UnlockReleaseBuffer(lbuffer);
> > if (BufferIsValid(pbuffer))
> > UnlockReleaseBuffer(pbuffer);
> > if (BufferIsValid(dbuffer))
> > UnlockReleaseBuffer(dbuffer);
> > ==>
> > if (BufferIsValid(pbuffer))
> > UnlockReleaseBuffer(pbuffer);
> > if (BufferIsValid(dbuffer))
> > UnlockReleaseBuffer(dbuffer);
> > if (BufferIsValid(lbuffer))
> > UnlockReleaseBuffer(lbuffer);
>
I think that unlock order does not matter. But I may be wrong. Maybe it's
just for uniformity?

It doesn't matter, because we release all locks on every buffer at one
time.  The unlock order can have effect on what waiter will acquire
the lock next after ginRedoDeletePage().  However, I don't see why one
unlock order is better than another.  Thus, I just used the rule of
thumb to not change code when it's not necessary for bug fix.

> > On 13 Dec 2018, at 21:48, Tom Lane wrote:
> >
> > Bruce Momjian  writes:
> >> I am seeing this warning in the 9.4 branch:
> >>  ginxlog.c:756:5: warning: ‘lbuffer’ may be used uninitialized
> >>  in this function [-Wmaybe-uninitialized]
> >
> > I'm also getting that, just in 9.4, but at a different line number:
> >
> > ginxlog.c: In function 'ginRedoDeletePage':
> > ginxlog.c:693: warning: 'lbuffer' may be used uninitialized in this function
> >
> > That's with gcc version 4.4.7 20120313 (Red Hat 4.4.7-23)
>
>
That's the same variable; one place is the definition while the other is
the potential misuse.
> Seems like these 2 lines [0]
>
> +   if (BufferIsValid(lbuffer))
> +   UnlockReleaseBuffer(lbuffer);
>
> are superfluous: lbuffer is UnlockReleased earlier.

Thanks to everybody for noticing!  Speaking more generally, the backpatch
to 9.4 was completely wrong: there were multiple errors.  It's my
oversight; I forgot how much better our xlog system has become since 9.4
:)

I've pushed bf0e5a73be fixing that.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Reorganize collation lookup time and place

2018-12-13 Thread Andres Freund
Hi,

On 2018-12-13 15:05:40 +0100, Peter Eisentraut wrote:
> On 12/12/2018 18:56, Andres Freund wrote:
> >> This makes the collation lookup more similar to the function lookup.  In
> >> many cases they now happen right next to each other.
> >> pg_newlocale_from_collation() becomes analogous to fmgr_info() and
> >> pg_locale_t to FmgrInfo *.
> > Why isn't this integrated into fmgr_info()?
> 
> fmgr_info() looks up stuff in pg_proc.  pg_newlocale_...() looks up
> stuff in pg_collation.  There is no overlap between them.

It looks up stuff necessary for calling a function, and that doesn't fit
looking up the collation necessary to do so too badly. A lot of the
changes you made are rote changes to each caller, taking the collation
oid and expanding it with pg_newlocale_from_collation().  It seems not
necessary to force every external user to do rote changes like:

 	fmgr_info(opexpr->opfuncid, finfo);
 	fmgr_info_set_expr((Node *) node, finfo);
 	InitFunctionCallInfoData(*fcinfo, finfo, 2,
-							 opexpr->inputcollid, NULL, NULL);
+							 pg_newlocale_from_collation(opexpr->inputcollid), NULL, NULL);
...
-	h = FunctionCall1Coll(array_extra_data->hash, DEFAULT_COLLATION_OID, d);
+	h = FunctionCall1Coll(array_extra_data->hash, pg_newlocale_from_collation(DEFAULT_COLLATION_OID), d);


A lot of the new pg_newlocale_from_collation() calls go a fair bit over
the customary line length limit.

As it stands I have another patch that wants to change the function call
interface, including for extensions:
https://www.postgresql.org/message-id/20180605172952.x34m5uz6ju6enaem%40alap3.anarazel.de
so perhaps we can just bite the bullet and do both.

But I'm somewhat doubtful it's useful to expose having to look up
pg_newlocale_from_collation() at the callsites to every single caller of
functions in the codebase. You say that's a separate concern of
initializing function calls; for me it's unnecessarily exposing details
that seem likely to change.


> There is also not necessarily a one-to-one correspondence between
> function and collation lookup calls.  For example, in some cases you
> need to look up a sorting and a hashing function, but only one
> collation for both.

That seems rare enough not to matter, performance-wise.

Greetings,

Andres Freund



Re: Remove Deprecated Exclusive Backup Mode

2018-12-13 Thread David Steele
On 12/13/18 2:00 PM, Peter Eisentraut wrote:
> On 11/12/2018 23:24, David Steele wrote:
>> @@ -845,7 +838,7 @@ test ! -f 
>> /mnt/server/archivedir/000100A90065 && cp pg_wal/0
>>   rights to run pg_start_backup (superuser, or a user who has been 
>> granted
>>   EXECUTE on the function) and issue the command:
>>  
>> -SELECT pg_start_backup('label', false, false);
>> +SELECT pg_start_backup('label', false);
>>  
>>   where label is any string you want to use to 
>> uniquely
>>   identify this backup operation. The connection
> 
> Is it good to change the meaning of an existing interface like that?

Good question.

We could leave the third parameter (changing the default to false) and
error if it has any value but false.  It's a bit ugly, but it does
maintain compatibility with the current non-exclusive backup interface.
And the third parameter would never need to be specified unless we add
a fourth.
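
A rough sketch of that shim (hypothetical, not a proposal;
do_pg_start_backup_guts() stands in for the real implementation):

Datum
pg_start_backup(PG_FUNCTION_ARGS)
{
	text	   *backupid = PG_GETARG_TEXT_PP(0);
	bool		fast = PG_GETARG_BOOL(1);
	bool		exclusive = PG_GETARG_BOOL(2);	/* default now false */

	if (exclusive)
		ereport(ERROR,
				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
				 errmsg("exclusive backup mode has been removed"),
				 errhint("Pass false, or omit the argument entirely.")));

	PG_RETURN_LSN(do_pg_start_backup_guts(backupid, fast));
}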

Or we could add a new function and deprecate this one -- but what to
call it?

I agree it's not very cool to break code for users who actually *did*
migrate to non-exclusive backups.

-- 
-David
da...@pgmasters.net



Re: Upgrading pg_statistic to handle collation honestly

2018-12-13 Thread Tom Lane
Peter Eisentraut  writes:
> On 12/12/2018 16:57, Tom Lane wrote:
>> * Probably this conflicts to some extent with Peter's "Reorganize
>> collation lookup" patch, but I haven't studied that yet.

> I've looked it over, and it's nothing that can't be easily fixed up.  In
> fact, it simplifies a few things, so I'm in favor of moving your patch
> along first.

Thanks for looking!  I'll take a closer look at the loose ends I mentioned
and then push it.

regards, tom lane



Re: Remove Deprecated Exclusive Backup Mode

2018-12-13 Thread David Steele
On 12/13/18 2:02 PM, Peter Eisentraut wrote:
> On 12/12/2018 05:09, David Steele wrote:
>> I think the patch stands on its own.  Exclusive backup mode is broken
>> and is known to cause problems in the field.
> 
> Is this breakage documented anywhere for users?

Yes:

https://www.postgresql.org/docs/11/continuous-archiving.html

"Note that if the server crashes during the backup it may not be
possible to restart until the backup_label file has been manually
deleted from the PGDATA directory."

Seems like the wording could be stronger.

-- 
-David
da...@pgmasters.net



Re: Is DLIST_STATIC_INIT() a net loss?

2018-12-13 Thread Andres Freund
Hi,

On 2018-12-13 12:35:15 -0500, Tom Lane wrote:
> I happened to notice today that the initializer macro for dlist_head
> variables is
>
> #define DLIST_STATIC_INIT(name) {{&(name).head, &(name).head}}
>
> However, all the functions that work with dlists are prepared to handle
> a dlist_head that starts out as zeroes, so that this could also be
>
> #define DLIST_STATIC_INIT(name) {{NULL, NULL}}

Historically that's because my patch initially didn't handle structs
starting out as zeroes, and that was changed over my objections. I
still think those zero checks are a waste of cycles.

> I submit that we'd be better off with the latter.  The number of cycles
> that the linker and loader expend on getting those non-constant values
> correctly set up (especially in PIE builds) probably dwarfs what it
> costs for the first dlist access to initialize them.  It's an especially
> obvious loss in processes that never touch the particular dlist
> at all.
>
> Another thought is that maybe we should deprecate the use of the
> [DS]LIST_STATIC_INIT macros altogether, and just write
>
> static dlist_head myheader;

But if we're not going to change that - which I'd vote for, but don't
foresee happening - we should indeed just skip the initialization;
there's really not much point.

Greetings,

Andres Freund



Re: Remove Deprecated Exclusive Backup Mode

2018-12-13 Thread Peter Eisentraut
On 12/12/2018 05:09, David Steele wrote:
> I think the patch stands on its own.  Exclusive backup mode is broken
> and is known to cause problems in the field.

Is this breakage documented anywhere for users?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Remove Deprecated Exclusive Backup Mode

2018-12-13 Thread Peter Eisentraut
On 11/12/2018 23:24, David Steele wrote:
> @@ -845,7 +838,7 @@ test ! -f /mnt/server/archivedir/000100A90065 
> && cp pg_wal/0
>   rights to run pg_start_backup (superuser, or a user who has been granted
>   EXECUTE on the function) and issue the command:
>  
> -SELECT pg_start_backup('label', false, false);
> +SELECT pg_start_backup('label', false);
>  
>   where label is any string you want to use to uniquely
>   identify this backup operation. The connection

Is it good to change the meaning of an existing interface like that?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Remove Deprecated Exclusive Backup Mode

2018-12-13 Thread Peter Eisentraut
On 12/12/2018 05:31, Robert Haas wrote:
> Most of the features I've been involved in removing have been
> deprecated for 5+ years.  The first release where this one was
> deprecated was only 2 years ago.  So it feels dramatically faster to
> me than what I think we have typically done.

I was just looking this up as well, and I find it too fast.  The
nonexclusive backups were introduced in 9.6.  So I'd say that we could
remove the exclusive ones when 9.5 goes EOL.  (That would mean this
patch could be submitted for PostgreSQL 13, since 9.5 will go out of
support around the time PG13 would be released.)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Change pgarch_readyXlog() to return .history files first

2018-12-13 Thread David Steele
On 12/13/18 11:53 AM, David Steele wrote:
> Hackers,
> 
> The alphabetical ordering of pgarch_readyXlog() means that on promotion
> 000000010000000100000001.partial will be archived before 00000002.history.
> 
> This appears harmless, but the .history files are what other potential
> primaries use to decide what timeline they should pick.  The additional
> latency of compressing/transferring the much larger partial file means
> that archiving of the .history file is delayed and greatly increases the
> chance that another primary will promote to the same timeline.
> 
> Teach pgarch_readyXlog() to return .history files first (and in order)
> to reduce the window where this can happen.  This won't prevent all
> conflicts, but it is a simple change and should greatly reduce
> real-world occurrences.
> 
> I also think we should consider back-patching this change.  It's hard to
> imagine that archive commands would have trouble with this reordering
> and the current ordering causes real pain in HA clusters.

Some gcc versions wanted more parens, so updated in attached.

-- 
-David
da...@pgmasters.net
From d5fd60b4e329fe99b2f6565355014040dcdcee26 Mon Sep 17 00:00:00 2001
From: David Steele 
Date: Thu, 13 Dec 2018 12:58:27 -0500
Subject: [PATCH] Change pgarch_readyXlog() to return .history files first.

The alphabetical ordering of pgarch_readyXlog() means that on promotion
000000010000000100000001.partial will be archived before 00000002.history.

This appears harmless, but the .history files are what other potential primaries
use to decide what timeline they should pick.  The additional latency of
compressing/transferring the much larger partial file means that archiving of
the .history file is delayed and greatly increases the chance that another
primary will promote to the same timeline.

Teach pgarch_readyXlog() to return .history files first (and in order) to reduce
the window where this can happen.  This won't prevent all conflicts, but it is a
simple change and should greatly reduce real-world occurrences.
---
 src/backend/postmaster/pgarch.c | 30 +++---
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 0dcf609f19..95898080d7 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -704,11 +704,11 @@ pgarch_archiveXlog(char *xlog)
  * 2) because the oldest ones will sooner become candidates for
  * recycling at time of checkpoint
  *
- * NOTE: the "oldest" comparison will presently consider all segments of
- * a timeline with a smaller ID to be older than all segments of a timeline
- * with a larger ID; the net result being that past timelines are given
- * higher priority for archiving.  This seems okay, or at least not
- * obviously worth changing.
+ * NOTE: the "oldest" comparison will consider any .history file to be older
+ * than any other file except another .history file.  Segments on a timeline
+ * with a smaller ID will be older than all segments on a timeline with a larger
+ * ID; the net result being that past timelines are given higher priority for
+ * archiving.  This seems okay, or at least not obviously worth changing.
  */
 static bool
 pgarch_readyXlog(char *xlog)
@@ -724,6 +724,7 @@ pgarch_readyXlog(char *xlog)
DIR*rldir;
struct dirent *rlde;
boolfound = false;
+   boolhistoryFound = false;
 
snprintf(XLogArchiveStatusDir, MAXPGPATH, XLOGDIR "/archive_status");
rldir = AllocateDir(XLogArchiveStatusDir);
@@ -737,12 +738,27 @@ pgarch_readyXlog(char *xlog)
strspn(rlde->d_name, VALID_XFN_CHARS) >= basenamelen &&
strcmp(rlde->d_name + basenamelen, ".ready") == 0)
{
-   if (!found)
+   /* Is this a history file? */
+   bool history = basenamelen >= sizeof(".history") &&
+   strcmp(rlde->d_name + (basenamelen - sizeof(".history") + 1),
+  ".history.ready") == 0;
+
+   /*
+* If this is the first file or the first history file, copy it
+*/
+   if (!found || (history && !historyFound))
{
strcpy(newxlog, rlde->d_name);
found = true;
+   historyFound = history;
}
-   else
+   /*
+* Else compare and see if this file is alphabetically less than
+* what we already have.  If so, copy it.  History files always get
+* this comparison but other files only do when no history file has
+* been found yet.
+*/
+   

Re: tab-completion debug print

2018-12-13 Thread Peter Eisentraut
On 13/12/2018 12:07, Kyotaro HORIGUCHI wrote:
> - v6-0002-Add-psql-g-option-to-control-debug-print.patch
> 
>   Applies on top of 0001. Code is always active; -g in addition to
>   -L activates debug print into the log file. If only -g is
>   specified, it is forcibly turned off.
> 
>   > $ psql postgres -g
>   > psql: no session log file, turn off debug print
>   > psql (12devel)
>   > Type "help" for help.
> 
>   The -g option shows in the help message (perhaps arguable).

I'm not sure that this should be a run-time option.  But we surely don't
want to use up a good short option letter for this.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Cache lookup errors with functions manipulation object addresses

2018-12-13 Thread Alvaro Herrera
On 2018-Dec-13, Michael Paquier wrote:

> > I support 0001 other than those two points.
> 
> Attached is an updated version for that as 0001.  Thanks for the
> review.  Does that part look good to you now?

+1.

> > +SELECT * FROM pg_identify_object_as_address('pg_proc'::regclass, 0, 0); -- no function
> > + type | object_names | object_args 
> > +------+--------------+-------------
> > +      | {}           | {}
> > +(1 row)
> > 
> > All the other cases you add have a non-empty value in "type".
> 
> On this one I would expect NULL for object_names and object_args, as
> empty does not make sense for an undefined object, still "type" should
> print...  "type".

Hmm ... "routine"?

I'm not sure if NULLs are better than empty arrays, but I agree that we
should pick one representation for undefined object and use it
consistently for all object types.

> > I think this is our chance to fix the mess.  Previously (before I added
> > the SQL-access of the object addressing mechanism in 9.5 I mean) it
> > wasn't possible to get at this code at all with arbitrary input, so this
> > is all a "new" problem (I mean new in the last 5 years).

Obviously I failed to subtract 11 from 9.5 correctly ... I meant 4
years.


> This requires a bit more work with the low-level APIs grabbing the
> printed information though.  And I think that the following guidelines
> make sense to work on as a base definition for undefined objects:
> - pg_identify_object returns the object type name, NULL for the other fields.
> - pg_describe_object returns just NULL.
> - pg_identify_object_as_address returns the object type name and NULL
> for the other fields.

Sounds good to me.

> There is some more refactoring work still needed for constraints, large
> objects and functions, in a way similar to a26116c6.  I am pretty happy
> with the shape of 0001, so this could be applied.  0002 still needs to be
> reworked so that all undefined object types behave as described above in a
> consistent manner.  Do those definitions make sense?

I think so, yes.

Thanks for taking care of this.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Add timeline to partial WAL segments

2018-12-13 Thread David Steele
On 12/13/18 8:35 AM, David Steele wrote:
> On 12/12/18 7:17 PM, Michael Paquier wrote:
>> On Wed, Dec 12, 2018 at 07:54:05AM -0500, David Steele wrote:
> 
>>> But, we could at least use the . notation and end up with something like
>>> 000000010000000100000001.00000002.partial or perhaps
>>> 000000010000000100000001.T00000002.partial?  Maybe
>>> 000000010000000100000001.00000002.tpartial?
>>>

I updated the patch to use 000000010000000100000001.00000002.partial
since this is more consistent with what we do in other cases.

-- 
-David
da...@pgmasters.net
From 7eda6932eab47cd25a4fa25a8a5575b41b51c9ff Mon Sep 17 00:00:00 2001
From: David Steele 
Date: Thu, 13 Dec 2018 12:20:43 -0500
Subject: [PATCH] Add timeline to partial WAL segments.

If two clusters are promoted from the same timeline on the same WAL segment, 
they will both archive a .partial file with the same name even if they select 
different timelines to promote to.

If one partially promotes and later fails we want to be able to promote another 
cluster.  This duplicate .partial file causes problems for archive commands that
won't overwrite an existing file.  Even if the archive command checks the
contents they might not be binary identical due to differing unused space at the
end of the file or because one cluster accepted more transactions.

Add the timeline being promoted *to* into the .partial filename to prevent the 
conflict, e.g. 000000010000000100000001.00000002.partial.

If more than one cluster tries to promote to the same timeline there will still 
be a conflict but this is desirable.  Once a cluster has claimed a timeline no 
other cluster should be archiving on it.
---
 src/backend/access/transam/xlog.c | 12 
 src/include/postmaster/pgarch.h   |  2 +-
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c80b14ed97..be4da2c3d0 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7624,8 +7624,10 @@ StartupXLOG(void)
 * the completed version of the same segment later, it will fail. (We
 * used to do that in 9.4 and below, and it caused such problems).
 *
-* As a compromise, we rename the last segment with the .partial
-* suffix, and archive it. Archive recovery will never try to read
+* As a compromise, we rename the last segment with the new timeline and
+* .partial suffix, and archive it. The timeline is added in case there
+* are multiple promotions from the same timeline before a stable
+* primary emerges.  Archive recovery will never try to read
 * .partial segments, so they will normally go unused. But in the odd
 * PITR case, the administrator can copy them manually to the pg_wal
 * directory (removing the suffix). They can be useful in debugging,
@@ -7653,8 +7655,10 @@ StartupXLOG(void)
			char		partialpath[MAXPGPATH];
 
			XLogFilePath(origpath, EndOfLogTLI, endLogSegNo, wal_segment_size);
-			snprintf(partialfname, MAXFNAMELEN, "%s.partial", origfname);
-			snprintf(partialpath, MAXPGPATH, "%s.partial", origpath);
+			snprintf(partialfname, MAXFNAMELEN, "%s.%08X.partial",
+					 origfname, ThisTimeLineID);
+			snprintf(partialpath, MAXPGPATH, "%s.%08X.partial", origpath,
+					 ThisTimeLineID);
 
			/*
			 * Make sure there's no .done or .ready file for the .partial
diff --git a/src/include/postmaster/pgarch.h b/src/include/postmaster/pgarch.h
index 292e63a26a..504ab7b9dc 100644
--- a/src/include/postmaster/pgarch.h
+++ b/src/include/postmaster/pgarch.h
@@ -23,7 +23,7 @@
  * --
  */
 #define MIN_XFN_CHARS  16
-#define MAX_XFN_CHARS  40
+#define MAX_XFN_CHARS  41
 #define VALID_XFN_CHARS "0123456789ABCDEF.history.backup.partial"
 
 /* --
-- 
2.17.2 (Apple Git-113)




Is DLIST_STATIC_INIT() a net loss?

2018-12-13 Thread Tom Lane
I happened to notice today that the initializer macro for dlist_head
variables is

#define DLIST_STATIC_INIT(name) {{&(name).head, &(name).head}}

However, all the functions that work with dlists are prepared to handle
a dlist_head that starts out as zeroes, so that this could also be

#define DLIST_STATIC_INIT(name) {{NULL, NULL}}
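
(For concreteness, the zero handling in src/include/lib/ilist.h looks
roughly like this, with dlist_check() elided:)

static inline void
dlist_push_head(dlist_head *head, dlist_node *node)
{
	if (head->head.next == NULL)	/* convert NULL header to circular */
		dlist_init(head);

	node->next = head->head.next;
	node->prev = &head->head;
	node->next->prev = node;
	head->head.next = node;
}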

I submit that we'd be better off with the latter.  The number of cycles
that the linker and loader expend on getting those non-constant values
correctly set up (especially in PIE builds) probably dwarfs what it
costs for the first dlist access to initialize them.  It's an especially
obvious loss in processes that never touch the particular dlist
at all.

Another thought is that maybe we should deprecate the use of the
[DS]LIST_STATIC_INIT macros altogether, and just write

static dlist_head myheader;

leaving the compiler to drop such variables into a BSS area instead
of an initialized-data area.  I'm not very sure how much that saves,
but I bet it saves something.

regards, tom lane



Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock

2018-12-13 Thread Andrey Borodin
Hi!


> On 13 Dec 2018, at 17:03, Alexander Korotkov  wrote:
> 
> Thank you.  I've revised your patch and pushed it, as well as two
> other patches in this thread.

That's great! Thanks!

> On 13 Dec 2018, at 20:12, chjis...@163.com wrote:
> 
> 
> hi
> I Have a question. Why the order of unlocking is not adjusted in this patch? 
> like this:
> 
> if (BufferIsValid(lbuffer))
> UnlockReleaseBuffer(lbuffer);
> if (BufferIsValid(pbuffer))
> UnlockReleaseBuffer(pbuffer);
> if (BufferIsValid(dbuffer))
> UnlockReleaseBuffer(dbuffer);
> ==>
> if (BufferIsValid(pbuffer))
> UnlockReleaseBuffer(pbuffer);
> if (BufferIsValid(dbuffer))
> UnlockReleaseBuffer(dbuffer);
> if (BufferIsValid(lbuffer))
> UnlockReleaseBuffer(lbuffer);

I think that unlock order does not matter. But I may be wrong. Maybe just for
uniformity?


> On 13 Dec 2018, at 21:48, Tom Lane  wrote:
> 
> Bruce Momjian  writes:
>> I am seeing this warning in the 9.4 branch:
>>  ginxlog.c:756:5: warning: ‘lbuffer’ may be used uninitialized
>>  in this function [-Wmaybe-uninitialized]
> 
> I'm also getting that, just in 9.4, but at a different line number:
> 
> ginxlog.c: In function 'ginRedoDeletePage':
> ginxlog.c:693: warning: 'lbuffer' may be used uninitialized in this function
> 
> That's with gcc version 4.4.7 20120313 (Red Hat 4.4.7-23)


That's the same variable; one place is the definition while the other is the
potential misuse.
Seems like these 2 lines [0]

+   if (BufferIsValid(lbuffer))
+   UnlockReleaseBuffer(lbuffer);

are superfluous: lbuffer is UnlockReleased earlier.



Best regards, Andrey Borodin.

[0] 
https://github.com/postgres/postgres/commit/19cf52e6cc33b9e126775f28269b6ddf7e30#diff-ed6446a8993c76d2884ec6413e49a6b6R757


Change pgarch_readyXlog() to return .history files first

2018-12-13 Thread David Steele
Hackers,

The alphabetical ordering of pgarch_readyXlog() means that on promotion
000000010000000100000001.partial will be archived before 00000002.history.

This appears harmless, but the .history files are what other potential
primaries use to decide what timeline they should pick.  The additional
latency of compressing/transferring the much larger partial file means
that archiving of the .history file is delayed and greatly increases the
chance that another primary will promote to the same timeline.

Teach pgarch_readyXlog() to return .history files first (and in order)
to reduce the window where this can happen.  This won't prevent all
conflicts, but it is a simple change and should greatly reduce
real-world occurrences.

I also think we should consider back-patching this change.  It's hard to
imagine that archive commands would have trouble with this reordering
and the current ordering causes real pain in HA clusters.
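
To see the ordering concretely (a standalone snippet; the two names are
the status files left behind after a promotion from timeline 1):

#include <stdio.h>
#include <string.h>

int
main(void)
{
	const char *partial = "000000010000000100000001.partial.ready";
	const char *history = "00000002.history.ready";

	/* pgarch_readyXlog() keeps the strcmp()-smallest candidate */
	if (strcmp(partial, history) < 0)
		puts("partial archives first");	/* this branch is taken */
	else
		puts("history archives first");
	return 0;
}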

Regards,
-- 
-David
da...@pgmasters.net
From 2279697ce828e825066065e3e40c9926633523ad Mon Sep 17 00:00:00 2001
From: David Steele 
Date: Thu, 13 Dec 2018 11:00:37 -0500
Subject: [PATCH] Change pgarch_readyXlog() to return .history files first

The alphabetical ordering of pgarch_readyXlog() means that on promotion
000000010000000100000001.partial will be archived before 00000002.history.

This appears harmless, but the .history files are what other potential primaries
use to decide what timeline they should pick.  The additional latency of 
compressing/transferring the much larger partial file means that archiving of 
the .history file is delayed and greatly increases the chance that another 
primary will promote to the same timeline.

Teach pgarch_readyXlog() to return .history files first (and in order) to reduce
the window where this can happen.  This won't prevent all conflicts, but it is a
simple change and should greatly reduce real-world occurrences.
---
 src/backend/postmaster/pgarch.c | 30 +++---
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 0dcf609f19..5dbab6a3d7 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -704,11 +704,11 @@ pgarch_archiveXlog(char *xlog)
  * 2) because the oldest ones will sooner become candidates for
  * recycling at time of checkpoint
  *
- * NOTE: the "oldest" comparison will presently consider all segments of
- * a timeline with a smaller ID to be older than all segments of a timeline
- * with a larger ID; the net result being that past timelines are given
- * higher priority for archiving.  This seems okay, or at least not
- * obviously worth changing.
+ * NOTE: the "oldest" comparison will consider any .history file to be older
+ * than any other file except another .history file.  Segments on a timeline
+ * with a smaller ID will be older than all segments on a timeline with a larger
+ * ID; the net result being that past timelines are given higher priority for
+ * archiving.  This seems okay, or at least not obviously worth changing.
  */
 static bool
 pgarch_readyXlog(char *xlog)
@@ -724,6 +724,7 @@ pgarch_readyXlog(char *xlog)
DIR*rldir;
struct dirent *rlde;
boolfound = false;
+   boolhistoryFound = false;
 
snprintf(XLogArchiveStatusDir, MAXPGPATH, XLOGDIR "/archive_status");
rldir = AllocateDir(XLogArchiveStatusDir);
@@ -737,12 +738,27 @@ pgarch_readyXlog(char *xlog)
strspn(rlde->d_name, VALID_XFN_CHARS) >= basenamelen &&
strcmp(rlde->d_name + basenamelen, ".ready") == 0)
{
-   if (!found)
+   /* Is this a history file? */
+   bool history = basenamelen >= sizeof(".history") &&
+   strcmp(rlde->d_name + (basenamelen - sizeof(".history") + 1),
+  ".history.ready") == 0;
+
+   /*
+* If this is the first file or the first history file, copy it
+*/
+   if (!found || history && !historyFound)
{
strcpy(newxlog, rlde->d_name);
found = true;
+   historyFound = history;
}
-   else
+   /*
+* Else compare and see if this file is alphabetically less than
+* what we already have.  If so, copy it.  History files always get
+* this comparison but other files only do when no history file has
+* been found yet.
+*/
+   else if (history || !history && !historyFound)
{
if (st

Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock

2018-12-13 Thread Tom Lane
Bruce Momjian  writes:
> I am seeing this warning in the 9.4 branch:
>   ginxlog.c:756:5: warning: ‘lbuffer’ may be used uninitialized
>   in this function [-Wmaybe-uninitialized]

I'm also getting that, just in 9.4, but at a different line number:

ginxlog.c: In function 'ginRedoDeletePage':
ginxlog.c:693: warning: 'lbuffer' may be used uninitialized in this function

That's with gcc version 4.4.7 20120313 (Red Hat 4.4.7-23)

regards, tom lane



Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock

2018-12-13 Thread Bruce Momjian
On Thu, Dec 13, 2018 at 03:03:54PM +0300, Alexander Korotkov wrote:
> On Wed, Dec 12, 2018 at 4:08 PM Andrey Borodin  wrote:
> > > On 12 Dec 2018, at 3:26, Alexander Korotkov  wrote:
> > >
> > > BTW, I still can't see you're setting deleteXid field of
> > > ginxlogDeletePage struct anywhere.
> > Oops. Fixed.
> > >
> > > Also, I note that protection against re-usage of recently deleted
> > > pages is implemented differently than it is in B-tree.
> > > 1) You check TransactionIdPrecedes(GinPageGetDeleteXid(page),
> > > RecentGlobalDataXmin)), while B-tree checks
> > > TransactionIdPrecedes(opaque->btpo.xact, RecentGlobalXmin).  Could you
> > > clarify why do we use RecentGlobalDataXmin instead of RecentGlobalXmin
> > > here?
> > As far as I understand, RecentGlobalDataXmin may be slightly ahead of 
> > RecentGlobalXmin when replication_slot_catalog_xmin is holding back 
> > RecentGlobalXmin. And GIN is never used by catalog tables. But let's use 
> > RecentGlobalXmin like in B-tree.
> >
> > > 2) B-tree checks this condition both before putting page into FSM and
> > > after getting page from FSM.  You're checking only after getting page
> > > from FSM.  The B-tree approach looks better to me.  It seems more
> > > consistent when FSM pages are really free for usage.
> > Fixed.
> 
> Thank you.  I've revised your patch and pushed it.  As long as two
> other patches in this thread.

I am seeing this warning in the 9.4 branch:

ginxlog.c:756:5: warning: ‘lbuffer’ may be used uninitialized
in this function [-Wmaybe-uninitialized]

from this commit:

commit 19cf52e6cc33b9e126775f28269b6ddf7e30
Author: Alexander Korotkov 
Date:   Thu Dec 13 06:12:25 2018 +0300

Prevent deadlock in ginRedoDeletePage()

On standby ginRedoDeletePage() can work concurrently with read-only queries.
Those queries can traverse the posting tree in two ways.
1) Using rightlinks by ginStepRight(), which locks the next page before
   unlocking its left sibling.
2) Using downlinks by ginFindLeafPage(), which locks at most one page at a time.

Original lock order was: page, parent, left sibling.  That lock order can
deadlock with ginStepRight().  In order to prevent deadlock this commit changes
the lock order to: left sibling, page, parent.  Note that the position of the
parent in the locking order seems insignificant, because we only lock one page
at a time while traversing downlinks.

Reported-by: Chen Huajun
Diagnosed-by: Chen Huajun, Peter Geoghegan, Andrey Borodin
Discussion: 
https://postgr.es/m/31a702a.14dd.166c1366ac1.Coremail.chjischj%40163.com
Author: Alexander Korotkov
Backpatch-through: 9.4

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock

2018-12-13 Thread chjis...@163.com
hi
I Have a question. Why the order of unlocking is not adjusted in this patch? like this:

if (BufferIsValid(lbuffer))
UnlockReleaseBuffer(lbuffer);
if (BufferIsValid(pbuffer))
UnlockReleaseBuffer(pbuffer);
if (BufferIsValid(dbuffer))
UnlockReleaseBuffer(dbuffer);
==>
if (BufferIsValid(pbuffer))
UnlockReleaseBuffer(pbuffer);
if (BufferIsValid(dbuffer))
UnlockReleaseBuffer(dbuffer);
if (BufferIsValid(lbuffer))
UnlockReleaseBuffer(lbuffer);

On Tue, Dec 11, 2018 at 1:50 AM Alexander Korotkov  wrote:
> On Sun, Dec 9, 2018 at 10:25 PM Alexander Korotkov  wrote:
> > On Sat, Dec 8, 2018 at 12:48 PM Andrey Borodin  wrote:
> > > > On 8 Dec 2018, at 6:54, Alexander Korotkov  wrote:
> > > >
> > > > Yep, please find attached draft patch.
> > >
> > > Patch seems good to me, I'll check it in more detail.
> > > The patch gets posting item at FirstOffsetNumber instead of
> > > btree->getLeftMostChild(). This seem OK, since dataGetLeftMostPage()
> > > is doing just the same, but with few Assert()s.
> >
> > I'd like to evade creating GinBtree for just calling
> > getLeftMostChild().  Also, few more places in ginvacuum.c do the same.
> > We have the same amount of Assert()s in ginVacuumPostingTreeLeaves().
> > So, let's keep it uniform.
> >
> > I would also like Peter Geoghegan to take a look at this patch before
> > committing it.
>
> I've slightly adjusted commit message.  I'm going to commit this fix
> if no objections.

Please also find patch changing lock order in ginRedoDeletePage()
preventing deadlock on standby.  I'm going to commit it too.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: 'infinity'::Interval should be added

2018-12-13 Thread Tom Lane
Chapman Flack  writes:
> On 12/13/18 04:41, Simon Riggs wrote:
>> [ we should allow this: ]
>> SELECT 'infinity'::interval;
>> ERROR:  invalid input syntax for type interval: "infinity"

> ... and if that is made to work, perhaps then arithmetic should be
> updated as well,

That wouldn't be optional.  All built-in functions on interval
would *have* to be updated to deal sanely with the new values.

(Fortunately, there aren't that many of them, IIRC.)

regards, tom lane



Re: alternative to PG_CATCH

2018-12-13 Thread Tom Lane
Andrew Dunstan  writes:
> But the block inside parentheses feels kinda funny, doesn't it?

+1.  I think this is a good concept, but let's put in enough effort to
not require weird syntax.

regards, tom lane



Re: Reorganize collation lookup time and place

2018-12-13 Thread Peter Eisentraut
On 12/12/2018 18:56, Andres Freund wrote:
>> This makes the collation lookup more similar to the function lookup.  In
>> many cases they now happen right next to each other.
>> pg_newlocale_from_collation() becomes analogous to fmgr_info() and
>> pg_locale_t to FmgrInfo *.
> Why isn't this integrated into fmgr_info()?

fmgr_info() looks up stuff in pg_proc.  pg_newlocale_...() looks up
stuff in pg_collation.  There is no overlap between them.

There is also not necessarily a one-to-one correspondence between
function and collation lookup calls.  For example, in some cases you
need to look up a sorting and a hashing function, but only one
collation for both.
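
(As a sketch of that pattern -- the oids and FmgrInfo variables here are
illustrative, not from the patch:)

	FmgrInfo	sortinfo, hashinfo;
	pg_locale_t	locale;

	/* one collation lookup ... */
	locale = pg_newlocale_from_collation(collation_oid);

	/* ... feeding two separate function lookups */
	fmgr_info(sort_proc_oid, &sortinfo);
	fmgr_info(hash_proc_oid, &hashinfo);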

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: 'infinity'::Interval should be added

2018-12-13 Thread Chapman Flack
On 12/13/18 04:41, Simon Riggs wrote:
> SELECT 'infinity'::timestamp;
> works
> 
> SELECT 'infinity'::interval;
> ERROR:  invalid input syntax for type interval: "infinity"

... and if that is made to work, perhaps then arithmetic should be
updated as well, with this producing an infinite interval result:

SELECT timestamptz 'infinity' - current_timestamp;
ERROR:  cannot subtract infinite timestamps

ISO (2011 draft) seems to have nothing to say about infinity, for
datetimes or intervals.

-Chap



Re: alternative to PG_CATCH

2018-12-13 Thread David Steele
On 12/13/18 7:26 AM, Kyotaro HORIGUCHI wrote:
> At Thu, 13 Dec 2018 11:33:39 +0100, Peter Eisentraut 
>  wrote in 
> 
>>
>> I've played with a way to express this more compactly:
>>
>> PG_TRY();
>> {
>> ... code that might throw ereport(ERROR) ...
>> }
>> PG_FINALLY({
>> cleanup();
>> });
>>
>> See attached patch for how this works out in practice.

+1

> Though I didn't look into the individual changes, this kind of
> refactoring looks good to me. But the syntax looks
> somewhat.. uh..
> 
> I'm not sure it is actually workable, but I suspect that what we
> need here is just a shortcut of 'PG_CATCH();{PG_RE_THROW();}'.
> Something like this:
> 
> #define PG_FINALLY()\
> } \
> else \
> { \
> PG_exception_stack = save_exception_stack; \
> error_context_stack = save_context_stack; \
> PG_RE_THROW();\
> } \
> PG_exception_stack = save_exception_stack;\
> error_context_stack = save_context_stack; \
> {
> 
> Which can be used as:
> 
> PG_TRY();
> {
> ... code that might throw ereport(ERROR) ...
> }
> PG_FINALLY();
> {
> cleanup();
> }
> PG_TRY_END();

I like this syntax better.  We use something very similar in the
pgBackRest project:

TRY_BEGIN()
{

}
CATCH(Error1)
{

}
CATCH(Error2)
{

}
CATCH_ANY()
{

}
FINALLY()
{

}
TRY_END();

The syntax works out simpler if the FINALLY is part of the TRY block.

See attached for the implementation.

Regards,
-- 
-David
da...@pgmasters.net
/***
Error Handler

Implement a try ... catch ... finally error handler.

TRY_BEGIN()
{

}
CATCH(Error1)
{

}
CATCH(Error2)
{

}
CATCH_ANY()
{

}
FINALLY()
{

}
TRY_END();

The CATCH() and FINALLY() blocks are optional but at least one must be 
specified.  There is no need for a TRY block by itself
because errors will automatically be propagated to the nearest try block in the 
call stack.

IMPORTANT: If a local variable of the function containing a TRY block is 
modified in the TRY_BEGIN() block and used later in the
function after an error is thrown, that variable must be declared "volatile" if
preserving the value is important.  Beware that
gcc's -Wclobbered warnings are almost entirely useless for catching such issues.

IMPORTANT: Never call return from within any of the error-handling blocks.
***/
#ifndef COMMON_ERROR_H
#define COMMON_ERROR_H

#include 
#include 
#include 

/***
Error type object
***/
typedef struct ErrorType ErrorType;

// Macro for declaring new error types
#define ERROR_DECLARE(name)                                                    \
    extern const ErrorType name

// Include error type declarations
#include "common/error.auto.h"

/***
Functions to get information about a generic error type
***/
int errorTypeCode(const ErrorType *errorType);
const ErrorType *errorTypeFromCode(int code);
const char *errorTypeName(const ErrorType *errorType);
const ErrorType *errorTypeParent(const ErrorType *errorType);
bool errorTypeExtends(const ErrorType *child, const ErrorType *parent);

/***
Functions to get information about the current error
***/
const ErrorType *errorType(void);
int errorCode(void);
const char *errorFileName(void);
const char *errorFunctionName(void);
int errorFileLine(void);
bool errorInstanceOf(const ErrorType *errorTypeTest);
const char *errorMessage(void);
const char *errorName(void);
const char *errorStackTrace(void);

/***
Functions to get information about the try stack
***/
unsigned int errorTryDepth(void);

/*

Re: Add timeline to partial WAL segments

2018-12-13 Thread David Steele
On 12/12/18 7:17 PM, Michael Paquier wrote:
> On Wed, Dec 12, 2018 at 07:54:05AM -0500, David Steele wrote:

>> But, we could at least use the . notation and end up with something like
>> 000100010001.0002.partial or perhaps
>> 000100010001.T0002.partial?  Maybe
>> 000100010001.0002.tpartial?
>>
>> I can't decide whether the .partial files generated by pg_receivewal are
>> a problem or not.  It seems to me that only the original cluster is
>> going to be able to stream this file -- once the segment is renamed to
>> .partial it won't be visible to pg_receivexlog.  So, .partial without a
>> timeline extension just means partial from the original timeline.
> 
> Still this does not actually solve the problem when two servers are
> trying to use the same timeline?  You would get conflicts with the
> history file as well in this case but the partial segment gets archived
> first..  It seems to me that it is kind of difficult to come up with a
> totally bullet-proof solution.  

What we would like to do here is reduce the window in which the server
will make a bad choice by pushing timeline files first, then allow the
server to archive by using a more unique name for partial WAL.

It is understood that servers will need to be rebuilt in HA
environments, but we would like to reduce the frequency.  More
importantly we want to preserve the integrity of the WAL archive since
it serves as the ground truth for all the clusters.  Reducing duplicates
spares users decisions about what to delete to get their cluster
archiving again.

>> There's another difference.  The .partial generated by pg_receivewal is
>> an actively-worked-on file whereas the .partial generated by a promotion
>> is probably headed for oblivion.  I haven't see a single case where one
>> was used in an actual recovery (which doesn't mean it hasn't happened,
>> of course).
> 
> There are many people implementing their own backup solutions, it is
> hard to say that none of those solutions are actually able to copy the
> last partial file to replay up to the end of a wanted timeline for a
> PITR.

Indeed, we've thought about adding it to pgBackRest.  But we haven't,
and to my knowledge nobody else has either.  It will come though, and a
better way of naming these files will help make them useful.

Regards,
-- 
-David
da...@pgmasters.net





Re: alternative to PG_CATCH

2018-12-13 Thread Andrew Dunstan


On 12/13/18 5:33 AM, Peter Eisentraut wrote:
> This is a common pattern:
>
> PG_TRY();
> {
> ... code that might throw ereport(ERROR) ...
> }
> PG_CATCH();
> {
> cleanup();
> PG_RE_THROW();
> }
> PG_END_TRY();
> cleanup();  /* the same as above */
>
> I've played with a way to express this more compactly:
>
> PG_TRY();
> {
> ... code that might throw ereport(ERROR) ...
> }
> PG_FINALLY({
> cleanup();
> });
>
> See attached patch for how this works out in practice.
>
> Thoughts?  Other ideas?
>
> One problem is that this currently makes pgindent crash.  That's
> probably worth fixing anyway.  Or perhaps find a way to write this
> differently.
>


The pgindent issue isn't at all surprising. If we wanted to fix it the
way would be to get the perl script to rewrite it with something indent
would accept (e.g. move the block outside the parentheses) and then undo
that after the indent had run.


But the block inside parentheses feels kinda funny, doesn't it?


cheers


andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Introducing SNI in TLS handshake for SSL connections

2018-12-13 Thread Andreas Karlsson

On 12/13/18 7:43 AM, Pablo Iranzo Gómez wrote:
> haproxy is what is used behind, the idea is that haproxy by default when
> enabled via a 'route' does allow http or https protocol ONLY, BUT
> (https://docs.openshift.com/container-platform/3.9/architecture/networking/routes.html),
> routers do support TLS with SNI.
>
> As PSQL by default tries TLS and fallbacks to plain psql protocol the
> idea behind is that we tell OpenShift route to be 'Secure' and
> 'passtrough', in this way, when PSQL does speak to '443' port in the
> route that goes to the 'pod' running postgres using TLS and SNI, the
> connection goes thru without any special protocol change.


Sadly that confirms what I feared. Adding SNI to the PostgreSQL protocol 
won't help with solving your use case, because the PostgreSQL protocol has 
its own handshake which happens before the SSL handshake, so the session 
will not look like SSL to HA Proxy.


Just like HA Proxy does not support STARTTLS for IMAP[1], I do not think 
that it will ever support SSL for the PostgreSQL protocol, SNI or not.
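
To illustrate, here is a minimal sketch (not libpq source; error handling
trimmed) of the exchange that happens on the wire before any TLS bytes:

#include <stdint.h>
#include <unistd.h>
#include <arpa/inet.h>

/* Returns 1 if the server agreed to TLS, 0 if not, -1 on error. */
static int
negotiate_ssl(int sock)
{
	uint32_t	msg[2];
	char		answer;

	msg[0] = htonl(8);			/* message length, including itself */
	msg[1] = htonl(80877103);	/* SSLRequest code (1234,5679) */
	if (write(sock, msg, sizeof(msg)) != sizeof(msg))
		return -1;
	if (read(sock, &answer, 1) != 1)
		return -1;

	/* only after an 'S' does the TLS ClientHello (with SNI) flow */
	return answer == 'S';
}

So the first bytes a proxy sees are an 8-byte SSLRequest and a one-byte
reply, not a TLS ClientHello, which is why SNI-based routing never gets a
chance to inspect the session.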


To solve your use case I recommend using something like stunnel, which 
does support SNI, to wrap the unencrypted PostgreSQL protocol in SSL.


This all makes me very skeptical that there is any realistic use case 
for adding SNI support to libpq, and just adding SNI support feels like 
adding a trap for people who do not know that they should rather use 
stunnel than PostgreSQL's built-in SSL support if they want to route on 
SNI with HA Proxy.


But I will attach my small patch for this, which I am now opposed to, 
anyway, so the code exists if a use case turns up in the future (or if it 
turns out my reasoning above is incorrect).


Notes

1. 
https://www.haproxy.com/documentation/haproxy/deployment-guides/exchange-2010/imap4/


Andreas
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index d2e5b08541e..528757f775d 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1460,6 +1460,23 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
  
 
+ 
+  sslsni
+  
+   
+If set to 1, the host name is sent to the server using SSL's
+SNI (Server Name Indication) extension.  If set
+to 0, no SNI extension will be sent.  The default is
+0.  This parameter is ignored if a connection without SSL is made.
+   
+
+   
+The PostgreSQL server ignores the SNI extension,
+but it can be used by SSL-aware proxy software.
+   
+  
+ 
+
  
   sslcert
   
@@ -7373,6 +7390,16 @@ myEventProc(PGEventId evtId, void *evtInfo, void *passThrough)
  
 
 
+
+ 
+  
+   PGSSLSNI
+  
+  PGSSLSNI behaves the same as the  connection parameter.
+ 
+
+
 
  
   
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index bc456fec0c2..4587e5ebb5a 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -278,6 +278,10 @@ static const internalPQconninfoOption PQconninfoOptions[] = {
 		"SSL-Compression", "", 1,
 	offsetof(struct pg_conn, sslcompression)},
 
+	{"sslsni", "PGSSLSNI", "0", NULL,
+		"SSL-SNI", "", 1,
+	offsetof(struct pg_conn, sslsni)},
+
 	{"sslcert", "PGSSLCERT", NULL, NULL,
 		"SSL-Client-Cert", "", 64,
 	offsetof(struct pg_conn, sslcert)},
@@ -3690,6 +3694,8 @@ freePGconn(PGconn *conn)
 		free(conn->sslcrl);
 	if (conn->sslcompression)
 		free(conn->sslcompression);
+	if (conn->sslsni)
+		free(conn->sslsni);
 	if (conn->requirepeer)
 		free(conn->requirepeer);
 	if (conn->connip)
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index beca3492e8d..fdae2eac74f 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -781,6 +781,7 @@ initialize_SSL(PGconn *conn)
 	char		homedir[MAXPGPATH];
 	char		fnbuf[MAXPGPATH];
 	char		sebuf[PG_STRERROR_R_BUFLEN];
+	char	   *host;
 	bool		have_homedir;
 	bool		have_cert;
 	bool		have_rootcert;
@@ -1183,6 +1184,11 @@ initialize_SSL(PGconn *conn)
 #endif
 #endif
 
+	host = conn->connhost[conn->whichhost].host;
+
+	if (conn->sslsni && conn->sslsni[0] == '1' && host)
+		SSL_set_tlsext_host_name(conn->ssl, host);
+
 	return 0;
 }
 
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 66fd317b949..9f69fbdf5fc 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -353,6 +353,7 @@ struct pg_conn
 	 * retransmits */
 	char	   *sslmode;		/* SSL mode (require,prefer,allow,disable) */
 	char	   *sslcompression; /* SSL compression (0 or 1) */
+	char	   *sslsni;			/* SSL SNI extension (0 or 1) */
 	char	   *sslkey;			/* client key filename */
 	char	   *sslcert;		/* client certificate filename */
 	char	   *sslrootcert;	/* root certificate filename */


Re: alternative to PG_CATCH

2018-12-13 Thread Kyotaro HORIGUCHI
At Thu, 13 Dec 2018 11:33:39 +0100, Peter Eisentraut 
 wrote in 

> This is a common pattern:
> 
> PG_TRY();
> {
> ... code that might throw ereport(ERROR) ...
> }
> PG_CATCH();
> {
> cleanup();
> PG_RE_THROW();
> }
> PG_END_TRY();
> cleanup();  /* the same as above */
> 
> I've played with a way to express this more compactly:
> 
> PG_TRY();
> {
> ... code that might throw ereport(ERROR) ...
> }
> PG_FINALLY({
> cleanup();
> });
> 
> See attached patch for how this works out in practice.
> 
> Thoughts?  Other ideas?
> 
> One problem is that this currently makes pgindent crash.  That's
> probably worth fixing anyway.  Or perhaps find a way to write this
> differently.

Though I didn't look into the individual changes, this kind of
refactoring looks good to me. But the syntax looks
somewhat.. uh..

I'm not sure it is actually workable, but I suspect that what we
need here is just a shortcut of 'PG_CATCH();{PG_RE_THROW();}'.
Something like this:

#define PG_FINALLY()\
} \
else \
{ \
PG_exception_stack = save_exception_stack; \
error_context_stack = save_context_stack; \
PG_RE_THROW();\
} \
PG_exception_stack = save_exception_stack;\
error_context_stack = save_context_stack; \
{

Which can be used as:

PG_TRY();
{
... code that might throw ereport(ERROR) ...
}
PG_FINALLY();
{
cleanup();
}
PG_TRY_END();


Or


#define PG_END_TRY_CATCHING_ALL()  \
PG_FINALLY(); \
PG_END_TRY();

PG_TRY();
{
... code that might throw ereport(ERROR) ...
}
PG_END_TRY_CATCHING_ALL();

cleanup();



regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock

2018-12-13 Thread Alexander Korotkov
On Wed, Dec 12, 2018 at 4:08 PM Andrey Borodin  wrote:
> > On 12 Dec 2018, at 3:26, Alexander Korotkov  wrote:
> >
> > BTW, I still can't see you're setting deleteXid field of
> > ginxlogDeletePage struct anywhere.
> Oops. Fixed.
> >
> > Also, I note that protection against re-usage of recently deleted
> > pages is implemented differently than it is in B-tree.
> > 1) You check TransactionIdPrecedes(GinPageGetDeleteXid(page),
> > RecentGlobalDataXmin)), while B-tree checks
> > TransactionIdPrecedes(opaque->btpo.xact, RecentGlobalXmin).  Could you
> > clarify why do we use RecentGlobalDataXmin instead of RecentGlobalXmin
> > here?
> As far as I understand, RecentGlobalDataXmin may be slightly ahead of 
> RecentGlobalXmin when replication_slot_catalog_xmin is holding back 
> RecentGlobalXmin. And GIN is never used by catalog tables. But let's use 
> RecentGlobalXmin like in B-tree.
>
> > 2) B-tree checks this condition both before putting page into FSM and
> > after getting page from FSM.  You're checking only after getting page
> > from FSM.  The B-tree approach looks better to me.  It seems more
> > consistent when FSM pages are really free for usage.
> Fixed.

Thank you.  I've revised your patch and pushed it, as well as two
other patches in this thread.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: tab-completion debug print

2018-12-13 Thread Kyotaro HORIGUCHI
Hello.

At Wed, 28 Nov 2018 17:28:39 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20181128.172839.242071562.horiguchi.kyot...@lab.ntt.co.jp>
> I'm not sure how much it is wanted but it's easy to do.  Using
> psql variable doesn't seem to make sense since the debug print
> (currently) requires session log file, which can be turned on
> only at the startup time. '-g' option is available by 0003.
> 
> $ psql -gL ~/psql.log  postgres 
> 
> > Can we conceive of a circumstance where the check for -L/\L would be
> > significant?  I've seen people type pretty quickly, but not thus far
> > fast enough to notice a cache miss.
> 
> We could switch the completion_matches body as necessary using a
> function pointer, but we aren't so eager for speed here. It is at
> most called every time tab is entered.

So, the new version v6 patch consists of the following two files:

- v6-0001-Tab-copletion-debug-log.patch

  Full-context version. Activated by -DTABCOMPLETION_DEBUG at
  compile time; always emits logs into the session log file
  specified by -L. The format of a line looks like below.

> tab-complete.c:1870: (...table t1 alter [CO]) -> ("CO", "COLUMN", 
> "CONSTRAINT")


- v6-0002-Add-psql-g-option-to-control-debug-print.patch

  Applies on top of 0001. Code is always active; -g in addition to
  -L activates debug print into the log file. If only -g is
  specified, it is forcibly turned off.

  > $ psql postgres -g
  > psql: no session log file, turn off debug print
  > psql (12devel)
  > Type "help" for help.

  The -g option shows in the help message (perhaps arguable).

I'll register this to the next CF.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From ab5f2b27a13ee6fd19e0c496748d1b0d299e86d6 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 27 Nov 2018 14:57:46 +0900
Subject: [PATCH 1/2] Tab-copletion debug log

With this patch, psql built with TABCOMPLETION_DEBUG defined emits
tab-completion debug log into the file specified by the -L option.
---
 src/bin/psql/tab-complete.c | 82 +
 1 file changed, 75 insertions(+), 7 deletions(-)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index fa44b2820b..8addca60a6 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -60,6 +60,67 @@ extern char *filename_completion_function();
 #define completion_matches rl_completion_matches
 #endif
 
+/*
+ * By enabling TABCOMPLETION_DEBUG, every completion attempt is logged in
+ * the session log file, if any.
+ */
+#ifdef TABCOMPLETION_DEBUG
+
+#undef completion_matches
+#define completion_matches(text, func) \
+	completion_debug(__LINE__, (text), (func), \
+	 previous_words, previous_words_count)
+
+#define DEBUG_NCONTEXT 3
+
+static char **
+completion_debug(int line, const char *text, rl_compentry_func_t *func,
+ char **previous_words, int previous_words_count)
+{
+#ifdef HAVE_RL_COMPLETION_MATCHES
+	char **list = rl_completion_matches(text, func);
+#else
+	char **list = completion_matches(text, func);
+#endif
+
+	if (pset.logfile)
+	{
+		/* Emit completion log */
+
+		/* Enclose empty list with brackets since it is an intermediate state
+		 * which is immediately followed by a non-empty list.
+		 */
+		fprintf(pset.logfile, "%s:%d: %s(", __FILE__, line, list ? "" : "[");
+
+		/* input context */
+		if (previous_words_count > 0)
+		{
+			int nwords = DEBUG_NCONTEXT;
+
+			/* show an ellipsis for anything more than DEBUG_NCONTEXT words back */
+			if (previous_words_count > nwords)
+fprintf(pset.logfile, "...");
+			else
+nwords = previous_words_count;
+
+			for (int i = 0 ; i < nwords ; ++i)
+fprintf(pset.logfile, "%s ", previous_words[nwords - i - 1]);
+		}
+
+		fprintf(pset.logfile, "[%s]) -> (", text);
+
+		/* completion result */
+		for (int i = 0; list && list[i]; ++i)
+			fprintf(pset.logfile, "%s\"%s\"", i ? ", " : "", list[i]);			
+		fprintf(pset.logfile, ")%s\n", list ? "": "]");
+
+		fflush(pset.logfile);
+	}
+
+	return list;
+}
+#endif	/* TABCOMPLETION_DEBUG */
+
 /* word break characters */
 #define WORD_BREAKS		"\t\n@$><=;|&{() "
 
@@ -1012,7 +1073,8 @@ static void append_variable_names(char ***varnames, int *nvars,
 	  int *maxvars, const char *varname,
 	  const char *prefix, const char *suffix);
 static char **complete_from_variables(const char *text,
-		const char *prefix, const char *suffix, bool need_value);
+	  const char *prefix, const char *suffix, bool need_value,
+	  char ** previous_words, int previous_words_count);
 static char *complete_from_files(const char *text, int state);
 
 static char *pg_strdup_keyword_case(const char *s, const char *ref);
@@ -1385,11 +1447,14 @@ psql_completion(const char *text, int start, int end)
 	else if (text[0] == ':' && text[1] != ':')
 	{
 		if (text[1] == '\'')
-			matches = complete_from_variables(text, ":'", "'", true);
+			matches = complete_from_variables(text, ":'", "'", true,
+ 		previous_words, previous_words_count);
 		else if (text[1] 

Re: allow online change primary_conninfo

2018-12-13 Thread Sergei Kornilov
Hello

> Do we no longer need static version of conninfo/slotname in
> walreceiver.c? We can detect a change of the variables without
> them (as you did in the previous version.).

Depends on this discussion: 
https://www.postgresql.org/message-id/20181212053042.gk17...@paquier.xyz
If the walreceiver uses the GUC conninfo at startup, then yes, we can remove
such static variables.
If the walreceiver still uses the conninfo from WalRcv, we have a race
condition between RequestXLogStreaming from startup, config reload, and
walreceiver start. In that case I need to save the conninfo from WalRcv and
compare the new GUC value with it.

> I don't think it is acceptable that raw conninfo is exposed in the
> log file.
>
>>   LOG: parameter "primary_conninfo" changed to "host=/tmp port=5432 
>> password=hoge"

Hmm. Do we have infrastructure to hide such values?
Should I implement something like a GUC_HIDE_VALUE_FROM_LOG flag near 
GUC_SUPERUSER_ONLY in a separate thread? The log message would become "LOG: 
parameter "primary_conninfo" changed" with this flag.

I also think that this is not a big problem. We may already have secret data in 
the logs. For example, in query text from applications.

>>  errmsg("closing replication connection because primary_conninfo was 
>> changed")));
>
> Maybe it is better being as follows according to similar messages:
>
>>  errmsg("terminating walreceiver process due to change of 
>> primary_conninfo")));
>
> And it *might* be good to have detail.
>
>>  errdetail("In a moment starts streaming WAL with new configuration.");

Agreed, will change.

regards, Sergei



alternative to PG_CATCH

2018-12-13 Thread Peter Eisentraut
This is a common pattern:

PG_TRY();
{
... code that might throw ereport(ERROR) ...
}
PG_CATCH();
{
cleanup();
PG_RE_THROW();
}
PG_END_TRY();
cleanup();  /* the same as above */

I've played with a way to express this more compactly:

PG_TRY();
{
... code that might throw ereport(ERROR) ...
}
PG_FINALLY({
cleanup();
});

See attached patch for how this works out in practice.

Thoughts?  Other ideas?

One problem is that this currently makes pgindent crash.  That's
probably worth fixing anyway.  Or perhaps find a way to write this
differently.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From e4d05fba0b2e97f7344c77b17a3b8aa6378ded8f Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 13 Dec 2018 11:19:07 +0100
Subject: [PATCH] PG_FINALLY

This gives an alternative way of catching exceptions, for the common
case where the cleanup code is the same in the error and non-error
cases.  So instead of

PG_TRY();
{
... code that might throw ereport(ERROR) ...
}
PG_CATCH();
{
cleanup();
PG_RE_THROW();
}
PG_END_TRY();
cleanup();

one can write

PG_TRY();
{
... code that might throw ereport(ERROR) ...
}
PG_FINALLY({
cleanup();
});
---
 contrib/auto_explain/auto_explain.c   | 16 ++-
 contrib/dblink/dblink.c   | 31 +++---
 contrib/hstore_plpython/hstore_plpython.c |  8 +---
 .../pg_stat_statements/pg_stat_statements.c   | 24 +++
 contrib/pg_trgm/trgm_regexp.c |  9 +---
 contrib/postgres_fdw/connection.c |  9 +---
 contrib/sepgsql/hooks.c   |  8 +---
 contrib/sepgsql/label.c   | 42 +--
 contrib/sepgsql/selinux.c |  8 +---
 contrib/sepgsql/uavc.c|  9 +---
 src/backend/catalog/index.c   | 16 ++-
 src/backend/commands/async.c  | 18 +---
 src/backend/commands/copy.c   |  8 +---
 src/backend/commands/event_trigger.c  | 18 ++--
 src/backend/commands/extension.c  | 10 +
 src/backend/commands/matview.c|  8 +---
 src/backend/commands/subscriptioncmds.c   | 21 ++
 src/backend/commands/trigger.c|  8 +---
 src/backend/commands/vacuum.c | 10 +
 src/backend/libpq/be-fsstubs.c|  8 +---
 src/backend/tcop/utility.c| 18 ++--
 src/backend/utils/adt/xml.c   | 25 +++
 src/include/utils/elog.h  | 30 +
 src/pl/plperl/plperl.c| 24 ++-
 src/pl/plpgsql/src/pl_handler.c   | 13 ++
 src/pl/plpython/plpy_cursorobject.c   |  8 +---
 src/pl/plpython/plpy_elog.c   | 16 +--
 src/pl/plpython/plpy_exec.c   | 20 ++---
 src/pl/plpython/plpy_spi.c|  7 +---
 src/pl/plpython/plpy_typeio.c |  8 +---
 src/pl/tcl/pltcl.c| 17 ++--
 31 files changed, 126 insertions(+), 349 deletions(-)

diff --git a/contrib/auto_explain/auto_explain.c 
b/contrib/auto_explain/auto_explain.c
index 646cd0d42c..54d9a8c5e5 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -295,14 +295,10 @@ explain_ExecutorRun(QueryDesc *queryDesc, ScanDirection 
direction,
prev_ExecutorRun(queryDesc, direction, count, 
execute_once);
else
standard_ExecutorRun(queryDesc, direction, count, 
execute_once);
-   nesting_level--;
}
-   PG_CATCH();
-   {
+   PG_FINALLY({
nesting_level--;
-   PG_RE_THROW();
-   }
-   PG_END_TRY();
+   });
 }
 
 /*
@@ -318,14 +314,10 @@ explain_ExecutorFinish(QueryDesc *queryDesc)
prev_ExecutorFinish(queryDesc);
else
standard_ExecutorFinish(queryDesc);
-   nesting_level--;
}
-   PG_CATCH();
-   {
+   PG_FINALLY({
nesting_level--;
-   PG_RE_THROW();
-   }
-   PG_END_TRY();
+   });
 }
 
 /*
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 3b73ff13f1..068c2c602c 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -776,18 +776,11 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool 
is_async)
}
}
}
-   PG_CATCH();
-   {
+   PG_FINALLY({
/* if needed, close the connection to the database */
if (freeconn)
PQfinish(conn);
-   PG_RE_THROW();
-   }
-   P

Re: allow online change primary_conninfo

2018-12-13 Thread Kyotaro HORIGUCHI
At Tue, 11 Dec 2018 13:16:23 +0300, Sergei Kornilov  wrote in 
<9653601544523...@iva8-37fc2ad204cd.qloud-c.yandex.net>
sk> oops, forgot attach patch
sk> 
sk> 11.12.2018, 13:14, "Sergei Kornilov" :
sk> > Hello
sk> >
sk> > After some thinking i can rewrite this patch in another way.
sk> >
sk> > This is better or worse?

On the whole, the new version looks better to me.

===
Do we still need the static versions of conninfo/slotname in
walreceiver.c? We can detect a change in those variables without
them (as you did in the previous version).


===
I don't think it is acceptable that the raw conninfo is exposed in the
log file.

>  LOG:  parameter "primary_conninfo" changed to "host=/tmp port=5432 
> password=hoge"


===
> errmsg("closing replication connection because primary_conninfo was 
> changed")));

Maybe it would be better as follows, in line with similar messages:

> errmsg("terminating walreceiver process due to change of primary_conninfo")));

And it *might* be good to have a detail message.

> errdetail("Streaming WAL will restart in a moment with the new configuration.");


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




'infinity'::Interval should be added

2018-12-13 Thread Simon Riggs
At present we have a timestamp of 'infinity' and infinite ranges, but no
infinite interval.

SELECT 'infinity'::timestamp;
works

SELECT 'infinity'::interval;
ERROR:  invalid input syntax for type interval: "infinity"

Seems a strange anomaly that we should fix.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Row Visibility and Table Access Methods

2018-12-13 Thread Simon Riggs
Currently, tables provide MVCC access semantics as the only option.

A more complete list of desirable semantics in applications are

* MVCC - provide latest consistent view
* Historical - keep all old row versions so that queries can access data as
it used to be
* TTL=Duration - keep committed rows around for at most Duration seconds
* Latest - show only the most current committed row version, at the cost of
inconsistency

There might be others.

Since I see these options as semantic rather than physical, I think we
should keep these operations separate from Table Access Methods. This
allows the semantics to be implemented in different ways for different
storage types.

"Historical" access has been discussed many times, so no need to revisit
here. Clearly, it is a very popular idea, just not easily feasible with
the current heap access layer. We might want an option for row_visibility
retention. For tables with this option set, we would in later releases
allow historical queries according to the SQL Standard.

"TTL" or "Time To Live" -  time-limited access to data is available in many
other databases. It is simple to implement and we could easily have this in
PG12. Implementation is 1) adding the option, 2) adding a time-component
into the visibility check for scan and vacuum. This option implies an
option exists to specify row_visibility retention.

"Latest" is similar to the way EvalPlanQual works, allowing UPDATE to see
the latest version of a row before update, and similar to the way catalog
scans work in that any access to a catalog entry sees the latest row based
upon an immediate snapshot, not the one taken at the start of a query. It
makes sense to allow this as an explicit table-level option, so any data
viewed can see the latest version, just as UPDATEs already do. This also
means that juggling bloat and old row versions becomes much less of an
issue for very high volume update applications such as booking systems or
stock tickers. (Clearly, better table access methods would improve on this
further and even they would benefit from this simplification of the main
issue around MVCC).

Logic for this type of visibility already exists in PG
via HeapTupleSatisfiesSelf(), so we are just exposing what is already there
to the user; no need to discuss semantics at length.

Again, patches to implement this option are simple and viable for PG12.

The user interface consists of two new table-level options:
* row_visibility = MVCC (default), TTL, Latest, Historical
* row_visibility_retention_interval = 'system' (default)
For MVCC, the only valid setting would be 'system', i.e. current MVCC
behavior (but this might be altered by specific storage plugin parameters).
For Latest, the only valid setting would be 'system'.
For TTL, the interval to retain data for; a setting of 0 is not valid.
For Historical, the interval to retain old row versions for; 0 means forever.

Implementation summary
1. Add new table-level option for row_visibility and
row_visibility_retention_interval
2. Add option to heap_beginscan
3. Add option handling in heap prune
4. Add option to tqual
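
As a rough illustration of step 2, here is a standalone sketch (not
PostgreSQL code: all types and names are invented, and the real work would
happen in tqual.c) of folding a time component into the visibility test:

#include <stdbool.h>
#include <stdio.h>
#include <time.h>

typedef enum RowVisibilityMode
{
    VISIBILITY_MVCC,
    VISIBILITY_TTL
} RowVisibilityMode;

typedef struct RowVersion
{
    time_t  commit_time;    /* commit time of the inserting transaction */
    bool    mvcc_visible;   /* result of the ordinary MVCC test (stubbed) */
} RowVersion;

static bool
row_is_visible(const RowVersion *row, RowVisibilityMode mode,
               int ttl_seconds, time_t now)
{
    /* The ordinary MVCC filter always applies first. */
    if (!row->mvcc_visible)
        return false;

    /* For TTL tables, additionally hide rows older than the retention. */
    if (mode == VISIBILITY_TTL &&
        difftime(now, row->commit_time) > ttl_seconds)
        return false;

    return true;
}

int
main(void)
{
    time_t      now = time(NULL);
    RowVersion  two_hours_old = { now - 7200, true };

    /* With row_visibility = TTL and one hour of retention, the row is hidden. */
    printf("visible: %s\n",
           row_is_visible(&two_hours_old, VISIBILITY_TTL, 3600, now)
           ? "yes" : "no");
    return 0;
}

Vacuum would apply the same test, which is how rows past their TTL would
eventually be reclaimed.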

Thoughts?

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services