Re: Postgres 11 release notes

2018-05-17 Thread Andrey Borodin
Hi, Bruce!

> On 11 May 2018, at 20:08, Bruce Momjian wrote:
> 
> I expect a torrent of feedback.  ;-)


I'm not sure it is useful in the release notes, since it is more about the API
and not a user-facing change. Just in case:
GiST opclasses can now omit the compress and decompress functions. If the
compress function is omitted, index-only scans are enabled for the opclass
without any extra change.
https://github.com/postgres/postgres/commit/d3a4f89d8a3e500bd7c0b7a8a8a5ce1b47859128
 


Best regards, Andrey Borodin.

Re: Possible bug in logical replication.

2018-05-17 Thread Kyotaro HORIGUCHI
At Thu, 17 May 2018 13:54:07 +0300, Arseny Sher  wrote 
in <87in7md034.fsf@ars-thinkpad>
> 
> Konstantin Knizhnik  writes:
> 
> > I think that using restart_lsn instead of confirmed_flush is not right 
> > approach.
> > If restart_lsn is not available and confirmed_flush is pointing to page
> > boundary, then in any case we should somehow handle this case and adjust
> > startlsn to point on the valid record position (by just adding page header
> > size?).
> 
> Well, restart_lsn is always available on live slot: it is initially set
> in ReplicationSlotReserveWal during slot creation.

restart_lsn stays at the beginning of a transaction until the
transaction ends, so just using restart_lsn allows repeated
decoding of a transaction; in short, rewinding occurs. The
function works only on an inactive slot, so the current code works
fine on this point. In addition to that, restart_lsn can also be on a
page boundary.


We can see the problem easily.

1. Just create a logical replication slot, which is positioned at the current LSN.

  select pg_create_logical_replication_slot('s1', 'pgoutput');

2. Advance the LSN by two or three pages by doing anything.

3. Advance the slot to a page boundary.

  e.g. select pg_replication_slot_advance('s1', '0/9624000');

4. Advance the slot further, then crash.

So directly setting ctx->reader->EndRecPtr to startlsn fixes the
problem, but I found another problem here.

The function accepts any LSN even if it is not at the beginning
of a record. We will see errors, crashes, infinite waits, or
maybe any other kind of trouble from such values. The moved LSN must
always be at the "end of a record" (that is, at the start of the
next record). The attached patch also fixes this.

The documentation doesn't seem to require a fix.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index d9e10263bb..d3cb777f9f 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -318,6 +318,11 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
 
 /*
  * Helper function for advancing physical replication slot forward.
+ *
+ * This function accepts an arbitrary LSN even if the LSN is not at the
+ * beginning of a record. This can lead to any kind of misbehavior, but
+ * currently the value is used only to determine up to which WAL segment to
+ * keep, and successive implicit advancing fixes the state.
  */
 static XLogRecPtr
 pg_physical_replication_slot_advance(XLogRecPtr startlsn, XLogRecPtr moveto)
@@ -344,6 +349,7 @@ pg_logical_replication_slot_advance(XLogRecPtr startlsn, XLogRecPtr moveto)
 	LogicalDecodingContext *ctx;
 	ResourceOwner old_resowner = CurrentResourceOwner;
 	XLogRecPtr	retlsn = InvalidXLogRecPtr;
+	XLogRecPtr	upto;
 
 	PG_TRY();
 	{
@@ -354,6 +360,13 @@ pg_logical_replication_slot_advance(XLogRecPtr startlsn, XLogRecPtr moveto)
 	logical_read_local_xlog_page,
 	NULL, NULL, NULL);
 
+		/*
+		 * startlsn can be on a page boundary, but it is not accepted as an
+		 * explicit parameter to XLogReadRecord.  Set it in the reader context.
+		 */
+		Assert(startlsn != InvalidXLogRecPtr);
+		upto = ctx->reader->EndRecPtr = startlsn;
+	
 		CurrentResourceOwner = ResourceOwnerCreate(CurrentResourceOwner,
    "logical decoding");
 
@@ -361,22 +374,18 @@ pg_logical_replication_slot_advance(XLogRecPtr startlsn, XLogRecPtr moveto)
 		InvalidateSystemCaches();
 
 		/* Decode until we run out of records */
-		while ((startlsn != InvalidXLogRecPtr && startlsn < moveto) ||
-			   (ctx->reader->EndRecPtr != InvalidXLogRecPtr && ctx->reader->EndRecPtr < moveto))
+		while (ctx->reader->EndRecPtr <= moveto)
 		{
 			XLogRecord *record;
 			char	   *errm = NULL;
+ 
+			/* ctx->reader->EndRecPtr cannot go backwards here */
+			upto = ctx->reader->EndRecPtr;
 
-			record = XLogReadRecord(ctx->reader, startlsn, &errm);
+			record = XLogReadRecord(ctx->reader, InvalidXLogRecPtr, &errm);
 			if (errm)
 elog(ERROR, "%s", errm);
 
-			/*
-			 * Now that we've set up the xlog reader state, subsequent calls
-			 * pass InvalidXLogRecPtr to say "continue from last record"
-			 */
-			startlsn = InvalidXLogRecPtr;
-
 			/*
 			 * The {begin_txn,change,commit_txn}_wrapper callbacks above will
 			 * store the description into our tuplestore.
@@ -384,18 +393,14 @@ pg_logical_replication_slot_advance(XLogRecPtr startlsn, XLogRecPtr moveto)
 			if (record != NULL)
 LogicalDecodingProcessRecord(ctx, ctx->reader);
 
-			/* check limits */
-			if (moveto <= ctx->reader->EndRecPtr)
-break;
-
 			CHECK_FOR_INTERRUPTS();
 		}
 
 		CurrentResourceOwner = old_resowner;
 
-		if (ctx->reader->EndRecPtr != InvalidXLogRecPtr)
+		if (startlsn != upto)
 		{
-			LogicalConfirmReceivedLocation(moveto);
+			LogicalConfirmReceivedLocation(upto);
 
 			/*
 			 * If only the confirmed_flush_lsn has changed the slot won't get


Re: Is a modern build system acceptable for older platforms

2018-05-17 Thread Pavel Stehule
2018-05-18 5:50 GMT+02:00 Craig Ringer :

> On 18 May 2018 at 11:10, Andres Freund  wrote:
>
>>
>>
>> On May 17, 2018 7:44:44 PM PDT, Bruce Momjian  wrote:
>> >On Thu, May  3, 2018 at 12:32:39AM +0200, Catalin Iacob wrote:
>> >> I do have a real concern about the long term attractiveness of the
>> >> project to new developers, especially younger ones as time passes.
>> >> It's not a secret that people will just avoid creaky old projects,
>> >and
>> >> for Postgres old out of fashion things do add up: autoconf, raw make,
>> >> Perl for tests, C89, old platform support. I have no doubt that the
>> >> project is already losing competent potential developers due to
>> >this.
>> >> One can say this is superficial and those developers should look at
>> >> the important things but that does not change reality that some will
>> >> just say pass because of dislike of the old technologies I mentioned.
>> >> Personally, I can say that if the project were still in CVS I would
>> >> probably not bother as I just don't have energy to learn an inferior
>> >> old version control system especially as I see version control as
>> >> fundamental to a developer. I don't feel the balance between
>> >> recruiting new developers and end user benefits tilted enough to
>> >> replace the build system but maybe in some years that will be the
>> >> case.
>> >
>> >What percentage of our adoption decline from new developers is based on
>> >our build system, and how much of it is based on the fact we use the C
>> >language?
>>
>> I think neither is as strong a factor as our weird procedures and slow
>> review. People are used to github pull requests, working bug trackers,
>> etc.  I do think that using more modern C or a reasonable subset of
>> C++ would make things easier. Don't think there's really an alternative
>> there quite yet.
>>
>>
> +10.
>
> Also - mailing lists. We're an ageing community and a lot of younger
> people just don't like or use mailing lists, let alone like to work *only*
> on mailing lists without forums, issue trackers, etc etc.
>
> I happen to be pretty OK with the status quo, but it's definitely harder
> to get involved casually or as a new participant. OTOH, that helps cut down
> the noise level of crap suggestions and terrible patches a little bit,
> which matters when we have limited review bandwidth.
>
> Then there's the Windows build setup - you can't just fire up Visual
> Studio and start learning the codebase.
>
> We also have what seems like half an OS worth of tooling to support our
> shared-nothing-by-default multi-processing model. Custom spinlocks, our
> LWLocks, our latches, signal based IPC + ProcSignal signal multiplexing,
> extension shmem reservation and allocation, DSM, DSA, longjmp based
> exception handling and unwinding ... the learning curve for PostgreSQL
> programming is a whole lot more than just C even before you get into the
> DB-related bits. And there's not a great deal of help with the learning
> curve.
>
> I keep wanting to write some blogs and docs on relevant parts, but you
> know how it is with time.
>
> The only part that build system changes would help with would be getting
> Windows/VS and OSX/XCode users started a little more easily. Which wouldn't
> help tons when they looked at our code and went "WTF, where do I find out
> what any of this stuff even is?".
>

+1

I have to maintain Orafce and plpgsql_check for MS Windows, and it is not
pleasant work.

Pavel



> (Yes, I know there are some good READMEs already, but often you need to
> understand quite a bit of the system before you can understand the
> READMEs...)
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
>


Re: Problem while updating a foreign table pointing to a partitioned table on foreign server

2018-05-17 Thread Ashutosh Bapat
On Thu, May 17, 2018 at 11:56 PM, Robert Haas  wrote:
> On Thu, May 17, 2018 at 2:10 AM, Ashutosh Bapat
>  wrote:
>> The second would mean that SELECT * from foreign table reports
>> remotetableoid as well, which is awkward.
>
> No it wouldn't.  You'd just make the additional column resjunk, same
> as we do for wholerow.

You suggested
--
> I think that the place to start would be to change this code to use
> something other than TableOidAttributeNumber:
>
> +   var = makeVar(parsetree->resultRelation,
> + TableOidAttributeNumber,
> + OIDOID,
> + -1,
> + InvalidOid,
> + 0);
--

Wholerow has its own attribute number 0, and ctid has attribute number
-1. So we can easily create Vars for those and add resjunk entries to
the targetlist. But a "remotetableoid" doesn't have an attribute
number yet! Either it has to be a new system column, which I and
almost everybody here oppose, or it has to be a user-defined
attribute, with an entry in the pg_attribute table. In the second case,
how would one make that column resjunk? I don't see any third
possibility.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Is a modern build system acceptable for older platforms

2018-05-17 Thread Craig Ringer
On 18 May 2018 at 11:10, Andres Freund  wrote:

>
>
> On May 17, 2018 7:44:44 PM PDT, Bruce Momjian  wrote:
> >On Thu, May  3, 2018 at 12:32:39AM +0200, Catalin Iacob wrote:
> >> I do have a real concern about the long term attractiveness of the
> >> project to new developers, especially younger ones as time passes.
> >> It's not a secret that people will just avoid creaky old projects,
> >and
> >> for Postgres old out of fashion things do add up: autoconf, raw make,
> >> Perl for tests, C89, old platform support. I have no doubt that the
> >> project is already losing competent potential developers due to
> >this.
> >> One can say this is superficial and those developers should look at
> >> the important things but that does not change reality that some will
> >> just say pass because of dislike of the old technologies I mentioned.
> >> Personally, I can say that if the project were still in CVS I would
> >> probably not bother as I just don't have energy to learn an inferior
> >> old version control system especially as I see version control as
> >> fundamental to a developer. I don't feel the balance between
> >> recruiting new developers and end user benefits tilted enough to
> >> replace the build system but maybe in some years that will be the
> >> case.
> >
> >What percentage of our adoption decline from new developers is based on
> >our build system, and how much of it is based on the fact we use the C
> >language?
>
> I think neither is as strong a factor as our weird procedures and slow
> review. People are used to github pull requests, working bug trackers,
> etc.  I do think that using more modern C or a reasonable subset of
> C++ would make things easier. Don't think there's really an alternative
> there quite yet.
>
>
+10.

Also - mailing lists. We're an ageing community and a lot of younger people
just don't like or use mailing lists, let alone like to work *only* on
mailing lists without forums, issue trackers, etc etc.

I happen to be pretty OK with the status quo, but it's definitely harder to
get involved casually or as a new participant. OTOH, that helps cut down
the noise level of crap suggestions and terrible patches a little bit,
which matters when we have limited review bandwidth.

Then there's the Windows build setup - you can't just fire up Visual Studio
and start learning the codebase.

We also have what seems like half an OS worth of tooling to support our
shared-nothing-by-default multi-processing model. Custom spinlocks, our
LWLocks, our latches, signal based IPC + ProcSignal signal multiplexing,
extension shmem reservation and allocation, DSM, DSA, longjmp based
exception handling and unwinding ... the learning curve for PostgreSQL
programming is a whole lot more than just C even before you get into the
DB-related bits. And there's not a great deal of help with the learning
curve.

I keep wanting to write some blogs and docs on relevant parts, but you know
how it is with time.

The only part that build system changes would help with would be getting
Windows/VS and OSX/XCode users started a little more easily. Which wouldn't
help tons when they looked at our code and went "WTF, where do I find out
what any of this stuff even is?".

(Yes, I know there are some good READMEs already, but often you need to
understand quite a bit of the system before you can understand the
READMEs...)

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Is a modern build system acceptable for older platforms

2018-05-17 Thread Andres Freund


On May 17, 2018 7:44:44 PM PDT, Bruce Momjian  wrote:
>On Thu, May  3, 2018 at 12:32:39AM +0200, Catalin Iacob wrote:
>> I do have a real concern about the long term attractiveness of the
>> project to new developers, especially younger ones as time passes.
>> It's not a secret that people will just avoid creaky old projects,
>and
>> for Postgres old out of fashion things do add up: autoconf, raw make,
>> Perl for tests, C89, old platform support. I have no doubt that the
>> project is already losing competent potential developers due to
>this.
>> One can say this is superficial and those developers should look at
>> the important things but that does not change reality that some will
>> just say pass because of dislike of the old technologies I mentioned.
>> Personally, I can say that if the project were still in CVS I would
>> probably not bother as I just don't have energy to learn an inferior
>> old version control system especially as I see version control as
>> fundamental to a developer. I don't feel the balance between
>> recruiting new developers and end user benefits tilted enough to
>> replace the build system but maybe in some years that will be the
>> case.
>
>What percentage of our adoption decline from new developers is based on
>our build system, and how much of it is based on the fact we use the C
>language?

I think neither is as strong a factor as our weird procedures and slow review. 
People are used to github pull requests, working bug trackers, etc.  I do think 
that using more modern C or a reasonable subset of C++ would make things easier. 
Don't think there's really an alternative there quite yet.

Andres

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: SCRAM with channel binding downgrade attack

2018-05-17 Thread Michael Paquier
On Fri, May 18, 2018 at 10:46:46AM +0900, Michael Paquier wrote:
> From a security point of view, 1) is important for libpq, but I am not
> much of an enthusiast about 2) as a whole.  The backend has proper support
> for channel binding, hence other drivers speaking the protocol could have
> their own restriction mechanisms.

So, I have been playing with libpq to address point 1), and added a new
connection parameter called channel_binding_mode which can be set to
'prefer', which is what libpq uses now, and 'require'.  The patch has
two important parts:
1) If a server does not support channel binding but still sends back a
SCRAM authentication request, and the client wants to enforce the use of
channel binding, then libpq reacts as follows:
$ psql -d "dbname=foo channel_binding_mode=require"
psql: channel binding required for SASL authentication but no valid
mechanism could be selected
This requires pg_SASL_init() to be patched after the SASL mechanism has
been selected.  That error can be triggered with a v10 server with which
a SCRAM authentication is done, as well as with a v11 server where SSL
is not used.  Some people may use sslmode=prefer in combination with
channel_binding_mode=require, in which case an error should be raised if
the SSL connection cannot be established first.  That addresses a bit of
the craziness of sslmode=prefer...
2) If the client wants to use channel binding, but the server is trying to
enforce a protocol other than SCRAM, like MD5, trust, gssapi or such,
then the following error happens:
$ psql -d "dbname=foo channel_binding_mode=require"
psql: channel binding required for authentication but no valid protocol are used
In this case, it seems to me that the best bet is to patch
pg_fe_sendauth() and filter by message types.

In the attached, I have added the parameter and some documentation.  I
have not added tests, but some things could be tested in the SSL suite:
- Check for incorrect values in the parameter.
- Test connection without SCRAM with "require"
- Test connection without SSL but SCRAM with "require"
I have not put much thought into the documentation, but the patch is
rather simple, so hopefully that helps in making progress in the
discussion.
--
Michael
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 498b8df988..aedc65b63f 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1238,6 +1238,49 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
  
 
+ 
+  channel_binding_mode
+  
+   
+This option determines whether channel binding should be used for
+authentication or not.  As of now, channel binding is only supported
+with SCRAM authentication.  There are two modes:
+
+
+ 
+  prefer (default)
+  
+   
+Try to use channel binding if available, but rely on the server
+depending on what types of SASL mechanisms are published during
+the message exchange.  This is the default, and it can be
+considered insecure, as it does not protect clients from servers
+attempting authentication downgrade attacks.
+   
+  
+ 
+
+ 
+  require
+  
+   
+Reject any connection attempt to a server if channel binding
+is not supported by the protocol used.  As channel binding is
+supported only for SCRAM in the context of an SSL connection,
+a connection with this option enabled will fail except in this
+authentication context.  This protects against downgrade attacks
+and ensures that a SCRAM connection with channel binding is
+able to transparently achieve MITM protection with
+libpq.
+   
+  
+ 
+
+   
+
+  
+ 
+
  
   scram_channel_binding
   
diff --git a/src/interfaces/libpq/fe-auth.c b/src/interfaces/libpq/fe-auth.c
index 3b2073a47f..e23ab83c1b 100644
--- a/src/interfaces/libpq/fe-auth.c
+++ b/src/interfaces/libpq/fe-auth.c
@@ -547,6 +547,25 @@ pg_SASL_init(PGconn *conn, int payloadlen)
 		goto error;
 	}
 
+	/*
+	 * If the client wants to enforce the use of channel binding but
+	 * it has not been previously selected, because it was either not
+	 * published by the server or could not be selected, then complain.
+	 * If SSL is not used for this connection, complain similarly, as
+	 * the client may want channel binding but forgot to set things up
+	 * to do so, which could be the case with sslmode=prefer for
+	 * example.  This protects against any kind of downgrade attack
+	 * from rogue servers attempting to bypass channel binding.
+	 */
+	if (conn->channel_binding_mode &&
+		strcmp(conn->channel_binding_mode, "require") == 0 &&
+		strcmp(selected_mechanism, SCRAM_SHA_256_PLUS_NAME) != 0)
+	{
+		printfPQExpBuffer(&conn->errorMessage,
+		  libpq_gettext("channel binding required for SASL 

Re: Odd procedure resolution

2018-05-17 Thread Peter Eisentraut
On 5/17/18 16:54, Tom Lane wrote:
> TBH, this is several months too late for v11.

Maybe, but ...

> You're talking about a
> really fundamental redesign, at least if it's done right and not as a
> desperate last-minute hack (which is what this looks like).  The points
> you make here are just the tip of the iceberg of things that would need
> to be reconsidered.

I disagree with that assessment.  The patch is large because it reverts
some earlier changes.  But I think the new setup is sound, in large
parts cleaner than before, and addresses various comments that have been
made.  I leave it up to the community to decide whether we want to
address this now or later.

One thing to take into consideration is that leaving things as is for
PG11 and then making this change or one like it in PG12 would create
quite a bit of churn in client programs like psql, pg_dump, and probably
things like pgadmin.

> I also remain of the opinion that if we're to separate these namespaces,
> the way to do that is to put procedures somewhere other than pg_proc.

That would just create an unfathomable amount of code duplication and
mess and unnecessary extra work in PL implementations.

> Unless you can show that "separate namespaces" is the *only* correct
> reading of the SQL spec, which I doubt given the ROUTINE syntax,
> I think we're pretty much stuck with the choice we made already.

I have apparently read the SQL standard differently at different times.
Are you asking me whether I think my current reading is more correct
than my previous ones?  Well, yes.  But someone else should perhaps
check that.  Start with the syntax rules for .

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Is a modern build system acceptable for older platforms

2018-05-17 Thread Bruce Momjian
On Thu, May  3, 2018 at 12:32:39AM +0200, Catalin Iacob wrote:
> I do have a real concern about the long term attractiveness of the
> project to new developers, especially younger ones as time passes.
> It's not a secret that people will just avoid creaky old projects, and
> for Postgres old out of fashion things do add up: autoconf, raw make,
> Perl for tests, C89, old platform support. I have no doubt that the
> project is already losing competent potential developers due to this.
> One can say this is superficial and those developers should look at
> the important things but that does not change reality that some will
> just say pass because of dislike of the old technologies I mentioned.
> Personally, I can say that if the project were still in CVS I would
> probably not bother as I just don't have energy to learn an inferior
> old version control system especially as I see version control as
> fundamental to a developer. I don't feel the balance between
> recruiting new developers and end user benefits tilted enough to
> replace the build system but maybe in some years that will be the
> case.

What percentage of our adoption decline from new developers is based on
our build system, and how much of it is based on the fact we use the C
language?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Transform for pl/perl

2018-05-17 Thread Peter Eisentraut
On 5/17/18 17:11, Alvaro Herrera wrote:
> This is still listed as an open item, though the patch proposed by Peter
> upthread has been committed.  If I understand correctly, ilmari was
> going to propose another patch.  Or is the right course of action to set
> the open item as resolved?

The items that are still open from the original email are:

2) jsonb scalar values are passed to the plperl function wrapped in not
   one, but _two_ layers of references

3) jsonb numeric values are passed as perl's NV (floating point) type,
   losing precision if they're integers that would fit in an IV or UV.

#2 appears to be a quality of implementation issue without any
user-visible effects.

#3 is an opportunity for future improvement, but works as intended right
now.

I think patches for these issues could still be considered during beta,
but they are not release blockers IMO.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-17 Thread Amit Langote
On 2018/05/18 5:56, David Rowley wrote:
> On 18 May 2018 at 06:21, Robert Haas  wrote:
>> All right, so let's just say that explicitly.  Maybe something like
>> the attached.
> 
> That looks fine to me.

Me too, except:

+* *pds list is the root partition, so 0 always means the first leaf. When

I prefer "root partitioned table" over "root partition", and git grep
suggests that that's actually what we use elsewhere in the source code.

Thanks,
Amit




Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-17 Thread Amit Langote
On 2018/05/18 6:14, David Rowley wrote:
> On 18 May 2018 at 02:13, Tom Lane  wrote:
>> Maybe what you need is a redesign.  This convention seems impossibly
>> confusing and hence error-prone.  What about using a separate bool to
>> indicate which list the index refers to?
> 
> While I agree that the coding is a bit unusual, I think it's also good
> that we can get away without allocating yet another array nparts in
> size. ExecSetupPartitionTupleRouting is already a huge bottleneck with
> single-row INSERT into a partitioned table with a large number of
> partitions. Allocating yet another array nparts in size will just slow
> it down further.

I recall having considered the idea of adding an array of bools, but went
with the negative-indexes-for-partitioned-tables idea anyway, which I
remember was suggested by Robert back then [1].  I admit it's a bit
confusing, but it's nice not to have one more array allocation in that path,
as you say.

> I have patches locally that I'll be submitting during the v12 cycle to
> improve on this. Among other things, the patches go to lengths to not
> allocate these arrays when we don't have to.

That would be nice.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BTgmobF2r%3Df-crrE-k7WM8iFpBKLz3dtBtEc%3DKmkudYViYcyQ%40mail.gmail.com




Re: Postgres 11 release notes

2018-05-17 Thread Michael Paquier
On Thu, May 17, 2018 at 02:23:00PM -0400, Bruce Momjian wrote:
> On Thu, May 17, 2018 at 10:35:53PM +0900, Michael Paquier wrote:
> > Hi Bruce,
> > 
> > Here is some bonus feedback.
> > 
> > On Fri, May 11, 2018 at 11:08:52AM -0400, Bruce Momjian wrote:
> > > I expect a torrent of feedback.  ;-)
> > 
> > I have just noticed that this entry does not have the correct author
> > (guess who?):
> 
> Fixed.  (I think I guessed right.)

Thanks, Bruce.  It looks like your guess is right.
--
Michael




Re: Built-in connection pooling

2018-05-17 Thread Bruce Momjian
On Fri, May  4, 2018 at 03:25:15PM -0400, Robert Haas wrote:
> On Fri, May 4, 2018 at 11:22 AM, Merlin Moncure  wrote:
> > If we are breaking 1:1 backend:session relationship, what controls
> > would we have to manage resource consumption?
> 
> I mean, if you have a large number of sessions open, it's going to
> take more memory in any design.  If there are multiple sessions per
> backend, there may be some possibility to save memory by allocating it
> per-backend rather than per-session; it shouldn't be any worse than if
> you didn't have pooling in the first place.
> 
> However, I think that's probably worrying about the wrong end of the
> problem first.  IMHO, what we ought to start by doing is considering
> what a good architecture for this would be, and how to solve the
> general problem of per-backend session state.  If we figure that out,
> then we could worry about optimizing whatever needs optimizing, e.g.
> memory usage.

Yes, I think this matches my previous question --- if we are going to
swap out session state to allow multiple sessions to multiplex in the
same OS process, and that swapping has similar overhead to how the OS
swaps processes, why not just let the OS continue doing the process
swapping.

I think we first need to find out what it is that makes high session
counts slow.  For example, if we swap out session state, will we check
the visibility rules for the swapped-out session?  If not, and that is
what makes swapping session state make Postgres faster, let's just find
a way to skip checking visibility rules for inactive sessions and get
the same benefit more simply.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



RE: log_min_messages shows debug instead of debug2

2018-05-17 Thread Ideriha, Takeshi
>-Original Message-
>From: Robert Haas [mailto:robertmh...@gmail.com]
>OK, I'm happy enough to commit it then, barring other objections.  I was
>just going to do that, but then I realized we're in feature freeze right
>now, so I suppose this should go into the next CommitFest.

Thank you for the discussion.
Sure, I registered it in the next CommitFest as 'Ready for Committer'.

https://commitfest.postgresql.org/18/1638/ 

Regards,
Takeshi Ideriha


Re: Infinite loop on master shutdown

2018-05-17 Thread Kyotaro HORIGUCHI
At Thu, 17 May 2018 09:20:01 -0700, Andres Freund  wrote in 
<20180517162001.rzd7l6g2h66hv...@alap3.anarazel.de>
> Hi,
> 
> On 2018-05-17 17:19:00 +0900, Kyotaro HORIGUCHI wrote:
> > Hello, as in pgsql-bug ML.
> > 
> > https://www.postgresql.org/message-id/20180517.170021.24356216.horiguchi.kyot...@lab.ntt.co.jp
> > 
> > Master can go into infinite loop on shutdown. But it is caused by
> > a broken database like storage rolled-back one. (The steps to
> > replay this is shown in the above mail.)
> > 
> > I think this can be avoided by rejecting a standby if it reports
> > that write LSN is smaller than flush LSN after catching up.
> > 
> > Is it worth fixing?
> 
> I'm very doubtful. If you do bad stuff to a standby, bad things can
> happen...

Yes, I doubted its worthiness, since I didn't find a more natural
way to cause that.

Thanks for the opinion.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: PG 11 feature count

2018-05-17 Thread Bruce Momjian
On Thu, May 17, 2018 at 05:01:17PM -0700, Andres Freund wrote:
> On 2018-05-17 19:56:43 -0400, Tom Lane wrote:
> > David Rowley  writes:
> > > On 18 May 2018 at 11:29, Bruce Momjian  wrote:
> > >> I regularly track the number of items documented in each major release.
> > >> I use the attached script.  You might be surprised to learn that PG 11
> > >> has the lowest feature count of any release back through 7.4:
> > 
> > > Interesting.  I wonder how much of that drop over the past few years
> > > can be accounted for by the fact that easier stuff tends to get
> > > implemented first, and now we're all just left with the hard stuff.
> > 
> > I don't think the "features" are all the same size, either.
> > Procedures and JIT are both pretty major things ...
> 
> Yea. You could easily break down either feature into at least 10
> sub-features that would independently be listed if they happened in
> subsequent releases...  I don't think counting items in the release
> notes yields something particularly meaningful.

Agreed, but I reported the number in case someone can find some meaning
in it.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: PG 11 feature count

2018-05-17 Thread Gavin Flower

On 18/05/18 11:29, Bruce Momjian wrote:

I regularly track the number of items documented in each major release.
I use the attached script.  You might be surprised to learn that PG 11
has the lowest feature count of any release back through 7.4:

7.4 280
8.0 238
8.1 187
8.2 230
8.3 237
8.4 330
9.0 252
9.1 213
9.2 250
9.3 187
9.4 217
9.5 200
9.6 220
10  194
11  167

I try to use the same criteria in choosing items each year, though I
certainly am not perfectly accurate.

One reason the PG 11 count is lower is because the major items for
this release are not listed at the top yet, but that is only around six
items.


I wonder what the ranking would be in terms of:

1. Complexity
2. Usefulness
3. Lines-of-Code
4. ...


I suspect Lines-of-Code is the one most easily measured, but the least useful!

Whereas: Usefulness is probably the most valuable, but the most 
difficult to measure -- for obvious reasons...



Cheers,
Gavin




Re: PG 11 feature count

2018-05-17 Thread Andres Freund
On 2018-05-17 19:56:43 -0400, Tom Lane wrote:
> David Rowley  writes:
> > On 18 May 2018 at 11:29, Bruce Momjian  wrote:
> >> I regularly track the number of items documented in each major release.
> >> I use the attached script.  You might be surprised to learn that PG 11
> >> has the lowest feature count of any release back through 7.4:
> 
> > Interesting.  I wonder how much of that drop over the past few years
> > can be accounted for by the fact that easier stuff tends to get
> > implemented first, and now we're all just left with the hard stuff.
> 
> I don't think the "features" are all the same size, either.
> Procedures and JIT are both pretty major things ...

Yea. You could easily break down either feature into at least 10
sub-features that would independently be listed if they happened in
subsequent releases...  I don't think counting items in the release
notes yields something particularly meaningful.

Greetings,

Andres Freund



Re: PG 11 feature count

2018-05-17 Thread Tom Lane
David Rowley  writes:
> On 18 May 2018 at 11:29, Bruce Momjian  wrote:
>> I regularly track the number of items documented in each major release.
>> I use the attached script.  You might be surprised to learn that PG 11
>> has the lowest feature count of any release back through 7.4:

> Interesting.  I wonder how much of that drop over the past few years
> can be accounted for by the fact that easier stuff tends to get
> implemented first, and now we're all just left with the hard stuff.

I don't think the "features" are all the same size, either.
Procedures and JIT are both pretty major things ...

regards, tom lane



Re: PG 11 feature count

2018-05-17 Thread David Rowley
On 18 May 2018 at 11:29, Bruce Momjian  wrote:
> I regularly track the number of items documented in each major release.
> I use the attached script.  You might be surprised to learn that PG 11
> has the lowest feature count of any release back through 7.4:

Interesting.  I wonder how much of that drop over the past few years
can be accounted for by the fact that easier stuff tends to get
implemented first, and now we're all just left with the hard stuff.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



PG 11 feature count

2018-05-17 Thread Bruce Momjian
I regularly track the number of items documented in each major release.
I use the attached script.  You might be surprised to learn that PG 11
has the lowest feature count of any release back through 7.4:

7.4 280
8.0 238
8.1 187
8.2 230
8.3 237
8.4 330
9.0 252
9.1 213
9.2 250
9.3 187
9.4 217
9.5 200
9.6 220
10  194
11  167

I try to use the same criteria in choosing items each year, though I
certainly am not perfectly accurate.

One reason the PG 11 count is lower is because the major items for
this release are not listed at the top yet, but that is only around six
items.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +
:

for FILE in $(ls /pgsgml/release-*.sgml | grep -v /pgsgml/release-old\.sgml)
do  echo -n "$FILE"
awk 'BEGIN { major_start = "N" }
$0 ~ /Release [0-9]\.?[0-9]'
done |
sed -e 's;^/pgsgml/release-;;' -e 's/\.sgml//' |
sort -n |
sed 's/  */ /'


Re: Transform for pl/perl

2018-05-17 Thread Tom Lane
Alvaro Herrera  writes:
> This is still listed as an open item, though the patch proposed by Peter
> upthread has been committed.  If I understand correctly, ilmari was
> going to propose another patch.  Or is the right course of action to set
> the open item as resolved?

AIUI, ilmari complained about several things only some of which have been
resolved, so that this is still an open item.  But I think the ball is in
his court to propose a patch.

regards, tom lane



Re: Problem while updating a foreign table pointing to a partitioned table on foreign server

2018-05-17 Thread Tom Lane
Alvaro Herrera  writes:
> Can we just add a new junk attr, with its own fixed system column
> number?  I think that's what Robert was proposing.

Junk attr yes, "fixed system column number" no.  That's not how
junk attrs work.  What it'd need is a convention for the name of
these resjunk attrs (corresponding to ctidN, wholerowN, etc).
We do already have tableoidN junk attrs, but by the same token
those should always be local OIDs, or we'll be in for deep
confusion.  Maybe "remotetableoidN" ?

regards, tom lane



Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-17 Thread David Rowley
On 18 May 2018 at 02:13, Tom Lane  wrote:
> Maybe what you need is a redesign.  This convention seems impossibly
> confusing and hence error-prone.  What about using a separate bool to
> indicate which list the index refers to?

While I agree that the coding is a bit unusual, I think it's also good
that we can get away without allocating yet another array nparts in
size. ExecSetupPartitionTupleRouting is already a huge bottleneck with
single-row INSERT into a partitioned table with a large number of
partitions. Allocating yet another array nparts in size will just slow
it down further.

I have patches locally that I'll be submitting during the v12 cycle to
improve on this. Among other things, the patches go to lengths to not
allocate these arrays when we don't have to.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Transform for pl/perl

2018-05-17 Thread Alvaro Herrera
Hello

This is still listed as an open item, though the patch proposed by Peter
upthread has been committed.  If I understand correctly, ilmari was
going to propose another patch.  Or is the right course of action to set
the open item as resolved?


On 2018-May-02, Dagfinn Ilmari Mannsåker wrote:

> Peter Eisentraut  writes:
> 
> > These two items are now outstanding:
> >
> > On 4/10/18 07:33, Dagfinn Ilmari Mannsåker wrote:
> >> 2) jsonb scalar values are passed to the plperl function wrapped in not
> >>one, but _two_ layers of references
> >
> > I don't understand this one, or why it's a problem, or what to do about it.
> 
> It means that if you call a jsonb-transforming pl/perl function like
> 
>select somefunc(jsonb '42');
> 
> it receives not the scalar 42, but reference to a reference to the
> scalar (**int instead of an int, in C terms).  This is not caught by the
> current round-trip tests because the output transform automatically
> dereferences any number of references on the way out again.
> 
> The fix is to reshuffle the newRV() calls in Jsonb_to_SV() and
> jsonb_to_plperl().  I am working on a patch (and improved tests) for
> this, but have not had time to finish it yet.  I hope to be able to in
> the next week or so.
> 
> >> 3) jsonb numeric values are passed as perl's NV (floating point) type,
> >>losing precision if they're integers that would fit in an IV or UV.
> >
> > This seems fixable, but perhaps we need to think through whether this
> > will result in other strange behaviors.
> 
> Numbers > 2⁵³ are not "interoperable" in the sense of the JSON spec,
> because JavaScript only has doubles, but it seems desirable to preserve
> whatever precision one reasonably can, and I can't think of any
> downsides.  We already support the full numeric range when processing
> JSONB in SQL, it's just in the PL/Perl transform (and possibly
> PL/Python, I didn't look) we're losing precision.
> 
> Perl can also be configured to use long double or __float128 (via
> libquadmath) for its NV type, but I think preserving 64bit integers when
> building against a Perl with a 64bit integer type would be sufficient.
> 
> - ilmari
> -- 
> "The surreality of the universe tends towards a maximum" -- Skud's Law
> "Never formulate a law or axiom that you're not prepared to live with
>  the consequences of."  -- Skud's Meta-Law
> 


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Problem while updating a foreign table pointing to a partitioned table on foreign server

2018-05-17 Thread Tom Lane
Robert Haas  writes:
> Yeah, but I'm not sure I like that solution very much.  I don't think
> abusing the tableoid to store a remote table OID is very nice.

I'd say it's totally unacceptable.  Tableoid *has to* be something
that you can look up in the local pg_class instance, or serious
confusion will ensue.

regards, tom lane



Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-17 Thread David Rowley
On 18 May 2018 at 06:21, Robert Haas  wrote:
> All right, so let's just say that explicitly.  Maybe something like
> the attached.

That looks fine to me.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Odd procedure resolution

2018-05-17 Thread Tom Lane
Peter Eisentraut  writes:
> I think I have made a mistake here.  I was reading in between the lines
> of a competitor's documentation that they have functions and procedures
> in different name spaces, which made me re-read the SQL standard, which
> appears to support that approach.

> So I'm proposing here a patch to fix that.  It is similar to the patch
> proposed earlier in the thread, but more extensive.

> One open problem in my patch is that regproc/regprocedure don't have a
> way to distinguish functions from procedures.  Maybe a two-argument
> version of to_regprocedure?  This will also affect psql's \ef function
> and the like.

TBH, this is several months too late for v11.  You're talking about a
really fundamental redesign, at least if it's done right and not as a
desperate last-minute hack (which is what this looks like).  The points
you make here are just the tip of the iceberg of things that would need
to be reconsidered.

I also remain of the opinion that if we're to separate these namespaces,
the way to do that is to put procedures somewhere other than pg_proc.

Unless you can show that "separate namespaces" is the *only* correct
reading of the SQL spec, which I doubt given the ROUTINE syntax,
I think we're pretty much stuck with the choice we made already.

Or we can push beta back a month or two while we rethink that.
But there's no way you're convincing me that this is a good change
to make four days before beta.

regards, tom lane



Re: [GSoC] Question about returning bytea array

2018-05-17 Thread Andrew Gierth
> "Charles" == Charles Cui  writes:

 Charles> I have the requirements to return a bytea array for some
 Charles> functions in pg_thrift plugin.

If you mean you want the return value to be of type bytea[], i.e. an SQL
array of bytea values, then you need to be using construct_array to
construct the result.

-- 
Andrew (irc:RhodiumToad)



Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-17 Thread Alvaro Herrera
On 2018-May-17, Tom Lane wrote:

> Robert Haas  writes:
> > Hang on, I can't be wrong (famous last words).  If the negative
> > indexes were 0-based, that would mean that the first element of the
> > list was referenced by -0, which obviously can't be true, because 0 =
> > -0.  In other words, we can't be using 0-based indexing for both the
> > positive and the negative values, because then 0 itself would be
> > ambiguous.  It's got to be that -1 is the first element of the *pds
> > list, which means -- AFAICS, anyway -- that the way I phrased it is
> > correct.

> Maybe what you need is a redesign.  This convention seems impossibly
> confusing and hence error-prone.  What about using a separate bool to
> indicate which list the index refers to?

That was my impression when I first came across this, FWIW, and I confess
I didn't try hard enough to understand it fully.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Problem while updating a foreign table pointing to a partitioned table on foreign server

2018-05-17 Thread Robert Haas
On Thu, May 17, 2018 at 2:10 AM, Ashutosh Bapat
 wrote:
> The second would mean that SELECT * from foreign table reports
> remotetableoid as well, which is awkward.

No it wouldn't.  You'd just make the additional column resjunk, same
as we do for wholerow.

> Anyway, my comment to which you have replied is obsolete now. I found
> a solution to that problem, which I have implemented in 0003 in the
> latest patch-set I have shared.

Yeah, but I'm not sure I like that solution very much.  I don't think
abusing the tableoid to store a remote table OID is very nice.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Postgres 11 release notes

2018-05-17 Thread Bruce Momjian
On Thu, May 17, 2018 at 10:35:53PM +0900, Michael Paquier wrote:
> Hi Bruce,
> 
> Here is some bonus feedback.
> 
> On Fri, May 11, 2018 at 11:08:52AM -0400, Bruce Momjian wrote:
> > I expect a torrent of feedback.  ;-)
> 
> I have just noticed that this entry does not have the correct author
> (guess who?):

Fixed.  (I think I guessed right.)

>
> Add libpq option to support channel binding when using <link linkend="auth-password">SCRAM</link>
> authentication (Peter Eisentraut)
>
> 
> I think that there should be two different entries in the release notes
> for channel binding:
> 1) The new connection parameter in libpq which allows to control the
> channel binding name, as well as to decide if it should be disabled.
> I would think that this is better placed within the section for client
> interface changes.

Well, I tend to put items in the first section that applies, and in this
case, you are right that the API is libpq but the feature is
authentication.  We know we are going to need to adjust this feature,
so let's see where it ends up and let's revisit it.

> 2) Channel binding itself, which should be part of the authentication
> section.
> 
>
> Have libpq's <link linkend="libpq-pqhost">PQhost()</link>
> always return the actual connected host (Hari Babu)
>
> Should this be added as well in the section "Client interfaces"?

Well, again, using the rules above, the PQhost item goes into the first
section where it fits, and incompatibility is the first such section. 
There are other items in incompatibility that could be moved to lower,
but again, we want the incompatibilities to all be in the same place.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-17 Thread Robert Haas
On Thu, May 17, 2018 at 10:36 AM, Amit Langote  wrote:
> On Thu, May 17, 2018 at 10:29 PM, Robert Haas  wrote:
>> Unless the indexing system actually can't reference the first element
>> of *pds, and -1 means the second element.  But then I think we need a
>> more verbose explanation here.
>
> First element in *pds list (and the array subsequently created from
> it) contains the root table's entry.  So, a -1 does mean the 2nd entry
> in that list/array.  A 0 in the indexes array always refers to a leaf
> partition and hence an index into the array for leaf partitions.

All right, so let's just say that explicitly.  Maybe something like
the attached.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


another-try.patch
Description: Binary data


Re: Postgres, fsync, and OSs (specifically linux)

2018-05-17 Thread Robert Haas
On Thu, May 17, 2018 at 12:44 PM, Andres Freund  wrote:
> Hi,
>
> On 2018-05-10 09:50:03 +0800, Craig Ringer wrote:
>> 	while ((src = (RewriteMappingFile *) hash_seq_search(&seq_status)) != NULL)
>> 	{
>> 		if (FileSync(src->vfd, WAIT_EVENT_LOGICAL_REWRITE_SYNC) != 0)
>> -			ereport(ERROR,
>> +			ereport(PANIC,
>> 					(errcode_for_file_access(),
>> 					 errmsg("could not fsync file \"%s\": %m", src->path)));
>
> To me this (and the other callers) doesn't quite look right. First, I
> think we should probably be a bit more restrictive about when we PANIC
> out. It seems like we should PANIC on ENOSPC and EIO, but possibly not
> others.  Secondly, I think we should centralize the error handling. It
> seems likely that we'll accrue some platform-specific workarounds, and I
> don't want to copy that knowledge everywhere.

Maybe something like:

ereport(promote_eio_to_panic(ERROR), ...)

?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PROPOSAL] Shared Ispell dictionaries

2018-05-17 Thread Robert Haas
On Thu, May 17, 2018 at 1:52 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Thu, May 17, 2018 at 10:18 AM, Tom Lane  wrote:
>>> I think the point you've not addressed is that "syscache callback
>>> occurred" does not equate to "object was dropped".  Can the code
>>> survive having this occur at any invalidation point?
>>> (CLOBBER_CACHE_ALWAYS testing would soon expose any fallacy there.)
>
>> Well, I'm not advocating for a lack of testing, and
>> CLOBBER_CACHE_ALWAYS testing is a good idea.  However, I suspect that
>> calling dsm_detach() from a syscache callback should be fine.
>> Obviously there will be trouble if the surrounding code is still using
>> that mapping, but that would be a bug at some higher level, like using
>> an object without locking it.
>
> No, you're clearly not getting the point.  You could have an absolutely
> airtight exclusive lock of any description whatsoever, and that would
> provide no guarantee at all that you don't get a cache flush callback.
> It's only a cache, not a catalog, and it can get flushed for any reason
> or no reason.  (That's why we have pin counts on catcache and relcache
> entries, rather than assuming that locking the corresponding object is
> enough.)  So I think it's highly likely that unmapping in a syscache
> callback is going to lead quickly to SIGSEGV.  The only way it would not
> is if we keep the shared dictionary mapped only in short straight-line
> code segments that never do any other catalog accesses ... which seems
> awkward, inefficient, and error-prone.

Yeah, that's true, but again, you can work around that problem.  A DSM
mapping is fundamentally not that different from a backend-private
memory allocation.  If you can avoid freeing memory while you're
referencing it -- as the catcache and the syscache clearly do -- you
can avoid it here, too.

> Do we actually need to worry about unmapping promptly on DROP TEXT
> DICTIONARY?  It seems like the only downside of not doing that is that
> we'd leak some address space until process exit.  If you were thrashing
> dictionaries at some unreasonable rate on a 32-bit host, you might
> eventually run some sessions out of address space; but that doesn't seem
> like a situation that's so common that we need fragile coding to avoid it.

I'm not sure what the situation is here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PROPOSAL] Shared Ispell dictionaries

2018-05-17 Thread Tom Lane
Robert Haas  writes:
> On Thu, May 17, 2018 at 10:18 AM, Tom Lane  wrote:
>> I think the point you've not addressed is that "syscache callback
>> occurred" does not equate to "object was dropped".  Can the code
>> survive having this occur at any invalidation point?
>> (CLOBBER_CACHE_ALWAYS testing would soon expose any fallacy there.)

> Well, I'm not advocating for a lack of testing, and
> CLOBBER_CACHE_ALWAYS testing is a good idea.  However, I suspect that
> calling dsm_detach() from a syscache callback should be fine.
> Obviously there will be trouble if the surrounding code is still using
> that mapping, but that would be a bug at some higher level, like using
> an object without locking it.

No, you're clearly not getting the point.  You could have an absolutely
airtight exclusive lock of any description whatsoever, and that would
provide no guarantee at all that you don't get a cache flush callback.
It's only a cache, not a catalog, and it can get flushed for any reason
or no reason.  (That's why we have pin counts on catcache and relcache
entries, rather than assuming that locking the corresponding object is
enough.)  So I think it's highly likely that unmapping in a syscache
callback is going to lead quickly to SIGSEGV.  The only way it would not
is if we keep the shared dictionary mapped only in short straight-line
code segments that never do any other catalog accesses ... which seems
awkward, inefficient, and error-prone.

Do we actually need to worry about unmapping promptly on DROP TEXT
DICTIONARY?  It seems like the only downside of not doing that is that
we'd leak some address space until process exit.  If you were thrashing
dictionaries at some unreasonable rate on a 32-bit host, you might
eventually run some sessions out of address space; but that doesn't seem
like a situation that's so common that we need fragile coding to avoid it.

regards, tom lane



Re: [BUGFIX] amcanbackward is not checked before building backward index paths

2018-05-17 Thread Tom Lane
Alvaro Herrera  writes:
> To make matters worse, IIUC it's actually fine to read the cursor in one
> direction to completion, then in the other direction to completion,
> without this flag, right?

In principle that'd be possible without amcanbackward if you were to
shut down the plan at the end of the forward scan and start a fresh
execution for the backwards scan; but there is no code or API for doing
that with a cursor.  So I think this is just a red herring.

The case that's actually of interest is where you write

SELECT ... FROM tab ORDER BY indexed_col DESC

and then just read that normally without a cursor, or at least without
a scrollable one.  This does not require amcanbackward, since the
executor will not change scan directions: it's all ForwardScanDirection
at the top level (which gets inverted to BackwardScanDirection somewhere
in nodeIndexscan, which knows the index is being used backwards).

Currently we assume that any index capable of amcanorder can do this
scenario too.  I'm willing to entertain a proposal to have a separate
capability flag for it, although I think it's fair to question whether
designing an order-supporting index AM that can't do this is actually a
good idea.  The current user docs say explicitly that you don't need
to make both ASC and DESC order indexes on the same thing.  I do not
really want to put in weasel wording saying "unless you're using some
crappy third party index type".

regards, tom lane



Re: [BUGFIX] amcanbackward is not checked before building backward index paths

2018-05-17 Thread Andrew Gierth
> "Alvaro" == Alvaro Herrera  writes:

 >> "Can the scan direction be changed in mid-scan (to support FETCH
 >> FORWARD and FETCH BACKWARD on a cursor)?"

How about,

"Can the scan direction be changed in mid-scan (to support FETCH
BACKWARD on a cursor without needing materialization)?"

 Alvaro> To me that sounds like the flag is a prerequisite of using the
 Alvaro> cursor in either direction. But maybe "to support both FETCH
 Alvaro> FORWARD and FETCH BACKWARD on the same cursor" is sufficient.
 Alvaro> Or maybe "to support changing scan direction on a cursor".

 Alvaro> To make matters worse, IIUC it's actually fine to read the
 Alvaro> cursor in one direction to completion, then in the other
 Alvaro> direction to completion, without this flag, right?

If you explicitly declare your cursor as SCROLL, which you should do if
you want to fetch backward in it, then it's always OK to switch
directions - the planner will have inserted a Materialize node if one is
needed. If you didn't declare it with SCROLL, and it's not implicitly
SCROLL as per the rules below, you can't fetch backward in it at all
regardless of whether you reached the end. (see error message in
PortalRunSelect for where this happens)

(I found a bit of a wart in that code: MOVE/FETCH ABSOLUTE will
arbitrarily fail or not fail on a no-scroll cursor according to the
absolute position requested - if it's closer to 1 than the current
position it'll rewind to the start, which always works, then scan
forwards; but if it's closer to the current position, it'll try moving
backwards even in a no-scroll cursor.)

What amcanbackward actually affects, as I read the code, is this:

1. ExecSupportsBackwardScan applied to an IndexScan or IndexOnlyScan is
   true if and only if amcanbackward is true.

2. If you specify neither SCROLL nor NO SCROLL when creating a cursor,
   then the cursor is implicitly SCROLL if and only if the topmost plan
   node returns true from ExecSupportsBackwardScan.

3. If you specify SCROLL when creating a cursor, then the planner
   inserts a Materialize node on the top of the plan if
   ExecSupportsBackwardScan is not true of the previous top plan node.

-- 
Andrew (irc:RhodiumToad)



Re: [PROPOSAL] Shared Ispell dictionaries

2018-05-17 Thread Robert Haas
On Thu, May 17, 2018 at 10:18 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> ... Assuming that we can
>> convince ourselves that that much is OK, I don't see why using a
>> syscache callback to help ensure that the mappings are blown away in
>> an at-least-somewhat-timely fashion is worse than any other approach.
>
> I think the point you've not addressed is that "syscache callback
> occurred" does not equate to "object was dropped".  Can the code
> survive having this occur at any invalidation point?
> (CLOBBER_CACHE_ALWAYS testing would soon expose any fallacy there.)

Well, I'm not advocating for a lack of testing, and
CLOBBER_CACHE_ALWAYS testing is a good idea.  However, I suspect that
calling dsm_detach() from a syscache callback should be fine.
Obviously there will be trouble if the surrounding code is still using
that mapping, but that would be a bug at some higher level, like using
an object without locking it.  And there will be trouble if you
register an on_dsm_detach callback that does something strange, but
the ones that the core code installs (when you use shm_mq, for
example) should be safe.  And there will be trouble if you're not
careful about memory contexts, because someplace you probably need to
remember that you detached from that DSM so you don't try to do it
again, and you'd better be sure you have the right context selected
when updating your data structures.  But it all seems pretty solvable.
I think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [BUGFIX] amcanbackward is not checked before building backward index paths

2018-05-17 Thread Alvaro Herrera
On 2018-May-17, Tom Lane wrote:

> "David G. Johnston"  writes:
> > On Thu, May 17, 2018 at 8:46 AM, Tom Lane  wrote:
> >> Maybe "Can the scan direction be reversed in mid-scan?".  I'm not
> >> absolutely sure that that's better ...
> 
> > A cursory read might conclude that "reversing" can only happen once while
> > they will likely understand that "changing" can happen multiple times.
> > This is a minor point - the two are effectively the same.
> > Maybe: "Supports both FETCH FORWARD and FETCH BACKWARD during the same scan"
> 
> Oh, yeah, mentioning what it's *for* would help clarify things, no?
> So perhaps
> 
> "Can the scan direction be changed in mid-scan (to support FETCH FORWARD
> and FETCH BACKWARD on a cursor)?"

To me that sounds like the flag is a prerequisite of using the cursor in
either direction.  But maybe "to support both FETCH FORWARD and FETCH
BACKWARD on the same cursor" is sufficient.  Or maybe "to support
changing scan direction on a cursor".


To make matters worse, IIUC it's actually fine to read the cursor in one
direction to completion, then in the other direction to completion,
without this flag, right?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Postgres, fsync, and OSs (specifically linux)

2018-05-17 Thread Andres Freund
Hi,

On 2018-05-10 09:50:03 +0800, Craig Ringer wrote:
> 	while ((src = (RewriteMappingFile *) hash_seq_search(&seq_status)) != NULL)
> 	{
> 		if (FileSync(src->vfd, WAIT_EVENT_LOGICAL_REWRITE_SYNC) != 0)
> -			ereport(ERROR,
> +			ereport(PANIC,
> 				(errcode_for_file_access(),
> 				 errmsg("could not fsync file \"%s\": %m", src->path)));

To me this (and the other callers) doesn't quite look right. First, I
think we should probably be a bit more restrictive about when we PANIC
out. It seems like we should PANIC on ENOSPC and EIO, but possibly not
others.  Secondly, I think we should centralize the error handling. It
seems likely that we'll accrue some platform-specific workarounds, and I
don't want to copy that knowledge everywhere.

Also, don't we need the same on close()?

- Andres



Re: [PROPOSAL] Shared Ispell dictionaries

2018-05-17 Thread Arthur Zakirov
On Thu, May 17, 2018 at 09:57:59AM -0400, Robert Haas wrote:
> I think you and Tom have misunderstood each other somehow.  If you
> look at CommitTransaction(), you will see a comment that says:

Oh, I understood. You are right.

> Also, there is no absolute prohibition on kernel calls in post-commit
> cleanup, or in no-fail code in general.

Thank you for the explanation!

The current approach depends on syscache callbacks anyway. Backend 2
(from the example above) knows whether it is necessary to unpin segments
after a syscache callback was called. Tom pointed out below that callbacks
occur on various events. So I think I should check the current approach
too using CLOBBER_CACHE_ALWAYS; it could show some problems in the
current patch.

Then if everything is OK I think I'll check another approach (unmapping
in TS syscache callback) using CLOBBER_CACHE_ALWAYS.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: Infinite loop on master shutdown

2018-05-17 Thread Andres Freund
Hi,

On 2018-05-17 17:19:00 +0900, Kyotaro HORIGUCHI wrote:
> Hello, as in pgsql-bug ML.
> 
> https://www.postgresql.org/message-id/20180517.170021.24356216.horiguchi.kyot...@lab.ntt.co.jp
> 
> Master can go into infinite loop on shutdown. But it is caused by
> a broken database like storage rolled-back one. (The steps to
> replay this is shown in the above mail.)
> 
> I think this can be avoided by rejecting a standby if it reports
> that write LSN is smaller than flush LSN after catching up.
> 
> Is it worth fixing?

I'm very doubtful. If you do bad stuff to a standby, bad things can
happen...

Greetings,

Andres Freund



Re: [BUGFIX] amcanbackward is not checked before building backward index paths

2018-05-17 Thread Tom Lane
"David G. Johnston"  writes:
> On Thu, May 17, 2018 at 8:46 AM, Tom Lane  wrote:
>> Maybe "Can the scan direction be reversed in mid-scan?".  I'm not
>> absolutely sure that that's better ...

> ​A cursory read might conclude that "reversing" can only happen once while
> they will likely understand that "changing" can happen multiple times.
> This is a minor point - the two are effectively the same.
> Maybe: "Supports both FETCH FORWARD and FETCH BACKWARD during the same scan"

Oh, yeah, mentioning what it's *for* would help clarify things, no?
So perhaps

"Can the scan direction be changed in mid-scan (to support FETCH FORWARD
and FETCH BACKWARD on a cursor)?"

> Or:
> "Supports SCROLL(able) WITHOUT HOLD cursors"

That seems a bit more vague/indirect.

regards, tom lane



Re: [BUGFIX] amcanbackward is not checked before building backward index paths

2018-05-17 Thread David G. Johnston
On Thu, May 17, 2018 at 8:46 AM, Tom Lane  wrote:

> Andrew Gierth  writes:
> > I'll fix the docs accordingly. I'm referring specifically to this bit:
> > https://www.postgresql.org/docs/current/static/functions-info.html#FUNCTIONS-INFO-INDEX-PROPS
> > which I think should say "Can the scan direction be changed in
> > mid-scan?" in place of the current text (unless anyone has better
> > wording?)
>
> Maybe "Can the scan direction be reversed in mid-scan?".  I'm not
> absolutely sure that that's better ...
>

​A cursory read might conclude that "reversing" can only happen once while
they will likely understand that "changing" can happen multiple times.
This is a minor point - the two are effectively the same.

Maybe: "Supports both FETCH FORWARD and FETCH BACKWARD during the same scan"

Or:

"Supports SCROLL(able) WITHOUT HOLD cursors"
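For context, the behavior all of these wording candidates describe is a cursor changing scan direction mid-scan, e.g. (a standard-SQL sketch, not tied to any particular index AM):

```sql
BEGIN;
DECLARE c SCROLL CURSOR FOR
    SELECT i FROM generate_series(1, 10) AS t(i);
FETCH FORWARD 3 FROM c;   -- rows 1, 2, 3
FETCH BACKWARD 2 FROM c;  -- direction reversed mid-scan: rows 2, 1
COMMIT;
```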

David J.


Re: [BUGFIX] amcanbackward is not checked before building backward index paths

2018-05-17 Thread Tom Lane
Andrew Gierth  writes:
> Ugh, so the docs for amutils get this wrong, and if I'd looked at this
> more carefully when doing them to begin with I'd have given the
> 'backwards_scan' property a better name or omitted it entirely.

Ooops.

> I'll fix the docs accordingly. I'm referring specifically to this bit:
> https://www.postgresql.org/docs/current/static/functions-info.html#FUNCTIONS-INFO-INDEX-PROPS
> which I think should say "Can the scan direction be changed in
> mid-scan?" in place of the current text (unless anyone has better
> wording?)

Maybe "Can the scan direction be reversed in mid-scan?".  I'm not
absolutely sure that that's better ...

regards, tom lane



Re: [BUGFIX] amcanbackward is not checked before building backward index paths

2018-05-17 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> What amcanbackward is about is whether the index can support
 Tom> reversing direction mid-scan, as would be required to support
 Tom> FETCH FORWARD followed by FETCH BACKWARD in a cursor. That's
 Tom> actually independent of whether the index can implement a defined
 Tom> ordering; see for example the hash AM, which sets amcanbackward
 Tom> but not amcanorder.

Ugh, so the docs for amutils get this wrong, and if I'd looked at this
more carefully when doing them to begin with I'd have given the
'backwards_scan' property a better name or omitted it entirely.

I'll fix the docs accordingly. I'm referring specifically to this bit:

https://www.postgresql.org/docs/current/static/functions-info.html#FUNCTIONS-INFO-INDEX-PROPS

which I think should say "Can the scan direction be changed in
mid-scan?" in place of the current text (unless anyone has better
wording?)
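For reference, the property under discussion can be queried through the amutils function being documented (a sketch; the index name t_pkey is an assumption):

```sql
-- 'backward_scan' is the property whose description is being reworded;
-- it is true for btree, where the scan direction can change mid-scan.
SELECT pg_index_has_property('t_pkey'::regclass, 'backward_scan');
```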

-- 
Andrew (irc:RhodiumToad)



Re: NaNs in numeric_power (was Re: Postgres 11 release notes)

2018-05-17 Thread Tom Lane
Huong Dangminh  writes:
> Thank you. The patch looks fine to me.
> Also, I have done the "make check" in Windows and Linux environment with no 
> problem.

Pushed, thanks for reviewing/testing.

regards, tom lane



Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-17 Thread Amit Langote
On Thu, May 17, 2018 at 10:29 PM, Robert Haas  wrote:
> Unless the indexing system actually can't reference the first element
> of *pds, and -1 means the second element.  But then I think we need a
> more verbose explanation here.

The first element of the *pds list (and of the array subsequently created
from it) contains the root table's entry.  So a -1 does mean the 2nd entry
in that list/array.  A 0 in the indexes array always refers to a leaf
partition, and hence is an index into the array of leaf partitions.

Thanks,
Amit



Re: Should we add GUCs to allow partition pruning to be disabled?

2018-05-17 Thread Robert Haas
On Thu, May 17, 2018 at 12:04 AM, David Rowley
 wrote:
>> Append
>>   Execution-Time Pruning: order_lines (at executor startup)
>>   -> Index Scan ...
>
> Perhaps Append should be shown as "Unordered Partitioned Table Scan on
> ". That seems more aligned to how else we show which relation a
> node belongs to. The partition being scanned is simple to obtain. It's
> just the first item in the partitioned_rels List. (MergeAppend would
> be an "Ordered Partitioned Table Scan")

Hmm, that's a radical proposal but I'm not sure I like it.  For one
thing, table scan might mean sequential scan to some users.  For
another, it's not really unordered.  Unless it's parallel-aware, we're
going to scan them strictly in the order they're given.

> I'm not really a fan of overloading properties with a bunch of text.
> Multiple int or text properties would be easier to deal with,
> especially so when you consider the other explain formats. Remember,
> all 3 pruning methods could be used for a single Append node.

I was imagining it as two properties in non-text format that got
displayed in a special way in text mode.  I intended that this would
only give information about execution-time pruning, so there would
only two methods to consider here, but, yeah, you might have something
like:

Execution-Time Pruning: order_lines (at executor startup, at runtime)

> I guess doing work here would require additional code in the planner
> to track how many relations were removed by both partition pruning and
> constraint exclusion. Dunno if that would be tracked together or
> separately. However, I'd prefer to have a clear idea of what exactly
> the design should be before I go write some code that perhaps nobody
> will like.

I don't feel strongly about adding more code to track the number of
removed partitions.  I think that the important thing is whether or
not partitioning is happening and at what stage, and I think it's
useful to show the relation name if we can.  As you pointed out, it's
largely possible already to figure out how well we did at removing
stuff and at which stages, but to me it seems quite easy to be
confused about which stages tried to remove things.  For example,
consider:

Gather
-> Nested Loop
  -> Seq Scan
Filter: something
  -> Append
-> Index Scan
-> Index Scan
-> Index Scan

I think it's going to be quite tricky to figure out whether that
Append node is trying to do execution-time pruning without some
annotation.  The nloops values are going to be affected by how many
rows are in which partitions and how many workers got which rows as
well as by whether execution-time pruning worked and how effectively.
You might be able to figure it out by staring at the EXPLAIN
output for a while... but it sure seems like it would be a lot nicer
to have an explicit indicator... especially if you're some random user
rather than a PostgreSQL expert.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-05-17 Thread Ashutosh Bapat
On Thu, May 17, 2018 at 4:50 PM, Etsuro Fujita
 wrote:
> (2018/05/16 22:49), Ashutosh Bapat wrote:
>>
>> On Wed, May 16, 2018 at 4:01 PM, Etsuro Fujita
>>   wrote:
>>>
>>> However, considering that
>>> pull_var_clause is used in many places where partitioning is *not*
>>> involved,
>>> I still think it's better to avoid spending extra cycles needed only for
>>> partitioning in that function.
>>
>>
>> Right. The changes done in inheritance_planner() imply that we can
>> encounter a ConvertRowtypeExpr even in the expressions for a relation
>> which is not a child relation in the translated query. It was a child
>> in the original query but after translating it becomes a parent and
>> has ConvertRowtypeExpr somewhere there.
>
>
> Sorry, I don't understand this.  Could you show me such an example?

Even without inheritance we can induce a ConvertRowtypeExpr on a base
relation which is not an "other" relation:

create table parent (a int, b int);
create table child () inherits(parent);
alter table child add constraint whole_par_const check ((child::parent).a = 1);

There is no way to tell by looking at the parsed query whether
pull_var_clause() from StoreRelCheck() will encounter a
ConvertRowtypeExpr or not. We could check whether the tables involved
are inherited or not, but that's more expensive than
is_converted_whole_row_reference().

>
>> So, there is no way to know
>> whether a given expression will have ConvertRowtypeExpr in it. That's
>> my understanding. But if we can devise such a test, I am happy to
>> apply that test before or within the call to pull_var_clause().
>>
>> We don't need that expensive test if user has passed
>> PVC_RECURSE_CONVERTROWTYPE. So we could test that first and then check
>> the shape of expression tree. It would cause more asymmetry in
>> pull_var_clause(), but we can live with it or change the order of
>> tests for all three options. I will defer that to a committer.
>
>
> Sounds more invasive.  Considering where we are in the development cycle for
> PG11, I think it'd be better to address this by something like the approach
> I proposed.  Anyway, +1 for asking the committer's opinion.

Thanks.

>
> - On patch 0001-Handle-ConvertRowtypeExprs-in-pull_vars_clause.patch:
>
> +extern bool
> +is_converted_whole_row_reference(Node *node)
>
> I think we should remove "extern" from the definition.

Done.

>
> - On patch 0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch:
>
>
> I think we can fix this by adding another flag to indicate whether we
> deparsed the first live column of the relation, as in the "first" bool flag
> in deparseTargetList.

Thanks. Fixed.

>
> One more thing to add is: the patch adds support for deparsing a
> ConvertRowtypeExpr that translates a wholerow of a childrel into a wholerow
> of its parentrel's rowtype, so by modifying is_foreign_expr accordingly, we
> could push down such CREs in JOIN ON conditions to the remote for example.
> But that would be an improvement, not a fix, so I think we should leave that
> for another patch for PG12.

Right.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


expr_ref_error_pwj_v9.tar.gz


Re: [PROPOSAL] Shared Ispell dictionaries

2018-05-17 Thread Tom Lane
Robert Haas  writes:
> ... Assuming that we can
> convince ourselves that that much is OK, I don't see why using a
> syscache callback to help ensure that the mappings are blown away in
> an at-least-somewhat-timely fashion is worse than any other approach.

I think the point you've not addressed is that "syscache callback
occurred" does not equate to "object was dropped".  Can the code
survive having this occur at any invalidation point?
(CLOBBER_CACHE_ALWAYS testing would soon expose any fallacy there.)

regards, tom lane



Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-17 Thread Tom Lane
Robert Haas  writes:
> Hang on, I can't be wrong (famous last words).  If the negative
> indexes were 0-based, that would mean that the first element of the
> list was referenced by -0, which obviously can't be true, because 0 =
> -0.  In other words, we can't be using 0-based indexing for both the
> positive and the negative values, because then 0 itself would be
> ambiguous.  It's got to be that -1 is the first element of the *pds
> list, which means -- AFAICS, anyway -- that the way I phrased it is
> correct.

> Unless the indexing system actually can't reference the first element
> of *pds, and -1 means the second element.  But then I think we need a
> more verbose explanation here.

Maybe what you need is a redesign.  This convention seems impossibly
confusing and hence error-prone.  What about using a separate bool to
indicate which list the index refers to?

regards, tom lane



Re: lazy detoasting

2018-05-17 Thread Tom Lane
Peter Eisentraut  writes:
> In reviewing the committed patch, I noticed that in ER_get_flat_size()
> you have removed the PG_DETOAST_DATUM() call and let
> expanded_record_set_field_internal() do the de-toasting work.  I had
> considered that too, but my impression is that the purpose of the
> PG_DETOAST_DATUM() is to de-compress for the purpose of size
> computation, whereas expanded_record_set_field_internal() only does the
> inlining of externally stored values and doesn't do any explicit
> decompression.  Is this correct?

Hmm ... yeah, there is a semantics change there, but I think it's fine.
I don't recall that I'd thought specifically about the behavior for
inline-compressed/short-header Datums when I wrote that code to begin
with.  But for its purpose, which is that we're trying to flatten the
expanded record into a tuple, leaving an inline-compressed field in that
form seems OK, perhaps actually preferable.  And short-header is a no-op:
it'd end up with a short header in the emitted tuple regardless of our
choice here.

regards, tom lane



SCRAM with channel binding downgrade attack

2018-05-17 Thread Bruce Momjian
On Thu, May 17, 2018 at 03:38:36PM +0200, Magnus Hagander wrote:
> What's the take of others?  Magnus, Stephen or Heikki perhaps (you've
> been the most involved with SCRAM early talks)?
> 
> Saw it by luck. It would probably be better if it wasn't hidden deep in a
> thread about release notes.

Agreed.  The problem was so glaring that I assumed I was not
understanding it.  I have modified this email subject so users will
hopefully read back in this thread to see the details.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: [PROPOSAL] Shared Ispell dictionaries

2018-05-17 Thread Robert Haas
On Wed, May 16, 2018 at 4:42 PM, Arthur Zakirov
 wrote:
> I don't have deep knowledge of the guts of invalidation callbacks. It seems
> that there is a problem with it. Tom pointed out above:
>
>> > I'm not sure that I understood the second case correctly. Can cache
>> > invalidation help in this case? I don't have confident knowledge of cache
>> > invalidation. It seems to me that InvalidateTSCacheCallBack() should
>> > release segment after commit.
>>
>> "Release after commit" sounds like a pretty dangerous design to me,
>> because a release necessarily implies some kernel calls, which could
>> fail. We can't afford to inject steps that might fail into post-commit
>> cleanup (because it's too late to recover by failing the transaction).
>> It'd be better to do cleanup while searching for a dictionary to use.
>
> But it is possible that I misunderstood his note.

I think you and Tom have misunderstood each other somehow.  If you
look at CommitTransaction(), you will see a comment that says:

 * This is all post-commit cleanup.  Note that if an error is raised here,
 * it's too late to abort the transaction.  This should be just
 * noncritical resource releasing.

Between that point and the end of that function, we shouldn't do
anything that throws an error, because the transaction is already
committed and it's too late to change our mind.  But if session A
drops an object, session B is not going to get a callback to
InvalidateTSCacheCallBack at that point.  It's going to happen
sometime in the middle of the transaction, like when it next tries to
lock a relation or something.  So Tom's complaint is irrelevant in
that scenario.

Also, there is no absolute prohibition on kernel calls in post-commit
cleanup, or in no-fail code in general.  For example, the
RESOURCE_RELEASE_AFTER_LOCKS phase of resowner cleanup calls
FileClose().  That's actually completely alarming when you really
think about it, because one of the documented return values for
close() is EIO, which certainly represents a very dangerous kind of
failure -- see nearby threads about fsync-safety.  Transaction abort
acquires and releases numerous LWLocks, which can result in kernel
calls that could fail.  We're OK with that because, in practice, it
never happens.  Unmapping a DSM segment is probably about as safe as
acquiring and releasing an LWLock, maybe safer.  On my MacBook, the
only documented return value for munmap is EINVAL, and any such error
would indicate a PostgreSQL bug (or a kernel bug, or a cosmic ray
hit).  I checked a Linux system; things there are less clear, because
mmap and munmap share a single man page, and mmap can fail for all
kinds of reasons.  But very few of the listed error codes look like
things that could legitimately happen during munmap.  Also, if munmap
did fail (or shmdt/shmctl if using System V shared memory), it would
be reported as a WARNING, not an ERROR, so we'd still be sorta OK.

I think the only real question here is whether it's safe, at a high
level, to drop the object at time T0 and have various backends drop
the mapping at unpredictable later times T1, T2, ... all greater than
T0.  Generally, one wants to remove all references to an object before
the object itself, which in this case we can't.  Assuming that we can
convince ourselves that that much is OK, I don't see why using a
syscache callback to help ensure that the mappings are blown away in
an at-least-somewhat-timely fashion is worse than any other approach.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Postgres 11 release notes

2018-05-17 Thread Bruce Momjian
On Thu, May 17, 2018 at 09:48:54PM +0900, Michael Paquier wrote:
> On Wed, May 16, 2018 at 09:09:22PM -0400, Bruce Momjian wrote:
> > On Thu, May 17, 2018 at 09:56:49AM +0900, Michael Paquier wrote:
> >> On Wed, May 16, 2018 at 08:20:49PM -0400, Bruce Momjian wrote:
> >>> SCRAM-with-binding is the first password method that attempts to avoid
> >>> man-in-the-middle attacks, and therefore is much less likely to be able
> >>> to trust what the endpoints support.  I think it is really the
> >>> channel_binding_mode that we want to control at the client.  The lesser
> >>> modes are much more reasonable to use an automatic best-supported
> >>> negotiation, which is what we do now.
> >> 
> >> Noted.  Which means that the parameter is ignored when using a non-SSL
> >> connection, as well as when the server tries to enforce the use of
> >> anything else than SCRAM.
> > 
> > Uh, a man-in-the-middle could prevent SSL or ask for a different
> > password authentication method and then channel binding would not be
> > used.  I think when you say you want channel binding, you have to fail
> > if you don't get it.
> 
> I am not exactly sure what result we are looking for here, so for now I
> am adding an open item which refers to this part of the thread.
> Please note that I am fine with spending cycles if needed to address any
> issues and/or concerns.  Let the discussion continue for now.

Agreed, and I just posted a more detailed email about when
authentication downgrades are possible.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Postgres 11 release notes

2018-05-17 Thread Bruce Momjian
On Wed, May 16, 2018 at 09:09:22PM -0400, Bruce Momjian wrote:
> > > FYI, I think the server could also require channel binding for SCRAM. We
> > > already have scram-sha-256 in pg_hba.conf, and I think
> > > scram-sha-256-plus would be reasonable.
> > 
> > Noted as well.  There is of course the question of v10 libpq versions
> > which don't support channel binding, but if an admin is willing to set
> > up scram-sha-256-plus in pg_hba.conf then he can request his users to
> > update his drivers/libs as well.
> 
> Yes, I don't see a way around it.  Once you accept that someone in the
> middle can change what you request undetected, then you can't do 
> fallback.  Imagine a man-in-the-middle with TLS where the
> man-in-the-middle allows the two end-points to negotiate the shared
> secret, but the man-in-the-middle forces a weak cipher.  This is what is
> happening with Postgres when the man-in-the-middle forces a weaker
> authentication method.

Technically, you can do automatic fallback if the fallback has
man-in-the-middle and downgrade protection.  Technically, because TLS is
already active when we start authentication negotiation, we don't need
downgrade protection as long as we have man-in-the-middle protection.

Unfortunately, only SCRAM with channel binding and sslmode=verify-full
have man-in-the-middle protection.  Therefore, downgrading from SCRAM
with channel binding to SCRAM without channel binding, MD5, or
'password' is only safe if sslmode=verify-full is enabled.  Technically
MD5, or 'password' has weak or non-existent replay protection, but if we
are requiring TLS, then that doesn't really matter, assuming we have
man-in-the-middle protection.

I don't know if falling back from SCRAM with channel binding to lesser
authentication methods only when sslmode=verify-full is enabled is really
helpful to anyone, since it requires certificate installation.

TLS has similar downgrade issues:


http://www.educatedguesswork.org/2012/07/problems_with_secure_upgrade_t.html

but many of its downgrade options have downgrade protection, and you
don't lose man-in-the-middle protection by downgrading. 
Man-in-the-middle protection via certificate checking happens
independent of the TLS version being used, which is not the case for
Postgres authentication downgrade options.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



'tuple concurrently updated' error w/o visible catalog updates

2018-05-17 Thread Alex Kliukin
Hello,

Earlier this week we split our Postgres 9.6.8 shards, each having two
databases, into a one-database-per-shard setup. This was done by promoting
replicas and subsequently removing the unused databases.

Immediately afterwards we discovered repeated 'tuple concurrently updated'
errors on most of those new shards.

The error is always shown at the same statement:
ERROR,XX000,"tuple concurrently updated","SQL statement ""UPDATE
config_content SET cc_content = l_config_content WHERE cc_config_content_id =
l_ccm_content_id""

By searching the archives (i.e.
https://www.postgresql.org/message-id/flat/CAB7nPqSZCkVfibTvx9TYmHYhVtV_vOMNwOpLHnRU85qeiimUaQ%40mail.gmail.com#cab7npqszckvfibtvx9tymhyhvtv_vomnwoplhnru85qeiim...@mail.gmail.com)
I’ve gotten the impression that this error manifests itself when system catalog
tuples are updated concurrently; however, I see none of that in the query that
leads to the ERROR. There are no triggers on the 'config_content' table, nor
are there any views referring to it.

The errors stopped when we disabled a call to the 'upsert_foo_content' function
(here and below I obfuscated the real names). This is a fairly simple pl/pgsql
function that does a few selects and an upsert. The block inside that function
that contains the statement at fault is:

--
SELECT ccm_content_id, ccm_simple_update_received_at INTO l_ccm_content_id, 
l_ccm_simple_update_received_at FROM config_content_metadata
WHERE  ccm_config_id = l_c_id AND ccm_sales_channel_id = l_sales_channel_id;

IF (l_ccm_content_id IS NULL) THEN
 -- insert config content --
 INSERT INTO config_content_metadata(ccm_config_id, ccm_sales_channel_id, 
ccm_update_caused_by, ccm_simple_update_eid, ccm_simple_update_received_at)
 VALUES(l_c_id, l_sales_channel_id, l_rp_id, l_content_update_eid, 
l_content_update_received_at) RETURNING ccm_content_id INTO l_ccm_content_id;

 INSERT INTO config_content(cc_config_content_id, cc_content) VALUES 
(l_ccm_content_id, l_config_content);

ELSIF (l_ccm_simple_update_received_at < l_content_update_received_at) THEN

 UPDATE  config_content_metadata
 SET ccm_update_caused_by = l_rp_id, ccm_simple_update_eid = 
l_content_update_eid, ccm_simple_update_received_at = 
l_content_update_received_at, ccm_updated_at = now()
 WHERE ccm_content_id = l_ccm_content_id;

 -- XXX problematic statement XXX
 UPDATE config_content SET cc_content = l_config_content WHERE 
cc_config_content_id = l_ccm_content_id;

END IF;
--

Note that config_content references config_content_metadata with a foreign key;
however, the referenced column is not updated.

That 'upsert_foo_content' is called by another function, 'upsert_foo_content_batch',
in a loop over the elements of a JSON array, something like:

--
CREATE OR REPLACE FUNCTION upsert_foo_content_batch(p_batch jsonb)
RETURNS void LANGUAGE plpgsql
AS $function$
DECLARE
...
BEGIN
FOR item IN SELECT * FROM jsonb_array_elements(p_batch)
LOOP
  -- some unpacking of fields from json into the local variables
  PERFORM upsert_foo_content(..); -- called with the unpacked variables
END LOOP;
END;
$function$
--

'upsert_foo_content_batch' is called, in order, at the end of a long pl/pgsql
function 'upsert_foo_event_batch', which consists of a very long CTE that
extracts individual fields from a JSON argument, and then performs a number of
inserts into some tables, doing on conflict do nothing, afterwards performing
more inserts into the tables that reference the previous ones, doing on
conflict do update. However, it modifies neither 'config_content' or
'config_content_metadata' tables.

So the chain of calls is
'upsert_foo_event_batch' -> 
'upsert_foo_content_batch' -> 
   'upsert_foo_content'.
(the last one contains the statement that leads to the "tuple concurrently 
updated" error).


It is possible that 'upsert_foo_content' function is called with the same data
multiple times in different processes, however, I’d expect it to either
complete successfully, or throw an error because the PK already exists (this is
running in a read committed mode, so ISTM not immune to the case where the row
in the metadata table is inserted after another session does the check, but
before the insert), but not an error mentioned at the beginning of this
message.

Are there any signs in this description that the queries might be doing
something unexpected to PostgreSQL, or that something went wrong during the
split?  I am running out of options of what could cause the issue, so any
pointers or help in debugging it is appreciated (note that this is a production
database, I cannot just stop it at will).

Cheers,
Oleksii



Re: Postgres 11 release notes

2018-05-17 Thread Magnus Hagander
On Thu, May 17, 2018 at 2:56 AM, Michael Paquier 
wrote:

> On Wed, May 16, 2018 at 08:20:49PM -0400, Bruce Momjian wrote:
> > SCRAM-with-binding is the first password method that attempts to avoid
> > man-in-the-middle attacks, and therefore is much less likely to be able
> > to trust what the endpoints supports.  I think it is really the
> > channel_binding_mode that we want to control at the client.  The lesser
> > modes are much more reasonable to use an automatic best-supported
> > negotiation, which is what we do now.
>
> Noted.  Which means that the parameter is ignored when using a non-SSL
> connection, as well as when the server tries to enforce the use of
> anything else than SCRAM.
>

(apologies if this was covered earlier, as I'm entering late into the
discussion)

"ignored" in combination with a security parameter is generally a very very
red flag.

If the client requests channel binding and ends up using a non encrypted
connection, surely the correct thing to do is fail the connection, rather
than downgrade the authentication?

We should really make sure we don't re-implement something as silly as our
current "sslmode=prefer", because it makes no sense. From the client side
perspective, there really only need to be two choices -- "enforce channel
binding at level " or "meh, I don't care". In the "meh, I don't care"
mode, go with whatever the server picks (through enforcement in pg_hba.conf
for example).
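As a sketch of that server-side enforcement: the first pg_hba.conf line below uses existing syntax, while the second uses the hypothetical scram-sha-256-plus method discussed in this thread (not an actual pg_hba.conf option at the time of writing):

```
# existing: require SCRAM authentication over SSL
hostssl  all  all  0.0.0.0/0  scram-sha-256
# hypothetical, per this thread: additionally require channel binding
hostssl  all  all  0.0.0.0/0  scram-sha-256-plus
```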


> FYI, I think the server could also require channel binding for SCRAM. We
> > already have scram-sha-256 in pg_hba.conf, and I think
> > scram-sha-256-plus would be reasonable.
>
> Noted as well.  There is of course the question of v10 libpq versions
> which don't support channel binding, but if an admin is willing to set
> up scram-sha-256-plus in pg_hba.conf then he can request his users to
> update his drivers/libs as well.
>

Yes. And they *should* fail if they don't upgrade. That's what requirement
means... :)


What's the take of others?  Magnus, Stephen or Heikki perhaps (you've
> been the most involved with SCRAM early talks)?
>

Saw it by luck. It would probably be better if it wasn't hidden deep in a
thread about release notes.



-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Fwd: Question about xmloption and pg_restore

2018-05-17 Thread Stefan Fercot
Hi all,


I have some questions about XML OPTION and pg_restore.

Documents are stored in a column of type xml.

When restoring a plain SQL dump, we get this message:

ERROR:  invalid XML content
DETAIL:  line 1: StartTag: invalid element name
http://mrcc.com/qgis.dtd' 'SYSTEM'>
 ^
CONTEXT:  COPY layer_styles, line 1, column styleqml: "http://mrcc.com/qgis.dtd' 'SYSTEM'>


Adding "SET XML OPTION DOCUMENT;" at the top of the dump file allows it to
be restored.

Now, with the "custom" format, we have to run this before pg_restore:
export PGOPTIONS="-c xmloption=DOCUMENT".

Can you think of any other way to solve this issue?

What if both xml options are used in the database?

Would it be possible to have something like: ALTER TABLE ... ALTER
COLUMN ... SET XML OPTION ...;?
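For reference, the two parse modes in question behave like this in a session (standard syntax, shown as a sketch):

```sql
SET XML OPTION DOCUMENT;              -- same as: SET xmloption = document;
SELECT '<qgis><style/></qgis>'::xml;  -- accepted: a single well-formed document
SET XML OPTION CONTENT;               -- the default
SELECT 'text<node/>more'::xml;        -- accepted: an XML content fragment
```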


Kind regards,

-- 
Stefan FERCOT
http://dalibo.com - http://dalibo.org





Re: Postgres 11 release notes

2018-05-17 Thread Michael Paquier
Hi Bruce,

Here is some bonus feedback.

On Fri, May 11, 2018 at 11:08:52AM -0400, Bruce Momjian wrote:
> I expect a torrent of feedback.  ;-)

I have just noticed that this entry does not have the correct author
(guess who?):
   
Add libpq option to support channel binding when using SCRAM
authentication (Peter Eisentraut)
   

I think that there should be two different entries in the release notes
for channel binding:
1) The new connection parameter in libpq, which allows controlling the
channel binding type, as well as deciding whether it should be disabled.
I would think that this is better placed within the section for client
interface changes.
2) Channel binding itself, which should be part of the authentication
section.

   
Have libpq's PQhost()
always return the actual connected host (Hari Babu)
   
Should this be added as well in the section "Client interfaces"?
--
Michael




Re: log_min_messages shows debug instead of debug2

2018-05-17 Thread Robert Haas
On Thu, May 17, 2018 at 4:58 AM, Kyotaro HORIGUCHI
 wrote:
> It's not about me, but without knowing the history, "debug" can
> look as if it is less chatty than "debug2" or even "debug1". All
> "debugN" log levels are shown as DEBUG in log lines. Thus
> showing "debug2" as just "debug" may obscure the cause of
> having so many log lines.
>
> From another viewpoint, printing a value with part of it hidden
> can be seen as not good.
>
> In short, +1 from me for this.

OK, I'm happy enough to commit it then, barring other objections.  I
was going to just do that, but then I realized we're in feature
freeze right now, so I suppose this should go into the next
CommitFest.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-17 Thread Robert Haas
On Wed, May 16, 2018 at 3:20 PM, Robert Haas  wrote:
> On Wed, May 16, 2018 at 2:28 PM, David Rowley
>  wrote:
>> Thanks for committing. Although, I disagree with your tweak:
>>
>> +* 1-based index into the *pds list.
>>
>> I think that's making the same mistake as the last comment did. You
>> think it's 1-based because the index is being set with list_length
>> rather than list_length - 1, but it can do that simply because the
>> item has not been added to the list yet.
>
> Uh, maybe I've got that wrong.  We can say 0-based instead if that's
> right.  I just didn't want to say that in one case it was 0-based and
> in the other case make no mention.

Hang on, I can't be wrong (famous last words).  If the negative
indexes were 0-based, that would mean that the first element of the
list was referenced by -0, which obviously can't be true, because 0 =
-0.  In other words, we can't be using 0-based indexing for both the
positive and the negative values, because then 0 itself would be
ambiguous.  It's got to be that -1 is the first element of the *pds
list, which means -- AFAICS, anyway -- that the way I phrased it is
correct.

Unless the indexing system actually can't reference the first element
of *pds, and -1 means the second element.  But then I think we need a
more verbose explanation here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pg_ugprade test failure on data set with column with default value with type bit/varbit

2018-05-17 Thread Robert Haas
On Thu, May 17, 2018 at 4:20 AM, Paul Guo  wrote:
> Thanks. I tentatively submitted a patch (See the attachment).

You probably want to add this to the next Commitfest.

> By the way, current pg_upgrade test script depends on the left data on test
> database, but it seems that
> a lot of tables are dropped in those test SQL files so this affects the
> pg_upgrade test coverage much.
> Maybe this needs to be addressed in the future. (Maybe when anyone checks in
> test cases pg_upgrade
> needs to be considered also?)

There's been a deliberate attempt to leave a representative selection
of tables not dropped for this exact reason, but it's definitely
possible that interesting cases have been missed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] Use access() to check file existence in GetNewRelFileNode().

2018-05-17 Thread Michael Paquier
On Thu, May 17, 2018 at 05:23:28PM +0800, Paul Guo wrote:
> F_OK seems to be better than R_OK because we want to check file existence
> (not read permission) before creating the relation file with the path
> later.

Please do not top-post, this breaks the discussion logic of the thread.

Perhaps Tom remembers why things have been done this way in 721e537.  In
the end, on second thought, the current coding looks a bit
over-engineered as there is no check for any error codes other than
ENOENT so using only F_OK would be OK.  Note that access cannot complain
about EPERM at least on Linux, so you'd want to actually modify this
comment block.

You should also add this patch to the next commit fest, development of
v11 is done and it is a stabilization period now, so no new patches are
merged.  Here is where you can register the patch:
https://commitfest.postgresql.org/18/
--
Michael




Re: lazy detoasting

2018-05-17 Thread Peter Eisentraut
On 5/12/18 17:25, Tom Lane wrote:
> Anyway, attached is a revised patch.  I found a test case for
> expanded_record_set_fields(), too.

Thank you for fixing this up.

In reviewing the committed patch, I noticed that in ER_get_flat_size()
you have removed the PG_DETOAST_DATUM() call and let
expanded_record_set_field_internal() do the de-toasting work.  I had
considered that too, but my impression is that the purpose of the
PG_DETOAST_DATUM() is to de-compress for the purpose of size
computation, whereas expanded_record_set_field_internal() only does the
inlining of externally stored values and doesn't do any explicit
decompression.  Is this correct?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: postgres_fdw: Oddity in pushing down inherited UPDATE/DELETE joins to remote servers

2018-05-17 Thread Etsuro Fujita

(2018/05/17 14:19), Amit Langote wrote:

Looking at this for a bit, I wondered if this crash wouldn't have occurred
if the "propagation" had also considered join relations in addition to
simple relations.  For example, if I changed inheritance_planner like the
attached (not proposing that we consider committing it), the reported crash
doesn't occur.  The fact that it's not currently that way means that
somebody thought that there is no point in keeping all of those joinrels
around until plan creation time.


One reason for that would be that we use the per-child PlannerInfo, not 
the parent one, at plan creation time.  Here is the code in 
create_modifytable_plan:


        /*
         * In an inherited UPDATE/DELETE, reference the per-child modified
         * subroot while creating Plans from Paths for the child rel.  This is
         * a kluge, but otherwise it's too hard to ensure that Plan creation
         * functions (particularly in FDWs) don't depend on the contents of
         * "root" matching what they saw at Path creation time.  The main
         * downside is that creation functions for Plans that might appear
         * below a ModifyTable cannot expect to modify the contents of "root"
         * and have it "stick" for subsequent processing such as setrefs.c.
         * That's not great, but it seems better than the alternative.
         */
        subplan = create_plan_recurse(subroot, subpath, CP_EXACT_TLIST);

So, we don't need to accumulate the joinrel lists for child relations 
into a single list and store that list into the parent PlannerInfo in 
inheritance_planner, as in the patch you proposed.  I think the change 
by the commit is based on the same idea as that.


Best regards,
Etsuro Fujita



Re: Postgres 11 release notes

2018-05-17 Thread Michael Paquier
On Wed, May 16, 2018 at 09:09:22PM -0400, Bruce Momjian wrote:
> On Thu, May 17, 2018 at 09:56:49AM +0900, Michael Paquier wrote:
>> On Wed, May 16, 2018 at 08:20:49PM -0400, Bruce Momjian wrote:
>>> SCRAM-with-binding is the first password method that attempts to avoid
>>> man-in-the-middle attacks, and therefore is much less likely to be able
>>> to trust what the endpoints support.  I think it is really the
>>> channel_binding_mode that we want to control at the client.  The lesser
>>> modes are much more reasonable to use an automatic best-supported
>>> negotiation, which is what we do now.
>> 
>> Noted.  Which means that the parameter is ignored when using a non-SSL
>> connection, as well as when the server tries to enforce the use of
>> anything else than SCRAM.
> 
> Uh, a man-in-the-middle could prevent SSL or ask for a different
> password authentication method and then channel binding would not be
> used.  I think when you say you want channel binding, you have to fail
> if you don't get it.

I am not exactly sure what is the result we are looking for here, so I
am adding for now an open item which refers to this part of the thread.
Please note that I am fine to spend cycles if needed to address any
issues and/or concerns.  Let the discussion continue for now.
--
Michael




Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-05-17 Thread Etsuro Fujita

(2018/05/16 22:49), Ashutosh Bapat wrote:

On Wed, May 16, 2018 at 4:01 PM, Etsuro Fujita
  wrote:

However, considering that
pull_var_clause is used in many places where partitioning is *not* involved,
I still think it's better to avoid spending extra cycles needed only for
partitioning in that function.


Right. The changes done in inheritance_planner() imply that we can
encounter a ConvertRowtypeExpr even in the expressions for a relation
which is not a child relation in the translated query. It was a child
in the original query but after translating it becomes a parent and
has ConvertRowtypeExpr somewhere there.


Sorry, I don't understand this.  Could you show me such an example?


So, there is no way to know
whether a given expression will have a ConvertRowtypeExpr in it. That's
my understanding. But if we can devise such a test, I am happy to
apply that test before or within the call to pull_var_clause().

We don't need that expensive test if the user has passed
PVC_RECURSE_CONVERTROWTYPE. So we could test that first and then check
the shape of the expression tree. It would cause more asymmetry in
pull_var_clause(), but we can live with it or change the order of
tests for all three options. I will defer that to a committer.


Sounds more invasive.  Considering where we are in the development cycle 
for PG11, I think it'd be better to address this by something like the 
approach I proposed.  Anyway, +1 for asking the committer's opinion.



+   /* Construct the conversion map. */
+   parent_desc = lookup_rowtype_tupdesc(cre->resulttype, 0);

I think it'd be better to pass -1, not 0, as the typmod argument for that
function because that would be consistent with other places where we use
that function with named rowtypes (eg, ruleutils.c).


Done.


Thanks!


-is_subquery_var(Var *node, RelOptInfo *foreignrel, int *relno, int *colno)
+is_subquery_column(Node *node, Index varno, RelOptInfo *foreignrel,
+   int *relno, int *colno)

-1 for that change; because 1) we use "var" for implying many things as in
eg, pull_var_clause and 2) I think it'd make back-patching easy to keep the
name as-is.


I think pull_var_clause is the only place where we do that and I don't
like that very much. I also don't like is_subquery_var() name since
it's too specific. We might want that kind of functionality to ship
generic expressions and not just Vars in future. Usually we would be
searching for columns in the subquery targetlist so the name change
looks good to me. IIRC, the function was added one release back, so
backpatching pain, if any, won't be much. But I don't think we should
carry a misnomer for many releases to come. Let's defer this to a
committer.


OK


Here's set of updated patches.


Thanks for updating the patch set!  I noticed the following.  Sorry for not
having noticed it before.


- On patch 0001-Handle-ConvertRowtypeExprs-in-pull_vars_clause.patch:

+extern bool
+is_converted_whole_row_reference(Node *node)

I think we should remove "extern" from the definition.

- On patch 0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch:

+   /* Construct ROW expression according to the conversion map. */
+   appendStringInfoString(buf, "ROW(");
+   for (cnt = 0; cnt < parent_desc->natts; cnt++)
+   {
+   /* Ignore dropped columns. */
+   if (attrMap[cnt] == 0)
+   continue;
+
+   if (cnt > 0)
+   appendStringInfoString(buf, ", ");
+   deparseColumnRef(buf, child_var->varno, attrMap[cnt],
+planner_rt_fetch(child_var->varno, root),
+qualify_col);
+   }
+   appendStringInfoChar(buf, ')');

The result would start with ", " in the case where the cnt=0 column of 
the parent relation is a dropped one.  Here is an example:


postgres=# create table pt1 (a int, b int) partition by range (b);
CREATE TABLE
postgres=# alter table pt1 drop column a;
ALTER TABLE
postgres=# alter table pt1 add column a int;
ALTER TABLE
postgres=# create table loct11 (like pt1);
CREATE TABLE
postgres=# create foreign table pt1p1 partition of pt1 for values from 
(0) to (100) server loopback options (table_name 'loct11', 
use_remote_estimate 'true');

CREATE FOREIGN TABLE
postgres=# insert into pt1 select i, i from generate_series(0, 99, 2) i;
INSERT 0 50
postgres=# analyze pt1;
ANALYZE
postgres=# analyze pt1p1;
ANALYZE
postgres=# create table pt2 (a int, b int) partition by range (b);
CREATE TABLE
postgres=# create table loct21 (like pt2);
CREATE TABLE
postgres=# create foreign table pt2p1 partition of pt2 for values from 
(0) to (100) server loopback options (table_name 'loct21', 
use_remote_estimate 'true');

CREATE FOREIGN TABLE
postgres=# insert into pt2 select i, i from generate_series(0, 99, 3) i;
INSERT 0 34
postgres=# analyze pt2;
ANALYZE
postgres=# analyze pt2p1;
ANALYZE
postgres=# set enable_partitionwise_join to true;
SET
postgres=# explain verbose select pt1.* from pt1, pt2 where pt1.b = 
pt2.b for update 

Re: Possible bug in logical replication.

2018-05-17 Thread Arseny Sher

Konstantin Knizhnik  writes:

> I think that using restart_lsn instead of confirmed_flush is not right 
> approach.
> If restart_lsn is not available and confirmed_flush is pointing to page
> boundary, then in any case we should somehow handle this case and adjust
> startlsn to point on the valid record position (by jjust adding page header
> size?).

Well, restart_lsn is always available on live slot: it is initially set
in ReplicationSlotReserveWal during slot creation.


--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: overhead of expression evaluation

2018-05-17 Thread Pavel Stehule
2018-05-17 11:32 GMT+02:00 Pavel Stehule :

> Hi
>
> just for curiosity - I tested simple script
>
> postgres=# do
> $$
> declare s bigint = 0;
> begin
>   for i in 1..10 loop
> s := s + i;
>   end loop;
>   raise notice '%', s;
> end;
> $$;
>
> to see the overhead of some components
>
>5,72%  plpgsql.so [.] exec_assign_value
>5,59%  postgres   [.] GetSnapshotData
>4,94%  plpgsql.so [.] exec_eval_simple_expr
>4,84%  postgres   [.] AllocSetAlloc
>3,61%  postgres   [.]
> ResourceOwnerForgetPlanCacheRef
>2,95%  postgres   [.] choose_custom_plan
>2,94%  postgres   [.] GetUserId
>2,92%  postgres   [.] GetCachedPlan
>2,87%  postgres   [.] LWLockAttemptLock
>2,85%  postgres   [.] memset@plt
>2,76%  postgres   [.] LWLockRelease
>2,70%  postgres   [.] AllocSetFree
>2,42%  plpgsql.so [.] assign_simple_var
>2,36%  postgres   [.] ExecInterpExpr
>2,35%  postgres   [.] PopActiveSnapshot
>2,27%  postgres   [.]
> OverrideSearchPathMatchesCurrent
>2,22%  plpgsql.so [.]
> plpgsql_param_eval_var
>2,21%  plpgsql.so [.] exec_stmt
>1,89%  libc-2.27.so   [.]
> __memset_sse2_unaligned_erms
>1,75%  postgres   [.] PushActiveSnapshot
>1,74%  postgres   [.] AcquireExecutorLocks
>1,67%  postgres   [.] CopySnapshot
>1,64%  postgres   [.] CheckCachedPlan
>1,56%  postgres   [.] RevalidateCachedQuery
>1,55%  postgres   [.] MemoryContextAlloc
>1,53%  postgres   [.] pfree
>1,50%  postgres   [.] ResourceArrayRemove
>1,43%  postgres   [.] LWLockAcquire
>1,40%  postgres   [.]
> SPI_plan_get_cached_plan
>
>
maybe we need some new kind of function that doesn't require snapshots,
user info, or locks, and whose plan can be stored directly instead of
going through the plan cache.

It's hard to believe how much has to be done just to add two numbers.

Regards

Pavel


> Regards
>
> Pavel
>


RE: NaNs in numeric_power (was Re: Postgres 11 release notes)

2018-05-17 Thread Huong Dangminh
Hi, 

> From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> David Rowley  writes:
> > On 16 May 2018 at 02:01, Tom Lane  wrote:
> >> I'm not particularly fussed about getting credit for that.  However,
> >> looking again at how that patch series turned out --- ie, that we
> >> ensured POSIX behavior for NaNs only in HEAD --- I wonder whether we
> >> shouldn't do what was mentioned in the commit log for 6bdf1303, and
> >> teach numeric_pow() about these same special cases.
> >> It seems like it would be more consistent to change both functions
> >> for v11, rather than letting that other shoe drop in some future
> >> major release.
> 
> > I'm inclined to agree. It's hard to imagine these two functions
> > behaving differently in regards to NaN input is useful to anyone.
> 

Thank you. The patch looks fine to me.
Also, I have run "make check" in Windows and Linux environments with no
problems.


Thanks and best regards,
---
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/


overhead of expression evaluation

2018-05-17 Thread Pavel Stehule
Hi

just for curiosity - I tested simple script

postgres=# do
$$
declare s bigint = 0;
begin
  for i in 1..10 loop
s := s + i;
  end loop;
  raise notice '%', s;
end;
$$;

to see the overhead of some components

   5,72%  plpgsql.so [.] exec_assign_value
   5,59%  postgres   [.] GetSnapshotData
   4,94%  plpgsql.so [.] exec_eval_simple_expr
   4,84%  postgres   [.] AllocSetAlloc
   3,61%  postgres   [.]
ResourceOwnerForgetPlanCacheRef
   2,95%  postgres   [.] choose_custom_plan
   2,94%  postgres   [.] GetUserId
   2,92%  postgres   [.] GetCachedPlan
   2,87%  postgres   [.] LWLockAttemptLock
   2,85%  postgres   [.] memset@plt
   2,76%  postgres   [.] LWLockRelease
   2,70%  postgres   [.] AllocSetFree
   2,42%  plpgsql.so [.] assign_simple_var
   2,36%  postgres   [.] ExecInterpExpr
   2,35%  postgres   [.] PopActiveSnapshot
   2,27%  postgres   [.]
OverrideSearchPathMatchesCurrent
   2,22%  plpgsql.so [.] plpgsql_param_eval_var
   2,21%  plpgsql.so [.] exec_stmt
   1,89%  libc-2.27.so   [.]
__memset_sse2_unaligned_erms
   1,75%  postgres   [.] PushActiveSnapshot
   1,74%  postgres   [.] AcquireExecutorLocks
   1,67%  postgres   [.] CopySnapshot
   1,64%  postgres   [.] CheckCachedPlan
   1,56%  postgres   [.] RevalidateCachedQuery
   1,55%  postgres   [.] MemoryContextAlloc
   1,53%  postgres   [.] pfree
   1,50%  postgres   [.] ResourceArrayRemove
   1,43%  postgres   [.] LWLockAcquire
   1,40%  postgres   [.]
SPI_plan_get_cached_plan

Regards

Pavel


Re: Possible bug in logical replication.

2018-05-17 Thread Konstantin Knizhnik



On 17.05.2018 10:45, Konstantin Knizhnik wrote:
We got the following assertion failure on our buildfarm, on the master
branch of Postgres, in the contrib/test_decoding regression tests:


2018-05-07 19:50:07.241 MSK [5af083bf.54ae:49] DETAIL:  Streaming transactions 
committing after 0/2A0, reading WAL from 0/29FFF1C.
2018-05-07 19:50:07.241 MSK [5af083bf.54ae:50] STATEMENT:  SELECT end_lsn FROM 
pg_replication_slot_advance('regression_slot1', '0/2A00174')
TRAP: FailedAssertion("!(((RecPtr) % 8192 >= (((uintptr_t) ((sizeof(XLogPageHeaderData))) + ((4) - 
1)) & ~((uintptr_t) ((4) - 1)", File: "xlogreader.c", Line: 241)


Stack trace is the following:

$ gdb -x ./gdbcmd --batch 
pgsql.build/tmp_install/home/buildfarm/build-farm/CORE-353-stable-func/inst/bin/postgres
 pgsql.build/contrib/test_decoding/tmp_check/data/core
[New LWP 21678]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: buildfarm regression [local] SELECT
 '.
Program terminated with signal SIGABRT, Aborted.
#0  0xf7722c89 in __kernel_vsyscall ()
#0  0xf7722c89 in __kernel_vsyscall ()
#1  0xf6b5ddd0 in __libc_signal_restore_set (set=0xffaf2240) at 
../sysdeps/unix/sysv/linux/nptl-signals.h:79
#2  __GI_raise (sig=6) at ../sysdeps/unix/sysv/linux/raise.c:48
#3  0xf6b5f297 in __GI_abort () at abort.c:89
#4  0x56b3931a in ExceptionalCondition (conditionName=0x56bd0c38 "!(((RecPtr) % 8192 >= (((uintptr_t) 
((sizeof(XLogPageHeaderData))) + ((4) - 1)) & ~((uintptr_t) ((4) - 1)", errorType=0x56b8bf61 
"FailedAssertion", fileName=0x56bd0df0 "xlogreader.c", lineNumber=241) at assert.c:54
#5  0x5678c573 in XLogReadRecord (state=0x57628c84, RecPtr=44040192, 
errormsg=0xffaf2560) at xlogreader.c:241
#6  0x569c3191 in pg_logical_replication_slot_advance (startlsn=, moveto=44040564) at slotfuncs.c:370
#7  0x569c3c8e in pg_replication_slot_advance (fcinfo=0xffaf2708) at 
slotfuncs.c:487
#8  0x568a69c0 in ExecMakeTableFunctionResult (setexpr=0x57626e30, 
econtext=0x57626d88, argContext=0x57620b48, expectedDesc=0x57627e44, 
randomAccess=false) at execSRF.c:231
#9  0x568b41d3 in FunctionNext (node=0x57626cfc) at nodeFunctionscan.c:94
#10 0x568a5ce2 in ExecScanFetch (recheckMtd=0x568b3ec0 , 
accessMtd=0x568b3ed0 , node=0x57626cfc) at execScan.c:95
#11 ExecScan (node=0x57626cfc, accessMtd=0x568b3ed0 , 
recheckMtd=0x568b3ec0 ) at execScan.c:162
#12 0x568b4243 in ExecFunctionScan (pstate=0x57626cfc) at nodeFunctionscan.c:270
#13 0x5689caba in ExecProcNode (node=0x57626cfc) at 
../../../src/include/executor/executor.h:238
#14 ExecutePlan (execute_once=, dest=0x0, direction=NoMovementScanDirection, 
numberTuples=, sendTuples=, operation=CMD_SELECT, 
use_parallel_mode=, planstate=0x57626cfc, estate=0x57626bf0) at execMain.c:1731
#15 standard_ExecutorRun (queryDesc=0x5760f248, direction=ForwardScanDirection, 
count=0, execute_once=true) at execMain.c:368
#16 0x56a132cd in PortalRunSelect (portal=portal@entry=0x575c4ea8, 
forward=forward@entry=true, count=0, count@entry=2147483647, dest=0x576232d4) 
at pquery.c:932
#17 0x56a14b00 in PortalRun (portal=0x575c4ea8, count=2147483647, isTopLevel=true, 
run_once=true, dest=0x576232d4, altdest=0x576232d4, completionTag=0xffaf2c40 
"") at pquery.c:773
#18 0x56a0fbb7 in exec_simple_query (query_string=query_string@entry=0x57579070 
"SELECT end_lsn FROM pg_replication_slot_advance('regression_slot1', '0/2A00174') 
") at postgres.c:1122
#19 0x56a11a8e in PostgresMain (argc=1, argv=0x575a0b8c, dbname=, 
username=0x575a09f0 "buildfarm") at postgres.c:4153
#20 0x566cd9cb in BackendRun (port=0x5759a358) at postmaster.c:4361
#21 BackendStartup (port=0x5759a358) at postmaster.c:4033
#22 ServerLoop () at postmaster.c:1706
#23 0x5698a608 in PostmasterMain (argc=, argv=) 
at postmaster.c:1379
#24 0x566cf642 in main (argc=, argv=) at 
main.c:228



As you can see, the assertion failure happens because the specified startlsn
(0x2a0) is not considered valid.
This LSN is taken from slot's confirmed flush LSN in 
pg_replication_slot_advance:
startlsn = MyReplicationSlot->data.confirmed_flush;


Unfortunately I was not able to reproduce the problem even after repeating this
regression test 1000 times, so it seems to be a very hard to reproduce,
non-deterministic bug.
I wonder if there is a guarantee that confirmed_flush always points to the start
of a record.
Inspecting the xlogreader.c code shows that confirmed_flush is, for example,
assigned EndRecPtr in DecodingContextFindStartpoint.
EndRecPtr is updated in XLogReadRecord, and if the record doesn't cross a page
boundary, the following formula is used:

state->EndRecPtr = RecPtr + MAXALIGN(total_len);

And if a record ends at a page boundary, it looks like EndRecPtr can point to
the page boundary.
It is confirmed by the following comment in XLogReadRecord function:

/*
 * RecPtr is pointing to end+1 

Re: [PATCH] Use access() to check file existence in GetNewRelFileNode().

2018-05-17 Thread Paul Guo
F_OK seems to be better than R_OK because we want to check file existence
(not read permission) before creating the relation file with the path
later.

2018-05-17 17:09 GMT+08:00 Michael Paquier :

> On Thu, May 17, 2018 at 04:09:27PM +0800, Paul Guo wrote:
> > Previous code uses BasicOpenFile() + close().
> >
> > access() should be faster than BasicOpenFile()+close() and access()
> > should be more correct since BasicOpenFile() could fail for various
> > cases (e.g. due to file permission, etc) even the file exists.
>
> Failing because of file permissions would be correct.  There have been
> cases in the past, particularly on Windows, where anti-virus softwares
> wildly scan files, causing EACCES on various points of the data folder.
>
> > access() is supported on Linux/Unix. I do not have a Windows dev
> > environment, but MSDN tells me that access() is supported on Windows also
> > and there have been access() call in the workspace, so I assume there is
> no
> > portability issue.
>
> Yes, access() is spread already in the core code.
>
> -fd = BasicOpenFile(rpath, O_RDONLY | PG_BINARY);
>
> -if (fd >= 0)
> +if (access(rpath, F_OK) == 0)
>
> What you are looking for here is R_OK, no?
> --
> Michael
>


Re: [PATCH] Use access() to check file existence in GetNewRelFileNode().

2018-05-17 Thread Michael Paquier
On Thu, May 17, 2018 at 04:09:27PM +0800, Paul Guo wrote:
> Previous code uses BasicOpenFile() + close().
> 
> access() should be faster than BasicOpenFile()+close() and access()
> should be more correct since BasicOpenFile() could fail for various
> cases (e.g. due to file permission, etc) even the file exists.

Failing because of file permissions would be correct.  There have been
cases in the past, particularly on Windows, where anti-virus softwares
wildly scan files, causing EACCES on various points of the data folder.

> access() is supported on Linux/Unix. I do not have a Windows dev
> environment, but MSDN tells me that access() is supported on Windows also
> and there have been access() call in the workspace, so I assume there is no
> portability issue.

Yes, access() is spread already in the core code.

-fd = BasicOpenFile(rpath, O_RDONLY | PG_BINARY);
 
-if (fd >= 0)
+if (access(rpath, F_OK) == 0)

What you are looking for here is R_OK, no?
--
Michael




Re: log_min_messages shows debug instead of debug2

2018-05-17 Thread Kyotaro HORIGUCHI
At Wed, 16 May 2018 12:37:48 -0400, Robert Haas  wrote 
in 
> On Wed, May 16, 2018 at 11:28 AM, Tom Lane  wrote:
> > Robert Haas  writes:
> >> On Tue, May 15, 2018 at 4:46 AM, Ideriha, Takeshi
> >>  wrote:
> >>> I noticed that if log_min_messages is set to ‘debug2’, it shows ‘debug’
> >>> instead of debug2.
> >
> >> Seems worth changing to me.  Anyone else think differently?
> >
> > I think the current behavior was intentional at some point, probably
> > with the idea that if you put in "debug" it should come out as "debug".
> 
> Hmm, that's an angle I hadn't considered.
> 
> > This patch just moves the discrepancy.  Still, it's more precise to
> > print debug2, so I'm okay with changing.
> 
> OK.  Let's see if anyone else has an opinion.

It's not about me, but without knowing the history, "debug" can
look as if it is less chatty than "debug2" or even "debug1". All
"debugN" log levels appear as DEBUG in log lines, so
showing "debug2" as just "debug" may obfuscate the cause of
having so many log lines.

From another viewpoint, it can be seen as not good to print
a value with part of it hidden.

In short, +1 for this from me.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


Re: inconsistency and inefficiency in setup_conversion()

2018-05-17 Thread Thomas Munro
On Thu, May 3, 2018 at 12:19 AM, John Naylor  wrote:
> Attached is a draft patch to do this, along with the conversion script
> used to create the entries. In writing this, a few points came up that
> are worth bringing up:

Hi John,

This failed for my patch testing robot on Windows, with this message[1]:

Generating pg_config_paths.h...
No include path; you must specify -I.
Could not open src/backend/catalog/schemapg.h at
src/tools/msvc/Mkvcbuild.pm line 832.

I see that you changed src/backend/catalog/Makefile to pass the new -I
switch to genbki.pl.  I think for Windows you might need to add it to
the line in src/tools/msvc/Solution.pm that invokes genbki.pl via
system()?

[1] https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.48

-- 
Thomas Munro
http://www.enterprisedb.com



Re: pg_ugprade test failure on data set with column with default value with type bit/varbit

2018-05-17 Thread Paul Guo
Thanks. I tentatively submitted a patch (See the attachment).

By the way, the current pg_upgrade test script depends on the data left in the
test database, but it seems that a lot of tables are dropped in those test SQL
files, so this reduces pg_upgrade test coverage quite a bit.
Maybe this needs to be addressed in the future. (Maybe pg_upgrade needs to be
considered as well whenever anyone checks in test cases?)


2018-05-11 3:08 GMT+08:00 Robert Haas :

> On Fri, Mar 30, 2018 at 5:36 AM, Paul Guo  wrote:
> > There is no diff in functionality of the dump SQLs, but it is annoying.
> The
> > simple patch below could fix this. Thanks.
> >
> > --- a/src/backend/utils/adt/ruleutils.c
> > +++ b/src/backend/utils/adt/ruleutils.c
> > @@ -9389,7 +9389,7 @@ get_const_expr(Const *constval, deparse_context
> > *context, int showtype)
> >
> > case BITOID:
> > case VARBITOID:
> > -   appendStringInfo(buf, "B'%s'", extval);
> > +   appendStringInfo(buf, "'%s'", extval);
> > break;
> >
> > case BOOLOID:
>
> My first reaction was to guess that this would be unsafe, but looking
> it over I don't see a problem.  For the other types handled in that
> switch statement, we rely on the custom representation to avoid
> needing a typecast, but it seems that for BITOID and VARBITOID we
> insert a typecast no matter what.  So maybe the presence or absence of
> the "B" makes no difference.
>
> This logic seems to have been added by commit
> c828ec88205a232a9789f157d8cf9c3d82f85152, Peter Eisentraut, vintage
> 2002.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


0001-Fix-pg_upgrade-test-failure-caused-by-the-DDL-below.patch
Description: Binary data


Infinite loop on master shutdown

2018-05-17 Thread Kyotaro HORIGUCHI
Hello, as reported on the pgsql-bugs ML.

https://www.postgresql.org/message-id/20180517.170021.24356216.horiguchi.kyot...@lab.ntt.co.jp

The master can go into an infinite loop on shutdown, but it is caused by
a broken database, e.g. one whose storage was rolled back. (The steps to
reproduce this are shown in the above mail.)

I think this can be avoided by rejecting a standby if it reports
that write LSN is smaller than flush LSN after catching up.

Is it worth fixing?

# The patch is slightly different from the one I posted to -bugs.

It would be enough to check for the invalid state just once, but the
patch keeps checking.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index e47ddca6bc..1916acf629 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1809,6 +1809,19 @@ ProcessStandbyReplyMessage(void)
 	if (replyRequested)
 		WalSndKeepalive(false);
 
+	/*
+	 * Once this stream catches up to WAL, writePtr cannot be smaller than
+	 * flushPtr. Otherwise we haven't reached the standby's starting LSN thus
+	 * this database cannot be a proper master of the standby. The state
+	 * causes infinite loop on shutdown.
+	 */
+	if (MyWalSnd->state >= WALSNDSTATE_CATCHUP &&
+		writePtr != InvalidXLogRecPtr && writePtr < flushPtr)
+		ereport(ERROR,
+(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Standby started from the future LSN for this server"),
+ errhint("This means that the standby is not created from this database.")));
+
 	/*
 	 * Update shared state for this WalSender process based on reply data from
 	 * standby.


Re: Removing unneeded self joins

2018-05-17 Thread Konstantin Knizhnik



On 17.05.2018 05:19, Andres Freund wrote:

On 2018-05-16 22:11:22 -0400, Tom Lane wrote:

David Rowley  writes:

On 17 May 2018 at 11:00, Andres Freund  wrote:

Wonder if we shouldn't just cache an estimated relation size in the
relcache entry till then. For planning purposes we don't need to be
accurate, and usually activity that drastically expands relation size
will trigger relcache activity before long. Currently there's plenty
workloads where the lseeks(SEEK_END) show up pretty prominently.

While I'm in favour of speeding that up, I think we'd get complaints
if we used a stale value.

Yeah, that scares me too.  We'd then be in a situation where (arguably)
any relation extension should force a relcache inval.  Not good.
I do not buy Andres' argument that the value is noncritical, either ---
particularly during initial population of a table, where the size could
go from zero to something-significant before autoanalyze gets around
to noticing.

I don't think every extension needs to force a relcache inval. It'd
instead be perfectly reasonable to define a rule that an inval is
triggered whenever crossing a 10% relation size boundary. Which'll lead
to invalidations for the first few pages, but much less frequently
later.
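
The boundary-crossing rule could be sketched roughly as follows (a
minimal illustration of the idea, with hypothetical names - this is
not PostgreSQL source):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: only signal an invalidation when the relation
 * grows past the next 10% boundary above the size we last advertised. */
typedef struct
{
	uint64_t	advertised_blocks;	/* size last pushed to the relcache */
} RelSizeState;

static bool
size_change_needs_inval(RelSizeState *st, uint64_t actual_blocks)
{
	/* Next boundary: 10% above the advertised size (at least +1 block,
	 * so the first few pages still trigger invalidations). */
	uint64_t	step = st->advertised_blocks / 10;
	uint64_t	threshold = st->advertised_blocks + (step > 0 ? step : 1);

	if (actual_blocks >= threshold)
	{
		st->advertised_blocks = actual_blocks;
		return true;			/* caller would fire a relcache inval */
	}
	return false;
}
```

With this rule a table growing from 1000 to 1099 blocks would not
invalidate anything, while growth to 1100 blocks (the 10% boundary)
would.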



I'm a bit skeptical of the idea of maintaining an accurate relation
size in shared memory, too.  AIUI, a lot of the problem we see with
lseek(SEEK_END) has to do with contention inside the kernel for access
to the single-point-of-truth where the file's size is kept.  Keeping
our own copy would eliminate kernel-call overhead, which can't hurt,
but it won't improve the contention angle.

A syscall is several hundred instructions. An unlocked read - which'll
be sufficient in many cases, given that the value can quickly be out
of date anyway - is a few cycles. Even with a barrier you're talking a
few dozen cycles.  So I can't see how it'd not improve the contention.

But the main reason for keeping it in shmem is less the lseek avoidance
- although that's nice, context switches aren't great - but to make
relation extension need far less locking.

Greetings,

Andres Freund



I completely agree with Andres. In my multithreaded Postgres prototype
the file descriptor cache (shared by all threads) becomes a bottleneck
exactly because every query execution requires access to the file
system (lseek) to provide the optimizer's estimate of the relation
size, despite the fact that the whole database fits in memory.
Well, this is certainly specific to the shared descriptor pool in my
prototype, but the fact that we have to perform lseek at each query
compilation seems to be annoying in any case.


And there is really no problem if the cached relation size estimate is
not precise.  It can be invalidated whenever the relation size changes
by more than some threshold value (1MB?) or when the lease time for the
cached value expires.
It may be reasonable to implement a specific invalidation for the
relation size estimate, to avoid complete invalidation and
reconstruction of the relation descriptor and all dependent objects.
In this case time-based invalidation seems to be the easiest choice to
implement. Repeating the lseek every 1 or 10 seconds should have no
noticeable impact on performance, and the relation size cannot change
dramatically during that time.
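
The time-based variant could be sketched like this (a hypothetical
illustration with made-up names, not PostgreSQL source; the probe
function stands in for the real lseek(SEEK_END)):

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define SIZE_LEASE_SECONDS 10

typedef struct
{
	uint64_t	nblocks;		/* cached relation size, in blocks */
	time_t		fetched_at;		/* when the size was last fetched */
} RelSizeCache;

/* Stand-in for the real lseek(SEEK_END) probe; counts how often the
 * "expensive" path is actually taken. */
static uint64_t probe_calls;

static uint64_t
probe_relation_size(void)
{
	probe_calls++;
	return 100;
}

/* Return the cached size, refreshing it only when the lease expired. */
static uint64_t
cached_relation_size(RelSizeCache *cache)
{
	time_t		now = time(NULL);

	if (cache->fetched_at == 0 || now - cache->fetched_at >= SIZE_LEASE_SECONDS)
	{
		cache->nblocks = probe_relation_size();
		cache->fetched_at = now;
	}
	return cache->nblocks;
}
```

Within the lease interval, repeated size lookups never touch the file
system at all.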




--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




[PATCH] Use access() to check file existence in GetNewRelFileNode().

2018-05-17 Thread Paul Guo
The previous code uses BasicOpenFile() + close().

access() should be faster than BasicOpenFile()+close(), and it should
also be more correct, since BasicOpenFile() can fail for various
reasons (e.g. due to file permissions) even when the file exists.

access() is supported on Linux/Unix. I do not have a Windows dev
environment, but MSDN tells me that access() is supported on Windows as
well, and there are already access() calls in the codebase, so I assume
there is no portability issue.
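
The distinction can be sketched as follows (hypothetical helper names,
not the patch itself): access(F_OK) asks only "does this path exist?",
so it succeeds even where open() would fail on permissions.

```c
#include <assert.h>
#include <stdbool.h>
#include <fcntl.h>
#include <unistd.h>

/* Existence probe via access(): no file descriptor is created, and a
 * permission problem does not produce a false "does not exist". */
static bool
file_exists_access(const char *path)
{
	return access(path, F_OK) == 0;
}

/* The previous style: open the file just to learn whether it exists. */
static bool
file_exists_open(const char *path)
{
	int			fd = open(path, O_RDONLY);

	if (fd >= 0)
	{
		close(fd);
		return true;
	}
	return false;
}
```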

Thanks.


0001-Use-access-to-check-file-existence-in-GetNewRelFileN.patch
Description: Binary data


Possible bug in logical replication.

2018-05-17 Thread Konstantin Knizhnik
We got the following assertion failure on our buildfarm, running the
master branch of Postgres, in the contrib/test_decoding regression tests:


2018-05-07 19:50:07.241 MSK [5af083bf.54ae:49] DETAIL:  Streaming transactions 
committing after 0/2A0, reading WAL from 0/29FFF1C.
2018-05-07 19:50:07.241 MSK [5af083bf.54ae:50] STATEMENT:  SELECT end_lsn FROM 
pg_replication_slot_advance('regression_slot1', '0/2A00174')
TRAP: FailedAssertion("!(((RecPtr) % 8192 >= (((uintptr_t) ((sizeof(XLogPageHeaderData))) + ((4) - 
1)) & ~((uintptr_t) ((4) - 1)", File: "xlogreader.c", Line: 241)


Stack trace is the following:

$ gdb -x ./gdbcmd --batch 
pgsql.build/tmp_install/home/buildfarm/build-farm/CORE-353-stable-func/inst/bin/postgres
 pgsql.build/contrib/test_decoding/tmp_check/data/core
[New LWP 21678]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: buildfarm regression [local] SELECT
 '.
Program terminated with signal SIGABRT, Aborted.
#0  0xf7722c89 in __kernel_vsyscall ()
#0  0xf7722c89 in __kernel_vsyscall ()
#1  0xf6b5ddd0 in __libc_signal_restore_set (set=0xffaf2240) at 
../sysdeps/unix/sysv/linux/nptl-signals.h:79
#2  __GI_raise (sig=6) at ../sysdeps/unix/sysv/linux/raise.c:48
#3  0xf6b5f297 in __GI_abort () at abort.c:89
#4  0x56b3931a in ExceptionalCondition (conditionName=0x56bd0c38 "!(((RecPtr) % 8192 >= (((uintptr_t) 
((sizeof(XLogPageHeaderData))) + ((4) - 1)) & ~((uintptr_t) ((4) - 1)", errorType=0x56b8bf61 
"FailedAssertion", fileName=0x56bd0df0 "xlogreader.c", lineNumber=241) at assert.c:54
#5  0x5678c573 in XLogReadRecord (state=0x57628c84, RecPtr=44040192, 
errormsg=0xffaf2560) at xlogreader.c:241
#6  0x569c3191 in pg_logical_replication_slot_advance (startlsn=, moveto=44040564) at slotfuncs.c:370
#7  0x569c3c8e in pg_replication_slot_advance (fcinfo=0xffaf2708) at 
slotfuncs.c:487
#8  0x568a69c0 in ExecMakeTableFunctionResult (setexpr=0x57626e30, 
econtext=0x57626d88, argContext=0x57620b48, expectedDesc=0x57627e44, 
randomAccess=false) at execSRF.c:231
#9  0x568b41d3 in FunctionNext (node=0x57626cfc) at nodeFunctionscan.c:94
#10 0x568a5ce2 in ExecScanFetch (recheckMtd=0x568b3ec0 , 
accessMtd=0x568b3ed0 , node=0x57626cfc) at execScan.c:95
#11 ExecScan (node=0x57626cfc, accessMtd=0x568b3ed0 , 
recheckMtd=0x568b3ec0 ) at execScan.c:162
#12 0x568b4243 in ExecFunctionScan (pstate=0x57626cfc) at nodeFunctionscan.c:270
#13 0x5689caba in ExecProcNode (node=0x57626cfc) at 
../../../src/include/executor/executor.h:238
#14 ExecutePlan (execute_once=, dest=0x0, direction=NoMovementScanDirection, 
numberTuples=, sendTuples=, operation=CMD_SELECT, 
use_parallel_mode=, planstate=0x57626cfc, estate=0x57626bf0) at execMain.c:1731
#15 standard_ExecutorRun (queryDesc=0x5760f248, direction=ForwardScanDirection, 
count=0, execute_once=true) at execMain.c:368
#16 0x56a132cd in PortalRunSelect (portal=portal@entry=0x575c4ea8, 
forward=forward@entry=true, count=0, count@entry=2147483647, dest=0x576232d4) 
at pquery.c:932
#17 0x56a14b00 in PortalRun (portal=0x575c4ea8, count=2147483647, isTopLevel=true, 
run_once=true, dest=0x576232d4, altdest=0x576232d4, completionTag=0xffaf2c40 
"") at pquery.c:773
#18 0x56a0fbb7 in exec_simple_query (query_string=query_string@entry=0x57579070 
"SELECT end_lsn FROM pg_replication_slot_advance('regression_slot1', '0/2A00174') 
") at postgres.c:1122
#19 0x56a11a8e in PostgresMain (argc=1, argv=0x575a0b8c, dbname=, 
username=0x575a09f0 "buildfarm") at postgres.c:4153
#20 0x566cd9cb in BackendRun (port=0x5759a358) at postmaster.c:4361
#21 BackendStartup (port=0x5759a358) at postmaster.c:4033
#22 ServerLoop () at postmaster.c:1706
#23 0x5698a608 in PostmasterMain (argc=, argv=) 
at postmaster.c:1379
#24 0x566cf642 in main (argc=, argv=) at 
main.c:228



As you can see, assertion failure happen because specified startlsn (0x2a0) 
is not considered to be valid.
This LSN is taken from slot's confirmed flush LSN in 
pg_replication_slot_advance:

startlsn = MyReplicationSlot->data.confirmed_flush;


Unfortunately I was not able to reproduce the problem even after repeating
these regression tests 1000 times, so it seems to be a very hard to
reproduce, non-deterministic bug.
I wonder if there is a guarantee that confirmed_flush always points to the
start of a record.
Inspecting the xlogreader.c code shows that confirmed_flush is, for
example, assigned EndRecPtr in DecodingContextFindStartpoint.
And EndRecPtr is updated in XLogReadRecord; if the record doesn't cross a
page boundary, then the following formula is used:

state->EndRecPtr = RecPtr + MAXALIGN(total_len);

And if a record ends exactly at a page boundary, then it looks like
EndRecPtr can point to the page boundary.
This is confirmed by the following comment in the XLogReadRecord function:

/*
 * RecPtr is pointing to end+1 of the previous WAL record.  If 
we're
 

Re: Problem while updating a foreign table pointing to a partitioned table on foreign server

2018-05-17 Thread Ashutosh Bapat
On Wed, May 16, 2018 at 11:31 PM, Robert Haas  wrote:
> On Mon, Apr 16, 2018 at 7:35 AM, Ashutosh Bapat
>  wrote:
>> It does fix the problem. But the patch as is interferes with the way
>> we handle tableoid currently. That can be seen from the regression
>> diffs that the patch causes.  Right now, every tableoid reference gets
>> converted into the tableoid of the foreign table (and not the tableoid
>> of the table on the foreign server). Somehow we need to differentiate
>> between the tableoid injected for DML and tableoid references added by
>> the user in the original query, and then use the tableoid on the
>> foreign server for the first and the local foreign table's oid for the
>> second. Right now, I don't see a simple way to do that.
>
> I think that the place to start would be to change this code to use
> something other than TableOidAttributeNumber:
>
> +   var = makeVar(parsetree->resultRelation,
> + TableOidAttributeNumber,
> + OIDOID,
> + -1,
> + InvalidOid,
> + 0);
>
> Note that rewriteTargetListUD, which calls AddForeignUpdateTargets,
> also contingently adds a "wholerow" attribute which ExecModifyTable()
> is able to fish out later.  It seems like it should be possible to add
> a "remotetableoid" column that works similarly, although I'm not
> exactly sure what would be involved.

As of today, all the attributes added by the AddForeignUpdateTargets
hook of postgres_fdw are recognised by PostgreSQL. But remotetableoid
is not a recognised attribute. In order to use it, we either have to
define a new system attribute "remotetableoid" or add a user-defined
attribute "remotetableoid" to every foreign table. The first option
would be very specific to postgres_fdw and other FDWs won't be able to
use it. The second would mean that SELECT * from a foreign table
reports remotetableoid as well, which is awkward. Horiguchi-san and I
discussed those ideas in this mail thread.

Anyway, my comment to which you have replied is obsolete now. I found
a solution to that problem, which I have implemented in 0003 in the
latest patch-set I have shared.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company