Re: [HACKERS] More stable query plans via more predictable column statistics

2016-04-04 Thread Shulgin, Oleksandr
On Apr 5, 2016 00:31, "Tom Lane"  wrote:
>
> Alex Shulgin  writes:
> > On Mon, Apr 4, 2016 at 1:06 AM, Tom Lane  wrote:
> >> I'm inclined to
> >> revert the aspect of 3d3bf62f3 that made us work from "d" (the observed
> >> number of distinct values in the sample) rather than stadistinct (the
> >> extrapolated estimate for the table).  On reflection I think that that's
> >> inconsistent with the theory behind the old MCV-cutoff rule.  It wouldn't
> >> matter if we were going to replace the cutoff rule with something else,
> >> but it's beginning to sound like that won't happen for 9.6.
>
> > Please feel free to do what you think is in the best interest of the
> > people preparing 9.6 for the freeze.  I'm not all that familiar with
> > the process, but I guess reverting this early might save some
> > head-scratching if some interesting interactions of this change
> > combined with some others are going to show up.
>
> I've reverted that bit; so we still have the improvements associated with
> ignoring nulls, but nothing else at the moment.  I'll set this commitfest
> item back to Waiting on Author, just in case you are able to make some
> more progress before the end of the week.

OK, though it's unlikely that I'll get productive again before next week.
Maybe someone who has also been following this thread wants to step in?

Thanks.
--
Alex


Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.

2016-04-04 Thread Amit Langote
On 2016/04/05 0:23, Tom Lane wrote:
> Amit Langote  writes:
>> On Mon, Apr 4, 2016 at 11:24 PM, Tom Lane  wrote:
>>> A related issue, now that I've seen this example, is that altering
>>> FDW-level or server-level options won't cause a replan either.  I'm
>>> not sure there's any very good fix for that.  Surely we don't want
>>> to try to identify all tables belonging to the FDW or server and
>>> issue relcache invals on all of them.
> 
>> Hm, some kind of PlanInvalItem-based solution could work maybe?
> 
> Hm, so we'd expect that whenever an FDW consulted the options while
> making a plan, it'd have to record a plan dependency on them?  That
> would be a clean fix maybe, but I'm worried that third-party FDWs
> would fail to get the word about needing to do this.

I would imagine that that level of granularity may be a little too much; I
mean tracking dependencies at the level of individual FDW/foreign
table/foreign server options.  I think it should really be possible to do
the entire thing in core instead of making this a concern of FDW authors.
How about the attached, which teaches extract_query_dependencies() to add
a foreign table and the associated foreign data wrapper and foreign server
to invalItems?  It also adds plan cache callbacks for the respective
caches.

One thing that I observed, which may not be all that surprising, is that we
may need a similar mechanism for postgres_fdw's connection cache, which
doesn't drop connections using the older server connection info after I
alter it.  I was trying to test my patch by altering the dbname option of a
foreign server, but that was silly ;).  However, I did confirm that the
patch works by altering the use_remote_estimate server option.  I could not
really test for FDW options though.

Thoughts?

Thanks,
Amit
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index dd2b9ed..e7ddc14 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -16,7 +16,9 @@
 #include "postgres.h"
 
 #include "access/transam.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_type.h"
+#include "foreign/foreign.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/pathnode.h"
@@ -150,6 +152,12 @@ static List *set_returning_clause_references(PlannerInfo *root,
 static bool fix_opfuncids_walker(Node *node, void *context);
 static bool extract_query_dependencies_walker(Node *node,
   PlannerInfo *context);
+static void record_plan_foreign_table_dependency(PlannerInfo *root,
+  Oid ftid);
+static void record_plan_foreign_data_wrapper_dependency(PlannerInfo *root,
+  Oid fdwid);
+static void record_plan_foreign_server_dependency(PlannerInfo *root,
+  Oid serverid);
 
 /*
  *
@@ -2652,6 +2660,54 @@ record_plan_function_dependency(PlannerInfo *root, Oid funcid)
 }
 
 /*
+ * record_plan_foreign_table_dependency
+ *		Mark the current plan as depending on a particular foreign table.
+ */
+static void
+record_plan_foreign_table_dependency(PlannerInfo *root, Oid ftid)
+{
+	PlanInvalItem *inval_item = makeNode(PlanInvalItem);
+
+	inval_item->cacheId = FOREIGNTABLEREL;
+	inval_item->hashValue = GetSysCacheHashValue1(FOREIGNTABLEREL,
+   ObjectIdGetDatum(ftid));
+
+	root->glob->invalItems = lappend(root->glob->invalItems, inval_item);
+}
+
+/*
+ * record_plan_foreign_data_wrapper_dependency
+ *		Mark the current plan as depending on a particular FDW.
+ */
+static void
+record_plan_foreign_data_wrapper_dependency(PlannerInfo *root, Oid fdwid)
+{
+	PlanInvalItem *inval_item = makeNode(PlanInvalItem);
+
+	inval_item->cacheId = FOREIGNDATAWRAPPEROID;
+	inval_item->hashValue = GetSysCacheHashValue1(FOREIGNDATAWRAPPEROID,
+   ObjectIdGetDatum(fdwid));
+
+	root->glob->invalItems = lappend(root->glob->invalItems, inval_item);
+}
+
+/*
+ * record_plan_foreign_server_dependency
+ *		Mark the current plan as depending on a particular foreign server.
+ */
+static void
+record_plan_foreign_server_dependency(PlannerInfo *root, Oid serverid)
+{
+	PlanInvalItem *inval_item = makeNode(PlanInvalItem);
+
+	inval_item->cacheId = FOREIGNSERVEROID;
+	inval_item->hashValue = GetSysCacheHashValue1(FOREIGNSERVEROID,
+   ObjectIdGetDatum(serverid));
+
+	root->glob->invalItems = lappend(root->glob->invalItems, inval_item);
+}
+
+/*
  * extract_query_dependencies
  *		Given a not-yet-planned query or queries (i.e. a Query node or list
  *		of Query nodes), extract dependencies just as set_plan_references
@@ -2723,6 +2779,22 @@ extract_query_dependencies_walker(Node *node, PlannerInfo *context)
 			if (rte->rtekind == RTE_RELATION)
 				context->glob->relationOids =
 					lappend_oid(context->glob->relationOids, rte->relid);
+
+			if (rte->relkind == RELKIND_FOREIGN_TABLE)
+			{
+				ForeignTable   *ftable;
+				ForeignServer  *fserver;

Re: [HACKERS] [PATCH v12] GSSAPI encryption support

2016-04-04 Thread Michael Paquier
On Tue, Apr 5, 2016 at 9:06 AM, Robbie Harwood  wrote:
> Here's v12, both here and on my github:
> https://github.com/frozencemetery/postgres/tree/feature/gssencrypt12
>
> What changed:
>
> - The code is aware of memory contexts now.  I actually really like the
>   memory context stuff; just didn't see any indication of its existence
>   in the code I had read.  Anyway, we allocate server buffers in the
>   connection-lifetime context.  The other alternative that we discussed
>   on IRC a bit was to avoid palloc()/pfree() entirely in favor of raw
>   calloc()/free(), but I think if possible I prefer this approach since
>   I find the StringInfo handy to work with.  This eliminates the
>   traceback for me with --enable-cassert.

Affirmative, I am not seeing the failure anymore.

+#ifdef ENABLE_GSS
+   {
+   MemoryContext save = CurrentMemoryContext;
+   CurrentMemoryContext = TopMemoryContext;
+
+   initStringInfo(&conn->gss->buf);
+   initStringInfo(&conn->gss->writebuf);
+
+   CurrentMemoryContext = save;
+   }
+#endif
So you are saving everything in the top memory context. I am fine with
giving the last word to a committer here, but I would just go with
calloc/free to simplify those hunks.
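For what it's worth, the calloc/free alternative could look something like the following minimal sketch. This is an illustration only, not code from the patch: the GssBuf type and function names are invented, and the point is just that a plain growable buffer managed with calloc()/realloc()/free() needs no memory-context bookkeeping for connection-lifetime data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical replacement for the StringInfo-based GSS buffers. */
typedef struct GssBuf
{
    char   *data;   /* calloc'd storage, always NUL-terminated */
    size_t  len;    /* bytes currently stored */
    size_t  cap;    /* allocated capacity, excluding the NUL */
} GssBuf;

static int
gssbuf_init(GssBuf *buf, size_t initial_cap)
{
    buf->data = calloc(initial_cap + 1, 1);
    if (buf->data == NULL)
        return -1;
    buf->len = 0;
    buf->cap = initial_cap;
    return 0;
}

static int
gssbuf_append(GssBuf *buf, const char *src, size_t n)
{
    if (buf->len + n > buf->cap)
    {
        size_t  newcap = (buf->cap * 2 > buf->len + n) ? buf->cap * 2
                                                       : buf->len + n;
        char   *p = realloc(buf->data, newcap + 1);

        if (p == NULL)
            return -1;
        buf->data = p;
        buf->cap = newcap;
    }
    memcpy(buf->data + buf->len, src, n);
    buf->len += n;
    buf->data[buf->len] = '\0';
    return 0;
}

static void
gssbuf_free(GssBuf *buf)
{
    free(buf->data);        /* free(NULL) is a no-op, so this is safe */
    buf->data = NULL;
    buf->len = buf->cap = 0;
}
```

The trade-off is exactly the one described above: this loses the convenience of the StringInfo API, but the cleanup path becomes a plain free() with no context to worry about.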

+#ifdef ENABLE_GSS
+   if (conn->gss->buf.data)
+   pfree(conn->gss->buf.data);
+   if (conn->gss->writebuf.data)
+   pfree(conn->gss->writebuf.data);
+#endif
This should happen in its own memory context, no?

> - Error cleanup.  I've been looking very hard at this code in order to
>   try to fix the assert, and I've fixed up a couple error paths that
>   hadn't been taken.  This involves replacing the double-send with a
>   buffer-and-then-send, which turns out to be not only shorter but
>   easier for me to reason about.

@@ -775,6 +775,7 @@ infodir
 docdir
 oldincludedir
 includedir
+runstatedir
 localstatedir
 sharedstatedir
 sysconfdir
@@ -896,6 +897,7 @@ datadir='${datarootdir}'
 sysconfdir='${prefix}/etc'
 sharedstatedir='${prefix}/com'
 localstatedir='${prefix}/var'
+runstatedir='${localstatedir}/run'
What's that? I would recommend re-running autoconf to remove this
portion (committers do it at the end anyway, so that's actually no big
deal).


-#if defined(WIN32) && !defined(WIN32_ONLY_COMPILER)
-/*
- * MIT Kerberos GSSAPI DLL doesn't properly export the symbols for MingW
- * that contain the OIDs required. Redefine here, values copied
- * from src/athena/auth/krb5/src/lib/gssapi/generic/gssapi_generic.c
- */
-static const gss_OID_desc GSS_C_NT_USER_NAME_desc =
-{10, (void *) "\x2a\x86\x48\x86\xf7\x12\x01\x02\x01\x02"};
-static GSS_DLLIMP gss_OID GSS_C_NT_USER_NAME = &GSS_C_NT_USER_NAME_desc;
-#endif
Regarding patch 0003 it may be fine to remove that... Robbie, do you
know how long ago this has been fixed upstream? I'd rather not have
this bit removed if this could impact some users.

On 0003, +1 for reading the whole stack of messages. That's definitely
worth a separate patch.

Btw, those seem like small things to me, and my comments have been
addressed, so I have switched the patch as ready for committer. I
guess that Stephen would be the one to look at it.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] oversight in parallel aggregate

2016-04-04 Thread Robert Haas
On Mon, Apr 4, 2016 at 10:35 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> One of my EDB colleagues, while in the process of refactoring some
>> unrelated Advanced Server code, discovered that (1) there's no way to
>> mark an aggregate as anything other than parallel-unsafe but (2) it
>> doesn't matter because has_parallel_hazard ignores Aggrefs anyway.
>> These mistakes cancel each other out (sorta) if all of your aggregates
>> happen to be parallel-safe, but otherwise not so much.  Barring
>> objections, I intend to speedily apply the attached patch to fix this.
>
> Um ... why is it a good idea to attach a parallel-safe annotation to an
> aggregate as such, rather than relying on the parallel-safe annotations
> of its implementation functions?
>
> This seems not entirely academic, since perhaps the functions are not
> all marked the same; which might be sensible.  Perhaps the transition
> function can be pushed down but not the final function.

We could do it that way, and then just ignore the marking of the
aggregate function itself.  However, that would require
has_parallel_hazard to do more syscache lookups, since it would have
to examine all of the functions bound into the aggregate instead of
just looking at the aggregate itself.  I think that's probably not
worth it, because I struggle to think of why an aggregate function
would ever be parallel-restricted or parallel-unsafe.  I suppose
somebody could create a user-defined aggregate that has side effects,
but that seems like a corner case for which we shouldn't be
optimizing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Combining Aggregates

2016-04-04 Thread Robert Haas
On Wed, Mar 30, 2016 at 3:34 PM, David Rowley
 wrote:
> I wrote 0002 - 0004, these were reviewed by Tomas.
> 0005 is Haribabu's patch: Reviewed by Tomas and I.

I think it might be a good idea if these patches made less use of
bytea and exposed the numeric transition values as, say, a 2-element
array of numeric.  Maybe this doesn't really matter, but it's not
impossible that these values could be exposed somewhere, and a numeric
is an awful lot more user-friendly than an essentially opaque bytea.
One of the things I eventually want to figure out how to do with this
is distributed aggregation across multiple shards, and I think it
might be better to have the value being sent over the wire be
something like {26.6,435.12} rather than \x1234...
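To make the distributed-aggregation point concrete, here is a toy sketch (not PostgreSQL code; all names are invented) of why a transparent {sum,count}-style partial state is convenient across shards: each shard ships a readable pair, and a coordinator combines them with ordinary arithmetic before finalizing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Conceptually, a partial-aggregate state for avg() is a {sum, count}
 * pair.  If each shard serializes its pair in a readable form like
 * "{26.6,435.12}" instead of an opaque bytea, the combined value that
 * travels over the wire stays inspectable and user-friendly.
 */
typedef struct AvgState
{
    double  sum;    /* running sum of inputs */
    long    count;  /* number of inputs seen */
} AvgState;

/* combine step: merge one shard's partial state into another */
static AvgState
avg_combine(AvgState a, AvgState b)
{
    AvgState r = { a.sum + b.sum, a.count + b.count };
    return r;
}

/* final step: run once on the coordinator after all states are combined */
static double
avg_final(AvgState s)
{
    return s.count == 0 ? 0.0 : s.sum / (double) s.count;
}

/* serialize a state the human-readable way suggested above */
static void
avg_serialize(AvgState s, char *out, size_t outlen)
{
    snprintf(out, outlen, "{%g,%ld}", s.sum, s.count);
}
```

This is only a sketch of the principle; the actual patches deal with numeric (arbitrary precision), where the internal transition state is a pair of numeric values rather than doubles.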

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Choosing parallel_degree

2016-04-04 Thread Amit Kapila
On Mon, Apr 4, 2016 at 11:39 PM, Julien Rouhaud 
wrote:
>
> On 04/04/2016 17:03, Julien Rouhaud wrote:
> > On 04/04/2016 08:55, Amit Kapila wrote:
> >
> > Thanks for the review!
> >
> >> Few comments:
> >> 1.
> >> +  limited according to the 
> >>
> >> A. typo.
> >>/gux-max-parallel-degree/guc-max-parallel-degree
> >>/worker/workers
> >
> > Oops, fixed.
> >
>
And I managed to not fix it, sorry :/ Thanks to Andreas who warned me.
>

Few more comments:

1.
@@ -909,6 +909,17 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI

+    <varlistentry>
+     <term><literal>parallel_degree</literal> (<type>integer</type>)</term>
+     <listitem>
+      <para>
+       Sets the degree of parallelism for an individual relation.  The
+       requested number of workers will be limited by
+       <xref linkend="guc-max-parallel-degree">.
+      </para>
+     </listitem>
+    </varlistentry>

All other parameters in this category are supported by the ALTER TABLE
command as well, so I think this parameter should also be supported by
ALTER TABLE (for both SET and RESET variants).

2.
+ "Number of parallel processes per executor node wanted for this relation.",

How about:
Number of parallel processes that can be used for this relation per
executor node.

3.
- if (rel->pages < parallel_threshold && rel->reloptkind == RELOPT_BASEREL)
+ if (rel->pages < parallel_threshold && rel->rel_parallel_degree == -1 &&
+ rel->reloptkind == RELOPT_BASEREL)

A. The second line should be indented to align with the start of the first
line after the bracket '(', i.e. with rel->pages.  Refer to multiline
conditions in nearby code, or run pgindent.
B. The comment above this condition needs slight adjustment as per the new
condition.

4.
+ int parallel_degree; /* max number of parallel worker */
 } StdRdOptions;

Typo in comments
/worker/workers


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Timeline following for logical slots

2016-04-04 Thread Petr Jelinek

On 04/04/16 17:15, Andres Freund wrote:
>> * Robust sequence decoding and replication. If you were following the later
>> parts of that discussion you will've seen how fun that's going to be, but
>> it's the simplest of all of the problems.
>
> Unconvinced. People used londiste and slony for years without that, and
> it's not even remotely at the top of the list of problems with either.

Londiste and Slony also support physical failover, unlike logical
decoding, which is the main point of this discussion; let's not forget that.

>> * Robust, seamless DDL replication, so things don't just break randomly.
>> This makes the other points above look nice and simple by comparison.
>
> We're talking about a system which involves logical decoding. Whether
> you have failover via physical rep or not, doesn't have anything to do
> with this point.

It is if you are insisting on using logical rep as a solution for failover.

>> I don't think there's any realistic way we're going to get there for
>> logical rep in 9.6+n for n<2 unless a whole lot more people get on board
>> and work on it. Even then.
>
> I think the primary problem here is that you're focusing on things that
> just are not very interesting for the majority of users, and which thus
> won't get very enthusiastic help.  The way to make progress is to get
> something basic in, and then iterate from there.  Instead you're
> focussing on the fringes; which nobody cares about, because the basics
> aren't there.

Craig is focusing on solving failover for logical slots, which is a very
basic issue with logical decoding right now, whether or not we have
logical replication in core. I personally don't think it's OK by any
means to still be unable to continue logical decoding across a failover
event two versions after logical decoding was introduced.

I also don't buy your argument that it's unsafe to use timeline
following for logical decoding on a replica. You can always keep the
master from moving too far ahead by other means (even if you just use a
dummy slot which is only used for this purpose; yes, ugly, I know). If
we got failover slots into 9.6 it would be better, but that does not
look realistic at this point. I don't think the current design for
failover slots is the best possible - I think failover slots should be
created on the replica and send their status up to the master, which
would then take them into account when calculating the oldest needed
catalog xmin and lsn (a simple way of doing that would be to add this to
the feedback protocol and let the physical slot keep the xmin/lsn as
well) - but that does not mean timeline following isn't a good thing on
its own (not to mention that iterative development is a thing).


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pgbench more operators & functions

2016-04-04 Thread Alvaro Herrera
Fabien COELHO wrote:

> I try to review all patches in my (small) area of (limited) expertise, which
> is currently pgbench & some part of the checkpointer. I did also minor bug
> fixes (eg isbn). AFAICS none of the patches lacking a reviewer in the 9.6 CF
> fall in these areas.
> 
> Also note that while I submitted patches for the checkpointer, I ended up
> reviewing your version of the patches, so somehow I was first author, then a
> driving force to provoke you to do it your way, and finally a reviewer,
> esp. in performance testing, which is a time-consuming task.

Please note that the checkpointer patch has two open items that perhaps
you can help with --- see https://wiki.postgresql.org/wiki/Open_Items

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] snapshot too old, configured by time

2016-04-04 Thread Peter Geoghegan
On Sun, Apr 3, 2016 at 2:09 PM, Jeff Janes  wrote:
> Also, HOT-cleanup should stop the bloat increase once the snapshot
> crosses the old_snapshot_threshold without even needing to wait until
> the next autovac runs.
>
> Does the code intentionally only work for manual vacuums?  If so, that
> seems quite surprising.  Or perhaps I am missing something else here.

What proportion of the statements in your simulated workload were
updates? Per my last mail to this thread, I'm interested in knowing if
this was a delete heavy workload.

-- 
Peter Geoghegan




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-04-04 Thread Amit Kapila
On Mon, Apr 4, 2016 at 8:55 PM, Jesper Pedersen  wrote:
> On 04/01/2016 04:39 PM, Andres Freund wrote:
>> On April 1, 2016 10:25:51 PM GMT+02:00, Jesper Pedersen
>> <jesper.peder...@redhat.com> wrote:
>>> Hi,
>>>
>>> On 03/31/2016 06:21 PM, Andres Freund wrote:
>>>> On March 31, 2016 11:13:46 PM GMT+02:00, Jesper Pedersen wrote:
>>>>> I can do a USE_CONTENT_LOCK run on 0003 if it is something for 9.6.
>>>>
>>>> Yes please. I think the lock variant is realistic, the lockless one
>>>> isn't.
>>>
>>> I have done a run with -M prepared on unlogged running 10min per data
>>> point, up to 300 connections. Using data + wal on HDD.
>>>
>>> I'm not seeing a difference between with and without USE_CONTENT_LOCK --
>>> all points are within +/- 0.5%.
>>>
>>> Let me know if there are other tests I can perform.
>>
>> How do either compare to just 0002 applied?
>
> 0001 + 0002 compared to 0001 + 0002 + 0003 (either way) were pretty much
> the same +/- 0.5% on the HDD run.
>
I think the main reason why there is no significant gain shown in your
tests is that on the machine where you are testing, the contention due to
CLOGControlLock is not high enough that any reduction of it will help.  To
me, it is visible on some of the high-end machines which have 4 or more
sockets.  So, I think these results should be taken as an indication that
there is no regression in the tests performed by you.

Thanks for doing all the tests for these patches.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Re: [COMMITTERS] pgsql: Add trigonometric functions that work in degrees.

2016-04-04 Thread Tom Lane
Michael Paquier  writes:
> On Tue, Apr 5, 2016 at 10:16 AM, Peter Eisentraut  wrote:
>> On 01/22/2016 03:46 PM, Tom Lane wrote:
>>> Add trigonometric functions that work in degrees.

>> I have a host here that is having regression test failures from this commit:

> Likely an oversight not tracked by the buildfarm. What are you using here?

Indeed.  Also, I trust you're checking 00347575e or later, and not
e1bd684a3 itself?

regards, tom lane




Re: [HACKERS] oversight in parallel aggregate

2016-04-04 Thread Tom Lane
Robert Haas  writes:
> One of my EDB colleagues, while in the process of refactoring some
> unrelated Advanced Server code, discovered that (1) there's no way to
> mark an aggregate as anything other than parallel-unsafe but (2) it
> doesn't matter because has_parallel_hazard ignores Aggrefs anyway.
> These mistakes cancel each other out (sorta) if all of your aggregates
> happen to be parallel-safe, but otherwise not so much.  Barring
> objections, I intend to speedily apply the attached patch to fix this.

Um ... why is it a good idea to attach a parallel-safe annotation to an
aggregate as such, rather than relying on the parallel-safe annotations
of its implementation functions?

This seems not entirely academic, since perhaps the functions are not
all marked the same; which might be sensible.  Perhaps the transition
function can be pushed down but not the final function.

regards, tom lane




Re: [HACKERS] Default Roles (was: Additional role attributes)

2016-04-04 Thread Noah Misch
On Sun, Apr 03, 2016 at 10:27:02PM -0400, Stephen Frost wrote:
> * Fujii Masao (masao.fu...@gmail.com) wrote:
> > Currently only superusers can call pgstattuple().
> 
> I started looking into this.
> 
> If we were starting from a green field, the pg_dump dump catalog ACLs
> patch would work just fine for this case.  Simply remove the superuser
> checks and REVOKE EXECUTE from public in the script and we're done.
> 
> Unfortunately, we aren't, and that's where things get complicated.  The
> usual pg_upgrade case will, quite correctly, dump out the objects
> exactly as they exist from the 9.5-or-earlier system and restore them
> into the 9.6 system, however, the new .so will be installed and that .so
> won't have the superuser checks in it.
> 
> The only approach to addressing this which I can think of offhand would
> be to have the new .so library check the version of the extension and,
> for the 1.3 (pre-9.6) and previous versions, keep the superuser check,
> but skip it for 1.4 (9.6) and later versions.

At the C level, have a pgstattuple function and a pgstattuple_v1_4 function.
Let them differ only in that the former has a superuser check.  Binary
upgrades will use the former, and fresh CREATE EXTENSION shall use the latter.
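A minimal sketch of that pattern, under the assumption that both SQL-visible entry points share one internal worker (the names besides pgstattuple/pgstattuple_v1_4 are invented here, and the backend's superuser() check is stubbed out with a flag):

```c
#include <assert.h>
#include <stdbool.h>

/* stand-in for the backend's superuser() check */
static bool simulated_superuser = false;

static bool
superuser(void)
{
    return simulated_superuser;
}

/* shared implementation; returns a dummy result for illustration */
static int
pgstattuple_internal(int relid)
{
    return relid * 2;
}

/* legacy entry point, kept for binary upgrades of extension version <= 1.3 */
static int
pgstattuple(int relid, bool *denied)
{
    if (!superuser())
    {
        *denied = true;         /* the real code would raise an ERROR */
        return -1;
    }
    *denied = false;
    return pgstattuple_internal(relid);
}

/* new entry point, wired up by fresh CREATE EXTENSION for version 1.4 */
static int
pgstattuple_v1_4(int relid, bool *denied)
{
    *denied = false;            /* no superuser gate in the new version */
    return pgstattuple_internal(relid);
}
```

The design point is that the two C symbols differ only in the permission check, so the 1.4 upgrade script can simply repoint the SQL function at the new symbol while upgraded-in-place clusters keep the old behavior.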




Re: [HACKERS] oversight in parallel aggregate

2016-04-04 Thread David Rowley
On 5 April 2016 at 13:09, David Rowley  wrote:
> On 5 April 2016 at 11:59, Robert Haas  wrote:
>> One of my EDB colleagues, while in the process of refactoring some
>> unrelated Advanced Server code, discovered that (1) there's no way to
>> mark an aggregate as anything other than parallel-unsafe but (2) it
>> doesn't matter because has_parallel_hazard ignores Aggrefs anyway.
>> These mistakes cancel each other out (sorta) if all of your aggregates
>> happen to be parallel-safe, but otherwise not so much.  Barring
>> objections, I intend to speedily apply the attached patch to fix this.
>
> Thanks for working on this. I should have noticed this myself...
>
> I had a quick look at this and I managed to make this happen:
>
> david=# create aggregate mysum(int) (sfunc=int4pl, combinefunc=int4pl,
> stype=int, parallel);
> server closed the connection unexpectedly
>
> I've attached a fix, which makes the code a bit more simple, and also
> in line with the other code in DefineAggregate().
>
> I think there were also a couple of missing syntax synopses in the docs
> too. I've added those.

Another thought;

+ else if (IsA(node, Aggref))
+ {
+ Aggref   *aggref = (Aggref *) node;
+
+ if (parallel_too_dangerous(func_parallel(aggref->aggfnoid), context))
+ return true;
+ }

Does this need to check the parallel flags on the transfn or serialfn?
These'll be executed in the worker process. Possibly we also need the
combinefn/deserialfn/finalfn to be checked, too, as I see that we do
generate_gather_paths() from set_append_rel_pathlist().

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] dealing with extension dependencies that aren't quite 'e'

2016-04-04 Thread Abhijit Menon-Sen
At 2016-04-04 18:55:03 -0300, alvhe...@2ndquadrant.com wrote:
>
> At this point I think we're missing user-level docs for the additional
> clause in each ALTER command.

Thanks for having a look. Now that you're happy with the grammar, I'll
write the remaining docs and resubmit the patch later today.

> Do you have any ideas on how this would appear in regression tests?

No specific ideas.

At a high level, I think we should install an empty extension, create
one of each kind of object, alter them to depend on the extension, check
that pg_depend has the right 'x' rows, then drop the extension and make
sure the objects have gone away.

Does that sound reasonable? Suggestions welcome.

-- Abhijit




Re: [HACKERS] snapshot too old, configured by time

2016-04-04 Thread Peter Geoghegan
On Wed, Mar 30, 2016 at 12:46 PM, Kevin Grittner  wrote:
> On Wed, Mar 30, 2016 at 2:29 PM, Peter Geoghegan  wrote:
>
>> [Does the patch allow dangling page pointers?]
>
>> Again, I don't want to prejudice anyone against your patch, which I
>> haven't read.
>
> I don't believe that the way the patch does its business opens any
> new vulnerabilities of this type.  If you see such after looking at
> the patch, let me know.

Okay, let me be more concrete about this. The patch does this:

> --- a/src/backend/access/heap/pruneheap.c
> +++ b/src/backend/access/heap/pruneheap.c
> @@ -92,12 +92,21 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
>  * need to use the horizon that includes slots, otherwise the data-only
>  * horizon can be used. Note that the toast relation of user defined
>  * relations are *not* considered catalog relations.
> +*
> +* It is OK to apply the old snapshot limit before acquiring the cleanup
> +* lock because the worst that can happen is that we are not quite as
> +* aggressive about the cleanup (by however many transaction IDs are
> +* consumed between this point and acquiring the lock).  This allows us to
> +* save significant overhead in the case where the page is found not to be
> +* prunable.
>  */
> if (IsCatalogRelation(relation) ||
> RelationIsAccessibleInLogicalDecoding(relation))
> OldestXmin = RecentGlobalXmin;
> else
> -   OldestXmin = RecentGlobalDataXmin;
> +   OldestXmin =
> +   TransactionIdLimitedForOldSnapshots(RecentGlobalDataXmin,
> +   relation);

This new intermediary function TransactionIdLimitedForOldSnapshots()
is called to decide what OldestXmin actually gets to be above, based
in part on the new GUC old_snapshot_threshold:

> +/*
> + * TransactionIdLimitedForOldSnapshots
> + *
> + * Apply old snapshot limit, if any.  This is intended to be called for page
> + * pruning and table vacuuming, to allow old_snapshot_threshold to override
> + * the normal global xmin value.  Actual testing for snapshot too old will be
> + * based on whether a snapshot timestamp is prior to the threshold timestamp
> + * set in this function.
> + */
> +TransactionId
> +TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
> +   Relation relation)

It might not be RecentGlobalDataXmin that is usually returned as
OldestXmin as it is today, which is exactly the point of this patch:
VACUUM can be more aggressive in cleaning up bloat, not unlike the
non-catalog logical decoding case, on the theory that we can reliably
detect when that causes failures for old snapshots, and just raise a
"snapshot too old" error. (RecentGlobalDataXmin is morally about the
same as RecentGlobalXmin, as far as this patch goes).

So far, so good. It's okay that _bt_page_recyclable() never got the
memo about any of this...:

/*
 *  _bt_page_recyclable() -- Is an existing page recyclable?
 *
 * This exists to make sure _bt_getbuf and btvacuumscan have the same
 * policy about whether a page is safe to re-use.
 */
bool
_bt_page_recyclable(Page page)
{
BTPageOpaque opaque;

   ...

/*
 * Otherwise, recycle if deleted and too old to have any processes
 * interested in it.
 */
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
if (P_ISDELETED(opaque) &&
TransactionIdPrecedes(opaque->btpo.xact, RecentGlobalXmin))
return true;
return false;
}

...because this patch does nothing to advance RecentGlobalXmin (or
RecentGlobalDataXmin) itself more aggressively. It does make
vacuum_set_xid_limits() get a more aggressive cutoff point, but we
don't see that being passed back down by lazy vacuum here; within
_bt_page_recyclable(), we rely on the more conservative
RecentGlobalXmin, which is not subject to any clever optimization in
the patch.

Fortunately, this seems correct, since index scans will always succeed
in finding a deleted page, per nbtree README notes on
RecentGlobalXmin. Unfortunately, this does stop recycling from
happening early for B-Tree pages, even though that's probably safe in
principle. This is probably not so bad -- it just needs to be
considered when reviewing this patch (the same is true of logical
decoding's RecentGlobalDataXmin; it also doesn't appear in
_bt_page_recyclable(), and I guess that that was never a problem).
Index relations will not get smaller in some important cases, but they
will be made less bloated by VACUUM in a sense that's still probably
very useful. Maybe that explains some of what Jeff talked about.

I think another part of the problems that Jeff mentioned (with
pruning) could be this existing code within heap_hot_search_buffer():

/*
 * If we can't see it, maybe no one else can either.  At caller
 * request, check whether all chain members are dead to all
 * transactions.
 */

Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.

2016-04-04 Thread Kyotaro HORIGUCHI
At Mon, 04 Apr 2016 11:23:34 -0400, Tom Lane  wrote in 
<9798.1459783...@sss.pgh.pa.us>
> Amit Langote  writes:
> > On Mon, Apr 4, 2016 at 11:24 PM, Tom Lane  wrote:
> >> A related issue, now that I've seen this example, is that altering
> >> FDW-level or server-level options won't cause a replan either.  I'm
> >> not sure there's any very good fix for that.  Surely we don't want
> >> to try to identify all tables belonging to the FDW or server and
> >> issue relcache invals on all of them.
> 
> > Hm, some kind of PlanInvalItem-based solution could work maybe?
> 
> Hm, so we'd expect that whenever an FDW consulted the options while
> making a plan, it'd have to record a plan dependency on them?  That
> would be a clean fix maybe, but I'm worried that third-party FDWs
> would fail to get the word about needing to do this.

If the "recording" means recoding oids to CachedPlanSource like
relationOids, it seems to be able to be recorded automatically by
the core, not by fdw side, we already know the server id for
foreign tables.

I'm missing something?
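For what it's worth, here is a hypothetical sketch (C-style pseudocode against
PostgreSQL internals, not anything proposed in this thread) of how the core
could record such a dependency automatically, modeled on the existing
record_plan_function_dependency() / PlanInvalItem machinery; the function name
and call site are assumptions:

/*
 * Hypothetical: record a plan's dependency on a foreign server's catalog
 * entry, so that ALTER SERVER ... OPTIONS (...) invalidates cached plans.
 */
static void
record_foreign_server_dependency(PlannerGlobal *glob, Oid serverid)
{
	PlanInvalItem *inval_item = makeNode(PlanInvalItem);

	/* syscache id + hash of the pg_foreign_server row */
	inval_item->cacheId = FOREIGNSERVEROID;
	inval_item->hashValue = GetSysCacheHashValue1(FOREIGNSERVEROID,
												  ObjectIdGetDatum(serverid));
	glob->invalItems = lappend(glob->invalItems, inval_item);
}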

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Correction for replication slot creation error message in 9.6

2016-04-04 Thread Peter Eisentraut

On 03/31/2016 10:18 AM, Alvaro Herrera wrote:

I thought we had agreed that we weren't going to consider the wal_level
values as a linear scale -- in other words, wordings such as "greater
than FOO" are discouraged.  That's always seemed a bit odd to me.


I don't think there was any agreement about that.





Re: [HACKERS] Correction for replication slot creation error message in 9.6

2016-04-04 Thread Peter Eisentraut

On 03/30/2016 09:15 PM, Ian Barwick wrote:

Currently pg_create_physical_replication_slot() may refer to
the deprecated wal_level setting "archive":


I have fixed this in the most direct way, since there was some 
disagreement about rewording.






Re: [HACKERS] Re: [COMMITTERS] pgsql: Add trigonometric functions that work in degrees.

2016-04-04 Thread Michael Paquier
On Tue, Apr 5, 2016 at 10:16 AM, Peter Eisentraut  wrote:
> On 01/22/2016 03:46 PM, Tom Lane wrote:
>>
>> Add trigonometric functions that work in degrees.
>>
>> The implementations go to some lengths to deliver exact results for values
>> where an exact result can be expected, such as sind(30) = 0.5 exactly.
>
> I have a host here that is having regression test failures from this commit:
>
> --- src/test/regress/expected/float8.out
> +++ src/test/regress/results/float8.out
> @@ -490,9 +490,9 @@
>x   | asind | acosd | atand
>  --+---+---+---
> -1 |   -90 |   180 |   -45
> - -0.5 |   -30 |   120 |
> + -0.5 |   |   120 |
>  0 | 0 |90 | 0
> -  0.5 |30 |60 |
> +  0.5 |   |60 |
>  1 |90 | 0 |45
>  (5 rows)
>
> Any ideas?

Likely an oversight not tracked by the buildfarm. What are you using here?
-- 
Michael




[HACKERS] Re: [COMMITTERS] pgsql: Add trigonometric functions that work in degrees.

2016-04-04 Thread Peter Eisentraut

On 01/22/2016 03:46 PM, Tom Lane wrote:

Add trigonometric functions that work in degrees.

The implementations go to some lengths to deliver exact results for values
where an exact result can be expected, such as sind(30) = 0.5 exactly.


I have a host here that is having regression test failures from this commit:

--- src/test/regress/expected/float8.out
+++ src/test/regress/results/float8.out
@@ -490,9 +490,9 @@
   x   | asind | acosd | atand
 --+---+---+---
-1 |   -90 |   180 |   -45
- -0.5 |   -30 |   120 |
+ -0.5 |   |   120 |
 0 | 0 |90 | 0
-  0.5 |30 |60 |
+  0.5 |   |60 |
 1 |90 | 0 |45
 (5 rows)

Any ideas?





Re: [HACKERS] oversight in parallel aggregate

2016-04-04 Thread David Rowley
On 5 April 2016 at 11:59, Robert Haas  wrote:
> One of my EDB colleagues, while in the process of refactoring some
> unrelated Advanced Server code, discovered that (1) there's no way to
> mark an aggregate as anything other than parallel-unsafe but (2) it
> doesn't matter because has_parallel_hazard ignores Aggrefs anyway.
> These mistakes cancel each other out (sorta) if all of your aggregates
> happen to be parallel-safe, but otherwise not so much.  Barring
> objections, I intend to speedily apply the attached patch to fix this.

Thanks for working on this. I should have noticed this myself...

I had a quick look at this and I managed to make this happen:

david=# create aggregate mysum(int) (sfunc=int4pl, combinefunc=int4pl,
stype=int, parallel);
server closed the connection unexpectedly

I've attached a fix, which makes the code a bit simpler, and also
in line with the other code in DefineAggregate().

I think there were also a couple of missing syntax synopses in the docs.
I've added those.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


create-aggregate-parallel_david.patch
Description: Binary data



Re: [HACKERS] Yet another small patch - reorderbuffer.c:1099

2016-04-04 Thread Michael Paquier
On Tue, Apr 5, 2016 at 1:03 AM, Aleksander Alekseev
 wrote:
> There is a weird piece of code in reorderbuffer.c:
>
> ```
> /* delete from list of known subxacts */
> if (txn->is_known_as_subxact)
> {
> /* NB: nsubxacts count of parent will be too high now */
> dlist_delete(&txn->node);
> }
> /* delete from LSN ordered list of toplevel TXNs */
> else
> {
> dlist_delete(&txn->node);
> }
> ```
>
> As you can see, the `if` and `else` branches are exactly the same. I wonder
> whether this is a bug or the code just requires a bit of cleaning. In the
> latter case, here is a patch.
>
> According to `git log` both branches were introduced in one commit
> b89e1510. I added author and committer of this code to CC since they
> have much better understanding of it than I do.

I recall discussing this code with Andres, and I think he mentioned to
me that this is intentional: should things change for one reason or
another in the future, we want to keep in mind that a list of TXIDs and
a list of sub-TXIDs should be handled differently.
-- 
Michael




[HACKERS] PATCH: pg_restore parallel-execution-deadlock issue

2016-04-04 Thread Armin Schöffmann
Worthy hackers,
I propose the patches below to parallel.c and pg_backup_utils.c, fixing 
deadlocks in pg_restore (Windows only) when running more than two parallel jobs.
I reported this problem earlier this year:
http://www.postgresql.org/message-id/20160307161619.25731.78...@wrigleys.postgresql.org

- Winsock's "recv(...)" called in piperead() is a blocking read by default, 
therefor, signalizing termEvent as used in ShutdownWorkersHard() is not enough 
to make worker-threads go away.
We need a preceding shutdown(pipeWrite, SD_BOTH), first, to abort blocking IO 
in this case.
Otherwise, the main-thread will wait forever, if more than one additional 
worker is active (e.g. option -j3) and a premature EOF occurs in the input-file.

Findings in pg_backup_utils.c / parallel.c, which could impact other tools, too:

- Threads created with _beginthreadex need to be exited by either a "return 
exitcode" or "_endthreadex(exitcode)". This might be moot in 
fire-and-forget scenarios, but it matters in other cases.
Currently, pg_backup_utils uses ExitThread to retire the additional 
worker threads, which are spawned by _beginthreadex in parallel.c. 
ExitThread would be the correct counterpart for CreateThread; 
nevertheless, _beginthreadex is the right choice here, as we do call out into 
the CRT and need to retain the thread handle for after-death synchronization with 
the main thread.
The thread handle needs to be closed explicitly.

If this is not the correct place to discuss patches, I'd be glad if somebody 
could notify the tool's maintainer to take a look at it.

best regards,
Armin Schöffmann.


--
Aegaeon technologies GmbH
phone: +49.941.8107344
fax:   +49.941.8107356

Legal disclaimer:
http://aegaeon.de/disclaimer/email_all_int.txt


parallel.c

@@ -350,7 +350,8 @@ static void
 ShutdownWorkersHard(ParallelState *pstate)
 {
-#ifndef WIN32
+
int i;
 
+#ifndef WIN32
signal(SIGPIPE, SIG_IGN);
 
@@ -367,4 +368,7 @@ ShutdownWorkersHard(ParallelState *pstate)
/* The workers monitor this event via checkAborting(). */
SetEvent(termEvent);
+
+   for (i = 0; i < pstate->numWorkers; i++)
+   shutdown(pstate->parallelSlot[i].pipeWrite, SD_BOTH);
 #endif
 
@@ -408,5 +412,8 @@ WaitForTerminatingWorkers(ParallelState *pstate)
for (j = 0; j < pstate->numWorkers; j++)
if (pstate->parallelSlot[j].hThread == hThread)
+   {
slot = &(pstate->parallelSlot[j]);
+   CloseHandle(hThread);
+   }
 
free(lpHandles);



pg_backup_utils.c

@@ -120,5 +120,5 @@ exit_nicely(int code)
 #ifdef WIN32
if (parallel_init_done && GetCurrentThreadId() != mainThreadId)
-   ExitThread(code);
+   _endthreadex(code);
 #endif







parallel.c.diff
Description: Binary data


pg_backup_utils.c.diff
Description: Binary data



Re: [HACKERS] show xl_prev in xlog.c errcontext

2016-04-04 Thread Michael Paquier
On Tue, Apr 5, 2016 at 6:14 AM, Alvaro Herrera  wrote:
> Michael Paquier wrote:
>> And both?
>
> I couldn't see much point of doing this, so I didn't.  If you have a
> rationale for it, let's hear it.

Now that I think on it, it does not actually matter to print both.
-- 
Michael




Re: [HACKERS] [PATCH v12] GSSAPI encryption support

2016-04-04 Thread Robbie Harwood
Hello friends,

Here's v12, both here and on my github:
https://github.com/frozencemetery/postgres/tree/feature/gssencrypt12

What changed:

- The code is aware of memory contexts now.  I actually really like the
  memory context stuff; just didn't see any indication of its existence
  in the code I had read.  Anyway, we allocate server buffers in the
  connection-lifetime context.  The other alternative that we discussed
  on IRC a bit was to avoid palloc()/pfree() entirely in favor of raw
  calloc()/free(), but I think if possible I prefer this approach since
  I find the StringInfo handy to work with.  This eliminates the
  traceback for me with --enable-cassert.

- Error cleanup.  I've been looking very hard at this code in order to
  try to fix the assert, and I've fixed up a couple error paths that
  hadn't been taken.  This involves replacing the double-send with a
  buffer-and-then-send, which turns out to be not only shorter but
  easier for me to reason about.

Thanks!
From 945805d45e8021f92ad73518b3a74ac6bab89525 Mon Sep 17 00:00:00 2001
From: Robbie Harwood 
Date: Fri, 26 Feb 2016 16:07:05 -0500
Subject: [PATCH 1/3] Move common GSSAPI code into its own files

On both the frontend and backend, prepare for GSSAPI encryption support
by moving common code for error handling into a common file.  Other than
build-system changes, no code changes occur in this patch.

Thanks to Michael Paquier for the Windows fixes.
---
 configure   |  2 +
 configure.in|  1 +
 src/Makefile.global.in  |  1 +
 src/backend/libpq/Makefile  |  4 ++
 src/backend/libpq/auth.c| 63 +---
 src/backend/libpq/be-gssapi-common.c| 74 +
 src/include/libpq/be-gssapi-common.h| 26 
 src/interfaces/libpq/Makefile   |  4 ++
 src/interfaces/libpq/fe-auth.c  | 48 +
 src/interfaces/libpq/fe-gssapi-common.c | 63 
 src/interfaces/libpq/fe-gssapi-common.h | 21 ++
 src/tools/msvc/Mkvcbuild.pm | 18 ++--
 12 files changed, 212 insertions(+), 113 deletions(-)
 create mode 100644 src/backend/libpq/be-gssapi-common.c
 create mode 100644 src/include/libpq/be-gssapi-common.h
 create mode 100644 src/interfaces/libpq/fe-gssapi-common.c
 create mode 100644 src/interfaces/libpq/fe-gssapi-common.h

diff --git a/configure b/configure
index b3f3abe..a5bd629 100755
--- a/configure
+++ b/configure
@@ -713,6 +713,7 @@ with_systemd
 with_selinux
 with_openssl
 krb_srvtab
+with_gssapi
 with_python
 with_perl
 with_tcl
@@ -5491,6 +5492,7 @@ $as_echo "$with_gssapi" >&6; }
 
 
 
+
 #
 # Kerberos configuration parameters
 #
diff --git a/configure.in b/configure.in
index 0bd90d7..4fd8f05 100644
--- a/configure.in
+++ b/configure.in
@@ -636,6 +636,7 @@ PGAC_ARG_BOOL(with, gssapi, no, [build with GSSAPI support],
   krb_srvtab="FILE:\$(sysconfdir)/krb5.keytab"
 ])
 AC_MSG_RESULT([$with_gssapi])
+AC_SUBST(with_gssapi)
 
 
 AC_SUBST(krb_srvtab)
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index e94d6a5..3dbc5c2 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -183,6 +183,7 @@ with_perl	= @with_perl@
 with_python	= @with_python@
 with_tcl	= @with_tcl@
 with_openssl	= @with_openssl@
+with_gssapi	= @with_gssapi@
 with_selinux	= @with_selinux@
 with_systemd	= @with_systemd@
 with_libxml	= @with_libxml@
diff --git a/src/backend/libpq/Makefile b/src/backend/libpq/Makefile
index 09410c4..a8cc9a0 100644
--- a/src/backend/libpq/Makefile
+++ b/src/backend/libpq/Makefile
@@ -21,4 +21,8 @@ ifeq ($(with_openssl),yes)
 OBJS += be-secure-openssl.o
 endif
 
+ifeq ($(with_gssapi),yes)
+OBJS += be-gssapi-common.o
+endif
+
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index 57c2f48..73d493e 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -136,11 +136,7 @@ bool		pg_krb_caseins_users;
  *
  */
 #ifdef ENABLE_GSS
-#if defined(HAVE_GSSAPI_H)
-#include <gssapi.h>
-#else
-#include <gssapi/gssapi.h>
-#endif
+#include "libpq/be-gssapi-common.h"
 
 static int	pg_GSS_recvauth(Port *port);
 #endif   /* ENABLE_GSS */
@@ -715,63 +711,6 @@ recv_and_check_password_packet(Port *port, char **logdetail)
  */
 #ifdef ENABLE_GSS
 
-#if defined(WIN32) && !defined(WIN32_ONLY_COMPILER)
-/*
- * MIT Kerberos GSSAPI DLL doesn't properly export the symbols for MingW
- * that contain the OIDs required. Redefine here, values copied
- * from src/athena/auth/krb5/src/lib/gssapi/generic/gssapi_generic.c
- */
-static const gss_OID_desc GSS_C_NT_USER_NAME_desc =
-{10, (void *) "\x2a\x86\x48\x86\xf7\x12\x01\x02\x01\x02"};
-static GSS_DLLIMP gss_OID GSS_C_NT_USER_NAME = &GSS_C_NT_USER_NAME_desc;
-#endif
-
-
-static void
-pg_GSS_error(int severity, char *errmsg, OM_uint32 maj_stat, OM_uint32 

[HACKERS] oversight in parallel aggregate

2016-04-04 Thread Robert Haas
One of my EDB colleagues, while in the process of refactoring some
unrelated Advanced Server code, discovered that (1) there's no way to
mark an aggregate as anything other than parallel-unsafe but (2) it
doesn't matter because has_parallel_hazard ignores Aggrefs anyway.
These mistakes cancel each other out (sorta) if all of your aggregates
happen to be parallel-safe, but otherwise not so much.  Barring
objections, I intend to speedily apply the attached patch to fix this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


create-aggregate-parallel.patch
Description: binary/octet-stream



Re: [HACKERS] WIP: Covering + unique indexes.

2016-04-04 Thread Peter Geoghegan
On Mon, Mar 21, 2016 at 9:53 AM, Anastasia Lubennikova
 wrote:
> Thanks to David,
> I've missed these letters at first.
> I'll answer here.

Sorry about using the wrong thread.

> I agree that _bt_insertonpg() is not right for truncation.

Cool.

> It's a bit more complicated to add it into index creation algorithm.
> There's a trick with a "high key".
> /*
>  * We copy the last item on the page into the new page, and then
>  * rearrange the old page so that the 'last item' becomes its high
> key
>  * rather than a true data item.  There had better be at least two
>  * items on the page already, else the page would be empty of useful
>  * data.
>  */
> /*
>  * Move 'last' into the high key position on opage
>  */
>
> To be consistent with other steps of algorithm ( all high keys must be
> truncated tuples), I had to update this high key on place:
> delete the old one, and insert truncated high key.

Hmm. But the high key comparing equal to the Scankey gives insertion
the choice of where to put its IndexTuple (it can go on the page with
the high key, or its right-sibling, according only to considerations
about fillfactor, etc). Is this changed? Does it not matter? Why not?
Is it just worth it?

The right-most page on every level has no high-key. But you say those
pages have an "imaginary" *positive* infinity high key, just as
internal pages have (non-imaginary) minus infinity downlinks as their
first item/downlink. So tuples in a (say) leaf page are always bound
by the downlink lower bound in parent, while their own high key is an
upper bound. Either (and, rarely, both) could be (positive or
negative) infinity.

Maybe you now see why I talked about special _bt_compare() logic for
this. I proposed special logic that is similar to the existing minus
infinity thing _bt_compare() does (although _bt_binsrch(), an
important caller of _bt_compare() also does special things for
internal .vs leaf case, so I'm not sure any new special logic must go
in _bt_compare()).

> It is a very important issue. But I don't think it's a bug there.
> I've read amcheck sources thoroughly and found that the problem appears at
> "invariant_key_less_than_equal_nontarget_offset()

> It uses scankey, made with _bt_mkscankey() which uses only key attributes,
> but calls _bt_compare with wrong keysz.
> If we will use nkeyatts = state->rel->rd_index->relnatts; instead of natts,
> all the checks would pass successfully.

I probably shouldn't have brought amcheck into that particular
discussion. I thought amcheck might be a useful way to frame the
discussion, because amcheck always cares about specific invariants,
and notes a few special cases.

> In my view, it's the correct way to fix this problem, because the caller is
> responsible for passing proper arguments to the function.
> Of course I will add a check into bt_compare, but I'd rather make it an
> assertion (see the patch attached).

I see what you mean, but I think we need to decide what to do about
the key space when leaf high keys are truncated. I do think that
truncating the high key was the right idea, though, and it nicely
illustrates that nothing special should happen in upper levels. Suffix
truncation should only happen when leaf pages are split, generally
speaking.

As I said, the high key is very similar to the downlinks, in that both
bound the things that go on each page. Together with downlinks they
represent *discrete* ranges *unambiguously*, so INCLUDING truncation
needs to make it clear which page new items go on. As I said,
_bt_binsrch() already takes special actions for internal pages, making
sure to return the item that is < the scankey, not <= the scankey
which is only allowed for leaf pages. (See README, from "Lehman and
Yao assume that the key range for a subtree S is described by Ki < v
<= Ki+1 where Ki and Ki+1 are the adjacent keys in the parent
page...").

To give a specific example, I worry about the case where two sibling
downlinks in a parent page are distinct, but per the specific-to-Postgres
"Ki <= v <= Ki+1" thing (which differs from the classic L&Y
invariant), some tuples with all right downlink's attributes matching
end up in left child page, not right child page. I worry that since
_bt_findsplitloc() doesn't consider this (for example), the split
point doesn't *reliably* and unambiguously divide the key space
between the new halves of a page being split. I think the "Ki <= v <=
Ki+1"/_bt_binsrch() thing might save you in common cases where all
downlink attributes are distinct, so maybe that simpler case is okay.
But to be even more specific, what about the more complicated case
where the downlinks *are* fully _bt_compare()-wise equal? This could
happen even though they're constrained to be unique in leaf pages, due
to bloat. Unique indexes aren't special here; they just make it far
less likely that this would happen in practice, because it 

Re: [HACKERS] More stable query plans via more predictable column statistics

2016-04-04 Thread Tom Lane
Alex Shulgin  writes:
> On Mon, Apr 4, 2016 at 1:06 AM, Tom Lane  wrote:
>> I'm inclined to
>> revert the aspect of 3d3bf62f3 that made us work from "d" (the observed
>> number of distinct values in the sample) rather than stadistinct (the
>> extrapolated estimate for the table).  On reflection I think that that's
>> inconsistent with the theory behind the old MCV-cutoff rule.  It wouldn't
>> matter if we were going to replace the cutoff rule with something else,
>> but it's beginning to sound like that won't happen for 9.6.

> Please feel free to do what you think is in the best interest of the people
> preparing 9.6 for the freeze.  I'm not all that familiar with the process,
> but I guess reverting this early might save some head-scratching if some
> interesting interactions of this change combined with some others are going
> to show up.

I've reverted that bit; so we still have the improvements associated with
ignoring nulls, but nothing else at the moment.  I'll set this commitfest
item back to Waiting on Author, just in case you are able to make some
more progress before the end of the week.

regards, tom lane




Re: [HACKERS] So, can we stop supporting Windows native now?

2016-04-04 Thread Robert Haas
On Wed, Mar 30, 2016 at 7:49 PM, Josh berkus  wrote:
> http://www.zdnet.com/article/microsoft-and-canonical-partner-to-bring-ubuntu-to-windows-10/
>
> ... could be good news for us ...

I read a saying someplace that if you see a news headline that ends in
a question mark, the answer is always "no".

So here.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [CommitFest App] Feature request -- review e-mail additions

2016-04-04 Thread Robert Haas
On Wed, Mar 30, 2016 at 1:47 PM, Alvaro Herrera
 wrote:
> José Luis Tallón wrote:
>> Just wanted to suggest two minor mods to the review e-mails
>> auto-generated by the app:
>>
>> * Prepend a [review] tag to the e-mail subject
>> ... so that e-mails sent to -hackers will read  " [HACKERS] [review]
>> "
>
> Changing the subject of an email causes Gmail to break the threads, so
> anything in that line should be discouraged.  -1 from me.  I would be
> happier if the subject of the submission email is kept intact, i.e. not
> use the patch title that was used in commitfest app but use the one in
> the email.  These often differ.

I entirely agree with Alvaro.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] dealing with extension dependencies that aren't quite 'e'

2016-04-04 Thread Alvaro Herrera
Abhijit Menon-Sen wrote:
> At 2016-03-29 10:15:51 -0400, da...@pgmasters.net wrote:
> >
> > Either way it looks like you need to post a patch with more
> > documentation - do you know when you'll have that ready?
> 
> Here it is.
> 
> (I was actually looking for other potential callers, but I couldn't find
> any. There are some places that take a RangeVar and make a list from it,
> but they are creating new nodes, and don't quite fit. So the only change
> in this patch is to add a comment above the get_object_address_rv
> function.)

I gave this another look.  To test whether the grammar is good, I tried
a few additional object cases.  They all seem to work fine, provided
that we use the same production for the object name as in the
corresponding ALTER  case for the object.  The attached is simply
an extension with those new grammar rules -- I didn't go beyond ensuring
the new productions didn't cause any conflicts.  (I also removed the new
includes in alter.c, which are no longer necessary AFAICS.)

At this point I think we're missing user-level docs for the additional
clause in each ALTER command.  I think it's a fiddly business, because
each individual ALTER page is going to need to be touched for the new
clause, but that can't be avoided.

Do you have any ideas on how this would appear in regression tests?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index bb75229..3c128a0 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2888,6 +2888,19 @@
   
  
 
+
+
+ DEPENDENCY_AUTO_EXTENSION (x)
+ 
+  
+   The dependent object is not a member of the extension that is the
+   referenced object (and so should not be ignored by pg_dump), but
+   cannot function without it and should be dropped when the
+   extension itself is. The dependent object may be dropped on its
+   own as well.
+  
+ 
+

 
Other dependency flavors might be needed in future.
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 17f9de1..79595a9 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -589,6 +589,7 @@ findDependentObjects(const ObjectAddress *object,
 		{
 			case DEPENDENCY_NORMAL:
 			case DEPENDENCY_AUTO:
+			case DEPENDENCY_AUTO_EXTENSION:
 /* no problem */
 break;
 			case DEPENDENCY_INTERNAL:
@@ -788,6 +789,7 @@ findDependentObjects(const ObjectAddress *object,
 subflags = DEPFLAG_NORMAL;
 break;
 			case DEPENDENCY_AUTO:
+			case DEPENDENCY_AUTO_EXTENSION:
 subflags = DEPFLAG_AUTO;
 break;
 			case DEPENDENCY_INTERNAL:
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index cb3ba85..ad65d2d 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -1016,6 +1016,31 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
 }
 
 /*
+ * Return an ObjectAddress based on a RangeVar and an object name. The
+ * name of the relation identified by the RangeVar is prepended to the
+ * list passed in as objname. This is useful to find the ObjectAddress
+ * of objects that depend on a relation. All other considerations are
+ * exactly as for get_object_address above.
+ */
+ObjectAddress
+get_object_address_rv(ObjectType objtype, RangeVar *rel, List *objname,
+	  List *objargs, Relation *relp, LOCKMODE lockmode,
+	  bool missing_ok)
+{
+	if (rel)
+	{
+		objname = lcons(makeString(rel->relname), objname);
+		if (rel->schemaname)
+			objname = lcons(makeString(rel->schemaname), objname);
+		if (rel->catalogname)
+			objname = lcons(makeString(rel->catalogname), objname);
+	}
+
+	return get_object_address(objtype, objname, objargs,
+			  relp, lockmode, missing_ok);
+}
+
+/*
  * Find an ObjectAddress for a type of object that is identified by an
  * unqualified name.
  */
diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c
index 5af0f2f..4c64700 100644
--- a/src/backend/commands/alter.c
+++ b/src/backend/commands/alter.c
@@ -391,6 +391,32 @@ ExecRenameStmt(RenameStmt *stmt)
 }
 
 /*
+ * Executes an ALTER OBJECT / DEPENDS ON EXTENSION statement.
+ *
+ * Return value is the address of the altered object.
+ */
+ObjectAddress
+ExecAlterObjectDependsStmt(AlterObjectDependsStmt *stmt)
+{
+	ObjectAddress address;
+	ObjectAddress extAddr;
+	Relation	rel = NULL;
+
+	address =
+		get_object_address_rv(stmt->objectType, stmt->relation, stmt->objname,
			  stmt->objargs, &rel, AccessExclusiveLock, false);
+	if (rel)
+		heap_close(rel, NoLock);
+
+	extAddr = get_object_address(OBJECT_EXTENSION, stmt->extname, NULL,
+ &rel, AccessExclusiveLock, false);
+
+	recordDependencyOn(&address, &extAddr, DEPENDENCY_AUTO_EXTENSION);
+
+	return address;
+}
+
+/*
  * Executes an ALTER OBJECT / SET SCHEMA statement.  

Re: [HACKERS] Timeline following for logical slots

2016-04-04 Thread Petr Jelinek

On 04/04/16 22:00, Robert Haas wrote:

On Mon, Apr 4, 2016 at 10:59 AM, Craig Ringer  wrote:

To allow logical rep and failover to be a reasonable substitute for physical
rep and failover IMO *need*:

* Robust sequence decoding and replication. If you were following the later
parts of that discussion you'll have seen how much fun that's going to be, but
it's the simplest of all of these problems.

* Logical decoding and sending of in-progress xacts, so the logical client
can already be most of the way through receiving a big xact when it commits.
Without this we have a huge lag spike whenever a big xact happens, since we
must first finish decoding it in to a reorder buffer and can only then
*begin* to send it to the client. During which time no later xacts may be
decoded or replayed to the client. If you're running that rare thing, the
perfect pure OLTP system, you won't care... but good luck finding one in the
real world.

* Either parallel apply on the client side or at least buffering of
in-progress xacts on the client side so they can be safely flushed to disk
and confirmed, allowing receive to continue while replay is done on the
client. Otherwise sync rep is completely impractical... and there's no
shortage of systems out there that can't afford to lose any transactions. Or
at least have some crucial transactions they can't lose.

* Robust, seamless DDL replication, so things don't just break randomly.
This makes the other points above look nice and simple by comparison.
Logical decoding of 2PC xacts with DDL would help here, as would the ability
to transparently convert an xact into a prepare-xact on client commit and
hold the client waiting while we replicate it, confirm the successful
prepare on the replica, then commit prepared on the upstream.

* oh, and some way to handle replication of shared catalog changes like
pg_authid, so the above DDL replication doesn't just randomly break if it
happens to refer to a global object that doesn't exist on the downstream.


In general, I think we'd be a lot better off if we got some kind of
logical replication into core first and then worked on lifting these
types of limitations afterwards.  If I had to pick an order in which
to do the things you list, I'd focus first on the one you list second:
being able to stream and begin applying transactions before they've
committed is a really big deal for large transactions, and lots of
people have some large transactions.  DDL replication is nice, but
realistically, there are a lot of people who simply don't change their
schema all that often, and who could (and might even prefer to) manage
that process in other ways - e.g. change nodes one by one while they
are off-line, then bring them on-line.

I don't necessarily disagree with your statement that we'd need all of
this stuff to make logical replication a substitute for physical
replication as far as failover is concerned.  But I don't think that's
the most important goal, and even to the extent that it is the goal, I
don't think we need to meet every need before we can consider
ourselves to have met some needs.  I don't think that we need every
physical replication feature plus some before logical replication can
start to be useful to PostgreSQL users generally.  We do, however,
need the functionality to be accessible to people who are using only
the PostgreSQL core distribution.  The thing that is going to get
people excited about making logical replication better is getting to a
point where they can use it at all - and that is not going to be true
as long as you can't use it without having to download something from
an external website.



I agree with you and I think Craig does as well. IMHO he merely pointed 
this out to show that, at the moment, we need physical replication as an HA 
solution for logical decoding, because currently there is no way to have 
reasonable continuation of service when the original master dies. And 
waiting until we have logical replication that's on par with physical for 
failover will take a long time, as it needs all of the above.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] show xl_prev in xlog.c errcontext

2016-04-04 Thread Alvaro Herrera
Michael Paquier wrote:
> On Thu, Apr 16, 2015 at 3:25 PM, Heikki Linnakangas  wrote:
> > On 04/15/2015 11:35 PM, Alvaro Herrera wrote:
> >>
> >> I found this patch in my local repo that I wrote some weeks or months
> >> ago while debugging some XLog corruption problem: it was difficult to
> >> pinpoint what XLog record in a long sequence of WAL files was causing a
> >> problem, and the displaying the prev pointer in errcontext made finding
> >> it much easier -- I could correlate it with pg_xlogdump output, I think.
> >
> > Seems like a good idea, but why print the prev pointer, and not the location
> > of the record itself?

Makes sense -- pushed that way.

> And both?

I couldn't see much point of doing this, so I didn't.  If you have a
rationale for it, let's hear it.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] WIP: Failover Slots

2016-04-04 Thread Oleksii Kliukin
Hi,

> On 17 Mar 2016, at 09:34, Craig Ringer  wrote:
> 
> OK, here's the latest failover slots patch, rebased on top of today's master 
> plus, in order:
> 
> - Dirty replication slots when confirm_lsn is changed
>   (http://www.postgresql.org/message-id/camsr+yhj0oycug2zbyqprhxmcjnkt9d57msxdzgwbkcvx3+...@mail.gmail.com)
> 
> - logical decoding timeline following
>   (http://www.postgresql.org/message-id/CAMsr+YH-C1-X_+s=2nzapnr0wwqja-rumvhsyyzansn93mu...@mail.gmail.com)
> 
> The full tree is at
> https://github.com/2ndQuadrant/postgres/tree/dev/failover-slots if you want
> to avoid the fiddling around required to apply the patch series.
> 
> 
> <0001-Allow-replication-slots-to-follow-failover.patch><0002-Update-decoding_failover-tests-for-failover-slots.patch><0003-Retain-extra-WAL-for-failover-slots-in-base-backups.patch><0004-Add-the-UI-and-for-failover-slots.patch><0005-Document-failover-slots.patch><0006-Add-failover-to-pg_replication_slots.patch><0007-Introduce-TAP-recovery-tests-for-failover-slots.patch>



Thank you for the update. I got some rejects when applying 
0001-Allow-replication-slots-to-follow-failover.patch on top of the "Dirty 
replication slots when confirm_lsn is changed" changes. I think it should be 
rebased against master (this might be a consequence of the "logical slots 
follow the timeline" patch having been committed).

patch -p1 
<~/git/pg/patches/failover-slots/v6/0001-Allow-replication-slots-to-follow-failover.patch
patching file src/backend/access/rmgrdesc/Makefile
Hunk #1 FAILED at 10.
1 out of 1 hunk FAILED -- saving rejects to file 
src/backend/access/rmgrdesc/Makefile.rej
patching file src/backend/access/rmgrdesc/replslotdesc.c
patching file src/backend/access/transam/rmgr.c
Hunk #1 succeeded at 25 (offset 1 line).
patching file src/backend/access/transam/xlog.c
Hunk #1 succeeded at 6351 (offset 3 lines).
Hunk #2 succeeded at 8199 (offset 14 lines).
Hunk #3 succeeded at 8645 (offset 14 lines).
Hunk #4 succeeded at 8718 (offset 14 lines).
patching file src/backend/commands/dbcommands.c
patching file src/backend/replication/basebackup.c
patching file src/backend/replication/logical/decode.c
Hunk #1 FAILED at 143.
1 out of 1 hunk FAILED -- saving rejects to file 
src/backend/replication/logical/decode.c.rej
patching file src/backend/replication/logical/logical.c
patching file src/backend/replication/slot.c
patching file src/backend/replication/slotfuncs.c
patching file src/backend/replication/walsender.c
patching file src/bin/pg_xlogdump/replslotdesc.c
patching file src/bin/pg_xlogdump/rmgrdesc.c
Hunk #1 succeeded at 27 (offset 1 line).
patching file src/include/access/rmgrlist.h
Hunk #1 FAILED at 45.
1 out of 1 hunk FAILED -- saving rejects to file 
src/include/access/rmgrlist.h.rej
patching file src/include/replication/slot.h
patching file src/include/replication/slot_xlog.h
can't find file to patch at input line 1469
Perhaps you used the wrong -p or --strip option?



--
Oleksii



Re: [HACKERS] [PATCH v11] GSSAPI encryption support

2016-04-04 Thread Robbie Harwood
Michael Paquier  writes:

> On Sat, Apr 2, 2016 at 7:34 AM, Robbie Harwood  wrote:
>
>>   Since I still can't reproduce this locally (left a client machine and
>>   a process on the same machine retrying for over an hour on your test
>>   case and didn't see it), could you provide me with some more
>>   information on why repalloc is complaining?
>>   Is this a low memory situation where alloc might have failed?
>
> No, this is an assertion failure, and it seems that you are compiling
> this code without --enable-cassert, without the switch the code
> actually works.

You are right.  I now see the assertion failure.

>>   That pointer looks like it's on the heap, is that correct?
>
> appendBinaryStringInfo depends on palloc calls that allocate memory
> depending on the memory context used. It looks that what's just
> missing in your logic is a private memory context that be_gssapi_write
> and be_gssapi_read can use to handle the allocation of the
> communication buffers.

Thank you very much for the pointer!  I will work in memory context
management for the next version.




Re: [HACKERS] Timeline following for logical slots

2016-04-04 Thread Robert Haas
On Mon, Apr 4, 2016 at 10:59 AM, Craig Ringer  wrote:
> To allow logical rep and failover to be a reasonable substitute for physical
> rep and failover IMO *need*:
>
> * Robust sequence decoding and replication. If you were following the later
> parts of that discussion you will've seen how fun that's going to be, but
> it's the simplest of all of the problems.
>
> * Logical decoding and sending of in-progress xacts, so the logical client
> can already be most of the way through receiving a big xact when it commits.
> Without this we have a huge lag spike whenever a big xact happens, since we
> must first finish decoding it into a reorder buffer and can only then
> *begin* to send it to the client. During which time no later xacts may be
> decoded or replayed to the client. If you're running that rare thing, the
> perfect pure OLTP system, you won't care... but good luck finding one in the
> real world.
>
> * Either parallel apply on the client side or at least buffering of
> in-progress xacts on the client side so they can be safely flushed to disk
> and confirmed, allowing receive to continue while replay is done on the
> client. Otherwise sync rep is completely impractical... and there's no
> shortage of systems out there that can't afford to lose any transactions. Or
> at least have some crucial transactions they can't lose.
>
> * Robust, seamless DDL replication, so things don't just break randomly.
> This makes the other points above look nice and simple by comparison.
> Logical decoding of 2PC xacts with DDL would help here, as would the ability
> to transparently convert an xact into a prepare-xact on client commit and
> hold the client waiting while we replicate it, confirm the successful
> prepare on the replica, then commit prepared on the upstream.
>
> * oh, and some way to handle replication of shared catalog changes like
> pg_authid, so the above DDL replication doesn't just randomly break if it
> happens to refer to a global object that doesn't exist on the downstream.

In general, I think we'd be a lot better off if we got some kind of
logical replication into core first and then worked on lifting these
types of limitations afterwards.  If I had to pick an order in which
to do the things you list, I'd focus first on the one you list second:
being able to stream and begin applying transactions before they've
committed is a really big deal for large transactions, and lots of
people have some large transactions.  DDL replication is nice, but
realistically, there are a lot of people who simply don't change their
schema all that often, and who could (and might even prefer to) manage
that process in other ways - e.g. change nodes one by one while they
are off-line, then bring them on-line.

I don't necessarily disagree with your statement that we'd need all of
this stuff to make logical replication a substitute for physical
replication as far as failover is concerned.  But I don't think that's
the most important goal, and even to the extent that it is the goal, I
don't think we need to meet every need before we can consider
ourselves to have met some needs.  I don't think that we need every
physical replication feature plus some before logical replication can
start to be useful to PostgreSQL users generally.  We do, however,
need the functionality to be accessible to people who are using only
the PostgreSQL core distribution.  The thing that is going to get
people excited about making logical replication better is getting to a
point where they can use it at all - and that is not going to be true
as long as you can't use it without having to download something from
an external website.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] psql metaqueries with \gexec

2016-04-04 Thread Corey Huinker
On Mon, Apr 4, 2016 at 3:31 PM, Tom Lane  wrote:

> Corey Huinker  writes:
> > Patch attached. Changes are thus:
> > - rebased
> > - pset.gexec_flag unconditionally set to false at end of SendQuery
> > - wording of documentation describing execution order of results
> > - rebasing allowed for undoing the re-wrap of enumerated slash commands.
>
> I whacked this around some and committed it.  The main thing that was
>

Hooray!


> broken is that it didn't work nicely at all if you'd set FETCH_COUNT.
>

Mmm, yeah, I hadn't considered cursor fetches, but in the use cases (at
least the ones I can imagine for this) you wouldn't want fetches.


> I experimented with different approaches to that, and ultimately decided
> that the best answer is to disable use of ExecQueryUsingCursor for the
> \gexec master query.  We can still let it be used for the individual
> generated queries, though.
>

Fine by me.


>
> I didn't much like the regression test setup, either.  Tests that
> have to be at the end of their test files aren't very nice, unless
> you give them their very own test file, which checking ON_ERROR_STOP
> didn't seem worth.  To me it's far more important that the code
> respond to cancel_pressed (which, ahem, it wasn't) and we have no
> mechanism for testing that in a pg_regress script.  So I just dropped
> that aspect of it and put the test in a more logical place in the file.
>

I think it was Jim that added the ON_ERROR_STOP check. I wasn't sure how to
properly test that.

Thanks for finding (and fixing) the cancel_pressed issue.
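For the record, the committed feature executes each result value of the preceding query as a new SQL statement. A minimal sketch (`my_table` is a placeholder):

```sql
-- Build one CREATE INDEX statement per user column of my_table;
-- \gexec then sends each generated statement to the server in order.
SELECT format('CREATE INDEX ON my_table(%I)', attname)
  FROM pg_attribute
 WHERE attrelid = 'my_table'::regclass
   AND attnum > 0
 ORDER BY attnum
\gexec
```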


Re: [HACKERS] WIP: Covering + unique indexes.

2016-04-04 Thread Tom Lane
Peter Geoghegan  writes:
> On Mon, Apr 4, 2016 at 7:14 AM, Teodor Sigaev  wrote:
>> Are there any objectins on it? I'm planning to look closely today or
>> tommorrow and commit it.

> I object to committing the patch in that time frame. I'm looking at it again.

Since it's a rather complex patch, pushing it in advance of the reviewers
signing off on it doesn't seem like a great idea ...

regards, tom lane




Re: [HACKERS] psql metaqueries with \gexec

2016-04-04 Thread Tom Lane
Corey Huinker  writes:
> Patch attached. Changes are thus:
> - rebased
> - pset.gexec_flag unconditionally set to false at end of SendQuery
> - wording of documentation describing execution order of results
> - rebasing allowed for undoing the re-wrap of enumerated slash commands.

I whacked this around some and committed it.  The main thing that was
broken is that it didn't work nicely at all if you'd set FETCH_COUNT.
I experimented with different approaches to that, and ultimately decided
that the best answer is to disable use of ExecQueryUsingCursor for the
\gexec master query.  We can still let it be used for the individual
generated queries, though.

I didn't much like the regression test setup, either.  Tests that
have to be at the end of their test files aren't very nice, unless
you give them their very own test file, which checking ON_ERROR_STOP
didn't seem worth.  To me it's far more important that the code
respond to cancel_pressed (which, ahem, it wasn't) and we have no
mechanism for testing that in a pg_regress script.  So I just dropped
that aspect of it and put the test in a more logical place in the file.

regards, tom lane




Re: [HACKERS] WIP: Covering + unique indexes.

2016-04-04 Thread Peter Geoghegan
On Mon, Apr 4, 2016 at 7:14 AM, Teodor Sigaev  wrote:
> Are there any objectins on it? I'm planning to look closely today or
> tommorrow and commit it.

I object to committing the patch in that time frame. I'm looking at it again.


-- 
Peter Geoghegan




Re: [HACKERS] Choosing parallel_degree

2016-04-04 Thread Julien Rouhaud
On 04/04/2016 17:03, Julien Rouhaud wrote:
> On 04/04/2016 08:55, Amit Kapila wrote:
> 
> Thanks for the review!
> 
>> Few comments:
>> 1.
>> +  limited according to the 
>>
>> A. typo.
>>/gux-max-parallel-degree/guc-max-parallel-degree
>>/worker/workers
> 
> Oops, fixed.
> 

And I managed to not fix it, sorry :/ Thanks to Andreas, who warned me.


-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org
diff --git a/doc/src/sgml/ref/create_table.sgml 
b/doc/src/sgml/ref/create_table.sgml
index cd234db..0eab2be 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -909,6 +909,17 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | 
UNLOGGED ] TABLE [ IF NOT EXI

 

+parallel_degree (integer)
+
+ 
+ Sets the degree of parallelism for an individual relation.  The requested
+ number of workers will be limited by <xref linkend="guc-max-parallel-degree">.
+ 
+
+   
+
+   
 autovacuum_enabled, 
toast.autovacuum_enabled (boolean)
 
  
diff --git a/src/backend/access/common/reloptions.c 
b/src/backend/access/common/reloptions.c
index ea0755a..8e4aa80 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -26,6 +26,7 @@
 #include "commands/tablespace.h"
 #include "commands/view.h"
 #include "nodes/makefuncs.h"
+#include "postmaster/postmaster.h"
 #include "utils/array.h"
 #include "utils/attoptcache.h"
 #include "utils/builtins.h"
@@ -267,6 +268,15 @@ static relopt_int intRelOpts[] =
0, 0, 0
 #endif
},
+   {
+   {
+   "parallel_degree",
+   "Number of parallel processes per executor node wanted 
for this relation.",
+   RELOPT_KIND_HEAP,
+   AccessExclusiveLock
+   },
+   -1, 0, MAX_BACKENDS
+   },
 
/* list terminator */
{{NULL}}
@@ -1251,8 +1261,8 @@ fillRelOptions(void *rdopts, Size basesize,
 
 
 /*
- * Option parser for anything that uses StdRdOptions (i.e. fillfactor and
- * autovacuum)
+ * Option parser for anything that uses StdRdOptions (i.e. fillfactor,
+ * autovacuum and parallel_degree)
  */
 bytea *
 default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
@@ -1291,7 +1301,9 @@ default_reloptions(Datum reloptions, bool validate, 
relopt_kind kind)
{"autovacuum_analyze_scale_factor", RELOPT_TYPE_REAL,
offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, 
analyze_scale_factor)},
{"user_catalog_table", RELOPT_TYPE_BOOL,
-   offsetof(StdRdOptions, user_catalog_table)}
+   offsetof(StdRdOptions, user_catalog_table)},
+   {"parallel_degree", RELOPT_TYPE_INT,
+   offsetof(StdRdOptions, parallel_degree)}
};
 
	options = parseRelOptions(reloptions, validate, kind, &numoptions);
diff --git a/src/backend/optimizer/path/allpaths.c 
b/src/backend/optimizer/path/allpaths.c
index cc77ff9..6032b95 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -669,21 +669,31 @@ create_parallel_paths(PlannerInfo *root, RelOptInfo *rel)
 * just for this relation, but when combined with all of its 
inheritance siblings
 * it may well pay off.
 */
-   if (rel->pages < parallel_threshold && rel->reloptkind == 
RELOPT_BASEREL)
+   if (rel->pages < parallel_threshold && rel->rel_parallel_degree == -1 &&
+   rel->reloptkind == RELOPT_BASEREL)
return;
 
/*
-* Limit the degree of parallelism logarithmically based on the size of 
the
-* relation.  This probably needs to be a good deal more sophisticated, 
but we
-* need something here for now.
+* Use the table parallel_degree if specified, but don't go further than
+* max_parallel_degree
 */
-   while (rel->pages > parallel_threshold * 3 &&
-  parallel_degree < max_parallel_degree)
+   if (rel->rel_parallel_degree > 0)
+   parallel_degree = Min(rel->rel_parallel_degree, 
max_parallel_degree);
+   else
{
-   parallel_degree++;
-   parallel_threshold *= 3;
-   if (parallel_threshold >= PG_INT32_MAX / 3)
-   break;
+   /*
+* Limit the degree of parallelism logarithmically based on the 
size of the
+* relation.  This probably needs to be a good deal more 
sophisticated, but we
+* need something here for now.
+*/
+   while (rel->pages > parallel_threshold * 3 &&
+  parallel_degree < max_parallel_degree)
+   {
+   parallel_degree++;
+   parallel_threshold *= 3;
+   if (parallel_threshold >= PG_INT32_MAX / 3)
+   break;
+   }
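The sizing logic in the hunk above can be sketched as follows (assuming the planner's starting degree of 1 and the parallel_threshold of 1000 pages used by the surrounding code at the time; names here are illustrative, not the actual planner API):

```python
PG_INT32_MAX = 2**31 - 1

def planned_parallel_degree(pages, max_parallel_degree,
                            rel_parallel_degree=-1,
                            parallel_threshold=1000):
    # An explicit per-table parallel_degree wins, but is still capped
    # by the max_parallel_degree GUC.
    if rel_parallel_degree > 0:
        return min(rel_parallel_degree, max_parallel_degree)
    # Otherwise the degree grows logarithmically (base 3) with table size.
    degree = 1
    while pages > parallel_threshold * 3 and degree < max_parallel_degree:
        degree += 1
        parallel_threshold *= 3
        if parallel_threshold >= PG_INT32_MAX // 3:
            break
    return degree

print(planned_parallel_degree(10000, 8))  # → 3
```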
 

Re: [HACKERS] raw output from copy

2016-04-04 Thread Daniel Verite
Tom Lane wrote:

> >> Code that uses PQexecParams() binary "resultFormat", or the
> >> binary format of copy doesn't have that problem,  but most
> >> client-side drivers don't do that.
> 
> > And maybe they just can't realistically, because  getting result
> > format in binary is exposed as an all-or-nothing choice in libpq.
> 
> That's simply wrong.  Read the documentation for PQexecParams and
> friends: you can specify text or binary per-column.  It's COPY that
> has the only-one-column-format restriction, and RAW certainly isn't
> going to make that better.

About PQexecParams, I disagree: the parameter formats can be
specified independently, but not the result formats, which are either all
binary or all text.

Quoting the doc at
http://www.postgresql.org/docs/9.5/static/libpq-exec.html

PGresult *PQexecParams(PGconn *conn,
   const char *command,
   int nParams,
   const Oid *paramTypes,
   const char * const *paramValues,
   const int *paramLengths,
   const int *paramFormats,
   int resultFormat);
[...]

resultFormat:
Specify zero to obtain results in text format, or one to obtain results
in binary format. (There is not currently a provision to obtain different
result columns in different formats, although that is possible in the
underlying protocol.)


For the client-side drivers that I've looked at, like those used in PHP
or Perl, they just never use resultFormat=1.
I assume they consider having all values
in binary unworkable for them, which is reasonable.
Maybe if they had a per-column choice, they wouldn't
use it anyway, but at least it would be theirs to decide.
All this is only tangentially related to COPY RAW.
It's just that COPY RAW can be seen as an efficient alternative to
the single-column returning [SELECT bytea_column FROM...]
The drivers currently request this in text mode even though
it makes no sense in this particular case, and it gets measurably
annoying if the contents are big.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: [HACKERS] SSL indicator in psql prompt

2016-04-04 Thread Robert Haas
On Mon, Apr 4, 2016 at 12:07 PM, Shulgin, Oleksandr wrote:
> On Apr 4, 2016 17:54, "Robert Haas"  wrote:
>>
>> On Fri, Apr 1, 2016 at 10:15 AM, Tom Lane  wrote:
>> > Peter Eisentraut  writes:
>> >> I like how browsers show a little lock in the address bar depending on
>> >> whether SSL is in use.  This could be useful in psql as well.  Here is
>> >> a
>> >> prototype patch.
>> >> Comments?
>> >
>> > -1 on the hard-coded UTF8, even with the encoding check (which I don't
>> > think is terribly trustworthy).  How about defining it in a way that
>> > lets/makes the user provide the character(s) to print?
>>
>> I think you have been trolled.  Note the date.
>
> Are you trying to say that this feature is in your opinion useless?

Well, what I was trying to say is that I don't think the proposal was
100% serious.  However, I also don't think it's particularly useful.
I am not a big fan of cluttering up the psql command line with random
Unicode glyphs.  It's easy enough to find out whether you've got an
SSL connection if you want to, with \conninfo.  I don't think it needs
to be part of every prompt.

YMMV, of course.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PROPOSAL] Client Log Output Filtering

2016-04-04 Thread David Steele

On 4/4/16 12:36 PM, Tom Lane wrote:

David Steele  writes:

On 4/4/16 11:21 AM, Tom Lane wrote:

I had in mind a patch that simply added LOG_SERVER_ONLY as another define
and did whatever seemed appropriate documentation-wise.  I see no reason
to touch the places that are currently dealing with client communication
failures.



I still prefer to collapse them into a single value for the current
implementation.


Right, that's what I had in mind, sorry if I was unclear.  I agree that
considering LOG_SERVER_ONLY as the main name and COMMERROR as an alias
is reasonable.


COMMERROR was not documented in sources.sgml, so LOG_SERVER_ONLY wasn't
either.  I'm happy to do that, though it's not clear to me where it
would go.  I could just put it in the general description.


Ah, I thought I remembered that the specific elevels were documented
there, but I see they're only documented in elog.h.  Doesn't seem like
it's incumbent on this patch to improve that.

Committed with a quick pgindent fixup.


Thank you, and I appreciate you suggesting this simple solution.

The advantage of this over the other possible solutions is that I can 
easily port the feature back to 9.5 by defining LOG_SERVER_ONLY when 
building pgaudit for that version.  That's a big win in my mind.


--
-David
da...@pgmasters.net




Re: [HACKERS] raw output from copy

2016-04-04 Thread David G. Johnston
On Fri, Apr 1, 2016 at 8:42 AM, Daniel Verite wrote:

> Andrew Dunstan wrote:
>
> > If someone can make a good case that this is going to be of
> > general use I'll happily go along, but I haven't seen one so far.
>
> About COPY FROM with a raw format, for instance just yesterday
> there was this user question on stackoverflow:
> http://stackoverflow.com/questions/36317237
>
> which essentially is: how to import contents from a file without any
> particular interpretation of any character?
>
> With the patch discussed in this thread, a user can do
> \copy table(textcol) from /path/to/file (format raw)
>

What is needed to solve this specific use-case is a way to specify "QUOTE
NONE" instead of the default for whatever format is being hijacked:

COPY file_content FROM '/tmp/textfile.txt' WITH (FORMAT csv, QUOTE E'');

becomes

COPY file_content FROM '/tmp/textfile.txt' WITH (FORMAT csv, QUOTE NONE);

Or maybe: "WITH (FORMAT single_column)"

Though maybe that doesn't extend well to unencoded binary data...which
seems like it can be considered a separate problem from reliably importing
an entire file into a single row+column in a table.

David J.


Re: [HACKERS] raw output from copy

2016-04-04 Thread Tom Lane
"Daniel Verite"  writes:
> One reason of adding the format to COPY is that it's where users
> are looking for it. It's the canonical way of importing contents
> from files so that's where it makes more sense.

I'm not sure I buy that argument, because it could be used to justify
adding absolutely any ETL functionality to COPY.  And we don't want
to go down that path; the design intention for COPY is that it be as
simple and fast as possible.

>> And I am still waiting for a non-psql use case. But I don't expect to 
>> see one, precisely because most clients have no difficulty at all in 
>> handling binary data.

> You mean small or medium-size binary data. The 512MB-1GB range is
> impossible to handle if requested in text format, which is what drivers
> tend to use. Even pg_dump fails on these contents.

... which is COPY.  I do not see that RAW mode is going to help much
here: it's not going to be noticeably better than COPY BINARY in terms
of maximum field width.

>> Code that uses PQexecParams() binary "resultFormat", or the
>> binary format of copy doesn't have that problem,  but most
>> client-side drivers don't do that.

> And maybe they just can't realistically, because  getting result
> format in binary is exposed as an all-or-nothing choice in libpq.

That's simply wrong.  Read the documentation for PQexecParams and
friends: you can specify text or binary per-column.  It's COPY that
has the only-one-column-format restriction, and RAW certainly isn't
going to make that better.


I'm not quite as convinced as Andrew that RAW mode is unnecessary,
but I don't find these arguments for it to be very compelling.

The real issue to my mind is that it doesn't seem like we can shoehorn
a sanely-defined version of RAW into the existing protocol spec without
creating compatibility hazards.  So we can either wait for the mythical
protocol v4 (but even a protocol update wouldn't fix the application-level
hazards) or we can treat it as a problem to be solved client-side.

regards, tom lane




Re: [HACKERS] [BUGS] Breakage with VACUUM ANALYSE + partitions

2016-04-04 Thread Robert Haas
On Sat, Mar 26, 2016 at 8:39 AM, Andres Freund  wrote:
> On 2016-03-25 12:02:05 -0400, Robert Haas wrote:
>> Gosh, that's surprising.  I wonder if that just revealed an underlying
>> issue rather than creating it.
>
> I think that's the case; it's just somewhat unlikely to hit in other
> cases.
>
> If SMgrRelation->md_fd[n] is an empty relation, and mdread() or another
> routine is asking for a block in the second segment - which will error
> out. But even if the first segment is zero bytes, _mdfd_getseg() will
> dutifully try to open the second segment. Which will succeed in the case
> of a truncated relation, because we leave the truncated segment in
> place.
>
> ISTM that _mdfd_getseg better check the length of the last segment
> before opening the next one?

Well, I agree that it's pretty strange that _mdfd_getseg() makes no
such check, but I still don't think I understand what's going on here.
Backends shouldn't be requesting nonexistent blocks from a relation -
higher-level safeguards, like holding AccessExclusiveLock before
trying to complete a DROP or TRUNCATE - are supposed to prevent that.
If this patch is causing us to hold onto smgr references to a relation
on which we no longer hold locks, I think that's irretrievably broken
and should be reverted.  I really doubt this will be the only thing
that goes wrong if you do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] raw output from copy

2016-04-04 Thread Daniel Verite
Andrew Dunstan wrote:

> Inserting the whole contents of a text file unchanged is insanely easy 
> in psql.
> 
>  \set file `cat /path/to/file`
>  insert into mytable(contents) values(:'file');

That's assuming psql but the asker of that question never mentioned
using psql. The COPY invocation could be inside a function. Even if
that particular user would be fine with a psql-only option, the next
one might not. Or they might want to import a binary file, and
as you mention, currently there's no equivalent of the :'var'
feature for binary.

But there's another aspect to this that's worth of consideration,
and that this forum question illustrates.
One reason of adding the format to COPY is that it's where users
are looking for it. It's the canonical way of importing contents
from files so that's where it makes more sense.
From the POV of being user-friendly and consistent, what sense does it
make to restrict what COPY can do just because psql could do it, in a
completely different way, for a user psql-savvy enough to know about it?

> And I am still waiting for a non-psql use case. But I don't expect to 
> see one, precisely because most clients have no difficulty at all in 
> handling binary data.

You mean small or medium-size binary data. The 512MB-1GB range is
impossible to handle if requested in text format, which is what drivers
tend to use. Even pg_dump fails on these contents.
Maybe it was unimportant when bytea was added ~15 years ago,
but the size of data that people actually put into bytea columns is
growing, following Moore's law like the rest.

Even in the lower size range, considering the amount of memory allocated,
the time spent converting to hex, and the sending of twice the number
of bytes on the wire just to do the reverse conversion in the client
as soon as all the data is obtained: it works, but it's pointless
and inefficient.

Code that uses PQexecParams() binary "resultFormat", or the
binary format of copy doesn't have that problem,  but most
client-side drivers don't do that.

And maybe they just can't realistically, because  getting result
format in binary is exposed as an all-or-nothing choice in libpq.

I mean if client code does SELECT * FROM table or even COPY of the
same, and what comes back is bytea and e.g. timestamps and floats and
custom types, the client-side driver may wish to have the bytea field in
binary format for efficiency and the rest in text format for
usability, but that's not possible with PQexecParams(), or other
libpq functions.

The point of mixing binary and text is outside the scope of a RAW
format for COPY, as obviously it wouldn't help with that in any way,
but on the argument that the status quo is fine because clients
have no difficulty, that's just not true. Clients cope with what they have,
but what they have is far from being complete or optimal.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: [HACKERS] [PROPOSAL] Client Log Output Filtering

2016-04-04 Thread Tom Lane
David Steele  writes:
> On 4/4/16 11:21 AM, Tom Lane wrote:
>> I had in mind a patch that simply added LOG_SERVER_ONLY as another define
>> and did whatever seemed appropriate documentation-wise.  I see no reason
>> to touch the places that are currently dealing with client communication
>> failures.

> I still prefer to collapse them into a single value for the current 
> implementation.

Right, that's what I had in mind, sorry if I was unclear.  I agree that
considering LOG_SERVER_ONLY as the main name and COMMERROR as an alias
is reasonable.

> COMMERROR was not documented in sources.sgml so LOG_SERVER_ONLY wasn't 
> either.  I'm happy to do that that though it's not clear to me where it 
> would go.  I could just put it in the general description.

Ah, I thought I remembered that the specific elevels were documented
there, but I see they're only documented in elog.h.  Doesn't seem like
it's incumbent on this patch to improve that.

Committed with a quick pgindent fixup.

regards, tom lane




[HACKERS] Yet another small patch - reorderbuffer.c:1099

2016-04-04 Thread Aleksander Alekseev
Hello

There is a weird piece of code in reorderbuffer.c:

```
/* delete from list of known subxacts */
if (txn->is_known_as_subxact)
{
/* NB: nsubxacts count of parent will be too high now */
dlist_delete(&txn->node);
}
/* delete from LSN ordered list of toplevel TXNs */
else 
{
dlist_delete(&txn->node);
}
```

As you can see, the `if` and `else` branches are exactly the same. I wonder
whether this is a bug or the code just requires a bit of cleaning. In the
latter case, here is a patch.

According to `git log` both branches were introduced in one commit
b89e1510. I added author and committer of this code to CC since they
have much better understanding of it than I do.

-- 
Best regards,
Aleksander Alekseev
http://eax.me/
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 9d78c8c..454b309 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -1097,16 +1097,7 @@ ReorderBufferCleanupTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	}
 
 	/* delete from list of known subxacts */
-	if (txn->is_known_as_subxact)
-	{
-		/* NB: nsubxacts count of parent will be too high now */
-		dlist_delete(&txn->node);
-	}
-	/* delete from LSN ordered list of toplevel TXNs */
-	else
-	{
-		dlist_delete(&txn->node);
-	}
+	dlist_delete(&txn->node);
 
 	/* now remove reference from buffer */
 	hash_search(rb->by_txn,



Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2016-04-04 Thread Robert Haas
On Wed, Mar 30, 2016 at 2:22 AM, Thomas Munro
 wrote:
> On Wed, Mar 30, 2016 at 2:36 PM, Robert Haas  wrote:
>> OK, I committed this, with a few tweaks.  In particular, I added a
>> flag variable instead of relying on "latch set" == "need to send
>> reply"; the other changes were cosmetic.
>>
>> I'm not sure how much more of this we can realistically get into 9.6;
>> the latter patches haven't had much review yet.  But I'll set this
>> back to Needs Review in the CommitFest and we'll see where we end up.
>> But even if we don't get anything more than this, it's still rather
>> nice: remote_apply turns out to be only slightly slower than remote
>> flush, and it's a guarantee that a lot of people are looking for.
>
> Thank you Michael and Robert!
>
> Please find attached the rest of the patch series, rebased against
> master.  The goal of the 0002 patch is to provide an accurate
> indication of the current replay lag on each standby, visible to users
> like this:
>
> postgres=# select application_name, replay_lag from pg_stat_replication;
>  application_name │   replay_lag
> ──┼─
>  replica1 │ 00:00:00.000299
>  replica2 │ 00:00:00.000323
>  replica3 │ 00:00:00.000319
>  replica4 │ 00:00:00.000303
> (4 rows)
>
> It works by maintaining a buffer of (end of WAL, time now) samples
> received from the primary, and then eventually feeding those times
> back to the primary when the recovery process replays the
> corresponding locations.
>
> Compared to approaches based on commit timestamps, this approach has
> the advantage of providing non-misleading information between commits.
> For example, if you run a batch load job that takes 1 minute to insert
> the whole phonebook and no other transactions run, you will see
> replay_lag updating regularly throughout that minute, whereas typical
> commit timestamp-only approaches will show an increasing lag time
> until a commit record is eventually applied.  Compared to simple LSN
> location comparisons, it reports in time rather than bytes of WAL,
> which can be more meaningful for DBAs.
>
> When the standby is entirely caught up and there is no write activity,
> the reported time effectively represents the ping time between the
> servers, and is updated every wal_sender_timeout / 2, when keepalive
> messages are sent.  While new WAL traffic is arriving, the walreceiver
> records timestamps at most once per second in a circular buffer, and
> then sends back replies containing the recorded timestamps as fast as
> the recovery process can apply the corresponding xlog.  The lag number
> you see is computed by the primary server comparing two timestamps
> generated by its own system clock, one of which has been on a journey
> to the standby and back.
>
> Accurate lag estimates are a prerequisite for the 0004 patch (about
> which more later), but I believe users would find this valuable as a
> feature on its own.

Well, one problem with this is that you can't put a loop inside of a
spinlock-protected critical section.

In general, I think this is a pretty reasonable way of attacking this
problem, but I'd say it's significantly under-commented.  Where should
someone go to get a general overview of this mechanism?  The answer is
not "at place XXX within the patch".  (I think it might merit some
more extensive documentation, too, although I'm not exactly sure what
that should look like.)

When you overflow the buffer, you could thin it out in a smarter way,
like by throwing away every other entry instead of the oldest one.  I
guess you'd need to be careful how you coded that, though, because
replaying an entry with a timestamp invalidates some of the saved
entries without formally throwing them out.

Conceivably, 0002 could be split into two patches, one of which
computes "stupid replay lag" considering only records that naturally
carry timestamps, and a second adding the circular buffer to handle
the case where much time passes without finding such a record.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] SSL indicator in psql prompt

2016-04-04 Thread Shulgin, Oleksandr
On Apr 4, 2016 17:54, "Robert Haas"  wrote:
>
> On Fri, Apr 1, 2016 at 10:15 AM, Tom Lane  wrote:
> > Peter Eisentraut  writes:
> >> I like how browsers show a little lock in the address bar depending on
> >> whether SSL is in use.  This could be useful in psql as well.  Here is
a
> >> prototype patch.
> >> Comments?
> >
> > -1 on the hard-coded UTF8, even with the encoding check (which I don't
> > think is terribly trustworthy).  How about defining it in a way that
> > lets/makes the user provide the character(s) to print?
>
> I think you have been trolled.  Note the date.

Are you trying to say that this feature is in your opinion useless?

Even if that's an April Fools patch, I don't think it is entirely out of
scope. :-)

--
Alex


Re: [HACKERS] [PROPOSAL] Client Log Output Filtering

2016-04-04 Thread David Steele

On 4/4/16 11:21 AM, Tom Lane wrote:

David Steele  writes:

On 3/29/16 12:58 PM, Tom Lane wrote:

...  Basically,
my point is that LOG_ONLY achieves 95% of the benefit for probably
0.01% of the work.



Attached is a patch that re-purposes COMMERROR as LOG_SERVER_ONLY.  I
went ahead and replaced all instances of COMMERROR with LOG_SERVER_ONLY.


Uh, what?  COMMERROR is a distinct concept in my opinion.  It might happen
to share the same implementation today, but that doesn't make it the
same thing.


Fair enough.


I had in mind a patch that simply added LOG_SERVER_ONLY as another define
and did whatever seemed appropriate documentation-wise.  I see no reason
to touch the places that are currently dealing with client communication
failures.


I still prefer to collapse them into a single value for the current 
implementation.  Otherwise there are several places that need to check 
for both in elog.c and their behavior is identical (for now).  For my 2c 
it makes more sense to collapse COMMERROR into LOG_SERVER_ONLY since 
that more accurately describes what it actually does in the elog code.


What do you think of the attached?

COMMERROR was not documented in sources.sgml so LOG_SERVER_ONLY wasn't 
either.  I'm happy to do that though it's not clear to me where it 
would go.  I could just put it in the general description.


Thanks,
--
-David
da...@pgmasters.net
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 8e00609..740f089 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -293,7 +293,7 @@ errstart(int elevel, const char *filename, int lineno,
 	output_to_server = is_log_level_output(elevel, log_min_messages);
 
 	/* Determine whether message is enabled for client output */
-	if (whereToSendOutput == DestRemote && elevel != COMMERROR)
+	if (whereToSendOutput == DestRemote && elevel != LOG_SERVER_ONLY)
 	{
 		/*
 		 * client_min_messages is honored only after we complete the
@@ -2086,7 +2086,7 @@ write_eventlog(int level, const char *line, int len)
 		case DEBUG2:
 		case DEBUG1:
 		case LOG:
-		case COMMERROR:
+		case LOG_SERVER_ONLY:
 		case INFO:
 		case NOTICE:
 			eventlevel = EVENTLOG_INFORMATION_TYPE;
@@ -2965,7 +2965,7 @@ send_message_to_server_log(ErrorData *edata)
 			syslog_level = LOG_DEBUG;
 			break;
 		case LOG:
-		case COMMERROR:
+		case LOG_SERVER_ONLY:
 		case INFO:
 			syslog_level = LOG_INFO;
 			break;
@@ -3595,7 +3595,7 @@ error_severity(int elevel)
 			prefix = _("DEBUG");
 			break;
 		case LOG:
-		case COMMERROR:
+		case LOG_SERVER_ONLY:
 			prefix = _("LOG");
 			break;
 		case INFO:
@@ -3699,7 +3699,7 @@ write_stderr(const char *fmt,...)
 static bool
 is_log_level_output(int elevel, int log_min_level)
 {
-	if (elevel == LOG || elevel == COMMERROR)
+	if (elevel == LOG || elevel == LOG_SERVER_ONLY)
 	{
 		if (log_min_level == LOG || log_min_level <= ERROR)
 			return true;
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 901651f..1d7fcca 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -25,9 +25,11 @@
 #define DEBUG1		14			/* used by GUC debug_* variables */
 #define LOG			15			/* Server operational messages; sent only to
 								 * server log by default. */
-#define COMMERROR	16			/* Client communication problems; same as LOG
-								 * for server reporting, but never sent to
-								 * client. */
+#define LOG_SERVER_ONLY	16		/* Same as LOG for server reporting, but never
+								 * sent to client. */
+#define COMMERROR	LOG_SERVER_ONLY	/* Client communication problems; same as
+								 * LOG for server reporting, but never sent
+								 * to client. */
 #define INFO		17			/* Messages specifically requested by user (eg
 								 * VACUUM VERBOSE output); always sent to
 								 * client regardless of

Re: [HACKERS] SSL indicator in psql prompt

2016-04-04 Thread Robert Haas
On Fri, Apr 1, 2016 at 10:15 AM, Tom Lane  wrote:
> Peter Eisentraut  writes:
>> I like how browsers show a little lock in the address bar depending on
>> whether SSL is in use.  This could be useful in psql as well.  Here is a
>> prototype patch.
>> Comments?
>
> -1 on the hard-coded UTF8, even with the encoding check (which I don't
> think is terribly trustworthy).  How about defining it in a way that
> lets/makes the user provide the character(s) to print?

I think you have been trolled.  Note the date.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-04-04 Thread Jesper Pedersen

On 04/01/2016 04:39 PM, Andres Freund wrote:

On April 1, 2016 10:25:51 PM GMT+02:00, Jesper Pedersen 
 wrote:

Hi,

On 03/31/2016 06:21 PM, Andres Freund wrote:

On March 31, 2016 11:13:46 PM GMT+02:00, Jesper Pedersen

 wrote:



I can do a USE_CONTENT_LOCK run on 0003 if it is something for 9.6.


Yes please. I think the lock variant is realistic, the lockless one isn't.




I have done a run with -M prepared on unlogged running 10min per data
point, up to 300 connections. Using data + wal on HDD.

I'm not seeing a difference between with and without USE_CONTENT_LOCK
--
all points are within +/- 0.5%.

Let me know if there are other tests I can perform


How do either compare to just 0002 applied?



0001 + 0002 compared to 0001 + 0002 + 0003 (either way) were pretty much 
the same +/- 0.5% on the HDD run.


Best regards,
 Jesper





Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.

2016-04-04 Thread Tom Lane
Amit Langote  writes:
> On Mon, Apr 4, 2016 at 11:24 PM, Tom Lane  wrote:
>> A related issue, now that I've seen this example, is that altering
>> FDW-level or server-level options won't cause a replan either.  I'm
>> not sure there's any very good fix for that.  Surely we don't want
>> to try to identify all tables belonging to the FDW or server and
>> issue relcache invals on all of them.

> Hm, some kind of PlanInvalItem-based solution could work maybe?

Hm, so we'd expect that whenever an FDW consulted the options while
making a plan, it'd have to record a plan dependency on them?  That
would be a clean fix maybe, but I'm worried that third-party FDWs
would fail to get the word about needing to do this.

regards, tom lane




Re: [HACKERS] [PROPOSAL] Client Log Output Filtering

2016-04-04 Thread Tom Lane
David Steele  writes:
> On 3/29/16 12:58 PM, Tom Lane wrote:
>> ...  Basically,
>> my point is that LOG_ONLY achieves 95% of the benefit for probably
>> 0.01% of the work.

> Attached is a patch that re-purposes COMMERROR as LOG_SERVER_ONLY.  I 
> went ahead and replaced all instances of COMMERROR with LOG_SERVER_ONLY.

Uh, what?  COMMERROR is a distinct concept in my opinion.  It might happen
to share the same implementation today, but that doesn't make it the
same thing.

I had in mind a patch that simply added LOG_SERVER_ONLY as another define
and did whatever seemed appropriate documentation-wise.  I see no reason
to touch the places that are currently dealing with client communication
failures.

regards, tom lane




Re: [HACKERS] Tiny patch: sigmask.diff

2016-04-04 Thread Aleksander Alekseev
> Surely this fix is completely wrong?  You'd have to touch every use of
> signum() to do it like that.  You'd also be introducing similarly-
> undefined behavior at the other end of the loop, where this coding
> would be asking to compute 1<<31, hence shifting into the sign bit,
> which is undefined unless you make the computation explicitly
> unsigned.

Oh, I didn't think about that...

> I think better is just to change the for-loop to iterate from 1 not 0.
> Signal 0 is invalid anyway, and is rejected in pg_queue_signal for
> example, so there can't be any event waiting there.

Agreed.

-- 
Best regards,
Aleksander Alekseev
http://eax.me/
diff --git a/src/backend/port/win32/signal.c b/src/backend/port/win32/signal.c
index 36c6ebd..0ba2817 100644
--- a/src/backend/port/win32/signal.c
+++ b/src/backend/port/win32/signal.c
@@ -113,7 +113,7 @@ pgwin32_dispatch_queued_signals(void)
 		/* One or more unblocked signals queued for execution */
 		int			exec_mask = UNBLOCKED_SIGNAL_QUEUE();
 
-		for (i = 0; i < PG_SIGNAL_COUNT; i++)
+		for (i = 1; i < PG_SIGNAL_COUNT; i++)
 		{
 			if (exec_mask & sigmask(i))
 			{



Re: [HACKERS] [PROPOSAL] Client Log Output Filtering

2016-04-04 Thread David Steele

Hi Tom,

On 3/29/16 12:58 PM, Tom Lane wrote:


Oh, I agree that there's a compelling use-case for LOG |
ERR_HIDE_FROM_CLIENT.  I'm less certain that there's a use-case
for supporting such a flag across all elevels that is strong enough
to justify all the work we'd have to do to make it happen.  Basically,
my point is that LOG_ONLY achieves 95% of the benefit for probably
0.01% of the work.


Attached is a patch that re-purposes COMMERROR as LOG_SERVER_ONLY.  I 
went ahead and replaced all instances of COMMERROR with LOG_SERVER_ONLY.


I left the COMMERROR #define but it is no longer used by any server, 
client, or included contrib code (I also noted that it was DEPRECATED in 
the comment).  I'll leave it up to the committer to remove if deemed 
appropriate.


I realize there's no agreement on how this should work ideally, but this 
patch answers the current need without introducing anything new and 
shouldn't cause regressions.  It does address the confusion that would arise 
from using COMMERROR in ereports that clearly are not communication errors.


Thanks,
--
-David
da...@pgmasters.net
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index 2751183..db6da9f 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -635,7 +635,7 @@ recv_password_packet(Port *port)
 	 * log.
 	 */
 	if (mtype != EOF)
-		ereport(COMMERROR,
+		ereport(LOG_SERVER_ONLY,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
 				 errmsg("expected password response, got message type %d",
 						mtype)));
@@ -663,7 +663,7 @@ recv_password_packet(Port *port)
 	 * StringInfo is guaranteed to have an appended '\0'.
 	 */
 	if (strlen(buf.data) + 1 != buf.len)
-		ereport(COMMERROR,
+		ereport(LOG_SERVER_ONLY,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
 				 errmsg("invalid password packet size")));
 
@@ -853,7 +853,7 @@ pg_GSS_recvauth(Port *port)
 		{
 			/* Only log error if client didn't disconnect. */
 			if (mtype != EOF)
-				ereport(COMMERROR,
+				ereport(LOG_SERVER_ONLY,
 						(errcode(ERRCODE_PROTOCOL_VIOLATION),
 						 errmsg("expected GSS response, got message type %d",
 								mtype)));
@@ -1092,7 +1092,7 @@ pg_SSPI_recvauth(Port *port)
 		{
 			/* Only log error if client didn't disconnect. */
 			if (mtype != EOF)
-				ereport(COMMERROR,
+				ereport(LOG_SERVER_ONLY,
 						(errcode(ERRCODE_PROTOCOL_VIOLATION),
 						 errmsg("expected SSPI response, got message type %d",
 								mtype)));
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 6009663..b8f7a07 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -359,7 +359,7 @@ be_tls_open_server(Port *port)
 
 	if (!(port->ssl = SSL_new(SSL_context)))
 	{
-		ereport(COMMERROR,
+		ereport(LOG_SERVER_ONLY,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
 				 errmsg("could not initialize SSL connection: %s",
 						SSLerrmessage())));
@@ -367,7 +367,7 @@ be_tls_open_server(Port *port)
 	}
 	if (!my_SSL_set_fd(port, port->sock))
 	{
-		ereport(COMMERROR,
+		ereport(LOG_SERVER_ONLY,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
 				 errmsg("could not set SSL socket: %s",
 						SSLerrmessage())));
@@ -401,27 +401,27 @@ aloop:
 			goto aloop;
 		case SSL_ERROR_SYSCALL:
 			if (r < 0)
-				ereport(COMMERROR,
+				ereport(LOG_SERVER_ONLY,
 						(errcode_for_socket_access(),
 						 errmsg("could not accept SSL connection: %m")));
 			else
-				ereport(COMMERROR,
+				ereport(LOG_SERVER_ONLY,


Re: [HACKERS] Timeline following for logical slots

2016-04-04 Thread Andres Freund
On 2016-04-04 22:59:41 +0800, Craig Ringer wrote:
> Assuming that here you mean separate slots on different machines
> replicating via physical rep:

No,  I don't.

> We don't currently allow the creation of a logical slot on a standby. Nor
> replay from it, even to advance it without receiving the decoded
> changes.

Yes. I know.


> > I think the right way to do this is to focus on failover for logical
> > rep, with separate slots. The whole idea of integrating this physical
> > rep imo makes this a *lot* more complex than necessary. Not all that
> > many people are going to want to physical rep and logical rep.

> If you're saying we should focus on failover between nodes that're
> themselves connected using logical replication rather than physical
> replication, I really have to strongly disagree.
> 
> TL;DR for book-length below: We're a long, long way from being able to
> deliver even vaguely decent logical rep based failover.

I don't buy that.


> * Robust sequence decoding and replication. If you were following the later
> parts of that discussion you will've seen how fun that's going to be, but
> it's the simplest of all of the problems.

Unconvinced. People used londiste and slony for years without that, and
it's not even remotely at the top of the list of problems with either.


> * Logical decoding and sending of in-progress xacts, so the logical client
> can already be most of the way through receiving a big xact when it
> commits. Without this we have a huge lag spike whenever a big xact happens,
> since we must first finish decoding it in to a reorder buffer and can only
> then *begin* to send it to the client. During which time no later xacts may
> be decoded or replayed to the client. If you're running that rare thing,
> the perfect pure OLTP system, you won't care... but good luck finding one
> in the real world.

So? If you're using logical rep, you always have that.


> * Either parallel apply on the client side or at least buffering of
> in-progress xacts on the client side so they can be safely flushed to disk
> and confirmed, allowing receive to continue while replay is done on the
> client. Otherwise sync rep is completely impractical... and there's no
> shortage of systems out there that can't afford to lose any transactions.
> Or at least have some crucial transactions they can't lose.

That's more or less trivial to implement. In an extension.


> * Robust, seamless DDL replication, so things don't just break randomly.
> This makes the other points above look nice and simple by comparison.

We're talking about a system which involves logical decoding. Whether
you have failover via physical rep or not, doesn't have anything to do
with this point.


> Physical rep *works*. Robustly. Reliably. With decent performance. It's
> proven. It supports sync rep. I'm confident telling people to use it.

Sure? And nothing prevents you from using it.


> I don't think there's any realistic way we're going to get there for
> logical rep in 9.6+n for n<2 unless a whole lot more people get on board
> and work on it. Even then.

I think the primary problem here is that you're focusing on things that
just are not very interesting for the majority of users, and which thus
won't get very enthusiastic help.  The way to make progress is to get
something basic in, and then iterate from there.  Instead you're
focussing on the fringes, which nobody cares about, because the basics
aren't there.

FWIW, I plan to aggressively work on in-core (9.7) logical rep starting
in a few weeks. If we can coordinate on that end, I'm very happy, if not
then not.

> Right now we can deliver logical failover for DBs that:
> (b) don't do DDL, ever, or only do some limited DDL via direct admin
> commands where they can call some kind of helper function to queue and
> apply the DDL;
> (c) don't do big transactions or don't care about unbounded lag;
> (d) don't need synchronous replication or don't care about possibly large
> delays before commit is confirmed;
> (e) only manage role creation (among other things) via very strict
> processes that can be absolutely guaranteed to run on all nodes

And just about nothing of that has to do with any of the recent patches
send towards logical rep. I agree about the problems, but you should
attack them, instead of building way too complicated workarounds which
barely anybody will find interesting.


> ... which in my view isn't a great many databases.

Meh^2. Slony and londiste have these problems. And the reason they're
not used isn't primarily those, but that they're external and waaayyy too
complicated to use.


> Physical rep has *none* of those problems. (Sure, it has others, but we're
> used to them).

So?


-Andres




[HACKERS] Patch: fix typo, duplicated word in indexam.sgml

2016-04-04 Thread Artur Zakirov

Hello,

There is a duplicated word in indexam.sgml. The patch is attached.

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index b36889b..69edeea 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -58,7 +58,7 @@
   
 
   
-   Index access access methods can be defined and dropped using
+   Index access methods can be defined and dropped using
 and
  SQL commands respectively.
   



Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.

2016-04-04 Thread Amit Langote
On Mon, Apr 4, 2016 at 11:24 PM, Tom Lane  wrote:
> Amit Langote  writes:
>> On 2016/04/04 15:17, Rajkumar Raghuwanshi wrote:
>>> * .Observation: *Prepare statement execution plan is not getting changed
>>> even after altering foreign table to point to new schema.
>
>> I wonder if performing relcache invalidation upon ATExecGenericOptions()
>> is the correct solution for this problem.  The attached patch fixes the
>> issue for me.
>
> A forced relcache inval will certainly fix it, but I'm not sure if that's
> the best (or only) place to put it.
>
> A related issue, now that I've seen this example, is that altering
> FDW-level or server-level options won't cause a replan either.  I'm
> not sure there's any very good fix for that.  Surely we don't want
> to try to identify all tables belonging to the FDW or server and
> issue relcache invals on all of them.

Hm, some kind of PlanInvalItem-based solution could work maybe? Or
some solution on lines of recent PlanCacheUserMappingCallback() added
by fbe5a3fb, wherein I see this comment which sounds like a similar
solution could be built for servers and FDWs:

+   /*
+* If the plan has pushed down foreign joins, those join may become
+* unsafe to push down because of user mapping changes. Invalidate only
+* the generic plan, since changes to user mapping do not invalidate the
+* parse tree.
+*/

Missing something am I?

Thanks,
Amit




Re: [HACKERS] Choosing parallel_degree

2016-04-04 Thread Julien Rouhaud
On 04/04/2016 08:55, Amit Kapila wrote:

Thanks for the review!

> Few comments:
> 1.
> +  limited according to the 
> 
> A. typo.
>/gux-max-parallel-degree/guc-max-parallel-degree
>/worker/workers

Oops, fixed.

> B. + 
> +  Number of workers wanted for this table. The number of worker will be
> +  limited according to 
> the 
> +  parameter.
> + 
> 
> How about writing the above as:
> Sets the degree of parallelism for an individual relation.  The
> requested number of workers will be limited by  linkend="guc-max-parallel-degree">
> 

That's clearly better, changed.

> 2.
> +{
> +{
> +"parallel_degree",
> +"Number of parallel processes 
> per executor node wanted for this relation.",
> +RELOPT_KIND_HEAP,
> +
> AccessExclusiveLock
> +},
> +-1, 1, INT_MAX
> +},
> 
> I think here min and max values should be same as for
> max_parallel_degree (I have verified that for some of the other
> reloption parameters, min and max are same as their guc values); Is
> there a reason to keep them different?
> 

No reason. I put 0 and MAX_BACKENDS, as the GUC value.

> 3.
> @@ -1291,7 +1300,9 @@ default_reloptions(Datum reloptions, bool
> validate, relopt_kind kind)
> 
> Comment on top of this function says:
> /*
>  * Option parser for anything that uses StdRdOptions (i.e. fillfactor and
>  * autovacuum)
>  */
> 
> I think it is better to include parallel_degree in above comment along
> with fillfactor and autovacuum.
> 

Agreed. BTW the user_catalog_table option isn't listed either.

> 
> 4.
> /*
> + * RelationGetMaxParallelDegree
> + *Returns the relation's parallel_degree.  Note multiple eval of 
> argument!
> + */
> +#define RelationGetParallelDegree(relation, defaultmpd) \
> +((relation)->rd_options ? \
> + 
> ((StdRdOptions *) (relation)->rd_options)->parallel_degree : (defaultmpd))
> +
> 
> There are minor in-consistencies in the above macro definition.
> 
> a. RelationGetMaxParallelDegree - This should be RelationGetParallelDegree.
> b. defaultmpd - it is better to name it as defaultpd
> 

Yes, I forgot to update it when I renamed the option, fixed.

>  
>>
>>
>> The feature freeze is now very close.  If this GUC is still wanted,
>> should I add this patch to the next commitfest?
>>
> 
> I am hoping that this will be committed to 9.6, but I think it is good
> to register it in next CF.
> 

So attached v6 fixes all the problems above.

I'll add it to the next commitfest.

> 
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com 

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index cd234db..175b2c6 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -909,6 +909,17 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI

 

+parallel_degree (integer)
+
+ 
+ Sets the degree of parallelism for an individual relation.  The requested
+ number of workers will be limited by <xref linkend="guc-max-parallel-degree">.
+ 
+
+   
+
+   
 autovacuum_enabled, 
toast.autovacuum_enabled (boolean)
 
  
diff --git a/src/backend/access/common/reloptions.c 
b/src/backend/access/common/reloptions.c
index ea0755a..8e4aa80 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -26,6 +26,7 @@
 #include "commands/tablespace.h"
 #include "commands/view.h"
 #include "nodes/makefuncs.h"
+#include "postmaster/postmaster.h"
 #include "utils/array.h"
 #include "utils/attoptcache.h"
 #include "utils/builtins.h"
@@ -267,6 +268,15 @@ static relopt_int intRelOpts[] =
0, 0, 0
 #endif
},
+   {
+   {
+   "parallel_degree",
+   "Number of parallel processes per executor node wanted 
for this relation.",
+   RELOPT_KIND_HEAP,
+   AccessExclusiveLock
+   },
+   -1, 0, MAX_BACKENDS
+   },
 
/* list terminator */
{{NULL}}
@@ -1251,8 +1261,8 @@ fillRelOptions(void *rdopts, Size basesize,
 
 
 /*
- * Option parser for anything that uses StdRdOptions (i.e. fillfactor and
- * autovacuum)
+ * Option parser for anything that uses StdRdOptions (i.e. fillfactor,
+ * autovacuum and parallel_degree)
  */
 bytea *
 default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
@@ -1291,7 +1301,9 @@ default_reloptions(Datum reloptions, bool validate, 
relopt_kind kind)
{"autovacuum_analyze_scale_factor", RELOPT_TYPE_REAL,
offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, 
analyze_scale_factor)},
{"user_catalog_table", RELOPT_TYPE_BOOL,
-   offsetof(StdRdOptions, user_catalog_table)}
+   offsetof(StdRdOptions, user_catalog_table)},
+   {"parallel_degree", RELOPT_TYPE_INT,
+   offsetof(StdRdOptions, 

Re: [HACKERS] Timeline following for logical slots

2016-04-04 Thread Craig Ringer
On 4 April 2016 at 18:01, Andres Freund  wrote:


> > The only way I can think of to do that really reliably right now, without
> > full failover slots, is to use the newly committed pluggable WAL
> mechanism
> > and add a hook to SaveSlotToPath() so slot info can be captured, injected
> > in WAL, and replayed on the replica.
>
> I personally think the primary answer is to use separate slots on
> different machines. Failover slots can be an extension to that at some
> point, but I think they're a secondary goal.
>

Assuming that here you mean separate slots on different machines
replicating via physical rep:

We don't currently allow the creation of a logical slot on a standby. Nor
replay from it, even to advance it without receiving the decoded changes.
Both would be required for that to work, as well as extensions to the hot
standby feedback mechanism to allow a standby to ask the master to pin its
catalog_xmin if slots on the standby were further behind than that of the
master.

I was chatting about that with Petr earlier. What we came up with was to
require the standby to connect to the master using a replication slot that,
while remaining a physical replication slot, has a catalog_xmin set and
updated by the replica using extended standby progress messages. The slot's
catalog_xmin the replica pushed up to the master would simply be the
min(catalog_xmin) of all slots on the replica,
i.e. procArray->replication_slot_catalog_xmin . Same with the slot xmin, if
defined for any slot on the replica.

That makes sure that the catalog_xmin required for the standby's slots is
preserved even if the standby isn't currently replaying from the master.

Handily this approach would give us cascading, support for intermediate
servers, and the option of only having failover slots on some replicas not
others. All things that were raised as concerns with failover slots.

However, clients would then have to know about the replica(s) of the master
that were failover candidates and would have to send feedback to advance
the client's slots on those nodes, not just the master. They'd have to be
able to connect to the replicas too. Unless we added some mechanism for the
master to lazily relay those feedback messages to replicas, anyway. Not a
major roadblock, just a bit fiddlier for clients.

Consistency shouldn't be a problem so long as the slot created on the
replica reaches SNAPBUILD_CONSISTENT (or there's enough pending WAL for it
to do so) before failover is required.

I think it'd be a somewhat reasonable alternative to failover slots and
it'd make it much more practical to decode from a replica. Which would be
great. It'd be fiddlier for clients, but probably worth it to get rid of
the limitations failover slots impose.



> > It'd also be necessary to move
> > CheckPointReplicationSlots() out of CheckPointGuts()  to the start of a
> > checkpoint/restartpoint when WAL writing is still permitted, like the
> > failover slots patch does.
>
> Ugh. That makes me rather wary.
>

Your comments say it's called in CheckPointGuts for convenience... and
really there doesn't seem to be anything that makes a slot checkpoint
especially tied to a "real" checkpoint.


> > Basically, failover slots as a plugin using a hook, without the
> > additions to base backup commands and the backup label.
>
> I'm going to be *VERY* hard to convince that adding a hook inside
> checkpointing code is acceptable.
>

Yeah... it's in ReplicationSlotSave, but it's still a slot checkpoint even
if (per above) it ceases to also be in the middle of a full system
checkpoint.


> > I'd really hate 9.6 to go out with - still - no way to use logical
> decoding
> > in a basic, bog-standard HA/failover environment. It overwhelmingly
> limits
> > their utility and it's becoming a major drag on practical use of the
> > feature. That's a difficulty given that the failover slots patch isn't
> > especially trivial and you've shown that lazy sync of slot state is not
> > sufficient.
>
> I think the right way to do this is to focus on failover for logical
> rep, with separate slots. The whole idea of integrating this physical
> rep imo makes this a *lot* more complex than necessary. Not all that
> many people are going to want to physical rep and logical rep.
>

If you're saying we should focus on failover between nodes that're
themselves connected using logical replication rather than physical
replication, I really have to strongly disagree.

TL;DR for book-length below: We're a long, long way from being able to
deliver even vaguely decent logical rep based failover. Without that or
support for logical decoding to survive physical failover we've got a great
tool in logical decoding that can't be used effectively with most
real-world systems.


I originally thought logical rep based failover was the way forward too and
that mixing physical and logical rep didn't make sense.

The problem is that we're a very, very long way from there, whereas we can

Re: [HACKERS] Tiny patch: sigmask.diff

2016-04-04 Thread Tom Lane
Aleksander Alekseev  writes:
> sigmask macro is defined in win32.h like this:
> #define sigmask(sig) ( 1 << ((sig)-1) )

> And used in signal.c in this fashion:
> for (i = 0; i < PG_SIGNAL_COUNT; i++)
> if (exec_mask & sigmask(i))

> Thus during first iteration we are doing `<< -1`. I think it's a bug.

Agreed.

> Patch attached.

Surely this fix is completely wrong?  You'd have to touch every use of
signum() to do it like that.  You'd also be introducing similarly-
undefined behavior at the other end of the loop, where this coding
would be asking to compute 1<<31, hence shifting into the sign bit,
which is undefined unless you make the computation explicitly unsigned.

I think better is just to change the for-loop to iterate from 1 not 0.
Signal 0 is invalid anyway, and is rejected in pg_queue_signal for
example, so there can't be any event waiting there.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.

2016-04-04 Thread Tom Lane
Amit Langote  writes:
> On 2016/04/04 15:17, Rajkumar Raghuwanshi wrote:
>> * .Observation: *Prepare statement execution plan is not getting changed
>> even after altering foreign table to point to new schema.

> I wonder if performing relcache invalidation upon ATExecGenericOptions()
> is the correct solution for this problem.  The attached patch fixes the
> issue for me.

A forced relcache inval will certainly fix it, but I'm not sure if that's
the best (or only) place to put it.

A related issue, now that I've seen this example, is that altering
FDW-level or server-level options won't cause a replan either.  I'm
not sure there's any very good fix for that.  Surely we don't want
to try to identify all tables belonging to the FDW or server and
issue relcache invals on all of them.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Covering + unique indexes.

2016-04-04 Thread Teodor Sigaev

It seems to me that the patch is completed.
Except, maybe, grammar check of comments and documentation.

Looking forward to your review.
Are there any objections to it? I'm planning to look closely today or tomorrow 
and commit it.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Choosing parallel_degree

2016-04-04 Thread Corey Huinker
On Mon, Apr 4, 2016 at 2:55 AM, Amit Kapila  wrote:

> On Sun, Apr 3, 2016 at 4:37 PM, Julien Rouhaud 
> wrote:
> >
> > On 22/03/2016 07:58, Julien Rouhaud wrote:
> > > On 21/03/2016 20:38, Julien Rouhaud wrote:
> > >> On 21/03/2016 05:18, James Sewell wrote:
> > >>> OK cool, thanks.
> > >>>
> > >>> Can we remove the minimum size limit when the per table degree
> setting
> > >>> is applied?
> > >>>
> > >>> This would help for tables with 2  - 1000 pages combined with a high
> CPU
> > >>> cost aggregate.
> > >>>
> > >>
> > >> Attached v4 implements that. It also makes sure that the chosen
> > >> parallel_degree won't be more than the relation's estimated number of
> pages.
> > >>
> > >
> > > And I just realize that it'd prevent from forcing parallelism on
> > > partitionned table, v5 attached removes the check on the estimated
> > > number of pages.
> > >
> > >
>
> Few comments:
> 1.
> +  limited according to the 
>
> A. typo.
>/gux-max-parallel-degree/guc-max-parallel-degree
>/worker/workers
> B. + 
> +  Number of workers wanted for this table. The number of worker will
> be
> +  limited according to
> the 
> +  parameter.
> + 
>
> How about writing the above as:
> Sets the degree of parallelism for an individual relation.  The requested
> number of workers will be limited by <xref linkend="guc-max-parallel-degree">
>
> 2.
> + {
> + {
> + "parallel_degree",
> + "Number of parallel processes
> per executor node wanted for this relation.",
> + RELOPT_KIND_HEAP,
> +
> AccessExclusiveLock
> + },
> + -1, 1, INT_MAX
> + },
>
> I think here min and max values should be same as for max_parallel_degree
> (I have verified that for some of the other reloption parameters, min and
> max are same as their guc values); Is there a reason to keep them different?
>
> 3.
> @@ -1291,7 +1300,9 @@ default_reloptions(Datum reloptions, bool validate,
> relopt_kind kind)
>
> Comment on top of this function says:
> /*
>  * Option parser for anything that uses StdRdOptions (i.e. fillfactor and
>  * autovacuum)
>  */
>
> I think it is better to include parallel_degree in above comment along
> with fillfactor and autovacuum.
>
>
> 4.
> /*
> + * RelationGetMaxParallelDegree
> + * Returns the relation's parallel_degree.  Note multiple eval of
> argument!
> + */
> +#define RelationGetParallelDegree(relation, defaultmpd) \
> + ((relation)->rd_options ? \
> +
> ((StdRdOptions *) (relation)->rd_options)->parallel_degree : (defaultmpd))
> +
>
> There are minor inconsistencies in the above macro definition.
>
> a. RelationGetMaxParallelDegree - This should be RelationGetParallelDegree.
> b. defaultmpd - it is better to name it as defaultpd
>
>
> >
> >
> > The feature freeze is now very close.  If this GUC is still wanted,
> > should I add this patch to the next commitfest?
> >
>
> I am hoping that this will be committed to 9.6, but I think it is good to
> register it in next CF.
>
>
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>


I'm late to the party on this thread, and most of the discussion seems to
be about setting parallel levels based on tables, which I think is wise.

What I haven't seen is any talk about setting parallel degree relative to
how many CPUs exist on the machine. Clearly we don't need it right away,
but when we do, I'm happy to report that CPU discovery is as easy as

(int)sysconf(_SC_NPROCESSORS_ONLN)


source: https://github.com/moat/pmpp/blob/distribute_in_c/src/num_cpus.c an
extension I will be very happy to see declared obsolete.

But even that would probably be consulted only at system startup time, and
used to dynamically compute whatever GUCs and system settings will be used
until restart.


Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-04-04 Thread Amit Kapila
On Mon, Apr 4, 2016 at 4:31 PM, Magnus Hagander  wrote:

> On Fri, Apr 1, 2016 at 6:47 AM, Amit Kapila 
> wrote:
>
>
>> Also, I think below part of documentation for pg_start_backup() needs to
>> be modified:
>>
>> 
>>
>> pg_start_backup accepts an
>>
>> arbitrary user-defined label for the backup.  (Typically this would be
>>
>> the name under which the backup dump file will be stored.)  The
>> function
>>
>> writes a backup label file (backup_label) and, if there
>>
>> are any links in the pg_tblspc/ directory, a tablespace
>> map
>>
>> file (tablespace_map) into the database cluster's data
>>
>> directory, performs a checkpoint, and then returns the backup's
>> starting
>>
>> transaction log location as text.  The user can ignore this result
>> value,
>>
>> but it is provided in case it is useful.
>>
>
> That one definitely needs to be fixed, as it's part of the reference. Good
> spot.
>
>
>
>> Similarly, there is a description for pg_stop_backup which needs to be
>> modified.
>>
>
> That's the one you're referring to in your first comment above, is it not?
> Or is there one more that you mean?
>
>
I am referring to below part of docs in func.sgml



pg_stop_backup removes the label file and, if it exists,

the tablespace_map file created by

pg_start_backup, and creates a backup history file in

the transaction log archive area.  The history file includes the label
given to

pg_start_backup, the starting and ending transaction log
locations for

the backup, and the starting and ending times of the backup.  The return

value is the backup's ending transaction log location (which again

can be ignored).  After recording the ending location, the current

transaction log insertion

point is automatically advanced to the next transaction log file, so
that the

ending transaction log file can be archived immediately to complete the
backup.

 



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-04 Thread Simon Riggs
On 4 April 2016 at 10:45, Andres Freund  wrote:

>
> Simon, perhaps you could hold the above question in your mind while
> looking through this?
>

Sure, np.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-04 Thread Masahiko Sawada
On Mon, Apr 4, 2016 at 6:03 PM, Kyotaro HORIGUCHI
 wrote:
> Hello, thank you for testing.
>
> At Sat, 2 Apr 2016 14:20:55 +1300, Thomas Munro 
>  wrote in 
> 

[HACKERS] Tiny patch: sigmask.diff

2016-04-04 Thread Aleksander Alekseev
Hello

sigmask macro is defined in win32.h like this:

```
#define sigmask(sig) ( 1 << ((sig)-1) )
```

And used in signal.c in this fashion:

```
for (i = 0; i < PG_SIGNAL_COUNT; i++)
{
if (exec_mask & sigmask(i))
{
```

Thus during first iteration we are doing `<< -1`. I think it's a bug.

Patch attached.

-- 
Best regards,
Aleksander Alekseev
http://eax.me/
diff --git a/src/backend/port/win32/signal.c b/src/backend/port/win32/signal.c
index 36c6ebd..3724aa3 100644
--- a/src/backend/port/win32/signal.c
+++ b/src/backend/port/win32/signal.c
@@ -115,14 +115,14 @@ pgwin32_dispatch_queued_signals(void)
 
 		for (i = 0; i < PG_SIGNAL_COUNT; i++)
 		{
-			if (exec_mask & sigmask(i))
+			if (exec_mask & sigmask(i+1))
 			{
 /* Execute this signal */
 pqsigfunc	sig = pg_signal_array[i];
 
 if (sig == SIG_DFL)
 	sig = pg_signal_defaults[i];
-pg_signal_queue &= ~sigmask(i);
+pg_signal_queue &= ~sigmask(i+1);
 if (sig != SIG_ERR && sig != SIG_IGN && sig != SIG_DFL)
 {
 	LeaveCriticalSection(&pg_signal_crit_sec);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-04-04 Thread David Steele

On 3/19/16 8:15 AM, Magnus Hagander wrote:


I've attached an updated patch, which is rebased on current master and
includes the oid fix.


I've now had a chance to go through this in detail and test thoroughly. 
 We had a lot of discussion about copying pg_control last and that led 
me to believe that there might be some sort of race condition with 
multiple backups running concurrently.


I tried every trick I could think of and everything worked as expected, 
so as far as I can see this requirement only applies to backups taken 
from a standby, which agrees with the comments in do_pg_stop_backup().


Note that I only tested backups against a master database since that's 
what is modified by this patch.  I'm still not entirely comfortable with 
how backups against a standby are done, and in that case I still think it 
would be best to write the stop wal location into backup_label, but as 
you said before that would be a completely separate patch.


So in the end I'm fine with the code as it stands with the exception of 
the pg_stop_backup2() naming.  I'd prefer a more verbose name here but 
I'm unable to come up with anything very good.  How about 
pg_stop_backup_v2()?  This function could also use a few more comments, 
at least at the top of the exclusive and non-exclusive blocks.


Except for those minor details I believe this is "ready for committer" 
unless anybody has an objection.


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-04 Thread Tomas Vondra

On 04/04/2016 02:06 PM, Teodor Sigaev wrote:

The above-described topic is currently a PostgreSQL 9.6 open item.
Teodor,
since you committed the patch believed to have created it, you own
this open
item.  If that responsibility lies elsewhere, please let us know whose
responsibility it is to fix this.  Since new open items may be
discovered at
any time and I want to plan to have them all fixed well in advance of
the ship
date, I will appreciate your efforts toward speedy resolution.  Please
present, within 72 hours, a plan to fix the defect within seven days
of this
message.  Thanks.


I'm waiting on Tomas's testing. If it isn't possible now, I will
do it myself, and after that I will publish the results.


Ah, damn. This completely slipped from my TODO list. I'll rerun the 
tests either today or tomorrow, and report the results here.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-04 Thread Teodor Sigaev

The above-described topic is currently a PostgreSQL 9.6 open item.  Teodor,
since you committed the patch believed to have created it, you own this open
item.  If that responsibility lies elsewhere, please let us know whose
responsibility it is to fix this.  Since new open items may be discovered at
any time and I want to plan to have them all fixed well in advance of the ship
date, I will appreciate your efforts toward speedy resolution.  Please
present, within 72 hours, a plan to fix the defect within seven days of this
message.  Thanks.


I'm waiting on Tomas's testing. If it isn't possible now, I will do it 
myself, and after that I will publish the results.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-04-04 Thread Michael Paquier
On Mon, Apr 4, 2016 at 7:49 PM, Etsuro Fujita
 wrote:
> On 2016/03/31 16:38, Etsuro Fujita wrote:
>>
>> On 2016/03/31 14:07, Noah Misch wrote:
>>>
>>> On Thu, Mar 24, 2016 at 01:02:57PM +0900, Etsuro Fujita wrote:

 On 2016/03/24 11:14, Michael Paquier wrote:
>
> On Wed, Mar 23, 2016 at 10:05 PM, Thom Brown  wrote:
>>
>> I've noticed that you now can't cancel a query if there's DML pushdown
>> to a foreign server.  This previously worked while it was sending
>> individual statements as it interrupted and rolled it back.
>>
>> Here's what the local server sees when trying to cancel:
>>
>> # DELETE FROM remote.contacts;
>> ^CCancel request sent
>> DELETE 500
>>
>> This should probably be fixed.
>
>
> Looking at what has been committed, execute_dml_stmt is using
> PQexecParams, so we'd want to use an asynchronous call and loop on
> PQgetResult with CHECK_FOR_INTERRUPTS() in it.
>
>
 Will fix.
>
>
>>> [This is a generic notification.]
>
>
>> Sorry for not having taken any action.  I've been busy with another task
>> lately, but I started working on this.  I plan to post a patch early
>> next week.
>
>
> Here is a patch to fix this issue.  As proposed by Michael, I modified
> execute_dml_stmt so that it uses PQsendQueryParams, not PQexecParams. Any
> comments are welcome.

+  * This is based on pqSocketCheck.
+  */
+ bool
+ CheckSocket(PGconn *conn)
+ {
+ 	int			ret;
+
+ Assert(conn != NULL);
Instead of copying again pqSocketCheck, which is as well copied in
libpqwalreceiver.c, wouldn't it be better to use WaitLatchOrSocket
with the socket returned by PQsocket?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Incorrect comment in contrib/postgres_fdw/deparse.c

2016-04-04 Thread Etsuro Fujita
Hi,

I found an incorrect comment in contrib/postgres_fdw/deparse.c: s/FOR
SELECT or FOR SHARE/FOR UPDATE or FOR SHARE/  Attached is a patch to fix
that comment.

Best regards,
Etsuro Fujita
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index bdc410d..d78ac79 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -937,7 +937,7 @@ deparseTargetList(StringInfo buf,
 }
 
 /*
- * Deparse the appropriate locking clause (FOR SELECT or FOR SHARE) for a
+ * Deparse the appropriate locking clause (FOR UPDATE or FOR SHARE) for a
  * given relation (context->foreignrel).
  */
 static void

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-04-04 Thread Magnus Hagander
On Fri, Apr 1, 2016 at 6:47 AM, Amit Kapila  wrote:

> On Sat, Mar 19, 2016 at 5:45 PM, Magnus Hagander 
> wrote:
>
>> On Wed, Mar 2, 2016 at 6:49 PM, Marco Nenciarini <
>> marco.nenciar...@2ndquadrant.it> wrote:
>>
>>>
>>> I've attached an updated patch, which is rebased on current master and
>> includes the oid fix.
>>
>>
>
> +   Finish performing exclusive on-line backup (restricted to
> superusers or replication roles)
>
> +  
>
> +  
>
> +   
>
> +pg_stop_backup(exclusive
> boolean)
>
> +
>
> +   setof record
>
> +   Finish performing exclusive or non-exclusive on-line backup
> (restricted to superusers or replication roles)
>
>
> Isn't it better to indicate that user needs to form a backup_label and
> tablespace_map file from the output of this API and those needs to be
> dropped into data directory?
>

I think that documentation should go in the "usage" part of the
documentation, and not the reference to the function itself.

This is the documentation that is not written yet, of course. I was planning
to wait for Bruce to finish his restructuring of the backup documentation
in general, but latest news on that was that it's several months away, so I
am going to go ahead and write it anyway, without waiting for that (or
possibly do the restructure at the same time). That's where the process of
how to use these functions belong.


> Also, I think below part of documentation for pg_start_backup() needs to
> be modified:
>
> 
>
> pg_start_backup accepts an
>
> arbitrary user-defined label for the backup.  (Typically this would be
>
> the name under which the backup dump file will be stored.)  The
> function
>
> writes a backup label file (backup_label) and, if there
>
> are any links in the pg_tblspc/ directory, a tablespace
> map
>
> file (tablespace_map) into the database cluster's data
>
> directory, performs a checkpoint, and then returns the backup's
> starting
>
> transaction log location as text.  The user can ignore this result
> value,
>
> but it is provided in case it is useful.
>

That one definitely needs to be fixed, as it's part of the reference. Good
spot.



> Similarly, there is a description for pg_stop_backup which needs to be
> modified.
>

That's the one you're referring to in your first comment above, is it not?
Or is there one more that you mean?



> CREATE OR REPLACE FUNCTION
>
> -  pg_start_backup(label text, fast boolean DEFAULT false)
>
> +  pg_start_backup(label text, fast boolean DEFAULT false, exclusive
> boolean DEFAULT true)
>
>RETURNS pg_lsn STRICT VOLATILE LANGUAGE internal AS 'pg_start_backup';
>
>
> One thing, that might be slightly inconvenient for user is if he or she
> wants to use this API for non-exclusive backups then, they need to pass the
> value of second parameter as well which doesn't seem to be a big issue.
>

Well, they have to pass it *somehow*. The other option would be to have a
different function, which I think doesn't help at all. And we do *not* want
the behaviour of existing scripts to implicitly change, so we can't have
the default be non-exclusive.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-04-04 Thread Etsuro Fujita

On 2016/03/31 16:38, Etsuro Fujita wrote:

On 2016/03/31 14:07, Noah Misch wrote:

On Thu, Mar 24, 2016 at 01:02:57PM +0900, Etsuro Fujita wrote:

On 2016/03/24 11:14, Michael Paquier wrote:

On Wed, Mar 23, 2016 at 10:05 PM, Thom Brown  wrote:

I've noticed that you now can't cancel a query if there's DML pushdown
to a foreign server.  This previously worked while it was sending
individual statements as it interrupted and rolled it back.

Here's what the local server sees when trying to cancel:

# DELETE FROM remote.contacts;
^CCancel request sent
DELETE 500

This should probably be fixed.



Looking at what has been committed, execute_dml_stmt is using
PQexecParams, so we'd want to use an asynchronous call and loop on
PQgetResult with CHECK_FOR_INTERRUPTS() in it.



Will fix.



[This is a generic notification.]



Sorry for not having taken any action.  I've been busy with another task
lately, but I started working on this.  I plan to post a patch early
next week.


Here is a patch to fix this issue.  As proposed by Michael, I modified 
execute_dml_stmt so that it uses PQsendQueryParams, not PQexecParams. 
Any comments are welcome.


Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/connection.c
--- b/contrib/postgres_fdw/connection.c
***
*** 12,17 
--- 12,20 
   */
  #include "postgres.h"
  
+ #include 
+ #include 
+ 
  #include "postgres_fdw.h"
  
  #include "access/xact.h"
***
*** 20,25 
--- 23,38 
  #include "utils/hsearch.h"
  #include "utils/memutils.h"
  
+ #ifdef HAVE_POLL_H
+ #include <poll.h>
+ #endif
+ #ifdef HAVE_SYS_POLL_H
+ #include <sys/poll.h>
+ #endif
+ #ifdef HAVE_SYS_SELECT_H
+ #include <sys/select.h>
+ #endif
+ 
  
  /*
   * Connection cache hash table entry
***
*** 417,422  ReleaseConnection(PGconn *conn)
--- 430,484 
  }
  
  /*
+  * Wait until we can read data.
+  *
+  * Returns true if data has become available for reading, false if interrupted
+  * by signal.
+  *
+  * This is based on pqSocketCheck.
+  */
+ bool
+ CheckSocket(PGconn *conn)
+ {
+ 	int			ret;
+ 
+ 	Assert(conn != NULL);
+ 	if (PQsocket(conn) < 0)
+ 		ereport(ERROR,
+ (errcode_for_socket_access(),
+  errmsg("invalid socket: %s", PQerrorMessage(conn))));
+ 
+ 	/* We use poll(2) if available, otherwise select(2) */
+ 	{
+ #ifdef HAVE_POLL
+ 		struct pollfd input_fd;
+ 
+ 		input_fd.fd = PQsocket(conn);
+ 		input_fd.events = POLLIN | POLLERR;
+ 		input_fd.revents = 0;
+ 
+ 		ret = poll(&input_fd, 1, -1);
+ #else			/* !HAVE_POLL */
+ 		fd_set		input_mask;
+ 
+ 		FD_ZERO(_mask);
+ 		FD_SET(PQsocket(conn), &input_mask);
+ 
+ 		ret = select(PQsocket(conn) + 1, &input_mask, NULL, NULL, NULL);
+ #endif   /* HAVE_POLL */
+ 	}
+ 
+ 	Assert(ret != 0);
+ 	if (ret < 0 && errno == EINTR)
+ 		return false;
+ 	if (ret < 0)
+ 		ereport(ERROR,
+ (errcode_for_socket_access(),
+  errmsg("select() failed: %s", strerror(errno))));
+ 	return true;
+ }
+ 
+ /*
   * Assign a "unique" number for a cursor.
   *
   * These really only need to be unique per connection within a transaction.
***
*** 598,603  pgfdw_xact_callback(XactEvent event, void *arg)
--- 660,684 
  case XACT_EVENT_ABORT:
  	/* Assume we might have lost track of prepared statements */
  	entry->have_error = true;
+ 	/*
+ 	 * If we executed a query asynchronously, the query might
+ 	 * have not yet completed.  Check to see if a command is
+ 	 * still being processed by the remote server, and if so,
+ 	 * request cancellation of the command; if not, abort
+ 	 * gracefully.
+ 	 */
+ 	if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE)
+ 	{
+ 		PGcancel   *cancel;
+ 		char		errbuf[256];
+ 
+ 		if ((cancel = PQgetCancel(entry->conn)))
+ 		{
+ 			PQcancel(cancel, errbuf, sizeof(errbuf));
+ 			PQfreeCancel(cancel);
+ 		}
+ 		break;
+ 	}
  	/* If we're aborting, abort all remote transactions too */
  	res = PQexec(entry->conn, "ABORT TRANSACTION");
  	/* Note: can't throw ERROR, it would be infinite loop */
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***
*** 3151,3158  execute_dml_stmt(ForeignScanState *node)
  	 * We don't use a PG_TRY block here, so be careful not to throw error
  	 * without releasing the PGresult.
  	 */
! 	dmstate->result = PQexecParams(dmstate->conn, dmstate->query,
!    numParams, NULL, values, NULL, NULL, 0);
  	if (PQresultStatus(dmstate->result) !=
  		(dmstate->has_returning ? PGRES_TUPLES_OK : PGRES_COMMAND_OK))
  		pgfdw_report_error(ERROR, dmstate->result, dmstate->conn, true,
--- 3151,3181 
  	 * We don't use a PG_TRY block here, so be careful not to throw error
  	 * without releasing the PGresult.
  	 */
! 	if (!PQsendQueryParams(dmstate->conn, dmstate->query, numParams,
! 		   NULL, values, NULL, NULL, 0))
! 		pgfdw_report_error(ERROR, NULL, dmstate->conn, true, dmstate->query);
! 
! 	/*
! 	 * Receive 

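For context, the select()-based wait in the connection hunk above follows the standard POSIX pattern: block on a descriptor, treat EINTR as "caller should retry", and treat any other failure as a hard error. Here is a self-contained sketch of that pattern (illustrative names only, no libpq involved; the fatal branch stands in for ereport(ERROR)):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Block until fd is readable.  Mirrors the hunk's control flow:
 * EINTR means "caller should retry" (return false), any other
 * select() failure is fatal, and a positive return means ready. */
bool
wait_for_readable(int fd)
{
	fd_set		input_mask;
	int			ret;

	FD_ZERO(&input_mask);
	FD_SET(fd, &input_mask);

	ret = select(fd + 1, &input_mask, NULL, NULL, NULL);

	if (ret < 0 && errno == EINTR)
		return false;
	if (ret < 0)
	{
		fprintf(stderr, "select() failed: %s\n", strerror(errno));
		exit(1);
	}
	return true;
}

/* Helper: returns the read end of a pipe that already has data,
 * so a readiness wait on it returns immediately. */
int
make_ready_fd(void)
{
	int			fds[2];

	if (pipe(fds) != 0 || write(fds[1], "x", 1) != 1)
		exit(1);
	return fds[0];
}
```

With no timeout argument the wait is indefinite, which is why the surrounding patch has to handle cancellation separately at transaction abort.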
Re: [HACKERS] PATCH: use foreign keys to improve join estimates v1

2016-04-04 Thread Amit Langote
On 2016/04/04 17:25, Simon Riggs wrote:
> The rel cache code you're adding uses a flag called "rd_fkeyvalid" which
> indicates that the relcache is correctly filled. That is confusing, since
> it has nothing to do with the concept of constraint validity. We should
> rename that to rd_fkeycachefilled or similar.

Maybe I'm missing something, but is a separate bool required at all in
this case?  Wouldn't simply doing the following suffice?

/* Quick exit if we already computed the list. */
if (relation->rd_fkeylist)
return list_copy(relation->rd_fkeylist);

ISTM, rd_fkeyvalid is modeled on rd_indexvalid, where the latter serves to
convey more info than simply whether the index list is valid or not, so
the extra field is justified.

Also, it seems the patch forgot to update RelationDestroyRelation().
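The quick-exit pattern suggested above — letting the NULL-ness of the cached list itself mean "not yet computed", instead of keeping a separate boolean — can be sketched as a self-contained toy (the struct and field names are stand-ins, not the real relcache fields):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy "relation" with a lazily built foreign-key list.  A NULL
 * fkey_cache doubles as the "list not computed yet" flag, so no
 * extra validity field is needed. */
typedef struct DemoRel
{
	int		   *fkey_cache;		/* NULL until first computed */
	size_t		nfkeys;
	int			build_count;	/* times the expensive scan ran */
} DemoRel;

static int *
copy_list(const int *src, size_t n)
{
	int		   *dst = malloc(n * sizeof(int));

	for (size_t i = 0; i < n; i++)
		dst[i] = src[i];
	return dst;
}

/* Return a fresh copy of the FK list, computing it at most once. */
int *
get_fkey_list(DemoRel *rel, size_t *n)
{
	static const int scanned[] = {101, 102, 103};

	/* Quick exit if we already computed the list. */
	if (rel->fkey_cache != NULL)
	{
		*n = rel->nfkeys;
		return copy_list(rel->fkey_cache, rel->nfkeys);
	}

	/* Pretend this is the expensive catalog scan. */
	rel->nfkeys = sizeof(scanned) / sizeof(scanned[0]);
	rel->fkey_cache = copy_list(scanned, rel->nfkeys);
	rel->build_count++;

	*n = rel->nfkeys;
	return copy_list(rel->fkey_cache, rel->nfkeys);
}
```

The separate flag earns its keep only when, as with rd_indexvalid, it has to encode more states than "computed or not".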

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Timeline following for logical slots

2016-04-04 Thread Andres Freund
On 2016-04-04 17:50:02 +0800, Craig Ringer wrote:
> To rephrase per my understanding: The client only specifies the point it
> wants to start seeing decoded commits. Decoding starts from the slot's
> restart_lsn, and that's the point from which the accumulation of reorder
> buffer contents begins, the snapshot building process begins, and where
> accumulation of relcache invalidation information begins. At restart_lsn no
> xact that is to be emitted to the client may yet be in progress. Decoding,
s/yet/already/
> whether or not the xacts will be fed to the output plugin callbacks,
> requires access to the system catalogs. Therefore catalog_xmin reported by
> the slot must be >= the real effective catalog_xmin of the heap and valid
> at the restart_lsn, not just the confirmed flush point or the point the
> client specifies to resume fetching changes from.

Hm. Maybe I'm misunderstanding you here, but doesn't it have to be <=?

> On the original copy of the slot on the pre-failover master the restart_lsn
> would've been further ahead, as would the catalog_xmin. So catalog rows
> have been purged.
+may


> So it's necessary to ensure that the slot's restart_lsn and catalog_xmin
> are advanced in a timely, consistent manner on the replica's copy of the
> slot at a point where no vacuum changes to the catalog that could remove
> needed tuples have been replayed.

Right.


> The only way I can think of to do that really reliably right now, without
> full failover slots, is to use the newly committed pluggable WAL mechanism
> and add a hook to SaveSlotToPath() so slot info can be captured, injected
> in WAL, and replayed on the replica.

I personally think the primary answer is to use separate slots on
different machines. Failover slots can be an extension to that at some
point, but I think they're a secondary goal.


> It'd also be necessary to move
> CheckPointReplicationSlots() out of CheckPointGuts()  to the start of a
> checkpoint/restartpoint when WAL writing is still permitted, like the
> failover slots patch does.

Ugh. That makes me rather wary.


> Basically, failover slots as a plugin using a hook, without the
> additions to base backup commands and the backup label.

I'm going to be *VERY* hard to convince that adding a hook inside
checkpointing code is acceptable.


> I'd really hate 9.6 to go out with - still - no way to use logical decoding
> in a basic, bog-standard HA/failover environment. It overwhelmingly limits
> their utility and it's becoming a major drag on practical use of the
> feature. That's a difficulty given that the failover slots patch isn't
> especially trivial and you've shown that lazy sync of slot state is not
> sufficient.

I think the right way to do this is to focus on failover for logical
rep, with separate slots. The whole idea of integrating this with physical
rep imo makes this a *lot* more complex than necessary. Not all that
many people are going to want to combine physical rep and logical rep.


> The restart_lsn from the newer copy of the slot is, as you said, a point we
> know we can reconstruct visibility info.

We can on the master. There's absolutely no guarantee that the
associated serialized snapshot is present on the standby.


Andres




Re: [HACKERS] Timeline following for logical slots

2016-04-04 Thread Craig Ringer
On 1 April 2016 at 14:46, Andres Freund  wrote:


> > However: It's safe for the slot state on the replica to be somewhat
> behind
> > the local replay from the master, so the slot state on the replica is
> older
> > than what it would've been at an equivalent time on the master... so long
> > as it's not so far behind that the replica has replayed vacuum activity
> > that removes rows still required by the slot's declared catalog_xmin.
> Even
> > then it should actually be OK in practice because the failing-over client
> > will always have replayed past that point on the master (otherwise
> > catalog_xmin couldn't have advanced on the master), so it'll always ask
> to
> > start replay at a point at or after the lsn where the catalog_xmin was
> > advanced to its most recent value on the old master before failover. It's
> > only safe for walsender based decoding though, since there's no way to
> > specify the startpoint for sql-level decoding.
>
> This whole logic falls entirely flat on its face given that even
> if you specify a startpoint, we read older WAL and process the
> records. The startpoint filters every transaction that commits earlier
> than the "startpoint". But with a long-running transaction you still
> will have old changes being processed, which need the corresponding
> catalog state.
>

Right. Having read over those draft docs I see what you mean.

TL;DR: the only way I can see to do this now is full-on failover slots, or
something very similar that uses pluggable WAL + a slot save hook + moving
CheckPointReplicationSlots to the start of a checkpoint while WAL is still
writeable.



To rephrase per my understanding: The client only specifies the point it
wants to start seeing decoded commits. Decoding starts from the slot's
restart_lsn, and that's the point from which the accumulation of reorder
buffer contents begins, the snapshot building process begins, and where
accumulation of relcache invalidation information begins. At restart_lsn no
xact that is to be emitted to the client may yet be in progress. Decoding,
whether or not the xacts will be fed to the output plugin callbacks,
requires access to the system catalogs. Therefore catalog_xmin reported by
the slot must be >= the real effective catalog_xmin of the heap and valid
at the restart_lsn, not just the confirmed flush point or the point the
client specifies to resume fetching changes from.

On the original copy of the slot on the pre-failover master the restart_lsn
would've been further ahead, as would the catalog_xmin. So catalog rows
have been purged. When we decode from the old restart_lsn on the replica
after failover we're not just doing excess work, we'd be accumulating
probably-incorrect invalidation information for the xacts that we're
decoding (but not sending to the client). Or just failing to find entries
we expect to find in the caches and dying.

If the restart_lsn was advanced on the master there must be another safe
decoding restart point after the one the old copy of the slot points to.
But we don't know where, and we don't have any way to know that we *are* in
such a post-failover situation so we can't just scan forward looking for it
without looking up invalidation info for xacts we know we'll be throwing
away.

So it's necessary to ensure that the slot's restart_lsn and catalog_xmin
are advanced in a timely, consistent manner on the replica's copy of the
slot at a point where no vacuum changes to the catalog that could remove
needed tuples have been replayed.

The only way I can think of to do that really reliably right now, without
full failover slots, is to use the newly committed pluggable WAL mechanism
and add a hook to SaveSlotToPath() so slot info can be captured, injected
in WAL, and replayed on the replica.  It'd also be necessary to move
CheckPointReplicationSlots() out of CheckPointGuts()  to the start of a
checkpoint/restartpoint when WAL writing is still permitted, like the
failover slots patch does. Basically, failover slots as a plugin using a
hook, without the additions to base backup commands and the backup label.
I'm not convinced that's at all better than the full failover slots, and
I'm not sure I can make a solid argument for the general purpose utility of
the hook it'd need, but at least it'd avoid committing core to that
approach.

I'd really hate 9.6 to go out with - still - no way to use logical decoding
in a basic, bog-standard HA/failover environment. It overwhelmingly limits
their utility and it's becoming a major drag on practical use of the
feature. That's a difficulty given that the failover slots patch isn't
especially trivial and you've shown that lazy sync of slot state is not
sufficient.

> It's also OK for the slot state to be *ahead* of the replica's replay
> > position, i.e. from the future. restart_lsn and catalog_xmin will be
> higher
> > than they were on the master at the same point in time, but that doesn't
> > matter at all, since the client 

Re: [HACKERS] pgbench more operators & functions

2016-04-04 Thread Fabien COELHO


Hello Andres,

I don't see much point in asking people to postpone. I do think however 
it can make sense to respond with something like: Fabien, you've been 
submitting a lot of patches over the last year. Thanks for that! To 
keep up with the amount of incoming work the project relies on 
contributors also shouldering some review responsibility. Please 
consider focusing on that, while we're working on getting 9.6 ready.


Sure, I definitely agree about that.

I try to review all patches in my (small) area of (limited) expertise, 
which is currently pgbench & some part of the checkpointer. I also did 
minor bug fixes (e.g. isbn). AFAICS none of the patches lacking a reviewer 
in the 9.6 CF falls in these areas.


Also note that while I submitted patches for the checkpointer, I ended up 
reviewing your version of the patches, so somehow I was first author, then 
a driving force to provoke you to do it your way, and finally a reviewer, 
especially in performance testing, which is a time-consuming task.

I can also learn other things, but that means more time to do a useful 
review. This "more" time is available for me mostly over the Summer, so 
I'll try to be more useful to the community, and also learn new stuff, 
then. Probably not ideal for 9.6, but it cannot be helped.


--
Fabien.




Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-04 Thread Andres Freund
On 2016-04-04 10:35:34 +0100, Simon Riggs wrote:
> On 4 April 2016 at 09:28, Fujii Masao  wrote:
> > Barring any objections, I'll commit this patch.

No objection here either, just one question: Has anybody thought about
the ability to extend this to do per-database syncrep? Logical decoding
works on a database level, and that can cause some problems with global
configuration.

> That sounds good.
> 
> May I have one more day to review this? Actually more like 3-4 hours.

> I have no comments on an initial read, so I'm hopeful of having nothing at
> all to say on it.

Simon, perhaps you could hold the above question in your mind while
looking through this?

Thanks,

Andres




Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-04 Thread Simon Riggs
On 4 April 2016 at 09:28, Fujii Masao  wrote:


> Barring any objections, I'll commit this patch.
>

That sounds good.

May I have one more day to review this? Actually more like 3-4 hours.

I have no comments on an initial read, so I'm hopeful of having nothing at
all to say on it.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] WIP: Access method extendability

2016-04-04 Thread Emre Hasegeli
I think the variable "magick" should be "magic".  Patch attached.


bloom-magick.patch
Description: Binary data



Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-04 Thread Kyotaro HORIGUCHI
Hello, thank you for testing.

At Sat, 2 Apr 2016 14:20:55 +1300, Thomas Munro  
wrote in 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-04 Thread Abhijit Menon-Sen
At 2016-04-04 17:28:07 +0900, masao.fu...@gmail.com wrote:
>
> Barring any objections, I'll commit this patch.

No objections, just a minor wording tweak:

doc/src/sgml/config.sgml:

"The synchronous standbys will be the standbys that their names appear
early in this list" should be "The synchronous standbys will be those
whose names appear earlier in this list".

doc/src/sgml/high-availability.sgml:

"The standbys that their names appear early in this list are given
higher priority and will be considered as synchronous" should be "The
standbys whose names appear earlier in the list are given higher
priority and will be considered as synchronous".

"The standbys that their names appear early in the list will be used as
the synchronous standby" should be "The standbys whose names appear
earlier in the list will be used as synchronous standbys".

You may prefer to reword this in some other way, but the current "that
their names appear" wording should be changed.

-- Abhijit
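For illustration, the rule the corrected wording describes — names earlier in synchronous_standby_names carry higher priority, and the highest-priority listed standbys are the synchronous ones — can be sketched as follows (hypothetical helper names; the server's real logic additionally tracks which standbys are actually connected and streaming):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* 1-based priority of a standby name: its position in the configured
 * list, so names appearing earlier get smaller (higher-priority)
 * numbers.  Returns 0 for names not listed at all. */
int
standby_priority(const char *const *configured, int nconfigured,
				 const char *name)
{
	for (int i = 0; i < nconfigured; i++)
		if (strcmp(configured[i], name) == 0)
			return i + 1;
	return 0;
}

/* A listed standby is synchronous when its priority falls within the
 * first num_sync slots. */
bool
is_sync_standby(const char *const *configured, int nconfigured,
				int num_sync, const char *name)
{
	int			prio = standby_priority(configured, nconfigured, name);

	return prio > 0 && prio <= num_sync;
}
```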




Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-04 Thread Andres Freund
Hi,

On 2016-04-03 16:47:49 +0530, Dilip Kumar wrote:
> 6. With Head+ pinunpin-cas-8 +
> 0001-WIP-Avoid-the-use-of-a-separate-spinlock-to-protect performance is
> almost same as with
> Head+pinunpin-cas-8, only sometime performance at 128 client is low
> (~250,000 instead of 650,000)

Hm, interesting. I suspect that's because of the missing backoff in my
experimental patch. If you apply the attached patch ontop of that
(requires infrastructure from pinunpin), how does performance develop?

Regards,

Andres
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index ec6baf6..4216be5 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -858,11 +858,15 @@ LWLockWaitListLock(LWLock *lock)
 	{
 		if (old_state & LW_FLAG_LOCKED)
 		{
-			/* FIXME: add exponential backoff */
-			pg_spin_delay();
-			old_state = pg_atomic_read_u32(&lock->state);
+			SpinDelayStatus delayStatus = init_spin_delay((void *) &lock->state);
+			while (old_state & LW_FLAG_LOCKED)
+			{
+				perform_spin_delay(&delayStatus);
+				old_state = pg_atomic_read_u32(&lock->state);
+			}
+			finish_spin_delay(&delayStatus);
 #ifdef LWLOCK_STATS
-			delays++;
+			delays += delayStatus.delays;
 #endif
 		}
 		else

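The exponential backoff the patch introduces through perform_spin_delay() can be sketched in isolation. This is a toy model under assumed constants (the real SpinDelayStatus machinery lives in s_lock.c and actually sleeps): spin cheaply at first, then double the sleep length up to a cap.

```c
#include <assert.h>

/* Toy spin-delay state, mirroring the shape of the real thing:
 * count cheap spins, then start sleeping with doubling delays. */
typedef struct DemoSpinDelay
{
	int			spins;			/* cheap spins since last sleep */
	int			delays;			/* sleeps taken so far */
	int			cur_delay_us;	/* next sleep length, 0 = unset */
} DemoSpinDelay;

enum
{
	SPINS_BEFORE_SLEEP = 100,	/* illustrative constants */
	MIN_DELAY_US = 1000,
	MAX_DELAY_US = 1000000
};

void
demo_perform_spin_delay(DemoSpinDelay *s)
{
	if (++s->spins < SPINS_BEFORE_SLEEP)
		return;					/* still in the cheap-spin phase */

	if (s->cur_delay_us == 0)
		s->cur_delay_us = MIN_DELAY_US;
	/* a real implementation would usleep(s->cur_delay_us) here */
	s->delays++;
	s->spins = 0;

	/* exponential backoff, capped */
	s->cur_delay_us *= 2;
	if (s->cur_delay_us > MAX_DELAY_US)
		s->cur_delay_us = MAX_DELAY_US;
}
```

The cap matters: without it a long-contended lock would eventually sleep for absurd intervals, and the per-waiter state (as opposed to a bare pg_spin_delay() call) is what makes the backoff possible at all.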


Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.

2016-04-04 Thread Amit Langote

Hi,

Thanks for the report.

On 2016/04/04 15:17, Rajkumar Raghuwanshi wrote:
> Hi,
> 
> I observed below in postgres_fdw
> 
> * .Observation: *Prepare statement execution plan is not getting changed
> even after altering foreign table to point to new schema.
> 

[ ... ]

> PREPARE stmt_ft AS select c1,c2 from ft;
> 
> EXECUTE stmt_ft;
>  c1 |  c2
> +---
>   1 | s1.lt
> (1 row)
> 
> --changed foreign table ft pointing schema from s1 to s2
> ALTER foreign table ft options (SET schema_name 's2', SET table_name 'lt');
> ANALYZE ft;
> 
> EXPLAIN (COSTS OFF, VERBOSE) EXECUTE stmt_ft;
>QUERY PLAN
> 
>  Foreign Scan on public.ft
>Output: c1, c2
>Remote SQL: SELECT c1, c2 FROM s1.lt
> (3 rows)

I wonder if performing relcache invalidation upon ATExecGenericOptions()
is the correct solution for this problem.  The attached patch fixes the
issue for me.

Thanks,
Amit
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 9e9082d..6a4e1d6 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -11642,6 +11642,13 @@ ATExecGenericOptions(Relation rel, List *options)
  	simple_heap_update(ftrel, &tuple->t_self, tuple);
 	CatalogUpdateIndexes(ftrel, tuple);
 
+	/*
+	 * Invalidate the relcache for the table, so that after this commit
+	 * all sessions will refresh any cached plans that are based on older
+	 * values of the options.
+	 */
+	CacheInvalidateRelcache(rel);
+
 	InvokeObjectPostAlterHook(ForeignTableRelationId,
 			  RelationGetRelid(rel), 0);
 

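Why the invalidation fixes the stale-plan symptom can be illustrated with a toy generation-counter cache (none of these names are PostgreSQL's; the real machinery is the relcache/plancache invalidation path): bumping the generation when options change forces the next use to rebuild the cached remote SQL.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy relation state: a generation counter plus the one option our
 * cached "plan" depends on. */
typedef struct ToyRel
{
	unsigned	generation;		/* bumped on every invalidation */
	char		schema[16];
} ToyRel;

/* Toy cached plan: remembers which generation it was built under. */
typedef struct ToyPlan
{
	unsigned	built_for;
	int			rebuilds;
	char		remote_sql[64];
} ToyPlan;

/* What CacheInvalidateRelcache() achieves, in miniature: any plan
 * built under an older generation is now stale. */
void
toy_alter_options(ToyRel *rel, const char *new_schema)
{
	snprintf(rel->schema, sizeof(rel->schema), "%s", new_schema);
	rel->generation++;
}

/* Return the remote SQL, rebuilding the plan only if it is stale. */
const char *
toy_get_plan(ToyRel *rel, ToyPlan *plan)
{
	if (plan->rebuilds == 0 || plan->built_for != rel->generation)
	{
		snprintf(plan->remote_sql, sizeof(plan->remote_sql),
				 "SELECT c1, c2 FROM %s.lt", rel->schema);
		plan->built_for = rel->generation;
		plan->rebuilds++;
	}
	return plan->remote_sql;
}
```

Without the generation bump — i.e. without the invalidation the patch adds — the second lookup would keep returning the SQL built against the old schema, which is exactly the reported behavior.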


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-04 Thread Fujii Masao
On Sat, Apr 2, 2016 at 10:20 AM, Thomas Munro
 wrote:
> On Thu, Mar 31, 2016 at 5:11 PM, Thomas Munro
>  wrote:
>> On Thu, Mar 31, 2016 at 3:55 AM, Masahiko Sawada  
>> wrote:
>>> On Wed, Mar 30, 2016 at 11:43 PM, Masahiko Sawada  
>>> wrote:
 On Tue, Mar 29, 2016 at 5:36 PM, Kyotaro HORIGUCHI
  wrote:
> I personally don't think it needs such a survive measure. It is
> very small syntax and the parser reads very short text. If the
> parser failes in such mode, something more serious should have
> occurred.
>
> At Tue, 29 Mar 2016 16:51:02 +0900, Fujii Masao  
> wrote in 
> 
>> On Tue, Mar 29, 2016 at 4:23 PM, Kyotaro HORIGUCHI
>>  wrote:
>> > Hello,
>> >
>> > At Mon, 28 Mar 2016 18:38:22 +0900, Masahiko Sawada 
>> >  wrote in 
>> > 
>> > sawada.mshk> On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI
>> >>  wrote:
>> > As mentioned in my comment, SQL parser converts yy_fatal_error
>> > into ereport(ERROR), which can be caught by the upper PG_TRY (by
>> > #define'ing fprintf). So it is doable if you mind exit().
>>
>> I'm afraid that your idea doesn't work in postmaster. Because 
>> ereport(ERROR) is
>> implicitly promoted to ereport(FATAL) in postmaster. IOW, when an 
>> internal
>> flex fatal error occurs, postmaster just exits instead of jumping out of 
>> parser.
>
> If The ERROR may be LOG or DEBUG2 either, if we think the parser
> fatal erros are recoverable. guc-file.l is doing so.
>
>> ISTM that, when an internal flex fatal error occurs, it's
>> better to elog(FATAL) and terminate the problematic
>> process. This might lead to the server crash (e.g., if
>> postmaster emits a FATAL error, it and its all child processes
>> will exit soon). But probably we can live with this because the
>> fatal error basically rarely happens.
>
> I agree to this
>
>> OTOH, if we make the process keep running even after it gets an internal
>> fatal error (like Sawada's patch or your idea do), this might cause more
>> serious problem. Please imagine the case where one walsender gets the 
>> fatal
>> error (e.g., because of OOM), abandon new setting value of
>> synchronous_standby_names, and keep running with the previous setting 
>> value.
>> OTOH, the other walsender processes successfully parse the setting and
>> keep running with new setting. In this case, the inconsistency of the 
>> setting
>> which each walsender is based on happens. This completely will mess up 
>> the
>> synchronous replication.
>
> On the other hand, guc-file.l seems ignoring parser errors under
> normal operation, even though it may cause similar inconsistency,
> if any..
>
> | LOG:  received SIGHUP, reloading configuration files
> | LOG:  input in flex scanner failed at file 
> "/home/horiguti/data/data_work/postgresql.conf" line 1
> | LOG:  configuration file 
> "/home/horiguti/data/data_work/postgresql.conf" contains errors; no 
> changes were applied
>
>> Therefore, I think that it's better to make the problematic process exit
>> with FATAL error rather than ignore the error and keep it running.
>
> +1. Restarting walsender would be far less harmful than keeping
> it running in doubtful state.
>
> Sould I wait for the next version or have a look on the latest?
>

 Attached latest patch incorporate some review comments so far, and is
 rebased against current HEAD.

>>>
>>> Sorry I attached wrong patch.
>>> Attached patch is correct patch.

Thanks for updating the patch!

I applied the following changes to the patch.
Attached is the revised version of the patch.

- Changed syncrep_flex_fatal() so that it just calls ereport(FATAL), based on
  the recent discussion with Horiguchi-san.
- Improved the documentation.
- Fixed some bugs.
- Removed the changes for recovery testing framework. I'd like to commit
   those changes later separately from the main patch of multiple sync rep.

Barring any objections, I'll commit this patch.

>> One thing I noticed is that there are LOG messages telling me when a
>> standby becomes a synchronous standby, but nothing to tell me if a
>> standby stops being a standby (ie because a higher priority one has
>> taken its place in the quorum).  Would that be interesting?

+1

>> Also, I spotted some tiny mistakes:
>>
>> +  
>> +   Dedicated language for multiple synchornous 
>> replication
>> +  
>>
>> 

Re: [HACKERS] PATCH: use foreign keys to improve join estimates v1

2016-04-04 Thread Simon Riggs
On 3 April 2016 at 22:44, Simon Riggs  wrote:


> Detailed comments in the planner part of the patch. The discussion around
> this patch isn't reflected enough in the patch.
>

I think we should record that the planner uses the constraint, even if the
constraint is not yet valid, per DDL.

The rel cache code you're adding uses a flag called "rd_fkeyvalid" which
indicates that the relcache is correctly filled. That is confusing, since
it has nothing to do with the concept of constraint validity. We should
rename that to rd_fkeycachefilled or similar.

ISTM that the FKey info added to the rel cache would be useful for many
optimizations, hence why I think we should commit that separately, whether
or not the specific optimization for the other half of the patch is
accepted or later modified or revoked. Objections?

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Generic WAL logical messages

2016-04-04 Thread Simon Riggs
On 4 April 2016 at 05:23, Petr Jelinek  wrote:


> I rebased this patch on top of current master as the generic wal commit
> added some conflicting changes. Also fixed couple of typos in comments and
> added non ascii message to test.


This looks good to me, so have marked it Ready For Committer.

I marked myself as Committer to show there was interest in this. If anyone
else would like to commit it, I am happy for you to do so.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-04 Thread Noah Misch
On Thu, Feb 25, 2016 at 11:19:20AM -0800, Jeff Janes wrote:
> On Wed, Feb 24, 2016 at 8:51 AM, Teodor Sigaev  wrote:
> > Thank you for remembering this problem, at least for me.
> >
> >>> Well, turns out there's a quite significant difference, actually. The
> >>> index sizes I get (quite stable after multiple runs):
> >>>
> >>> 9.5 : 2428 MB
> >>> 9.6 + alone cleanup : 730 MB
> >>> 9.6 + pending lock : 488 MB
> >
> > Interesting, I don't see why alone_cleanup and pending_lock are so differ.
> > I'd like to understand that, but does somebody have an good theory?
> 
> Under my patch, anyone who wanted to do a clean up and detected
> someone else was doing one would wait for the concurrent one to end.
> (This is more consistent with the existing behavior, I just made it so
> they don't do any damage while they wait.)
> 
> Under your patch, if a backend wants to do a clean up and detects
> someone else is already doing one, it would just skip the clean up and
> proceed on with whatever it was doing.  This allows one process
> (hopefully a vacuum, but maybe a user backend) to get pinned down
> indefinitely, as other processes keep putting stuff onto the end of
> the pending_list with no throttle.
> 
> Since the freespace recycling only takes place once the list is
> completely cleaned, allowing some processes to add to the end while
> one poor process is trying to clean can lead to less effective
> recycling.
> 
> That is my theory, anyway.

[This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item.  Teodor,
since you committed the patch believed to have created it, you own this open
item.  If that responsibility lies elsewhere, please let us know whose
responsibility it is to fix this.  Since new open items may be discovered at
any time and I want to plan to have them all fixed well in advance of the ship
date, I will appreciate your efforts toward speedy resolution.  Please
present, within 72 hours, a plan to fix the defect within seven days of this
message.  Thanks.




Re: [HACKERS] Timeline following for logical slots

2016-04-04 Thread Craig Ringer
On 4 April 2016 at 14:43, Andres Freund  wrote:


> > OK, makes sense. And still resume decoding from restart_lsn, despite
> having
> > that visibility information stashed, because we also have to rebuild the
> > information on invalidations for running xacts, which is not stored
> > persistently anywhere as decoding progresses. So for now at least it's an
> > optimisation to store the visibility info, since we still have to go back
> > and decode for invalidations anyway. Right?
>
> Not really no. The important point isn't invalidation or anything. It's
> that we don't have the content & metadata of the transactions
> themselves.  Yes, we also do re-read invalidations, but that's just a
> side effect.
>

Right, because the reorder buffer contents aren't persistent so decoding
must restart at an lsn at or before the start of the oldest uncommitted
xact at the time it wants to start returning committed changes to the
client.

I've been looking too hard for the details and somehow managed to
completely miss the blindingly obvious right in front of me. I knew that
the confirmed lsn was the threshold of confirmed commit LSNs, I knew about
the reorder buffer needing to accumulate changes, I knew it wasn't
persistent, etc. Yet I can't pretend I didn't overlook that when looking
over the xlogreader vs decoding startup, per those comments.

I don't think a succinct comment there will be useful given that it'd
really just do a better job of explaining what restart_lsn is and why we
start decoding there. So I guess a more detailed comment on the slot struct
(not on the field, at the start) would be good. Or that README.

Thanks for your patience.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] pgbench more operators & functions

2016-04-04 Thread Amit Kapila
On Mon, Apr 4, 2016 at 11:51 AM, Andres Freund  wrote:
>
> On 2016-04-04 06:18:47 +0100, Simon Riggs wrote:
> > I'd say "why not wait?". Minor, non-urgent patches will definitely go
> > nowhere for a long time, so it gains nobody to submit now.
> >
> > Submitting patches during freeze has been discouraged for many years, so
> > asking a long term contributor to avoid sending multiple minor patches
is
> > in line with that.
>
> I don't see much point in asking people to postpone. I do think however
> it can make sense to respond with something like:
> Fabien, you've been submitting a lot of patches over the last
> year. Thanks for that! To keep up with the amount of incoming work
> the project relies on contributors also shouldering some review
> responsibility. Please consider focusing on that, while we're working on
> getting 9.6 ready.
>
>

+1.  Extremely positive and encouraging way of involving other people.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Choosing parallel_degree

2016-04-04 Thread Amit Kapila
On Sun, Apr 3, 2016 at 4:37 PM, Julien Rouhaud 
wrote:
>
> On 22/03/2016 07:58, Julien Rouhaud wrote:
> > On 21/03/2016 20:38, Julien Rouhaud wrote:
> >> On 21/03/2016 05:18, James Sewell wrote:
> >>> OK cool, thanks.
> >>>
> >>> Can we remove the minimum size limit when the per table degree setting
> >>> is applied?
> >>>
> >>> This would help for tables with 2  - 1000 pages combined with a high
CPU
> >>> cost aggregate.
> >>>
> >>
> >> Attached v4 implements that. It also makes sure that the chosen
> >> parallel_degree won't be more than the relation's estimated number of
pages.
> >>
> >
> > And I just realize that it'd prevent from forcing parallelism on
> > partitionned table, v5 attached removes the check on the estimated
> > number of pages.
> >
> >

Few comments:
1.
+  limited according to the 

A. typo.
   /gux-max-parallel-degree/guc-max-parallel-degree
   /worker/workers
B. + 
+  Number of workers wanted for this table. The number of worker will be
+  limited according to
the 
+  parameter.
+ 

How about writing the above as:
Sets the degree of parallelism for an individual relation.  The requested
number of workers will be limited by 

2.
+	{
+		{
+			"parallel_degree",
+			"Number of parallel processes per executor node wanted for this relation.",
+			RELOPT_KIND_HEAP,
+			AccessExclusiveLock
+		},
+		-1, 1, INT_MAX
+	},

I think here min and max values should be same as for max_parallel_degree
(I have verified that for some of the other reloption parameters, min and
max are same as their guc values); Is there a reason to keep them different?

3.
@@ -1291,7 +1300,9 @@ default_reloptions(Datum reloptions, bool validate,
relopt_kind kind)

Comment on top of this function says:
/*
 * Option parser for anything that uses StdRdOptions (i.e. fillfactor and
 * autovacuum)
 */

I think it is better to include parallel_degree in above comment along with
fillfactor and autovacuum.


4.
/*
+ * RelationGetMaxParallelDegree
+ *		Returns the relation's parallel_degree.  Note multiple eval of argument!
+ */
+#define RelationGetParallelDegree(relation, defaultmpd) \
+	((relation)->rd_options ? \
+	 ((StdRdOptions *) (relation)->rd_options)->parallel_degree : (defaultmpd))
+

There are minor in-consistencies in the above macro definition.

a. RelationGetMaxParallelDegree - This should be RelationGetParallelDegree.
b. defaultmpd - it is better to name it as defaultpd
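As an aside, the "Note multiple eval of argument!" warning in the macro's comment can be demonstrated with a simplified stand-in (toy types, not the real StdRdOptions/Relation structs):

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoOptions
{
	int			parallel_degree;
} DemoOptions;

typedef struct DemoRelation
{
	DemoOptions *rd_options;	/* NULL means "use the default" */
} DemoRelation;

/* Note multiple eval of argument!  'relation' appears twice in the
 * expansion, so an argument with side effects runs them twice. */
#define RelationGetParallelDegree(relation, defaultpd) \
	((relation)->rd_options ? \
	 (relation)->rd_options->parallel_degree : (defaultpd))

/* Instrumented pass-through used to count evaluations. */
int			eval_count = 0;

DemoRelation *
counted(DemoRelation *rel)
{
	eval_count++;
	return rel;
}
```

Callers therefore must not pass expressions with side effects — the same convention the comment on the real macro is flagging.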


>
>
> The feature freeze is now very close.  If this GUC is still wanted,
> should I add this patch to the next commitfest?
>

I am hoping that this will be committed to 9.6, but I think it is good to
register it in next CF.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Timeline following for logical slots

2016-04-04 Thread Andres Freund
On 2016-04-04 14:36:29 +0800, Craig Ringer wrote:
> On 4 April 2016 at 14:30, Andres Freund  wrote:
> 
> > On 2016-04-04 14:24:52 +0800, Craig Ringer wrote:
> > > I don't feel like I've grasped this properly yet. I think it's referring
> > to
> > > the pg_logical/snapshots/ serialization, the use of which allows us to
> > > avoid doing extra work in SnapBuildFindSnapshot(...), but doesn't seem to
> > > be crucial for correct function. After all, decoding still restarts at
> > the
> > > restart_lsn and feeds relevant xact info into the snapshot builder,
> > > accumulates invalidation information, etc.
> >
> > restart_lsn is set to the last known point where a) all changes for
> > ongoing transactions are available b) we can re-build visibility
> > information when we start reading from there.
> >
> > We essentially can only start determining visibility information when
> > processing an xl_running_xacts record, so building visibility
> > information means that there has to be an xl_running_xacts record to
> > start from. To build full visibility information we also have to wait till we
> > have seen all in-progress transactions finish. So we dump visibility
> > information every now and then, so we can re-use the information we'd
> > already assembled.
> >
> 
> OK, makes sense. And still resume decoding from restart_lsn, despite having
> that visibility information stashed, because we also have to rebuild the
> information on invalidations for running xacts, which is not stored
> persistently anywhere as decoding progresses. So for now at least it's an
> optimisation to store the visibility info, since we still have to go back
> and decode for invalidations anyway. Right?

Not really no. The important point isn't invalidation or anything. It's
that we don't have the content & metadata of the transactions
themselves.  Yes, we also do re-read invalidations, but that's just a
side effect.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Timeline following for logical slots

2016-04-04 Thread Andres Freund
On 2016-04-04 14:18:53 +0800, Craig Ringer wrote:
> I want to go over the rest of your reply in more detail, but regarding this
> and the original comment, I'm going by:
> 
> if (start_lsn == InvalidXLogRecPtr)
> {
> /* continue from last position */
> start_lsn = slot->data.confirmed_flush;
> }
> 
> ctx = StartupDecodingContext(output_plugin_options,
>  start_lsn, InvalidTransactionId,
>  read_page, prepare_write, do_write);
> 
> ...  in CreateDecodingContext(), which in turn does ...
> 
> ctx->snapshot_builder =
> AllocateSnapshotBuilder(ctx->reorder, xmin_horizon, start_lsn);
> 
> ... in StartupDecodingContext(...).
> 
> The snapshot builder controls when the snapshot builder returns
> SNAPBUILD_CONSISTENT, therefore when we start returning decoded commits to
> the client. Right?

> We pass InvalidXLogRecPtr as the start_lsn
> in pg_logical_slot_get_changes_guts(...).
> 
> So, when I wrote that:
> 
> /*
>  * We start reading xlog from the restart lsn, even though in
>  * CreateDecodingContext we set the snapshot builder up using the
>  * slot's confirmed_flush. This means we might read xlog we don't
>  * actually decode rows from, but the snapshot builder might need it
>  * to get to a consistent point. The point we start returning data
> to
>  * *users* at is the confirmed_flush lsn set up in the decoding
>  * context.
>  */
> 
> I don't see what's wrong there at all.

They're independent. What the snapshot builder gets as the start
position isn't the same as where we start reading WAL.


> We do "start reading xlog from the slot's restart_lsn". That's what gets
> passed to set up the xlogreader. That's where we read the first xlog
> records from.
> 
> In CreateDecodingContext we _do_ "set the snapshot builder up using the
> slot's confirmed_flush" [unless overridden by explicit argument to
> CreateDecodingContext, which isn't given here].

Yes. But again, that hasn't really got anything to do with where we read
WAL from. We always have to read older WAL than where confirmed_flush is
set to, in order to assemble all the changes from transactions that are
in-progress at the point of confirmed_flush.


> This is presumably what you're taking issue with:
> 
>  * This means we might read xlog we don't
>  * actually decode rows from, but the snapshot builder might need it
>  * to get to a consistent point.
> 
> and would be better worded as something like:
> 
> This means we will read and decode xlog from before the point we actually
> start returning decoded commits to the client, but the snapshot builder may
> need this additional xlog to get to a consistent point.

> right?

No. The snapshot builder really hasn't gotten that much to do with
things. The fundamental thing we have to do is re-assemble all
in-progress transactions (as in reorderbuffer.c, not snapbuild.c).


> I think
> 
> The point we start returning data to
>  * *users* at is the confirmed_flush lsn set up in the decoding
>  * context.
> 
> is still right.
> 
> What I was proposing instead is, in pg_logical_slot_get_changes_guts,
> changing:
> 
> ctx = CreateDecodingContext(InvalidXLogRecPtr,
> options,
> logical_read_local_xlog_page,
> LogicalOutputPrepareWrite,
> LogicalOutputWrite);
> 
> so that instead of InvalidXLogRecPtr we explicitly pass
> 
>MyReplicationSlot->data.confirmed_flush;
> 
> thus making it way easier to see what's going on without having to chase
> way deeper to figure out why data isn't returned from the restart_lsn.
> Where, y'know, you'd expect decoding to restart from... and where it does
> restart from, just not where it resumes sending output to the client from.

I just don't buy this. It's absolutely fundamental to understand that
the commit LSN isn't the point where all the data of transactions is
stored.


Andres



