Re: [HACKERS] Add support for restrictive RLS policies

2016-09-27 Thread Jeevan Chalke
Hello Stephen,

On Tue, Sep 27, 2016 at 12:57 AM, Stephen Frost  wrote:

> Jeevan,
>
> * Jeevan Chalke (jeevan.cha...@enterprisedb.com) wrote:
> > I have started reviewing this patch and here are couple of points I have
> > observed so far:
> >
> > 1. Patch applies cleanly
> > 2. make / make install / initdb all good.
> > 3. make check (regression) FAILED. (Attached diff file for reference).
>
> I've re-based my patch on top of current head and still don't see the
> failures which you are getting during the regression tests.  Is it
> possible you were doing the tests without a full rebuild of the source
> tree..?
>
> Can you provide details of your build/test environment and the full
> regression before and after output?
>

I still get the same failures with the latest sources and the new patch. Here are
a few details of my setup. Let me know if I missed anything.

$ uname -a
Linux centos7 3.10.0-327.28.3.el7.x86_64 #1 SMP Thu Aug 18 19:05:49 UTC
2016 x86_64 x86_64 x86_64 GNU/Linux

HEAD at
commit 51c3e9fade76c12e4aa37bffdf800bbf74fb3fb1

configure switches:
./configure --with-openssl --with-tcl --with-perl --with-python
--with-ossp-uuid --with-ldap --with-pam --with-zlib --with-pgport=5432
--enable-depend --enable-debug --enable-cassert --prefix=`pwd`/install
CFLAGS="-g -O0"

Regression FAILED. Regression diff is same as previous one.

Without patch I don't get any regression failure.

Well, I could not restrain myself from debugging this mystery and was finally
able to find the reason why it is failing. It was strange that it did not
crash and simply gave different results.

With this patch, the pg_policy catalog now has seven columns; however,
Natts_pg_policy is still set to 6. It should be updated to 7 now.
With that change, the regression tests pass.
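
For reference, a quick sanity-check sketch from SQL of the catalog's actual
column count (the Natts_pg_policy constant itself lives in
src/include/catalog/pg_policy.h):

SELECT count(*) AS pg_policy_columns
  FROM pg_attribute
 WHERE attrelid = 'pg_policy'::regclass AND attnum > 0;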

I am reviewing the latest patch in detail now and will post my review
comments later.

Thanks


>
> Thanks!
>
> Stephen
>



-- 
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] Fix some corner cases that cube_in rejects

2016-09-27 Thread Amul Sul
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

Note for committer: there are unnecessary files (cube_1.out, cube_2.out &
cube_3.out) in the contrib directory that need to be removed at check-in.
Other than that concern, I think this patch looks pretty reasonable. Thanks.

Regards,
Amul

The new status of this patch is: Ready for Committer



Re: [HACKERS] Refactoring speculative insertion with unique indexes a little

2016-09-27 Thread Heikki Linnakangas

On 09/24/2016 02:34 PM, Peter Geoghegan wrote:

On Tue, Sep 20, 2016 at 8:55 AM, Heikki Linnakangas  wrote:

Likewise, in ExecInsertIndexTuples(), this makes the deferred-index
case work slightly differently from speculative insertion. It's not a big
difference, but it again forces you to keep one more scenario in mind when
reading the code.


This actually does have useful practical consequences, although I
didn't point that out earlier (though I should have). To see what I
mean, consider the complaint in this recent thread, which is based on
an actual user application problem:

https://www.postgresql.org/message-id/flat/1472134358.649524...@f146.i.mail.ru#1472134358.649524...@f146.i.mail.ru

I go on to explain how this patch represents a partial solution to
that [1]. That's what I mean by "useful practical consequences". As I
say in [1], I think we could even get a full solution, if we applied
this patch and *also* made the ordering in which the relcache returns
a list of index OIDs more useful (it should still be based on
something stable, to avoid deadlocks, but more than just OID order.
Instead, relcache.c can sort indexes such that we insert into primary
keys first, then unique indexes, then all other indexes. This also
avoids bloat if there is a unique violation, by getting unique indexes
out of the way first during ExecInsert()).


Hmm, yeah, that'd be nice to fix. I'd like to see a patch specifically
to fix that. I'm not convinced that we need all the complications of
this patch to get that fixed. (In particular, indexams still wouldn't
need to care about the difference between CHECK_UNIQUE_PARTIAL and
CHECK_UNIQUE_SPECULATIVE, I think.)


- Heikki





Re: [HACKERS] Refactoring speculative insertion with unique indexes a little

2016-09-27 Thread Peter Geoghegan
On Tue, Sep 27, 2016 at 9:10 AM, Heikki Linnakangas  wrote:
> Hmm, yeah, that'd be nice to fix. I'd like to see a patch specifically to
> fix that. I'm not convinced that we need all the complications of this
> patch to get that fixed. (In particular, indexams still wouldn't need to
> care about the difference between CHECK_UNIQUE_PARTIAL and
> CHECK_UNIQUE_SPECULATIVE, I think.)

But you are convinced that everything else I describe would do it?


-- 
Peter Geoghegan




Re: [HACKERS] Refactoring speculative insertion with unique indexes a little

2016-09-27 Thread Heikki Linnakangas

On 09/27/2016 11:22 AM, Peter Geoghegan wrote:

On Tue, Sep 27, 2016 at 9:10 AM, Heikki Linnakangas  wrote:

Hmm, yeah, that'd be nice to fix. I'd like to see a patch specifically to
fix that. I'm not convinced that we need all the complications of this
patch to get that fixed. (In particular, indexams still wouldn't need to
care about the difference between CHECK_UNIQUE_PARTIAL and
CHECK_UNIQUE_SPECULATIVE, I think.)


But you are convinced that everything else I describe would do it?


I didn't think very hard about it, but yeah, think so..

- Heikki





Re: [HACKERS] Refactoring speculative insertion with unique indexes a little

2016-09-27 Thread Peter Geoghegan
On Tue, Sep 27, 2016 at 9:25 AM, Heikki Linnakangas  wrote:
> I didn't think very hard about it, but yeah, think so..

Do you think it's worth a backpatch? Or, too early to tell?


-- 
Peter Geoghegan




Re: [HACKERS] Declarative partitioning - another take

2016-09-27 Thread Amit Langote

Sorry about the delay in replying.

On 2016/09/15 21:58, Ashutosh Bapat wrote:
> Hi Amit,
> 
> It looks like there is some problem while creating paramterized paths
> for multi-level partitioned tables. Here's a longish testcase
> 
> [ ... ]
> 
> Please check if you are able to reproduce these errors in your
> repository. I made sure that I cleaned up all partition-wise join code
> before testing this, but ... .

Thanks for the test case.  I can reproduce the problem.

> I tried to debug the problem somewhat. In set_append_rel_pathlist(),
> it finds that at least one child has a parameterized path as the
> cheapest path, so it doesn't create an unparameterized path for append
> rel. At the same time there is a parameterization common to all the
> children, so it doesn't create any path. There seem to be two problems
> here
> 1. The children from second level onwards may not be getting
> parameterized for lateral references. That seems unlikely but
> possible.
> 2. Reparameterization should have corrected this, but
> reparameterize_path() does not support AppendPaths.

Hmm, 0005-Refactor-optimizer-s-inheritance-set-expansion-code-5.patch is
certainly to blame here; if I revert that patch, the problem goes away.

Based on point 2 above, I attempted to add logic for AppendPath in
reparameterize_path() as in the attached.  It fixes the reported problem
and does not break any regression tests.  If it's sane to do things this
way, I will incorporate the attached into patch 0005 mentioned above.
Thoughts?

Thanks,
Amit
commit 86c8fc2c268d17fb598bf6879cfef2b2a2f42417
Author: amit 
Date:   Tue Sep 27 16:02:54 2016 +0900

Handle AppendPath in reparameterize_path().

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 2a49639..750f0ea 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1570,6 +1570,40 @@ cost_sort(Path *path, PlannerInfo *root,
 }
 
 /*
+ * cost_append
+ *	  Determines and returns the cost of a Append node
+ *
+ * Compute rows and costs as sums of subplan rows and costs.  We charge
+ * nothing extra for the Append itself, which perhaps is too optimistic,
+ * but since it doesn't do any selection or projection, it is a pretty
+ * cheap node.
+ */
+void
+cost_append(AppendPath *apath, Relids required_outer)
+{
+	ListCell   *l;
+
+	apath->path.rows = 0;
+	apath->path.startup_cost = 0;
+	apath->path.total_cost = 0;
+	foreach(l, apath->subpaths)
+	{
+		Path	   *subpath = (Path *) lfirst(l);
+
+		apath->path.rows += subpath->rows;
+
+		if (l == list_head(apath->subpaths))	/* first node? */
+			apath->path.startup_cost = subpath->startup_cost;
+		apath->path.total_cost += subpath->total_cost;
+		apath->path.parallel_safe = apath->path.parallel_safe &&
+			subpath->parallel_safe;
+
+		/* All child paths must have same parameterization */
+		Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
+	}
+}
+
+/*
  * cost_merge_append
  *	  Determines and returns the cost of a MergeAppend node.
  *
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index abb7507..bf8fb55 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1206,7 +1206,6 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
    int parallel_workers)
 {
 	AppendPath *pathnode = makeNode(AppendPath);
-	ListCell   *l;
 
 	pathnode->path.pathtype = T_Append;
 	pathnode->path.parent = rel;
@@ -1220,32 +1219,7 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
 		 * unsorted */
 	pathnode->subpaths = subpaths;
 
-	/*
-	 * We don't bother with inventing a cost_append(), but just do it here.
-	 *
-	 * Compute rows and costs as sums of subplan rows and costs.  We charge
-	 * nothing extra for the Append itself, which perhaps is too optimistic,
-	 * but since it doesn't do any selection or projection, it is a pretty
-	 * cheap node.
-	 */
-	pathnode->path.rows = 0;
-	pathnode->path.startup_cost = 0;
-	pathnode->path.total_cost = 0;
-	foreach(l, subpaths)
-	{
-		Path	   *subpath = (Path *) lfirst(l);
-
-		pathnode->path.rows += subpath->rows;
-
-		if (l == list_head(subpaths))	/* first node? */
-			pathnode->path.startup_cost = subpath->startup_cost;
-		pathnode->path.total_cost += subpath->total_cost;
-		pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
-			subpath->parallel_safe;
-
-		/* All child paths must have same parameterization */
-		Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
-	}
+	cost_append(pathnode, required_outer);
 
 	return pathnode;
 }
@@ -3204,6 +3178,41 @@ reparameterize_path(PlannerInfo *root, Path *path,
 		 spath->path.pathkeys,
 		 required_outer);
 			}
+		case T_Append:
+			{
+				AppendPath *apath = (AppendPath *) path;
+				AppendPath *newpath;
+				List	   *new_subpaths = NIL;
+				ListCell   *lc;
+
+				foreach(lc, apath->subpaths)
+				{
+	

Re: [HACKERS] Refactoring speculative insertion with unique indexes a little

2016-09-27 Thread Heikki Linnakangas

On 09/27/2016 11:38 AM, Peter Geoghegan wrote:

On Tue, Sep 27, 2016 at 9:25 AM, Heikki Linnakangas  wrote:

I didn't think very hard about it, but yeah, think so..


Do you think it's worth a backpatch? Or, too early to tell?


Too early to tell..

- Heikki





Re: [HACKERS] Declarative partitioning - another take

2016-09-27 Thread Ashutosh Bapat
>>
>> Please check if you are able to reproduce these errors in your
>> repository. I made sure that I cleaned up all partition-wise join code
>> before testing this, but ... .
>
> Thanks for the test case.  I can reproduce the same.
>
>> I tried to debug the problem somewhat. In set_append_rel_pathlist(),
>> it finds that at least one child has a parameterized path as the
>> cheapest path, so it doesn't create an unparameterized path for append
>> rel. At the same time there is a parameterization common to all the
>> children, so it doesn't create any path. There seem to be two problems
>> here
>> 1. The children from second level onwards may not be getting
>> parameterized for lateral references. That seems unlikely but
>> possible.

Did you check this? We may be failing to create index scan paths
with the needed parameterization. If we fix this, we don't need to
re-parameterize Append.

>> 2. Reparameterization should have corrected this, but
>> reparameterize_path() does not support AppendPaths.
>
> Hmm, 0005-Refactor-optimizer-s-inheritance-set-expansion-code-5.patch is
> certainly to be blamed here; if I revert the patch, the problem goes away.
>
> Based on 2 above, I attempted to add logic for AppendPath in
> reparameterize_path() as in the attached.  It fixes the reported problem
> and does not break any regression tests. If it's sane to do things this
> way, I will incorporate the attached into patch 0005 mentioned above.
> Thoughts?


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Declarative partitioning - another take

2016-09-27 Thread Amit Langote
On 2016/09/27 15:44, Ashutosh Bapat wrote:
>> By the way, I fixed one thinko in your patch as follows:
>>
>> -result->oids[i] = oids[mapping[i]];
>> +result->oids[mapping[i]] = oids[i];
> 
> While I can not spot any problem with this logic, when I make that
> change and run partition_join testcase in my patch, it fails because
> wrong partitions are matched for partition-wise join of list
> partitions. In that patch, RelOptInfo of partitions are saved in
> RelOptInfo of the parent by matching their OIDs. They are saved in the
> same order as corresponding OIDs. Partition-wise join simply joins the
> RelOptInfos at the same positions from both the parent RelOptInfos. I
> can not spot an error in this logic too.

OTOH, using the original logic makes tuple routing put tuples into the
wrong partitions.  When debugging why that was happening I discovered this
and hence the proposed change.

You mean that partition RelOptInfo's are placed using the canonical
ordering of OIDs instead of catalog-scan-driven order, right?  If that's
true, then there is no possibility of wrong pairing happening, even with
the new ordering of OIDs in the partition descriptor (ie, the ordering
that would be produced by my proposed method above).

Thanks,
Amit






Re: [HACKERS] Declarative partitioning - another take

2016-09-27 Thread Ashutosh Bapat
On Tue, Sep 27, 2016 at 2:46 PM, Amit Langote
 wrote:
> On 2016/09/27 15:44, Ashutosh Bapat wrote:
>>> By the way, I fixed one thinko in your patch as follows:
>>>
>>> -result->oids[i] = oids[mapping[i]];
>>> +result->oids[mapping[i]] = oids[i];
>>
>> While I can not spot any problem with this logic, when I make that
>> change and run partition_join testcase in my patch, it fails because
>> wrong partitions are matched for partition-wise join of list
>> partitions. In that patch, RelOptInfo of partitions are saved in
>> RelOptInfo of the parent by matching their OIDs. They are saved in the
>> same order as corresponding OIDs. Partition-wise join simply joins the
>> RelOptInfos at the same positions from both the parent RelOptInfos. I
>> can not spot an error in this logic too.
>
> OTOH, using the original logic makes tuple routing put tuples into the
> wrong partitions.  When debugging why that was happening I discovered this
> and hence the proposed change.
>
> You mean that partition RelOptInfo's are placed using the canonical
> ordering of OIDs instead of catalog-scan-driven order, right?  If that's
> true, then there is no possibility of wrong pairing happening, even with
> the new ordering of OIDs in the partition descriptor (ie, the ordering
> that would be produced by my proposed method above).

Right! I don't know what's wrong; I will debug my changes.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




[HACKERS] Re: [GENERAL] inconsistent behaviour of set-returning functions in sub-query with random()

2016-09-27 Thread Tom van Tilburg
Good to know, and I agree that it is not an urgent case.
I think this practice might be more common in the PostGIS community, where
there are plenty of set-returning functions used in this way. My use was
taking a random sample of a point-cloud distribution.

I took the liberty of posting your answer on Stack Exchange.

thanks,
 Tom

On Mon, 26 Sep 2016 at 21:38 Tom Lane  wrote:

> Tom van Tilburg  writes:
> > I'm often using the WHERE clause random() > 0.5 to pick a random subset
> of
> > my data. Now I noticed that when using a set-returning function in a
> > sub-query, I either get the whole set or none (meaning that the WHERE
> > random() > 0.5 clause is interpreted *before* the set is being
> generated).
> > e.g.:
> >
> > SELECT num FROM (
> > SELECT unnest(Array[1,2,3,4,5,6,7,8,9,10]) num) AS foo WHERE
> random() > 0.5;
>
> Hmm, I think this is an optimizer bug.  There are two legitimate behaviors
> here:
>
> SELECT * FROM unnest(ARRAY[1,2,3,4,5,6,7,8,9,10]) WHERE random() > 0.5;
>
> should (and does) re-evaluate the WHERE for every row output by unnest().
>
> SELECT unnest(ARRAY[1,2,3,4,5,6,7,8,9,10]) WHERE random() > 0.5;
>
> should evaluate WHERE only once, since that happens before expansion of the
> set-returning function in the targetlist.  (If you're an Oracle user and
> you imagine this query as having an implicit "FROM dual", the WHERE should
> be evaluated for the single row coming out of the FROM clause.)
>
> In the case you've got here, given the placement of the WHERE in the outer
> query, you'd certainly expect it to be evaluated for each row coming out
> of the inner query.  But the optimizer is deciding it can push the WHERE
> clause down to become a WHERE of the sub-select.  That is legitimate in a
> lot of cases, but not when there are SRF(s) in the sub-select's
> targetlist, because that pushes the WHERE to occur before the SRF(s),
> analogously to the change between the two queries I wrote.
>
> I'm a bit hesitant to change this in existing releases.  Given the lack
> of previous complaints, it seems more likely to break queries that were
> behaving as-expected than to make people happy.  But we could change it
> in v10 and up, especially since some other corner-case changes in
> SRF-in-tlist behavior are afoot.
>
> In the meantime, you could force it to work as you wish by inserting the
> all-purpose optimization fence "OFFSET 0" in the sub-select:
>
> =# SELECT num FROM (
> SELECT unnest(Array[1,2,3,4,5,6,7,8,9,10]) num OFFSET 0) AS foo WHERE
> random() > 0.5;
>  num
> -
>1
>4
>7
>9
> (4 rows)
>
>
> regards, tom lane
>
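
For reference, a minimal sketch of the per-row alternative Tom describes
above (moving the set-returning function into FROM, so the WHERE clause is
re-evaluated for every row):

SELECT num
  FROM unnest(ARRAY[1,2,3,4,5,6,7,8,9,10]) AS t(num)
 WHERE random() > 0.5;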


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-09-27 Thread Victor Wagner
On Sun, 25 Sep 2016 17:31:53 +0530
Mithun Cy  wrote:

> I have some more comments on libpq-failover-8.patch
> 
> + /* FIXME need to check that port is numeric */
> 
> Is this still applicable?.
> 

Unfortunately, it was. I've fixed this problem in the 9th version of the
patch (attached).
> 
> I think we need comments to know why we change default value based on
> number of elements in connection string. why default in not “any"
> when node count > 1.

Fixed.

> 
> + /* loop over all the host specs in the node variable */
> 
> + for (node = nodes; node->host != NULL || node->port != NULL; node++)
> 
>   {
> 
> I think instead of loop if we can move these code into a separate
> function say pg_add_to_addresslist(host, port, &addrs) this helps in
> removing temp variables like "struct node info” and several heap
> calls around it.

For some reason DNS resolution was concentrated in one place before my
changes, so I've tried not to change that decision.

 
> 3)
> 
> +static char *
> 
> +get_port_from_addrinfo(struct addrinfo * ai)
> 
> 
> Need some comments for this function.

Done.
 
> We use strdup in many places but nowhere do we handle memory allocation
> failure.

Added some more memory allocation error checks.
> 
> Comments not in sync with code.

Fixed.

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 4e34f00..f0df877 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -792,7 +792,7 @@ host=localhost port=5432 dbname=mydb connect_timeout=10

The general form for a connection URI is:
 
-postgresql://[user[:password]@][netloc][:port][/dbname][?param1=value1&...]
+postgresql://[user[:password]@][netloc][:port][,netloc[:port]...][/dbname][?param1=value1&...]
 

 
@@ -809,6 +809,7 @@ postgresql://localhost/mydb
 postgresql://user@localhost
 postgresql://user:secret@localhost
 postgresql://other@localhost/otherdb?connect_timeout=10&application_name=myapp
+postgresql://node1,node2:5433,node3:4432,node4/mydb?hostorder=random&target_server_type=any
 
 Components of the hierarchical part of the URI can also
 be given as parameters.  For example:
@@ -831,7 +832,9 @@ postgresql:///mydb?host=localhost&port=5433

 For improved compatibility with JDBC connection URIs,
 instances of parameter ssl=true are translated into
-sslmode=require.
+sslmode=require and
+loadBalanceHosts=true  into
+hostorder=random.

 

@@ -841,6 +844,10 @@ postgresql:///mydb?host=localhost&port=5433
 postgresql://[2001:db8::1234]/database
 

+   
+   There can be several host specifications, each optionally accompanied
+   by a port, separated by commas.
+   
 

 The host component is interpreted as described for the parameter PostgreSQL was built). On machines without
 Unix-domain sockets, the default is to connect to localhost.

+   
+   There can be more than one host parameter in
+   the connection string. In this case these hosts are considered
+   alternate entries for the same database, and if connecting to the first
+   one fails, the library tries to connect to the second, and so on. This
+   can be used for a high-availability cluster or for load balancing. See
+parameter.
+   
+   
+   A network host name can be accompanied by a port number, separated by a
+   colon. If so, this port number is used only when connecting to
+   this host. If there is no port number, the port specified in the
+parameter would be used.
+   
   
  
 
@@ -942,8 +963,44 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname


   
-
+
+  
+  hostorder
+  
+  
+  Specifies how to choose a host from the list of alternate hosts
+  specified in the  parameter.
+  
+  
+  If the value of this argument is sequential (the
+  default), the library connects to the hosts in the order they are
+  specified, and tries the second one only if the connection to the
+  first fails.
+  
+  
+  If the value is random, the host to connect to is
+  picked randomly from the list. This allows load to be balanced
+  between several cluster nodes. However, PostgreSQL currently doesn't
+  support multimaster clusters, so without third-party products only
+  read-only connections can take advantage of the load balancing.
+  See 
+  
+  
+  
+  
+  target_server_type
+  
+  
+  If this parameter is master, then upon successful
+  connection the library checks whether the host is in recovery state,
+  and if so, tries the next host in the connection string. If this
+  parameter is any, connections to standby nodes are
+  considered successful.
+  
+  
+  
   
+
port


@@ -985,7 +1042,6 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
   
  
-
  
   connect_timeout
   
@@ -996,7 +1052,27 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
   
  
-
+ 
+ failover_t

Re: [HACKERS] Transactions involving multiple postgres foreign servers

2016-09-27 Thread Masahiko Sawada
On Mon, Sep 26, 2016 at 9:07 PM, Ashutosh Bapat
 wrote:
> On Mon, Sep 26, 2016 at 5:25 PM, Masahiko Sawada  
> wrote:
>> On Mon, Sep 26, 2016 at 7:28 PM, Ashutosh Bapat
>>  wrote:
>>> My original patch added code to manage the files for 2 phase
>>> transactions opened by the local server on the remote servers. This
>>> code was mostly inspired from the code in twophase.c which manages the
>>> file for prepared transactions. The logic to manage 2PC files has
>>> changed since [1] and has been optimized. One of the things I wanted
>>> to do is see, if those optimizations are applicable here as well. Have
>>> you considered that?
>>>
>>>
>>
>> Yeah, we're considering it.
>> After these changes are committed, we will post the patch incorporated
>> these changes.
>>
>> But what we need to do first is the discussion in order to get consensus.
>> Since current design of this patch is to transparently execute DCL of
>> 2PC on foreign server, this code changes lot of code and is
>> complicated.
>
> Can you please elaborate. I am not able to understand what DCL is
> involved here. According to [1], examples of DCL are GRANT and REVOKE
> command.

I meant transaction management commands such as PREPARE TRANSACTION and
COMMIT/ABORT PREPARED.
The web page I referred to might be wrong, sorry.

>> Another approach I have is to push down DCL to only foreign servers
>> that support 2PC protocol, which is similar to DML push down.
>> This approach would be more simpler than current idea and is easy to
>> use by distributed transaction manager.
>
> Again, can you please elaborate, how that would be different from the
> current approach and how does it simplify the code.
>

The idea is just to push down PREPARE TRANSACTION, COMMIT/ROLLBACK
PREPARED to foreign servers that support 2PC.
With this idea, the client need to do following operation when foreign
server is involved with transaction.

BEGIN;
UPDATE parent_table SET ...; -- update including foreign server
PREPARE TRANSACTION 'xact_id';
COMMIT PREPARED 'xact_id';

The above PREPARE TRANSACTION and COMMIT PREPARED commands are pushed
down to the foreign server.
That is, the client needs to execute PREPARE TRANSACTION and
COMMIT/ROLLBACK PREPARED explicitly.

With this idea, I think that we don't need to do the following:

* Providing the prepare id of 2PC.
  The current patch adds a new API prepare_id_provider(), but we can use
the prepare id of 2PC that is used on the parent server.

* Keeping track of the status of foreign servers.
  The current patch keeps track of the status of foreign servers involved
in the transaction, but this idea just pushes the transaction management
commands down to the foreign servers.
  So I think that we no longer need to do that.

* Adding the max_prepared_foreign_transactions parameter.
  It means that the number of transactions involving foreign servers is
the same as max_prepared_transactions.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-09-27 Thread Kyotaro HORIGUCHI
Hi,

I apologize in advance that the comments in this message might be
among the ideas discarded in the past thread... I might not grasp
the discussion completely X(

The attached patches are rebased to the master and additional one
mentioned below.

At Wed, 18 May 2016 17:57:49 -0400, Michael Paquier  
wrote in 

> A couple of months back is has been reported to pgsql-bugs that WAL
> segments were always switched with a low value of archive_timeout even
> if a system is completely idle:
> http://www.postgresql.org/message-id/20151016203031.3019.72...@wrigleys.postgresql.org
> In short, a closer look at the problem has showed up that the logic in
> charge of checking if a checkpoint should be skipped or not is
> currently broken, because it completely ignores standby snapshots in
> its calculation of the WAL activity. So a checkpoint always occurs
> after checkpoint_timeout on an idle system since hot_standby has been
> introduced as wal_level. This did not get better from 9.4, since
> standby snapshots are logged every 15s by the background writer
> process. In 9.6, since wal_level = 'archive' and 'hot_standby'
> actually has the same meaning, the skip logic that worked with
> wal_level = 'archive' does not do its job anymore.
> 
> One solution that has been discussed is to track the progress of WAL
> activity when doing record insertion by being able to mark some
> records as not updating the progress of WAL. Standby snapshot records
> enter in this category, making the checkpoint skip logic more robust.
> Attached is a patch implementing a solution for it, by adding in

If I understand the old thread correctly, this still doesn't
solve the main issue, excessive checkpoints and segment
switching. The reason is the interaction between XLOG_SWITCH and
checkpoints, as mentioned there. I think we may treat XLOG_SWITCH
as NO_PROGRESS, since we can recover to the latest state without
it. If that is not wrong, the second attached patch (v12-2) inserts
XLOG_SWITCH as NO_PROGRESS and skips segment switching when no
progress took place in the round.

> WALInsertLock a new field that gets updated for each record to track
> the LSN progress. This allows to reliably skip the generation of
> standby snapshots in the bgwriter or checkpoints on an idle system.

WALInsertLock doesn't seem to me to be a good place for
progressAt, even considering the discussion about adding a few
instructions (containing a branch) in the hot section.
BackgroundWriterMain blocks all running XLogInsertRecord calls
every 200 ms, not every 15 or 30 seconds (only for replica, though).
If this is correct, Amit's suggestion below has more significance,
and I rather agree with it. XLogCtl is a more suitable place for
progressAt in that case.

https://www.postgresql.org/message-id/caa4ek1lb9hdq+f8lw8bgrqx6sw42xaikx1uq2dzk+auegbf...@mail.gmail.com
Amit> Taking and releasing locks again and again (which is done in patch)
Amit> matters much more than adding few instructions under it and I think
Amit> if you would have written the code in-a-way as in patch in some of
Amit> the critical path, it would have been regressed badly, but because
Amit> checkpoint doesn't happen often, reproducing regression is difficult.


https://www.postgresql.org/message-id/cab7npqso6hvj0t6lut84pcy+_ihitdt64ze2d+sjrhzy72y...@mail.gmail.com
> > Also, I think it is worth to once take the performance data for
> > write tests (using pgbench 30 minute run or some other way) with
> > minimum checkpoint_timeout (i.e 30s) to see if the additional locking
> > has any impact on performance.  I think taking locks at intervals
> > of 15s or 30s should not matter much, but it is better to be safe.
> 
> I don't think performance will be impacted, but there are no reasons
> to not do any measurements either. I'll try to get some graphs
> tomorrow with runs on my laptop, mainly looking for any effects of
> this patch on TPS when checkpoints show up.

I don't think the impact is measurable on a laptop, where 2 to 4
cores are available at most.

> Per discussion with Andres at PGcon, we decided that this is an
> optimization, only for 9.7~ because this has been broken for a long
> time. I have also changed XLogIncludeOrigin() to use a more generic
> routine to set of status flags for a record being inserted:
> XLogSetFlags(). This routine can use two flags:
> - INCLUDE_ORIGIN to decide if the origin should be logged or not
> - NO_PROGRESS to decide at insertion if a record should update the LSN
> progress or not.
> Andres mentioned me that we'd want to have something similar to
> XLogIncludeOrigin, but while hacking I noticed that grouping both
> things under the same umbrella made more sense.

This looks reasonable.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 686e4981c0d7ab3dd9e919f8b203aeb475f89a3b Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 27 Sep 2016 10:19:55 +0900
Subject: [PATCH 1/3] hs-checkpoints-v12-1

Rebased version of v11-1
---
 src/backend/access/hea

Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-27 Thread Dilip Kumar
On Wed, Sep 21, 2016 at 8:47 AM, Dilip Kumar  wrote:
> Summary:
> --
> At 32 clients no gain, I think at this workload Clog Lock is not a problem.
> At 64 Clients we can see ~10% gain with simple update and ~5% with TPCB.
> At 128 Clients we can see > 50% gain.
>
> Currently I have tested with synchronous commit=off, later I can try
> with on. I can also test at 80 client, I think we will see some
> significant gain at this client count also, but as of now I haven't
> yet tested.
>
> With above results, what we think ? should we continue our testing ?

I have done further testing with the TPCB workload to see the impact on
the performance gain from increasing the scale factor.

Again, at 32 clients there is no gain, but at 64 clients the gain is 12% and
at 128 clients it's 75%. This shows that the improvement with the group lock
is better at a higher scale factor (at scale factor 300 the gain was 5% at
64 clients and 50% at 128 clients).

8-socket machine (kernel 3.10)
10-minute run (median of 3 runs)
synchronous_commit = off
scale factor = 1000
shared_buffers = 40GB

Test results:


client      head     group lock
32          27496    27178
64          31275    35205
128         20656    34490


LWLOCK_STATS approx. block count on ClogControl Lock ("lwlock main 11")

client      head     group lock
32          8        6
64          15       10
128         14       7

Note: these are approximate block counts. I have the detailed LWLOCK_STATS
results, in case someone wants to look into them.


LWLOCK_STATS shows that ClogControl lock block count reduced by 25% at
32 client, 33% at 64 client and 50% at 128 client.

Conclusion:
1. I think both the LWLOCK_STATS and the performance data show that we get
a significant contention reduction on ClogControlLock with the patch.
2. It also shows that, though we are not seeing any performance gain at
32 clients, there is still a contention reduction with the patch.

I am planning to do some more tests with a higher scale factor (3000 or more).

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] PL/Python adding support for multi-dimensional arrays

2016-09-27 Thread Dave Cramer
On 26 September 2016 at 14:52, Dave Cramer  wrote:

>
>
>
>>
>> This crashes with arrays with non-default lower bounds:
>>
>> postgres=# SELECT * FROM test_type_conversion_array_int
>> 4('[2:4]={1,2,3}');
>> INFO:  ([1, 2, ], )
>> server closed the connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>>
>> Attached patch fixes this bug, and adds a test for it.
>
>>
>> I'd like to see some updates to the docs for this. The manual doesn't
>> currently say anything about multi-dimensional arrays in pl/python, but it
>> should've mentioned that they're not supported. Now that it is supported,
>> should mention that, and explain briefly that a multi-dimensional array is
>> mapped to a python list of lists.
>>
>> If the code passes I'll fix the docs
>
>> It seems we don't have any mention in the docs about arrays with
>> non-default lower-bounds ATM. That's not this patch's fault, but it would
>> be good to point out that the lower bounds are discarded when an array is
>> passed to python.
>>
>> I find the loop in PLyList_FromArray() quite difficult to understand. Are
>> the comments there mixing up the "inner" and "outer" dimensions? I wonder
>> if that would be easier to read, if it was written in a recursive-style,
>> rather than iterative with stacks for the dimensions.
>>
>> Yes, it is fairly convoluted.
>
>
> Dave Cramer
>
> da...@postgresintl.com
> www.postgresintl.com
>
>
>


PL-Python-adding-support-for-multi-dimensional-arrays-20160926.patch
Description: Binary data
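
As a rough usage sketch of the mapping under discussion (assuming the patch
is applied and plpythonu is installed; the function name is invented), a
two-dimensional array is expected to arrive in Python as a list of lists:

CREATE FUNCTION test_md_array(a int[]) RETURNS text AS $$
    # With multi-dimensional support, a 2-D int[] arrives as nested lists,
    # e.g. [[1, 2], [3, 4]]; lower bounds are discarded in the mapping.
    return repr(a)
$$ LANGUAGE plpythonu;

SELECT test_md_array(ARRAY[[1,2],[3,4]]);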



Re: [HACKERS] pgbench more operators & functions

2016-09-27 Thread Jeevan Ladhe
Hi,

The patch has correct precedence now.

Further minor comments:

1. About documentation, I think it would be a good idea to arrange the
operators table by precedence and add a line at the top: "In decreasing order
of precedence".

2. You may want to remove the comment:
+   /* should it do a lazy evaluation of the branch? */

Regards,
Jeevan Ladhe


Re: [HACKERS] Declarative partitioning - another take

2016-09-27 Thread Amit Langote
On 2016/09/27 18:09, Ashutosh Bapat wrote:
>>> I tried to debug the problem somewhat. In set_append_rel_pathlist(),
>>> it finds that at least one child has a parameterized path as the
>>> cheapest path, so it doesn't create an unparameterized path for append
>>> rel. At the same time there is a parameterization common to all the
>>> children, so it doesn't create any path. There seem to be two problems
>>> here
>>> 1. The children from second level onwards may not be getting
>>> parameterized for lateral references. That seems unlikely but
>>> possible.
> 
> Did you check this? We may be missing on creating index scan paths
> with parameterization. If we fix this, we don't need to
> re-parameterize Append.

You're right.  How about the attached patch that fixes the problem along
these lines?  The problem seems to be that multi-level inheritance sets
(partitioned tables) are not handled in create_lateral_join_info(), which
means that lateral_relids and lateral_referencers of the root relation are
not being propagated to the partitions below level 1.

I'm getting concerned about one thing though - for a given *regular*
inheritance set, the root->append_rel_list would be scanned only once, but
for a *partitioned table* inheritance set, it would be scanned for every
partitioned table in the set (ie, the root table and internal partitions).

Thanks,
Amit
commit d69aeabcfcb58f349602a1b7392c611045c37465
Author: amit 
Date:   Tue Sep 27 20:01:44 2016 +0900

Consider multi-level partitioned tables in create_lateral_join_info().

diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 84ce6b3..74734f2 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
  */
 #include "postgres.h"
 
+#include "catalog/pg_class.h"
 #include "catalog/pg_type.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
@@ -623,8 +624,19 @@ create_lateral_join_info(PlannerInfo *root)
 	for (rti = 1; rti < root->simple_rel_array_size; rti++)
 	{
 		RelOptInfo *brel = root->simple_rel_array[rti];
+		RangeTblEntry *rte = root->simple_rte_array[rti];
 
-		if (brel == NULL || brel->reloptkind != RELOPT_BASEREL)
+		if (brel == NULL)
+			continue;
+
+		/*
+		 * If an "other rel" RTE is a "partitioned table", we must propagate
+		 * the lateral info inherited from the parent to its children. That's
+		 * because they are not linked directly with the parent via
+		 * AppendRelInfo's (see expand_inherited_rte_internal()).
+		 */
+		if (brel->reloptkind != RELOPT_BASEREL &&
+			rte->relkind != RELKIND_PARTITIONED_TABLE)
 			continue;
 
 		if (root->simple_rte_array[rti]->inh)



Re: [HACKERS] Add support for restrictive RLS policies

2016-09-27 Thread Jeevan Chalke
On Tue, Sep 27, 2016 at 12:45 PM, Jeevan Chalke <
jeevan.cha...@enterprisedb.com> wrote:

> Hello Stephen,
>


> I am reviewing the latest patch in detail now and will post my review
> comments later.
>


Here are the review comments:

1. In the documentation, we should put both permissive and restrictive in
the header, like permissive|restrictive. Or maybe a generic term, say, policy
type (like we have command and then the options mentioned, like all, select,
etc.), followed by the options in the description. Or like we have CASCADE
and RESTRICT in the drop case.

2. "If the policy is a "permissive" or "restrictive" policy." seems broken
as
sentence starts with "If" and there is no other part to it. Will it be
better
to say "Specifies whether the policy is a "permissive" or "restrictive"
policy."?

3. " .. a policy can instead by "restrictive""
Do you mean "instead be" here?

4. It will be good if we have an example for this in section
"5.7. Row Security Policies" (a rough sketch of such an example appears at
the end of this review).

5. AS ( PERMISSIVE | RESTRICTIVE )
should use '{' and '}' instead of parentheses.

6. I think the following changes are irrelevant to this patch.
They should be submitted as a separate patch if required.

@@ -2133,6 +2139,25 @@ psql_completion(const char *text, int start, int end)
 /* Complete "CREATE POLICY  ON  USING (" */
 else if (Matches6("CREATE", "POLICY", MatchAny, "ON", MatchAny,
"USING"))
 COMPLETE_WITH_CONST("(");
+/* CREATE POLICY  ON  AS PERMISSIVE|RESTRICTIVE FOR
ALL|SELECT|INSERT|UPDATE|DELETE */
+else if (Matches8("CREATE", "POLICY", MatchAny, "ON", MatchAny, "AS",
MatchAny, "FOR"))
+COMPLETE_WITH_LIST5("ALL", "SELECT", "INSERT", "UPDATE", "DELETE");
+/* Complete "CREATE POLICY  ON  AS PERMISSIVE|RESTRICTIVE
FOR INSERT TO|WITH CHECK" */
+else if (Matches9("CREATE", "POLICY", MatchAny, "ON", MatchAny, "AS",
MatchAny, "FOR", "INSERT"))
+COMPLETE_WITH_LIST2("TO", "WITH CHECK (");
+/* Complete "CREATE POLICY  ON  AS PERMISSIVE|RESTRICTIVE
FOR SELECT|DELETE TO|USING" */
+else if (Matches9("CREATE", "POLICY", MatchAny, "ON", MatchAny, "AS",
MatchAny, "FOR", "SELECT|DELETE"))
+COMPLETE_WITH_LIST2("TO", "USING (");
+/* CREATE POLICY  ON  AS PERMISSIVE|RESTRICTIVE FOR
ALL|UPDATE TO|USING|WITH CHECK */
+else if (Matches9("CREATE", "POLICY", MatchAny, "ON", MatchAny, "AS",
MatchAny, "FOR", "ALL|UPDATE"))
+COMPLETE_WITH_LIST3("TO", "USING (", "WITH CHECK (");
+/* Complete "CREATE POLICY  ON  AS PERMISSIVE|RESTRICTIVE
TO " */
+else if (Matches8("CREATE", "POLICY", MatchAny, "ON", MatchAny, "AS",
MatchAny, "TO"))
+COMPLETE_WITH_QUERY(Query_for_list_of_grant_roles);
+/* Complete "CREATE POLICY  ON  AS PERMISSIVE|RESTRICTIVE
USING (" */
+else if (Matches8("CREATE", "POLICY", MatchAny, "ON", MatchAny, "AS",
MatchAny, "USING"))
+COMPLETE_WITH_CONST("(");

7. Natts_pg_policy should be updated to 7 now.

8. In the following error, $2 and @2 should be used to correctly display the
option and its location.

ereport(ERROR,
        (errcode(ERRCODE_SYNTAX_ERROR),
         errmsg("unrecognized row security option \"%s\"", $1),
         errhint("Only PERMISSIVE or RESTRICTIVE policies are supported currently."),
         parser_errposition(@1)));

You will see the following result otherwise:

postgres=# CREATE POLICY my_policy ON foo AS a123;
ERROR:  unrecognized row security option "as"
LINE 1: CREATE POLICY my_policy ON foo AS a123;
   ^
HINT:  Only PERMISSIVE or RESTRICTIVE policies are supported currently.

I think a negative test for this error should be added to the regression
tests.

9. Need to update following comments in gram.y to reflect new changes.

 *QUERIES:
 *CREATE POLICY name ON table [FOR cmd] [TO role, ...]
 *[USING (qual)] [WITH CHECK (with_check)]

10. ALTER POLICY has no changes for this. Is there any reason why that is not
considered here?

11. Would it be better to use a boolean for polpermissive in _policyInfo,
and then set it appropriately while getting the policies? That way we later
only need to test the boolean, avoiding a string comparison.

12. Since PERMISSIVE is the default, we should dump "RESTRICTIVE" only when
appropriate, like other default cases.
strcmp(polinfo->polpermissive,"t") == 0 ? "PERMISSIVE" : "RESTRICTIVE"

13. Since PERMISSIVE is default, do we need changes like below?
-\QCREATE POLICY p1 ON test_table FOR ALL TO PUBLIC \E
+\QCREATE POLICY p1 ON test_table AS PERMISSIVE FOR ALL TO
PUBLIC \E

I am not familiar or used TAP framework. So no idea about these changes.

14. While displaying policy details in permissionsList, per the syntax, we
should display (RESTRICT) before the command option. Also, would it be better
to use (RESTRICTIVE) instead of (RESTRICT)?

15. Similarly in describeOneTableDetails() too, can we have RESTRICTIVE
after
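
For point 4 above, a rough sketch of the kind of documentation example that
could be added (table name, policy names and quals are invented; the AS
RESTRICTIVE syntax is as proposed by the patch under review):

CREATE TABLE accounts (manager text, balance int);
ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;

-- Permissive policy: managers may see their own rows.
CREATE POLICY manager_rows ON accounts
    FOR SELECT USING (manager = current_user);

-- Restrictive policy: additionally require a local (socket) connection;
-- it further limits what the permissive policies allow.
CREATE POLICY local_only ON accounts AS RESTRICTIVE
    USING (pg_catalog.inet_client_addr() IS NULL);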

Re: [HACKERS] Transactions involving multiple postgres foreign servers

2016-09-27 Thread Ashutosh Bapat
On Tue, Sep 27, 2016 at 2:54 PM, Masahiko Sawada  wrote:
> On Mon, Sep 26, 2016 at 9:07 PM, Ashutosh Bapat
>  wrote:
>> On Mon, Sep 26, 2016 at 5:25 PM, Masahiko Sawada  
>> wrote:
>>> On Mon, Sep 26, 2016 at 7:28 PM, Ashutosh Bapat
>>>  wrote:
 My original patch added code to manage the files for 2 phase
 transactions opened by the local server on the remote servers. This
 code was mostly inspired from the code in twophase.c which manages the
 file for prepared transactions. The logic to manage 2PC files has
 changed since [1] and has been optimized. One of the things I wanted
 to do is see, if those optimizations are applicable here as well. Have
 you considered that?


>>>
>>> Yeah, we're considering it.
>>> After these changes are committed, we will post the patch incorporated
>>> these changes.
>>>
>>> But what we need to do first is the discussion in order to get consensus.
>>> Since current design of this patch is to transparently execute DCL of
>>> 2PC on foreign server, this code changes lot of code and is
>>> complicated.
>>
>> Can you please elaborate. I am not able to understand what DCL is
>> involved here. According to [1], examples of DCL are GRANT and REVOKE
>> command.
>
> I meant transaction management command such as PREPARE TRANSACTION and
> COMMIT/ABORT PREPARED command.
> The web page I refered might be wrong, sorry.
>
>>> Another approach I have is to push down DCL to only foreign servers
>>> that support 2PC protocol, which is similar to DML push down.
>>> This approach would be more simpler than current idea and is easy to
>>> use by distributed transaction manager.
>>
>> Again, can you please elaborate, how that would be different from the
>> current approach and how does it simplify the code.
>>
>
> The idea is just to push down PREPARE TRANSACTION, COMMIT/ROLLBACK
> PREPARED to foreign servers that support 2PC.
> With this idea, the client need to do following operation when foreign
> server is involved with transaction.
>
> BEGIN;
> UPDATE parent_table SET ...; -- update including foreign server
> PREPARE TRANSACTION 'xact_id';
> COMMIT PREPARED 'xact_id';
>
> The above PREPARE TRANSACTION and COMMIT PREPARED command are pushed
> down to foreign server.
> That is, the client needs to execute PREPARE TRANSACTION and
>
> In this idea, I think that we don't need to do followings,
>
> * Providing the prepare id of 2PC.
>   Current patch adds new API prepare_id_provider() but we can use the
> prepare id of 2PC that is used on parent server.
>
> * Keeping track of status of foreign servers.
>   Current patch keeps track of status of foreign servers involved with
> transaction but this idea is just to push down transaction management
> command to foreign server.
>   So I think that we no longer need to do that.

> COMMIT/ROLLBACK PREPARED explicitly.

The problem with this approach is the same as the one previously stated. If
the connection between the local and foreign server is lost between
PREPARE and COMMIT, the prepared transaction on the foreign server
remains dangling; no one other than the local server knows what to do
with it, and the local server has lost track of the prepared
transaction on the foreign server. So just pushing down those
commands doesn't work.
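
For illustration, a sketch of what an operator would then have to do by hand
on the foreign server if such a dangling prepared transaction is left behind
('xact_id' is just the example identifier used earlier in this thread):

-- On the foreign server: find prepared transactions left dangling.
SELECT gid, prepared, owner, database FROM pg_prepared_xacts;

-- Resolve it manually, since the local server no longer tracks it.
ROLLBACK PREPARED 'xact_id';   -- or COMMIT PREPARED 'xact_id'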

>
> * Adding max_prepared_foreign_transactions parameter.
>   It means that the number of transaction involving foreign server is
> the same as max_prepared_transactions.
>

That isn't exactly true. max_prepared_foreign_transactions indicates
how many transactions can be prepared on the foreign servers, which in
the method you propose would have a cap of max_prepared_transactions
* number of foreign servers.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] [PATCH] Generic type subscription

2016-09-27 Thread Victor Wagner
On Fri, 9 Sep 2016 18:29:23 +0700
Dmitry Dolgov <9erthali...@gmail.com> wrote:

> Hi,
> 
> Regarding to the previous conversations [1], [2], here is a patch
> (with some improvements from Nikita Glukhov) for the generic type
> subscription. It allows
> to define type-specific subscription logic for any data type and has
> implementations for the array and jsonb types. There are following
> changes in this
> patch:

I've tried to compile this patch against the current state of master (commit
51c3e9fade76c12) and found out that, when configured with
--enable-cassert, it doesn't pass make check.

LOG:  server process (PID 4643) was terminated by signal 6: Aborted
DETAIL:  Failed process was running: 
update test_jsonb_subscript set test_json['a'] = '{"b":
1}'::jsonb;



--
 Victor 




[HACKERS] Some information about the small portion of source code of postgreSQL

2016-09-27 Thread davinder singh
Hi,
   I am working in the optimizer module of PostgreSQL 9.4.1. My project is
concerned with the plan returned by the optimizer.

I am trying to return a subplan for a query instead of the full plan. For
this I need to return a path from a RelOptInfo in standard_join_search() in
allpaths.c other than one from the last level of root->join_rel_level.
While doing so I am getting the error "variable not found in subplan target
lists" in the function fix_join_expr_mutator() in setrefs.c.

It would be great if you could give me some idea about why this error is
happening. Since it occurs in the above function, please give me more
information about that portion of the code. Also I want to know what a
targetlist stands for.
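
(For context, a plan node's targetlist is the list of output expressions the
node computes; EXPLAIN VERBOSE shows it as the "Output:" line. A minimal
sketch, assuming a toy table t:)

CREATE TABLE t (a int, b int);
-- The "Output:" line of each plan node is that node's targetlist.
EXPLAIN (VERBOSE) SELECT a, a + b FROM t;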


Thanks,
Davinder Singh


Re: [HACKERS] Refactoring speculative insertion with unique indexes a little

2016-09-27 Thread Tom Lane
Heikki Linnakangas  writes:
> On 09/24/2016 02:34 PM, Peter Geoghegan wrote:
>> I go on to explain how this patch represents a partial solution to
>> that [1]. That's what I mean by "useful practical consequences". As I
>> say in [1], I think we could even get a full solution, if we applied
>> this patch and *also* made the ordering in which the relcache returns
>> a list of index OIDs more useful (it should still be based on
>> something stable, to avoid deadlocks, but more than just OID order.
>> Instead, relcache.c can sort indexes such that we insert into primary
>> keys first, then unique indexes, then all other indexes. This also
>> avoids bloat if there is a unique violation, by getting unique indexes
>> out of the way first during ExecInsert()).

> Hmm, yeah, that'd be nice to fix. I'd like to see a patch specifically
> to fix that. I'm not convinced that we need all the complications of
> this patch to get that fixed. (In particular, indexams still wouldn't
> need to care about the difference between CHECK_UNIQUE_PARTIAL and
> CHECK_UNIQUE_SPECULATIVE, I think.)

I can see the value of processing unique indexes before non-unique ones.
I'm pretty suspicious of trying to prioritize primary keys first, though,
because (a) it's not clear why bother, and (b) PG is a tad squishy about
whether an index is a primary key or not, so that I'd be worried about
different backends sometimes choosing different orders.  I'd simplify
this to "uniques in OID order then non-uniques in OID order".

regards, tom lane




Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

2016-09-27 Thread Michael Paquier
On Tue, Sep 27, 2016 at 9:01 PM, Heikki Linnakangas  wrote:
> * Added error-handling for OOM and other errors in libpq
> * In libpq, added check that the server sent back the same client-nonce
> * Turned ERRORs into COMMERRORs and removed DEBUG4 lines (they could reveal
> useful information to an attacker)
> * Improved comments

Thanks!

> * A source of random values. This currently uses PostmasterRandom()
> similarly to how the MD5 salt is generated, in the server, but plain old
> random() in the client. If built with OpenSSL, we should probably use
> RAND_bytes(). But what if the client is built without OpenSSL? I believe the
> protocol doesn't require cryptographically strong randomness for the nonces,
> i.e. it's OK if they're predictable, but they should be different for each
> session.

And what if we just replace PostmasterRandom()? pgcrypto is a useful
source of inspiration here. If the server is built with OpenSSL we use
RAND_bytes all the time. If not, let's use /dev/urandom. If urandom is
not there, we fall back to /dev/random. For WIN32, there is
CryptGenRandom(). This could just be done as an independent patch with
a routine in src/common/ for example to allow both frontend and
backend to use it. Do you think that this is a requirement for this
patch? I think not really for the first shot.

> * Nonce and salt lengths. The patch currently uses 10 bytes for both, but I
> think I just pulled number that out of thin air. The spec doesn't say
> anything about nonce and salt lengths AFAICS. What do other implementations
> use? Is 10 bytes enough?

Good question, but that seems rather short to me now that you mention
it. Mongo has already implemented SCRAM-SHA-1 and they are using 3
uint64s, so that's 24 bytes (sasl_scramsha1_client_conversation.cpp for
example). For the salt I am only seeing a reference to a string "salt",
which is too short.

> * The spec defines a final "server-error" message that the server sends on
> authentication failure, or e.g. if a required extension is not supported.
> The patch just uses FATAL for those. Should we try to send a server-error
> message instead, or before, the elog(FATAL) ?

It seems to me that sending back the error while the context is still
alive, aka before the FATAL would be the way to go. That could be
nicely done with an error callback while the exchange is happening. I
missed that while going through the spec.
-- 
Michael




[HACKERS] Make flex/bison checks stricter in Git trees

2016-09-27 Thread Daniel Gustafsson
When running ./configure on a system without Flex/Bison it’s easy to miss the
warning that flies past and then run into a compilation error instead.  When it
happened to a colleague yesterday, a brief discussion came to the conclusion
that it would be neat if the flex and bison checks took the existence of the
generated files into consideration.

The attached patch scans for the generated files and, iff flex or bison isn’t
found in a non-cross-compilation build, errors out in case the generated files
don’t exist, while retaining the warning in case they do.  This maintains the
current warning message for downloaded distribution trees while making the
check strict on Git trees.  It does add a hardcoded list of files, and it can
be argued how nice that is, even though the list rarely changes (there might be
a cleaner way but I couldn’t see one).  It also includes a tiny copy-edit fix
on the flex warning to make it match the bison one, since that seemed better.

If this is of any interest I can add it to the commitfest.

cheers ./daniel



test_generated_files.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench more operators & functions

2016-09-27 Thread Fabien COELHO


Hello Jeevan.

1. About documentation, I think it would be a good idea to arrange the
operators table by precedence and add a line at the top: "In
decreasing order of precedence".


Done, see attached.


2. You may want to remove the comment:
+   /* should it do a lazy evaluation of the branch? */


Ok.

--
Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 285608d..bec3228 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -821,9 +821,8 @@ pgbench  options  dbname
   The expression may contain integer constants such as 5432,
   double constants such as 3.14159,
   references to variables :variablename,
-  unary operators (+, -) and binary operators
-  (+, -, *, /,
-  %) with their usual precedence and associativity,
+  operators
+  with their usual precedence and associativity,
   function calls, and
   parentheses.
  
@@ -909,6 +908,84 @@ pgbench  options  dbname
   
  
 
+ 
+  Built-In Operators
+
+  
+   The arithmetic, bitwise, comparison and logical operators listed in
+are built into pgbench
+   and may be used in expressions appearing in
+   \set.
+  
+
+  
+   pgbench Operators by increasing precedence
+   
+
+ 
+  Operator Category
+  Result Type
+  List of Operators
+ 
+
+
+ 
+  binary logical
+  boolean (0/1)
+  
+   or/||,
+   xor/^^,
+   and/&&
+  
+ 
+ 
+  binary bitwise
+  integer
+  |, #, &
+ 
+ 
+  comparison
+  boolean (0/1)
+  
+   =/==, <>/!=,
+   <, <=, >, >=
+  
+ 
+ 
+  shifts
+  integer
+  <<, >>
+ 
+ 
+  binary arithmetic
+  integer or double
+  +, -, *, /
+ 
+ 
+  binary arithmetic
+  integer only
+  %
+ 
+ 
+  unary logical
+  boolean (0/1)
+  not/!
+ 
+ 
+  unary bitwise
+  integer
+  ~
+ 
+ 
+  unary arithmetic
+  integer or double
+  +, -
+ 
+
+   
+  
+ 
+
  
   Built-In Functions
 
@@ -955,6 +1032,13 @@ pgbench  options  dbname
5432.0
   
   
+   exp(x)
+   double
+   exponential
+   exp(1.0)
+   2.718281828459045
+  
+  
greatest(a [, ... ] )
double if any a is double, else integer
largest value among arguments
@@ -962,6 +1046,13 @@ pgbench  options  dbname
5
   
   
+   if(c,e1,e2)
+   same as e*
+   if c is not zero then e1 else e2
+   if(0,1.0,2.0)
+   2.0
+  
+  
int(x)
integer
cast to int
@@ -976,6 +1067,13 @@ pgbench  options  dbname
2.1
   
   
+   ln(x)
+   double
+   natural logarithm
+   ln(2.718281828459045)
+   1.0
+  
+  
pi()
double
value of the constant PI
diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index 0cc665b..f8dbbaf 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -52,11 +52,21 @@ static PgBenchExpr *make_func(yyscan_t yyscanner, int fnumber, PgBenchExprList *
 %type  VARIABLE FUNCTION
 
 %token INTEGER_CONST DOUBLE_CONST VARIABLE FUNCTION
-
-/* Precedence: lowest to highest */
+%token AND_OP OR_OP XOR_OP NE_OP LE_OP GE_OP LS_OP RS_OP
+
+/* Precedence: lowest to highest, taken from C */
+%left	OR_OP
+%left	XOR_OP
+%left	AND_OP
+%left   '|'
+%left   '#'
+%left   '&'
+%left	'=' NE_OP
+%left	'<' '>' LE_OP GE_OP
+%left	LS_OP RS_OP
 %left	'+' '-'
 %left	'*' '/' '%'
-%right	UMINUS
+%right	UNARY
 
 %%
 
@@ -68,14 +78,32 @@ elist:  	{ $$ = NULL; }
 	;
 
 expr: '(' expr ')'			{ $$ = $2; }
-	| '+' expr %prec UMINUS	{ $$ = $2; }
-	| '-' expr %prec UMINUS	{ $$ = make_op(yyscanner, "-",
+	| '+' expr %prec UNARY	{ $$ = $2; }
+	| '-' expr %prec UNARY	{ $$ = make_op(yyscanner, "-",
 		   make_integer_constant(0), $2); }
+	| '~' expr %prec UNARY	{ $$ = make_op(yyscanner, "#",
+		   make_integer_constant(-1), $2); }
+	| '!' expr %prec UNARY	{ $$ = make_op(yyscanner, "^^",
+		   make_integer_constant(1), $2); }
 	| expr '+' expr			{ $$ = make_op(yyscanner, "+", $1, $3); }
 	| expr '-' expr			{ $$ = make_op(yyscanner, "-", $1, $3); }
 	| expr '*' expr			{ $$ = make_op(yyscanner, "*", $1, $3); }
 	| expr '/' expr			{ $$ = make_op(yyscanner, "/", $1, $3); }
 	| expr '%' expr			{ $$ = make_op(yyscanner, "%", $1, $3); }
+	| expr '<' expr			{ $$ = make_op(yyscanner, "<", $1, $3); }
+	| expr LE_OP expr		{ $$ = make_op(yyscanner, "<=", $1, $3); }
+	| expr '>' expr			{ $$ = make_op(yyscanner, "<", $3, $1); }
+	| expr GE_OP expr		{ $$ = make_op(yyscanner, "<=", $3, $1); }
+	| expr '=' expr			{ $$ = make_op(yyscanner, "=", $1, $3); }
+	| expr NE_OP expr		{ $$ = make_op(yyscanner, "<>", $1, $3); }
+	| expr '&' expr			{ $$ = make_op(yyscanner, "&", $1, $3); }
+	| expr '|' expr			{ $$ = make_op(yyscanner, "|", $1, $3); }
+	| e

Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

2016-09-27 Thread Heikki Linnakangas

On 09/27/2016 04:19 PM, Michael Paquier wrote:

On Tue, Sep 27, 2016 at 9:01 PM, Heikki Linnakangas  wrote:

* A source of random values. This currently uses PostmasterRandom()
similarly to how the MD5 salt is generated, in the server, but plain old
random() in the client. If built with OpenSSL, we should probably use
RAND_bytes(). But what if the client is built without OpenSSL? I believe the
protocol doesn't require cryptographically strong randomness for the nonces,
i.e. it's OK if they're predictable, but they should be different for each
session.


And what if we just replace PostmasterRandom()? pgcrypto is a useful
source of inspiration here. If the server is built with OpenSSL we use
RAND_bytes all the time. If not, let's use /dev/urandom. If urandom is
not there, we fallback to /dev/random. For WIN32, there is
CryptGenRandom(). This could just be done as an independent patch with
a routine in src/common/ for example to allow both frontend and
backend to use it.


Yeah, if built with OpenSSL, we probably should just always use 
RAND_bytes(). Without OpenSSL, we have to think a bit harder.


The server-side code in the patch is probably good enough. After all, we 
use the same mechanism for the MD5 salt today.


The libpq-side is not. Just calling random() won't do. We haven't needed
random numbers in libpq before, but now we do. Is the pgcrypto
solution portable enough that we can use it in libpq?



Do you think that this is a requirement for this
patch? I think not really for the first shot.


We need something for libpq. We can't just call random(), as that's not 
random unless you also do srandom(), and we don't want to do that 
because the application might have a different idea of what the seed 
should be.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Make flex/bison checks stricter in Git trees

2016-09-27 Thread Tom Lane
Daniel Gustafsson  writes:
> When running ./configure on a system without Flex/Bison it’s easy to miss 
> the
> warning that flies past and then run into compilation error instead.  When it
> happened to a colleague yesterday a brief discussion came to the conclusion
> that it would be neat if the flex and bison checks took the existence of the
> generated files into consideration.

> Attached patch scans for the generated files and iff flex or bison isn’t 
> found
> in a non-cross compilation build, errors out in case the generated files 
> don’t
> exist while retaining the warning in case they do.

Not exactly convinced this is a good idea.  What if the files exist but
are out of date?  More generally, how much advantage is there really in
failing at configure rather than build time?

The subtext here is that I'm disinclined to load more behavior into
configure while we're waiting to see if cmake conversion happens.
That job is tough enough without the autoconf sources being more of
a moving target than they have to be.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

2016-09-27 Thread Michael Paquier
On Tue, Sep 27, 2016 at 10:42 PM, Heikki Linnakangas  wrote:
> The libpq-side is not. Just calling random() won't do. We haven't needed
> random numbers in libpq before, but now we do. Is the pgcrypto solution
> portable enough that we can use it in libpq?

Do you think that urandom would be enough then? The last time I took a
look at that, I saw urandom on all modern platforms even those ones:
OpenBSD, NetBSD, Solaris, SunOS. For Windows the CryptGenRandom stuff
would be nice enough, I guess.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-09-27 Thread David Steele
On 9/26/16 2:36 AM, Michael Paquier wrote:
> 
> Just a nit:
>  
> - postmaster.pid
> + postmaster.pid and postmaster.opts
>  
> Having one  block for each file would be better.

I grouped these because they are related and because there are now other
bullets where multiple files/dirs are listed.

Are you proposing to also breakup pg_replslot,
pg_dynshmem, pg_stat_tmp,
pg_notify, pg_serial,
pg_snapshots, and pg_subtrans?

Also backup_label and tablespace_map?

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Showing parallel status in \df+

2016-09-27 Thread Robert Haas
On Mon, Sep 26, 2016 at 3:06 PM, Stephen Frost  wrote:
> I feel like we're getting wrapped around the axle as it regards who is
> perceived to be voting for what.

True.  It's not very clear; thanks for trying to shed some light on it.

> I don't particularly care for it either, primairly because \sf could be
> improved upon, as suggested by Peter, to avoid the need to have the same
> information displayed by both \df+ and \sf.

IMHO, we've had \dWHATEVER as the way to find out about things for so
long that we should just stick with it.  I think users are used to
remembering which character they need to stick after \d to get
information on the object type in which they are currently interested;
I know I am.  If we move this all over to \sf people will have trouble
finding it.  I'll get used to it because I "work here" and so will
you, but I think most users will just type \df and then \df+ and then
say ... well where the %@#! did they put it?

>> If you do want to see all of the output, you'll appreciate not having
>> it indented by 60 or 80 columns any more.  There's really no
>> circumstanced under which it's worse than what we're doing today.
>
> That doesn't mean, at least to me, that we should forgo considering
> better alternatives.

I don't think so, either, but if we could agree that "Tom's patch >
doing nothing" then he could commit it and we could debate whether
there's something even better.

> We often reject patches which only improve a bit on the status quo
> because we wish for a better overall solution, particularly when we're
> talking about user interfaces that we don't want to change between every
> release.

Sure, that's true.  In this case, however, I believe that the amount
of improvement that's possible is pretty limited.  Super-wide lines
that wrap repeatedly are bad; we can probably all agree on that.
Whether or not it's better to adjust \df+ as Tom has done or introduce
\df++ or enhance \sf or something else entirely is debatable;
different people prefer different things for different reasons - or
for no reason, as some of this is surely down to personal preference.
If I thought Tom's patch solved 20% of the problem while kicking 80%
of it down the road, I'd probably agree that we ought not to adopt it;
but in fact I think it's more like the reverse -- at least in the
narrow sense of keeping \df+ output readable, which I think is about
as ambitious as we should make our goal for a thread that started out
being about showing parallel status in \df+ output.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring speculative insertion with unique indexes a little

2016-09-27 Thread Peter Geoghegan
On Tue, Sep 27, 2016 at 2:13 PM, Tom Lane  wrote:
> I can see the value of processing unique indexes before non-unique ones.
> I'm pretty suspicious of trying to prioritize primary keys first, though,
> because (a) it's not clear why bother, and (b) PG is a tad squishy about
> whether an index is a primary key or not, so that I'd be worried about
> different backends sometimes choosing different orders.  I'd simplify
> this to "uniques in OID order then non-uniques in OID order".

I see your point. A more considered ordering of indexes for processing
by the executor (prepared for it by the relcache), including something
that goes further than your proposal, is useful in the context of
fixing the bug I mentioned [1], but for non-obvious reasons. I would
like to clarify what I meant there specifically. I am repeating
myself, but maybe I just wasn't clear before.

The theory of putting the PK first there is that we then have a
well-defined (uh, better defined) user-visible ordering *across unique
indexes* such that the problem case would *reliably* be fixed. With
only this refactoring patch applied (and no change to the relcache
ordering thing), it is then only a sheer accident of OID assignment
ordering that the INSERT ON CONFLICT problem case happens to take the
alternative path on the OP's *inferred* index (which, as it happens,
was the PK for him), rather than the other unique index that was
involved (the one that is not actually inferred, and yet is equivalent
to the PK, UPSERT-semantics-wise). So, the reporter of the bug [1] is
happy with his exact case now working, at least.

You might now counter: "But why prefer one convention over the other?
Prioritizing the PK would reliably fix that particular problem case,
but that's still pretty arbitrary."

It's true that it's somewhat arbitrary to always (speculatively)
insert into the PK first. But, I think that it's more likely that the
PK is inferred in general, and so it's more likely that users will
fall on the right side of that in practice. Besides, at least we now
have a consistent behavior.

You might also reasonably ask: "But what if there are multiple unique
indexes, none of which happen to be the PK? Isn't that subject to the
same vagaries of OID ordering anyway?"

I must admit that it is. But I don't really know where to draw the
line here. Is it worth contemplating a more complicated scheme still?
For example, trigger-style ordering; a sort order that considers index
name as a "secondary attribute", in order to ensure perfectly
consistent behavior? I must admit that I don't really have a clue
whether or not that's a good idea. It's an idea.

[1] 
https://www.postgresql.org/message-id/CAM3SWZTFTbK_Y%3D7uWJaXYLHnYQ99pV4KFpmjTKbNJR5_%2BQThzA%40mail.gmail.com
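
To make the candidate orderings concrete, here is a small sketch of a
comparator implementing "primary key first, then other unique indexes by OID,
then the rest by OID". The IndexSortEntry struct and the qsort usage are
purely illustrative; the real change would be in how the relcache builds the
index list:

#include <stdbool.h>
#include <stdlib.h>

typedef struct IndexSortEntry
{
	unsigned int oid;			/* index OID */
	bool		isunique;
	bool		isprimary;
} IndexSortEntry;

static int
index_priority_cmp(const void *a, const void *b)
{
	const IndexSortEntry *ia = (const IndexSortEntry *) a;
	const IndexSortEntry *ib = (const IndexSortEntry *) b;

	if (ia->isprimary != ib->isprimary)
		return ia->isprimary ? -1 : 1;	/* primary key before all others */
	if (ia->isunique != ib->isunique)
		return ia->isunique ? -1 : 1;	/* unique before non-unique */
	if (ia->oid != ib->oid)
		return (ia->oid < ib->oid) ? -1 : 1;	/* then plain OID order */
	return 0;
}

/* usage: qsort(entries, nentries, sizeof(IndexSortEntry), index_priority_cmp); */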
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pageinspect: Hash index support

2016-09-27 Thread Jesper Pedersen

On 09/26/2016 10:45 PM, Peter Eisentraut wrote:

On 9/26/16 1:39 PM, Jesper Pedersen wrote:

Left as is, since BuildTupleFromCStrings() vs. xyzGetDatum() are equally
readable in this case. But, I can change the patch if needed.


The point is that to use BuildTupleFromCStrings() you need to convert
numbers to strings, and then they are converted back.  This is not a
typical way to write row-returning functions.



Ok.

Changed:
 * BuildTupleFromCStrings -> xyzGetDatum
 * 'type' field: char -> text w/ full description
 * Removed 'type' information from documentation

Best regards,
 Jesper

>From d6dd73ffef9deafc938b9fe7119db0928494cbf9 Mon Sep 17 00:00:00 2001
From: jesperpedersen 
Date: Fri, 5 Aug 2016 10:16:32 -0400
Subject: [PATCH] pageinspect: Hash index support

---
 contrib/pageinspect/Makefile  |  10 +-
 contrib/pageinspect/hashfuncs.c   | 408 ++
 contrib/pageinspect/pageinspect--1.5--1.6.sql |  59 
 contrib/pageinspect/pageinspect--1.5.sql  | 279 --
 contrib/pageinspect/pageinspect--1.6.sql  | 346 ++
 contrib/pageinspect/pageinspect.control   |   2 +-
 doc/src/sgml/pageinspect.sgml | 146 -
 7 files changed, 955 insertions(+), 295 deletions(-)
 create mode 100644 contrib/pageinspect/hashfuncs.c
 create mode 100644 contrib/pageinspect/pageinspect--1.5--1.6.sql
 delete mode 100644 contrib/pageinspect/pageinspect--1.5.sql
 create mode 100644 contrib/pageinspect/pageinspect--1.6.sql

diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile
index a98237e..c73d728 100644
--- a/contrib/pageinspect/Makefile
+++ b/contrib/pageinspect/Makefile
@@ -2,13 +2,13 @@
 
 MODULE_big	= pageinspect
 OBJS		= rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o \
-		  brinfuncs.o ginfuncs.o $(WIN32RES)
+		  brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES)
 
 EXTENSION = pageinspect
-DATA = pageinspect--1.5.sql pageinspect--1.4--1.5.sql \
-	pageinspect--1.3--1.4.sql pageinspect--1.2--1.3.sql \
-	pageinspect--1.1--1.2.sql pageinspect--1.0--1.1.sql \
-	pageinspect--unpackaged--1.0.sql
+DATA = pageinspect--1.6.sql pageinspect--1.5--1.6.sql \
+	pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \
+	pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \
+	pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql
 PGFILEDESC = "pageinspect - functions to inspect contents of database pages"
 
 ifdef USE_PGXS
diff --git a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c
new file mode 100644
index 000..a76b683
--- /dev/null
+++ b/contrib/pageinspect/hashfuncs.c
@@ -0,0 +1,408 @@
+/*
+ * hashfuncs.c
+ *		Functions to investigate the content of HASH indexes
+ *
+ * Copyright (c) 2016, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		contrib/pageinspect/hashfuncs.c
+ */
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "access/htup_details.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "utils/builtins.h"
+
+PG_FUNCTION_INFO_V1(hash_metapage_info);
+PG_FUNCTION_INFO_V1(hash_page_items);
+PG_FUNCTION_INFO_V1(hash_page_stats);
+
+/* 
+ * structure for single hash page statistics
+ * 
+ */
+typedef struct HashPageStat
+{
+	uint32		live_items;
+	uint32		dead_items;
+	uint32		page_size;
+	uint32		free_size;
+	char	   *type;
+
+	/* opaque data */
+	BlockNumber hasho_prevblkno;
+	BlockNumber hasho_nextblkno;
+	Bucket		hasho_bucket;
+	uint16		hasho_flag;
+	uint16		hasho_page_id;
+} HashPageStat;
+
+
+/*
+ * Verify that the given bytea contains a HASH page, or die in the attempt.
+ * A pointer to the page is returned.
+ */
+static Page
+verify_hash_page(bytea *raw_page, bool metap)
+{
+	Page		page;
+	int			raw_page_size;
+	HashPageOpaque pageopaque;
+
+	raw_page_size = VARSIZE(raw_page) - VARHDRSZ;
+
+	if (raw_page_size != BLCKSZ)
+		ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("input page too small"),
+ errdetail("Expected size %d, got %d",
+		   BLCKSZ, raw_page_size)));
+
+	page = VARDATA(raw_page);
+	pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
+	if (pageopaque->hasho_page_id != HASHO_PAGE_ID)
+		ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("page is not a HASH page"),
+ errdetail("Expected %08x, got %08x.",
+		   HASHO_PAGE_ID, pageopaque->hasho_page_id)));
+
+	if (metap)
+	{
+		if (pageopaque->hasho_flag != LH_META_PAGE)
+			ereport(ERROR,
+	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+	 errmsg("page is not a metadata page"),
+	 errdetail("Expected %08x, got %08x.",
+			   LH_META_PAGE, pageopaque->hasho_flag)));
+	}
+	else
+	{
+		if (pageopaque->hasho_flag == LH_META_PAGE)
+			ereport(ERROR,
+	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+	 errmsg("page is a metadata page"),
+	 errdetail("Flags %08x.",
+			   pageopaque->hasho_flag

Re: [HACKERS] Showing parallel status in \df+

2016-09-27 Thread Tom Lane
Robert Haas  writes:
> On Mon, Sep 26, 2016 at 3:06 PM, Stephen Frost  wrote:
>> That doesn't mean, at least to me, that we should forgo considering
>> better alternatives.

> I don't think so, either, but if we could agree that "Tom's patch >
> doing nothing" then he could commit it and we could debate whether
> there's something even better.

I think the debate is more about whether moving the source display
functionality over to \sf is a better solution than rearranging \df+
output.  (If we had consensus to do that, I'd be happy to go code it,
but I'm not going to invest the effort when it seems like we don't.)

If we'd had \sf all along, I think it's likely that we would never
have put source-code display into \df.  But of course we didn't,
and what would have been best in a green field is not necessarily
what's best or achievable given existing reality.  Both Robert and
Peter have put forward the argument that people are used to finding
this info in \df+ output, and I think that deserves a whole lot of
weight.  The \sf solution might be cleaner, but it's not so much
better that it can justify forcing people to relearn their habits.

So I think that rearranging \df+ output is really what we ought to
be doing here.

I'm not necessarily wedded to any of the precise details of what I did
in my patch --- for instance, maybe function bodies ought to be indented
one tab stop?  But we've not gotten to the merits of such points, for
lack of agreement about whether this is the basic approach to take.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-09-27 Thread Robert Haas
On Mon, Sep 26, 2016 at 3:40 PM, Peter Geoghegan  wrote:
> On Mon, Sep 26, 2016 at 6:58 PM, Robert Haas  wrote:
>>> That requires some kind of mutual exclusion mechanism, like an LWLock.
>>
>> No, it doesn't.  Shared memory queues are single-reader, single-writer.
>
> The point is that there is a natural dependency when merging is
> performed eagerly within the leader. One thing needs to be in lockstep
> with the others. That's all.

I don't know what any of that means.  You said we need something like
an LWLock, but I think we don't.  The workers just write the results
of their own final merges into shm_mqs.  The leader can read from any
given shm_mq until no more tuples can be read without blocking, just
like nodeGather.c does, or at least it can do that unless its own
queue fills up first.  No mutual exclusion mechanism is required for
any of that, as far as I can see - not an LWLock, and not anything
similar.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS]

2016-09-27 Thread Peter Geoghegan
On Tue, Sep 27, 2016 at 3:08 PM, Robert Haas  wrote:
> I don't know what any of that means.  You said we need something like
> an LWLock, but I think we don't.  The workers just write the results
> of their own final merges into shm_mqs.  The leader can read from any
> given shm_mq until no more tuples can be read without blocking, just
> like nodeGather.c does, or at least it can do that unless its own
> queue fills up first.  No mutual exclusion mechanism is required for
> any of that, as far as I can see - not an LWLock, and not anything
> similar.

I'm confused about the distinction you're making. Isn't the shm_mqs
queue itself a mutual exclusion mechanism? The leader cannot really do
anything without some input to process, because of the dependency that
naturally exists. ISTM that everything else we've discussed is
semantics, and/or about particular mechanisms in Postgres.

At a high level, based on my intuition about performance
characteristics, I have reservations about eager merging in the
leader. That's all I mean. There is a trade-off to be made between
eager and lazy merging. As I pointed out, the author of the Volcano
paper (Goetz Graefe) went on at length about this particular trade-off
for parallel sort almost 30 years ago. So, it's not clear that that
would be an easy win, or even a win.

That's all I meant.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Covering + unique indexes.

2016-09-27 Thread Robert Haas
On Mon, Sep 26, 2016 at 11:17 AM, Anastasia Lubennikova
 wrote:
>> Is there any fundamental problem in storing them in ordered way?  I
>> mean to say, you need to anyway store all the column values on leaf
>> page, so why can't we find the exact location for the complete key.
>> Basically use truncated key to reach to leaf level and then use the
>> complete key to find the exact location to store the key.  I might be
>> missing some thing here, but if we can store them in ordered fashion,
>> we can use them even for queries containing ORDER BY (where ORDER BY
>> contains included columns).
>
> I'd say that the reason for not using included columns in any
> operations which require comparisons, is that they don't have opclass.
> Let's go back to the example of points.
> This data type don't have any opclass for B-tree, because of fundamental
> reasons.
> And we can not apply _bt_compare() and others to this attribute, so
> we don't include it to scan key.
>
> create table t (i int, i2 int, p point);
> create index idx1 on (i) including (i2);
> create index idx2 on (i) including (p);
> create index idx3 on (i) including (i2, p);
> create index idx4 on (i) including (p, i2);
>
> You can keep tuples ordered in idx1, but not for idx2, partially ordered for
> idx3, but not for idx4.

Yeah, I think we shouldn't go there.  I mean, once you start ordering
by INCLUDING columns, then you're going to need to include them in
leaf pages because otherwise you can't actually guarantee that they
are in the right order.  And then you have to wonder why an INCLUDING
column is any different from a non-INCLUDING column.  It seems best to
make a firm rule that INCLUDING columns are there only for the values,
not for ordering.  That rule is simple and clear, which is a good
thing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Showing parallel status in \df+

2016-09-27 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> I think the debate is more about whether moving the source display
> functionality over to \sf is a better solution than rearranging \df+
> output.  (If we had consensus to do that, I'd be happy to go code it,
> but I'm not going to invest the effort when it seems like we don't.)

Right, that's the main question.

> If we'd had \sf all along, I think it's likely that we would never
> have put source-code display into \df.  But of course we didn't,

Indeed.

> and what would have been best in a green field is not necessarily
> what's best or achievable given existing reality.  Both Robert and
> Peter have put forward the argument that people are used to finding
> this info in \df+ output, and I think that deserves a whole lot of
> weight.  The \sf solution might be cleaner, but it's not so much
> better that it can justify forcing people to relearn their habits.
> 
> So I think that rearranging \df+ output is really what we ought to
> be doing here.

Alright, given that Robert's made it clear what his preference is and
you're in agreement with that, I'll remove my objection to moving down
that path.  I agree that it's better than the current situation.  If we
do end up improving \sf (which seems like a good idea, in general), then
we may wish to consider a display option to control if the source is
included in \df+ or not, but that doesn't need to bar this patch from
going in.

The earlier comments on the thread hadn't been as clear with regard to
who held what opinions regarding the options and I'm glad that we were
able to reach a point where it was much clearer that there was strong
support for keeping the source in \df+.

> I'm not necessarily wedded to any of the precise details of what I did
> in my patch --- for instance, maybe function bodies ought to be indented
> one tab stop?  But we've not gotten to the merits of such points, for
> lack of agreement about whether this is the basic approach to take.

As for this, I wouldn't indent or change the source at all.  For
starters, indentation actually matters for some PLs, and I can certainly
see people wanting to be able to copy/paste from the output, now that
it'll be possible to reasonably do from the \df+ output.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-09-27 Thread David Steele
On 9/26/16 2:36 AM, Michael Paquier wrote:

> Just a nit:
>  
> - postmaster.pid
> + postmaster.pid and postmaster.opts
>  
> Having one  block for each file would be better.

OK, changed.

> +const char *excludeFile[] =
> excludeFiles[]?
> 
> +# Move pg_replslot out of $pgdata and create a symlink to it
> +rename("$pgdata/pg_replslot", "$tempdir/pg_replslot")
> +   or die "unable to move $pgdata/pg_replslot";
> +symlink("$tempdir/pg_replslot", "$pgdata/pg_replslot");
> This will blow up on Windows. Those tests need to be included in the
> SKIP block. Even if your code needs to support junction points on
> Windows, as perl does not offer an equivalent for it we just cannot
> test it on this platform.

Fixed.

-- 
-David
da...@pgmasters.net
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 0f09d82..a8daa07 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1090,6 +1090,22 @@ SELECT pg_stop_backup();

 

+The contents of the pg_dynshmem/, pg_stat_tmp/,
+pg_notify/, pg_serial/,
+pg_snapshots/, and pg_subtrans/ directories can
+be omitted from the backup as they will be initialized on postmaster
+startup. If the  is set and is
+under the database cluster directory then the contents of the directory
+specified by  can also be omitted.
+   
+
+   
+Any file or directory beginning with pgsql_tmp can be
+omitted from the backup.  These files are removed on postmaster start and
+the directories will be recreated as needed.
+   
+
+   
 The backup label
 file includes the label string you gave to pg_start_backup,
 as well as the time at which pg_start_backup was run, and
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 68b0941..d65687f 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2059,17 +2059,26 @@ The commands accepted in walsender mode are:
   

 
- postmaster.pid
+ postmaster.pid and postmaster.opts
 


 
- postmaster.opts
+ postgresql.auto.conf.tmp
 


 
- various temporary files created during the operation of the 
PostgreSQL server
+ backup_label and tablespace_map.  If these
+ files exist they belong to an exclusive backup and are not applicable
+ to the base backup.
+
+   
+   
+
+ Various temporary files and directories created during the operation 
of
+ the PostgreSQL server, i.e. any file or directory beginning with
+ pgsql_tmp.
 


@@ -2082,7 +2091,11 @@ The commands accepted in walsender mode are:


 
- pg_replslot is copied as an empty directory.
+ pg_replslot, pg_dynshmem,
+ pg_stat_tmp, pg_notify,
+ pg_serial, pg_snapshots, and
+ pg_subtrans are copied as empty directories (even if they
+ are symbolic links).
 


diff --git a/doc/src/sgml/ref/pg_basebackup.sgml 
b/doc/src/sgml/ref/pg_basebackup.sgml
index 9f1eae1..984ea5b 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -610,10 +610,8 @@ PostgreSQL documentation
   
The backup will include all files in the data directory and tablespaces,
including the configuration files and any additional files placed in the
-   directory by third parties. But only regular files and directories are
-   copied.  Symbolic links (other than those used for tablespaces) and special
-   device files are skipped.  (See  for
-   the precise details.)
+   directory by third parties, with certain exceptions.  (See
+for the complete list of exceptions.)
   
 
   
diff --git a/src/backend/replication/basebackup.c 
b/src/backend/replication/basebackup.c
index da9b7a6..8412472 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -30,6 +30,7 @@
 #include "replication/basebackup.h"
 #include "replication/walsender.h"
 #include "replication/walsender_private.h"
+#include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
 #include "utils/builtins.h"
@@ -55,8 +56,10 @@ static int64 sendDir(char *path, int basepathlen, bool 
sizeonly,
 static bool sendFile(char *readfilename, char *tarfilename,
 struct stat * statbuf, bool missing_ok);
 static void sendFileWithContent(const char *filename, const char *content);
-static void _tarWriteHeader(const char *filename, const char *linktarget,
-   struct stat * statbuf);
+static int64 _tarWriteHeader(const char *filename, const char *linktarget,
+   struct stat * statbuf, bool sizeonly);
+static int64 _tarWriteDir(char *pathbuf, int basepathlen, struct stat *statbuf,
+   bool sizeonly);
 static void send_int8_string(StringInfo

Re: [HACKERS]

2016-09-27 Thread Robert Haas
You seem to have erased the subject line from this email somehow.

On Tue, Sep 27, 2016 at 10:18 AM, Peter Geoghegan  wrote:
> On Tue, Sep 27, 2016 at 3:08 PM, Robert Haas  wrote:
>> I don't know what any of that means.  You said we need something like
>> an LWLock, but I think we don't.  The workers just write the results
>> of their own final merges into shm_mqs.  The leader can read from any
>> given shm_mq until no more tuples can be read without blocking, just
>> like nodeGather.c does, or at least it can do that unless its own
>> queue fills up first.  No mutual exclusion mechanism is required for
>> any of that, as far as I can see - not an LWLock, and not anything
>> similar.
>
> I'm confused about the distinction you're making. Isn't the shm_mqs
> queue itself a mutual exclusion mechanism? The leader cannot really do
> anything without some input to process, because of the dependency that
> naturally exists. ISTM that everything else we've discussed is
> semantics, and/or about particular mechanisms in Postgres.

No, a shm_mq is not a mutual exclusion mechanism.  It's a queue!

A data dependency and a lock aren't the same thing.  If your point in
saying "we need something like an LWLock" was actually "the leader
will have to wait if it's merged all tuples from a worker and no more
are immediately available" then I think that's a pretty odd way to
make that point.  To me, the first statement appears to be false,
while the second is obviously true.

> At a high level, based on my intuition about performance
> characteristics, I have reservations about eager merging in the
> leader. That's all I mean. There is a trade-off to be made between
> eager and lazy merging. As I pointed out, the author of the Volcano
> paper (Goetz Graefe) went on at length about this particular trade-off
> for parallel sort almost 30 years ago. So, it's not clear that that
> would be an easy win, or even a win.
>
> That's all I meant.

OK, well I'm not taking any position on whether what Heikki is
proposing will turn out to be good from a performance point of view.
My intuitions about sorting performance haven't turned out to be
particularly good in the past.  I'm only saying that if you do decide
to queue the tuples passing from worker to leader in a shm_mq, you
shouldn't read from the shm_mq objects in a round robin, but rather
read multiple tuples at a time from the same worker whenever that is
possible without blocking.  If you read tuples from workers one at a
time, you get a lot of context-switch thrashing, because the worker
wakes up, writes one tuple (or even part of a tuple) and immediately
goes back to sleep.  Then it shortly thereafter does it again.
Whereas if you drain the worker's whole queue each time you read from
that queue, then the worker can wake up, refill it completely, and go
back to sleep again.  So you incur fewer context switches for the same
amount of work.  Or at least, that's how it worked out for Gather, and
what I am saying is that it will probably work out the same way for
sorting, if somebody chooses to try to implement this.

I am also saying that using the shm_mq code here does not require the
introduction of an LWLock.
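
For what it's worth, the "drain one worker's queue before moving on" idea
sketches out roughly as below using the existing shm_mq API. The function
drain_worker_queue() and the consumer process_sorted_tuple() are hypothetical,
not from any posted patch:

#include "postgres.h"
#include "storage/shm_mq.h"

/* Hypothetical consumer; stands in for feeding one tuple to the leader merge. */
static void
process_sorted_tuple(void *data, Size nbytes)
{
	(void) data;
	(void) nbytes;
	/* in reality: push the tuple into the leader's merge heap */
}

/*
 * Pull as many messages as possible from one worker's queue without
 * blocking, then return so the caller can move on to the next worker.
 */
static void
drain_worker_queue(shm_mq_handle *mqh)
{
	for (;;)
	{
		Size		nbytes;
		void	   *data;
		shm_mq_result res;

		res = shm_mq_receive(mqh, &nbytes, &data, true);	/* nowait = true */
		if (res != SHM_MQ_SUCCESS)
			break;				/* SHM_MQ_WOULD_BLOCK or SHM_MQ_DETACHED */
		process_sorted_tuple(data, nbytes);
	}
}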

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Detect supported SET parameters when pg_restore is run

2016-09-27 Thread Robert Haas
On Mon, Sep 26, 2016 at 9:56 PM, Vitaly Burovoy
 wrote:
> At work we use several major versions of PostgreSQL, and developers
> use non-local clusters for developing and debugging.
> We do dump/restore schemas/data via custom/dir formats and we have to
> keep several client versions for 9.2, 9.4 and 9.5 versions on local
> workstations because after pg_restore95 connects to 9.2, it fails when
> it sets run-time parameters via SET:
>
> vitaly@work 01:21:26 ~ $ pg_restore95 --host DEV_HOST_9_2 -d DBNAME
> --data-only -e -1 -Fc arhcivefile
> Password:
> pg_restore95: [archiver (db)] Error while INITIALIZING:
> pg_restore95: [archiver (db)] could not execute query: ERROR:
> unrecognized configuration parameter "lock_timeout"
> Command was: SET lock_timeout = 0;

I think our policy is that a newer pg_dump needs to work with an older
version of the database, but I don't think we make any similar
guarantee for other tools, such as pg_restore.  It's an interesting
question whether we should try a little harder to do that, but I
suspect it might require more changes than what you've got in this
patch

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS]

2016-09-27 Thread Peter Geoghegan
On Tue, Sep 27, 2016 at 3:31 PM, Robert Haas  wrote:
> You seem to have erased the subject line from this email somehow.

I think that that was Gmail. Maybe that's new? Generally, I have to go
out of my way to change the subject line, so it seems unlikely that I
fat-fingered it. I wish that they'd stop changing things...

> On Tue, Sep 27, 2016 at 10:18 AM, Peter Geoghegan  wrote:
> No, a shm_mq is not a mutual exclusion mechanism.  It's a queue!
>
> A data dependency and a lock aren't the same thing.  If your point in
> saying "we need something like an LWLock" was actually "the leader
> will have to wait if it's merged all tuples from a worker and no more
> are immediately available" then I think that's a pretty odd way to
> make that point.  To me, the first statement appears to be false,
> while the second is obviously true.

Okay. Sorry for being unclear.

> OK, well I'm not taking any position on whether what Heikki is
> proposing will turn out to be good from a performance point of view.
> My intuitions about sorting performance haven't turned out to be
> particularly good in the past.  I'm only saying that if you do decide
> to queue the tuples passing from worker to leader in a shm_mq, you
> shouldn't read from the shm_mq objects in a round robin, but rather
> read multiple tuples at a time from the same worker whenever that is
> possible without blocking.  If you read tuples from workers one at a
> time, you get a lot of context-switch thrashing, because the worker
> wakes up, writes one tuple (or even part of a tuple) and immediately
> goes back to sleep.  Then it shortly thereafter does it again.
> Whereas if you drain the worker's whole queue each time you read from
> that queue, then the worker can wake up, refill it completely, and go
> back to sleep again.  So you incur fewer context switches for the same
> amount of work.  Or at least, that's how it worked out for Gather, and
> what I am saying is that it will probably work out the same way for
> sorting, if somebody chooses to try to implement this.

Maybe. This would involve the overlapping of multiple worker merges
that are performed in parallel with a single leader merge. I don't
think it would be all that great. There would be a trade-off around
disk bandwidth utilization too, so I'm particularly doubtful that the
complexity would pay for itself. But, of course, I too have been wrong
about sorting performance characteristics in the past, so I can't be
sure.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] assert violation in logical messages serialization

2016-09-27 Thread Stas Kelvich
Hello.

During processing of logical messages in ReorderBufferSerializeChange(),
the pointer to the ondisk structure isn’t updated after a possible reorder
buffer realloc by ReorderBufferSerializeReserve().

Actual reason spotted by Konstantin Knizhnik.
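
To make the failure mode concrete, here is a small self-contained illustration
of the bug class (the OutBuf/reserve/serialize names are made up; this is not
the reorderbuffer code): any pointer derived from a growable buffer must be
re-derived after a call that may realloc it.

#include <stdlib.h>
#include <string.h>

/* Toy stand-in for the reorder buffer's output buffer. */
typedef struct
{
	char	   *buf;
	size_t		size;
} OutBuf;

/* Grow the buffer if needed; may move the allocation (like repalloc). */
static void
reserve(OutBuf *ob, size_t sz)
{
	if (sz > ob->size)
	{
		ob->buf = realloc(ob->buf, sz);	/* error handling omitted for brevity */
		ob->size = sz;
	}
}

static void
serialize(OutBuf *ob, const char *payload, size_t len)
{
	char	   *header = ob->buf;	/* pointer derived before a possible realloc */

	reserve(ob, len + 16);

	header = ob->buf;			/* must re-derive: the old pointer may dangle */
	memcpy(header + 16, payload, len);
}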



reorderbuffer.patch
Description: Binary data


-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Redesigning parallel dump/restore's wait-for-workers logic

2016-09-27 Thread Kevin Grittner
This patch applies with a few minor offsets, compiles without
warning, and passes all regression tests including `make
check-world` with TAP tests enabled.  This and the related patch
dealing with the parallel API definitely make things cleaner and
easier to follow.

I find it disappointing that ACLs continue to be held  back as a
fourth step outside the framework of the "normal" ordering.  That
is an ugly hack, which makes it impossible to, for example, create
a fifth step to create indexes on materialized views and refresh
them in anything resembling a clean fashion.  Would it make sense
to deal with the ACL ordering hack in one of these patches, or
should that be left for later?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

2016-09-27 Thread David Steele
On 9/26/16 2:02 AM, Michael Paquier wrote:

> On Mon, Sep 26, 2016 at 2:15 AM, David Steele  wrote:
>
> Thanks for the review and the comments!
> 
>> I notice that the copyright from pgcrypto/sha1.c was carried over but
>> not the copyright from pgcrypto/sha2.c.  I'm no expert on how this
>> works, but I believe the copyright from sha2.c must be copied over.
> 
> Right, those copyright bits are missing:
> - * AUTHOR: Aaron D. Gifford 
> [...]
> - * Copyright (c) 2000-2001, Aaron D. Gifford
> The license block being the same, it seems to me that there is no need
> to copy it over. The copyright should be enough.

Looks fine to me.

>> Also, are there any plans to expose these functions directly to the user
>> without loading pgcrypto?  Now that the functionality is in core it
>> seems that would be useful.  In addition, it would make this patch stand
>> on its own rather than just being a building block.
> 
> There have been discussions about avoiding enabling those functions by
> default in the distribution. We'd rather not do that...

OK.

>> * [PATCH 2/8] Move encoding routines to src/common/
>>
>> I wonder if it is confusing to have two of encode.h/encode.c.  Perhaps
>> they should be renamed to make them distinct?
> 
> Yes it may be a good idea to rename that, like encode_utils.[c|h] for
> the new files.

I like that better.

>> Couldn't md5_crypt_verify() be made more general and take the hash type?
>>  For instance, password_crypt_verify() with the last param as the new
>> password type enum.
> 
> This would mean incorporating the whole SASL message exchange into
> this routine, because the password string is part of the scram
> initialization context, and it seems to me that it is better to do
> just one lookup of the entry in pg_authid. So we'd end up with more
> confusing code, I am afraid. At least that's the conclusion I came up
> with when doing that. md5_crypt_verify only does the work on a
> received password.

Ah, yes, I see now.  I missed that when I reviewed patch 6.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix some corner cases that cube_in rejects

2016-09-27 Thread Tom Lane
Amul Sul  writes:
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   tested, passed
> Documentation:not tested

> Note for committer :  There are unnecessary files (cube_1.out, cube_2.out & 
> cube_3.out) in contrib directory, that need to be removed at code checkin,  
> other than this concern, I think this patch looks pretty reasonable. Thanks.

Thanks for the review --- pushed.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Detect supported SET parameters when pg_restore is run

2016-09-27 Thread Tom Lane
Robert Haas  writes:
> On Mon, Sep 26, 2016 at 9:56 PM, Vitaly Burovoy
>  wrote:
>> We do dump/restore schemas/data via custom/dir formats and we have to
>> keep several client versions for 9.2, 9.4 and 9.5 versions on local
>> workstations because after pg_restore95 connects to 9.2, it fails when
>> it sets run-time parameters via SET:

> I think our policy is that a newer pg_dump needs to work with an older
> version of the database, but I don't think we make any similar
> guarantee for other tools, such as pg_restore.  It's an interesting
> question whether we should try a little harder to do that, but I
> suspect it might require more changes than what you've got in this
> patch

The general policy has always been that pg_dump output is only expected to
restore without errors into a server that's the same or newer version as
pg_dump (regardless of the server version being dumped from).  If you try
to restore into an older server version, you may get some errors, which we
try to minimize the scope of when possible.  But it will never be possible
to promise none at all.  I think the correct advice here is simply "don't
use pg_restore -e -1 when trying to restore into an older server version".

Taking a step back, there are any number of ways that you might get
errors during a pg_restore into a DB that's not set up exactly as pg_dump
expected.  Missing roles or tablespaces, for example.  Again, the dump
output is constructed so that you can survive those problems and bull
ahead ... but not with "-e -1".  I don't see a very good reason why
older-server-version shouldn't be considered the same type of problem.

The patch as given seems rather broken anyway --- won't it change text
output from pg_dump as well as on-line output from pg_restore?  (That is,
it looks to me like the SETs emitted by pg_dump to text format would
depend on the source server version, which they absolutely should not.
Either that, or the patch is overwriting pg_dump's idea of what the
source server version is at the start of the output phase, which is
likely to break all kinds of stuff when dumping from an older server.)

It's possible that we could alleviate this specific symptom by arranging
for the BEGIN caused by "-1" to come out after the initial SETs, but
I'm not sure how complicated that would be or whether it's worth the
trouble.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Redesigning parallel dump/restore's wait-for-workers logic

2016-09-27 Thread Tom Lane
Kevin Grittner  writes:
> This patch applies with a few minor offsets, compiles without
> warning, and passes all regression tests including `make
> check-world` with TAP tests enabled.  This and the related patch
> dealing with the parallel API definitely make things cleaner and
> easier to follow.

OK, thanks for reviewing!

> I find it disappointing that ACLs continue to be held  back as a
> fourth step outside the framework of the "normal" ordering.  That
> is an ugly hack, which makes it impossible to, for example, create
> a fifth step to create indexes on materialized views and refresh
> them in anything resembling a clean fashion.  Would it make sense
> to deal with the ACL ordering hack in one of these patches, or
> should that be left for later?

Seems like material for some other patch.  It would be greatly more
invasive than what I have here, I fear, and probably require some
fundamental design work.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Supporting huge pages on Windows

2016-09-27 Thread Robert Haas
On Sun, Sep 25, 2016 at 10:45 PM, Tsunakawa, Takayuki
 wrote:
> Credit: This patch is based on Thomas Munro's one.

How are they different?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Misdesigned command/status APIs for parallel dump/restore

2016-09-27 Thread Kevin Grittner
This patch also applied with minor offsets, compiled without
warning, and passes regression tests, including `make check-world`
with TAP tests enabled.  It looks good to me, presenting a cleaner
API.

I really want to get this to the point where we can dump and
restore nested materialized views better, and this doesn't get us
to the point where that's immediately possible; but with a cleaner
API it should be easier to get there, so this is a step along the
way.

Setting to Ready for Committer.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Detect supported SET parameters when pg_restore is run

2016-09-27 Thread Vitaly Burovoy
On 9/27/16, Tom Lane  wrote:
> Robert Haas  writes:
>> On Mon, Sep 26, 2016 at 9:56 PM, Vitaly Burovoy
>>  wrote:
>>> We do dump/restore schemas/data via custom/dir formats and we have to
>>> keep several client versions for 9.2, 9.4 and 9.5 versions on local
>>> workstations because after pg_restore95 connects to 9.2, it fails when
>>> it sets run-time parameters via SET:
>
>> I think our policy is that a newer pg_dump needs to work with an older
>> version of the database, but I don't think we make any similar
>> guarantee for other tools, such as pg_restore.  It's an interesting
>> question whether we should try a little harder to do that, but I
>> suspect it might require more changes than what you've got in this
>> patch

Well... I'm not inclined to add support for restoring from a higher
major version to a lower one, because it can lead to security issues
(e.g. with RLS). But my observation is that all new features supported
by pg_dump are "incremental", e.g. the latest "PARALLEL" feature:
"PARALLEL UNSAFE" (which is the default) is absent both from 9.6 dumps
and from pre-9.6 dumps.

That behavior allows newer versions of pg_restore to use dumps taken
from older database versions, because such dumps simply lack the grammar
of the new features. With the patch I'm able to use pg_dump96/pg_restore96
for our 9.2 database (dump from 9.2 and restore to 9.2).

> The general policy has always been that pg_dump output is only expected to
> restore without errors into a server that's the same or newer version as
> pg_dump (regardless of the server version being dumped from).  If you try
> to restore into an older server version, you may get some errors, which we
> try to minimize the scope of when possible.  But it will never be possible
> to promise none at all.  I think the correct advice here is simply "don't
> use pg_restore -e -1 when trying to restore into an older server version".

Why can't I use it if pg_dump92/pg_restore92 have the same behavior as
pg_dump96/pg_restore96 except for the SET block?
The patch does not guarantee a successful restore; it just keeps
pg_restore from setting unsupported parameters, the same way pg_dump
does.
The other issues are left for the user who wants to restore to a DB of
an older version to solve.

> Taking a step back, there are any number of ways that you might get
> errors during a pg_restore into a DB that's not set up exactly as pg_dump
> expected.  Missing roles or tablespaces, for example.

It does not depend on the pg_dump/pg_restore version and can be solved
by using command line options:
--no-tablespaces --no-owner --no-privileges

> Again, the dump output is constructed so that you can survive those problems
> and bull ahead ... but not with "-e -1".

I think "-e -1" was invented specially for it --- stop restoring if
something is going wrong. Wasn't it?

> I don't see a very good reason why
> older-server-version shouldn't be considered the same type of problem.
>
> The patch as given seems rather broken anyway --- won't it change text
> output from pg_dump as well as on-line output from pg_restore?

My opinion --- no (and I wrote it in the initial letter): because it
is impossible to know what version of a database will be used for that
plain text output. Users who output to plain text are able to
use sed (or something else) to delete unwanted lines.

> (That is,
> it looks to me like the SETs emitted by pg_dump to text format would
> depend on the source server version, which they absolutely should not.
> Either that, or the patch is overwriting pg_dump's idea of what the
> source server version is at the start of the output phase, which is
> likely to break all kinds of stuff when dumping from an older server.)

I agree, that's why I left current behavior as is for the plain text output.

> It's possible that we could alleviate this specific symptom by arranging
> for the BEGIN caused by "-1" to come out after the initial SETs, but
> I'm not sure how complicated that would be or whether it's worth the
> trouble.

That would turn the ERRORs into WARNINGs, but the current behavior
discussed above would stay the same.
Why not just skip the SET parameters when we know for sure they are not
supported by the server to which pg_restore is connected?
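
For what it's worth, the gate being proposed amounts to something like the
sketch below. The server_version parameter and the emit_set() helper are
hypothetical stand-ins; pg_dump already applies a similar version test against
the source server, whereas here the test would have to be against the server
pg_restore is connected to, which is the distinction Tom raises:

#include <stdio.h>

/* Hypothetical helper; a real implementation would write to the archive output. */
static void
emit_set(const char *sql)
{
	printf("%s\n", sql);
}

static void
emit_version_gated_sets(int server_version)
{
	/* statement_timeout has existed for a long time, always safe to emit */
	emit_set("SET statement_timeout = 0;");

	if (server_version >= 90300)	/* lock_timeout was added in 9.3 */
		emit_set("SET lock_timeout = 0;");

	if (server_version >= 90500)	/* row_security was added in 9.5 */
		emit_set("SET row_security = off;");
}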

-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-09-27 Thread Kevin Grittner
On Fri, Sep 9, 2016 at 4:25 PM, Kevin Grittner  wrote:
> On Sun, Sep 4, 2016 at 12:42 PM, Emre Hasegeli  wrote:

> These patches apply and build on top of 5c609a74 with no problems,
> but `make check` finds differences per the attached.  Please
> investigate why the regression tests are failing and what the
> appropriate response is.

>> I am not much experienced in C.  If you think that inline functions
>> are better suited, I can rework the patches.
>
> I suspect that they will be as fast or faster, and they eliminate
> the hazard of multiple evaluation, where a programmer might not be
> aware of the multiple evaluation or of some side-effect of an
> argument.

Emre, are you going to address the above?  It would have to be Real
Soon Now.

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Detect supported SET parameters when pg_restore is run

2016-09-27 Thread Tom Lane
Vitaly Burovoy  writes:
> On 9/27/16, Tom Lane  wrote:
>> The general policy has always been that pg_dump output is only expected to
>> restore without errors into a server that's the same or newer version as
>> pg_dump (regardless of the server version being dumped from).

> Why can't I use it if pg_dump92/pg_restore92 have the same behavior as
> pg_dump96/pg_restore96 except the SET block?

That's a pretty large "if", and not one I'm prepared to commit to.
Before you assert that it's true, you might want to reflect on the
size of the diff between 9.2 and 9.6 pg_dump (hint: it's over 20K
lines in -u format).

>> Either that, or the patch is overwriting pg_dump's idea of what the
>> source server version is at the start of the output phase, which is
>> likely to break all kinds of stuff when dumping from an older server.)

> I agree, that's why I left current behavior as is for the plain text output.

I'm not exactly convinced that you did.  There's only one copy of
Archive->remoteVersion, and you're overwriting it long before the
dump process is over.  That'd break anything that consulted it to
find the source server's version after RestoreArchive starts.

It might not be a bad thing to clearly distinguish source server version
from (expected?) target server version, but pg_dump doesn't do that
currently, and making it do so is probably not entirely trivial.
I think your patch confuses that distinction further, which
is not going to be helpful in the long run.

I could get behind a patch that split remoteVersion into sourceVersion
and targetVersion and made sure that all the existing references to
remoteVersion were updated to the correct one of those.  After that
we could do something like what you have here in a less shaky fashion.
As Robert noted, there'd probably still be a long way to go before it'd
really work to use a newer pg_dump with an older target version, but at
least we'd not be building on quicksand.
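
To illustrate the split (a sketch only; the struct and field names are
hypothetical, not actual pg_backup.h code):

typedef struct ArchiveVersions
{
    int         sourceVersion;  /* version of the server we dump from */
    int         targetVersion;  /* version the emitted SQL should load into */
} ArchiveVersions;

Catalog queries would consult sourceVersion, while decisions about what SQL
to emit would consult targetVersion.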

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/Python adding support for multi-dimensional arrays

2016-09-27 Thread Heikki Linnakangas

On 09/27/2016 02:04 PM, Dave Cramer wrote:

On 26 September 2016 at 14:52, Dave Cramer  wrote:

This crashes with arrays with non-default lower bounds:

postgres=# SELECT * FROM test_type_conversion_array_int4('[2:4]={1,2,3}');
INFO:  ([1, 2, ], )
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

Attached patch fixes this bug, and adds a test for it.


I spent some more time massaging this:

* Changed the loops from iterative to recursive style. I think this 
indeed is slightly easier to understand.


* Fixed another segfault, with too deeply nested lists:

CREATE or replace FUNCTION test_type_conversion_mdarray_toodeep() 
RETURNS int[] AS $$

return [[1]]
$$ LANGUAGE plpythonu;

* Also, in PLySequence_ToArray(), we must check that the 'len' of the 
array doesn't overflow.


* Fixed reference leak in the loop in PLySequence_ToArray() to count the 
number of dimensions.



I'd like to see some updates to the docs for this. The manual doesn't
currently say anything about multi-dimensional arrays in PL/Python, but it
should've mentioned that they're not supported. Now that they are supported,
it should mention that, and explain briefly that a multi-dimensional array is
mapped to a Python list of lists.


If the code passes I'll fix the docs


Please do, thanks!

- Heikki

>From 6bcff9d1787fc645118a3ecbb71d1ef7561a5bfd Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Tue, 27 Sep 2016 21:53:17 +0300
Subject: [PATCH 1/1] WIP: Multi-dimensional arrays in PL/python

---
 src/pl/plpython/expected/plpython_types.out   | 151 +-
 src/pl/plpython/expected/plpython_types_3.out | 151 +-
 src/pl/plpython/plpy_typeio.c | 272 +-
 src/pl/plpython/sql/plpython_types.sql|  86 
 4 files changed, 608 insertions(+), 52 deletions(-)

diff --git a/src/pl/plpython/expected/plpython_types.out b/src/pl/plpython/expected/plpython_types.out
index f0b6abd..947244e 100644
--- a/src/pl/plpython/expected/plpython_types.out
+++ b/src/pl/plpython/expected/plpython_types.out
@@ -537,9 +537,133 @@ INFO:  (None, )
 (1 row)
 
 SELECT * FROM test_type_conversion_array_int4(ARRAY[[1,2,3],[4,5,6]]);
-ERROR:  cannot convert multidimensional array to Python list
-DETAIL:  PL/Python only supports one-dimensional arrays.
-CONTEXT:  PL/Python function "test_type_conversion_array_int4"
+INFO:  ([[1, 2, 3], [4, 5, 6]], )
+ test_type_conversion_array_int4 
+-
+ {{1,2,3},{4,5,6}}
+(1 row)
+
+SELECT * FROM test_type_conversion_array_int4(ARRAY[[[1,2,NULL],[NULL,5,6]],[[NULL,8,9],[10,11,12]]]);
+INFO:  ([[[1, 2, None], [None, 5, 6]], [[None, 8, 9], [10, 11, 12]]], )
+  test_type_conversion_array_int4  
+---
+ {{{1,2,NULL},{NULL,5,6}},{{NULL,8,9},{10,11,12}}}
+(1 row)
+
+SELECT * FROM test_type_conversion_array_int4('[2:4]={1,2,3}');
+INFO:  ([1, 2, 3], )
+ test_type_conversion_array_int4 
+-
+ {1,2,3}
+(1 row)
+
+CREATE FUNCTION test_type_conversion_array_int8(x int8[]) RETURNS int8[] AS $$
+plpy.info(x, type(x))
+return x
+$$ LANGUAGE plpythonu;
+SELECT * FROM test_type_conversion_array_int8(ARRAY[[[1,2,NULL],[NULL,5,6]],[[NULL,8,9],[10,11,12]]]::int8[]);
+INFO:  ([[[1L, 2L, None], [None, 5L, 6L]], [[None, 8L, 9L], [10L, 11L, 12L]]], )
+  test_type_conversion_array_int8  
+---
+ {{{1,2,NULL},{NULL,5,6}},{{NULL,8,9},{10,11,12}}}
+(1 row)
+
+CREATE FUNCTION test_type_conversion_array_float4(x float4[]) RETURNS float4[] AS $$
+plpy.info(x, type(x))
+return x
+$$ LANGUAGE plpythonu;
+SELECT * FROM test_type_conversion_array_float4(ARRAY[[[1.2,2.3,NULL],[NULL,5.7,6.8]],[[NULL,8.9,9.345],[10.123,11.456,12.6768]]]::float4[]);
+INFO:  ([[[1.200476837158, 2.29952316284, None], [None, 5.69809265137, 6.80190734863]], [[None, 8.89618530273, 9.345000267028809], [10.123000144958496, 11.456000328063965, 12.676799774169922]]], )
+  test_type_conversion_array_float4   
+--
+ {{{1.2,2.3,NULL},{NULL,5.7,6.8}},{{NULL,8.9,9.345},{10.123,11.456,12.6768}}}
+(1 row)
+
+CREATE FUNCTION test_type_conversion_array_float8(x float8[]) RETURNS float8[] AS $$
+plpy.info(x, type(x))
+return x
+$$ LANGUAGE plpythonu;
+SELECT * FROM test_type_conversion_array_float8(ARRAY[[[1.2,2.3,NULL],[NULL,5.7,6.8]],[[NULL,8.9,9.345],[10.123,11.456,12.6768]]]::float8[]);
+INFO:  ([[[1.2, 2.3, None], [None, 5.7, 6.8]], [[None, 8.9, 9.345], [10.123, 11.456, 12.6768]]], )
+  test_type_conversion_array_float8   
+--
+ {{{1.2,2.3,NULL},{NULL,5.7,6.8}},{{NULL,8.

Re: [HACKERS] Hash Indexes

2016-09-27 Thread Jesper Pedersen

On 09/20/2016 09:02 AM, Amit Kapila wrote:

On Fri, Sep 16, 2016 at 11:22 AM, Amit Kapila  wrote:


I do want to work on it, but it is always possible that due to some
other work this might get delayed.  Also, I think there is always a
chance that while doing that work, we face some problem due to which
we might not be able to use that optimization.  So I will go with your
suggestion of removing hashscan.c and it's usage for now and then if
required we will pull it back.  If nobody else thinks otherwise, I
will update this in next patch version.



In the attached patch, I have removed the support of hashscans.  I
think it might improve performance by few percentage (especially for
single row fetch transactions) as we have registration and destroy of
hashscans.




I have been running various tests, and applications with this patch 
together with the WAL v5 patch [1].


As I haven't seen any failures and don't currently have additional 
feedback, I'm moving this patch to "Ready for Committer" for their feedback.


If others have comments, move the patch status back in the CommitFest 
application, please.


[1] 
https://www.postgresql.org/message-id/CAA4eK1KE%3D%2BkkowyYD0vmch%3Dph4ND3H1tViAB%2B0cWTHqjZDDfqg%40mail.gmail.com


Best regards,
 Jesper



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Write Ahead Logging for Hash Indexes

2016-09-27 Thread Jesper Pedersen

On 09/25/2016 01:00 AM, Amit Kapila wrote:

Attached patch fixes the problem, now we do perform full page writes
for bitmap pages.  Apart from that, I have rebased the patch based on
latest concurrent index patch [1].  I have updated the README as well
to reflect the WAL logging related information for different
operations.

With attached patch, all the review comments or issues found till now
are addressed.



I have been running various tests, and applications with this patch 
together with the CHI v8 patch [1].


As I haven't seen any failures and don't currently have additional 
feedback, I'm moving this patch to "Ready for Committer" for their feedback.


If others have comments, move the patch status back in the CommitFest 
application, please.


[1] 
https://www.postgresql.org/message-id/CAA4eK1%2BX%3D8sUd1UCZDZnE3D9CGi9kw%2Bkjxp2Tnw7SX5w8pLBNw%40mail.gmail.com


Best regards,
 Jesper



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2016-09-27 Thread Robert Haas
On Thu, Sep 22, 2016 at 6:41 AM, Ashutosh Bapat
 wrote:
> [ new patch ]

This should probably get updated since Rajkumar reported a crash.
Meanwhile, here are some comments from an initial read-through:

+ * Multiple relations may be partitioned in the same way. The relations
+ * resulting from joining such relations may be partitioned in the same way as
+ * the joining relations.  Similarly, relations derived from such relations by
+ * grouping, sorting be partitioned in the same as the underlying relations.

I think you should change "may be partitioned in the same way" to "are
partitioned in the same way" or "can be regarded as partitioned in the
same way". The sentence that begins with "Similarly," is not
grammatical; it should say something like: ...by grouping or sorting
are partitioned in the same way as the underlying relations.

@@ -870,20 +902,21 @@ RelationBuildPartitionDesc(Relation rel)
 result->bounds->rangeinfo = rangeinfo;
 break;
 }
 }
 }

 MemoryContextSwitchTo(oldcxt);
 rel->rd_partdesc = result;
 }

+
 /*
  * Are two partition bound collections logically equal?
  *
  * Used in the keep logic of relcache.c (ie, in RelationClearRelation()).
  * This is also useful when b1 and b2 are bound collections of two separate
  * relations, respectively, because BoundCollection is a canonical
  * representation of a set partition bounds (for given partitioning strategy).
  */
 bool
 partition_bounds_equal(PartitionKey key,

Spurious hunk.

+ * For an umpartitioned table, it returns NULL.

Spelling.

+ * two arguemnts and returns boolean. For types, it
suffices to match

Spelling.

+ * partition key expression is stored as a single member list to accomodate

Spelling.

+ * For a base relation, construct an array of partition key expressions. Each
+ * partition key expression is stored as a single member list to accomodate
+ * more partition keys when relations are joined.

How would joining relations result in more partitioning keys getting
added?  Especially given the comment for the preceding function, which
says that a new PartitionScheme gets created unless an exact match is
found.

+if (!lc)

Test lc == NIL instead of !lc.

+extern int
+PartitionSchemeGetNumParts(PartitionScheme part_scheme)
+{
+return part_scheme ? part_scheme->nparts : 0;
+}

I'm not convinced it's a very good idea for this function to have
special handling for when part_scheme is NULL.  In
try_partition_wise_join() that checks is not needed because it's
already been done, and in generate_partition_wise_join_paths it is
needed but only because you are initializing nparts too early.  If you
move this initialization down below the IS_DUMMY_REL() check you won't
need the NULL guard.  I would ditch this function and let the callers
access the structure member directly.

+extern int
+PartitionSchemeGetNumKeys(PartitionScheme part_scheme)
+{
+return part_scheme ? part_scheme->partnatts : 0;
+}

Similarly here.  have_partkey_equi_join should probably have a
quick-exit path when part_scheme is NULL, and then num_pks can be set
afterwards unconditionally.  Same for match_expr_to_partition_keys.
build_joinrel_partition_info already has it and doesn't need this
double-check.
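
In other words, something along these lines (a sketch of the suggestion
only, not the patch code):

    if (part_scheme == NULL)
        return false;       /* not partitioned, so no partition-key equi-join */
    num_pks = part_scheme->partnatts;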

+extern Oid *
+PartitionDescGetPartOids(PartitionDesc part_desc)
+{
+Oid   *part_oids;
+intcnt_parts;
+
+if (!part_desc || part_desc->nparts <= 0)
+return NULL;
+
+part_oids = (Oid *) palloc(sizeof(Oid) * part_desc->nparts);
+for (cnt_parts = 0; cnt_parts < part_desc->nparts; cnt_parts++)
+part_oids[cnt_parts] = part_desc->oids[cnt_parts];
+
+return part_oids;
+}

I may be missing something, but this looks like a bad idea in multiple
ways.  First, you've got checks for part_desc's validity here that
should be in the caller, as noted above.  Second, you're copying an
array by looping instead of using memcpy().  Third, the one and only
caller is set_append_rel_size, which doesn't seem to have any need to
copy this data in the first place.  If there is any possibility that
the PartitionDesc is going to change under us while that function is
running, something is deeply broken.  Nothing in the planner is going
to cope with the table structure changing under us, so it had better
not.
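
A sketch of that simplification, assuming the validity checks move to the
caller (illustrative only, not the actual patch):

    part_oids = (Oid *) palloc(sizeof(Oid) * part_desc->nparts);
    memcpy(part_oids, part_desc->oids, sizeof(Oid) * part_desc->nparts);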

+/*
+ * For a partitioned relation, we will save the child RelOptInfos in parent
+ * RelOptInfo in the same the order as corresponding bounds/lists are
+ * stored in the partition scheme.
+ */

This comment seems misplaced; shouldn't it be next to the code that is
actually doing this, rather than the code that is merely setting up
for it?  And, also, the comment implies that we're doing this instead
of what we'd normally do, whereas I think we are actually doing
something additional.

+/*
+ * Save topmost parent's relid. If the parent itself is a child of some
+ * other relation, use parent's topm

Re: [HACKERS] Make flex/bison checks stricter in Git trees

2016-09-27 Thread Daniel Gustafsson
> On 27 Sep 2016, at 15:49, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>> When running ./configure on a system without Flex/Bison it’s easy to miss the
>> warning that flies past and then run into compilation error instead.  When it
>> happened to a colleague yesterday a brief discussion came to the conclusion
>> that it would be neat it the flex and bison checks took the existence of the
>> generated files into consideration.
> 
>> Attached patch scans for the generated files and iff flex or bison isn’t 
>> found
>> in a non-cross compilation build, errors out in case the generated filed 
>> don’t
>> exist while retaining the warning in case they do.
> 
> Not exactly convinced this is a good idea.  What if the files exist but
> are out of date?  

Wouldn’t that be the same as today if so?

> More generally, how much advantage is there really in
> failing at configure rather than build time?

The error reporting is clearer IMO but it’s a matter of taste.

> The subtext here is that I'm disinclined to load more behavior into
> configure while we're waiting to see if cmake conversion happens.
> That job is tough enough without the autoconf sources being more of
> a moving target than they have to be.

Fair enough, that’s a very valid argument.

Thanks!

cheers ./daniel

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Detect supported SET parameters when pg_restore is run

2016-09-27 Thread Vitaly Burovoy
On 9/27/16, Tom Lane  wrote:
> Vitaly Burovoy  writes:
>> On 9/27/16, Tom Lane  wrote:
>>> The general policy has always been that pg_dump output is only expected
>>> to
>>> restore without errors into a server that's the same or newer version as
>>> pg_dump (regardless of the server version being dumped from).
>
>> Why can't I use it if pg_dump92/pg_restore92 have the same behavior as
>> pg_dump96/pg_restore96 except the SET block?
>
> That's a pretty large "if", and not one I'm prepared to commit to.
> Before you assert that it's true, you might want to reflect on the
> size of the diff between 9.2 and 9.6 pg_dump (hint: it's over 20K
> lines in -u format).
>
>>> Either that, or the patch is overwriting pg_dump's idea of what the
>>> source server version is at the start of the output phase, which is
>>> likely to break all kinds of stuff when dumping from an older server.)
>
>> I agree, that's why I left current behavior as is for the plain text
>> output.
>
> I'm not exactly convinced that you did.  There's only one copy of
> Archive->remoteVersion, and you're overwriting it long before the
> dump process is over.

I'm sorry, I don't know the code base very well, but I see that my
patch changes Archive->remoteVersion in RestoreArchive, which is
called after the whole schema has been fetched into pg_dump's structs and
just before the output is written; moreover, there is a comment that it is
the final stage (9.2 has the same block of code):
...
/*
 * And finally we can do the actual output.
 *...
 */
if (plainText)
RestoreArchive(fout);

CloseArchive(fout);

exit_nicely(0);
}

It does not seem that I'm "overwriting it long before the dump process
is over"... Also pg_dump -v proves that changing remoteVersion happens
after all "pg_dump: finding *", "pg_dump: reading *" and "pg_dump:
saving *".

> That'd break anything that consulted it to
> find the source server's version after RestoreArchive starts.

I'm sorry again, but could you or anyone else point me to the part of the
code that uses Archive->remoteVersion after the schema is read?
I set a breakpoint at the line
AHX->remoteVersion = 99;
and ran pg_dump with plain text output to a file (via "-f" option).
When the line was reached I set breakpoints at all the lines I could
find with a script:

egrep -n -r --include '*.c' 'remoteVersion [><]' src/bin/pg_dump/ |
awk -F: '{print "b "$1":"$2}'

(there were 217 of them) and continued debugging. The only three
breakpoints that fired were, as expected, in _doSetFixedOutputState (at the
lines where my patch introduced them), after which the program exited
normally without stopping.

Also, I doubt that the "restoring" process consults
Archive->remoteVersion, because currently, when pg_restore outputs to plain
text, remoteVersion is zero. That would mean the restore process outputs
the schema for the oldest supported version, but that is not the case.

> I could get behind a patch that split remoteVersion into sourceVersion
> and targetVersion and made sure that all the existing references to
> remoteVersion were updated to the correct one of those.  After that
> we could do something like what you have here in a less shaky fashion.
> As Robert noted, there'd probably still be a long way to go before it'd
> really work to use a newer pg_dump with an older target version, but at
> least we'd not be building on quicksand.

That means supporting restores from a newer version into an older one.
What for? If new features are used, it is impossible to restore
(port) them to an older version; if they are not, the restore succeeds
(I guess) in 99% of cases.

-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] to_date_valid()

2016-09-27 Thread Tom Lane
Peter Eisentraut  writes:
> I think using ValidateDate() was the right idea.  That is what we use
> for checking date validity everywhere else.

Note that we've got two different CF entries, and threads, covering
fundamentally the same territory here, ie making to_timestamp et al
behave more sanely.  The other thread is
https://commitfest.postgresql.org/10/713/
https://www.postgresql.org/message-id/flat/1873520224.1784572.1465833145330.JavaMail.yahoo%40mail.yahoo.com

Robert pointed out several times in the other thread that to_timestamp
and friends were intended to be Oracle-compatible, which I agree with.
It was also pointed out that Oracle does range validity checks on the
timestamp component fields, which I verified just now using
http://rextester.com/l/oracle_online_compiler
(which is what I found by googling after discovering that sqlfiddle
isn't working very well, sigh).  I got these results:

select banner as "oracle version" from v$version
Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE11.2.0.2.0  Production
TNS for 64-bit Windows: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYY-MM-DD HH24:MI:SS') FROM dual
ORA-01850: hour must be between 0 and 23

SELECT TO_TIMESTAMP('2016-06-13 19:99:99', 'YYYY-MM-DD HH24:MI:SS') FROM dual
ORA-01851: minutes must be between 0 and 59

SELECT TO_TIMESTAMP('2016-06-13 19:59:99', 'YYYY-MM-DD HH24:MI:SS') FROM dual
ORA-01852: seconds must be between 0 and 59

SELECT TO_TIMESTAMP('2016-06-13 19:59:59', 'YYYY-MM-DD HH24:MI:SS') FROM dual
13.06.2016 19:59:59

SELECT TO_TIMESTAMP('2016-06-13 19:59:59', 'YYYY-MM-DD HH:MI:SS') FROM dual
ORA-01849: hour must be between 1 and 12

SELECT TO_TIMESTAMP('2016-02-30 19:59:59', 'YYYY-MM-DD HH24:MI:SS') FROM dual
ORA-01839: date not valid for month specified

SELECT TO_TIMESTAMP('2016-02-29 19:59:59', 'YYYY-MM-DD HH24:MI:SS') FROM dual
29.02.2016 19:59:59

SELECT TO_TIMESTAMP('2015-02-29 19:59:59', 'YYYY-MM-DD HH24:MI:SS') FROM dual
ORA-01839: date not valid for month specified

I think this is sufficient evidence to show that we ought to change
to_timestamp et al. to apply range checking on the component fields.
And while I've not attempted to characterize exactly what Oracle
does with extra spaces, non-matching punctuation, etc, it's also
clear to me that their behavior is different and probably saner
than ours.
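
For illustration, the component-field checks could look roughly like this
(a sketch only; the function name and message wording are made up, not the
actual patch code):

static void
validate_time_fields(int hour, int min, int sec)
{
    if (hour < 0 || hour > 23)
        ereport(ERROR,
                (errcode(ERRCODE_DATETIME_FIELD_OVERFLOW),
                 errmsg("hour must be between 0 and 23")));
    if (min < 0 || min > 59)
        ereport(ERROR,
                (errcode(ERRCODE_DATETIME_FIELD_OVERFLOW),
                 errmsg("minutes must be between 0 and 59")));
    if (sec < 0 || sec > 59)
        ereport(ERROR,
                (errcode(ERRCODE_DATETIME_FIELD_OVERFLOW),
                 errmsg("seconds must be between 0 and 59")));
}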

So I vote for fixing these functions to behave more like Oracle, and
forgetting about having a separate family of to_date_valid() functions
or optional parameters or anything of the sort.  I might've been in
favor of that, until I saw how far down the rabbit hole this thread
had gotten.  Let's just call it a bug and fix it.

Having range checking after the field scanning phase also changes the
terms of discussion about what we need to do in the scanning phase, since
it would catch many (of course not all) of the problems that arise from
field boundary issues.  So I think we should get the range checking
committed first and then fool with the scanner.

The 0002 patch that Artur sent for this purpose in
https://www.postgresql.org/message-id/22dbe4e0-b550-ca86-8634-adcda0faa...@postgrespro.ru
seems like the right approach to me, though I've not read it in detail
yet.  I propose to work through that and commit it.

I'm much less sold on his 0001 patch, but I tend to agree that what
we want is to adjust the scanner behavior.  I do not like the
reverse-conversion wrapper that Andreas proposes here; quite aside from
micro-efficiency issues, it seems to me that that just begs the question
of how you know what the input is supposed to mean.

In short, I think we should reject this thread + CF entry altogether
and instead move forward with the approach in the other thread.
It's probably too late in the September CF to close out the other
submission entirely, but we could get the range checking committed
and then RWF the other part for reconsideration later.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Detect supported SET parameters when pg_restore is run

2016-09-27 Thread Tom Lane
Vitaly Burovoy  writes:
> On 9/27/16, Tom Lane  wrote:
>> I'm not exactly convinced that you did.  There's only one copy of
>> Archive->remoteVersion, and you're overwriting it long before the
>> dump process is over.

> It does not seem that I'm "overwriting it long before the dump process
> is over"...

There's a lot that happens during RestoreArchive.  Even if none of it
inspects remoteVersion today, I do not think that's a safe assumption to
make going forward.  The easiest counterexample is that this very bit of
code you want to add does so.  I really do not want to get into a design
that says "remoteVersion means the source server version until we reach
RestoreArchive, and the target version afterwards".  That way madness
lies.  If we're going to try altering the emitted SQL based on target
version, let's first create a separation between those concepts; otherwise
I will bet that we add more bugs than we remove.

(The other thing I'd want here is a --target-version option so that
you could get the same output alterations in pg_dump or pg_restore to
text.  Otherwise it's nigh undebuggable, and certainly much harder
to test than it needs to be.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-09-27 Thread David Steele

On 9/27/16 6:16 AM, Kyotaro HORIGUCHI wrote:


I apologize in advance that the comments in this message might
repeat ideas already discarded in the past thread.. I might not grasp
the discussion completely X(

The attached patches are rebased to the master and additional one
mentioned below.


I tried the attached patch set and noticed an interesting behavior. 
With archive_timeout=5 whenever I made a change I would get a WAL 
segment within a few seconds as expected then another one would follow a 
few minutes later.


Database init:
16M Sep 27 20:05 00010001
16M Sep 27 20:09 00010002

Create test table:
16M Sep 27 20:13 00010003
16M Sep 27 20:15 00010004

Insert row into test table:
16M Sep 27 20:46 00010005
16M Sep 27 20:49 00010006

The cluster was completely idle with no sessions connected in between 
those three commands.  Is it possible this is caused by:


+* switch segment only when any substantial progress have made from
+* the last segment switching by timeout. Segment switching by other
+* reasons will cause last_xlog_switch_lsn stay behind but it doesn't
+* matter since it just causes possible one excessive segment switch
 */

I would like to give Michael a chance to respond to the updated patches 
before delving deeper.


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-27 Thread Tomas Vondra

On 09/26/2016 08:48 PM, Tomas Vondra wrote:

On 09/26/2016 07:16 PM, Tomas Vondra wrote:


The averages (over the 10 runs, 5 minute each) look like this:

 3.2.80 1  8 16 32 64128192

 granular-locking1567  12146  26341  44188  43263  49590  15042
 no-content-lock 1567  12180  25549  43787  43675  51800  16831
 group-update1550  12018  26121  44451  42734  51455  15504
 master  1566  12057  25457  42299  42513  42562  10462

 4.5.5  1  8 16 32 64128192

 granular-locking3018  19031  27394  29222  32032  34249  36191
 no-content-lock 2988  18871  27384  29260  32120  34456  36216
 group-update2960  18848  26870  29025  32078  34259  35900
 master  2984  18917  26430  29065  32119  33924  35897



So, I got the results from 3.10.101 (only the pgbench data), and it 
looks like this:


 3.10.101   1  8 16 32 64128192

 granular-locking2582  18492  33416  49583  53759  53572  51295
 no-content-lock 2580  18666  33860  49976  54382  54012  51549
 group-update2635  18877  33806  49525  54787  54117  51718
 master  2630  18783  33630  49451  54104  53199  50497

So 3.10.101 performs even better than 3.2.80 (and much better than 
4.5.5), and there's no sign of any of the patches making a difference.


It also seems there's a major regression in the kernel, somewhere 
between 3.10 and 4.5. With 64 clients, 3.10 does ~54k transactions, 
while 4.5 does only ~32k - that's helluva difference.


I wonder if this might be due to running the benchmark on unlogged 
tables (and thus not waiting for WAL), but I don't see why that should 
result in such drop on a new kernel.


In any case, this seems like an issue unrelated to the patch, so I'll 
post further data into a new thread instead of hijacking this one.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add support for restrictive RLS policies

2016-09-27 Thread Stephen Frost
Jeevan,

* Jeevan Chalke (jeevan.cha...@enterprisedb.com) wrote:
> Here are the review comments:

[... many good comments ...]

Many thanks for the review, I believe I agree with pretty much all your
comments and will update the patch accordingly.

Responses for a couple of them are below.

> 6. I think following changes are irrelevant for this patch.
> Should be submitted as separate patch if required.

Those changes were adding support for tab completion of the new
restrictive / permissive options, which certainly seems appropriate for
the patch which is adding those options.  I realize it looks like a lot
for just two new options, but that's because there's a number of ways to
get to them.

> 10. ALTER POLICY has no changes for this. Any reason why that is not
> considered here.

Generally speaking, I don't believe it makes sense to flip a policy from
permissive to restrictive or vice-versa, they're really quite different
things.  We also don't support altering the "command" type for a policy.

> 12. Since PERMISSIVE is default, we should dump only "RESTRICTIVE" when
> appropriate, like other default cases.
> strcmp(polinfo->polpermissive,"t") == 0 ? "PERMISSIVE" : "RESTRICTIVE"

Sure, we could do that, though I suppose we'd want to do that for all of
the defaults for policies.

> 13. Since PERMISSIVE is default, do we need changes like below?
> -\QCREATE POLICY p1 ON test_table FOR ALL TO PUBLIC \E
> +\QCREATE POLICY p1 ON test_table AS PERMISSIVE FOR ALL TO
> PUBLIC \E
> 
> I am not familiar or used TAP framework. So no idea about these changes.

If we change pg_dump to not output AS PERMISSIVE for permissive
policies, then the TAP test will need to be updated to not include AS
PERMISSIVE (or FOR ALL TO PUBLIC, if we're going down that route).

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Detect supported SET parameters when pg_restore is run

2016-09-27 Thread Vitaly Burovoy
On 9/27/16, Tom Lane  wrote:
> Vitaly Burovoy  writes:
>> On 9/27/16, Tom Lane  wrote:
>>> I'm not exactly convinced that you did.  There's only one copy of
>>> Archive->remoteVersion, and you're overwriting it long before the
>>> dump process is over.
>
>> It does not seem that I'm "overwriting it long before the dump process
>> is over"...
>
> There's a lot that happens during RestoreArchive.  Even if none of it
> inspects remoteVersion today, I do not think that's a safe assumption to
> make going forward.

And... Khm... Note that even _now_ AHX->remoteVersion is set to the
version of the database pg_restore connects to... So all the code has it
during the restore process...

> The easiest counterexample is that this very bit of
> code you want to add does so.

The only change I've made is to set remoteVersion to the maximum allowed
when the output is the plain text format.

> I really do not want to get into a design
> that says "remoteVersion means the source server version until we reach
> RestoreArchive, and the target version afterwards".  That way madness lies.

That is only if you think about "remoteVersion" as
"sourceServerVersion", but even now that is not the case.

Moreover, RestoreArchive is a boundary only for pg_dump and only when
the output is plain text.
For the other modes of pg_dump, RestoreArchive is not called at all.

> If we're going to try altering the emitted SQL based on target version,
> let's first create a separation between those concepts;

I've just found that there is _archiveHandle.archiveRemoteVersion. Is that
the parameter you were looking for?
The pg_restore code does not use it (just like remoteVersion), but it
could do so if necessary.

> otherwise I will bet that we add more bugs than we remove.
>
> (The other thing I'd want here is a --target-version option so that
> you could get the same output alterations in pg_dump or pg_restore to
> text.  Otherwise it's nigh undebuggable, and certainly much harder
> to test than it needs to be.)

I thought the same way. I'm ready to introduce that parameter, but again,
I see now it will influence only the SET parameters. Is it worth it?

-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LLVM Address Sanitizer (ASAN) and valgrind support

2016-09-27 Thread Andres Freund
Hi,


On 2015-09-07 17:05:10 +0100, Greg Stark wrote:
> I feel like I remember hearing about this before but I can't find any
> mention of it in my mail archives. It seems pretty simple to add
> support for LLVM's Address Sanitizer (asan) by using the hooks we
> already have for valgrind.

Any plans to pick this up again?


> In fact I think this would actually be sufficient. I'm not sure what
> the MEMPOOL valgrind stuff is though. I don't think it's relevant to
> address sanitizer which only tracks references to free'd or
> unallocated pointers.

It'd be nice to add msan support, so uninitialized accesses are also
tracked. (oh, you suggest that below)


I vote for renaming the VALGRIND names etc. to something more
tool-neutral. I think it's going to be too confusing otherwise.


Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [GENERAL] inconsistent behaviour of set-returning functions in sub-query with random()

2016-09-27 Thread Tom Lane
Tom van Tilburg  writes:
> Good to know and I agree that it is not an urgent case.
> I think this practice might be more common in the POSTGIS community where
> there are plenty of set-returning-functions used in this way. My use was
> taking a random sample of a pointcloud distrubution.

Fix pushed to HEAD only.  Thanks for the report!

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Detect supported SET parameters when pg_restore is run

2016-09-27 Thread Vitaly Burovoy
On 9/27/16, Vitaly Burovoy  wrote:
> On 9/27/16, Tom Lane  wrote:
>> (The other thing I'd want here is a --target-version option so that
>> you could get the same output alterations in pg_dump or pg_restore to
>> text.  Otherwise it's nigh undebuggable, and certainly much harder
>> to test than it needs to be.)
>
> I thought the same way. I'm ready to introduce that parameter, but again,
> I see now it will influence only the SET parameters. Is it worth it?

The only reason I have not implemented it was an attempt to avoid confusing
users, who could think that the result of pg_dump (we need it
there for the plain text output) or pg_restore can be converted for a
target version and restored without new features (but that is
wrong).

-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] hash_create(nelem = 0) does invalid memory accesses

2016-09-27 Thread Andres Freund
Hi,

debugging a citus valgrind bleat I noticed that hash_create() accesses
the result of palloc(0) as a hash element:
HTAB *
hash_create(const char *tabname, long nelem, HASHCTL *info, int flags)
{
...
if ((flags & HASH_SHARED_MEM) ||
nelem < hctl->nelem_alloc)
{
if (!element_alloc(hashp, (int) nelem))
ereport(ERROR,
(errcode(ERRCODE_OUT_OF_MEMORY),
 errmsg("out of memory")));
}
...}

I.e. we call element_alloc with nelem = 0. There we then do:
static bool
element_alloc(HTAB *hashp, int nelem)
{
...
firstElement = (HASHELEMENT *) hashp->alloc(nelem * elementSize);
...
firstElement->link = hctlv->freeList;
}

which means we'll write to the result of palloc(0).

Do we consider this an API usage error that we want to fix?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LLVM Address Sanitizer (ASAN) and valgrind support

2016-09-27 Thread Greg Stark
On Tue, Sep 27, 2016 at 11:02 PM, Andres Freund  wrote:
> Any plans to pick this up again?

Yeah, I was just thinking I should pick this up again.

> I vote for renaming the VALGRIND names etc. to something more tool-neutral. I 
> think it's going to be too confusing otherwise.

Hm, the danger there is once I start refactoring things I could get
bogged down... I would love to remove all the #ifdef's and have the
macros just be no-ops if they're compiled out for example...

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LLVM Address Sanitizer (ASAN) and valgrind support

2016-09-27 Thread Andres Freund
On 2016-09-28 00:23:11 +0100, Greg Stark wrote:
> On Tue, Sep 27, 2016 at 11:02 PM, Andres Freund  wrote:
> > Any plans to pick this up again?
> 
> Yeah, I was just thinking I should pick this up again.
> 
> > I vote for renaming the VALGRIND names etc. to something more tool-neutral. 
> > I think it's going to be too confusing otherwise.
> 
> Hm, the danger there is once I start refactoring things I could get
> bogged down...

Meh, adding a neutral name seems pretty harmless.

> I would love to remove all the #ifdef's and have the
> macros just be no-ops if they're compiled out for example...

Don't we pretty much have that?

#ifdef USE_VALGRIND
#include <valgrind/memcheck.h>
#else
#define VALGRIND_CHECK_MEM_IS_DEFINED(addr, size)           do {} while (0)
#define VALGRIND_CREATE_MEMPOOL(context, redzones, zeroed)  do {} while (0)
#define VALGRIND_DESTROY_MEMPOOL(context)                   do {} while (0)
#define VALGRIND_MAKE_MEM_DEFINED(addr, size)               do {} while (0)
#define VALGRIND_MAKE_MEM_NOACCESS(addr, size)              do {} while (0)
#define VALGRIND_MAKE_MEM_UNDEFINED(addr, size)             do {} while (0)
#define VALGRIND_MEMPOOL_ALLOC(context, addr, size)         do {} while (0)
#define VALGRIND_MEMPOOL_FREE(context, addr)                do {} while (0)
#define VALGRIND_MEMPOOL_CHANGE(context, optr, nptr, size)  do {} while (0)
#endif

?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LLVM Address Sanitizer (ASAN) and valgrind support

2016-09-27 Thread Tom Lane
Andres Freund  writes:
> On 2016-09-28 00:23:11 +0100, Greg Stark wrote:
>> I would love to remove all the #ifdef's and have the
>> macros just be no-ops if they're compiled out for example...

> Don't we pretty much have that?

I think "((void) 0)" is a more common spelling of a dummy statement.
Though there's little wrong with it as it is.
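
That is, something like (illustrative only):

#define VALGRIND_MAKE_MEM_DEFINED(addr, size)   ((void) 0)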

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] hash_create(nelem = 0) does invalid memory accesses

2016-09-27 Thread Tom Lane
Andres Freund  writes:
> debugging a citus valgrind bleat I noticed that hash_create() accesses
> the result of palloc(0) as a hash element:
> Do we consider this an API usage error that we want to fix?

I think Assert(nelem > 0) would be an appropriate response.
There are probably issues in sizing the hashtable quite aside
from this one.
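
For example, a guard at the top of hash_create() along these lines (a
sketch only, not a committed fix):

    /* a zero-sized request almost certainly indicates a caller bug */
    Assert(nelem > 0);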

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LLVM Address Sanitizer (ASAN) and valgrind support

2016-09-27 Thread Andres Freund
On 2016-09-27 19:31:57 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2016-09-28 00:23:11 +0100, Greg Stark wrote:
> >> I would love to remove all the #ifdef's and have the
> >> macros just be no-ops if they're compiled out for example...
> 
> > Don't we pretty much have that?
> 
> I think "((void) 0)" is a more common spelling of a dummy statement.
> Though there's little wrong with it as it is.

That's already committed code, I didn't propose to add it ;)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] should xlog_outdesc modify its argument?

2016-09-27 Thread Mark Dilger
The function

  static void xlog_outdesc(StringInfo buf, XLogReaderState *record);

in src/backend/access/transam/xlog.c is called by XLogInsertRecord,
and after returning a string describing an XLogRecord, it clears the
state data in its XLogReaderState argument.  That mixes the read-only
semantics of "give me a string that describes this argument" and the
read-write semantics of "clear out the value in this argument".  That
seems ok, except that xlog_outdesc is also called by the function

  static void rm_redo_error_callback(const void *arg);

also in xlog.c, which appears to only want the string description of the
argument, and not the side effect of clearing out the value.  Now,
perhaps under exceptional conditions it won't matter one way or the
other if the xlog record gets modified; perhaps it's going to be discarded
anyway.  I didn't look far enough to find out.  But this is the only function
of all the (void)(void *arg) callback functions in the entire postgresql
source tree that modifies its argument.  I discovered this while trying
to convert all the callback function signatures to (void)(const void *),
and this was the only monkey-wrench in the works.

Is this a bug?  Do I just have to live with the unfortunate lack of
orthogonality in the callback mechanisms?  At the very least, there
should perhaps be some documentation about how all this is intended
to work.

Thanks, please find my work-in-progress attempt at constify-ing
these functions attached.

Mark Dilger



callback.patch.wip.1
Description: Binary data






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] casting away const in comparators

2016-09-27 Thread Mark Dilger
Friends,

comparators usually take arguments like

  (const void *a, const void *b)

and do something read-only to a and b.  In our
sources, we typically cast these to something else,
like (char *), and do something read-only.  This 
generates a lot of warnings if using -Wcast-qual.
To fix that, I have converted the casts to not cast
away const.  Please find my changes, attached.
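
For example, a const-preserving comparator looks like this (a generic
sketch, not one of the call sites touched by the patch):

static int
compare_ints(const void *a, const void *b)
{
    const int  *ia = (const int *) a;   /* const kept, so no -Wcast-qual warning */
    const int  *ib = (const int *) b;

    if (*ia < *ib)
        return -1;
    if (*ia > *ib)
        return 1;
    return 0;
}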

Mark Dilger



comparators.patch.1
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] typedef FileName not const?

2016-09-27 Thread Mark Dilger
Friends,

along the lines of other similar emails from me of late,
I tried to avoid casting away const when using the FileName
typedef.  There are several calls where a (const char *) has to
be cast to (char *) due to FileName being typedef'd as
non-const.  But changing the typedef to const doesn't seem to
conflict with any code in the source tree.

Since this may be seen as an external API change, I kept
these changes in their own patch submission, so that it can
be rejected separately if need be.

Mark Dilger



filename.patch.1
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] typedef FileName not const?

2016-09-27 Thread Andres Freund
Hi,


Can we please keep this topic in one thread? Anybody motivated to apply
these isn't going to have an easy time applying things, and everyone
else is just having a harder time sorting through the mails.

On 2016-09-27 17:08:24 -0700, Mark Dilger wrote:
> along the lines of other similar emails from me of late,
> I tried to avoid casting away const when using the FileName
> typedef.  There are several calls where a (const char *) has to
> be cast to (char *) due to FileName being typedef'd as
> non-const.  But changing the typedef to const doesn't seem to
> conflict with any code in the source tree.

I think the better fix here is to simply remove the typedef. It doesn't
seem to have much of a benefit, and makes using correct types harder as
demonstrated here. We don't even use it internally in fd.c..
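
That is, drop the typedef and spell the type out in the prototypes, e.g.
(illustrative only, not the exact fd.h change):

/* today: typedef char *FileName;  and  PathNameOpenFile(FileName fileName, ...) */
extern File PathNameOpenFile(const char *fileName, int fileFlags, int fileMode);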


Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Supporting huge pages on Windows

2016-09-27 Thread Tsunakawa, Takayuki
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
> On Sun, Sep 25, 2016 at 10:45 PM, Tsunakawa, Takayuki
>  wrote:
> > Credit: This patch is based on Thomas Munro's one.
> 
> How are they different?

As Thomas mentioned, his patch (only win32_shmem.c) might not have been able to 
compile (though I didn't try.)  And it didn't have error processing or 
documentation.  I added error handling, documentation, comments, and a little 
bit of structural change.  The possibly biggest change, though it's only 
one-liner in pg_ctl.c, is additionally required.  I failed to include it in the 
first patch.  The attached patch includes that.


Regards
Takayuki Tsunakawa



win_large_page_v2.patch
Description: win_large_page_v2.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gratuitous casting away const

2016-09-27 Thread Mark Dilger
Friends,

here is another patch, this time fixing the casting away of const
in the regex code.

Mark Dilger



regex.patch.1
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tracking wait event for latches

2016-09-27 Thread Thomas Munro
On Mon, Sep 26, 2016 at 7:07 PM, Michael Paquier
 wrote:
> On Mon, Sep 26, 2016 at 1:46 PM, Thomas Munro
>  wrote:
>> If the class really is strictly implied by the WaitEventIdentifier,
>> then do we really need to supply it everywhere when calling the
>> various wait functions?  That's going to be quite a few functions:
>> WaitLatch, WaitLatchOrSocket, WaitEventSetWait for now, and if some
>> more patches out there have legs then also ConditionVariableWait,
>> BarrierWait, and possibly further kinds of wait points.  And then all
>> their call sites will have have to supply the wait class ID, even
>> though it is implied by the other ID.  Perhaps that array that
>> currently holds the names should instead hold { classId, displayName }
>> so we could just look it up?
>
> I considered this reverse-engineering, but arrived at the conclusion
> that having a flexible API mattered more to give more flexibility to
> module developers. In short this avoids having extra class IDs like
> ActivityExtention, TimeoutExtension, etc. But perhaps that's just a
> matter of taste..

Ok, if they really are independent then shouldn't we take advantage of
that at call sites where we might be idle but we might also be waiting
for the network?  For example we could do this:

/* Sleep until something happens or we time out */
WaitLatchOrSocket(MyLatch, wakeEvents,
  MyProcPort->sock, sleeptime,
  pq_is_send_pending() ? WAIT_CLIENT :
WAIT_ACTIVITY,
  WE_WAL_SENDER_MAIN);

Isn't WE_WAL_SENDER_WAIT_WAL primarily WAIT_IPC, not WAIT_CLIENT?  Or
perhaps "pq_is_send_pending() ? WAIT_CLIENT : WAIT_IPC".

Actually, I'm still not sold on "Activity" and "Client".  I think
"Idle" and "Network" would be better.  Everybody knows intuitively
what "Idle" means.  "Network" is better than "Client" because it
avoids confusion about user applications vs replication connections or
clients vs servers.

+
+ 
+  Activity: The server process is waiting for some
+  activity to happen on a socket.  This is mainly used system processes
+  in their main processing loop.  wait_event will identify
+  the type of activity waited for.
+ 
+

"The server process is waiting for some activity to happen on a
socket."  Not true: could be a latch, or both.

I think what this category is really getting at is that the server
process is idle.  Everything in it could otherwise go in the IPC or
Client categories on the basis that it's mainly waiting for a socket
or a latch, but we're deciding to separate the wait points
representing "idleness" and put them here.

How about:  "The server process is idle.  This is used by system
processes waiting for activity in their main processing loop.
wait_event will identify the specific wait point."

+
+ 
+  Client: The server process is waiting for some activity
+  on a socket from a client process, and that the server expects
+  something to happen that is independent from its internal processes.
+  wait_event will identify the type of client activity
+  waited for.
+ 
+

Is it worth spelling out that "client process" here doesn't just mean
user applications, it's also remote PostgreSQL servers doing
replication?  "wait_event" doesn't really identify the type of client
activity waited for, it identifies the code that is waiting.

+
+ 
+  Extension: The server process is waiting for activity
+  in a plugin or an extension.  This category is useful for plugin
+  and module developers to track custom waiting points.
+ 
+

Plugin, extension, module?  How about just "extension"?  Why developers?

+
+ 
+  IPC: The server process is waiting for some activity
+  from an auxilliary process.  wait_event will identify
+  the type of activity waited for.
+ 
+

s/auxilliary/auxiliary/, but wouldn't it be better to say something
more general like "from another process in the cluster"?  Background
workers are not generally called auxiliary processes, and some of
these wait points are waiting for those.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transactions involving multiple postgres foreign servers

2016-09-27 Thread Michael Paquier
On Tue, Sep 27, 2016 at 6:24 PM, Masahiko Sawada  wrote:
> * Providing the prepare id of 2PC.
>   Current patch adds new API prepare_id_provider() but we can use the
> prepare id of 2PC that is used on parent server.

And we assume that when this is used across many servers there will be
no GID conflict because each server is careful enough to generate
unique strings, say with UUIDs?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] typedef FileName not const?

2016-09-27 Thread Mark Dilger

> I think the better fix here is to simply remove the typedef. It doesn't
> seem to have much of a benefit, and makes using correct types harder as
> demonstrated here. We don't even use it internally in fd.c..
> 

Fine by me.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] make async slave to wait for lsn to be replayed

2016-09-27 Thread Thomas Munro
On Thu, Sep 15, 2016 at 2:41 PM, Thomas Munro
 wrote:
> On Thu, Sep 1, 2016 at 2:16 AM, Ivan Kartyshov
>  wrote:
>> Hi hackers,
>>
>> Few days earlier I've finished my work on WAITLSN statement utility, so I’d
>> like to share it.
>> [...]
>> Your feedback is welcome!
>>
>> [waitlsn_10dev.patch]
>
> Thanks for working on this.  Here are some general thoughts on the
> feature, and an initial review.

Hi Ivan

I'm marking the patch Returned with Feedback, since there hasn't been
any response or a new patch.  I encourage you to keep working on this
feature, and I'll be happy to review future patches.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] sloppy handling of pointers

2016-09-27 Thread Mark Dilger
rangetypes_spgist.c contains a call to detoast a pointer which was
just returned from a detoast operation.  This seems to do no harm,
but is to my mind a sloppy thinko.  Fix attached.

tsgistidx.c contains a detoast of a pointer where detoast of a datum
was likely intended.  Fix attached.

tsrank.c contains some arguably correct casts which I found slightly
less correct than an alternative that I've attached.  I'm pretty indifferent
on this one, and just as happy if it is not included.

Mark Dilger



tidy.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?

2016-09-27 Thread Tsunakawa, Takayuki
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane
> Allowing SIGQUIT to prompt fast shutdown of the stats collector seems sane,
> though.  Try to make sure it doesn't leave partly-written stats files
> behind.

The attached patch based on HEAD does this.  I'd like this to be back-patched 
because one of our important customers uses 9.2.

I didn't remove partially written stat files on SIGQUIT for the following 
reasons.  Is this OK?

1. The recovery at the next restart will remove the stat files.
2. SIGQUIT processing should be as fast as possible.
3. If writing stats files took long due to the OS page cache flushing, removing 
files might be forced to wait likewise.


> FWIW, I'm pretty much -1 on messing with the timing of the socket close
> actions.  I broke that once within recent memory, so maybe I'm gun-shy,
> but I think that the odds of unpleasant side effects greatly outweigh any
> likely benefit there.

Wasn't it related to TouchSocketFiles()?  Can I see the discussion on this ML?  
I don't see any problem looking at the code...

Regards
Takayuki Tsunakawa





01_pgstat_avoid_writing_on_sigquit.patch
Description: 01_pgstat_avoid_writing_on_sigquit.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Detect supported SET parameters when pg_restore is run

2016-09-27 Thread Pavel Stehule
2016-09-27 23:12 GMT+02:00 Tom Lane :

> Vitaly Burovoy  writes:
> > On 9/27/16, Tom Lane  wrote:
> >> I'm not exactly convinced that you did.  There's only one copy of
> >> Archive->remoteVersion, and you're overwriting it long before the
> >> dump process is over.
>
> > It does not seem that I'm "overwriting it long before the dump process
> > is over"...
>
> There's a lot that happens during RestoreArchive.  Even if none of it
> inspects remoteVersion today, I do not think that's a safe assumption to
> make going forward.  The easiest counterexample is that this very bit of
> code you want to add does so.  I really do not want to get into a design
> that says "remoteVersion means the source server version until we reach
> RestoreArchive, and the target version afterwards".  That way madness
> lies.  If we're going to try altering the emitted SQL based on target
> version, let's first create a separation between those concepts; otherwise
> I will bet that we add more bugs than we remove.
>
> (The other thing I'd want here is a --target-version option so that
> you could get the same output alterations in pg_dump or pg_restore to
> text.  Otherwise it's nigh undebuggable, and certainly much harder
> to test than it needs to be.)
>

This option looks like a very good idea.

Regards

Pavel


>
> regards, tom lane
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] Supporting huge pages on Windows

2016-09-27 Thread Thomas Munro
On Wed, Sep 28, 2016 at 1:26 PM, Tsunakawa, Takayuki
 wrote:
>> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
>> On Sun, Sep 25, 2016 at 10:45 PM, Tsunakawa, Takayuki
>>  wrote:
>> > Credit: This patch is based on Thomas Munro's one.
>>
>> How are they different?
>
> As Thomas mentioned, his patch (only win32_shmem.c) might not have compiled 
> (though I didn't try).  It also didn't have error handling or documentation.  
> I added error handling, documentation, comments, and a little bit of 
> structural change.  The biggest change, though it's only a one-liner in 
> pg_ctl.c, is also required.  I failed to include it in the first patch.  
> The attached patch includes that.

Right, my patch[1] was untested sketch code.  I had heard complaints
about poor performance with large shared_buffers on Windows, and then
I bumped into some recommendations to turn large pages on for a couple
of other RDBMSs when using a large buffer pool.  So I wrote that patch
based on the documentation to try some time in the future if I ever
got trapped in a room with Windows, but I posted it just in case the
topic would interest other hackers.  Thanks for picking it up!
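
For anyone following along without Windows experience, the API involved is
roughly the following (a minimal sketch based on the Windows documentation,
not taken from either patch; alloc_large_pages is an invented helper name
and error reporting is omitted): enable the "Lock pages in memory" right
(SeLockMemoryPrivilege) on the process token, round the request up to
GetLargePageMinimum(), and pass MEM_LARGE_PAGES to VirtualAlloc().

	#include <windows.h>

	static void *
	alloc_large_pages(SIZE_T request)
	{
		HANDLE		token;
		TOKEN_PRIVILEGES tp;
		SIZE_T		granularity;
		SIZE_T		size;

		if (!OpenProcessToken(GetCurrentProcess(),
							  TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &token))
			return NULL;

		/* enable SeLockMemoryPrivilege for this process */
		tp.PrivilegeCount = 1;
		tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
		if (!LookupPrivilegeValue(NULL, SE_LOCK_MEMORY_NAME,
								  &tp.Privileges[0].Luid) ||
			!AdjustTokenPrivileges(token, FALSE, &tp, 0, NULL, NULL) ||
			GetLastError() != ERROR_SUCCESS)	/* privilege not granted */
		{
			CloseHandle(token);
			return NULL;
		}
		CloseHandle(token);

		/* requests must be a multiple of the large-page granularity */
		granularity = GetLargePageMinimum();
		size = ((request + granularity - 1) / granularity) * granularity;

		return VirtualAlloc(NULL, size,
							MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES,
							PAGE_READWRITE);
	}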

>  huge_pages=off: 70412 tps
>  huge_pages=on : 72100 tps

Hmm.  I guess it could be noise or random code rearrangement effects.
I saw your recent post[2] proposing to remove the sentence about the
512MB effective limit and I wondered why you didn't go to larger sizes
with a larger database and more run time.  But I will let others with
more benchmarking experience comment on the best approach to
investigate Windows shared_buffers performance.

[1] 
https://www.postgresql.org/message-id/CAEepm=075-bghi_vdt4scamt+o_+1xarap2zh7xwfzvt294...@mail.gmail.com
[2] 
https://www.postgresql.org/message-id/0A3221C70F24FB45833433255569204D1F5EE995@G01JPEXMBYT05

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Requesting some information about the small portion of source code of postgreSQL

2016-09-27 Thread Ashutosh Bapat
On Tue, Sep 27, 2016 at 11:43 AM, Srinivas Karthik V
 wrote:
> Dear PostgreSQL Hackers,
>    I am working in the optimizer module of PostgreSQL 9.4.1.  I am trying to
> return a subplan for a query instead of the full plan.  For this I need to return
> an intermediate plan (or path) from the DP lattice (i.e. from RelOptInfo
> *standard_join_search() in allpaths.c) instead of the full optimal plan
> (which comes from the last level of root->join_rel_level()).

Why do you want to do that?

> While doing so I am getting the error "variable not found in subplan target
> lists" from the function fix_join_expr_mutator() in setrefs.c.
>
> It would be great if you could give me some idea about why this error is
> happening, since it occurs in the fix_join_expr_mutator() function
> in setrefs.c.  Please give me more information about this portion of code.
> Also, I would like to know what a targetlist stands for.
>

setrefs.c implements the final stage of the planner, where the
variable/expression references in the parent plan nodes are fixed up by
pointing them at expressions, parts of expressions, or variables from their
children.  This error means that some of the things the parent node is
referring to are not available in the children as expected.
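
Roughly speaking (paraphrasing from memory, not quoting the 9.4 code), the
failing check inside fix_join_expr_mutator() has this shape: every Var used
by a join node must be found in either the outer or the inner child's
targetlist, and the error is raised when neither search succeeds.

	newvar = search_indexed_tlist_for_var(var, context->outer_itlist,
										  OUTER_VAR, context->rtoffset);
	if (newvar)
		return (Node *) newvar;
	if (context->inner_itlist)
	{
		newvar = search_indexed_tlist_for_var(var, context->inner_itlist,
											  INNER_VAR, context->rtoffset);
		if (newvar)
			return (Node *) newvar;
	}
	/* ... acceptable_rel case omitted ... */
	elog(ERROR, "variable not found in subplan target lists");

So if the plan you return from standard_join_search() does not produce all
the columns that the plan nodes above it expect, this is the error you hit.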

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Some information about the small portion of source code of postgreSQL

2016-09-27 Thread Ashutosh Bapat
This mail looks exactly the same as the one from Srinivas. Please check the answers there.

On Tue, Sep 27, 2016 at 11:32 AM, davinder singh
 wrote:
> Hi,
>    I am working in the optimizer module of PostgreSQL 9.4.1.  My project is
> concerned with the plan returned by the optimizer.
>
> I am trying to return a subplan for a query instead of the full plan.  For this I
> need to return a path from RelOptInfo *standard_join_search() in allpaths.c
> other than from the last level of root->join_rel_level().
> While doing so I am getting the error "variable not found in subplan target
> lists" from the function fix_join_expr_mutator() in setrefs.c.
>
> It would be great if you could give me some idea about why this error is
> happening, since it occurs in the above function.  Please give me
> more information about that portion of code.  Also I want to know what a
> targetlist stands for.
>
>
> Thanks,
> Davinder Singh



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] asynchronous execution

2016-09-27 Thread Amit Khandekar
On 24 September 2016 at 06:39, Robert Haas  wrote:
> Since Kyotaro Horiguchi found that my previous design had a
> system-wide performance impact due to the ExecProcNode changes, I
> decided to take a different approach here: I created an async
> infrastructure where both the requestor and the requestee have to be
> specifically modified to support parallelism, and then modified Append
> and ForeignScan to cooperate using the new interface.  Hopefully that
> means that anything other than those two nodes will suffer no
> performance impact.  Of course, it might have other problems

I see that the reason you re-designed the asynchronous execution
implementation is that the earlier implementation showed
performance degradation in local sequential and local parallel scans.
But I checked that the ExecProcNode() changes were not
significant enough to cause the degradation. It will not call
ExecAsyncWaitForNode() unless the node supports asynchronism. Do you
feel there is anything else in the implementation that is really
causing this degradation? The previous implementation had some issues,
but they seemed solvable. We could resolve the plan state recursion
issue by explicitly making sure the same plan state does not get
called again while it is already executing.
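
Something along these lines, perhaps (a purely hypothetical sketch;
async_in_progress is an invented flag, not a field that PlanState has
today), to detect re-entry into a node that is already being executed:

	static TupleTableSlot *
	ExecProcNodeOnce(PlanState *node)
	{
		TupleTableSlot *slot;

		/* refuse to recurse into a node that is already running */
		if (node->async_in_progress)
			elog(ERROR, "plan node is already being executed");

		node->async_in_progress = true;
		slot = ExecProcNode(node);
		node->async_in_progress = false;

		return slot;
	}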

Thanks
-Amit Khandekar


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: pg_ctl: Detect current standby state from pg_control

2016-09-27 Thread Michael Paquier
On Tue, Sep 27, 2016 at 9:55 AM, Michael Paquier
 wrote:
> Seems overcomplicated to me. How about returning the control file all
> the time and letting the caller pfree the result? You could then use
> crc_ok in pg_ctl.c's get_control_dbstate() to do the decision-making.

In short I would just go with the attached and call it a day.
-- 
Michael
diff --git a/src/backend/utils/misc/pg_controldata.c b/src/backend/utils/misc/pg_controldata.c
index 4f9de83..0c67833 100644
--- a/src/backend/utils/misc/pg_controldata.c
+++ b/src/backend/utils/misc/pg_controldata.c
@@ -34,6 +34,7 @@ pg_control_system(PG_FUNCTION_ARGS)
 	TupleDesc	tupdesc;
 	HeapTuple	htup;
 	ControlFileData *ControlFile;
+	bool		crc_ok;
 
 	/*
 	 * Construct a tuple descriptor for the result row.  This must match this
@@ -51,8 +52,8 @@ pg_control_system(PG_FUNCTION_ARGS)
 	tupdesc = BlessTupleDesc(tupdesc);
 
 	/* read the control file */
-	ControlFile = get_controlfile(DataDir, NULL);
-	if (!ControlFile)
+	ControlFile = get_controlfile(DataDir, NULL, &crc_ok);
+	if (!crc_ok)
 		ereport(ERROR,
 (errmsg("calculated CRC checksum does not match value stored in file")));
 
@@ -83,6 +84,7 @@ pg_control_checkpoint(PG_FUNCTION_ARGS)
 	ControlFileData *ControlFile;
 	XLogSegNo	segno;
 	char		xlogfilename[MAXFNAMELEN];
+	bool		crc_ok;
 
 	/*
 	 * Construct a tuple descriptor for the result row.  This must match this
@@ -130,8 +132,8 @@ pg_control_checkpoint(PG_FUNCTION_ARGS)
 	tupdesc = BlessTupleDesc(tupdesc);
 
 	/* Read the control file. */
-	ControlFile = get_controlfile(DataDir, NULL);
-	if (!ControlFile)
+	ControlFile = get_controlfile(DataDir, NULL, &crc_ok);
+	if (!crc_ok)
 		ereport(ERROR,
 (errmsg("calculated CRC checksum does not match value stored in file")));
 
@@ -216,6 +218,7 @@ pg_control_recovery(PG_FUNCTION_ARGS)
 	TupleDesc	tupdesc;
 	HeapTuple	htup;
 	ControlFileData *ControlFile;
+	bool		crc_ok;
 
 	/*
 	 * Construct a tuple descriptor for the result row.  This must match this
@@ -235,8 +238,8 @@ pg_control_recovery(PG_FUNCTION_ARGS)
 	tupdesc = BlessTupleDesc(tupdesc);
 
 	/* read the control file */
-	ControlFile = get_controlfile(DataDir, NULL);
-	if (!ControlFile)
+	ControlFile = get_controlfile(DataDir, NULL, &crc_ok);
+	if (!crc_ok)
 		ereport(ERROR,
 (errmsg("calculated CRC checksum does not match value stored in file")));
 
@@ -268,6 +271,7 @@ pg_control_init(PG_FUNCTION_ARGS)
 	TupleDesc	tupdesc;
 	HeapTuple	htup;
 	ControlFileData *ControlFile;
+	bool		crc_ok;
 
 	/*
 	 * Construct a tuple descriptor for the result row.  This must match this
@@ -303,8 +307,8 @@ pg_control_init(PG_FUNCTION_ARGS)
 	tupdesc = BlessTupleDesc(tupdesc);
 
 	/* read the control file */
-	ControlFile = get_controlfile(DataDir, NULL);
-	if (!ControlFile)
+	ControlFile = get_controlfile(DataDir, NULL, &crc_ok);
+	if (!crc_ok)
 		ereport(ERROR,
 (errmsg("calculated CRC checksum does not match value stored in file")));
 
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index e92feab..20077a6 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -86,6 +86,7 @@ int
 main(int argc, char *argv[])
 {
 	ControlFileData *ControlFile;
+	bool		crc_ok;
 	char	   *DataDir = NULL;
 	time_t		time_tmp;
 	char		pgctime_str[128];
@@ -155,8 +156,8 @@ main(int argc, char *argv[])
 	}
 
 	/* get a copy of the control file */
-	ControlFile = get_controlfile(DataDir, progname);
-	if (!ControlFile)
+	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
+	if (!crc_ok)
 		printf(_("WARNING: Calculated CRC checksum does not match value stored in file.\n"
  "Either the file is corrupt, or it has a different layout than this program\n"
  "is expecting.  The results below are untrustworthy.\n\n"));
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 052b02e..ab10d2f 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -2147,28 +2147,18 @@ static DBState
 get_control_dbstate(void)
 {
 	DBState ret;
+	bool	crc_ok;
+	ControlFileData *control_file_data = get_controlfile(pg_data, progname, &crc_ok);
 
-	for (;;)
+	if (!crc_ok)
 	{
-		ControlFileData *control_file_data = get_controlfile(pg_data, progname);
-
-		if (control_file_data)
-		{
-			ret = control_file_data->state;
-			pfree(control_file_data);
-			return ret;
-		}
-
-		if (wait_seconds > 0)
-		{
-			pg_usleep(100);		/* 1 sec */
-			wait_seconds--;
-			continue;
-		}
-
 		write_stderr(_("%s: control file appears to be corrupt\n"), progname);
 		exit(1);
 	}
+
+	ret = control_file_data->state;
+	pfree(control_file_data);
+	return ret;
 }
 
 
diff --git a/src/common/controldata_utils.c b/src/common/controldata_utils.c
index f218d25..822fda7 100644
--- a/src/common/controldata_utils.c
+++ b/src/common/controldata_utils.c
@@ -29,21 +29,24 @@
 #include "port/pg_crc32c.h"
 
 /*
- * get_controlfile(char *DataDir, const char *progname)
+ * get_controlfile(char *DataDir, const char 

Re: [HACKERS] Transactions involving multiple postgres foreign servers

2016-09-27 Thread Masahiko Sawada
On Tue, Sep 27, 2016 at 9:06 PM, Ashutosh Bapat
 wrote:
> On Tue, Sep 27, 2016 at 2:54 PM, Masahiko Sawada  
> wrote:
>> On Mon, Sep 26, 2016 at 9:07 PM, Ashutosh Bapat
>>  wrote:
>>> On Mon, Sep 26, 2016 at 5:25 PM, Masahiko Sawada  
>>> wrote:
 On Mon, Sep 26, 2016 at 7:28 PM, Ashutosh Bapat
  wrote:
> My original patch added code to manage the files for 2 phase
> transactions opened by the local server on the remote servers. This
> code was mostly inspired from the code in twophase.c which manages the
> file for prepared transactions. The logic to manage 2PC files has
> changed since [1] and has been optimized. One of the things I wanted
> to do is see, if those optimizations are applicable here as well. Have
> you considered that?
>
>

 Yeah, we're considering it.
 After these changes are committed, we will post the patch incorporated
 these changes.

 But what we need to do first is the discussion in order to get consensus.
 Since current design of this patch is to transparently execute DCL of
 2PC on foreign server, this code changes lot of code and is
 complicated.
>>>
>>> Can you please elaborate. I am not able to understand what DCL is
>>> involved here. According to [1], examples of DCL are GRANT and REVOKE
>>> command.
>>
>> I meant transaction management command such as PREPARE TRANSACTION and
>> COMMIT/ABORT PREPARED command.
>> The web page I refered might be wrong, sorry.
>>
 Another approach I have is to push down DCL to only foreign servers
 that support 2PC protocol, which is similar to DML push down.
 This approach would be more simpler than current idea and is easy to
 use by distributed transaction manager.
>>>
>>> Again, can you please elaborate, how that would be different from the
>>> current approach and how does it simplify the code.
>>>
>>
>> The idea is just to push down PREPARE TRANSACTION, COMMIT/ROLLBACK
>> PREPARED to foreign servers that support 2PC.
>> With this idea, the client need to do following operation when foreign
>> server is involved with transaction.
>>
>> BEGIN;
>> UPDATE parent_table SET ...; -- update including foreign server
>> PREPARE TRANSACTION 'xact_id';
>> COMMIT PREPARED 'xact_id';
>>
>> The above PREPARE TRANSACTION and COMMIT PREPARED command are pushed
>> down to foreign server.
>> That is, the client needs to execute PREPARE TRANSACTION and
>>
>> In this idea, I think that we don't need to do followings,
>>
>> * Providing the prepare id of 2PC.
>>   Current patch adds new API prepare_id_provider() but we can use the
>> prepare id of 2PC that is used on parent server.
>>
>> * Keeping track of status of foreign servers.
>>   Current patch keeps track of status of foreign servers involved with
>> transaction but this idea is just to push down transaction management
>> command to foreign server.
>>   So I think that we no longer need to do that.
>
>> COMMIT/ROLLBACK PREPARED explicitly.
>
> The problem with this approach is same as one previously stated. If
> the connection between local and foreign server is lost between
> PREPARE and COMMIT the prepared transaction on the foreign server
> remains dangling, none other than the local server knows what to do
> with it and the local server has lost track of the prepared
> transaction on the foreign server. So, just pushing down those
> commands doesn't work.

Yeah, my idea is only a first step.
A mechanism that resolves the dangling foreign transactions and a
resolver worker process are still necessary.

>>
>> * Adding max_prepared_foreign_transactions parameter.
>>   It means that the number of transaction involving foreign server is
>> the same as max_prepared_transactions.
>>
>
> That isn't true exactly. max_prepared_foreign_transactions indicates
> how many transactions can be prepared on the foreign server, which in
> the method you propose should have a cap of max_prepared_transactions
> * number of foreign servers.

Oh, I understood, thanks.

Consider a sharding solution using postgres_fdw (that is, the parent
postgres server has multiple shard postgres servers): we would need to
increase max_prepared_foreign_transactions whenever a new shard server
is added to the cluster, or allocate a large enough value in advance. But
estimating a sufficient max_prepared_foreign_transactions would not be
easy; for example, can we estimate it as (max throughput of the system)
* (the number of foreign servers)?

One new idea I came up with is to make the global transaction id (gid)
that is prepared on the shard server carry the transaction id used on
the parent server.
A pg_fdw_resolver worker process then periodically resolves dangling
transactions on the foreign server by comparing the lowest active XID on
the parent server with the XID embedded in the gid used by PREPARE
TRANSACTION.
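
To make that concrete, the resolver's check could look roughly like the
fragment below (a hypothetical sketch only, not part of any posted patch;
the gid format "fdw_xact_%u", oldest_active_xid and resolve_dangling_xact()
are all invented for the example):

	unsigned int	xid_in_gid;

	if (sscanf(gid, "fdw_xact_%u", &xid_in_gid) == 1 &&
		TransactionIdPrecedes((TransactionId) xid_in_gid, oldest_active_xid))
	{
		/*
		 * No transaction with this XID can still be running on the parent,
		 * so the prepared transaction on the shard is dangling; commit or
		 * roll it back according to the parent's recorded outcome.
		 */
		resolve_dangling_xact(gid, (TransactionId) xid_in_gid);
	}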

For example, suppose that there are one parent server and one shard
server, and the client executes an update transaction (XID = 100)
involving the foreign server.
In the commit phase, the parent server executes PR

Re: [HACKERS] Tracking wait event for latches

2016-09-27 Thread Michael Paquier
On Wed, Sep 28, 2016 at 9:39 AM, Thomas Munro
 wrote:
> Ok, if they really are independent then shouldn't we take advantage of
> that at call sites where we might be idle but we might also be waiting
> for the network?  For example we could do this:
>
> /* Sleep until something happens or we time out */
> WaitLatchOrSocket(MyLatch, wakeEvents,
>   MyProcPort->sock, sleeptime,
>   pq_is_send_pending() ? WAIT_CLIENT :
> WAIT_ACTIVITY,
>   WE_WAL_SENDER_MAIN);

Yes, we can do fancy things with this API. Extensions could do the
same, the important thing being to have generic enough event
categories (that is, classes).

> Isn't WE_WAL_SENDER_WAIT_WAL primarily WAIT_IPC, not WAIT_CLIENT?  Or
> perhaps "pq_is_send_pending() ? WAIT_CLIENT : WAIT_IPC".
>
> Actually, I'm still not sold on "Activity" and "Client".  I think
> "Idle" and "Network" would be better.  Everybody knows intuitively
> what "Idle" means.  "Network" is better than "Client" because it
> avoids confusion about user applications vs replication connections or
> clients vs servers.

Personally, I am buying Robert's argument instead. The class of an
event means what it is waiting for, not which state it is in while
waiting for something. "Idle" means that the process *is* idle, not
that it is waiting for something.

> +
> + 
> +  Activity: The server process is waiting for some
> +  activity to happen on a socket.  This is mainly used system 
> processes
> +  in their main processing loop.  wait_event will 
> identify
> +  the type of activity waited for.
> + 
> +
>
> "The server process is waiting for some activity to happen on a
> socket."  Not true: could be a latch, or both.
>
> I think what this category is really getting at is that the server
> process is idle.  Everything in it could otherwise go in the IPC or
> Client categories on the basis that it's mainly waiting for a socket
> or a latch, but we're deciding to separate the wait points
> representing "idleness" and put them here.
>
> How about:  "The server process is idle.  This is used by system
> processes waiting for activity in their main processing loop.
> wait_event will identify the specific wait point."

You have a better way to word things than I do.

> +
> + 
> +  Client: The server process is waiting for some activity
> +  on a socket from a client process, and that the server expects
> +  something to happen that is independent from its internal 
> processes.
> +  wait_event will identify the type of client activity
> +  waited for.
> + 
> +
>
> Is it worth spelling out that "client process" here doesn't just mean
> user applications, it's also remote PostgreSQL servers doing
> replication?  "wait_event" doesn't really identify the type of client
> activity waited for, it identifies the code that is waiting.

Okay, switched to "user applications", and clarified that this is a
wait point that wait_event tracks.

> +
> + 
> +  Extension: The server process is waiting for activity
> +  in a plugin or an extension.  This category is useful for plugin
> +  and module developers to track custom waiting points.
> + 
> +
>
> Plugin, extension, module?  How about just "extension"?  Why developers?

Let's say "extension module"; that term is used in a couple of places in the docs.


> +
> + 
> +  IPC: The server process is waiting for some activity
> +  from an auxilliary process.  wait_event will identify
> +  the type of activity waited for.
> + 
> +
>
> s/auxilliary/auxiliary/, but wouldn't it be better to say something
> more general like "from another process in the cluster"?  Background
> workers are not generally called auxiliary processes, and some of
> these wait points are waiting for those.

There was the same typo in latch.h; still, "from another process" looks better.
-- 
Michael
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 8ca1c1c..9222b73 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -17,6 +17,7 @@
 #include "access/xact.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "storage/latch.h"
 #include "utils/hsearch.h"
 #include "utils/memutils.h"
@@ -496,7 +497,9 @@ pgfdw_get_result(PGconn *conn, const char *query)
 			wc = WaitLatchOrSocket(MyLatch,
    WL_LATCH_SET | WL_SOCKET_READABLE,
    PQsocket(conn),
-   -1L);
+   -1L,
+   WAIT_EXTENSION,
+   WE_EXTENSION);
 			ResetLatch(MyLatch);
 
 			CHECK_FOR_INTERRUPTS();
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index f400785..bb975c1 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgm

Re: [HACKERS] Some information about the small portion of source code of postgreSQL

2016-09-27 Thread Amit Langote
On 2016/09/28 13:30, Ashutosh Bapat wrote:
> This mail looks exactly the same as the one from Srinivas. Please check the
> answers there.

Here is a link to the discussion thread that Ashutosh is referring to:

https://www.postgresql.org/message-id/flat/CAFjFpRfxJJJGNhtQaS2CQ7Boyfo88nu-45JcNKeREUbQUPxOEw%40mail.gmail.com

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] should xlog_outdesc modify its argument?

2016-09-27 Thread Heikki Linnakangas

On 09/28/2016 02:35 AM, Mark Dilger wrote:

The function

  static void xlog_outdesc(StringInfo buf, XLogReaderState *record);

in src/backend/access/transam/xlog.c is called by XLogInsertRecord,
and after returning a string describing an XLogRecord, it clears the
state data in its XLogReaderState argument.  That mixes the read-only
semantics of "give me a string that describes this argument" and the
read-write semantics of "clear out the value in this argument".


I don't see where the "clears the state data" is happening. Can you 
elaborate?


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transactions involving multiple postgres foreign servers

2016-09-27 Thread Ashutosh Bapat
On Wed, Sep 28, 2016 at 10:43 AM, Masahiko Sawada  wrote:
> On Tue, Sep 27, 2016 at 9:06 PM, Ashutosh Bapat
>  wrote:
>> On Tue, Sep 27, 2016 at 2:54 PM, Masahiko Sawada  
>> wrote:
>>> On Mon, Sep 26, 2016 at 9:07 PM, Ashutosh Bapat
>>>  wrote:
 On Mon, Sep 26, 2016 at 5:25 PM, Masahiko Sawada  
 wrote:
> On Mon, Sep 26, 2016 at 7:28 PM, Ashutosh Bapat
>  wrote:
>> My original patch added code to manage the files for 2 phase
>> transactions opened by the local server on the remote servers. This
>> code was mostly inspired from the code in twophase.c which manages the
>> file for prepared transactions. The logic to manage 2PC files has
>> changed since [1] and has been optimized. One of the things I wanted
>> to do is see, if those optimizations are applicable here as well. Have
>> you considered that?
>>
>>
>
> Yeah, we're considering it.
> After these changes are committed, we will post the patch incorporated
> these changes.
>
> But what we need to do first is the discussion in order to get consensus.
> Since current design of this patch is to transparently execute DCL of
> 2PC on foreign server, this code changes lot of code and is
> complicated.

 Can you please elaborate. I am not able to understand what DCL is
 involved here. According to [1], examples of DCL are GRANT and REVOKE
 command.
>>>
>>> I meant transaction management command such as PREPARE TRANSACTION and
>>> COMMIT/ABORT PREPARED command.
>>> The web page I refered might be wrong, sorry.
>>>
> Another approach I have is to push down DCL to only foreign servers
> that support 2PC protocol, which is similar to DML push down.
> This approach would be more simpler than current idea and is easy to
> use by distributed transaction manager.

 Again, can you please elaborate, how that would be different from the
 current approach and how does it simplify the code.

>>>
>>> The idea is just to push down PREPARE TRANSACTION, COMMIT/ROLLBACK
>>> PREPARED to foreign servers that support 2PC.
>>> With this idea, the client need to do following operation when foreign
>>> server is involved with transaction.
>>>
>>> BEGIN;
>>> UPDATE parent_table SET ...; -- update including foreign server
>>> PREPARE TRANSACTION 'xact_id';
>>> COMMIT PREPARED 'xact_id';
>>>
>>> The above PREPARE TRANSACTION and COMMIT PREPARED command are pushed
>>> down to foreign server.
>>> That is, the client needs to execute PREPARE TRANSACTION and
>>>
>>> In this idea, I think that we don't need to do followings,
>>>
>>> * Providing the prepare id of 2PC.
>>>   Current patch adds new API prepare_id_provider() but we can use the
>>> prepare id of 2PC that is used on parent server.
>>>
>>> * Keeping track of status of foreign servers.
>>>   Current patch keeps track of status of foreign servers involved with
>>> transaction but this idea is just to push down transaction management
>>> command to foreign server.
>>>   So I think that we no longer need to do that.
>>
>>> COMMIT/ROLLBACK PREPARED explicitly.
>>
>> The problem with this approach is same as one previously stated. If
>> the connection between local and foreign server is lost between
>> PREPARE and COMMIT the prepared transaction on the foreign server
>> remains dangling, none other than the local server knows what to do
>> with it and the local server has lost track of the prepared
>> transaction on the foreign server. So, just pushing down those
>> commands doesn't work.
>
> Yeah, my idea is one of the first step.
> Mechanism that resolves the dangling foreign transaction and the
> resolver worker process are necessary.
>
>>>
>>> * Adding max_prepared_foreign_transactions parameter.
>>>   It means that the number of transaction involving foreign server is
>>> the same as max_prepared_transactions.
>>>
>>
>> That isn't true exactly. max_prepared_foreign_transactions indicates
>> how many transactions can be prepared on the foreign server, which in
>> the method you propose should have a cap of max_prepared_transactions
>> * number of foreign servers.
>
> Oh, I understood, thanks.
>
> Consider sharding solution using postgres_fdw (that is, the parent
> postgres server has multiple shard postgres servers), we need to
> increase max_prepared_foreign_transactions whenever new shard server
> is added to cluster, or to allocate enough size in advance. But the
> estimation of enough max_prepared_foreign_transactions would not be
> easy, for example can we estimate it by (max throughput of the system)
> * (the number of foreign servers)?
>
> One new idea I came up with is that we set transaction id on parent
> server to global transaction id (gid) that is prepared on shard
> server.
> And pg_fdw_resolver worker process periodically resolves the dangling
> transaction on foreign server by comparing active lowest XID on parent
> server with the XID in gid used by PREPARE TRANSACTION.
>
> For exa

Re: [HACKERS] Supporting huge pages on Windows

2016-09-27 Thread Tsunakawa, Takayuki
From: Thomas Munro [mailto:thomas.mu...@enterprisedb.com]
> >  huge_pages=off: 70412 tps
> >  huge_pages=on : 72100 tps
> 
> Hmm.  I guess it could be noise or random code rearrangement effects.

I don't think the difference was just random noise, because running multiple 
sets of three pgbench runs (huge_pages = on, off, on, off, on...) produced 
similar results.  But I expected a somewhat greater improvement, say +10%.  
There may be a better benchmark model where large pages stand out, but I think 
pgbench is not so bad, because its random data access should cause TLB cache misses.



> I saw your recent post[2] proposing to remove the sentence about the 512MB
> effective limit and I wondered why you didn't go to larger sizes with a
> larger database and more run time.  But I will let others with more
> benchmarking experience comment on the best approach to investigate Windows
> shared_buffers performance.

Yes, I could have gone to 8GB of shared_buffers because my PC has 16GB of RAM, 
but I felt the number of variations was sufficient.  Anyway, comments on the 
benchmarking approach would be appreciated.

Regards
Takayuki Tsunakawa



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tracking wait event for latches

2016-09-27 Thread Thomas Munro
On Wed, Sep 28, 2016 at 6:25 PM, Michael Paquier
 wrote:
> wait-event-set-v8.patch

Ok, I'm just about ready to mark this as 'Ready for Committer'.  Just
a couple of things:

+ pgstat_report_wait_start((uint8) classId, (uint16) eventId);

Unnecessary casts.

+
+ Client
+ SecureRead
+ Waiting to read data from a secure connection.
+
+
+ SecureWrite
+ Waiting to write data to a secure connection.
+

I think we want to drop the word 'secure' from the description lines
in this patch.  Then I think we plan to make a separate patch that
will rename the functions themselves and the corresponding wait points
to something more generic?

I'm assuming that my suggestions for making WE_WAL_SENDER_WAIT_WAL and
WE_WAL_SENDER_MAIN have a dynamically chosen class ID would also be
material for another patch, but it doesn't matter much because those
processes won't show up yet anyway.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


  1   2   >