Re: COPY FROM WHEN condition

2018-11-01 Thread Pavel Stehule
On Fri, Nov 2, 2018 at 3:57, Corey Huinker wrote:

> > Are you thinking something like having a COPY command that provides
>> > results in such a way that they could be referenced in a FROM clause
>> > (perhaps a COPY that defines a cursor…)?
>>
>> That would also be nice, but what I was thinking of was that some
>> highly restricted subset of cases of SQL in general could lend
>> themselves to levels of optimization that would be impractical in
>> other contexts.
>>
>
> If COPY (or a syntactical equivalent) can return a result set, then the
> whole of SQL is available to filter and aggregate the results and we don't
> have to invent new syntax, or endure confusion when COPY-WHEN syntax behaves
> subtly differently from a similar FROM-WHERE.
>
> Also, what would we be saving computationally? The whole file (or program
> output) has to be consumed no matter what, the columns have to be parsed no
> matter what. At least some of the columns have to be converted to their
> assigned datatypes enough to know whether or not to filter the row, but we
> might be able to push that logic inside a copy. I'm thinking of something like
> this:
>
> SELECT x.a, sum(x.b)
> FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer, b
> numeric, c text, d date, e json) )
> WHERE x.d >= '2018-11-01'
>
>
Without some special feature this example is not especially useful. It is based
on server-side COPY, which only a superuser with full rights can use.

What would be the benefit of this feature?

Regards

Pavel


> In this case, there is the *opportunity* to see the following
> optimizations:
> - columns c and e are never referenced, and need never be turned into a
> datum (though we might do so just to confirm that they conform to the data
> type)
> - if column d is converted first, we can filter on it and avoid converting
> columns a,b
> - whatever optimizations we can infer from knowing that the two surviving
> columns will go directly into an aggregate
>
> If we go this route, we can train the planner to notice other
> optimizations and add those mechanisms at that time, and then existing code
> gets faster.
>
> If we go the COPY-WHEN route, then we have to make up new syntax for every
> possible future optimization.
>


Re: Getting ERROR: could not open file "base/13164/t3_16388" with partition table with ON COMMIT

2018-11-01 Thread Michael Paquier
On Fri, Nov 02, 2018 at 02:18:04PM +0900, Michael Paquier wrote:
> This case is funky.  The parent gets dropped at commit time, but it does
> not know that it should drop the child as well per their dependencies.
> This actually goes into the internals of performDeletion(), which is
> scary to touch on back-branches just for such cases..

A bit more fun with inheritance:
=# begin;
BEGIN
=# create temp table aa_p (a int) on commit drop;
CREATE TABLE
=# create temp table aa_c (a int) inherits (aa_p) on commit delete rows;
NOTICE:  0: merging column "a" with inherited definition
LOCATION:  MergeAttributes, tablecmds.c:2339
CREATE TABLE
=# insert into aa_p values (1);
INSERT 0 1
=# insert into aa_c values (1);
INSERT 0 1
=# commit;
NOTICE:  0: drop cascades to table aa_c
LOCATION:  reportDependentObjects, dependency.c:995
ERROR:  XX000: could not open relation with OID 16426
LOCATION:  relation_open, heapam.c:1138

Let's treat that as a separate issue, as this happens also with an
unpatched build.  The only reason why you cannot trigger it with
partitions is that ON COMMIT is currently broken for them, so we should
fix the reported case first.  In consequence, I would tend to commit the
proposed patch and take care of the first issue, unless of course anybody
has an objection.
--
Michael




Re: Getting ERROR: could not open file "base/13164/t3_16388" with partition table with ON COMMIT

2018-11-01 Thread Michael Paquier
On Fri, Nov 02, 2018 at 01:36:05PM +0900, Amit Langote wrote:
> When writing the test, I noticed something worth pointing out.  As of
> 1c7c317cd9d, partitions of a temporary partitioned table must themselves be
> temporary, but the ON COMMIT action has to be specified for each table
> separately.  Maybe there is nothing to be concerned about here, because
> there's nowhere to record what's specified for the parent to use it on the
> children.  So, children's CREATE TABLE commands must specify the ON COMMIT
> action for themselves.

Well, I would actually expect callers to do so.  ON COMMIT PRESERVE ROWS or
DELETE ROWS does not make much sense for partitioned tables as
they have no physical presence (like today's thread about tablespaces
which you know about), but it looks better to just let the command go
through instead of complaining about it if we worry about inheritance.
And actually, your point has just made me aware of a second bug:
=# begin;
BEGIN
=# create temp table temp_parent (a int) partition by list (a) on commit
drop;
CREATE TABLE
=# create temp table temp_child_2 partition of temp_parent for values in
(2) on commit delete rows;
CREATE TABLE
=# insert into temp_parent values (2);
INSERT 0 1
=# table temp_parent;
 a
---
 2
(1 row)
=# commit;
ERROR:  XX000: could not open relation with OID 16420
LOCATION:  relation_open, heapam.c:1138

This case is funky.  The parent gets dropped at commit time, but it does
not know that it should drop the child as well per their dependencies.
This actually goes into the internals of performDeletion(), which is
scary to touch on back-branches just for such cases..
--
Michael




Re: ToDo: show size of partitioned table

2018-11-01 Thread Amit Langote
Hi,

On 2018/11/01 2:19, Pavel Stehule wrote:
> On Wed, Oct 31, 2018 at 7:34, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote:
>> On 2018/10/31 15:30, Pavel Stehule wrote:
>>> On Wed, Oct 31, 2018 at 3:27, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote:
 +appendPQExpBufferStr(, "\nWHERE c.relkind IN ('p')\n");

 I wonder if we should list partitioned indexes ('I') as well, because
 their size information is not available with \di+.  But maybe, they
>> should
 have a separate command.

>>>
>>> I thought about it too, and I prefer a separate command, similar to \di+.
>>
>> Okay, maybe \dI+.
>>
>>
> what about a combination:
> 
> \dPt+ for partitioned tables and size
> \dPi+ for indexes on partitioned tables and size
> \dP+ for total partition size (tables + indexes)

+1

Thanks,
Amit
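
As a rough interim equivalent of what the proposed \dP+ would summarize, the
sizes of a partitioned table's partitions can already be totalled from the
catalogs (a minimal sketch, assuming a single level of partitioning):

    SELECT c.relname AS partitioned_table,
           pg_size_pretty(sum(pg_total_relation_size(i.inhrelid))) AS total_size
    FROM pg_class c
    JOIN pg_inherits i ON i.inhparent = c.oid
    WHERE c.relkind = 'p'
    GROUP BY c.relname;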




Re: bugfix: BUG #15477: Procedure call with named inout refcursor parameter - "invalid input syntax for type boolean"

2018-11-01 Thread Pavel Stehule
On Thu, Nov 1, 2018 at 14:31, Pavel Stehule wrote:

> Cleaned patch with regress tests
>
>
minor cleaning

Regards

Pavel
diff --git a/src/pl/plpgsql/src/expected/plpgsql_call.out b/src/pl/plpgsql/src/expected/plpgsql_call.out
index 547ca22a55..8762e1335c 100644
--- a/src/pl/plpgsql/src/expected/plpgsql_call.out
+++ b/src/pl/plpgsql/src/expected/plpgsql_call.out
@@ -276,3 +276,43 @@ DROP PROCEDURE test_proc1;
 DROP PROCEDURE test_proc3;
 DROP PROCEDURE test_proc4;
 DROP TABLE test1;
+CREATE TABLE test_call_proc(key serial, name text);
+CREATE OR REPLACE PROCEDURE p1(v_cnt int, v_ResultSet inout refcursor = NULL)
+AS $$
+BEGIN
+  INSERT INTO test_call_proc(name) VALUES('name test');
+  OPEN v_ResultSet FOR SELECT * FROM test_call_proc;
+END
+$$ LANGUAGE plpgsql;
+DO $$
+DECLARE
+  v_ResultSet refcursor;
+  v_cnt   integer;
+BEGIN
+  CALL p1(v_cnt:=v_cnt, v_ResultSet := v_ResultSet);
+  RAISE NOTICE '%', v_ResultSet;
+END;
+$$;
+NOTICE:  
+DO $$
+DECLARE
+  v_ResultSet refcursor;
+  v_cnt   integer;
+BEGIN
+  CALL p1(10, v_ResultSet := v_ResultSet);
+  RAISE NOTICE '%', v_ResultSet;
+END;
+$$;
+NOTICE:  
+DO $$
+DECLARE
+  v_ResultSet refcursor;
+  v_cnt   integer;
+BEGIN
+  CALL p1(v_ResultSet := v_ResultSet, v_cnt:=v_cnt);
+  RAISE NOTICE '%', v_ResultSet;
+END;
+$$;
+NOTICE:  
+DROP PROCEDURE p1;
+DROP TABLE test_call_proc;
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 45526383f2..c737a66c54 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2168,10 +2168,10 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 		 */
 		if (!stmt->target)
 		{
+			Form_pg_proc funcform;
 			Node	   *node;
 			ListCell   *lc;
 			FuncExpr   *funcexpr;
-			int			i;
 			HeapTuple	tuple;
 			Oid		   *argtypes;
 			char	  **argnames;
@@ -2179,6 +2179,7 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			MemoryContext oldcontext;
 			PLpgSQL_row *row;
 			int			nfields;
+			int			pronargs;
 
 			/*
 			 * Get the original CallStmt
@@ -2196,6 +2197,8 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			if (!HeapTupleIsValid(tuple))
 elog(ERROR, "cache lookup failed for function %u", funcexpr->funcid);
 			get_func_arg_info(tuple, , , );
+			funcform = (Form_pg_proc) GETSTRUCT(tuple);
+			pronargs = funcform->pronargs;
 			ReleaseSysCache(tuple);
 
 			/*
@@ -2210,45 +2213,74 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			row->varnos = palloc(sizeof(int) * FUNC_MAX_ARGS);
 
 			nfields = 0;
-			i = 0;
-			foreach(lc, funcexpr->args)
+
+			/*
+			 * The argmodes can be in a different order than funcexpr->args due
+			 * to named arguments. Before we check INOUT parameters and prepare
+			 * the target variables, we have to reorder the arguments first. This
+			 * code is similar to reorder_function_arguments. Default values are
+			 * not needed in this part.
+			 */
+			if (argmodes)
 			{
-Node	   *n = lfirst(lc);
+Node	   *argarray[FUNC_MAX_ARGS];
+int		nargsprovided = list_length(funcexpr->args);
+int		i = 0;
+
+MemSet(argarray, 0, pronargs * sizeof(Node *));
 
-if (argmodes && argmodes[i] == PROARGMODE_INOUT)
+foreach(lc, funcexpr->args)
 {
-	if (IsA(n, Param))
-	{
-		Param	   *param = castNode(Param, n);
+	Node	   *n = lfirst(lc);
 
-		/* paramid is offset by 1 (see make_datum_param()) */
-		row->varnos[nfields++] = param->paramid - 1;
-	}
-	else if (IsA(n, NamedArgExpr))
+	if (IsA(n, NamedArgExpr))
 	{
 		NamedArgExpr *nexpr = castNode(NamedArgExpr, n);
-		Param	   *param;
+		argarray[nexpr->argnumber] = (Node *) nexpr->arg;
+	}
+	else
+		argarray[i++] = n;
+}
 
-		if (!IsA(nexpr->arg, Param))
-			ereport(ERROR,
-	(errcode(ERRCODE_SYNTAX_ERROR),
-	 errmsg("argument %d is an output argument but is not writable", i + 1)));
+Assert(nargsprovided <= pronargs);
 
-		param = castNode(Param, nexpr->arg);
+for (i = 0; i < pronargs; i++)
+{
+	Node	   *n = argarray[i];
 
+	if (argmodes[i] == PROARGMODE_INOUT)
+	{
 		/*
-		 * Named arguments must be after positional arguments,
-		 * so we can increase nfields.
+		 * Empty positions are related to default values. INOUT defaults
+		 * are allowed only if no other parameter follows them.
 		 */
-		row->varnos[nexpr->argnumber] = param->paramid - 1;
-		nfields++;
+		if (!n)
+		{
+			int		j;
+
+			for (j = i + 1; j < pronargs; j++)
+			{
+if (argarray[j])
+	ereport(ERROR,
+			(errcode(ERRCODE_SYNTAX_ERROR),
+		 errmsg("argument %i with default values is output argument but it is not writeable", i + 1)));
+
+			/* no other parameters were supplied after this one */
+			break;
+		}
+		else if (IsA(n, Param))
+		{
+			Param	   *param = castNode(Param, n);
+
+			/* paramid is 

Re: Getting ERROR: could not open file "base/13164/t3_16388" with partition table with ON COMMIT

2018-11-01 Thread Amit Langote
On 2018/11/02 10:51, Michael Paquier wrote:
> On Thu, Nov 01, 2018 at 01:04:43PM +0900, Michael Paquier wrote:
>> On Thu, Nov 01, 2018 at 12:39:16PM +0900, Amit Langote wrote:
>>> Rajkumar pointed out off-list that the patch still remains to be applied.
>>> Considering that there is a planned point release on Nov 8, maybe we
>>> should do something about this?
>>
>> Yes doing something about that very soon would be a good idea.  Tom,
>> are you planning to look at it or should I jump in?
> 
> And so I am looking at v3 now...

Thanks for looking.

> Adding a test case in temp.sql would be nice.

Good idea, done.

When writing the test, I noticed something worth pointing out.  As of
1c7c317cd9d, partitions of a temporary partitioned table must themselves be
temporary, but the ON COMMIT action has to be specified for each table
separately.  Maybe there is nothing to be concerned about here, because
there's nowhere to record what's specified for the parent to use it on the
children.  So, children's CREATE TABLE commands must specify the ON COMMIT
action for themselves.
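
A minimal sketch of that point (table names are illustrative): the parent's
clause is not propagated, so each partition has to carry its own.

    begin;
    create temp table tp (a int) partition by list (a) on commit delete rows;
    -- the parent's ON COMMIT action is not inherited; without repeating the
    -- clause this partition defaults to ON COMMIT PRESERVE ROWS
    create temp table tp1 partition of tp for values in (1) on commit delete rows;
    commit;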

> Would it make sense to support TRUNCATE on a materialized view as well in the
> future?  It seems to me that it is dangerous to assume that only
> relations make use of heap_truncate_one_rel() anyway as modules or
> external code could perfectly call it.  And the thing is documented
> to work on a relation, including materialized views, not just an
> ordinary table which is what RELKIND_RELATION only mentions.  On the
> contrary we know that heap_truncate() works only on temporary
> relations.  It is documented to do so and does so.
> 
> So it seems to me that Tom correctly mentioned to add the check in
> heap_truncate, not heap_truncate_one_rel(), so v3 looks incorrect to
> me.

Okay, I agree that adding the preventive check in heap_truncate is a good
idea.

Updated patch attached.

Thanks,
Amit
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 064cf9dbf6..2c2d287f0e 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -3121,7 +3121,8 @@ RelationTruncateIndexes(Relation heapRelation)
 /*
  *  heap_truncate
  *
- *  This routine deletes all data within all the specified relations.
+ *  This routine deletes all data within all the specified relations,
+ *  ignoring any that don't have any storage to begin with
  *
  * This is not transaction-safe!  There is another, transaction-safe
  * implementation in commands/tablecmds.c.  We now use this only for
@@ -3151,8 +3152,12 @@ heap_truncate(List *relids)
{
Relationrel = lfirst(cell);
 
-   /* Truncate the relation */
-   heap_truncate_one_rel(rel);
+   /*
+* Truncate the relation.  Regular and partitioned tables can be
+* temporary relations, but only regular tables have storage.
+*/
+   if (rel->rd_rel->relkind == RELKIND_RELATION)
+   heap_truncate_one_rel(rel);
 
/* Close the relation, but keep exclusive lock on it until 
commit */
heap_close(rel, NoLock);
diff --git a/src/test/regress/expected/temp.out 
b/src/test/regress/expected/temp.out
index addf1ec444..ed2c591fc6 100644
--- a/src/test/regress/expected/temp.out
+++ b/src/test/regress/expected/temp.out
@@ -199,3 +199,17 @@ select pg_temp.whoami();
 (1 row)
 
 drop table public.whereami;
+-- for partitioned temp tables, ON COMMIT action should ignore storage-less
+-- partitioned parent table
+begin;
+create temp table temp_parted_oncommit_test (a int) partition by list (a) on 
commit delete rows;
+create temp table temp_parted_oncommit_test1 partition of 
temp_parted_oncommit_test for values in (1) on commit delete rows;
+insert into temp_parted_oncommit_test values (1);
+commit;
+-- temp_parted_oncommit_test1 must've been emptied during the commit
+select * from temp_parted_oncommit_test;
+ a 
+---
+(0 rows)
+
+drop table temp_parted_oncommit_test;
diff --git a/src/test/regress/sql/temp.sql b/src/test/regress/sql/temp.sql
index 5183c727f5..0274bdb13f 100644
--- a/src/test/regress/sql/temp.sql
+++ b/src/test/regress/sql/temp.sql
@@ -151,3 +151,14 @@ select whoami();
 select pg_temp.whoami();
 
 drop table public.whereami;
+
+-- for partitioned temp tables, ON COMMIT action should ignore storage-less
+-- partitioned parent table
+begin;
+create temp table temp_parted_oncommit_test (a int) partition by list (a) on 
commit delete rows;
+create temp table temp_parted_oncommit_test1 partition of 
temp_parted_oncommit_test for values in (1) on commit delete rows;
+insert into temp_parted_oncommit_test values (1);
+commit;
+-- temp_parted_oncommit_test1 must've been emptied during the commit
+select * from temp_parted_oncommit_test;
+drop table temp_parted_oncommit_test;


Re: WIP Patch: Add a function that returns binary JSONB as a bytea

2018-11-01 Thread Tom Lane
Christian Ohler  writes:
> On Wed, Oct 31, 2018 at 7:22 AM Tom Lane  wrote:
>> If we're going to expose the
>> internal format, let's just change the definition of the type's binary
>> I/O format, thereby getting a win for purposes like COPY BINARY as well.

> How would this work from the driver's and application's perspective?  What
> does the driver do when receiving JSONB data?

Well, upthread it was posited that applications that read binary JSONB
data would be willing to track changes in that format (or else have no
need to, because they don't do anything with it except feed it back to the
server).  If that isn't the case, then this entire thread is a waste of
time.  I certainly don't buy that exposing the internal format via some
other mechanism than binary I/O would be a sufficient excuse for not
worrying about cross-version compatibility.

> The idea behind the proposal is to improve efficiency by avoiding
> conversions, and the most straightforward way to do that is for every layer
> to pass through the raw bytes.

This argument is, frankly, as bogus as it could possibly be.  In the first
place, you're essentially saying that ignoring version compatibility
considerations entirely is the way to avoid future version compatibility
problems.  I don't buy it.  In the second place, you already admitted
that format conversion *is* necessary; what PG finds best internally is
unlikely to be exactly what some other piece of software will want.
So we'd be better off agreeing on some common interchange format.

I'm still bemused by the proposition that that common interchange format
shouldn't be, um, JSON.  We've already seen BSON, BJSON, etc die
well-deserved deaths.  Why would jsonb internal format, which was never
for one second intended to be seen anywhere outside PG, be a better
interchange-format design than those were?

(In short, I remain unconvinced that we'd not be better off spending
our effort on making json_out faster...)

regards, tom lane



Re: COPY FROM WHEN condition

2018-11-01 Thread Corey Huinker
>
> > Are you thinking something like having a COPY command that provides
> > results in such a way that they could be referenced in a FROM clause
> > (perhaps a COPY that defines a cursor…)?
>
> That would also be nice, but what I was thinking of was that some
> highly restricted subset of cases of SQL in general could lend
> themselves to levels of optimization that would be impractical in
> other contexts.
>

If COPY (or a syntactical equivalent) can return a result set, then the
whole of SQL is available to filter and aggregate the results and we don't
have to invent new syntax, or endure confusion when COPY-WHEN syntax behaves
subtly differently from a similar FROM-WHERE.

Also, what would we be saving computationally? The whole file (or program
output) has to be consumed no matter what, the columns have to be parsed no
matter what. At least some of the columns have to be converted to their
assigned datatypes enough to know whether or not to filter the row, but we
might be able to push that logic inside a copy. I'm thinking of something like
this:

SELECT x.a, sum(x.b)
FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer, b
numeric, c text, d date, e json) )
WHERE x.d >= '2018-11-01'


In this case, there is the *opportunity* to see the following optimizations:
- columns c and e are never referenced, and need never be turned into a
datum (though we might do so just to confirm that they conform to the data
type)
- if column d is converted first, we can filter on it and avoid converting
columns a,b
- whatever optimizations we can infer from knowing that the two surviving
columns will go directly into an aggregate

If we go this route, we can train the planner to notice other optimizations
and add those mechanisms at that time, and then existing code gets faster.

If we go the COPY-WHEN route, then we have to make up new syntax for every
possible future optimization.
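
To make the contrast concrete, here is a hedged sketch of the two shapes being
compared. Neither form exists today: the WHEN spelling is only the proposal
under discussion, and COPY INLINE is the hypothetical construct from the
example above.

    -- proposed: filtering done by COPY itself
    COPY target_table FROM '/path/to/foo.txt' CSV WHEN (d >= '2018-11-01');

    -- alternative: filtering through ordinary SQL, if COPY could feed a FROM clause
    INSERT INTO target_table
    SELECT x.a, x.b, x.c, x.d, x.e
    FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV )
         AS x(a integer, b numeric, c text, d date, e json)
    WHERE x.d >= '2018-11-01';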


Re: heap_sync seems rather oblivious to partitioned tables (wal_level=minimal)

2018-11-01 Thread Steve Singer

On Thu, 1 Nov 2018, David Rowley wrote:



I ended up going with:

   However, this consideration only applies when wal_level is minimal for
   non-partitioned tables, as all commands must write WAL otherwise.





I've changed this to:

 Rows will be frozen only if the table being loaded has been created
 or truncated in the current subtransaction, there are no cursors
 open and there are no older snapshots held by this transaction.  It is
 currently not possible to perform a COPY FREEZE on
 a partitioned table.

Which is just the original text with the new sentence tagged onto the end.
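
As a minimal sketch of the constraint that paragraph describes (the file path
is a placeholder):

    BEGIN;
    CREATE TABLE freeze_target (a int, b text);
    -- FREEZE is accepted only because the table was created in this transaction,
    -- with no open cursors or older snapshots held
    COPY freeze_target FROM '/path/to/data.csv' WITH (FORMAT csv, FREEZE);
    COMMIT;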


Other than that the patch looks fine and works as expected.

The new status of this patch is: Waiting on Author


Thanks for looking at this. I've attached an updated patch.


I am happy with the changes.
I think the patch is ready for a committer.




--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Steve




Re: WIP Patch: Add a function that returns binary JSONB as a bytea

2018-11-01 Thread Christian Ohler
On Wed, Oct 31, 2018 at 7:22 AM Tom Lane  wrote:

> If we're going to expose the
> internal format, let's just change the definition of the type's binary
> I/O format, thereby getting a win for purposes like COPY BINARY as well.
> We'd need to ensure that jsonb_recv could tell whether it was seeing the
> old or new format, but at worst that'd require prepending a header of
> some sort.  (In practice, I suspect we'd end up with a wire-format
> definition that isn't exactly the bits-on-disk, but something easily
> convertible to/from that and more easily verifiable by jsonb_recv.
> Numeric subfields, for instance, ought to match the numeric wire
> format, which IIRC isn't exactly the bits-on-disk either.)
>

How would this work from the driver's and application's perspective?  What
does the driver do when receiving JSONB data?

Applications currently receive JSON strings when reading JSONB data, and
the driver presumably has to stay compatible with that.  Does that mean the
driver transparently converts JSONB to JSON before handing it to the
application?  That scales better than doing it in Postgres itself, but is
still the kind of inefficiency we're trying to avoid.

We want to convert JSONB directly to language-native objects.  This
shouldn't be the responsibility of the Postgres driver, since the
conversion is complex and can be done in different ways, such as
instantiating objects from a class hierarchy vs instantiating generic
containers, or eager vs lazy conversion.  Applications that are sensitive
to JSONB performance likely want control over these aspects.  Postgres
drivers aren't coupled to specific JSON parsers; they shouldn't be coupled
to specific JSONB parsers either.

So, AFAICS, when the application requests JSONB data, the driver has to
hand it the raw JSONB bytes.  But that's incompatible with what currently
happens.  To preserve compatibility, does the application have to opt in by
setting some out-of-band per-query per-result-column flags to tell the
driver how it wants the JSONB data returned?  That's workable in principle
but bloats every driver's API with some rarely-used performance feature.
Seems much simpler to put this into the query.

The idea behind the proposal is to improve efficiency by avoiding
conversions, and the most straightforward way to do that is for every layer
to pass through the raw bytes.  With an explicit conversion to BYTEA in the
query, this is automatic without any changes to drivers, since every layer
already knows to leave BYTEAs untouched.

I don't have an argument against _also_ adding a binary format version 2
for JSONB once we define a portable JSONB format; but I am not sure it
alone solves the problem we have.
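
A purely illustrative sketch of the two paths (the table, column, and function
names here are made up, not existing or proposed identifiers):

    -- today: the driver receives jsonb rendered as text and must reparse it
    SELECT payload FROM events;

    -- with an explicit conversion in the query, every layer between the server
    -- and the application just passes the bytea through untouched
    SELECT jsonb_as_bytea(payload) FROM events;   -- hypothetical function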


Re: partitioned indexes and tablespaces

2018-11-01 Thread Amit Langote
Hi,

On 2018/11/02 10:27, Michael Paquier wrote:
> On Thu, Nov 01, 2018 at 09:31:38PM -0300, Alvaro Herrera wrote:
>> A customer reported to us that partitioned indexes are not working
>> consistently with tablespaces:
> 
> Let's see...
> 
>> 1. When a CREATE INDEX specifies a tablespace, existing partitions get
>> the index in the correct tablespace; however, the parent index itself
>> does not record the tablespace.  So when new partitions are created
>> later, they get the index in the default tablespace instead of the
>> specified tablespace.  Fix by saving the tablespace in the pg_class row
>> for the parent index.
> 
> I may be missing something of course...  But partitioned tables don't
> register the tablespace they are on either, so it cannot be used by
> any partitions created on them:
> =# create tablespace popo location '/home/ioltas/data/tbspace';
> CREATE TABLESPACE
> =# create table aa (a int) partition by list (a) tablespace popo;
> CREATE TABLE
> =# create table aa_1 partition of aa for values in (1) tablespace popo;
> CREATE TABLE
> =# create table aa_2 partition of aa for values in (2);
> CREATE TABLE
> =# select t.spcname, c.relname from pg_class c, pg_tablespace t
> where c.oid > 16000 and c.reltablespace = t.oid;
>  spcname | relname
> -+-
>  popo| aa_1
> (1 row)
> 
> It seems to me that the current behavior is wanted in this case, because
> partitioned tables and partitioned indexes have no physical storage.

Keith Fiske complained about this behavior for partitioned *tables* a few
months ago, which led to the following discussion:

https://www.postgresql.org/message-id/flat/CAKJS1f9PXYcT%2Bj%3DoyL-Lquz%3DScNwpRtmD7u9svLASUygBdbN8w%40mail.gmail.com

It's Michael's message that was the last one on that thread. :)

https://www.postgresql.org/message-id/20180413224007.GB27295%40paquier.xyz

By the way, if we decide to do something about this, I think we should do the
same for partitioned tables.  There is more than one interesting behavior
possible, as mentioned in the above thread, for when the parent's
reltablespace is set/changed.  For example, when it's set, the
existing partitions are not moved, but the new value only applies to
subsequently created partitions.  Considering various such behaviors, this
would seem like new feature work, which I'd suppose would only be
considered for 12.

>> 2. ALTER TABLE SET TABLESPACE, applied to the partitioned index, would
>> raise an error indicating that it's not the correct relation kind.  In
>> order for this to actually work, we need bespoke code for ATExecCmd();
>> the code for all other relation kinds wants to move storage (and runs in
>> Phase 3, later), but these indexes do not have that.  Therefore, write a
>> cut-down version which is invoked directly in ATExecCmd instead.
>
> Using the previous example, this does not raise an error:
> alter table aa set tablespace popo;
> However the reference to reltablespace in pg_class is not changed.  So I
> would agree with your point to not raise an error and back-patch that,
> but I don't agree with the point of changing reltablespace for a
> partitioned index if that's what you mean.
>
>> 3. ALTER INDEX ALL IN TABLESPACE, identical problem, is also fixed by
>> the above change.
> 
> Reproducible with just the following stuff on top of the previous
> example:
> create index aai on aa(a);
> alter index all in tablespace pg_default set tablespace popo;
> 
> In this case also raising an error is a bug, it seems to me that
> partitioned indexes should just be ignored.

Same argument here too.

IOW, I agree with Michael that if something is to be back-patched to 11, it
should be a small patch to make the unsupported relkind error go away.

Thanks,
Amit




Re: row filtering for logical replication

2018-11-01 Thread Euler Taveira
On Thu, Nov 1, 2018 at 05:30, Erik Rijkers wrote:
> > I ran pgbench-over-logical-replication with a WHERE-clause and could
> > not get this to do a correct replication.  Below is the output of the
> > attached test program.
> >
> >
> > $ ./logrep_rowfilter.sh
>
Erik, thanks for testing.

> So it seems this bug is due to some timing error in your patch (or
> possibly in logical replication itself).
>
It is a bug in the new synchronization code. I'm doing some code
cleanup/review and will post a new patchset after I finish it. If you
want to give it a try again, apply the following patch.

diff --git a/src/backend/replication/logical/tablesync.c
b/src/backend/replication/logical/tablesync.c
index e0eb73c..4797e0b 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -757,7 +757,7 @@ fetch_remote_table_info(char *nspname, char *relname,

/* Fetch row filtering info */
resetStringInfo();
-   appendStringInfo(, "SELECT pg_get_expr(prrowfilter,
prrelid) FROM pg_publication p INNER JOIN pg_publication_rel pr ON
(p.oid = pr.prpubid) WHERE pr.prrelid = %u AND p.pubname IN (",
MyLogicalRepWorker->relid);
+   appendStringInfo(, "SELECT pg_get_expr(prrowfilter,
prrelid) FROM pg_publication p INNER JOIN pg_publication_rel pr ON
(p.oid = pr.prpubid) WHERE pr.prrelid = %u AND p.pubname IN (",
lrel->remoteid);


-- 
   Euler Taveira   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: CF app feature request

2018-11-01 Thread Andrew Dunstan




On 11/01/2018 08:40 PM, Michael Paquier wrote:

On Thu, Nov 01, 2018 at 05:56:24PM -0400, Andrew Dunstan wrote:

On 11/01/2018 05:50 PM, Magnus Hagander wrote:

Are you thinking basically another status that's "Withdrawn", but
keeping it, or actually removing the records completely?

I don't know enough about the app internals to comment. But it probably
shouldn't appear in the stats, or else should have its own category in the
stats.

Or is that closer to "Rejected by the author himself"?  "Withdrawn"
sounds like a good term for that; we surely don't want to simply remove
the thing entirely either.  What's actually the issue with not tracking
such things in the stats?



I don't have a strong opinion. It seemed to me that where something had 
been created in error it would be best simply to be able to undo that. 
But I can see arguments against.


cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Getting ERROR: could not open file "base/13164/t3_16388" with partition table with ON COMMIT

2018-11-01 Thread Michael Paquier
On Thu, Nov 01, 2018 at 01:04:43PM +0900, Michael Paquier wrote:
> On Thu, Nov 01, 2018 at 12:39:16PM +0900, Amit Langote wrote:
>> Rajkumar pointed out off-list that the patch still remains to be applied.
>> Considering that there is a planned point release on Nov 8, maybe we
>> should do something about this?
> 
> Yes doing something about that very soon would be a good idea.  Tom,
> are you planning to look at it or should I jump in?

And so I am looking at v3 now...

Adding a test case in temp.sql would be nice.

Would it make sense to support TRUNCATE on a materialized view as well in the
future?  It seems to me that it is dangerous to assume that only
relations make use of heap_truncate_one_rel() anyway as modules or
external code could perfectly call it.  And the thing is documented
to work on a relation, including materialized views, not just an
ordinary table which is what RELKIND_RELATION only mentions.  On the
contrary we know that heap_truncate() works only on temporary
relations.  It is documented to do so and does so.

So it seems to me that Tom correctly mentioned to add the check in
heap_truncate, not heap_truncate_one_rel(), so v3 looks incorrect to
me.
--
Michael




Re: partitioned indexes and tablespaces

2018-11-01 Thread Michael Paquier
On Thu, Nov 01, 2018 at 09:31:38PM -0300, Alvaro Herrera wrote:
> A customer reported to us that partitioned indexes are not working
> consistently with tablespaces:

Let's see...

> 1. When a CREATE INDEX specifies a tablespace, existing partitions get
> the index in the correct tablespace; however, the parent index itself
> does not record the tablespace.  So when new partitions are created
> later, they get the index in the default tablespace instead of the
> specified tablespace.  Fix by saving the tablespace in the pg_class row
> for the parent index.

I may be missing something of course...  But partitioned tables don't
register the tablespace they are on either, so it cannot be used by
any partitions created on them:
=# create tablespace popo location '/home/ioltas/data/tbspace';
CREATE TABLESPACE
=# create table aa (a int) partition by list (a) tablespace popo;
CREATE TABLE
=# create table aa_1 partition of aa for values in (1) tablespace popo;
CREATE TABLE
=# create table aa_2 partition of aa for values in (2);
CREATE TABLE
=# select t.spcname, c.relname from pg_class c, pg_tablespace t
where c.oid > 16000 and c.reltablespace = t.oid;
 spcname | relname
-+-
 popo| aa_1
(1 row)

It seems to me that the current behavior is wanted in this case, because
partitioned tables and partitioned indexes have no physical storage.

> 2. ALTER TABLE SET TABLESPACE, applied to the partitioned index, would
> raise an error indicating that it's not the correct relation kind.  In
> order for this to actually work, we need bespoke code for ATExecCmd();
> the code for all other relation kinds wants to move storage (and runs in
> Phase 3, later), but these indexes do not have that.  Therefore, write a
> cut-down version which is invoked directly in ATExecCmd instead.

Using the previous example, this does not raise an error:
alter table aa set tablespace popo;
However the reference to reltablespace in pg_class is not changed.  So I
would agree with your point to not raise an error and back-patch that,
but I don't agree with the point of changing reltablespace for a
partitioned index if that's what you mean.

> 3. ALTER INDEX ALL IN TABLESPACE, identical problem, is also fixed by
> the above change.

Reproducible with just the following stuff on top of the previous
example:
create index aai on aa(a);
alter index all in tablespace pg_default set tablespace popo;

In this case also raising an error is a bug, it seems to me that
partitioned indexes should just be ignored.

Could you add an entry in the next CF to not lose track of what is
discussed here?
--
Michael




Re: INSTALL file

2018-11-01 Thread Michael Paquier
On Thu, Nov 01, 2018 at 08:42:34PM -0400, Stephen Frost wrote:
> If we go down this route, the master branch should probably link to the
> regularly built devel documentation, so that if/when we do make such a
> change, we'll point people at the updated documentation too.

I don't know how others feel about such things, so I may be
overengineering a simple problem as well :)

Also, I have not looked in detail at the Perl tools used to change the
version number in the tree, but we'd likely want a solution which
automatically updates the README as well in this case depending on the
version number, perhaps also making sure that for a git repository with
the master branch in use we don't point to the development
version of the docs.
--
Michael




Re: CF app feature request

2018-11-01 Thread Michael Paquier
On Thu, Nov 01, 2018 at 05:56:24PM -0400, Andrew Dunstan wrote:
> On 11/01/2018 05:50 PM, Magnus Hagander wrote:
>> Are you thinking basically another status that's "Withdrawn", but
>> keeping it, or actually removing the records completely?
> 
> I don't know enough about the app internals to comment. But it probably
> shouldn't appear in the stats, or else should have its own category in the
> stats.

Or is that closer to "Rejected by the author himself"?  "Withdrawn"
sounds like a good term for that; we surely don't want to simply remove
the thing entirely either.  What's actually the issue with not tracking
such things in the stats?
--
Michael




Re: INSTALL file

2018-11-01 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Thu, Nov 01, 2018 at 01:41:33PM -0400, Stephen Frost wrote:
> > I'm not sure that I'm really following this, because we aren't pointing
> > to the development documentation, just the 'current' documentation, and
> > that seems fine, and there's links on that page to the other versions of
> > the page for each major version of PostgreSQL, in case someone pulled an
> > older branch or such.
> 
> Imagine for example that Postgres switches from ./configure to cmake;
> this makes the current version of the documentation invalid on
> back-branches, yet they still refer to it.  If we target that stuff for
> beginners, they may just look into those docs, without thinking that
> they actually need the version of the docs with the branch checked out.

I'd say we're targeting beginner developers, not beginner computer
users.

That said, I suppose it wouldn't be all that hard to have a different
version of the README for each branch and to keep it updated as we make
releases.

One point I'll make though is that we do need to have the README checked
into git, since that's where most people are getting the source.

If we go down this route, the master branch should probably link to the
regularly built devel documentation, so that if/when we do make such a
change, we'll point people at the updated documentation too.

Thanks!

Stephen




Re: INSTALL file

2018-11-01 Thread Stephen Frost
Greetings,

* Andreas 'ads' Scherbaum (a...@pgug.de) wrote:
> How about the attached one? Picked up your draft, and cleaned it up a bit.

(unsurprisingly) this is looking pretty good to me.

A few additional notes:

> PostgreSQL Database Management System
> =
> 
> This directory contains the source code distribution of the PostgreSQL
> database management system.
> 
> PostgreSQL is an advanced object-relational database management system
> that supports an extended subset of the SQL standard, including
> transactions, foreign keys, subqueries, triggers, user-defined types
> and functions.  This distribution also contains C language bindings.
> 
> PostgreSQL has many language interfaces, many of which are listed here:
> 
>   https://www.postgresql.org/download

Honestly, I don't think the above link makes sense when we're saying
"here's where to go to download language interfaces."  We could maybe
link to https://www.postgresql.org/download/products/2-drivers-and-interfaces/
instead, but that link is really pretty messy, imv.

Instead, I'd just move the 'Download' section that's at the end up to
here.

> Building on Windows
> ===
> 
> Detailed instructions for building on Windows is available here:
> https://www.postgresql.org/docs/current/static/install-windows.html
> 
> To build PostgreSQL on Windows, either Visual Studio Express 2017
> for Windows Desktop or Microsoft Visual C++ 2005 (or later) should be
> installed.  PostgreSQL can also be build using MinGW or Cygwin using
> the Unix instructions.

The above should say "can also be built", not "build".

> Initializing your Database
> ==
> 
> Once the PostgreSQL software is installed, the first step to having a
> running database is to initialize a PostgreSQL database using the
> 'initdb' command, eg:
> 
> initdb -D /path/to/mydatabase

We probably shouldn't say 'eg' but should use language similar to what
we say above, which would be "run this command" or so.

> Where '/path/to/mydatabase' is the directory where the database is
> going to be installed. This directory can exist, but must be empty.
> If it does not exist, 'initdb' will create it.
> 
> After the database system has been initialized, PostgreSQL can be
> started by using the pg_ctl command:
> 
> pg_ctl -D /path/to/mydatabase -l logfile start
> 
> Once PostgreSQL is running, you can connect to it using the psql
> command-line client.  A default database called 'postgres' was created
> by 'initdb'.

I didn't include a full path intentionally, simply because paths look
different between Unix systems and Windows.  That said, I don't think
it's a big deal either way.

> Building the PostgreSQL Documentation
> =

I've been going back and forth on this, but it seems like building the
documentation should maybe be above the Initializing a database..?  I
could go either way on it.

> Download
> 
> 
> The latest version of this software may be obtained at
> https://www.postgresql.org/download/.  For more information look at our
> web site located at https://www.postgresql.org/.

I'd suggest we change this to be something like:

The latest version of this software, both in source code form and as
binary packages for many platforms, may be obtained at
https://www.postgresql.org/download/.  For more information please visit
https://www.postgresql.org/.

Thanks!

Stephen




Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-01 Thread Michael Paquier
On Thu, Nov 01, 2018 at 04:44:40PM +0530, Amit Kapila wrote:
> This sounds like a good argument for having a whitelist approach, but
> is it really a big problem if a user gets warning for files that the
> utility is not able to verify checksums for?  I think in some sense
> this message can be useful to the user as it can allow him to know
> which files are not verified by the utility for some form of
> corruption.  I guess one can say that users might not be interested in
> this information in which case such a check could be optional as you
> seem to be suggesting in the following paragraph.

The replication protocol supports NOVERIFY_CHECKSUMS to avoid the
warnings, so they are enabled by default and can be disabled at will.
pg_basebackup supports the same interface.
--
Michael




partitioned indexes and tablespaces

2018-11-01 Thread Alvaro Herrera
Hello

A customer reported to us that partitioned indexes are not working
consistently with tablespaces:

1. When a CREATE INDEX specifies a tablespace, existing partitions get
the index in the correct tablespace; however, the parent index itself
does not record the tablespace.  So when new partitions are created
later, they get the index in the default tablespace instead of the
specified tablespace.  Fix by saving the tablespace in the pg_class row
for the parent index.

2. ALTER TABLE SET TABLESPACE, applied to the partitioned index, would
raise an error indicating that it's not the correct relation kind.  In
order for this to actually work, we need bespoke code for ATExecCmd();
the code for all other relation kinds wants to move storage (and runs in
Phase 3, later), but these indexes do not have that.  Therefore, write a
cut-down version which is invoked directly in ATExecCmd instead.

3. ALTER INDEX ALL IN TABLESPACE, identical problem, is also fixed by
the above change.
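
A minimal psql sketch of item 1 (tablespace, path, and table names are
illustrative):

    CREATE TABLESPACE ts1 LOCATION '/path/to/ts1';
    CREATE TABLE pt (a int) PARTITION BY LIST (a);
    CREATE INDEX pt_a_idx ON pt (a) TABLESPACE ts1;
    -- per the report above, ts1 is not recorded for the parent index, so the
    -- index on a partition created afterwards ends up in the default tablespace
    CREATE TABLE pt1 PARTITION OF pt FOR VALUES IN (1);
    SELECT relname, reltablespace FROM pg_class
    WHERE relname IN ('pt_a_idx', 'pt1_a_idx');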

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 819dd64baf04d39f02785cde72b6a4c33fd10f3b Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 1 Nov 2018 21:29:56 -0300
Subject: [PATCH] Fix tablespace handling for partitioned indexes

When creating partitioned indexes, the tablespace was not being saved
for the parent index. This meant that subsequently created partitions
would not use the right tablespace for their indexes.

ALTER INDEX SET TABLESPACE and ALTER INDEX ALL IN TABLESPACE also raised
errors when tried; fix them too.
---
 src/backend/catalog/heap.c| 10 -
 src/backend/commands/tablecmds.c  | 61 +--
 src/test/regress/input/tablespace.source  | 10 +
 src/test/regress/output/tablespace.source | 19 +-
 4 files changed, 95 insertions(+), 5 deletions(-)

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 8c52a1543d..61ce44830b 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -297,7 +297,6 @@ heap_create(const char *relname,
 		case RELKIND_COMPOSITE_TYPE:
 		case RELKIND_FOREIGN_TABLE:
 		case RELKIND_PARTITIONED_TABLE:
-		case RELKIND_PARTITIONED_INDEX:
 			create_storage = false;
 
 			/*
@@ -306,6 +305,15 @@ heap_create(const char *relname,
 			 */
 			reltablespace = InvalidOid;
 			break;
+
+		case RELKIND_PARTITIONED_INDEX:
+			/*
+			 * Preserve tablespace so that it's used as tablespace for indexes
+			 * on future partitions.
+			 */
+			create_storage = false;
+			break;
+
 		case RELKIND_SEQUENCE:
 			create_storage = true;
 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 357c73073d..514c1c9d49 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -446,6 +446,8 @@ static bool ATPrepChangePersistence(Relation rel, bool toLogged);
 static void ATPrepSetTableSpace(AlteredTableInfo *tab, Relation rel,
 	const char *tablespacename, LOCKMODE lockmode);
 static void ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode);
+static void ATExecPartedIdxSetTableSpace(Relation rel, Oid newTableSpace);
+
 static void ATExecSetRelOptions(Relation rel, List *defList,
 	AlterTableType operation,
 	LOCKMODE lockmode);
@@ -3845,7 +3847,8 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 			pass = AT_PASS_DROP;
 			break;
 		case AT_SetTableSpace:	/* SET TABLESPACE */
-			ATSimplePermissions(rel, ATT_TABLE | ATT_MATVIEW | ATT_INDEX);
+			ATSimplePermissions(rel, ATT_TABLE | ATT_MATVIEW | ATT_INDEX |
+ATT_PARTITIONED_INDEX);
 			/* This command never recurses */
 			ATPrepSetTableSpace(tab, rel, cmd->name, lockmode);
 			pass = AT_PASS_MISC;	/* doesn't actually matter */
@@ -4181,10 +4184,13 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 			 */
 			break;
 		case AT_SetTableSpace:	/* SET TABLESPACE */
-
 			/*
-			 * Nothing to do here; Phase 3 does the work
+			 * Only do this for partitioned indexes, for which this is just
+			 * a catalog change.  Other relation types are handled by Phase 3.
 			 */
+			if (rel->rd_rel->relkind == RELKIND_PARTITIONED_INDEX)
+ATExecPartedIdxSetTableSpace(rel, tab->newTableSpace);
+
 			break;
 		case AT_SetRelOptions:	/* SET (...) */
 		case AT_ResetRelOptions:	/* RESET (...) */
@@ -11050,6 +11056,55 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
 }
 
 /*
+ * Special handling of ALTER TABLE SET TABLESPACE for partitioned indexes,
+ * which have no storage (so not handled in Phase 3 like other relation types)
+ */
+static void
+ATExecPartedIdxSetTableSpace(Relation rel, Oid newTableSpace)
+{
+	HeapTuple	tuple;
+	Oid			oldTableSpace;
+	Relation	pg_class;
+	Form_pg_class rd_rel;
+	Oid			indexOid = RelationGetRelid(rel);
+
+	Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_INDEX);
+
+	/*
+	 * No work if no change in 

Re: Commitfest 2018-11

2018-11-01 Thread Michael Paquier
On Thu, Nov 01, 2018 at 01:19:20PM +0100, Dmitry Dolgov wrote:
> On Thu, 1 Nov 2018 at 13:05, Daniel Gustafsson  wrote:
>>
>> Dmitry: Are you still interested in running this commitfest?
> 
> Yes, I'm still interested. Do I need to have any special permissions
> in CF app for that (e.g. now I can't "push the button" to start the
> current one)?

Thanks Daniel for bringing up the topic, and to the person changed the
CF status in the app.  Happy hacking to all!
--
Michael




Re: pg_promote not marked as parallel-restricted in pg_proc.dat

2018-11-01 Thread Michael Paquier
On Thu, Nov 01, 2018 at 12:59:47PM -0400, Robert Haas wrote:
> If you do the same analysis for pg_start_backup(), you'll immediately
> notice that it calls get_backup_status(), and if you look at that
> function, you'll see that it returns the value of a global variable.
> If you check parallel.c, you'll find that that global variable is not
> one of the special ones that gets synchronized between a leader and
> the workers.  Therefore, if you ran this function in a worker, it
> might do the wrong thing, because it might get the wrong value of that
> global variable.  So it's at least (and in fact exactly)
> parallel-restricted.  It doesn't need to be parallel-restricted
> because it's scary in some ill-defined way or any of these other
> reasons advanced on this thread -- it needs to be parallel-restricted
> because it *relies on a global variable that is not synchronized to
> parallel workers*.  If somebody writes code to move that state out of
> a global variables and into the main shared memory segment or to a DSM
> shared between the leader and the workers, then it can be marked
> parallel-safe (unless, for example, it also depends on OTHER global
> variables that are NOT moved into shared storage).  Otherwise not.

My name is all over the commit logs for this area, so I kind of heard
about what those things rely on..  Both exclusive and non-exclusive
backup interfaces are definitely unsafe to run across multiple processes
in the same query.  Well, unsafe is incorrect for the pg_proc
definition, that's restricted :)

> I hope I'm making sense here.

You actually do, a lot, moving just one person with MP as initials to
consider marking the function as parallel-safe.
points you raised, what needs to be done looks clear now.
--
Michael
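
The catalog marking being discussed is visible directly in pg_proc; a quick way
to check how a function is flagged (s = safe, r = restricted, u = unsafe):

    SELECT proname, proparallel
    FROM pg_proc
    WHERE proname IN ('pg_start_backup', 'pg_promote');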




Re: Pluggable Storage - Andres's take

2018-11-01 Thread Haribabu Kommi
On Wed, Oct 31, 2018 at 9:34 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:

> > On Mon, 29 Oct 2018 at 05:56, Haribabu Kommi wrote:
> >
> >> This problem couldn't be reproduced on the master branch, so I've tried to
> >> investigate it. It comes from nodeModifyTable.c:1267, when we've got
> >> HeapTupleInvisible as a result, and this value in turn comes from
> >> table_lock_tuple. Everything points to the new way of handling HeapTupleUpdated
> >> result from heap_update, when table_lock_tuple call was introduced. Since I
> >> don't see anything similar in the master branch, can anyone clarify why is this
> >> lock necessary here?
> >
> >
> > In the master branch code also, there is a tuple lock that happens in the
> > EvalPlanQual() function, but in the pluggable-storage code the lock is kept
> > outside, with some function call rearrangements, to make it easier for the
> > table access methods to provide their own MVCC implementation.
>
> Yes, now I see it, thanks. Also I can confirm that the attached patch solves
> this issue.
>

Thanks for the testing and confirmation.


> FYI, alongside reviewing the code changes I've run a few performance tests
> (that's why I hit this issue with pgbench in the first place). In case of high
> concurrency so far I see a small performance degradation in comparison with the
> master branch (about 2-5% of average latency, depending on the level of
> concurrency), but can't really say why exactly (perf just shows barely
> noticeable overhead here and there, maybe what I see is actually a cumulative
> impact).
>

Thanks for sharing your observation. I will also analyze and try to find out
the performance bottlenecks that are causing the overhead.

Attached here are the cumulative fixes of the patches, new API additions for
zheap, and a basic outline of the documentation.

Regards,
Haribabu Kommi
Fujitsu Australia


0003-First-draft-of-pluggable-storage-documentation.patch
Description: Binary data


0002-New-API-s-are-added.patch
Description: Binary data


0001-Further-fixes-and-cleanup.patch
Description: Binary data


Re: INSTALL file

2018-11-01 Thread Michael Paquier
On Thu, Nov 01, 2018 at 01:41:33PM -0400, Stephen Frost wrote:
> I'm not sure that I'm really following this, because we aren't pointing
> to the development documentation, just the 'current' documentation, and
> that seems fine, and there's links on that page to the other versions of
> the page for each major version of PostgreSQL, in case someone pulled an
> older branch or such.

Imagine for example that Postgres switches from ./configure to cmake;
this makes the current version of the documentation invalid on
back-branches, yet they still refer to it.  If we target that stuff for
beginners, they may just look into those docs, without thinking that
they actually need the version of the docs with the branch checked out.

My 2c.
--
Michael




Re: INSTALL file

2018-11-01 Thread Andreas 'ads' Scherbaum

On 01.11.18 18:41, Stephen Frost wrote:

Greetings,

* Andreas 'ads' Scherbaum (a...@pgug.de) wrote:

On 01.11.18 07:26, Michael Paquier wrote:

It includes links to the website, as well as the short version of the
installation instructions.

+The installation instructions are listed on the website:
+
+https://www.postgresql.org/docs/current/static/install-short.html

I don't think we should link to the "Short version" directly here, the
above URL should be the "installation.html" one..  With a caveat, see
below.


How about the attached one? Picked up your draft, and cleaned it up a bit.

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project

diff --git a/README b/README
index 12de3f1d73..1f33303d80 100644
--- a/README
+++ b/README
@@ -13,14 +13,94 @@ PostgreSQL has many language interfaces, many of which are listed here:
 
 	https://www.postgresql.org/download
 
-See the file INSTALL for instructions on how to build and install
-PostgreSQL.  That file also lists supported operating systems and
-hardware platforms and contains information regarding any other
-software packages that are required to build or run the PostgreSQL
-system.  Copyright and license information can be found in the
-file COPYRIGHT.  A comprehensive documentation set is included in this
-distribution; it can be read as described in the installation
-instructions.
+
+Building on UNIX
+
+
+Detailed instructions for many Unix platforms are available here:
+https://www.postgresql.org/docs/current/static/installation.html
+
+To build PostgreSQL on most Unix variants, the following are required:
+
+GNU make, version 3.8 or newer
+ISO/ANSI C compiler (at least C99-compliant)
+Flex 2.5.31 or later, and Bison 1.875 or later (for building from git)
+Perl 5.8.3 (for building from git)
+
+PostgreSQL has many additional capabilities which can be enabled using
+configure --enable switches but many of those also depend on additional
+libraries.  See the installation instructions for details.
+
+To build PostgreSQL, run the following commands:
+
+./configure
+make
+
+PostgreSQL can then be installed using 'make install', which will
+require being a superuser to install into the default directory.
+The installation location can be changed by passing '--prefix' to
+'configure'.  Run './configure --help' for additional options.
+
+
+Building on Windows
+===
+
+Detailed instructions for building on Windows is available here:
+https://www.postgresql.org/docs/current/static/install-windows.html
+
+To build PostgreSQL on Windows, either Visual Studio Express 2017
+for Windows Desktop or Microsoft Visual C++ 2005 (or later) should be
+installed.  PostgreSQL can also be build using MinGW or Cygwin using
+the Unix instructions.
+
+There are different requirements for building on a 32-bit or 64-bit
+environment, check the documentation for details.
+
+
+Initializing your Database
+==
+
+Once the PostgreSQL software is installed, the first step to having a
+running database is to initialize a PostgreSQL database using the
+'initdb' command, eg:
+
+initdb -D /path/to/mydatabase
+
+Where '/path/to/mydatabase' is the directory where the database is
+going to be installed. This directory can exist, but must be empty.
+If it does not exist, 'initdb' will create it.
+
+After the database system has been initialized, PostgreSQL can be
+started by using the pg_ctl command:
+
+pg_ctl -D /path/to/mydatabase -l logfile start
+
+Once PostgreSQL is running, you can connect to it using the psql
+command-line client.  A default database called 'postgres' was created
+by 'initdb'.
+
+
+Building the PostgreSQL Documentation
+=
+
+Full documentation for PostgreSQL is available online here:
+https://www.postgresql.org/docs/current/static/index.html
+
+PostgreSQL uses DocBook to build the documentation. Therefore the
+DocBook tools must be installed. In addition, a working Java
+installation is required.
+
+To build PostgreSQL's documentation on Unix, run:
+
+./configure
+make docs
+
+The documentation, once built by 'make docs', will be available in
+various formats in the 'doc/src/sgml' directory.
+
+
+Download
+
 
 The latest version of this software may be obtained at
 https://www.postgresql.org/download/.  For more information look at our
PostgreSQL Database Management System
=

This directory contains the source code distribution of the PostgreSQL
database management system.

PostgreSQL is an advanced object-relational database management system
that supports an extended subset of the SQL standard, including
transactions, foreign keys, subqueries, triggers, user-defined types
and functions.  This distribution also contains C language bindings.

PostgreSQL has many language interfaces, many of which 

Re: Compressed TOAST Slicing

2018-11-01 Thread Tom Lane
Paul Ramsey  writes:
> On Thu, Nov 1, 2018 at 2:29 PM Stephen Frost  wrote:
>> and secondly, why we wouldn't consider
>> handling a non-zero offset.  A non-zero offset would, of course, still
>> require decompressing from the start and then just throwing away what we
>> skip over, but we're going to be doing that anyway, aren't we?  Why not
>> stop when we get to the end, at least, and save ourselves the trouble of
>> decompressing the rest and then throwing it away.

> I was worried about changing the pg_lz code too much because it scared me,
> but debugging some stuff made me read it more closely so I fear it less
> now, and doing interior slices seems not unreasonable, so I will give it a
> try.

I think Stephen was just envisioning decompressing from offset 0 up to
the end of what's needed, and then discarding any data before the start
of what's needed; at least, that's what'd occurred to me.  It sounds like
you were thinking about hacking pg_lz to not write the leading data
anywhere.  While that'd likely be a win for cases where there was leading
data to discard, I'm worried about adding any cycles to the inner loops
of the decompressor.  We don't want to pessimize every other use of pg_lz
to buy a little bit for these cases.

regards, tom lane



Re: replication_slots usability issue

2018-11-01 Thread Michael Paquier
On Thu, Nov 01, 2018 at 10:54:23AM -0700, Andres Freund wrote:
> On 2018-11-01 09:34:05 +0900, Michael Paquier wrote:
> That has absolutely nothing to do with the issue at hand though, so I
> don't think it'd have made much sense to do it at the same time. Nor do
> I think it's particularly important.

Thanks for the confirmation.

>> I don't mind doing so myself if you agree with the change, only on
>> HEAD as you seemed to disagree about changing that on back-branches.
> 
> Cool. And yes, I don't think a cosmetic log level adjustment that could
> affect people's scripts should be backpatched without need. Even if not
> particularly likely to break something.

No issues with your arguments.  I did the change this way.

>> Also, from 691d79a which you just committed:
>> +   ereport(FATAL,
>> +   (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>> +errmsg("logical replication slots \"%s\" exists, but 
>> wal_level < logical",
>> +   NameStr(cp.slotdata.name)),
>> I can see one grammar mistake here, as you refer to only one slot here.
>> The error messages should read:
>> "logical replication slot \"%s\" exists, but wal_level < logical"
>> and:
>> "physical replication slot \"%s\" exists, but wal_level < replica"
> 
> Darnit. Fixed. Thanks.

And thanks for fixing that.
--
Michael


signature.asc
Description: PGP signature


Re: PG vs macOS Mojave

2018-11-01 Thread Tom Lane
I wrote:
> Then somebody who wants to build on a different SDK version still needs
> to do "make PG_SYSROOT=/proper/path", but only if they're trying to
> build PL/Perl or related extensions.  So this second way seems uglier
> in some sense but less likely to cause problems for most people.

> Either way I guess we'd need to document it rather than just hoping
> it's invisible.

Here's a proposed doc patch for this second approach.  It'd still mostly
work for the first approach, if we drop the sentence about it only
mattering for PL/Perl and PL/Tcl.

I failed to resist the temptation to mutter something about SIP while
at it.  AFAIK that problem isn't mentioned anywhere else.

regards, tom lane

diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index 4487d0c..1c87eef 100644
*** a/doc/src/sgml/installation.sgml
--- b/doc/src/sgml/installation.sgml
*** PHSS_30849  s700_800 u2comp/be/plugin li
*** 2513,2518 
--- 2513,2559 
 

  
+   
+macOS
+ 
+
+ macOS
+ installation on
+
+ 
+
+ On recent macOS releases, it's necessary to
+ embed the sysroot path in the include switches used to
+ find some system header files.  This results in the outputs of
+ the configure script varying depending on
+ which SDK version was used during configure.
+ That shouldn't pose any problem in simple scenarios, but if you are
+ trying to do something like building an extension on a different machine
+ than the server code was built on, you may need to force use of a
+ different sysroot path.  To do that, set PG_SYSROOT,
+ for example
+ 
+ make PG_SYSROOT=/desired/path
+ 
+ To find out the appropriate path on your machine, run
+ 
+ xcodebuild -version -sdk macosx Path
+ 
+ Currently, this only affects Perl and Tcl header files, so it should
+ only matter if you're building PL/Perl, PL/Tcl, or a related extension.
+
+ 
+
+ macOS's System Integrity
+ Protection (SIP) feature breaks make check,
+ because it prevents passing the needed setting
+ of DYLD_LIBRARY_PATH down to the executables being
+ tested.  You can work around that by doing make
+ install before make check.
+ Most Postgres developers just turn off SIP, though.
+
+   
+ 

 MinGW/Native Windows
  

Re: Compressed TOAST Slicing

2018-11-01 Thread Paul Ramsey
On Thu, Nov 1, 2018 at 2:29 PM Stephen Frost  wrote:

> Greetings,
>
> * Paul Ramsey (pram...@cleverelephant.ca) wrote:
> > The attached patch adds in a code path to do a partial decompression of
> the
> > TOAST entry, when the requested slice is at the start of the object.
>
> There are two things that I wonder about in the patch - if it would be of any
> use to try and allocate on a need basis instead of just allocating the
> whole chunk up to the toast size,


I'm not sure what I was thinking when I rejected allocating the slice size
in favour of the whole uncompressed size... I cannot see why that wouldn't
work.


> and secondly, why we wouldn't consider
> handling a non-zero offset.  A non-zero offset would, of course, still
> require decompressing from the start and then just throwing away what we
> skip over, but we're going to be doing that anyway, aren't we?  Why not
> stop when we get to the end, at least, and save ourselves the trouble of
> decompressing the rest and then throwing it away.
>

I was worried about changing the pg_lz code too much because it scared me,
but debugging some stuff made me read it more closely so I fear it less
now, and doing interior slices seems not unreasonable, so I will give it a
try.

P


Re: PG vs macOS Mojave

2018-11-01 Thread Tom Lane
I wrote:
> Peter Eisentraut  writes:
>> On 01/11/2018 22:17, Tom Lane wrote:
>>> The other idea that's occurred to me is to go back to the scheme of
>>> commit 68fc227dd, where we inject the sysroot path into just the -I
>>> switches used for PL/Perl and PL/Tcl.  We could improve on that
>>> commit by injecting it symbolically similar to what I did here, ie
>>> what ends up in the configure output is

>> How does that work when building against a non-system Perl or Tcl?

> It does nothing, because configure will not find that it needs to
> inject any sysroot reference in order to find such a Perl or Tcl's
> headers.

Here's a lightly-tested patch for that approach.

regards, tom lane

diff --git a/configure b/configure
index 43ae8c8..db50751 100755
*** a/configure
--- b/configure
*** CPP
*** 731,736 
--- 731,737 
  BITCODE_CXXFLAGS
  BITCODE_CFLAGS
  CFLAGS_VECTOR
+ PG_SYSROOT
  LLVM_BINPATH
  LLVM_CXXFLAGS
  LLVM_CFLAGS
*** unset CXXFLAGS
*** 5261,5266 
--- 5262,5270 
  #
  . "$srcdir/src/template/$template" || exit
  
+ # Record PG_SYSROOT in Makefile.global, if set by user or template.
+ 
+ 
  # C[XX]FLAGS are selected so:
  # If the user specifies something in the environment, that is used.
  # else:  If the template file set something, that is used.
*** PL/Perl." "$LINENO" 5
*** 9776,9786 
fi
# On most platforms, archlibexp is also where the Perl include files live ...
perl_includespec="-I$perl_archlibexp/CORE"
!   # ... but on newer macOS versions, we must use -iwithsysroot to look
!   # under $PG_SYSROOT
if test \! -f "$perl_archlibexp/CORE/perl.h" ; then
  if test -f "$PG_SYSROOT$perl_archlibexp/CORE/perl.h" ; then
!   perl_includespec="-iwithsysroot $perl_archlibexp/CORE"
  fi
fi
  
--- 9780,9789 
fi
# On most platforms, archlibexp is also where the Perl include files live ...
perl_includespec="-I$perl_archlibexp/CORE"
!   # ... but on newer macOS versions, we must look under $PG_SYSROOT instead
if test \! -f "$perl_archlibexp/CORE/perl.h" ; then
  if test -f "$PG_SYSROOT$perl_archlibexp/CORE/perl.h" ; then
!   perl_includespec="-I$PG_SYSROOT$perl_archlibexp/CORE"
  fi
fi
  
*** eval TCL_SHARED_BUILD=\"$TCL_SHARED_BUIL
*** 18115,18120 
--- 18118,18128 
as_fn_error $? "cannot build PL/Tcl because Tcl is not a shared library
  Use --without-tcl to disable building PL/Tcl." "$LINENO" 5
  fi
+ # Some macOS versions report an include spec that uses -iwithsysroot.
+ # We don't really want to use -isysroot, so translate that if we can.
+ if test x"$PG_SYSROOT" != x; then
+ TCL_INCLUDE_SPEC="`echo "$TCL_INCLUDE_SPEC" | sed "s|-iwithsysroot */|-I$PG_SYSROOT/|"`"
+ fi
  # now that we have TCL_INCLUDE_SPEC, we can check for 
  ac_save_CPPFLAGS=$CPPFLAGS
  CPPFLAGS="$TCL_INCLUDE_SPEC $CPPFLAGS"
*** fi
*** 18127,18132 
--- 18135,18147 
  
  
  CPPFLAGS=$ac_save_CPPFLAGS
+ # If we are inserting PG_SYSROOT into TCL_INCLUDE_SPEC, do so symbolically
+ # not literally, so that it's possible to override it at build time using a
+ # command like "make ... PG_SYSROOT=path".  This has to be done after we've
+ # finished all configure checks that depend on TCL_INCLUDE_SPEC.
+ if test x"$PG_SYSROOT" != x; then
+   TCL_INCLUDE_SPEC=`echo "$TCL_INCLUDE_SPEC" | sed -e "s|-I$PG_SYSROOT/|-I\\\$(PG_SYSROOT)/|"`
+ fi
  fi
  
  # check for 
*** rm -f core conftest.err conftest.$ac_obj
*** 18176,18181 
--- 18191,18203 
  conftest$ac_exeext conftest.$ac_ext
LIBS=$pgac_save_LIBS
CPPFLAGS=$ac_save_CPPFLAGS
+   # If we are inserting PG_SYSROOT into perl_includespec, do so symbolically
+   # not literally, so that it's possible to override it at build time using a
+   # command like "make ... PG_SYSROOT=path".  This has to be done after we've
+   # finished all configure checks that depend on perl_includespec.
+   if test x"$PG_SYSROOT" != x; then
+ perl_includespec=`echo "$perl_includespec" | sed -e "s|-I$PG_SYSROOT/|-I\\\$(PG_SYSROOT)/|"`
+   fi
  fi
  
  # check for 
diff --git a/configure.in b/configure.in
index 519ecd5..b895f7d 100644
*** a/configure.in
--- b/configure.in
*** unset CXXFLAGS
*** 404,409 
--- 404,412 
  #
  . "$srcdir/src/template/$template" || exit
  
+ # Record PG_SYSROOT in Makefile.global, if set by user or template.
+ AC_SUBST(PG_SYSROOT)
+ 
  # C[XX]FLAGS are selected so:
  # If the user specifies something in the environment, that is used.
  # else:  If the template file set something, that is used.
*** PL/Perl.])
*** 1046,1056 
fi
# On most platforms, archlibexp is also where the Perl include files live ...
perl_includespec="-I$perl_archlibexp/CORE"
!   # ... but on newer macOS versions, we must use -iwithsysroot to look
!   # under $PG_SYSROOT
if test \! 

Re: CF app feature request

2018-11-01 Thread Tom Lane
Andrew Dunstan  writes:
> On 11/01/2018 05:50 PM, Magnus Hagander wrote:
>> Are you thinking basically another status that's "Withdrawn", but 
>> keeping it, or actually removing the records completely?

> I don't know enough about the app internals to comment. But it probably 
> shouldn't appear in the stats, or else should have its own category in 
> the stats.

A separate "Withdrawn" status seems like it'd cover more cases than
a "make like it never happened" action.

regards, tom lane



Re: PG vs macOS Mojave

2018-11-01 Thread Tom Lane
Peter Eisentraut  writes:
> On 01/11/2018 22:17, Tom Lane wrote:
>> The other idea that's occurred to me is to go back to the scheme of
>> commit 68fc227dd, where we inject the sysroot path into just the -I
>> switches used for PL/Perl and PL/Tcl.  We could improve on that
>> commit by injecting it symbolically similar to what I did here, ie
>> what ends up in the configure output is

> How does that work when building against a non-system Perl or Tcl?

It does nothing, because configure will not find that it needs to
inject any sysroot reference in order to find such a Perl or Tcl's
headers.

regards, tom lane



Re: PG vs macOS Mojave

2018-11-01 Thread Peter Eisentraut
On 01/11/2018 22:17, Tom Lane wrote:
> The other idea that's occurred to me is to go back to the scheme of
> commit 68fc227dd, where we inject the sysroot path into just the -I
> switches used for PL/Perl and PL/Tcl.  We could improve on that
> commit by injecting it symbolically similar to what I did here, ie
> what ends up in the configure output is

How does that work when building against a non-system Perl or Tcl?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: CF app feature request

2018-11-01 Thread Andrew Dunstan




On 11/01/2018 05:50 PM, Magnus Hagander wrote:



On Thu, Nov 1, 2018 at 2:44 PM Andrew Dunstan wrote:



Yesterday Fabien and I submitted the same item to the Commitfest (1859
and 1860). Unfortunately there doesn't seem to be any way for one of
these to be withdrawn. "Rejected" and "Returned with Feedback" seem
wrong. Ideally, there would be a way for someone who submits an item in
error to withdraw it, at which point it should be as if it had never
been submitted.

Meanwhile, what's the advice about how to deal with these?


Are you thinking basically another status that's "Withdrawn", but 
keeping it, or actually removing the records completely?






I don't know enough about the app internals to comment. But it probably 
shouldn't appear in the stats, or else should have its own category in 
the stats.


cheers

andrew

--
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Buildfarm failures for hash indexes: buffer leaks

2018-11-01 Thread Fabien COELHO




Their commit r265375 fixed the ability to compile itself, but built
PostgreSQL binaries remain broken there and thereafter.

[...]


Thanks a lot for this investigation! I can file a gcc bug report. It would
be an enormous amount of work to narrow it down to a small test case, and it
is unclear how they can act on it, but at least they would know.


Have you done so? If so, what's the bug number?


Not yet. There was no answer to my suggestion, so it did not feel that
urgent. I'll try to do it over the next week.


--
Fabien.



Re: CF app feature request

2018-11-01 Thread Magnus Hagander
On Thu, Nov 1, 2018 at 2:44 PM Andrew Dunstan <
andrew.duns...@2ndquadrant.com> wrote:

>
> Yesterday Fabien and I submitted the same item to the Commitfest (1859
> and 1860). Unfortunately there doesn't seem to be any way for one of
> these to be withdrawn. "Rejected" and "Returned with Feedback" seem
> wrong. Ideally, there would be a way for someone who submits an item in
> error to withdraw it, at which point it should be as if it had never
> been submitted.
>
> Meanwhile, what's the advice about how to deal with these?
>
>
Are you thinking basically another status that's "Withdrawn", but keeping
it, or actually removing the records completely?

//Magnus


Re: Compressed TOAST Slicing

2018-11-01 Thread Stephen Frost
Greetings,

* Paul Ramsey (pram...@cleverelephant.ca) wrote:
> The attached patch adds in a code path to do a partial decompression of the
> TOAST entry, when the requested slice is at the start of the object.

Neat!

> As usual, doing less work is faster.

Definitely.

> Interesting note to motivate a follow-on patch: the substr() function does
> attempt to slice, but the left() function does not. So, if this patch is
> accepted, next patch will be to left() to add slicing behaviour.

Makes sense to me.

There are two things that I wonder about in the patch - if it would be of any
use to try and allocate on a need basis instead of just allocating the
whole chunk up to the toast size, and secondly, why we wouldn't consider
handling a non-zero offset.  A non-zero offset would, of course, still
require decompressing from the start and then just throwing away what we
skip over, but we're going to be doing that anyway, aren't we?  Why not
stop when we get to the end, at least, and save ourselves the trouble of
decompressing the rest and then throwing it away.

> If nobody lights me on fire, I'll submit to commitfest shortly.

Sounds like a good idea to me.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: PG vs macOS Mojave

2018-11-01 Thread Tom Lane
So it seems like there are two ways we could go about this.  One is
to go back to the scheme of adding an -isysroot switch to CPPFLAGS,
where it'd have global effects.  We could make this slightly less
painful for scenarios like Jakob's if we set things up in Makefile.global
this way:

CPPFLAGS = -isysroot $(PG_SYSROOT)  
PG_SYSROOT = 
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk

and then, if you need to build on a different SDK version without
reconfiguring, you can do something like "make PG_SYSROOT=/proper/path".
I coded this up, as attached, and it seems to work but it's still not all
that friendly for such cases.

The other idea that's occurred to me is to go back to the scheme of
commit 68fc227dd, where we inject the sysroot path into just the -I
switches used for PL/Perl and PL/Tcl.  We could improve on that
commit by injecting it symbolically similar to what I did here, ie
what ends up in the configure output is

PG_SYSROOT = 
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk

perl_includespec = -I 
$(PG_SYSROOT)/System/Library/Perl/5.18/darwin-thread-multi-2level/CORE

Then somebody who wants to build on a different SDK version still needs
to do "make PG_SYSROOT=/proper/path", but only if they're trying to
build PL/Perl or related extensions.  So this second way seems uglier
in some sense but less likely to cause problems for most people.

Either way I guess we'd need to document it rather than just hoping
it's invisible.

Thoughts?

regards, tom lane

diff --git a/configure b/configure
index 43ae8c8..0686941 100755
*** a/configure
--- b/configure
*** ac_includes_default="\
*** 627,632 
--- 627,633 
  
  ac_subst_vars='LTLIBOBJS
  vpath_build
+ PG_SYSROOT
  PG_VERSION_NUM
  PROVE
  FOP
*** _ACEOF
*** 18815,18820 
--- 18816,18830 
  
  
  
+ # If we are inserting PG_SYSROOT into CPPFLAGS, do so symbolically not
+ # literally, so that it's possible to override it at build time using
+ # a command like "make ... PG_SYSROOT=path".  This has to be done after
+ # we've finished all configure checks that depend on CPPFLAGS.
+ if test x"$PG_SYSROOT" != x; then
+   CPPFLAGS=`echo "$CPPFLAGS" | sed -e "s| $PG_SYSROOT | \\\$(PG_SYSROOT) |"`
+ fi
+ 
+ 
  
  # Begin output steps
  
diff --git a/configure.in b/configure.in
index 519ecd5..7586deb 100644
*** a/configure.in
--- b/configure.in
*** $AWK '{printf "%d%04d", $1, $2}'`"]
*** 2357,2362 
--- 2357,2371 
  AC_DEFINE_UNQUOTED(PG_VERSION_NUM, $PG_VERSION_NUM, [PostgreSQL version as a number])
  AC_SUBST(PG_VERSION_NUM)
  
+ # If we are inserting PG_SYSROOT into CPPFLAGS, do so symbolically not
+ # literally, so that it's possible to override it at build time using
+ # a command like "make ... PG_SYSROOT=path".  This has to be done after
+ # we've finished all configure checks that depend on CPPFLAGS.
+ if test x"$PG_SYSROOT" != x; then
+   CPPFLAGS=`echo "$CPPFLAGS" | sed -e "s| $PG_SYSROOT | \\\$(PG_SYSROOT) |"`
+ fi
+ AC_SUBST(PG_SYSROOT)
+ 
  
  # Begin output steps
  
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index bdf394b..218c65a 100644
*** a/src/Makefile.global.in
--- b/src/Makefile.global.in
*** BITCODE_CXXFLAGS = @BITCODE_CXXFLAGS@
*** 241,246 
--- 241,247 
  
  CPP = @CPP@
  CPPFLAGS = @CPPFLAGS@
+ PG_SYSROOT = @PG_SYSROOT@
  
  override CPPFLAGS := $(ICU_CFLAGS) $(CPPFLAGS)
  
diff --git a/src/template/darwin b/src/template/darwin
index 159d8bb..c05adca 100644
*** a/src/template/darwin
--- b/src/template/darwin
***
*** 3,16 
  # Note: Darwin is the original code name for macOS, also known as OS X.
  # We still use "darwin" as the port name, partly because config.guess does.
  
! # Some configure tests require explicit knowledge of where the Xcode "sysroot"
! # is.  We try to avoid having this leak into configure's results, though.
  if test x"$PG_SYSROOT" = x"" ; then
PG_SYSROOT=`xcodebuild -version -sdk macosx Path 2>/dev/null`
  fi
  # Old xcodebuild versions may produce garbage, so validate the result.
  if test x"$PG_SYSROOT" != x"" ; then
!   if test \! -d "$PG_SYSROOT" ; then
  PG_SYSROOT=""
fi
  fi
--- 3,17 
  # Note: Darwin is the original code name for macOS, also known as OS X.
  # We still use "darwin" as the port name, partly because config.guess does.
  
! # Select where system include files should be sought.
  if test x"$PG_SYSROOT" = x"" ; then
PG_SYSROOT=`xcodebuild -version -sdk macosx Path 2>/dev/null`
  fi
  # Old xcodebuild versions may produce garbage, so validate the result.
  if test x"$PG_SYSROOT" != x"" ; then
!   if test -d "$PG_SYSROOT" ; then
! CPPFLAGS="-isysroot $PG_SYSROOT $CPPFLAGS"
!   else
  PG_SYSROOT=""
fi
  fi


Re: Hash Joins vs. Bloom Filters / take 2

2018-11-01 Thread Tomas Vondra
On 11/01/2018 10:06 PM, Thomas Munro wrote:
> On Fri, Nov 2, 2018 at 9:23 AM Jim Finnerty  wrote:
>> I'm very interested in this patch, and particularly in possible
>> extensions to push the Bloom filter down on the probe side of the join.  I
>> made a few small edits to the patch to enable it to compile on PG11, and can
>> send it to you if you're interested.
> 
> Hi Jim,
> 
> Would you compute the hash for the outer tuples in the scan, and then
> again in the Hash Join when probing, or would you want to (somehow)
> attach the hash to emitted tuples for later reuse by the higher node?
> Someone pointed out to me off-list that a popular RDBMS emanating from
> the bicycle capital of the North-West pushes down Bloom filters to
> scans, but only when the key is a non-nullable integer; I wonder if
> that is because they hash in both places, but consider that OK only
> when it's really cheap to do so.  (Along the same lines, if we could
> attach extra data to tuples, I wonder if it would make sense to
> transmit sort support information to higher nodes, so that (for
> example) GatherMerge could use it to avoid full key comparison when
> dealing with subplans that already did a sort and computed integers
> for fast inequality checks.)
> 
>> It is currently in the list of patches for the current commitfest, but
>> based on your previous post I'm not sure if you're planning to get back to
>> this patch just now.  If you plan to resume work on it, I'll sign up as a
>> reviewer.
> 
> I'm also signed up to review.
> 

I haven't really planned to work on this anytime soon, unfortunately,
which is why I proposed to mark it as RwF at the end of the last CF. I
already have a couple other patches there, and (more importantly) I
don't have a very clear idea how to implement the filter pushdown.

That being said I still think it'd be an interesting improvement, and if
someone wants to take over I'm available as a reviewer / tester ...


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Hash Joins vs. Bloom Filters / take 2

2018-11-01 Thread Thomas Munro
On Fri, Nov 2, 2018 at 9:23 AM Jim Finnerty  wrote:
> I'm very interested in this patch, and particularly in possible
> extensions to push the Bloom filter down on the probe side of the join.  I
> made a few small edits to the patch to enable it to compile on PG11, and can
> send it to you if you're interested.

Hi Jim,

Would you compute the hash for the outer tuples in the scan, and then
again in the Hash Join when probing, or would you want to (somehow)
attach the hash to emitted tuples for later reuse by the higher node?
Someone pointed out to me off-list that a popular RDBMS emanating from
the bicycle capital of the North-West pushes down Bloom filters to
scans, but only when the key is a non-nullable integer; I wonder if
that is because they hash in both places, but consider that OK only
when it's really cheap to do so.  (Along the same lines, if we could
attach extra data to tuples, I wonder if it would make sense to
transmit sort support information to higher nodes, so that (for
example) GatherMerge could use it to avoid full key comparison when
dealing with subplans that already did a sort and computed integers
for fast inequality checks.)
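
To make the per-tuple cost concrete: a Bloom probe is just a couple of cheap
hashes plus a few bit tests.  A toy standalone filter (nothing from the patch
or from the tree; sizes and constants here are picked arbitrarily) looks
roughly like this:

/*
 * Toy standalone Bloom filter, for illustration only.  It exists purely to
 * show that a probe is two multiplicative hashes and k bit tests.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BLOOM_BITS (1u << 20)       /* 1 Mbit filter */
#define BLOOM_K    2                /* probes per key */

static uint8_t bloom[BLOOM_BITS / 8];

static uint64_t
mix64(uint64_t x)
{
    /* splitmix64 finalizer: cheap and reasonably well distributed */
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

static void
bloom_add(uint64_t key)
{
    uint64_t h1 = mix64(key);
    uint64_t h2 = mix64(key ^ 0x9e3779b97f4a7c15ULL);

    for (int i = 0; i < BLOOM_K; i++)
    {
        uint64_t bit = (h1 + (uint64_t) i * h2) % BLOOM_BITS;

        bloom[bit / 8] |= (uint8_t) (1u << (bit % 8));
    }
}

static bool
bloom_might_contain(uint64_t key)
{
    uint64_t h1 = mix64(key);
    uint64_t h2 = mix64(key ^ 0x9e3779b97f4a7c15ULL);

    for (int i = 0; i < BLOOM_K; i++)
    {
        uint64_t bit = (h1 + (uint64_t) i * h2) % BLOOM_BITS;

        if ((bloom[bit / 8] & (1u << (bit % 8))) == 0)
            return false;           /* definitely not on the build side */
    }
    return true;                    /* maybe present: go probe the hash table */
}

int
main(void)
{
    for (uint64_t k = 0; k < 1000; k += 2)  /* pretend build side: even keys */
        bloom_add(k);

    printf("42 -> %s\n", bloom_might_contain(42) ? "maybe" : "no");
    printf("43 -> %s\n", bloom_might_contain(43) ? "maybe" : "no");
    return 0;
}

So the question is really where the key gets hashed and how the hash travels
with the tuple, not whether the probe itself is expensive.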

> It is currently in the list of patches for the current commitfest, but
> based on your previous post I'm not sure if you're planning to get back to
> this patch just now.  If you plan to resume work on it, I'll sign up as a
> reviewer.

I'm also signed up to review.

-- 
Thomas Munro
http://www.enterprisedb.com



Compressed TOAST Slicing

2018-11-01 Thread Paul Ramsey
Currently, PG_DETOAST_DATUM_SLICE when run on a compressed TOAST entry will
first decompress the whole object, then extract the relevant slice.

When the desired slice is at or near the front of the object, this is
obviously non-optimal.

The attached patch adds in a code path to do a partial decompression of the
TOAST entry, when the requested slice is at the start of the object.
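
To show the shape of the change for anyone skimming, here is a toy sketch
using an invented LITERAL/COPY format (this is not the pg_lz code and not the
attached patch): the decompression loop simply stops once the requested slice
is full instead of producing the whole datum.

/*
 * Toy sketch only; the compressed format here is made up for the example.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CMD_LITERAL 0x00    /* <len> raw bytes follow */
#define CMD_COPY    0x01    /* <offset> <len>: copy from earlier output */

/* Decompress at most "limit" output bytes; returns the number produced. */
static size_t
toy_decompress_slice(const uint8_t *src, size_t srclen,
                     uint8_t *dst, size_t limit)
{
    size_t  sp = 0,
            dp = 0;

    while (sp < srclen && dp < limit)
    {
        uint8_t cmd = src[sp++];

        if (cmd == CMD_LITERAL)
        {
            uint8_t len = src[sp++];
            size_t  n = (dp + len > limit) ? (limit - dp) : len;

            memcpy(dst + dp, src + sp, n);
            dp += n;
            sp += len;          /* skip the rest of the literal run */
        }
        else                    /* CMD_COPY */
        {
            uint8_t off = src[sp++];
            uint8_t len = src[sp++];
            size_t  n = (dp + len > limit) ? (limit - dp) : len;

            for (size_t i = 0; i < n; i++)  /* byte at a time: may overlap */
            {
                dst[dp] = dst[dp - off];
                dp++;
            }
        }
    }
    return dp;
}

int
main(void)
{
    /* "xyz123" once, then copy 18 bytes from 6 back: "xyz123" repeated */
    const uint8_t compressed[] = {
        CMD_LITERAL, 6, 'x', 'y', 'z', '1', '2', '3',
        CMD_COPY, 6, 18
    };
    uint8_t out[64];
    size_t  n = toy_decompress_slice(compressed, sizeof(compressed), out, 10);

    printf("%zu bytes: %.*s\n", n, (int) n, (char *) out);  /* xyz123xyz1 */
    return 0;
}

The attached patch does the analogous thing against the real pg_lz format,
bounded by the length of the requested slice.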

For an example of the improvement possible, this trivial example:

create table slicingtest (
  id serial primary key,
  a text
);

insert into slicingtest (a) select repeat('xyz123', 1) as a from
generate_series(1,1);
\timing
select sum(length(substr(a, 0, 20))) from slicingtest;

On master, in the current state on my wee laptop, I get

Time: 1426.737 ms (00:01.427)

With the patch, on my wee laptop, I get

Time: 46.886 ms

As usual, doing less work is faster.

Interesting note to motivate a follow-on patch: the substr() function does
attempt to slice, but the left() function does not. So, if this patch is
accepted, next patch will be to left() to add slicing behaviour.

If nobody lights me on fire, I'll submit to commitfest shortly.

P.


compressed-datum-slicing-20190101a.patch
Description: Binary data


Re: Doubts about pushing LIMIT to MergeAppendPath

2018-11-01 Thread Tomas Vondra



On 11/01/2018 08:51 PM, Antonin Houska wrote:
> Tomas Vondra  wrote:
> 
>> On 11/01/2018 12:48 PM, Antonin Houska wrote:
>>> Review of [1] made me think of this optimization, currently used only in
>>> create_merge_append_path():
>>>
>>> /*
>>>  * Apply query-wide LIMIT if known and path is for sole base relation.
>>>  * (Handling this at this low level is a bit klugy.)
>>>  */
>>> if (bms_equal(rel->relids, root->all_baserels))
>>> pathnode->limit_tuples = root->limit_tuples;
>>> else
>>> pathnode->limit_tuples = -1.0;
>>>
>>> Currently it's not a problem because the output of MergeAppend plan is not
>>> likely to be re-sorted, but I don't think data correctness should depend on
>>> cost evaluation. Instead, -1 should be set here if there's any chance that 
>>> the
>>> output will be sorted again.
>>>
>>
>> So you're saying we might end up with a plan like this:
>>
>> Limit
>> -> Sort
>> -> MergeAppend
>>-> SeqScan on t
>>
>> In which case we'd pass the wrong limit_tuples to the MergeAppend?
>>
>> Hmmm, that would depend on us building MergeAppend node that does not
>> match the expected pathkeys, and pick it instead of plain Append node.
>> I'm not sure that's actually possible, but maybe it is ...
> 
> Yes, if there should be Sort node above MergeAppend, then it'll be probably
> cheaper to use Append. So the problem should not happen in practice, but in
> theory it seems to be possible.
> 

Actually no - see my other response. We don't build child paths with
pathkeys that are not useful (do not match the expected ordering), and
we only build MergeAppend for pathkeys from child paths.

So I believe we currently won't build a MergeAppend that would then
require additional Sort node, irrespectedly of the costing.

>>
>>> I tried to reproduce the issue by applying the "Incremental sort" [2] patch
>>> and running the following example:
>>>
>>> CREATE TABLE t(i int, j int);
>>> CREATE TABLE t1() INHERITS (t);
>>> CREATE INDEX ON t1(i, j);
>>> INSERT INTO t1(i, j) VALUES (1, 0), (1, 1);
>>> CREATE TABLE t2() INHERITS (t);
>>> CREATE INDEX ON t2(i, j);
>>> INSERT INTO t2(i, j) VALUES (1, 0), (1, 1);
>>>
>>> ANALYZE;
>>>
>>> SELECT * FROM t ORDER BY i, j DESC LIMIT 1;
>>>
>>
>> So, what query plan did this use?
> 
>  Limit
>->  Incremental Sort
>  Sort Key: t.i, t.j DESC
>  Presorted Key: t.i
>  ->  Merge Append
>Sort Key: t.i
>->  Sort
>  Sort Key: t.i
>  ->  Seq Scan on t
>->  Index Only Scan using t1_i_j_idx on t1
>->  Index Only Scan using t2_i_j_idx on t2
> 

Interesting. That might be broken, not sure if the LIMIT optimization
triggered or not.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Remove obsolete pg_attrdef.adsrc column

2018-11-01 Thread Peter Eisentraut
On 27/10/2018 23:19, Daniel Gustafsson wrote:
>> On 27 Oct 2018, at 12:57, Peter Eisentraut 
>>  wrote:
>>
>> On 23/10/2018 19:48, Daniel Gustafsson wrote:
 On 23 Oct 2018, at 15:17, Peter Eisentraut 
  wrote:

 I propose the attached patch to remove the long-unused catalog column
 pg_attrdef.adsrc.
>>>
>>> +1, I ran into a bug in an app as recently as today where adsrc was used
>>> instead of pg_get_expr().
>>>
>>> Patch looks good.  I probably would’ve opted for mentioning how to get a 
>>> human
>>> readable version on the page, along the lines of the attached version,
>>
>> Agreed.  I have integrated your suggestion.
>>
>> Also, let's go nuts and remove pg_constraint.consrc as well.
> 
> No objections from me.
> 
>> Updated patches attached.
> 
> +1, applies and works as intended.

Committed, thanks.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Doubts about pushing LIMIT to MergeAppendPath

2018-11-01 Thread Antonin Houska
Tomas Vondra  wrote:

> On 11/01/2018 12:48 PM, Antonin Houska wrote:
> > Review of [1] made me think of this optimization, currently used only in
> > create_merge_append_path():
> > 
> > /*
> >  * Apply query-wide LIMIT if known and path is for sole base relation.
> >  * (Handling this at this low level is a bit klugy.)
> >  */
> > if (bms_equal(rel->relids, root->all_baserels))
> > pathnode->limit_tuples = root->limit_tuples;
> > else
> > pathnode->limit_tuples = -1.0;
> > 
> > Currently it's not a problem because the output of MergeAppend plan is not
> > likely to be re-sorted, but I don't think data correctness should depend on
> > cost evaluation. Instead, -1 should be set here if there's any chance that 
> > the
> > output will be sorted again.
> > 
> 
> So you're saying we might end up with a plan like this:
> 
> Limit
> -> Sort
> -> MergeAppend
>-> SeqScan on t
> 
> In which case we'd pass the wrong limit_tuples to the MergeAppend?
> 
> Hmmm, that would depend on us building MergeAppend node that does not
> match the expected pathkeys, and pick it instead of plain Append node.
> I'm not sure that's actually possible, but maybe it is ...

Yes, if there should be Sort node above MergeAppend, then it'll be probably
cheaper to use Append. So the problem should not happen in practice, but in
theory it seems to be possible.

> 
> > I tried to reproduce the issue by applying the "Incremental sort" [2] patch
> > and running the following example:
> > 
> > CREATE TABLE t(i int, j int);
> > CREATE TABLE t1() INHERITS (t);
> > CREATE INDEX ON t1(i, j);
> > INSERT INTO t1(i, j) VALUES (1, 0), (1, 1);
> > CREATE TABLE t2() INHERITS (t);
> > CREATE INDEX ON t2(i, j);
> > INSERT INTO t2(i, j) VALUES (1, 0), (1, 1);
> > 
> > ANALYZE;
> > 
> > SELECT * FROM t ORDER BY i, j DESC LIMIT 1;
> > 
> 
> So, what query plan did this use?

 Limit
   ->  Incremental Sort
 Sort Key: t.i, t.j DESC
 Presorted Key: t.i
 ->  Merge Append
   Sort Key: t.i
   ->  Sort
 Sort Key: t.i
 ->  Seq Scan on t
   ->  Index Only Scan using t1_i_j_idx on t1
   ->  Index Only Scan using t2_i_j_idx on t2

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com



Re: Doubts about pushing LIMIT to MergeAppendPath

2018-11-01 Thread Tomas Vondra



On 11/01/2018 08:11 PM, Tomas Vondra wrote:
> On 11/01/2018 12:48 PM, Antonin Houska wrote:
>> Review of [1] made me think of this optimization, currently used only in
>> create_merge_append_path():
>>
>>  /*
>>   * Apply query-wide LIMIT if known and path is for sole base relation.
>>   * (Handling this at this low level is a bit klugy.)
>>   */
>>  if (bms_equal(rel->relids, root->all_baserels))
>>  pathnode->limit_tuples = root->limit_tuples;
>>  else
>>  pathnode->limit_tuples = -1.0;
>>
>> Currently it's not a problem because the output of MergeAppend plan is not
>> likely to be re-sorted, but I don't think data correctness should depend on
>> cost evaluation. Instead, -1 should be set here if there's any chance that 
>> the
>> output will be sorted again.
>>
> 
> So you're saying we might end up with a plan like this:
> 
> Limit
> -> Sort
> -> MergeAppend
>-> SeqScan on t
> 
> In which case we'd pass the wrong limit_tuples to the MergeAppend?
> 
> Hmmm, that would depend on us building MergeAppend node that does not
> match the expected pathkeys, and pick it instead of plain Append node.
> I'm not sure that's actually possible, but maybe it is ...
> 

OK, so the reason is that when building child paths, we don't keep the
pathkeys unless it matches the "interesting" pathkeys.

So for example we may have an IndexPath, but with pathkeys=NIL if the
index does not match the ORDER BY we need. So we don't even build the
MergeAppend paths, as generate_mergeappend_paths iterates over child
pathkeys (and we don't have any).

We might have two child paths, one with pathkeys (matching the ORDER BY)
and one without pathkeys. But at that point the optimization does not
apply, because it checks for "single base relation".

At least that's my understanding of add_paths_to_append_rel().

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Parallel threads in query

2018-11-01 Thread Tomas Vondra
On 11/01/2018 08:03 PM, Andres Freund wrote:
> On 2018-11-01 19:57:17 +0100, Tomas Vondra wrote:
 I think that very much depends on how expensive the tasks handled by the
 threads are. It may still be cheaper than a reasonable IPC, and if you
 don't create/destroy threads, that also saves quite a bit of time.
>>>
>>> I'm not following. How can you have a pool *and* threads? Those seem to
>>> be contradictory in PG's architecture? You need full blown IPC with your
>>> proposal afaict?
>>>
>>
>> My suggestion was to create a bgworker, which would then internally
>> allocate and manage a pool of threads. It could then open some sort of
>> IPC (say, as dumb as unix socket). The backends could could then send
>> requests to it, and it would respond to them. Not sure why/how would
>> this contradict PG's architecture?
> 
> Because you said "faster than reasonable IPC" - which to me implies that
> you don't do full blown IPC. Which using threads in a bgworker is very
> strongly implying. What you're proposing strongly implies multiple
> context switches just to process a few results. Even before, but
> especially after, spectre that's an expensive proposition.
> 

Gah! I meant to write "faster with reasonable IPC" - i.e. faster/cheaper
than a solution that would create threads ad-hoc.

My assumption is that the tasks are fairly large, and may take quite a
bit of time to process (say, a couple of seconds?). In which case the
extra context switches are not a major issue. But maybe I'm wrong.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Parallel threads in query

2018-11-01 Thread Andres Freund
Hi,

On 2018-11-01 09:17:56 -1000, Darafei "Komяpa" Praliaskouski wrote:
> So, do I understand correctly that I need to start a parallel worker that
> does nothing for each thread I launch to consume the parallel worker limit?

No, I don't think that'd be reasonable. I think what we're saying is
that there's no way to reasonably use the parallel worker limit as the
limitation for what you're trying to do. You need custom infrastructure.

Greetings,

Andres Freund



Re: Parallel threads in query

2018-11-01 Thread Komяpa
>
>
> Because you said "faster than reasonable IPC" - which to me implies that
> you don't do full blown IPC. Which using threads in a bgworker is very
> strongly implying. What you're proposing strongly implies multiple
> context switches just to process a few results. Even before, but
> especially after, spectre that's an expensive proposition.
>
>
To have some idea of what it could be:

a)
PostGIS has an ST_ClusterKMeans window function. It collects all geometries
passed to it into memory, re-packs them into a more compact buffer and starts
a loop that goes over it several (let's say 10..100) times. Then it spits out
all the assigned cluster numbers for each of the input rows.

It's all great when you need to calculate KMeans of 200-5 rows, but for
a million input rows even a hundred passes on a single core are painful.

b)
PostGIS has an ST_Subdivide function. It takes a single row of geometry
(usually super-large, like a continent or the whole of Russia) and
splits it into many rows with simpler shapes, by performing a
horizontal or vertical split recursively. Since it's a tree traversal, it
can be parallelized efficiently, with one task following the right
subpart of the geometry and the other following the left part.

Both seem to be a standard thing for OpenMP, which has compiler support in
GCC, clang, and MSVC. For an overview of how it works, have a look here:
https://web.archive.org/web/20180828151435/https://bisqwit.iki.fi/story/howto/openmp/
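
To make that concrete, here is a minimal sketch (not PostGIS code; the loop
body is a stand-in for one clustering pass, and nthreads is assumed to come
from whatever budget Postgres lets us reserve):

/*
 * Minimal OpenMP sketch for illustration only.  Build with -fopenmp;
 * without it the pragma is ignored and the loop just runs serially.
 */
#include <math.h>
#include <stdio.h>

#define NROWS 1000000

int
main(void)
{
    static double x[NROWS];
    static int  cluster[NROWS];
    int     nthreads = 4;   /* pretend this came from the reservation */

    for (int i = 0; i < NROWS; i++)
        x[i] = (double) i;

    /* iterations are independent, so the loop parallelizes trivially */
    #pragma omp parallel for num_threads(nthreads) schedule(static)
    for (int i = 0; i < NROWS; i++)
        cluster[i] = (int) floor(x[i] / 1000.0) % 8;    /* fake "assignment" step */

    printf("row 0 -> %d, row %d -> %d\n",
           cluster[0], NROWS - 1, cluster[NROWS - 1]);
    return 0;
}

That is really all there is on the OpenMP side, one pragma around the loop,
provided we know how many threads we are allowed to use.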


So, do I understand correctly that I need to start a parallel worker that
does nothing for each thread I launch to consume the parallel worker limit?
-- 
Darafei Praliaskouski
Support me: http://patreon.com/komzpa


Re: Doubts about pushing LIMIT to MergeAppendPath

2018-11-01 Thread Tomas Vondra
On 11/01/2018 12:48 PM, Antonin Houska wrote:
> Review of [1] made me think of this optimization, currently used only in
> create_merge_append_path():
> 
>   /*
>* Apply query-wide LIMIT if known and path is for sole base relation.
>* (Handling this at this low level is a bit klugy.)
>*/
>   if (bms_equal(rel->relids, root->all_baserels))
>   pathnode->limit_tuples = root->limit_tuples;
>   else
>   pathnode->limit_tuples = -1.0;
> 
> Currently it's not a problem because the output of MergeAppend plan is not
> likely to be re-sorted, but I don't think data correctness should depend on
> cost evaluation. Instead, -1 should be set here if there's any chance that the
> output will be sorted again.
> 

So you're saying we might end up with a plan like this:

Limit
-> Sort
-> MergeAppend
   -> SeqScan on t

In which case we'd pass the wrong limit_tuples to the MergeAppend?

Hmmm, that would depend on us building MergeAppend node that does not
match the expected pathkeys, and pick it instead of plain Append node.
I'm not sure that's actually possible, but maybe it is ...

> I tried to reproduce the issue by applying the "Incremental sort" [2] patch
> and running the following example:
> 
> CREATE TABLE t(i int, j int);
> CREATE TABLE t1() INHERITS (t);
> CREATE INDEX ON t1(i, j);
> INSERT INTO t1(i, j) VALUES (1, 0), (1, 1);
> CREATE TABLE t2() INHERITS (t);
> CREATE INDEX ON t2(i, j);
> INSERT INTO t2(i, j) VALUES (1, 0), (1, 1);
> 
> ANALYZE;
> 
> SELECT * FROM t ORDER BY i, j DESC LIMIT 1;
> 

So, what query plan did this use?


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Parallel threads in query

2018-11-01 Thread Andres Freund
On 2018-11-01 19:57:17 +0100, Tomas Vondra wrote:
> >> I think that very much depends on how expensive the tasks handled by the
> >> threads are. It may still be cheaper than a reasonable IPC, and if you
> >> don't create/destroy threads, that also saves quite a bit of time.
> > 
> > I'm not following. How can you have a pool *and* threads? Those seem to
> > be contradictory in PG's architecture? You need full blown IPC with your
> > proposal afaict?
> > 
> 
> My suggestion was to create a bgworker, which would then internally
> allocate and manage a pool of threads. It could then open some sort of
> IPC (say, as dumb as a unix socket). The backends could then send
> requests to it, and it would respond to them. Not sure why/how this
> would contradict PG's architecture?

Because you said "faster than reasonable IPC" - which to me implies that
you don't do full blown IPC. Which using threads in a bgworker is very
strongly implying. What you're proposing strongly implies multiple
context switches just to process a few results. Even before, but
especially after, spectre that's an expensive proposition.

Greetings,

Andres Freund



Re: Parallel threads in query

2018-11-01 Thread Tomas Vondra
On 11/01/2018 07:50 PM, Andres Freund wrote:
> Hi,
> 
> On 2018-11-01 19:44:54 +0100, Tomas Vondra wrote:
>> On 11/01/2018 07:40 PM, Andres Freund wrote:
>>> On 2018-11-01 19:33:39 +0100, Tomas Vondra wrote:
 In theory, simulating such global limit should be possible using a bit
 of shared memory for the current total, per-process counter and probably
 some simple abort handling (say, just like contrib/openssl does using
 ResourceOwner).
>>>
>>> Right.  I don't think you even need something resowner like, given that
>>> anything using threads better make it absolutely absolutely impossible
>>> that an error can escape.
>>>
>>
>> True. Still, I wonder if the process can die in a way that would fail to
>> update the counter.
> 
> You'd better make that case a panic restart.
> 
> 
 A better solution might be to start a bgworker managing a connection
 pool and forward the requests to it using IPC (and enforce the thread
 count limit there).
>>>
>>> That doesn't really seem feasible for cases like this - after all, you'd
>>> only use threads to work on individual rows if you wanted to parallelize
>>> relatively granular per-row work or such. Adding cross-process IPC seems
>>> like it'd make that perform badly.
>>>
>>
>> I think that very much depends on how expensive the tasks handled by the
>> threads are. It may still be cheaper than a reasonable IPC, and if you
>> don't create/destroy threads, that also saves quite a bit of time.
> 
> I'm not following. How can you have a pool *and* threads? Those seem to
> be contradictory in PG's architecture? You need full blown IPC with your
> proposal afaict?
> 

My suggestion was to create a bgworker, which would then internally
allocate and manage a pool of threads. It could then open some sort of
IPC (say, as dumb as a unix socket). The backends could then send
requests to it, and it would respond to them. Not sure why/how this
would contradict PG's architecture?

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Parallel threads in query

2018-11-01 Thread Andres Freund
Hi,

On 2018-11-01 19:44:54 +0100, Tomas Vondra wrote:
> On 11/01/2018 07:40 PM, Andres Freund wrote:
> > On 2018-11-01 19:33:39 +0100, Tomas Vondra wrote:
> >> In theory, simulating such global limit should be possible using a bit
> >> of shared memory for the current total, per-process counter and probably
> >> some simple abort handling (say, just like contrib/openssl does using
> >> ResourceOwner).
> > 
> > Right.  I don't think you even need something resowner like, given that
> > anything using threads better make it absolutely absolutely impossible
> > that an error can escape.
> > 
> 
> True. Still, I wonder if the process can die in a way that would fail to
> update the counter.

You'd better make that case a panic restart.


> >> A better solution might be to start a bgworker managing a connection
> >> pool and forward the requests to it using IPC (and enforce the thread
> >> count limit there).
> > 
> > That doesn't really seem feasible for cases like this - after all, you'd
> > only use threads to work on individual rows if you wanted to parallelize
> > relatively granular per-row work or such. Adding cross-process IPC seems
> > like it'd make that perform badly.
> > 
> 
> I think that very much depends on how expensive the tasks handled by the
> threads are. It may still be cheaper than a reasonable IPC, and if you
> don't create/destroy threads, that also saves quite a bit of time.

I'm not following. How can you have a pool *and* threads? Those seem to
be contradictory in PG's architecture? You need full blown IPC with your
proposal afaict?

Greetings,

Andres Freund



Re: Parallel threads in query

2018-11-01 Thread Tomas Vondra
On 11/01/2018 07:43 PM, Darafei "Komяpa" Praliaskouski wrote:
> In theory, simulating such global limit should be possible using a bit
> of shared memory for the current total, per-process counter and probably
> some simple abort handling (say, just like contrib/openssl does using
> ResourceOwner).
> 
> 
> I would expect that this limit is already available, and it's the parallel
> worker limit. Basically, when I start a new thread I would like to somehow
> consume a part of the parallel worker limit - a thread is a kind of parallel
> worker, from the user's perspective. If I have 4 cores and Postgres has
> already started 4 parallel workers, I don't really want to start 4 threads
> for each of them, or 4 for one of them and 1 for each of the rest, which is
> what could happen if I manage that separately from the parallel worker limit.
> 

Well, PostgreSQL does that, but only for the process-based parallelism.
It has no idea about threads, so it can't work out of the box. Also, the
max_worker_processes limit determines various shared memory we need to
manage those processes, so it's really not about threads.

If you need something like that for threads, feel free to do that, but
I'd strongly suggest using a separate counter (perhaps using a m_w_p as
an initial value).

> IPC and co - that's another question and out of scope for this one.
> Since OpenMP allows to write multithreaded code by just adding more
> #pragma around loops, I don't want to reinvent that part of infrastructure.

Maybe. I don't know OpenMP that well, so I can't really say whether that's
a good idea or not.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Parallel threads in query

2018-11-01 Thread Tomas Vondra
On 11/01/2018 07:40 PM, Andres Freund wrote:
> Hi,
> 
> On 2018-11-01 19:33:39 +0100, Tomas Vondra wrote:
>> In theory, simulating such global limit should be possible using a bit
>> of shared memory for the current total, per-process counter and probably
>> some simple abort handling (say, just like contrib/openssl does using
>> ResourceOwner).
> 
> Right.  I don't think you even need something resowner like, given that
> anything using threads better make it absolutely absolutely impossible
> that an error can escape.
> 

True. Still, I wonder if the process can die in a way that would fail to
update the counter.

> 
>> A better solution might be to start a bgworker managing a connection
>> pool and forward the requests to it using IPC (and enforce the thread
>> count limit there).
> 
> That doesn't really seem feasible for cases like this - after all, you'd
> only use threads to work on individual rows if you wanted to parallelize
> relatively granular per-row work or such. Adding cross-process IPC seems
> like it'd make that perform badly.
> 

I think that very much depends on how expensive the tasks handled by the
threads are. It may still be cheaper than a reasonable IPC, and if you
don't create/destroy threads, that also saves quite a bit of time.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Parallel threads in query

2018-11-01 Thread Komяpa
>
> In theory, simulating such global limit should be possible using a bit
> of shared memory for the current total, per-process counter and probably
> some simple abort handling (say, just like contrib/openssl does using
> ResourceOwner).
>

I would expect that this limit is already available, and it's the parallel
worker limit. Basically, when I start a new thread I would like to somehow
consume a part of the parallel worker limit - a thread is a kind of parallel
worker, from the user's perspective. If I have 4 cores and Postgres has
already started 4 parallel workers, I don't really want to start 4 threads
for each of them, or 4 for one of them and 1 for each of the rest, which is
what could happen if I manage that separately from the parallel worker limit.

IPC and co - that's another question and out of scope for this one. Since
OpenMP allows to write multithreaded code by just adding more #pragma
around loops, I don't want to reinvent that part of infrastructure.
-- 
Darafei Praliaskouski
Support me: http://patreon.com/komzpa


Re: Parallel threads in query

2018-11-01 Thread Andres Freund
Hi,

On 2018-11-01 19:33:39 +0100, Tomas Vondra wrote:
> In theory, simulating such global limit should be possible using a bit
> of shared memory for the current total, per-process counter and probably
> some simple abort handling (say, just like contrib/openssl does using
> ResourceOwner).

Right.  I don't think you even need something resowner like, given that
anything using threads better make it absolutely absolutely impossible
that an error can escape.


> A better solution might be to start a bgworker managing a connection
> pool and forward the requests to it using IPC (and enforce the thread
> count limit there).

That doesn't really seem feasible for cases like this - after all, you'd
only use threads to work on individual rows if you wanted to parallelize
relatively granular per-row work or such. Adding cross-process IPC seems
like it'd make that perform badly.

Greetings,

Andres Freund



Re: Parallel threads in query

2018-11-01 Thread Tomas Vondra



On 11/01/2018 06:15 PM, Andres Freund wrote:
> On 2018-11-01 10:10:33 -0700, Paul Ramsey wrote:
>> On Wed, Oct 31, 2018 at 2:11 PM Tom Lane  wrote:
>>
>>> Darafei "Komяpa" Praliaskouski writes:
 Question is, what's the best policy to allocate cores so we can play nice
 with rest of postgres?
>>>
>>
>>
>>> There is not, because we do not use or support multiple threads inside
>>> a Postgres backend, and have no intention of doing so any time soon.
>>>
>>
>> As a practical matter though, if we're multi-threading  a heavy PostGIS
>> function, presumably simply grabbing *every* core is not a recommended or
>> friendly practice. My finger-in-the-wind guess would be that the value
>> of max_parallel_workers_per_gather would be the most reasonable value to
>> use to limit the number of cores a parallel PostGIS function should use.
>> Does that make sense?
> 
> I'm not sure that's a good approximation.  Postgres' infrastructure
> prevents every query from using max_parallel_workers_per_gather
> processes due to the global max_worker_processes limit.  I think you
> probably would want something very very roughly approximating a global
> limit - otherwise you'll either need to set the per-process limit way
> too low, or overwhelm machines with context switches.
> 

Yeah. Without a global limit it would be fairly trivial to create way
too many threads - say when a query gets parallelized, and each worker
creates a bunch of private threads. And then a bunch of such queries
executed concurrently, and it's getting bad pretty fast.

In theory, simulating such global limit should be possible using a bit
of shared memory for the current total, per-process counter and probably
some simple abort handling (say, just like contrib/openssl does using
ResourceOwner).
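
To sketch what I mean (all names here are invented; the GUC, the
RequestAddinShmemSpace() call and proper error cleanup are omitted):

/*
 * Hedged sketch only: a cluster-wide "extension thread" budget kept in
 * shared memory, reserved and released by each backend.
 */
#include "postgres.h"

#include "port/atomics.h"
#include "storage/shmem.h"

typedef struct ExtThreadState
{
    pg_atomic_uint32 in_use;    /* threads currently running, cluster-wide */
} ExtThreadState;

static ExtThreadState *ext_thread_state = NULL;
static int  ext_max_threads = 8;    /* would be a GUC in real life */

/* Call from shmem_startup_hook (holding AddinShmemInitLock). */
void
ext_thread_shmem_startup(void)
{
    bool        found;

    ext_thread_state = (ExtThreadState *)
        ShmemInitStruct("ext_thread_state", sizeof(ExtThreadState), &found);
    if (!found)
        pg_atomic_init_u32(&ext_thread_state->in_use, 0);
}

/* Try to reserve nwanted thread slots; returns how many were granted. */
int
ext_thread_reserve(int nwanted)
{
    uint32      old = pg_atomic_fetch_add_u32(&ext_thread_state->in_use, nwanted);

    if (old >= (uint32) ext_max_threads)
    {
        /* already over budget: give everything back, run single-threaded */
        pg_atomic_fetch_sub_u32(&ext_thread_state->in_use, nwanted);
        return 0;
    }
    if (old + nwanted > (uint32) ext_max_threads)
    {
        int         surplus = (int) (old + nwanted - ext_max_threads);

        pg_atomic_fetch_sub_u32(&ext_thread_state->in_use, surplus);
        return nwanted - surplus;
    }
    return nwanted;
}

/* Must run even on error, e.g. via PG_ENSURE_ERROR_CLEANUP or PG_TRY. */
void
ext_thread_release(int ngranted)
{
    pg_atomic_fetch_sub_u32(&ext_thread_state->in_use, ngranted);
}

A backend would reserve slots before spawning its threads and release them
when done (even on error), so the total across the cluster stays under the
cap.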

A better solution might be to start a bgworker managing a connection
pool and forward the requests to it using IPC (and enforce the thread
count limit there).

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Buildfarm failures for hash indexes: buffer leaks

2018-11-01 Thread Andres Freund
On 2018-10-27 08:18:12 +0200, Fabien COELHO wrote:
> 
> Hello Jeff,
> 
> > > I suspect the easiest thing to narrow it down would be to bisect the
> > > problem in gcc :(
> > 
> > Their commit r265241 is what broke the PostgreSQL build.  It also broke the
> > compiler itself--at that commit it was no longer possible to build itself.
> > I had to --disable-bootstrap in order to get a r265241 compiler to test
> > PostgreSQL on.
> 
> It seems they have done a API change around some kind of "range" analysis,
> which must have been incomplete.
> 
> > Their commit r265375 fixed the ability to compile itself, but built
> > PostgreSQL binaries remain broken there and thereafter.
> > 
> > |...]
> 
> Thanks a lot for this investigation! I can file a gcc bug report. It would
> be an enormous amount of work to narrow it down to a small test case, and it
> is unclear how they can act on it, but at least they would know.

Have you done so? If so, what's the bug number?

Greetings,

Andres Freund



Re: replication_slots usability issue

2018-11-01 Thread Andres Freund
On 2018-11-01 09:34:05 +0900, Michael Paquier wrote:
> HI Andres,
> 
> On Wed, Oct 31, 2018 at 03:48:02PM -0700, Andres Freund wrote:
> > And done.  Thanks for the report JD.
> 
> Shouldn't we also switch the PANIC to a FATAL in
> RestoreSlotFromDisk()?

That has absolutely nothing to do with the issue at hand though, so I
don't think it'd have made much sense to do it at the same time. Nor do
I think it's particularly important.


> I don't mind doing so myself if you agree with the change, only on
> HEAD as you seemed to disagree about changing that on back-branches.

Cool. And yes, I don't think a cosmetic log level adjustment that could
affect people's scripts should be backpatched without need. Even if not
particularly likely to break something.


> Also, from 691d79a which you just committed:
> +   ereport(FATAL,
> +   (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +errmsg("logical replication slots \"%s\" exists, but 
> wal_level < logical",
> +   NameStr(cp.slotdata.name)),
> I can see one grammar mistake here, as you refer to only one slot here.
> The error messages should read:
> "logical replication slot \"%s\" exists, but wal_level < logical"
> and:
> "physical replication slot \"%s\" exists, but wal_level < replica"

Darnit. Fixed. Thanks.

Greetings,

Andres Freund



Re: INSTALL file

2018-11-01 Thread Stephen Frost
Greetings,

* Andreas 'ads' Scherbaum (a...@pgug.de) wrote:
> On 01.11.18 07:26, Michael Paquier wrote:
> >>It includes links to the website, as well as the short version of the
> >>installation instructions.
> >+The installation instructions are listed on the website:
> >+
> >+https://www.postgresql.org/docs/current/static/install-short.html

I don't think we should link to the "Short version" directly here; the
above URL should be the "installation.html" one.  With a caveat, see
below.

> >+Short version:
> >+
> >+./configure
> >+make
> >+su
> >+make install
> >+adduser postgres
> >+mkdir /usr/local/pgsql/data
> >+chown postgres /usr/local/pgsql/data
> >+su - postgres
> >+/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data
> >+/usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data -l logfile start
> >
> >Adding a section about installation and another one about documentation
> >are good things.  Now for the installation section I disagree about
> >adding this detailed way of doing things, and just adding a URL looks
> >enough.
> Was thinking about this, but then decided to add it as an example,
> and see what people think.

I like having the detail, but I'd go farther, really.  Here's my
thinking:

BUILDING ON UNIX


Detailed instructions for many Unix platforms are available here:
https://www.postgresql.org/docs/current/static/installation.html

To build PostgreSQL on most Unix variants, the following are required:

GNU make, version 3.8 or newer
ISO/ANSI C compiler (at least C99-compliant)
Flex 2.5.31 or later, and Bison 1.875 or later (for building from git)
Perl 5.8.3 (for building from git)

PostgreSQL has many additional capabilities which can be enabled using
configure --enable switches but many of those also depend on additional
libraries.  See the installation instructions for details.

To build PostgreSQL, run the following commands:

./configure
make

PostgreSQL can then be installed using 'make install', which will
require being a superuser to install into the default directory.
The installation location can be changed by passing '--prefix' to
'configure'.  Run './configure --help' for additional options.


BUILDING ON WINDOWS


Detailed instructions for building on Windows are available here:
https://www.postgresql.org/docs/current/static/install-windows.html

To build PostgreSQL on Windows, either Visual Studio Express 2017
for Windows Desktop or Microsoft Visual C++ 2005 (or later) should be
installed.  PostgreSQL can also be built with MinGW or Cygwin, following
the Unix instructions.

... some details about how to actually use VSE 2017 would be good

CREATING YOUR FIRST DATABASE


Once the PostgreSQL software is installed, the first step to having a
running database is to initialize a PostgreSQL database using the
'initdb' command, eg:

initdb -D mydatabase

After the database system has been initialized, PostgreSQL can be
started by using the pg_ctl command:

pg_ctl -D mydatabase -l logfile start 

Once PostgreSQL is running, you can connect to it using the psql
command-line client.  A default database called 'postgres' was created
by 'initdb'.

BUILDING THE POSTGRESQL DOCUMENTATION
=

Full documentation for PostgreSQL is available online here:
https://www.postgresql.org/docs/current/static/index.html

To build the PostgreSQL documentation on most Unix variants, the
following tools are required:

 whatever these are

To build PostgreSQL's documentation on Unix, run:

./configure
make docs

The documentation, once built by 'make docs', will be available in
various formats in the 'doc/src/sgml' directory.

> >Now there is also a problem, the README would point out to the
> >development version of the documentation.  As this is made at working
> >using git, I could personally live with having stable branches also
> >refer to the development version, but it could also make sense to have
> >each stable branch point to the URL of the versions they work on.
> 
> That is a bit problematic. The README lives on git first, and therefore
> should point to the development version. The release process
> can replace this with links to the current version.

I'm not sure that I'm really following this, because we aren't pointing
to the development documentation, just the 'current' documentation, and
that seems fine, and there's links on that page to the other versions of
the page for each major version of PostgreSQL, in case someone pulled an
older branch or such.

In short, I think we're fine to just use the 'current' URLs in this
README.

I'd also update the 'Makefile' bit we have at the root to refer to the
README instead of the INSTALL file.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Parallel threads in query

2018-11-01 Thread Andres Freund
On 2018-11-01 10:10:33 -0700, Paul Ramsey wrote:
> On Wed, Oct 31, 2018 at 2:11 PM Tom Lane  wrote:
> 
> > =?UTF-8?Q?Darafei_=22Kom=D1=8Fpa=22_Praliaskouski?= 
> > writes:
> > > Question is, what's the best policy to allocate cores so we can play nice
> > > with rest of postgres?
> >
> 
> 
> > There is not, because we do not use or support multiple threads inside
> > a Postgres backend, and have no intention of doing so any time soon.
> >
> 
> As a practical matter though, if we're multi-threading  a heavy PostGIS
> function, presumably simply grabbing *every* core is not a recommended or
> friendly practice. My finger-in-the-wind guess would be that the value
> of max_parallel_workers_per_gather would be the most reasonable value to
> use to limit the number of cores a parallel PostGIS function should use.
> Does that make sense?

I'm not sure that's a good approximation.  Postgres' infrastructure
prevents every query from using max_parallel_workers_per_gather
processes due to the global max_worker_processes limit.  I think you
probably would want something very very roughly approximating a global
limit - otherwise you'll either need to set the per-process limit way
too low, or overwhelm machines with context switches.
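
For reference, the limits mentioned above are ordinary GUCs and can be read
from SQL; a rough illustration only, to make the discussion concrete (these
are standard settings, nothing specific to PostGIS or this thread):

SHOW max_parallel_workers_per_gather;  -- per-Gather cap on worker processes
SHOW max_worker_processes;             -- cluster-wide cap on background worker processes
SHOW max_parallel_workers;             -- cluster-wide cap on parallel worker processes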

Greetings,

Andres Freund



Re: Parallel threads in query

2018-11-01 Thread Paul Ramsey
On Wed, Oct 31, 2018 at 2:11 PM Tom Lane  wrote:

> =?UTF-8?Q?Darafei_=22Kom=D1=8Fpa=22_Praliaskouski?= 
> writes:
> > Question is, what's the best policy to allocate cores so we can play nice
> > with rest of postgres?
>


> There is not, because we do not use or support multiple threads inside
> a Postgres backend, and have no intention of doing so any time soon.
>

As a practical matter though, if we're multi-threading  a heavy PostGIS
function, presumably simply grabbing *every* core is not a recommended or
friendly practice. My finger-in-the-wind guess would be that the value
of max_parallel_workers_per_gather would be the most reasonable value to
use to limit the number of cores a parallel PostGIS function should use.
Does that make sense?

P


Re: New vacuum option to do only freezing

2018-11-01 Thread Robert Haas
On Mon, Oct 1, 2018 at 6:23 AM Masahiko Sawada  wrote:
> Attached patch adds a new option FREEZE_ONLY to VACUUM command. This
> option is same as FREEZE option except for it disables reclaiming dead
> tuples. That is, with this option vacuum does pruning HOT chain,
> freezing live tuples and maintaining both visibility map and freespace
> map but does not collect dead tuples and invoke neither heap vacuum
> nor index vacuum. This option will be useful if user wants to prevent
> XID wraparound a table as quick as possible, especially when table is
> quite large and is about to XID wraparound. I think this usecase was
> mentioned in some threads but I couldn't find them.

The feature seems like a reasonable one, but the name doesn't seem
like a good choice.  It doesn't only freeze - e.g. it HOT-prunes, as
it must.  Maybe consider something like (without_index_cleanup true)
or (index_cleanup false).
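
To make that concrete, a sketch of what such an invocation might look like,
assuming the option ends up spelled INDEX_CLEANUP and takes a boolean value
(both the name and the value syntax are still open questions; the table name
is just an example):

VACUUM (FREEZE, INDEX_CLEANUP FALSE) big_table;  -- prune and freeze, but skip index/heap vacuuming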

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pg_promote not marked as parallel-restricted in pg_proc.dat

2018-11-01 Thread Robert Haas
On Wed, Oct 31, 2018 at 8:43 PM Michael Paquier  wrote:
> Okay, but likely we would not want to signal the postmaster
> unnecessarily, no?  FALLBACK_PROMOTE_SIGNAL_FILE gets discarded if
> promotion is triggered more than once, but that does not like a sane
> thing to do if not necessary.

Uh, what in the world does that have to do with the topic at hand?
Parallel-safety has nothing to do with preventing functions from being
called "unnecessarily" or "more than once".

I don't like the fact that Tom keeps arguing for marking things
parallel-unsafe without any real justification.  During the
development of parallel query, he argued that no such concept should
exist, because everything should work in parallel mode just like it
does otherwise.  I eventually convinced him (or so it seemed) that we
had to have the concept because it was impractical to eliminate all
the corner cases where things were parallel-unsafe in the short/medium
term.  However, in the long term, the fact that we have
parallel-restricted and parallel-unsafe is an implementation
restriction that we should be trying to eliminate.  Making the list of
parallel-unsafe things longer makes that goal harder to achieve,
especially when the marking is applied not out of any real
justification based on what the code does, as was my intent, but based
on an ill-defined feeling of unease.

Eventually, after some future round of infrastructure improvements,
we're going to end up having discussions about changing things that
are marked parallel-unsafe or parallel-restricted to parallel-safe,
and when somebody can't find any reason why the existing markings are
like that in the first place, they're going to use that as
justification for not changing them ("there must've been a reason!").
Parallel-safety shouldn't be some kind of quasi-religious thing where
markings bleed from a function that is properly marked to one with a
similar name, or irrational values are applied in the name of
preventing unspecified or only-vaguely-connected evils.  It should be
based on things that can be determined scientifically, like "hey, does
this code access any global variables other than those which are
automatically synchronized between the master and the workers?" and
"hey, does this code ever attempt
heap_insert/heap_update/heap_delete?" and "hey, does this ever call
other user-defined code that might itself be parallel-unsafe?".

The right way to figure out the appropriate parallel-safety marking
for a function is to read the code and see what it does.  Now you
might miss something, as with any code-reading exercise, but if you
don't, then you should come up with the right answer.  In the case of
pg_promote, it calls out to the following functions:

RecoveryInProgress - checks shared memory state.  equally available to
a worker as to the leader.
AllocateFile/FreeFile - no problem here; these work fine in workers
just like any other backend.
kill, unlink - same thing
ResetLatch, WaitLatch, CHECK_FOR_INTERRUPTS() - all fine in a worker;
in fact, used as part of the worker machinery
ereport - works anywhere, worker machinery propagates errors back to leader

That's all.  There are no obvious references to global variables or
anything here.  So, it looks to be parallel-safe.

If you do the same analysis for pg_start_backup(), you'll immediately
notice that it calls get_backup_status(), and if you look at that
function, you'll see that it returns the value of a global variable.
If you check parallel.c, you'll find that that global variable is not
one of the special ones that gets synchronized between a leader and
the workers.  Therefore, if you ran this function in a worker, it
might do the wrong thing, because it might get the wrong value of that
global variable.  So it's at least (and in fact exactly)
parallel-restricted.  It doesn't need to be parallel-restricted
because it's scary in some ill-defined way or any of these other
reasons advanced on this thread -- it needs to be parallel-restricted
because it *relies on a global variable that is not synchronized to
parallel workers*.  If somebody writes code to move that state out of
a global variables and into the main shared memory segment or to a DSM
shared between the leader and the workers, then it can be marked
parallel-safe (unless, for example, it also depends on OTHER global
variables that are NOT moved into shared storage).  Otherwise not.
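
As an aside, the marking that results from this kind of analysis is simply the
proparallel column in pg_proc, so it can be inspected directly once set; for
example (pg_promote of course only exists with the patch under discussion):

SELECT proname, proparallel   -- 's' = safe, 'r' = restricted, 'u' = unsafe
FROM pg_proc
WHERE proname IN ('pg_promote', 'pg_start_backup');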

I hope I'm making sense here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: New vacuum option to do only freezing

2018-11-01 Thread Bossart, Nathan
Hi,

On 10/1/18, 5:23 AM, "Masahiko Sawada"  wrote:
> Attached patch adds a new option FREEZE_ONLY to VACUUM command. This
> option is same as FREEZE option except for it disables reclaiming dead
> tuples. That is, with this option vacuum does pruning HOT chain,
> freezing live tuples and maintaining both visibility map and freespace
> map but does not collect dead tuples and invoke neither heap vacuum
> nor index vacuum. This option will be useful if user wants to prevent
> XID wraparound a table as quick as possible, especially when table is
> quite large and is about to XID wraparound. I think this usecase was
> mentioned in some threads but I couldn't find them.

I've thought about this a bit myself.  One of the reasons VACUUM can
take so long is because of all the index scans needed.  If you're in a
potential XID wraparound situation and just need a quick way out, it
would be nice to have a way to do the minimum amount of work necessary
to reclaim transaction IDs.  At a high level, I think there are some
improvements to this design we should consider.

1. Create a separate FREEZE command instead of adding a new VACUUM
   option

The first line of the VACUUM documentation reads, "VACUUM reclaims
storage occupied by dead tuples," which is something that we would
explicitly not be doing with FREEZE_ONLY.  I think it makes sense to
reuse many of the VACUUM code paths to implement this feature, but
from a user perspective, it should be separate.

2. We should reclaim transaction IDs from dead tuples as well

Unless we also have a way to freeze XMAX like we do XMIN, I doubt this
feature will be useful for the imminent-XID-wraparound use-case.  In
short, we won't be able to advance relfrozenxid and relminmxid beyond
the oldest XMAX value for the relation.  IIUC the idea of freezing
XMAX doesn't really exist yet.  Either the XMAX is aborted/invalid and
can be reset to InvalidTransactionId, or it is committed and the tuple
can be removed if it is beyond the freezing threshold.  So, we probably
also want to look into adding a way to freeze XMAX, either by setting
it to FrozenTransactionId or by setting the hint bits to
(HEAP_XMAX_COMMITTED | HEAP_XMIN_INVALID) as is done for XMIN.

Looking closer, I see that the phrase "freezing XMAX" is currently
used to refer to setting it to InvalidTransactionId if it is aborted
or invalid (e.g. lock-only).
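
Incidentally, the imminent-XID-wraparound use-case mentioned above shows up in
the catalogs as a large relfrozenxid age, which can be checked with something
like:

SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY xid_age DESC
LIMIT 10;   -- tables whose relfrozenxid has fallen furthest behind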

> Currently this patch just adds the new option to VACUUM command but it
> might be good to make autovacuum use it when emergency vacuum is
> required.

This also seems like a valid use-case, but it should definitely be
done as a separate effort after this feature has been committed.

> This is a performance-test result for FREEZE option and FREEZE_ONLY
> option. I've tested them on the table which is about 3.8GB table
> without indexes and randomly modified.
>
> * FREEZE
> ...
> Time: 50301.262 ms (00:50.301)
>
> * FREEZE_ONLY
> ...
> Time: 44589.794 ms (00:44.590)

I'd be curious to see the improvements you get when there are several
indexes on the relation.  The ability to skip the index scans is
likely how this feature will really help speed things up.

Nathan



Re: [PATCH][PROPOSAL] Add enum releation option type

2018-11-01 Thread Nikolay Shaplov
В письме от 1 ноября 2018 11:10:14 пользователь Tom Lane написал:

> >>> I see you lost the Oxford comma:
> >>> 
> >>> -DETAIL:  Valid values are "on", "off", and "auto".
> >>> +DETAIL:  Valid values are "auto", "on" and "off".
> >>> 
> >>> Please put these back.
> >> 
> >> Actually that's me who have lost it. The code with  oxford comma would be
> >> a
> >> bit more complicated. We should put such coma when we have 3+ items and
> >> do not put it when we have 2.
> >> 
> >> Does it worth it?
> > 
> > In my opinion, it is worth it.
> 
> Uh ... I've not been paying attention to this thread, but this exchange
> seems to be about somebody constructing a message like that piece-by-piece
> in code.  This has got to be a complete failure from the translatability
> standpoint.  See
> 
> https://www.postgresql.org/docs/devel/static/nls-programmer.html#NLS-GUIDELI
> NES

It's a very good reason...

In this case the only solution I can see is 

DETAIL:  Valid values are: "value1", "value2", "value3".

Where list '"value1", "value2", "value3"' is built in runtime but have no any 
bindnings to any specific language. And the rest of the message is
'DETAIL:  Valid values are: %s' which can be properly translated.


-- 
Do code for fun.

signature.asc
Description: This is a digitally signed message part.


Re: [PATCH][PROPOSAL] Add enum releation option type

2018-11-01 Thread Tom Lane
Robert Haas  writes:
> On Wed, Mar 7, 2018 at 10:23 AM Nikolay Shaplov  wrote:
>>> I see you lost the Oxford comma:
>>> 
>>> -DETAIL:  Valid values are "on", "off", and "auto".
>>> +DETAIL:  Valid values are "auto", "on" and "off".
>>> 
>>> Please put these back.

>> Actually that's me who have lost it. The code with  oxford comma would be a
>> bit more complicated. We should put such coma when we have 3+ items and do 
>> not
>> put it when we have 2.
>> 
>> Does it worth it?

> In my opinion, it is worth it.

Uh ... I've not been paying attention to this thread, but this exchange
seems to be about somebody constructing a message like that piece-by-piece
in code.  This has got to be a complete failure from the translatability
standpoint.  See

https://www.postgresql.org/docs/devel/static/nls-programmer.html#NLS-GUIDELINES

regards, tom lane



Re: PG vs macOS Mojave

2018-11-01 Thread Daniel Gustafsson
> On 1 Nov 2018, at 15:53, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>>> On 1 Nov 2018, at 15:14, Tom Lane  wrote:
>>> Wow ... could it be that it actually varies depending on the combination
>>> of compiler and OS versions?  That would be weird.
> 
>> Or the version of XCode and the set of installed SDKs?  I only have a single
>> SDK installed on both of these systems, if you have multiple ones on the 10.0
>> installation that might explain something.  Or not.
> 
> Nope, I just have the one:

Then I’m inclined to say that it probably depends on the combination of OS
version, XCode version and potentially the SDK version.  Either way, passing
the -isysroot explicitly as in your suggestion should still work unless I’m
missing something.

cheers ./daniel


Re: [PATCH][PROPOSAL] Add enum releation option type

2018-11-01 Thread Robert Haas
On Wed, Mar 7, 2018 at 10:23 AM Nikolay Shaplov  wrote:
> > I see you lost the Oxford comma:
> >
> > -DETAIL:  Valid values are "on", "off", and "auto".
> > +DETAIL:  Valid values are "auto", "on" and "off".
> >
> > Please put these back.
> Actually that's me who have lost it. The code with  oxford comma would be a
> bit more complicated. We should put such coma when we have 3+ items and do not
> put it when we have 2.
>
> Does it worth it?

In my opinion, it is worth it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: PG vs macOS Mojave

2018-11-01 Thread Tom Lane
Daniel Gustafsson  writes:
>> On 1 Nov 2018, at 15:14, Tom Lane  wrote:
>> Wow ... could it be that it actually varies depending on the combination
>> of compiler and OS versions?  That would be weird.

> Or the version of XCode and the set of installed SDKs?  I only have a single
> SDK installed on both of these systems, if you have multiple ones on the 10.0
> installation that might explain something.  Or not.

Nope, I just have the one:

$ ls -l 
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/
total 0
drwxr-xr-x  7 root  wheel  224 Oct 31 11:22 MacOSX.sdk/
lrwxr-xr-x  1 root  wheel   10 Sep 19 19:28 MacOSX10.14.sdk@ -> MacOSX.sdk

regards, tom lane



Re: POC: Cleaning up orphaned files using undo logs

2018-11-01 Thread Robert Haas
While I've been involved in the design discussions for this patch set,
I haven't looked at any of the code personally in a very long time.  I
certainly don't claim to be an independent reviewer, and I encourage
others to review this work also.  That said, here are some review
comments.

I decided to start with 0005, as that has the user-facing
documentation for this feature. There is a spurious whitespace-only
hunk in monitoring.sgml.

+ Process ID of the backend currently attached to this undo log
+  for writing.

or NULL/0/something if none?

+   each undo log that exists.  Undo logs are extents within a contiguous
+   addressing space that have their own head and tail pointers.

This sentence seems to me to have so little detail that it's not going
to help anyone, and it also seems somewhat out-of-place here.  I think
it would be better to link to the longer explanation in the new
storage section instead.

+   Each backend that has written undo data is associated with one or more undo

extra space

+
+Undo logs hold data that is used for rolling back and for implementing
+MVCC in access managers that are undo-aware (currently "zheap").  The storage
+format of undo logs is optimized for reusing existing files.
+

I think the mention of zheap should be removed here since the hope is
that the undo stuff can be committed independently of and prior to
zheap.

I think you mean access methods, not access managers.  I suggest
making that an xref.

Maybe add a little more detail, e.g.

Undo logs provide a place for access methods to store data that can be
used to perform necessary cleanup operations after a transaction
abort.  The data will be retained after a transaction abort until the
access method successfully performs the required cleanup operations.
After a transaction commit, undo data will be retained until the
transaction is all-visible.  This makes it possible for access
managers to use undo data to implement MVCC.  Since in most cases undo
data is discarded very quickly, the undo system has been optimized to
minimize writes to disk and to reuse existing files efficiently.

+
+Undo data exists in a 64 bit address space broken up into numbered undo logs
+that represent 1TB extents, for efficient management.  The space is further
+broken up into 1MB segment files, for physical storage.  The name of each file
+is the address of of the first byte in the file, with a period inserted after
+the part that indicates the undo log number.
+

I cannot read this section and know what an undo filename is going to
look like.  Also, the remark about efficient management seems like it
might be unclear to someone not already familiar with how this works.
Maybe something like:

Undo data exists in a 64-bit address space divided into 2^34 undo
logs, each with a theoretical capacity of 1TB.  The first time a
backend writes undo, it attaches to an existing undo log whose
capacity is not yet exhausted and which is not currently being used by
any other backend; or if no suitable undo log already exists, it
creates a new one.  To avoid wasting space, each undo log is further
divided into 1MB segment files, so that segments which are no longer
needed can be removed (possibly recycling the underlying file by
renaming it) and segments which are not yet needed do not need to be
physically created on disk.  An undo segment file has a name like
, where  is the undo log number and  is the
segment number.

I think it's good to spell out the part about attaching to undo logs
here, because when people look at pg_undo, the number of files will be
roughly proportional to the number of backends, and we should try to
help them understand - at least in general terms - why that happens.

+
+Just as relations can have one of the three persistence levels permanent,
+unlogged or temporary, the undo data that is generated by modifying them must
+be stored in an undo log of the same persistence level.  This enables the
+undo data to be discarded at appropriate times along with the relations that
+reference it.
+

This is not quite general, because we're not necessarily talking about
modifications to the files.  In fact, in this POC, we're explicitly
talking about the cleanup of the files themselves.  Also, it's not
technically correct to say that the persistence level has to match.
You could put everything in permanent undo logs.  It would just suck.

Moving on to 0003, the developer documentation:

+The undo log subsystem provides a way to store data that is needed for
+a limited time.  Undo data is generated whenever zheap relations are
+modified, but it is only useful until (1) the generating transaction
+is committed or rolled back and (2) there is no snapshot that might
+need it for MVCC purposes.  See src/backend/access/zheap/README for
+more information on zheap.  The undo log subsystem is concerned with

Again, I think this should be rewritten to make it independent of
zheap.  We hope that this facility is not only usable by but 

Re: PG vs macOS Mojave

2018-11-01 Thread Daniel Gustafsson
> On 1 Nov 2018, at 15:14, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>> Odd.  I don’t actually get -isysroot on XCode 10.0 on my 10.13.6 
>> installation,
>> on 10.12 with XCode 8.3.3 I do however get -isysroot.
> 
> Wow ... could it be that it actually varies depending on the combination
> of compiler and OS versions?  That would be weird.

Or the version of XCode and the set of installed SDKs?  I only have a single
SDK installed on both of these systems, if you have multiple ones on the 10.0
installation that might explain something.  Or not.

cheers ./daniel


Re: PG vs macOS Mojave

2018-11-01 Thread Tom Lane
Daniel Gustafsson  writes:
> Odd.  I don’t actually get -isysroot on XCode 10.0 on my 10.13.6 installation,
> on 10.12 with XCode 8.3.3 I do however get -isysroot.

Wow ... could it be that it actually varies depending on the combination
of compiler and OS versions?  That would be weird.

regards, tom lane



Re: Commitfest 2018-11

2018-11-01 Thread Dmitry Dolgov
> On Thu, 1 Nov 2018 at 15:11, Stephen Frost  wrote:
> Hmm...  Can you try it again?

Yep, now I see the administration menu, thanks.



Re: Commitfest 2018-11

2018-11-01 Thread Stephen Frost
Greetings,

* Dmitry Dolgov (9erthali...@gmail.com) wrote:
> > On Thu, 1 Nov 2018 at 14:11, Stephen Frost  wrote:
> > I've added you to the 'cf admins' group
> 
> Thanks.
> 
> > so please give it a shot now and let me know if you run into any troubles.
> 
> Hm...I don't see any difference in CF app, what should be changed?

Hmm...  Can you try it again?  If it still doesn't show up, please let
me know what your community account is that you're logging in with..

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: zheap: a new storage format for PostgreSQL

2018-11-01 Thread Tomas Vondra

On 11/01/2018 07:43 AM, Amit Kapila wrote:


You can find the latest code at https://github.com/EnterpriseDB/zheap



Seems valgrind complains about a couple of places in the code - nothing 
major, might be noise, but probably worth a look.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
==7569== Conditional jump or move depends on uninitialised value(s)
==7569==at 0x5914C8: ZHeapDetermineModifiedColumns (zheapam.c:)
==7569==by 0x587453: zheap_update (zheapam.c:2176)
==7569==by 0x7577D3: ExecUpdate (nodeModifyTable.c:1426)
==7569==by 0x759C0B: ExecModifyTable (nodeModifyTable.c:2629)
==7569==by 0x729687: ExecProcNodeFirst (execProcnode.c:445)
==7569==by 0x71D8E9: ExecProcNode (executor.h:241)
==7569==by 0x720194: ExecutePlan (execMain.c:1711)
==7569==by 0x71DF11: standard_ExecutorRun (execMain.c:367)
==7569==by 0x71DD3F: ExecutorRun (execMain.c:310)
==7569==by 0x9113CA: ProcessQuery (pquery.c:161)
==7569==by 0x912CE8: PortalRunMulti (pquery.c:1286)
==7569==by 0x9122C1: PortalRun (pquery.c:799)
==7569==by 0x90C1C2: exec_simple_query (postgres.c:1215)
==7569==by 0x9104D9: PostgresMain (postgres.c:4244)
==7569==by 0x86A911: BackendRun (postmaster.c:4388)
==7569==by 0x86A0DF: BackendStartup (postmaster.c:4079)
==7569==by 0x8665F7: ServerLoop (postmaster.c:1711)
==7569==by 0x865EA3: PostmasterMain (postmaster.c:1384)
==7569==by 0x78E3AB: main (main.c:228)
==7569==  Uninitialised value was created by a stack allocation
==7569==at 0x59043D: znocachegetattr (zheapam.c:6206)
==7569== 
{
   
   Memcheck:Cond
   fun:ZHeapDetermineModifiedColumns
   fun:zheap_update
   fun:ExecUpdate
   fun:ExecModifyTable
   fun:ExecProcNodeFirst
   fun:ExecProcNode
   fun:ExecutePlan
   fun:standard_ExecutorRun
   fun:ExecutorRun
   fun:ProcessQuery
   fun:PortalRunMulti
   fun:PortalRun
   fun:exec_simple_query
   fun:PostgresMain
   fun:BackendRun
   fun:BackendStartup
   fun:ServerLoop
   fun:PostmasterMain
   fun:main
}
==8811== Conditional jump or move depends on uninitialised value(s)
==8811==at 0x5914C8: ZHeapDetermineModifiedColumns (zheapam.c:)
==8811==by 0x587453: zheap_update (zheapam.c:2176)
==8811==by 0x7577D3: ExecUpdate (nodeModifyTable.c:1426)
==8811==by 0x759C0B: ExecModifyTable (nodeModifyTable.c:2629)
==8811==by 0x729687: ExecProcNodeFirst (execProcnode.c:445)
==8811==by 0x71D8E9: ExecProcNode (executor.h:241)
==8811==by 0x720194: ExecutePlan (execMain.c:1711)
==8811==by 0x71DF11: standard_ExecutorRun (execMain.c:367)
==8811==by 0x71DD3F: ExecutorRun (execMain.c:310)
==8811==by 0x9113CA: ProcessQuery (pquery.c:161)
==8811==by 0x912CE8: PortalRunMulti (pquery.c:1286)
==8811==by 0x9122C1: PortalRun (pquery.c:799)
==8811==by 0x90C1C2: exec_simple_query (postgres.c:1215)
==8811==by 0x9104D9: PostgresMain (postgres.c:4244)
==8811==by 0x86A911: BackendRun (postmaster.c:4388)
==8811==by 0x86A0DF: BackendStartup (postmaster.c:4079)
==8811==by 0x8665F7: ServerLoop (postmaster.c:1711)
==8811==by 0x865EA3: PostmasterMain (postmaster.c:1384)
==8811==by 0x78E3AB: main (main.c:228)
==8811==  Uninitialised value was created by a stack allocation
==8811==at 0x59043D: znocachegetattr (zheapam.c:6206)
==8811== 
{
   
   Memcheck:Cond
   fun:ZHeapDetermineModifiedColumns
   fun:zheap_update
   fun:ExecUpdate
   fun:ExecModifyTable
   fun:ExecProcNodeFirst
   fun:ExecProcNode
   fun:ExecutePlan
   fun:standard_ExecutorRun
   fun:ExecutorRun
   fun:ProcessQuery
   fun:PortalRunMulti
   fun:PortalRun
   fun:exec_simple_query
   fun:PostgresMain
   fun:BackendRun
   fun:BackendStartup
   fun:ServerLoop
   fun:PostmasterMain
   fun:main
}
==8811== Conditional jump or move depends on uninitialised value(s)
==8811==at 0x574A76: InsertUndoRecord (undorecord.c:135)
==8811==by 0x56F405: InsertPreparedUndo (undoinsert.c:840)
==8811==by 0x5848E0: zheap_insert (zheapam.c:896)
==8811==by 0x755F21: ExecInsert (nodeModifyTable.c:692)
==8811==by 0x759BB2: ExecModifyTable (nodeModifyTable.c:2622)
==8811==by 0x729687: ExecProcNodeFirst (execProcnode.c:445)
==8811==by 0x71D8E9: ExecProcNode (executor.h:241)
==8811==by 0x720194: ExecutePlan (execMain.c:1711)
==8811==by 0x71DF11: standard_ExecutorRun (execMain.c:367)
==8811==by 0x71DD3F: ExecutorRun (execMain.c:310)
==8811==by 0x9113CA: ProcessQuery (pquery.c:161)
==8811==by 0x912CE8: PortalRunMulti (pquery.c:1286)
==8811==by 0x9122C1: PortalRun (pquery.c:799)
==8811==by 0x90C1C2: exec_simple_query (postgres.c:1215)
==8811==by 0x9104D9: PostgresMain (postgres.c:4244)
==8811==by 0x86A911: BackendRun (postmaster.c:4388)
==8811==by 0x86A0DF: BackendStartup (postmaster.c:4079)
==8811==by 0x8665F7: ServerLoop (postmaster.c:1711)
==8811==by 0x865EA3: PostmasterMain 

Re: Commitfest 2018-11

2018-11-01 Thread Dmitry Dolgov
> On Thu, 1 Nov 2018 at 14:11, Stephen Frost  wrote:
> I've added you to the 'cf admins' group

Thanks.

> so please give it a shot now and let me know if you run into any troubles.

Hm...I don't see any difference in CF app, what should be changed?



CF app feature request

2018-11-01 Thread Andrew Dunstan



Yesterday Fabien and I submitted the same item to the Commitfest (1859 
and 1860). Unfortunately there doesn't seem to be any way for one of 
these to be withdrawn. "Rejected" and "Returned with Feedback" seem 
wrong. Ideally, there would be a way for someone who submits an item in 
error to withdraw it, at which point it should be as if it had never 
been submitted.


Meanwhile, what's the advice about how to deal with these?


cheers


andrew

--
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: bugfix: BUG #15477: Procedure call with named inout refcursor parameter - "invalid input syntax for type boolean"

2018-11-01 Thread Pavel Stehule
Cleaned patch with regress tests
diff --git a/src/pl/plpgsql/src/expected/plpgsql_call.out b/src/pl/plpgsql/src/expected/plpgsql_call.out
index 547ca22a55..8762e1335c 100644
--- a/src/pl/plpgsql/src/expected/plpgsql_call.out
+++ b/src/pl/plpgsql/src/expected/plpgsql_call.out
@@ -276,3 +276,43 @@ DROP PROCEDURE test_proc1;
 DROP PROCEDURE test_proc3;
 DROP PROCEDURE test_proc4;
 DROP TABLE test1;
+CREATE TABLE test_call_proc(key serial, name text);
+CREATE OR REPLACE PROCEDURE p1(v_cnt int, v_ResultSet inout refcursor = NULL)
+AS $$
+BEGIN
+  INSERT INTO test_call_proc(name) VALUES('name test');
+  OPEN v_ResultSet FOR SELECT * FROM test_call_proc;
+END
+$$ LANGUAGE plpgsql;
+DO $$
+DECLARE
+  v_ResultSet refcursor;
+  v_cnt   integer;
+BEGIN
+  CALL p1(v_cnt:=v_cnt, v_ResultSet := v_ResultSet);
+  RAISE NOTICE '%', v_ResultSet;
+END;
+$$;
+NOTICE:  
+DO $$
+DECLARE
+  v_ResultSet refcursor;
+  v_cnt   integer;
+BEGIN
+  CALL p1(10, v_ResultSet := v_ResultSet);
+  RAISE NOTICE '%', v_ResultSet;
+END;
+$$;
+NOTICE:  
+DO $$
+DECLARE
+  v_ResultSet refcursor;
+  v_cnt   integer;
+BEGIN
+  CALL p1(v_ResultSet := v_ResultSet, v_cnt:=v_cnt);
+  RAISE NOTICE '%', v_ResultSet;
+END;
+$$;
+NOTICE:  
+DROP PROCEDURE p1;
+DROP TABLE test_call_proc;
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 45526383f2..8589c62ce1 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2168,10 +2168,10 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 		 */
 		if (!stmt->target)
 		{
+			Form_pg_proc funcform;
 			Node	   *node;
 			ListCell   *lc;
 			FuncExpr   *funcexpr;
-			int			i;
 			HeapTuple	tuple;
 			Oid		   *argtypes;
 			char	  **argnames;
@@ -2179,6 +2179,7 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			MemoryContext oldcontext;
 			PLpgSQL_row *row;
 			int			nfields;
+			int			pronargs;
 
 			/*
 			 * Get the original CallStmt
@@ -2196,6 +2197,8 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			if (!HeapTupleIsValid(tuple))
 elog(ERROR, "cache lookup failed for function %u", funcexpr->funcid);
 			get_func_arg_info(tuple, &argtypes, &argnames, &argmodes);
+			funcform = (Form_pg_proc) GETSTRUCT(tuple);
+			pronargs = funcform->pronargs;
 			ReleaseSysCache(tuple);
 
 			/*
@@ -2210,45 +2213,80 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			row->varnos = palloc(sizeof(int) * FUNC_MAX_ARGS);
 
 			nfields = 0;
-			i = 0;
-			foreach(lc, funcexpr->args)
+
+			/*
+			 * The argmodes can be in a different order than funcexpr->args due
+			 * to named args. Before we check INOUT parameters and prepare the
+			 * target variable, we have to reorder the arguments first. This is
+			 * similar to the code in reorder_function_arguments. Default values
+			 * are not needed in this part.
+			 */
+			if (argmodes)
 			{
-Node	   *n = lfirst(lc);
+Node	   *argarray[FUNC_MAX_ARGS];
+int		nargsprovided = list_length(funcexpr->args);
+int		i = 0;
+
+MemSet(argarray, 0, pronargs * sizeof(Node *));
 
-if (argmodes && argmodes[i] == PROARGMODE_INOUT)
+foreach(lc, funcexpr->args)
 {
-	if (IsA(n, Param))
-	{
-		Param	   *param = castNode(Param, n);
+	Node	   *n = lfirst(lc);
 
-		/* paramid is offset by 1 (see make_datum_param()) */
-		row->varnos[nfields++] = param->paramid - 1;
-	}
-	else if (IsA(n, NamedArgExpr))
+	if (IsA(n, NamedArgExpr))
 	{
 		NamedArgExpr *nexpr = castNode(NamedArgExpr, n);
-		Param	   *param;
+		argarray[nexpr->argnumber] = (Node *) nexpr->arg;
+	}
+	else
+		argarray[i++] = n;
+}
 
-		if (!IsA(nexpr->arg, Param))
-			ereport(ERROR,
-	(errcode(ERRCODE_SYNTAX_ERROR),
-	 errmsg("argument %d is an output argument but is not writable", i + 1)));
+Assert(nargsprovided <= pronargs);
 
-		param = castNode(Param, nexpr->arg);
+for (i = 0; i < pronargs; i++)
+{
+	Node	   *n = argarray[i];
 
+	if (argmodes[i] == PROARGMODE_INOUT)
+	{
 		/*
-		 * Named arguments must be after positional arguments,
-		 * so we can increase nfields.
+		 * Empty positions correspond to default values. An INOUT parameter may
+		 * use its default only if no later argument was supplied.
 		 */
-		row->varnos[nexpr->argnumber] = param->paramid - 1;
-		nfields++;
+		if (!n)
+		{
+			int		j;
+			bool	found = false;
+
+			for (j = i + 1; j < pronargs; j++)
+if (argarray[j])
+{
+	found = true;
+	break;
+}
+
+			if (found)
+ereport(ERROR,
+		(errcode(ERRCODE_SYNTAX_ERROR),
+		 errmsg("argument %i with default values is output argument but it is not writeable", i + 1)));
+
+			/* there are not any other parameter */
+			break;
+		}
+		else if (IsA(n, Param))
+		{
+			Param	   *param = castNode(Param, n);
+
+			/* 

Re: INSTALL file

2018-11-01 Thread Andreas 'ads' Scherbaum

On 01.11.18 07:26, Michael Paquier wrote:

On Thu, Nov 01, 2018 at 01:32:09AM +0100, Andreas 'ads' Scherbaum wrote:

Picking up on this idea, attached is a first draft for changing the
README.

Why don't you add it to the upcoming commit fest?  It would be good to
get some traction with a formal review.


I plan to do that, once I gathered some feedback here.



It includes links to the website, as well as the short version of the
installation instructions.

+The installation instructions are listed on the website:
+
+https://www.postgresql.org/docs/current/static/install-short.html
+
+Short version:
+
+./configure
+make
+su
+make install
+adduser postgres
+mkdir /usr/local/pgsql/data
+chown postgres /usr/local/pgsql/data
+su - postgres
+/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data
+/usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data -l logfile start

Adding a section about installation and another one about documentation
are good things.  Now for the installation section I disagree about
adding this detailed way of doing things, and just adding a URL looks
enough.

Was thinking about this, but then decided to add it as an example,
and see what people think.




Pointing to the global installation recommendations would be a better
fit also as a lot of things are platform-dependent.  So this URL looks
better:
https://www.postgresql.org/docs/current/static/installation.html

Now there is also a problem, the README would point out to the
development version of the documentation.  As this is made at working
using git, I could personally live with having stable branches also
refer to the development version, but it could also make sense to have
each stable branch point to the URL of the versions they work on.


That is a bit problematic. The README lives on git first, and therefore
should point to the development version. The release process
can replace this with links to the current version.


Regards,

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project




Re: Commitfest 2018-11

2018-11-01 Thread Stephen Frost
Greetings,

* Daniel Gustafsson (dan...@yesql.se) wrote:
> > On 1 Nov 2018, at 13:19, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > Do I need to have any special permissions in CF app
> > for that (e.g. now I can't "push the button" to start the current one)?
> 
> Yes, I believe either Magnus or Stephen can set the correct permissions (or at
> least point to whomever can).

I've added you to the 'cf admins' group, so please give it a shot now
and let me know if you run into any troubles.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: bugfix: BUG #15477: Procedure call with named inout refcursor parameter - "invalid input syntax for type boolean"

2018-11-01 Thread Pavel Stehule
čt 1. 11. 2018 v 13:10 odesílatel Pavel Stehule 
napsal:

>
>
> čt 1. 11. 2018 v 13:00 odesílatel Pavel Stehule 
> napsal:
>
>> Hi
>>
>> The processing of named parameters inside CALL statement is not correct.
>>
>> It is my code :-/. I am sorry
>>
>> Attached patch fix it.
>>
>> This still doesn't fix INOUT variables with default value - so is not
>> fully correct, but in this moment, it can show, where is a core of this
>> issue.
>>
>
> This version disallow INOUT default variables from plpgsql
>

Probably the correct solution is to save the transformed arguments after
expand_function_arguments and iterate over the transformed argument list,
not over the original argument list.


>
>
>> Regards
>>
>> Pavel
>>
>>


Re: Commitfest 2018-11

2018-11-01 Thread Daniel Gustafsson
> On 1 Nov 2018, at 13:19, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> 
>> On Thu, 1 Nov 2018 at 13:05, Daniel Gustafsson  wrote:
>> 
>> Dmitry: Are you still interested in running this commitfest?
> 
> Yes, I'm still interested.

Great!

> Do I need to have any special permissions in CF app
> for that (e.g. now I can't "push the button" to start the current one)?

Yes, I believe either Magnus or Stephen can set the correct permissions (or at
least point to whomever can).

cheers ./daniel


Re: Commitfest 2018-11

2018-11-01 Thread Dmitry Dolgov
> On Thu, 1 Nov 2018 at 13:05, Daniel Gustafsson  wrote:
>
> Dmitry: Are you still interested in running this commitfest?

Yes, I'm still interested. Do I need to have any special permissions in CF app
for that (e.g. now I can't "push the button" to start the current one)?



Re: bugfix: BUG #15477: Procedure call with named inout refcursor parameter - "invalid input syntax for type boolean"

2018-11-01 Thread Pavel Stehule
čt 1. 11. 2018 v 13:00 odesílatel Pavel Stehule 
napsal:

> Hi
>
> The processing of named parameters inside CALL statement is not correct.
>
> It is my code :-/. I am sorry
>
> Attached patch fix it.
>
> This still doesn't fix INOUT variables with default value - so is not
> fully correct, but in this moment, it can show, where is a core of this
> issue.
>

This version disallows INOUT default variables from plpgsql



> Regards
>
> Pavel
>
>
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 45526383f2..21c8c398e0 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2171,7 +2171,6 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			Node	   *node;
 			ListCell   *lc;
 			FuncExpr   *funcexpr;
-			int			i;
 			HeapTuple	tuple;
 			Oid		   *argtypes;
 			char	  **argnames;
@@ -2210,45 +2209,62 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			row->varnos = palloc(sizeof(int) * FUNC_MAX_ARGS);
 
 			nfields = 0;
-			i = 0;
-			foreach(lc, funcexpr->args)
+
+			/*
+			 * The argmodes can be in a different order than funcexpr->args due
+			 * to named args. Before we check INOUT parameters and prepare the
+			 * target variable, we have to reorder the arguments first.
+			 */
+			if (argmodes)
 			{
-Node	   *n = lfirst(lc);
+Node	   **args;
+int		i = 0;
+int		nargs = list_length(funcexpr->args);
+
+args = palloc0(sizeof(Node *) * FUNC_MAX_ARGS);
 
-if (argmodes && argmodes[i] == PROARGMODE_INOUT)
+foreach(lc, funcexpr->args)
 {
-	if (IsA(n, Param))
-	{
-		Param	   *param = castNode(Param, n);
+	Node	   *n = lfirst(lc);
 
-		/* paramid is offset by 1 (see make_datum_param()) */
-		row->varnos[nfields++] = param->paramid - 1;
-	}
-	else if (IsA(n, NamedArgExpr))
+	if (IsA(n, NamedArgExpr))
 	{
 		NamedArgExpr *nexpr = castNode(NamedArgExpr, n);
-		Param	   *param;
 
-		if (!IsA(nexpr->arg, Param))
+		args[nexpr->argnumber] = (Node *) nexpr->arg;
+	}
+	else
+		args[i++] = n;
+}
+
+for (i = 0; i < nargs; i++)
+{
+	Node	   *n = args[i];
+
+	if (argmodes[i] == PROARGMODE_INOUT)
+	{
+		if (!n)
+		{
 			ereport(ERROR,
 	(errcode(ERRCODE_SYNTAX_ERROR),
-	 errmsg("argument %d is an output argument but is not writable", i + 1)));
+	 errmsg("argument %d is an output argument but is not passed", i + 1)));
+		}
+		else if (IsA(n, Param))
+		{
+			Param	   *param = castNode(Param, n);
 
-		param = castNode(Param, nexpr->arg);
+			/* paramid is offset by 1 (see make_datum_param()) */
 
-		/*
-		 * Named arguments must be after positional arguments,
-		 * so we can increase nfields.
-		 */
-		row->varnos[nexpr->argnumber] = param->paramid - 1;
-		nfields++;
+			row->varnos[nfields++] = param->paramid - 1;
+		}
+		else
+			ereport(ERROR,
+	(errcode(ERRCODE_SYNTAX_ERROR),
+	 errmsg("argument %d is an output argument but is not writable", i + 1)));
 	}
-	else
-		ereport(ERROR,
-(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("argument %d is an output argument but is not writable", i + 1)));
 }
-i++;
+
+pfree(args);
 			}
 
 			row->nfields = nfields;


Commitfest 2018-11

2018-11-01 Thread Daniel Gustafsson
We are now in November and the 2018-11 commitfest has started (or rather, will
once someone pushes the button), but AFAICT there hasn’t been anyone volunteering
to do be CFM this time apart from Dmitry Dolgov who showed an interest (in
CA+q6zcWPs5aFUQxZ5qi=wRhbesUSRGD3_r4HSgQKq+G6=ux...@mail.gmail.com) when the
2018-09 CF started.

Dmitry: Are you still interested in running this commitfest?

If not, I volunteer to do it for 2018-11.

If someone has already volunteered and I completely missed that then sorry for
the noise.

cheers ./daniel


bugfix: BUG #15477: Procedure call with named inout refcursor parameter - "invalid input syntax for type boolean"

2018-11-01 Thread Pavel Stehule
Hi

The processing of named parameters inside a CALL statement is not correct.

It is my code :-/. I am sorry

The attached patch fixes it.

This still doesn't fix INOUT variables with a default value - so it is not fully
correct, but for the moment it shows where the core of this issue is.
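
For reference, a minimal reproducer of the shape that hits this (the same
pattern as the regression test added elsewhere in this thread; object names
are just examples):

CREATE TABLE test_call_proc(key serial, name text);

CREATE OR REPLACE PROCEDURE p1(v_cnt int, v_ResultSet INOUT refcursor = NULL)
AS $$
BEGIN
  INSERT INTO test_call_proc(name) VALUES ('name test');
  OPEN v_ResultSet FOR SELECT * FROM test_call_proc;
END
$$ LANGUAGE plpgsql;

DO $$
DECLARE
  v_ResultSet refcursor;
  v_cnt       integer;
BEGIN
  -- named INOUT refcursor argument; this is the call shape that misbehaves
  CALL p1(v_cnt := v_cnt, v_ResultSet := v_ResultSet);
  RAISE NOTICE '%', v_ResultSet;
END;
$$;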

Regards

Pavel
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 45526383f2..781963e32c 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2171,7 +2171,6 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			Node	   *node;
 			ListCell   *lc;
 			FuncExpr   *funcexpr;
-			int			i;
 			HeapTuple	tuple;
 			Oid		   *argtypes;
 			char	  **argnames;
@@ -2210,45 +2209,60 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			row->varnos = palloc(sizeof(int) * FUNC_MAX_ARGS);
 
 			nfields = 0;
-			i = 0;
-			foreach(lc, funcexpr->args)
+
+			/*
+			 * The argmodes can be in a different order than funcexpr->args due
+			 * to named args. Before we check INOUT parameters and prepare the
+			 * target variable, we have to reorder the arguments first.
+			 */
+			if (argmodes)
 			{
-Node	   *n = lfirst(lc);
+Node	   **args;
+int		i = 0;
+int		nargs = list_length(funcexpr->args);
+
+args = palloc0(sizeof(Node *) * FUNC_MAX_ARGS);
 
-if (argmodes && argmodes[i] == PROARGMODE_INOUT)
+foreach(lc, funcexpr->args)
 {
-	if (IsA(n, Param))
+	Node	   *n = lfirst(lc);
+
+	if (IsA(n, NamedArgExpr))
 	{
-		Param	   *param = castNode(Param, n);
+		NamedArgExpr *nexpr = castNode(NamedArgExpr, n);
+
+		Assert(nexpr->argnumber < nargs);
 
-		/* paramid is offset by 1 (see make_datum_param()) */
-		row->varnos[nfields++] = param->paramid - 1;
+		args[nexpr->argnumber] = (Node *) nexpr->arg;
 	}
-	else if (IsA(n, NamedArgExpr))
+	else
+		args[i++] = n;
+}
+
+for (i = 0; i < nargs; i++)
+{
+	Node	   *n = args[i];
+
+	Assert(n != NULL);
+
+	if (argmodes[i] == PROARGMODE_INOUT)
 	{
-		NamedArgExpr *nexpr = castNode(NamedArgExpr, n);
-		Param	   *param;
+		if (IsA(n, Param))
+		{
+			Param	   *param = castNode(Param, n);
+
+			/* paramid is offset by 1 (see make_datum_param()) */
 
-		if (!IsA(nexpr->arg, Param))
+			row->varnos[nfields++] = param->paramid - 1;
+		}
+		else
 			ereport(ERROR,
 	(errcode(ERRCODE_SYNTAX_ERROR),
 	 errmsg("argument %d is an output argument but is not writable", i + 1)));
-
-		param = castNode(Param, nexpr->arg);
-
-		/*
-		 * Named arguments must be after positional arguments,
-		 * so we can increase nfields.
-		 */
-		row->varnos[nexpr->argnumber] = param->paramid - 1;
-		nfields++;
 	}
-	else
-		ereport(ERROR,
-(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("argument %d is an output argument but is not writable", i + 1)));
 }
-i++;
+
+pfree(args);
 			}
 
 			row->nfields = nfields;


Re: PG vs macOS Mojave

2018-11-01 Thread Daniel Gustafsson
> On 1 Nov 2018, at 04:17, Tom Lane  wrote:

> and on Xcode 10.0 I get

Odd.  I don’t actually get -isysroot on XCode 10.0 on my 10.13.6 installation,
on 10.12 with XCode 8.3.3 I do however get -isysroot.

> Right now I think the only plausible
> fix is to go back to adding "-isysroot $PG_SYSROOT" to CPPFLAGS.

+1. That does seem like the safe option.

cheers ./daniel


Doubts about pushing LIMIT to MergeAppendPath

2018-11-01 Thread Antonin Houska
Review of [1] made me think of this optimization, currently used only in
create_merge_append_path():

/*
 * Apply query-wide LIMIT if known and path is for sole base relation.
 * (Handling this at this low level is a bit klugy.)
 */
if (bms_equal(rel->relids, root->all_baserels))
pathnode->limit_tuples = root->limit_tuples;
else
pathnode->limit_tuples = -1.0;

Currently it's not a problem because the output of MergeAppend plan is not
likely to be re-sorted, but I don't think data correctness should depend on
cost evaluation. Instead, -1 should be set here if there's any chance that the
output will be sorted again.

I tried to reproduce the issue by applying the "Incremental sort" [2] patch
and running the following example:

CREATE TABLE t(i int, j int);
CREATE TABLE t1() INHERITS (t);
CREATE INDEX ON t1(i, j);
INSERT INTO t1(i, j) VALUES (1, 0), (1, 1);
CREATE TABLE t2() INHERITS (t);
CREATE INDEX ON t2(i, j);
INSERT INTO t2(i, j) VALUES (1, 0), (1, 1);

ANALYZE;

SELECT * FROM t ORDER BY i, j DESC LIMIT 1;

I expected the MergeAppend plan to apply the limit and thus prevent the
incremental sort node from receiving the first tuple that it should emit,
however the query yielded correct result. I think the reason is that the
MergeAppendPath.limit_tuples field is only used for cost estimates, but not
enforced during execution. Is that intended?

I thought this could be better approach to the limit push-down

if (root->limit_tuples > 0 && root->parse->sortClause == NIL &&
bms_equal(rel->relids, root->all_baserels))
pathnode->limit_tuples = root->limit_tuples;
else
pathnode->limit_tuples = -1.0;

however it will stop working as soon as the incremental sort patch (which can
be used below the upper planner) gets applied.

Finally I think we should be able to apply the limit to generic path, not only
to MergeAppendPath. I just don't know how to check when it's correct. Does
anyone have an idea?

[1] https://commitfest.postgresql.org/20/1850/

[2] https://commitfest.postgresql.org/20/1124/

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com



Re: CVE-2017-7484-induced bugs, or, btree cmp functions are not leakproof?

2018-11-01 Thread Dilip Kumar
On Mon, Oct 29, 2018 at 2:53 PM Amit Langote
 wrote:
>
> Thank you for creating the patch.
>
> On 2018/10/28 20:35, Dilip Kumar wrote:
> > On Sat, Oct 27, 2018 at 10:07 AM Dilip Kumar  wrote:
> >> On Fri, Oct 26, 2018 at 1:12 PM Amit Langote wrote:
> >>> On 2018/10/25 19:54, Dilip Kumar wrote:
>  Basically, if the relation is RELOPT_OTHER_MEMBER_REL, we can
>  recursively fetch its parent until we reach to the base relation
>  (which is actually named in the query). And, once we have the base
>  relation we can check ACL on that and set vardata->acl_ok accordingly.
>  Additionally, for getting the parent RTI we need to traverse
>  "root->append_rel_list". Another alternative could be that we can add
>  parent_rti member in RelOptInfo structure.
> >>>
> >>> Adding parent_rti would be a better idea [1].  I think that traversing
> >>> append_rel_list every time would be inefficient.
> >>>
> >>> [1] I've named it inh_root_parent in one of the patches I'm working on
> >>> where I needed such a field (https://commitfest.postgresql.org/20/1778/)
> >>>
> >> Ok, Make sense. I have written a patch by adding this variable.
> >> There is still one FIXME in the patch, basically, after getting the
> >> baserel rte I need to convert child varattno to parent varattno
> >> because in case of inheritance that can be different.  Do we already
> >> have any mapping from child attno to parent attno or we have to look
> >> up the cache.
>
> Sorry I forgot to cc you, but I'd posted a patch on the "speeding up
> planning with partitions" thread, that's extracted from the bigger patch,
> which adds inh_root_parent member to RelOptInfo [1].  Find it attached
> with this email.
>
> One of the differences from your patch is that it makes inh_root_parent
> work not just for partitioning, but to inheritance in general.  Also, it
> codes the changes to build_simple_rel to set inh_root_parent's value a bit
> differently than your patch.
>
> > Attached patch handles this issue.
>
> I noticed a typo in your patch:
>
> transalate_varattno -> translate_varattno
>
> +static int
> +transalate_varattno(Oid oldrelid, Oid newrelid, int old_attno)
> +{
> +   Relationoldrelation = heap_open(oldrelid, NoLock);
>
> It doesn't seem nice that it performs a heap_open on the parent relation.
>
> +   att = TupleDescAttr(old_tupdesc, old_attno - 1);
> +   attname = NameStr(att->attname);
> +
> +   newtup = SearchSysCacheAttName(newrelid, attname);
> +   if (!newtup)
> +   {
> +   heap_close(oldrelation, NoLock);
> +   return InvalidAttrNumber;
> +   }
>
> and
>
> +   varattno = transalate_varattno(relid, rte->relid, var->varattno);
> +   if (AttributeNumberIsValid(varattno))
> +   relid = rte->relid;
> +   else
> +   varattno = var->varattno;
>
> It's not possible for varattno to be invalid here, because the query on
> inheritance parent only allows to select parent's columns, so we'd error
> out before getting here if a column not present in the parent were selected.

Makes sense to me.
>
>
> Anyway, why don't we just use the child table's AppendRelInfo to get the
> parent's version of varattno instead of creating a new function?  It can
> be done as shown in the attached revised version of the portion of the
> patch changing selfuncs.c.  Please take a look.

+1

>
> [1]
> https://www.postgresql.org/message-id/f06a398a-40a9-efb4-fab9-784400fecf13%40lab.ntt.co.jp



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: FETCH FIRST clause WITH TIES option

2018-11-01 Thread Surafel Temesgen
hi,

The attached patch includes all the comments given by Tomas, and I checked the
SQL standard regarding LIMIT and this feature.

It does not say anything about it, but I think it is a good idea to support it
for LIMIT too, and I will add that if we have consensus on it.
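
For illustration, with the syntax added by this patch a query would look
roughly like this (table and column names are just examples):

SELECT * FROM items
ORDER BY price
FETCH FIRST 2 ROWS WITH TIES;
-- returns the first two rows plus any further rows that tie with the
-- second one on price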

regards

surafel
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 4db8142afa..b649b1ca7a 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } [ ONLY | WITH TIES ] ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
@@ -1397,7 +1397,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } [ ONLY | WITH TIES ]
 
 In this syntax, the start
 or count value is required by
@@ -1408,6 +1408,9 @@ FETCH { FIRST | NEXT } [ count ] {
 If count is
 omitted in a FETCH clause, it defaults to 1.
 ROW
+WITH TIES option is used to return two or more rows
+that tie for last place in the limit results set according to ORDER BY
+clause (ORDER BY clause must be specified in this case).
 and ROWS as well as FIRST
 and NEXT are noise words that don't influence
 the effects of these clauses.
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index bb28cf7d1d..dbc060e9a9 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -41,6 +41,7 @@ static TupleTableSlot *			/* return: a tuple or NULL */
 ExecLimit(PlanState *pstate)
 {
 	LimitState *node = castNode(LimitState, pstate);
+	ExprContext *econtext = node->ps.ps_ExprContext;
 	ScanDirection direction;
 	TupleTableSlot *slot;
 	PlanState  *outerPlan;
@@ -131,7 +132,8 @@ ExecLimit(PlanState *pstate)
  * the state machine state to record having done so.
  */
 if (!node->noCount &&
-	node->position - node->offset >= node->count)
+	node->position - node->offset >= node->count &&
+	node->limitOption == WITH_ONLY)
 {
 	node->lstate = LIMIT_WINDOWEND;
 
@@ -145,17 +147,64 @@ ExecLimit(PlanState *pstate)
 	return NULL;
 }
 
-/*
- * Get next tuple from subplan, if any.
- */
-slot = ExecProcNode(outerPlan);
-if (TupIsNull(slot))
+else if (!node->noCount &&
+		 node->position - node->offset >= node->count &&
+		 node->limitOption == WITH_TIES)
 {
-	node->lstate = LIMIT_SUBPLANEOF;
-	return NULL;
+	/*
+	 * Get next tuple from subplan, if any.
+	 */
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+	/*
+	 * Test if the new tuple and the last tuple match.
+	 * If so we return the tuple.
+	 */
+	econtext->ecxt_innertuple = slot;
+	econtext->ecxt_outertuple = node->last_slot;
+	if (ExecQualAndReset(node->eqfunction, econtext))
+	{
+		ExecCopySlot(node->last_slot, slot);
+		node->subSlot = slot;
+		node->position++;
+	}
+	else
+	{
+		node->lstate = LIMIT_WINDOWEND;
+
+		/*
+		* If we know we won't need to back up, we can release
+		* resources at this point.
+		*/
+		if (!(node->ps.state->es_top_eflags & EXEC_FLAG_BACKWARD))
+			(void) ExecShutdownNode(outerPlan);
+
+		return NULL;
+	}
+
+}
+else
+{
+	/*
+	 * Get next tuple from subplan, if any.
+	 */
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+	if (node->limitOption == WITH_TIES)
+	{
+		ExecCopySlot(node->last_slot, slot);
+	}
+	node->subSlot = slot;
+	node->position++;
 }
-node->subSlot = slot;
-node->position++;
 			}
 			else
 			{
@@ -311,7 +360,8 @@ recompute_limits(LimitState *node)
 	 * must update the child node anyway, in case this is a rescan and the
 	 * previous time we got a different result.
 	 */
-	ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node));
+	if(node->limitOption == WITH_ONLY)
+		ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node));
 }
 
 /*
@@ -374,6 +424,7 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
 		   (PlanState *) limitstate);
 	limitstate->limitCount = ExecInitExpr((Expr *) node->limitCount,
 		  (PlanState *) limitstate);
+	limitstate->limitOption = node->limitOption;
 
 	/*
 	 * Initialize result slot and type. (XXX not actually used, but upper
@@ -387,6 

Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-01 Thread Amit Kapila
On Sun, Oct 21, 2018 at 7:12 PM Michael Paquier  wrote:
>
> Hi all,
>
> This is a follow-up of the following thread:
> https://www.postgresql.org/message-id/20181012010411.re53cwcistcpi...@alap3.anarazel.de
>
> In a nutshell, b34e84f1 has added TAP tests for pg_verify_checksums, and
> the buildfarm has immediately complained about files generated by
> EXEC_BACKEND missing, causing pg_verify_checksums to fail for such
> builds.  An extra issue has been noticed by Andres in the tool: it
> misses to check for temporary files, hence files like
> base/pgsql_tmp/pgsql_tmp$PID.NN.sharedfileset/1.1 can also cause the
> tool to fail.  After a crash, those files would not be removed, and
> stopping the instance would still let them around.
>
> pg_verify_checksums used first a blacklist to decide if files have
> checksums or not.  So with this approach all files should have checksums
> except files like pg_control, pg_filenode.map, PG_VERSION, etc.
>
> d55241a has added a first fix to silence the buildfarm for EXEC_BACKEND
> builds.  After discussion, Andres has pointed out that some extensions
> may want to drop files within global/ or base/ as part of their logic.
> cstore_fdw was mentioned but if you look at their README that's not the
> case.  However, I think that Andres argument is pretty good as with
> pluggable storage we should allow extensions to put custom files for
> different tablespaces.  So this commit has changed the logic of
> pg_verify_checksums to use a whitelist, which assumes that only normal
> relation files have checksums:
> <digits>
> <digits>.<segment>
> <digits>_<forkname>
> <digits>_<forkname>.<segment>
> After more discussion on the thread mentioned above, Stephen has pointed
> out that base backups use the same blacklist logic when verifying
> checksums.  This has the same problem with EXEC_BACKEND-specific files,
> resulting in spurious warnings (that's an example!) which are confusing
> for the user:
> $ pg_basebackup -D popo
> WARNING:  cannot verify checksum in file "./global/config_exec_params",
> block 0: read buffer size 5 and page size 8192 differ
>
> Folks on this thread agreed that both pg_verify_checksums and checksum
> verification for base backups should use the same logic.  It seems to me
> that using a whitelist-based approach for both is easier to maintain as
> the patterns of files supporting checksums is more limited than files
> which do not support checksums.  So this way we allow extensions
> bundling custom files to still work with pg_verify_checksums and
> checksum verification in base backups.
>

This sounds like a good argument for having a whitelist approach, but
is it really a big problem if a user gets a warning for files that the
utility is not able to verify checksums for?  I think in some sense
this message can be useful to the user, as it lets him know which
files the utility has not checked for some form of corruption.  I
guess one can say that users might not be interested in this
information, in which case such a check could be optional, as you seem
to be suggesting in the following paragraph.

> Something else which has been discussed on this thread is that we should
> have a dedicated API to decide if a file has checksums or not, and if it
> should be skipped in both cases.  That's work only for HEAD though, so
> we need to do something for HEAD and v11, which is simple.
>

+1.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Connection slots reserved for replication

2018-11-01 Thread Alexander Kukushkin
Hi,

Attached is the patch rebased onto the current HEAD; I have also created a
commitfest entry.
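
For anyone who wants to try it, a minimal configuration sketch assuming the
GUC proposed by the attached patch (the values are illustrative only, and
both parameters require a server restart):

ALTER SYSTEM SET max_wal_senders = 10;
ALTER SYSTEM SET replication_reserved_connections = 3;
-- replication_reserved_connections should not exceed max_wal_senders
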
On Fri, 21 Sep 2018 at 13:43, Alexander Kukushkin  wrote:
>
> Hi,
>
> On 20 September 2018 at 08:18, Kyotaro HORIGUCHI
>  wrote:
>
> >
> > Instead, we can literally "reserve" connection slots for the
> > specific use by providing ProcGlobal->walsenderFreeProcs. The
> > slots are never stolen for other usage. Additional walsenders are
> > allowed, once all the reserved slots are filled, to grab an
> > available "normal" slot; this works the same as the current
> > behavior when walsender_reserved_connections = 0.
> >
> > What do you think about this?
>
> Sounds reasonable, please see the updated patch.
>
> Regards,
> --
> Alexander Kukushkin
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7554cba3f9..d9ddcb22b7 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3060,6 +3060,27 @@ include_dir 'conf.d'

   
 
+ 
+  replication_reserved_connections
+  (integer)
+  
+   replication_reserved_connections configuration parameter
+  
+  
+  
+   
+Determines the number of connection slots that
+are reserved for replication connections.
+   
+
+   
+The default value is zero. The value should not exceed max_wal_senders.
+This parameter can only be set at server start.
+   
+  
+ 
+
   
max_replication_slots (integer)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 2683385ca6..3b6c636077 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -122,6 +122,8 @@ int			max_wal_senders = 0;	/* the maximum number of concurrent
 int			wal_sender_timeout = 60 * 1000; /* maximum time to send one WAL
 			 * data message */
 bool		log_replication_commands = false;
+int			replication_reserved_connections = 0; /* the number of connection slots
+   * reserved for replication connections */
 
 /*
  * State for WalSndWakeupRequest
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 6f9aaa52fa..2d04a8204a 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -43,6 +43,7 @@
 #include "postmaster/autovacuum.h"
 #include "replication/slot.h"
 #include "replication/syncrep.h"
+#include "replication/walsender.h"
 #include "storage/condition_variable.h"
 #include "storage/standby.h"
 #include "storage/ipc.h"
@@ -180,6 +181,7 @@ InitProcGlobal(void)
 	ProcGlobal->freeProcs = NULL;
 	ProcGlobal->autovacFreeProcs = NULL;
 	ProcGlobal->bgworkerFreeProcs = NULL;
+	ProcGlobal->walsenderFreeProcs = NULL;
 	ProcGlobal->startupProc = NULL;
 	ProcGlobal->startupProcPid = 0;
 	ProcGlobal->startupBufferPinWaitBufId = -1;
@@ -253,13 +255,20 @@ InitProcGlobal(void)
 			ProcGlobal->autovacFreeProcs = &procs[i];
 			procs[i].procgloballist = &ProcGlobal->autovacFreeProcs;
 		}
-		else if (i < MaxBackends)
+		else if (i < MaxBackends - replication_reserved_connections)
 		{
 			/* PGPROC for bgworker, add to bgworkerFreeProcs list */
 			procs[i].links.next = (SHM_QUEUE *) ProcGlobal->bgworkerFreeProcs;
 			ProcGlobal->bgworkerFreeProcs = &procs[i];
 			procs[i].procgloballist = &ProcGlobal->bgworkerFreeProcs;
 		}
+		else if (i < MaxBackends)
+		{
+			/* PGPROC for walsender, add to walsenderFreeProcs list */
+			procs[i].links.next = (SHM_QUEUE *) ProcGlobal->walsenderFreeProcs;
+			ProcGlobal->walsenderFreeProcs = &procs[i];
+			procs[i].procgloballist = &ProcGlobal->walsenderFreeProcs;
+		}
 
 		/* Initialize myProcLocks[] shared memory queues. */
 		for (j = 0; j < NUM_LOCK_PARTITIONS; j++)
@@ -304,6 +313,8 @@ InitProcess(void)
 		procgloballist = &ProcGlobal->autovacFreeProcs;
 	else if (IsBackgroundWorker)
 		procgloballist = &ProcGlobal->bgworkerFreeProcs;
+	else if (am_walsender)
+		procgloballist = &ProcGlobal->walsenderFreeProcs;
 	else
 		procgloballist = &ProcGlobal->freeProcs;
 
@@ -318,6 +329,14 @@ InitProcess(void)
 
 	set_spins_per_delay(ProcGlobal->spins_per_delay);
 
+	/*
+	* Try to use ProcGlobal->freeProcs as a fallback when
+	* all reserved walsender slots are already busy.
+	*/
+	if (am_walsender && replication_reserved_connections < max_wal_senders
+			&& *procgloballist == NULL)
+		procgloballist = &ProcGlobal->freeProcs;
+
 	MyProc = *procgloballist;
 
 	if (MyProc != NULL)
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 4f1d2a0d28..bb90f53b1e 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -527,7 +527,7 @@ InitializeMaxBackends(void)
 
 	/* the extra unit accounts for the autovacuum launcher */
 	MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
-		max_worker_processes;
+		max_worker_processes + replication_reserved_connections;
 
 	/* internal error because the values were all checked previously */
 	if (MaxBackends > MAX_BACKENDS)
@@ -812,8 +812,8 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 
 	/*
 	 * The last 

Re: FETCH FIRST clause WITH TIES option

2018-11-01 Thread Tomas Vondra

On 10/31/2018 06:17 PM, Robert Haas wrote:

On Mon, Oct 29, 2018 at 12:48 PM Andrew Gierth
 wrote:

Then FETCH FIRST N WITH TIES becomes "stop when the expression
   rank() over (order by ...) <= N
is no longer true" (where the ... is the existing top level order by)


Wouldn't that be wicked expensive compared to something hard-coded
that does the same thing?



Not sure, but that was one of my concerns too, particularly for the
simple FETCH FIRST N WITH TIES case. But I think Andrew has a point that
it would make it much easier to implement the PERCENT case.
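
For reference, the window-function formulation being discussed boils down to
something like this (a sketch of the semantics only, assuming a hypothetical
table scores(name text, score int); the point of Andrew's proposal is that
the executor could stop as soon as the rank predicate becomes false rather
than evaluating it for every row):

-- FETCH FIRST 2 ROWS WITH TIES expressed through rank()
SELECT name, score
FROM (SELECT name, score,
             rank() OVER (ORDER BY score DESC) AS rnk
      FROM scores) AS ranked
WHERE rnk <= 2
ORDER BY score DESC;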


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_promote not marked as parallel-restricted in pg_proc.dat

2018-11-01 Thread Amit Kapila
On Thu, Nov 1, 2018 at 6:14 AM Michael Paquier  wrote:
>
> On Wed, Oct 31, 2018 at 01:09:53PM -0400, Robert Haas wrote:
> > There's no rule whatsoever that a parallel worker can't write to the
> > disk.  pg_start_backup and pg_stop_backup have to be
> > parallel-restricted because, when used in non-exclusive mode, they
> > establish backend-local state that wouldn't be synchronized with the
> > state in the workers -- namely the information that a non-exclusive
> > backup is in progress.
>
> Okay, but likely we would not want to signal the postmaster
> unnecessarily, no?

Right, but I think the question here is whether it is safe to execute
this function in parallel workers.  I don't see any meaningful use case
where anyone would want to run this via parallel workers even if it is
safe to do so, but I think that is not how we decide the parallel-safety
property of a function.
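
For what it's worth, the marking under discussion is simply the pg_proc
entry, which can be inspected directly (assuming a build that already has
pg_promote):

SELECT proname, provolatile, proparallel
FROM pg_proc
WHERE proname = 'pg_promote';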

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] generated columns

2018-11-01 Thread John Naylor
On 10/30/18, Peter Eisentraut  wrote:
> 3. Radical alternative: Collapse everything into one new column.  We
> could combine atthasdef and attgenerated and even attidentity into a new
> column.  (Only one of the three can be the case.)  This would give
> client code a clean break, which may or may not be good.  The
> implementation would be uglier than #1 but probably cleaner than #2.  We
> could also get 4 bytes back per pg_attribute row.

Thinking about the distinction between 'stored' and 'virtual', I
immediately thought of atthasmissing. In a way it indicates that there
is a virtual default value. It seems the level of materialization is
an orthogonal concept to the kind of value we have. What if
attgenerated had

d = normal default value
i = identity default
a = identity always
c = generated column

and in addition there was an attmaterialized column with

s = stored
v = virtual

In this scheme,
-Normal attribute: '\0' + 's'
-Default value: 'd' + 's'
-Fast default: 'd' + 'v' until the table is rewritten
-Identity column: 'i'/'a' + 's'
-Generated column: 'c' + 's'/'v'

This way, we'd still end up with 1 fewer column (2 new ones minus
atthasdef, attidentity, and atthasmissing).
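
To make the mapping concrete, here is how the scheme might classify the
columns of an ordinary table (a sketch only: the generated-column syntax is
the one proposed by this patch, and the letter pairs are the proposed
attgenerated/attmaterialized values, not anything committed):

CREATE TABLE t (
    a int,                                    -- '\0' + 's' (normal attribute)
    b int DEFAULT 0,                          -- 'd'  + 's' (default value)
    c int GENERATED BY DEFAULT AS IDENTITY,   -- 'i'  + 's' (identity default)
    d int GENERATED ALWAYS AS IDENTITY,       -- 'a'  + 's' (identity always)
    e int GENERATED ALWAYS AS (a + b) STORED  -- 'c'  + 's' (stored generated)
);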

A bit crazier, what if "d = dropped" was another allowed value in
attmaterialized -- we could then get rid of attisdropped as well. That
has obvious disadvantages, but the broader idea is that this design
may have use cases we haven't thought of yet.

Thoughts?

-John Naylor



Re: Super PathKeys (Allowing sort order through precision loss functions)

2018-11-01 Thread Tomas Vondra

On 11/01/2018 02:40 AM, David Rowley wrote:

On 1 November 2018 at 12:24, Andres Freund  wrote:

FWIW, I kind of wonder whether, if we built proper infrastructure to
allow making such inferences from function calls, it could also be made
to support the transformation of LIKEs into indexable <= >= clauses.


Perhaps, but I doubt it would be the same function to do both.  Surely
I need something that accepts details about the function call as
arguments and returns an Expr * of the argument that we can derive the
order of the return value from, or NULL.  I think the transformation
you need might be more along the lines of returning a List * of quals
that can substitute an OpExpr containing a function call. I'm not that
clear on how we'd know the new quals were better than the existing
ones.  For example, extract('year', dt) = 2018 could be transformed to
dt >= '2018-01-01' AND dt < '2019-01-01', but how would we know that
was better?  There might be an index on extract('year', dt).



IMHO there is only a handful of "useful" transformations of this kind,
depending on the operator class of an index. So when building index
paths and checking which quals may be used as index conditions, we'd
try transforming incompatible quals and leave the rest up to the
existing create_index_path machinery (which already makes the decision
about which quals to use for the index search etc.).
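
To illustrate the kind of rewrite being discussed, assuming a hypothetical
table events(dt timestamptz) (a sketch only; as David notes, which form wins
depends on the indexes that actually exist):

-- original qual, matching an expression index on date_part('year', dt)
SELECT * FROM events WHERE date_part('year', dt) = 2018;

-- candidate transformed quals, usable by a plain btree index on dt
SELECT * FROM events WHERE dt >= '2018-01-01' AND dt < '2019-01-01';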


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Speeding up INSERTs and UPDATEs to partitioned tables

2018-11-01 Thread Amit Langote
On 2018/11/01 10:30, David Rowley wrote:
> It's great to know the patch is now so perfect that we've only the
> macro naming left to debate ;-)

I looked over v12 again and noticed a couple minor issues.

+ *  table then we store the index into parenting
+ *  PartitionTupleRouting 'partition_dispatch_info' array.  An

s/PartitionTupleRouting/PartitionTupleRouting's/g

Also, I got a bit concerned about "parenting".  Does it mean something
like "enclosing", because the PartitionDispatch is a member of
PartitionTupleRouting?  I got concerned because using "parent" like this
may be confusing as this is the partitioning code we're talking about,
where "parent" is generally used to mean "parent" table.

+ * the partitioned table that's the target of the command.  If we must
+ * route tuple via some sub-partitioned table, then the PartitionDispatch
+ * for those is only built the first time it's required.

... via some sub-partitioned table"s"

Or perhaps rewrite a bit as:

If we must route the tuple via some sub-partitioned table, then its
PartitionDispatch is built the first time it's required.


The macro naming discussion got me thinking today about the macro itself.
It encapsulates access to the various PartitionTupleRouting arrays
containing the maps, but maybe we've got the interface of tuple routing a
bit (maybe a lot given this thread!) wrong to begin with.  Instead of
ExecFindPartition returning indexes into various PartitionTupleRouting
arrays and its callers then using those indexes to fetch various objects
from those arrays, why doesn't it return those objects itself?  Although
we made sure that the callers don't need to worry about the meaning of
these indexes changing with this patch, it still seems a bit odd for them
to have to go back to those arrays to get various objects.

How about we change ExecFindPartition's interface so that it returns the
ResultRelInfo, the two maps, and the partition slot?  So, the arrays
simply become a cache for ExecFindPartition et al and are no longer
accessed outside execPartition.c.  Although that makes the interface of
ExecFindPartition longer, I think it reduces overall complexity.

I've implemented that in the attached patch
1-revise-ExecFindPartition-interface.patch.

Also, since all members of PartitionTupleRouting are only accessed within
execPartition.c save root_tuple_slot, we can move it to execPartition.c to
make its internals private, after doing something about root_tuple_slot.
Looking at the code related to root_tuple_slot, it seems the field really
belongs in ModifyTableState, because it has nothing to do with routing.
Attached 2-make-PartitionTupleRouting-private.patch does that.

These patches 1 and 2 apply on top of v12-0001.. patch.

Thanks,
Amit

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 0b0696e61e..b45972682f 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2316,6 +2316,7 @@ CopyFrom(CopyState cstate)
bool   *nulls;
ResultRelInfo *resultRelInfo;
ResultRelInfo *target_resultRelInfo;
+   ResultRelInfo *prev_part_rel = NULL;
EState *estate = CreateExecutorState(); /* for ExecConstraints() */
ModifyTableState *mtstate;
ExprContext *econtext;
@@ -2331,7 +2332,6 @@ CopyFrom(CopyState cstate)
CopyInsertMethod insertMethod;
uint64  processed = 0;
int nBufferedTuples = 0;
-   int prev_leaf_part_index = -1;
boolhas_before_insert_row_trig;
boolhas_instead_insert_row_trig;
boolleafpart_use_multi_insert = false;
@@ -2685,19 +2685,24 @@ CopyFrom(CopyState cstate)
/* Determine the partition to heap_insert the tuple into */
if (proute)
{
-   int leaf_part_index;
-   TupleConversionMap *map;
+   TupleTableSlot *partition_slot = NULL;
+   TupleConversionMap *child_to_parent_map,
+  *parent_to_child_map;
 
/*
 * Attempt to find a partition suitable for this tuple.
 * ExecFindPartition() will raise an error if none can be found.
+* This replaces the original target ResultRelInfo with
+* partition's.
 */
-			leaf_part_index = ExecFindPartition(mtstate, target_resultRelInfo,
-												proute, slot, estate);
-			Assert(leaf_part_index >= 0 &&
-				   leaf_part_index < proute->num_partitions);
+			resultRelInfo = ExecFindPartition(mtstate, target_resultRelInfo,

Re: Is there way to detect uncommitted 'new table' in pg_class?

2018-11-01 Thread Hubert Zhang
Thanks

On Thu, Nov 1, 2018 at 8:38 AM Michael Paquier  wrote:

> On Wed, Oct 31, 2018 at 01:30:52PM -0400, Robert Haas wrote:
> > In theory, at least, you could write C code to scan the catalog tables
> > with SnapshotDirty to find the catalog entries, but I don't think that
> > helps a whole lot.  You couldn't necessarily rely on those catalog
> > entries to be in a consistent state, and even if they were, they might
> > depend on uncommitted types or functions or similar whose definitions
> > your backend can't see.  Moreover, the creating backend will have an
> > AccessExclusiveLock on the table -- if you write C code to bypass that
> > and read the data anyway, then you will probably destabilize the
> > entire system for complicated reasons that I don't feel like
> > explaining right now.
>
> One take here is that we cannot give any guarantee that a single DDL
> will create only one consistent version of the tuple added in system
> catalogs.  In those cases a new version is made visible by using
> CommandCounterIncrement() so that the follow-up processing can see it.
>
> > You should try very hard to find some way of solving this problem that
> > doesn't require reading data from a table that hasn't been committed
> > yet, because you are almost certainly not going to be able to make
> > that work reliably even if you are willing to write code in C.
>
> +1.
> --
> Michael
>


-- 
Thanks

Hubert Zhang

