Re: Re: GSOC 2018 Ideas

2018-03-23 Thread Amit Kapila
On Sat, Mar 24, 2018 at 8:39 AM, Vimal Rathod  wrote:
>
> Thank you for replying to my message so quickly. As you suggested, I
> looked into the URL and found that I am interested in doing the
> TOAST'ing in Slices project. Can you give me some more details about this
> project beyond what is in the URL you gave, such as:
> More details about the TOAST table?
>

You can read about it in the documentation [1].

> Where do I find the code corresponding to the project?

You can look at the following two files:
src/backend/access/heap/tuptoaster.c
src/backend/catalog/toasting.c
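
As a very rough illustration (just a sketch, not part of the project; the
function name below is made up), C code can already fetch just a slice of a
toasted value via the PG_DETOAST_DATUM_SLICE() macro:

#include "postgres.h"
#include "fmgr.h"

PG_FUNCTION_INFO_V1(text_prefix10);

/*
 * Hypothetical example: return only the first 10 bytes of a possibly
 * toasted text value.  PG_DETOAST_DATUM_SLICE fetches just the chunks
 * needed for the requested slice rather than detoasting the whole value.
 */
Datum
text_prefix10(PG_FUNCTION_ARGS)
{
	text	   *slice = (text *) PG_DETOAST_DATUM_SLICE(PG_GETARG_DATUM(0), 0, 10);

	PG_RETURN_TEXT_P(slice);
}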

> What am I supposed to submit in proposal ?
>

I think you can first submit the design you want to follow, and once
you get some confirmation from other community members, you can start
working on the code.

[1] - https://www.postgresql.org/docs/devel/static/storage-toast.html


Note - Please don't top post; try to reply inline.  That is what
everyone follows here, and it is much more convenient than top
posting.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] path toward faster partition pruning

2018-03-23 Thread David Rowley
On 24 March 2018 at 01:15, Amit Langote  wrote:
> In the updated patch (that is, after incorporating your changes), I have
> moved this partsupfunc switching to the caller of partkey_datum_from_expr
> instead of doing it there.  New patch also checks that returned function
> OID is valid, which if not we don't use the expression's value for pruning.

Thanks for accepting those changes.

> So now, we statically allocate a partsupfunc array on every invocation of
> perform_pruning_base_step() or of get_partitions_excluded_by_ne_datums().
> Considering run-time pruning, we may have to find some other place to
> cache that.

Hmm, yeah, it's not perfect, but I don't have any better ideas for now,
apart from the fact that this could probably be done when creating the
steps rather than when executing them. That would save having to look up
the correct function OID during execution, and avoid creating step
values that we simply can't compare to the partition key.

I've done this in the attached patch against v39.

I also renamed argvalues to argexprs, since they're not values.  The
PartClauseInfo could probably do with the same change too, but I
didn't touch it.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v39_drowley_delta1.patch
Description: Binary data


Re: [HACKERS] why not parallel seq scan for slow functions

2018-03-23 Thread Robert Haas
On Fri, Mar 23, 2018 at 12:12 AM, Amit Kapila  wrote:
> Yeah, sometimes that kind of stuff changes performance characteristics,
> but I think what is going on here is that create_projection_plan is
> causing the lower node to build a physical tlist, which takes some
> additional time.  I have tried the below change on top of the patch series
> and it brings back the performance for me.

I tried another approach inspired by this, which is to altogether skip
building the child scan tlist if it will just be replaced.  See 0006.
In testing here, that seems to be a bit better than your proposal, but
I wonder what your results will be.

> Before returning subplan, don't we need to copy the cost estimates
> from best_path as is done in the same function after few lines.

The new 0006 takes care of this, too.  Really, the new 0006 should be
merged into 0001, but I kept it separate for now.

So, rebased:

0001 - More or less as before.

0002 - More or less as before.

0003 - Rewritten in the wake of partitionwise aggregate, as the
tlist_rel must be partitioned in order for partitionwise aggregate to
work.  Quite pleasingly, this eliminates a bunch of Result nodes from
the partitionwise join test results.  Overall, I find this quite a bit
cleaner than the present code (leaving performance aside for the
moment).  In the process of writing this I realized that the
partitionwise aggregate code doesn't look like it handles SRFs
properly, so if this doesn't get committed we'll have to fix that
problem some other way.

0004 - A little different than before as a result of the extensive
changes in 0003.

0005 - Also different, and revealing another defect in partitionwise
aggregate, as noted in the commit message.

0006 - Introduce CP_IGNORE_TLIST; optimization of 0001.

0007 - Use NULL relids set for the toplevel tlist upper rel.  This
seems to be slightly faster than the other way.  This is an
optimization of 0003.

It looks in my testing like this still underperforms master on your
test case.  Do you get the same result?  Any other ideas?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


0007-Use-NULL-relids-set.patch
Description: Binary data


0006-CP_IGNORE_TLIST.patch
Description: Binary data


0005-Remove-explicit-path-construction-logic-in-create_or.patch
Description: Binary data


0004-Postpone-generate_gather_paths-for-topmost-scan-join.patch
Description: Binary data


0003-Add-new-upper-rel-to-represent-projecting-toplevel-s.patch
Description: Binary data


0002-Adjust-use_physical_tlist-to-work-on-upper-rels.patch
Description: Binary data


0001-Teach-create_projection_plan-to-omit-projection-wher.patch
Description: Binary data


Fwd: Re: GSOC 2018 Ideas

2018-03-23 Thread Vimal Rathod
-- Forwarded message --
From: "Vimal Rathod" 
Date: 23-Mar-2018 9:51 PM
Subject: Re: GSOC 2018 Ideas
To: "Jaime Soler" 
Cc:

Thank you for replying to my message so quickly. As you suggested,
I looked into the URL and found that I am interested in doing the
TOAST'ing in Slices project. Can you give me some more details about this
project beyond what is in the URL you gave, such as:
More details about the TOAST table?
Where do I find the code corresponding to the project?
What am I supposed to submit in the proposal?

Thank you!

On 23 March 2018 at 21:03, Jaime Soler  wrote:

> you may have a look at https://wiki.postgresql.org/wiki/GSoC_2018 and
> tell us if you like some of them
>
>
> 2018-03-23 15:32 GMT+01:00 Vimal Rathod :
>
>> Hey there,
>>
>> I am Vimal, currently pursuing my undergraduate degree in computer science.
>> I would love to work with you via the GSoC program this summer.
>>
>> I wanted to know which projects I can work on. I have good knowledge of
>> C, Java, Python, SQL and other web development languages. Can anyone suggest
>> a good project I can contribute to? It would be of great help.
>>
>> Thanks a lot in advance. Hope you reply soon.
>>
>>
>


Re: public schema default ACL

2018-03-23 Thread Noah Misch
On Thu, Mar 08, 2018 at 11:14:59PM -0800, Noah Misch wrote:
> On Thu, Mar 08, 2018 at 02:00:23PM -0500, Robert Haas wrote:
> > I also wonder why we're all convinced that this urgently needs to be
> > changed.  I agree that the default configuration we ship is not the
> > most secure configuration that we could ship.  However, I think it's a
> > big step from saying that the configuration is not as secure as it
> > could be to saying that we absolutely must change it for v11.
> 
> Did someone say that?  I, for one, wanted to change it but didn't intend to
> present it as a "must change".

In light of the mixed reception, I am withdrawing this proposal.



Re: WIP: Covering + unique indexes.

2018-03-23 Thread Peter Geoghegan
On Thu, Mar 22, 2018 at 8:23 AM, Alexander Korotkov
 wrote:
>> * There is minor formatting issue in this part of code.  Some spaces need
>> to be replaced with tabs.
>> +IndexTuple
>> +index_truncate_tuple(Relation idxrel, IndexTuple olditup)
>> +{
>> + TupleDesc   itupdesc =
>> CreateTupleDescCopyConstr(RelationGetDescr(idxrel));
>> + Datum   values[INDEX_MAX_KEYS];
>> + boolisnull[INDEX_MAX_KEYS];
>> + IndexTuple newitup;

The last time I looked at this patch, in April 2017, I made the point
that we should add something like an "nattributes" argument to
index_truncate_tuple() [1], rather than always using
IndexRelationGetNumberOfKeyAttributes() within index_truncate_tuple().
I can see that that change hasn't been made since that time.
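
To be concrete, the call-site change I have in mind is roughly the
following (a sketch only; the exact argument list is of course up for
discussion):

	/* As in the current patch: the function derives the key column count itself */
	newitup = index_truncate_tuple(idxrel, olditup);

	/*
	 * Proposed shape (sketch): the caller states how many attributes to
	 * keep, so the function itself no longer has to consult catalog
	 * metadata.
	 */
	newitup = index_truncate_tuple(idxrel, olditup,
								   IndexRelationGetNumberOfKeyAttributes(idxrel));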

With that approach, we can avoid relying on catalog metadata to the
same degree, which was a specific concern that Tom had around that
time. It's easy to do something with t_tid's offset, which is unused
in internal page IndexTuples. We do very similar things in GIN, where
alternative use of an IndexTuple's t_tid supports all kinds of
enhancements, some of which were not originally anticipated. Alexander
surely knows more about this than I do, since he wrote that code.
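
For example (sketch only; the surrounding variable names are illustrative,
not from the patch), the key column count could be stashed in, and read
back from, the otherwise-unused t_tid offset of an internal-page tuple:

	/* When truncating: record how many key attributes were kept */
	ItemPointerSetOffsetNumber(&newitup->t_tid, (OffsetNumber) nattributes);

	/* When reading an internal-page tuple: recover that count without
	 * consulting the catalogs */
	natts = ItemPointerGetOffsetNumber(&itup->t_tid);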

Having this index_truncate_tuple() "nattributes" argument, and storing
the number of attributes directly, improves quite a lot of things:

* It makes diagnosing issues in the field quite a bit easier. Tools
like pg_filedump can do the right thing (Tom mentioned pg_filedump and
amcheck specifically). The nbtree IndexTuple format should not need to
be interpreted in a context-sensitive way, if we can avoid it.

* It lets you use index_truncate_tuple() for regular suffix truncation
in the future. These INCLUDE IndexTuples are really just a special
case of suffix truncation. At least, they should be, because otherwise
an eventual suffix truncation feature is going to be incompatible with
the INCLUDE tuple format. *Not* doing this makes suffix truncation
harder. Suffix truncation is a classic technique, first described by
Bayer in 1977, and we are very probably going to add it someday.

* Once you can tell a truncated IndexTuple from a non-truncated one
with little or no context, you can add defensive assertions in various
places where they're helpful. Similarly, amcheck can use and expect
this as a cross-check against IndexRelationGetNumberOfKeyAttributes().
This will increase confidence in the design, both initially and over
time.

I must say that I am disappointed that nothing has happened here,
especially because this really wasn't very much additional work, and
has essentially no downside. I can see that it doesn't work that way
in the Postgres Pro fork [2], and diverging from that may
inconvenience Postgres Pro, but that's a downside of forking. I don't
think that the community should have to absorb that cost.

> +Notes to Operator Class Implementors
> +
>
> Besides I really appreciate it, it seems to be unrelated to the covering
> indexes.
> I'd like this to be extracted into separate patch and be committed
> separately.

Commit 3785f7ee, from last month, moved the original "Notes to
Operator Class Implementors" section to the SGML docs. It looks like
that README section was accidentally reintroduced during rebasing. The
new information ("Included attributes in B-tree indexes") should be
moved over to the new section of the user docs -- the section that
3785f7ee added.

[1] 
https://postgr.es/m/cah2-wzm9y59h2m6izjm4fpdup5r4bsrvzgbn2gtrco1j4nz...@mail.gmail.com
[2] 
https://github.com/postgrespro/postgrespro/blob/PGPRO9_5/src/backend/access/common/indextuple.c#L451
-- 
Peter Geoghegan



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-23 Thread Alvaro Herrera
I made a bunch of further edits and I think this v10 is ready to push.
Before doing so I'll give it a final look, particularly because of the
new elog(ERROR) I added.  Post-commit review is of course always
appreciated.

The most notable change came about because I noticed that if you mention
an intermediate partition level in the INSERT command, and the index is on
the top level, arbiter index selection fails to find the correct index
because it walks all the way to the top instead of stopping in the
middle, as it should (the command was still working because it ended up
with an empty arbiter index list).

To fix this, I had to completely rework the "get partition parent root"
stuff into "get list of ancestors of this partition".

Because of this, I added a new check that the partition's arbiter index
list is the same length as the parent's; if not, throw an error.  I couldn't
get it to fire (so it's just an elog, not an ereport), but maybe I just
didn't try weird enough scenarios.

Other changes:

* I added a copyObject() call for nodes we're operating upon.  Maybe
  this is unnecessary but the comments claimed "we're working on a copy"
  and I couldn't find any place where we were actually making one.
  Anyway it seems sane to make a copy, because we're scribbling on those
  nodes ... I hope I didn't introduce any serious memory leaks.

* I made the new OnConflictSetState thing into a proper node.  Like
  ResultRelInfo, it doesn't have any niceties like nodeToString support,
  but it seems saner this way (palloc -> makeNode).  I reworked the
  formatting of that struct definition too, and renamed members.

* I removed an assertion block at the bottom of adjust_partition_tlist.
  It seemed quite pointless, since it was just about checking that the
  resno values were sorted, but by construction we already know that
  they are indeed sorted ...

* General code style improvements, comment rewording, etc.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From d87cd6154fe026f7641e98c8a43683b208a61f5b Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 22 Mar 2018 19:12:57 -0300
Subject: [PATCH v10] on conflict

---
 doc/src/sgml/ddl.sgml |  15 --
 doc/src/sgml/ref/insert.sgml  |   8 +
 src/backend/catalog/partition.c   |  88 --
 src/backend/executor/execMain.c   |   4 +
 src/backend/executor/execPartition.c  | 229 --
 src/backend/executor/nodeModifyTable.c|  74 +++--
 src/backend/parser/analyze.c  |   7 -
 src/include/catalog/partition.h   |   1 +
 src/include/nodes/execnodes.h |  22 ++-
 src/include/nodes/nodes.h |   1 +
 src/test/regress/expected/insert_conflict.out | 108 ++--
 src/test/regress/expected/triggers.out|  33 
 src/test/regress/sql/insert_conflict.sql  |  95 +--
 src/test/regress/sql/triggers.sql |  33 
 14 files changed, 635 insertions(+), 83 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 3a54ba9d5a..8805b88d82 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3324,21 +3324,6 @@ ALTER TABLE measurement ATTACH PARTITION 
measurement_y2008m02
 
  
   
-   Using the ON CONFLICT clause with partitioned tables
-   will cause an error if the conflict target is specified (see
-for more details on how the clause
-   works).  Therefore, it is not possible to specify
-   DO UPDATE as the alternative action, because
-   specifying the conflict target is mandatory in that case.  On the other
-   hand, specifying DO NOTHING as the alternative action
-   works fine provided the conflict target is not specified.  In that case,
-   unique constraints (or exclusion constraints) of the individual leaf
-   partitions are considered.
-  
- 
-
- 
-  
When an UPDATE causes a row to move from one
partition to another, there is a chance that another concurrent
UPDATE or DELETE misses this row.
diff --git a/doc/src/sgml/ref/insert.sgml b/doc/src/sgml/ref/insert.sgml
index 134092fa9c..62e142fd8e 100644
--- a/doc/src/sgml/ref/insert.sgml
+++ b/doc/src/sgml/ref/insert.sgml
@@ -518,6 +518,14 @@ INSERT INTO table_name [ AS 
+
+   
+Note that it is currently not supported for the
+ON CONFLICT DO UPDATE clause of an
+INSERT applied to a partitioned table to update the
+partition key of a conflicting row such that it requires the row be moved
+to a new partition.
+   

 
  It is often preferable to use unique index inference rather than
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 53855f5088..b00a986432 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -138,6 

Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()

2018-03-23 Thread Michael Paquier
On Fri, Mar 23, 2018 at 11:06:48AM +, Simon Riggs wrote:
> Your assumption that I would commit a new patch that was 29 mins old
> is frankly pretty ridiculous, so yes, lets keep calm.

When a committer says that a patch is "ready for commit" and calls for
"last objections", my understanding is that you would be ready to commit
the patch from the moment such an email has been sent.  Am I the only one
to think so?  Now let's look at the numbers:
- The last patch sent is a v2, which implements a completely new
approach compared to v1.  This is a non-trivial patch which touches
sensitive parts of the code.
- v2 has been sent exactly two weeks after the last email exchanged on
this thread.
- Per the data publicly available, it took less than 30 minutes to
review the patch, and there are zero comments about its contents.
I do patch review on a more-or-less daily basis, and look at threads on
hackers on a daily basis, but I very rarely see such an "efficient"
review pattern.  You and Pavan have likely discussed the patch offline,
but nobody can guess what was discussed and what arguments were
exchanged.

> Enjoy your weekend and I'll be happy to read your review on Monday.

Er.  So this basically means that I need to commit to looking at
this patch in such a short time frame?  If you are asking for reviews,
making such requests by asking a proper question rather than by implying
it in a statement would seem more appropriate to me, so this part of the
email makes me uncomfortable.  My apologies if I am not able to catch the
nuance in those words.
--
Michael


signature.asc
Description: PGP signature


Re: Sample values for pg_stat_statements

2018-03-23 Thread legrand legrand
+1

If pgss had a PlanId column (just after QueryId), that would be wonderful
;o)

Question: Is there a simple way to "un-normalize" the query (I mean, rebuild
the original query as it was before normalization)?

Regards
PAscal



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: Removing useless DISTINCT clauses

2018-03-23 Thread David Rowley
On 24 March 2018 at 01:42, Jim Finnerty  wrote:
> Distinctness can also be preserved across joins, so if you have a 'snowflake
> query' type join, where all the joins are to a unique key, then the
> distinctness of the other side of the join is preserved.  For example, a
> SELECT DISTINCT * FROM fact_table ... that joins from each column in its
> compound primary key to a unique key of another (dimension) table would
> remain distinct, and so you could drop the DISTINCT from the query.

I'm aware. It is something I'm interested in, but it would require several
orders of magnitude more work than what I've done for this patch. You
may have noticed the other work I did a while back to detect whether joins
cause row duplication or not, so it's certainly something I've thought
about.

If Amazon would like to sponsor work in this area then please see [1].

It certainly would be great to see that happen.

[1] https://wiki.postgresql.org/wiki/How_to_sponsor_a_feature

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Removing useless DISTINCT clauses

2018-03-23 Thread David Rowley
On 24 March 2018 at 05:55, Melanie Plageman  wrote:
> I was just wondering, since get_primary_key_attnos opens pg_constraint and
> just skips attributes with other constraint types than primary, what would
> be the reason we wouldn't just also open pg_attribute and check for the
> non-nullness of the non-primary key unique attributes?
> Couldn't we add a function which opens both pg_attribute and pg_constraint
> to get non-null unique attributes?
> I suppose that would have a performance impact.
> And, I did notice the FIXME in the comment before check_functional_grouping
> which seems to make the argument that there are other use cases for wanting
> the non-nullness represented in pg_constraint.

It's certainly possible to do more here.  I'm uncertain what needs to
be done with regard to cached plan invalidation, but we'd certainly
need to ensure cached plans are invalidated whenever attnotnull is
changed.

There would be no need to look at the pg_attribute table directly. A
syscache lookup function named get_attnotnull(), akin to something like
get_atttypmod(), would just need to be added (see the sketch below).
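
As a rough, untested sketch (modeled on the existing helpers in
lsyscache.c, not something I've actually written for the patch), it might
look like:

/*
 * get_attnotnull
 *		Given the relation id and the attribute number, return the
 *		attribute's attnotnull flag.  Sketch only, modeled on
 *		get_atttypmod(); returns false if no such attribute exists.
 */
bool
get_attnotnull(Oid relid, AttrNumber attnum)
{
	HeapTuple	tp;
	bool		result = false;

	tp = SearchSysCache2(ATTNUM,
						 ObjectIdGetDatum(relid),
						 Int16GetDatum(attnum));
	if (HeapTupleIsValid(tp))
	{
		Form_pg_attribute att_tup = (Form_pg_attribute) GETSTRUCT(tp);

		result = att_tup->attnotnull;
		ReleaseSysCache(tp);
	}
	return result;
}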

I'm personally not planning on doing anything in that regard for this
patch. We have 1 week to go before PG11 feature freeze. My goal for
this patch was to apply the same optimization to DISTINCT as I did for
GROUP BY a while back, which is exactly what the patch does. I agree
that more would be nice, but unfortunately, time is finite.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Removing useless DISTINCT clauses

2018-03-23 Thread David Rowley
On 22 March 2018 at 18:58, David Rowley  wrote:
> On 21 March 2018 at 16:29, Melanie Plageman  wrote:
>> The new status of this patch is: Waiting on Author
>
> Thanks for reviewing this.  I've attached an updated patch. I'll set
> back to waiting for review again.

I've attached another version of the patch. The only change is a fix for
a thinko in a comment which claimed it may be possible to do something
with DISTINCT ON clauses; on further thought, I don't think that's
possible, so I've updated the comment to reflect my new thinking.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


remove_useless_distinct_clauses_v5.patch
Description: Binary data


Re: Backend memory dump analysis

2018-03-23 Thread Teodor Sigaev

Hi!

You could get some help from
https://github.com/postgrespro/memstat

Vladimir Sitnikov wrote:

Hi,

I am investigating an out-of-memory case for PostgreSQL 9.6.5, and it
looks like MemoryContextStatsDetail + gdb are the only friends there.

--

Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/



Re: [HACKERS] MERGE SQL Statement for PG11

2018-03-23 Thread Peter Geoghegan
On Fri, Mar 23, 2018 at 6:55 AM, Simon Riggs  wrote:
> Peter, if you have the code and you consider it important that this
> subfeature is in PostgreSQL, why not post the code so we can commit
> it?

Fair enough. Attached patch shows what I'm on about. This should be
applied on top of 0001_merge_v23e_onconflict_work.patch +
0002_merge_v23e_main.patch. I'm not expecting an authorship credit for
posting this patch.

One interesting thing that the test output shows is that there
is never a "SubPlan 1" or "InitPlan 1" in EXPLAIN output -- it seems
to always start at "SubPlan 2". This probably has nothing to do with
CTEs in particular. I didn't notice this before now, although there
were no existing tests of EXPLAIN in the patch that show subplans or
initplans.

Is this somehow related to the issue of using two RTEs for the target
relation? That's certainly why we always see unaliased target table
"m" with the alias "m_1" in EXPLAIN output, so I would not be
surprised if it caused another EXPLAIN issue.

-- 
Peter Geoghegan
From c2aaea9d7e87ffa6075408f61c0748d31d61312a Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Fri, 9 Mar 2018 17:12:42 -0800
Subject: [PATCH] support CTEs with MERGE

---
 src/backend/nodes/copyfuncs.c   |   1 +
 src/backend/nodes/equalfuncs.c  |   1 +
 src/backend/nodes/nodeFuncs.c   |   2 +
 src/backend/parser/gram.y   |  11 +--
 src/backend/parser/parse_merge.c|   9 +++
 src/include/nodes/parsenodes.h  |   1 +
 src/test/regress/expected/merge.out |   3 -
 src/test/regress/expected/with.out  | 132 
 src/test/regress/sql/with.sql   |  51 ++
 9 files changed, 203 insertions(+), 8 deletions(-)

diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 770ed3b1a8..c3efca3c45 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3055,6 +3055,7 @@ _copyMergeStmt(const MergeStmt *from)
 	COPY_NODE_FIELD(source_relation);
 	COPY_NODE_FIELD(join_condition);
 	COPY_NODE_FIELD(mergeActionList);
+	COPY_NODE_FIELD(withClause);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 5a0151eece..45ceba2830 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1051,6 +1051,7 @@ _equalMergeStmt(const MergeStmt *a, const MergeStmt *b)
 	COMPARE_NODE_FIELD(source_relation);
 	COMPARE_NODE_FIELD(join_condition);
 	COMPARE_NODE_FIELD(mergeActionList);
+	COMPARE_NODE_FIELD(withClause);
 
 	return true;
 }
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 68e2cec66e..7106765e2b 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3446,6 +3446,8 @@ raw_expression_tree_walker(Node *node,
 	return true;
 if (walker(stmt->mergeActionList, context))
 	return true;
+if (walker(stmt->withClause, context))
+	return true;
 			}
 			break;
 		case T_SelectStmt:
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index ebca5f3eb7..419a86f2ba 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -11105,17 +11105,18 @@ set_target_list:
  */
 
 MergeStmt:
-			MERGE INTO relation_expr_opt_alias
+			opt_with_clause MERGE INTO relation_expr_opt_alias
 			USING table_ref
 			ON a_expr
 			merge_when_list
 {
 	MergeStmt *m = makeNode(MergeStmt);
 
-	m->relation = $3;
-	m->source_relation = $5;
-	m->join_condition = $7;
-	m->mergeActionList = $8;
+	m->relation = $4;
+	m->source_relation = $6;
+	m->join_condition = $8;
+	m->mergeActionList = $9;
+	m->withClause = $1;
 
 	$$ = (Node *)m;
 }
diff --git a/src/backend/parser/parse_merge.c b/src/backend/parser/parse_merge.c
index c3981de0b5..dffa889c5b 100644
--- a/src/backend/parser/parse_merge.c
+++ b/src/backend/parser/parse_merge.c
@@ -24,6 +24,7 @@
 #include "parser/parsetree.h"
 #include "parser/parser.h"
 #include "parser/parse_clause.h"
+#include "parser/parse_cte.h"
 #include "parser/parse_merge.h"
 #include "parser/parse_relation.h"
 #include "parser/parse_target.h"
@@ -200,6 +201,14 @@ transformMergeStmt(ParseState *pstate, MergeStmt *stmt)
 
 	qry->commandType = CMD_MERGE;
 
+	/* process the WITH clause independently of all else */
+	if (stmt->withClause)
+	{
+		qry->hasRecursive = stmt->withClause->recursive;
+		qry->cteList = transformWithClause(pstate, stmt->withClause);
+		qry->hasModifyingCTE = pstate->p_hasModifyingCTE;
+	}
+
 	/*
 	 * Check WHEN clauses for permissions and sanity
 	 */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 0c904f4d7f..36e6e2e976 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1519,6 +1519,7 @@ typedef struct MergeStmt
 	Node	   *source_relation;	/* source relation */
 	

Re: Backend memory dump analysis

2018-03-23 Thread Tom Lane
Andres Freund  writes:
> On 2018-03-23 15:12:43 -0400, Tom Lane wrote:
>> Well, in the cases I'm thinking of at the moment, there's no handy Node
>> to point at, just module-private structs like PLpgSQL_function.

> Well, the cases Vladimir were concerned about seem less clear
> though. It'd be nice if we could just point to a CachedPlanSource and
> such.

You could imagine adding *two* pointers to memory contexts, a callback
function and an arg to pass to it, so that the callback localizes the
knowledge of how to dig an identifier string out of whatever struct
is involved.  I really doubt this is worth that much overhead though.
I think all of the actually interesting cases have a string available
already (though I might find out differently while doing the patch).
Furthermore, if they don't have a string available already, I'm not
real clear on how the callback would create one without doing a palloc.

> I'm not that sure there aren't easy way to overcome those - couldn't we
> "just" make FmgrInfo etc be tagged types? The space overhead of that
> can't matter in comparison to the size of the relevant structs.

Not for extensions, eg PLs, which would be one of the bigger use-cases
IMO.

regards, tom lane



Re: Backend memory dump analysis

2018-03-23 Thread Andres Freund
On 2018-03-23 15:12:43 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-03-23 14:33:25 -0400, Tom Lane wrote:
> >> +   MemoryContextSetIdentifier(func_cxt, function->fn_signature);
> >> 
> >> This would cost an extra char * field in struct MemoryContextData,
> >> which is slightly annoying but it doesn't exactly seem like a killer.
> >> Then the memory stats dump code would just need to know to print this
> >> field if it isn't NULL.
> 
> > That's not a bad idea. How about storing a Node* instead of a char*?
> > Then we could have MemoryContextStats etc support digging out details
> > for a few types, without having to generate strings at runtime.
> 
> Well, in the cases I'm thinking of at the moment, there's no handy Node
> to point at, just module-private structs like PLpgSQL_function.

Well, the cases Vladimir were concerned about seem less clear
though. It'd be nice if we could just point to a CachedPlanSource and
such.


> So doing anything like that would add nonzero overhead to construct
> something.

I'm not that sure there aren't easy way to overcome those - couldn't we
"just" make FmgrInfo etc be tagged types? The space overhead of that
can't matter in comparison to the size of the relevant structs.


> There's also the fact that we don't want MemoryContextStats doing
> anything very complicated, because of the risk of failure and the
> likelihood that any attempt to palloc would fail (if we're there
> because we're up against OOM already).

That's true. But I'm not sure there's a meaningful difference in risk
here. Obviously you shouldn't try to print a node tree or something, but
an if statement looking 

Greetings,

Andres Freund



Re: Backend memory dump analysis

2018-03-23 Thread Tom Lane
Andres Freund  writes:
> On 2018-03-23 14:33:25 -0400, Tom Lane wrote:
>> +   MemoryContextSetIdentifier(func_cxt, function->fn_signature);
>> 
>> This would cost an extra char * field in struct MemoryContextData,
>> which is slightly annoying but it doesn't exactly seem like a killer.
>> Then the memory stats dump code would just need to know to print this
>> field if it isn't NULL.

> That's not a bad idea. How about storing a Node* instead of a char*?
> Then we could have MemoryContextStats etc support digging out details
> for a few types, without having to generate strings at runtime.

Well, in the cases I'm thinking of at the moment, there's no handy Node
to point at, just module-private structs like PLpgSQL_function.  So doing
anything like that would add nonzero overhead to construct something.
Not sure we want to pay that.  There's also the fact that we don't want
MemoryContextStats doing anything very complicated, because of the risk of
failure and the likelihood that any attempt to palloc would fail (if we're
there because we're up against OOM already).

>> If we wanted to do this I'd suggest sneaking it into v11, so that
>> if people have to adapt their code because of 9fa6f00b1 breaking
>> usages with nonconstant context names, they have a solution to turn to
>> immediately rather than having to change things again in v12.

> Yea, that'd make sense.

I'll put together a draft patch.

regards, tom lane



Re: Backend memory dump analysis

2018-03-23 Thread Andres Freund
Hi,

On 2018-03-23 14:33:25 -0400, Tom Lane wrote:
> func_cxt = AllocSetContextCreate(TopMemoryContext,
>  "PL/pgSQL function context",
>  ALLOCSET_DEFAULT_SIZES);
> plpgsql_compile_tmp_cxt = MemoryContextSwitchTo(func_cxt);
> 
> function->fn_signature = format_procedure(fcinfo->flinfo->fn_oid);
> +   MemoryContextSetIdentifier(func_cxt, function->fn_signature);
> function->fn_oid = fcinfo->flinfo->fn_oid;
> function->fn_xmin = HeapTupleHeaderGetRawXmin(procTup->t_data);
> 
> This would cost an extra char * field in struct MemoryContextData,
> which is slightly annoying but it doesn't exactly seem like a killer.
> Then the memory stats dump code would just need to know to print this
> field if it isn't NULL.

That's not a bad idea. How about storing a Node* instead of a char*?
Then we could have MemoryContextStats etc support digging out details
for a few types, without having to generate strings at runtime.

> If we wanted to do this I'd suggest sneaking it into v11, so that
> if people have to adapt their code because of 9fa6f00b1 breaking
> usages with nonconstant context names, they have a solution to turn to
> immediately rather than having to change things again in v12.

Yea, that'd make sense.

Greetings,

Andres Freund



Re: ppc64le support in 9.3 branch?

2018-03-23 Thread Magnus Hagander
On Fri, Mar 23, 2018 at 7:58 PM, Andres Freund  wrote:

> Hi,
>
> On 2018-03-23 14:54:46 -0400, Tom Lane wrote:
> > So I see somebody at 2ndQ has set up a bunch of ppc64le buildfarm
> > members, which I applaud.  But they're all failing on the 9.3 branch,
> > because we lack support for that architecture in that branch.
> >
> > Does anyone have the stomach for trying to add such support?  The minimum
> > requirement would be to back-patch 9.4's config.guess and config.sub,
> > because that's where the builds are falling over right now.  I wouldn't
> be
> > too afraid of that, but what is not clear is what portability issues
> might
> > be lurking beyond that.  I could not find any specific mention of ppc64
> > in the git changelogs, but that doesn't mean there weren't any other 9.4
> > fixes that might need to be back-ported.
> >
> > It's hard to justify putting in very much effort to add new-platform
> > support in a branch that's scheduled to die in six months, so I'm not
> > sure what to do.  Should we just tell 2ndQ not to bother running those
> > animals on 9.3?  Or should we make at least a bit of effort towards
> > making it work?
>
> > The compromise I'm inclined to offer is to see what happens if we
> > back-patch 9.4's config.guess and config.sub.  If that makes these
> > animals go green, and doesn't break any others, we'll call it good.
> > Otherwise, we revert that change and say we're not putting any
> > additional effort into it.
>
> I'm inclined to just ask them to stop running the animals on that
> branch. There are no pre-existing users on 9.3 ppc64le, and new
> customers hopefully won't move to 9.3. ISTM backpatching is riskier than
> just changing a bunch of buildfarm configurations.
>
>
+1 for dropping it.



-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: ppc64le support in 9.3 branch?

2018-03-23 Thread Andres Freund
Hi,

On 2018-03-23 14:54:46 -0400, Tom Lane wrote:
> So I see somebody at 2ndQ has set up a bunch of ppc64le buildfarm
> members, which I applaud.  But they're all failing on the 9.3 branch,
> because we lack support for that architecture in that branch.
> 
> Does anyone have the stomach for trying to add such support?  The minimum
> requirement would be to back-patch 9.4's config.guess and config.sub,
> because that's where the builds are falling over right now.  I wouldn't be
> too afraid of that, but what is not clear is what portability issues might
> be lurking beyond that.  I could not find any specific mention of ppc64
> in the git changelogs, but that doesn't mean there weren't any other 9.4
> fixes that might need to be back-ported.
>
> It's hard to justify putting in very much effort to add new-platform
> support in a branch that's scheduled to die in six months, so I'm not
> sure what to do.  Should we just tell 2ndQ not to bother running those
> animals on 9.3?  Or should we make at least a bit of effort towards
> making it work?

> The compromise I'm inclined to offer is to see what happens if we
> back-patch 9.4's config.guess and config.sub.  If that makes these
> animals go green, and doesn't break any others, we'll call it good.
> Otherwise, we revert that change and say we're not putting any
> additional effort into it.

I'm inclined to just ask them to stop running the animals on that
branch. There are no pre-existing users on 9.3 ppc64le, and new
customers hopefully won't move to 9.3. ISTM backpatching is riskier than
just changing a bunch of buildfarm configurations.

Greetings,

Andres Freund



ppc64le support in 9.3 branch?

2018-03-23 Thread Tom Lane
So I see somebody at 2ndQ has set up a bunch of ppc64le buildfarm
members, which I applaud.  But they're all failing on the 9.3 branch,
because we lack support for that architecture in that branch.

Does anyone have the stomach for trying to add such support?  The minimum
requirement would be to back-patch 9.4's config.guess and config.sub,
because that's where the builds are falling over right now.  I wouldn't be
too afraid of that, but what is not clear is what portability issues might
be lurking beyond that.  I could not find any specific mention of ppc64
in the git changelogs, but that doesn't mean there weren't any other 9.4
fixes that might need to be back-ported.

It's hard to justify putting in very much effort to add new-platform
support in a branch that's scheduled to die in six months, so I'm not
sure what to do.  Should we just tell 2ndQ not to bother running those
animals on 9.3?  Or should we make at least a bit of effort towards
making it work?

The compromise I'm inclined to offer is to see what happens if we
back-patch 9.4's config.guess and config.sub.  If that makes these
animals go green, and doesn't break any others, we'll call it good.
Otherwise, we revert that change and say we're not putting any
additional effort into it.

regards, tom lane



Re: Re: csv format for psql

2018-03-23 Thread Pavel Stehule
2018-03-23 18:55 GMT+01:00 Fabien COELHO :

>
> Hello Daniel,
>
> Do you know when you'll have an updated patch that addresses the minor
>>> issues brought up in review and the concern above?
>>>
>>
>> Here's an update addressing the issues discussed:
>>
>> - fieldsep and recordsep are used, no more fieldsep_csv
>> - the command line option is --csv without short option and is equivalent
>> to -P format=csv -P fieldsep=','
>> - pset output formats are reordered alphabetically on display
>> - a couple more cases in the regression tests
>>
>
> Patch applies cleanly, compiles, doc gen ok, "make check" ok.
>
> The patch adds a simple way to generate csv output from "psql" queries,
> much simpler than playing around with COPY or \copy. It allows you to
> generate a clean CSV dump from something as short as:
>
>   sh> psql --csv -c 'TABLE foo' > foo.csv
>
> Documentation is clear.
>
> Tests cover a significant number of cases (fieldsep, expanded, tuples-only).
> Although recordsep changes are not actually tested, it worked interactively
> and I think the tests are sufficient as is.
>
> There are some remaining points about which input from a committer or
> other people would be nice:
>
> (1) There is some mild disagreement about whether the fieldsep should be
> format-specific or shared with other formats. I do not think that a
> specific fieldsep is worth it, but this is a marginal preference, and
> other people's opinions differ. What is best is not obvious.
>
> Pavel also suggested having special handling based on whether
> the fieldsep is explicitly set or not. I'm not too keen on that
> because it departs significantly from the way psql formatting is
> currently handled, and what is happening becomes unclear to the user.
>
> (2) For interactive use, two commands are required: \pset format csv +
> \pset fieldsep ',' (or ';' or '\t' or whatever...). Maybe some \csv
> command similar  to \H would be appropriate, or not, to set both values
> more efficiently. Could be something for another patch.
>
> Not sure what the status of the patch is if we do not have a clear
> consensus.
>

I am sorry, but I don't think this interface is good enough. Using | as
the default CSV separator is just wrong. That, and only that, is the
problem; anything else would be perfect.

Regards

Pavel


>
> --
> Fabien.
>
>


Re: Re: csv format for psql

2018-03-23 Thread Pavel Stehule
2018-03-23 12:59 GMT+01:00 Daniel Verite :

> Pavel Stehule wrote:
>
> > It should not be hard. All formats can has '|' like now, and csv can
> have a
> > ',' - then if field separator is not explicit, then default field
> separator
> > is used, else specified field separator is used.
> >
> > You can see my idea in attached patch
>
> With that patch, consider this sequence:
>
> postgres=# \pset format csv
> Output format is csv.
> postgres=# \pset fieldsep
> Field separator is "|".
> postgres=# select 1 as a,2 as b;
> a,b
> 1,2
>
> Here psql reports that fieldsep is "|" and right away is using something
> else in the output. That doesn't look good.
>

Yes - my patch was a proof of concept, nothing more.

But this can be solved simply - if we have a table of default field
separators, then when the separator is not set explicitly, the default for
the format in use is printed.

>
> You may object that it's fixable by tweaking the output of \pset,
> \pset fieldsep, and \? variables so that it knows that the current
> output format is going to use a "hidden" default separator, and
> then these commands should display that value instead.
> But that'd be somewhat playing whack-a-mole, as the following
> sequence would now be possible, with '|' being used  as
> the separator instead of the ',' reported just above:
>
> postgres=# \pset format csv
> Output format is csv.
> postgres=# \pset fieldsep
> Field separator is ",".
> postgres=# \a
> Output format is aligned.
> postgres=# select 1 as a,2 as b;
>  a | b
> ---+---
>  1 | 2
>

I am sorry, but the patch that I sent was just a proof of concept - I didn't
implement defaults for any other related formats.

I'll try to send a cleaner patch tomorrow.

Regards

Pavel


>
>
> Best regards,
> --
> Daniel Vérité
> PostgreSQL-powered mailer: http://www.manitou-mail.org
> Twitter: @DanielVerite
>


Re: Backend memory dump analysis

2018-03-23 Thread Tom Lane
Andres Freund  writes:
> On 2018-03-23 18:05:38 +, Vladimir Sitnikov wrote:
>> For instance, I assume the statement cache is stored in some sort of hash
>> table, so there should be a way to enumerate it programmatically. Of
>> course it would take time; however, I do not think it creates CPU/memory
>> overhead. The overhead is in maintaining the "walker" code.

> Sure, you could, entirely independent of the memory stats dump, do
> that. But what information would you actually gain from it? Which row
> something in the catcache belongs to isn't *that* interesting.

It'd certainly be easy to define this in a way that makes it require
a bunch of support code, which we'd be unlikely to want to write and
maintain.  However, I've often wished that the contexts in a memory
dump were less anonymous.  If you didn't just see a pile of "PL/pgSQL
function context" entries, but could (say) see the name of each function,
that would be a big step forward.  Similarly, if we could see the source
text for each CachedPlanSource in a dump, that'd be useful.  I mention
these things because we do actually store them already, in many cases
--- but the memory stats code doesn't know about them.

Now, commit 9fa6f00b1 already introduced a noticeable penalty for
contexts with nonconstant names, so trying to stick extra info like
this into the context name is not appetizing.  But what if we allowed
the context name to have two parts, a fixed part and a variable part?
We could actually require that the fixed part be a compile-time-constant
string, simplifying matters on that end.  The variable part would best
be assigned later than initial context creation, because you'd need a
chance to copy the string into the context before pointing to it.
So maybe, for contexts where this is worth doing, it'd look something
like this for plpgsql:

func_cxt = AllocSetContextCreate(TopMemoryContext,
 "PL/pgSQL function context",
 ALLOCSET_DEFAULT_SIZES);
plpgsql_compile_tmp_cxt = MemoryContextSwitchTo(func_cxt);

function->fn_signature = format_procedure(fcinfo->flinfo->fn_oid);
+   MemoryContextSetIdentifier(func_cxt, function->fn_signature);
function->fn_oid = fcinfo->flinfo->fn_oid;
function->fn_xmin = HeapTupleHeaderGetRawXmin(procTup->t_data);

This would cost an extra char * field in struct MemoryContextData,
which is slightly annoying but it doesn't exactly seem like a killer.
Then the memory stats dump code would just need to know to print this
field if it isn't NULL.
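
(For clarity, the setter I'm imagining is trivial -- something like the
sketch below, where "ident" stands for the proposed new char * field; the
string is not copied, so it must live at least as long as the context
itself:)

void
MemoryContextSetIdentifier(MemoryContext context, const char *id)
{
	/* "ident" is the proposed new field; no palloc happens here */
	context->ident = id;
}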

If we wanted to do this I'd suggest sneaking it into v11, so that
if people have to adapt their code because of 9fa6f00b1 breaking
usages with nonconstant context names, they have a solution to turn to
immediately rather than having to change things again in v12.

Thoughts?

regards, tom lane



Re: Re: csv format for psql

2018-03-23 Thread Fabien COELHO


Hello Daniel,


Do you know when you'll have an updated patch that addresses the minor
issues brought up in review and the concern above?


Here's an update addressing the issues discussed:

- fieldsep and recordsep are used, no more fieldsep_csv
- the command line option is --csv without short option and is equivalent
to -P format=csv -P fieldsep=','
- pset output formats are reordered alphabetically on display
- a couple more cases in the regression tests


Patch applies cleanly, compiles, doc gen ok, "make check" ok.

The patch adds a simple way to generate csv output from "psql" queries,
much simpler than playing around with COPY or \copy. It allows you to
generate a clean CSV dump from something as short as:


  sh> psql --csv -c 'TABLE foo' > foo.csv

Documentation is clear.

Tests cover a significant number of cases (fieldsep, expanded, tuples-only).
Although recordsep changes are not actually tested, it worked interactively
and I think the tests are sufficient as is.

There are some remaining points about which input from a committer or other
people would be nice:

(1) There is some mild disagreement about whether the fieldsep should be
format-specific or shared with other formats. I do not think that a specific
fieldsep is worth it, but this is a marginal preference, and other people's
opinions differ. What is best is not obvious.

Pavel also suggested having special handling based on whether
the fieldsep is explicitly set or not. I'm not too keen on that because
it departs significantly from the way psql formatting is currently
handled, and what is happening becomes unclear to the user.

(2) For interactive use, two commands are required: \pset format csv +
\pset fieldsep ',' (or ';' or '\t' or whatever...). Maybe some \csv
command similar  to \H would be appropriate, or not, to set both values
more efficiently. Could be something for another patch.

Not sure what the status of the patch is if we do not have a clear
consensus.


--
Fabien.



Re: Backend memory dump analysis

2018-03-23 Thread Andres Freund
On 2018-03-23 18:05:38 +, Vladimir Sitnikov wrote:
> Andres> The overhead required for it (in cycles, in higher memory usage
> due to additional bookkeeping)
> 
> Does that mean the memory contexts are unparseable? (there's not enough
> information to enumerate contents)

You can enumerate them (that's what the stats dump you're referring to
does), but you can't associate them with individual statements etc. without
further bookkeeping.


> What if memory dump is produced by walking the C structures?

We don't know the types of individual allocations.


> For instance, I assume the statement cache is stored in some sort of hash
> table, so there should be a way to enumerate it programmatically. Of
> course it would take time; however, I do not think it creates CPU/memory
> overhead. The overhead is in maintaining the "walker" code.

Sure, you could, entirely independent of the memory stats dump, do
that. But what information would you actually gain from it? Which row
something in the catcache belongs to isn't *that* interesting.

- Andres



Re: Backend memory dump analysis

2018-03-23 Thread Vladimir Sitnikov
Andres> The overhead required for it (in cycles, in higher memory usage
due to additional bookkeeping)

Does that mean the memory contexts are unparseable? (there's not enough
information to enumerate contents)

What if memory dump is produced by walking the C structures?
For instance, I assume the statement cache is stored in some sort of hash
table, so there should be a way to enumerate it programmatically. Of
course it would take time; however, I do not think it creates CPU/memory
overhead. The overhead is in maintaining the "walker" code.

Vladimir


Re: Backend memory dump analysis

2018-03-23 Thread Andres Freund
Hi,

On 2018-03-23 16:18:52 +, Vladimir Sitnikov wrote:
> Hi,
> 
> I am investigating an out-of-memory case for PostgreSQL 9.6.5, and it
> looks like MemoryContextStatsDetail + gdb are the only friends there.
> 
> MemoryContextStatsDetail does print some info; however, it is rarely
> possible to associate the used memory with business cases.
> For instance:
>CachedPlanSource: 146224 total in 8 blocks; 59768 free (3 chunks); 86456
> used
>   CachedPlanQuery: 130048 total in 7 blocks; 29952 free (2 chunks);
> 100096 used
> 
> It does look like 182KiB has been spent for some SQL; however, there's no
> clear way to tell which SQL is to blame.
> 
> Another case: PL/pgSQL function context: 57344 total in 3 blocks; 17200
> free (2 chunks); 40144 used
> It is not clear what is there inside, which "cached plans" are referenced
> by that pgsql context (if any), etc.
> 
> It would be great if there was an ability to dump the memory in a
> machine-readable format (e.g. Java's HPROF).
> 
> Eclipse Memory Analyzer (https://www.eclipse.org/mat/) can visualize Java
> memory dumps quite well, and I think HPROF format is trivial to generate
> (the generation is easy, the hard part is to parse memory contents).
> That is we could get analysis UI for free if PostgreSQL produces the dump.
> 
> Is it something welcome or non-welcome?
> Is it something worth including in-core?

The overhead required for it (in cycles, in higher memory usage due to
additional bookkeeping, in maintenance) makes me highly doubtful it's
worth going there. While I can definitely see the upside, it doesn't
seem to justify the cost.

Greetings,

Andres Freund



Re: file cloning in pg_upgrade and CREATE DATABASE

2018-03-23 Thread Bruce Momjian
I think this documentation change:

+   leaving the old cluster untouched.  At present, this is supported 
on Linux
-

would be better by changing "untouched" to "unmodified".

Also, it would be nice if users could easily know whether pg_upgrade is
going to use COW or not, because it might affect whether they choose --link
or not.  Right now it seems unclear how a user would know.  Can we perhaps
have pg_upgrade --check output something?  Can we also have the
pg_upgrade status display indicate that too, e.g. change

Copying user relation files

to

Copying (copy-on-write) user relation files

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Removing useless DISTINCT clauses

2018-03-23 Thread Melanie Plageman
On Thu, Mar 22, 2018 at 5:20 PM, David Rowley 
wrote:

> The problem is that in order to properly invalidate cached plans we
> must record the constraint OIDs which the plan depends on.  We can do
> that for PK and UNIQUE constraints, unfortunately, we can't do it for
> NOT NULL constraints as, at the moment, these are just properties of
> pg_attribute.  For this to be made to work we'd need to store those in
> pg_constraint too, which is more work that I'm going to do for this
> patch.
>

I was just wondering: since get_primary_key_attnos opens pg_constraint and
just skips attributes with constraint types other than primary, what would
be the reason we wouldn't just also open pg_attribute and check for the
non-nullness of the non-primary-key unique attributes?
Couldn't we add a function which opens both pg_attribute and pg_constraint
to get non-null unique attributes?
I suppose that would have a performance impact.
And, I did notice the FIXME in the comment before check_functional_grouping
which seems to make the argument that there are other use cases for wanting
the non-nullness represented in pg_constraint.

 * Currently we only check to see if the rel has a primary key that is a
 * subset of the grouping_columns.  We could also use plain unique
constraints
 * if all their columns are known not null, but there's a problem: we need
 * to be able to represent the not-null-ness as part of the constraints
added
 * to *constraintDeps.  FIXME whenever not-null constraints get represented
 * in pg_constraint.
 */
-- 
Melanie Plageman


Re: PATCH: Exclude temp relations from base backup

2018-03-23 Thread David Steele
On 3/13/18 12:34 PM, David Steele wrote:

> Updated the patch to change die() to BAIL_OUT() and use append_to_file()
> as suggested for another test patch.

Updated patch now that the unlogged table exclusions have been committed
[1].

Thanks,
-- 
-David
da...@pgmasters.net

[1]
https://www.postgresql.org/message-id/4d9be1c0-5c58-d9a0-7152-2771224910ae%40sigaev.ru
From e4fc8ff2b41091c0e266ecde8f91471adca26f9c Mon Sep 17 00:00:00 2001
From: David Steele 
Date: Fri, 23 Mar 2018 12:39:59 -0400
Subject: [PATCH 1/1] Exclude temporary relations from base backup.

Exclude temporary relations during a base backup using the existing 
looks_like_temp_rel_name() function for identification.
---
 doc/src/sgml/protocol.sgml   |  2 +-
 src/backend/replication/basebackup.c | 10 ++
 src/backend/storage/file/fd.c|  3 +-
 src/bin/pg_basebackup/t/010_pg_basebackup.pl | 49 +++-
 src/include/storage/fd.h |  1 +
 5 files changed, 61 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 472bd3ef69..8c488506fa 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2565,7 +2565,7 @@ The commands accepted in replication mode are:
 
  Various temporary files and directories created during the operation
  of the PostgreSQL server, such as any file or directory beginning
- with pgsql_tmp.
+ with pgsql_tmp and temporary relations.
 


diff --git a/src/backend/replication/basebackup.c 
b/src/backend/replication/basebackup.c
index eb6eb7206d..189e870b9d 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1071,6 +1071,16 @@ sendDir(const char *path, int basepathlen, bool 
sizeonly, List *tablespaces,
}
}
 
+   /* Exclude temporary relations */
+   if (isDbDir && looks_like_temp_rel_name(de->d_name))
+   {
+   elog(DEBUG2,
+"temporary relation file \"%s\" excluded from 
backup",
+de->d_name);
+
+   continue;
+   }
+
snprintf(pathbuf, sizeof(pathbuf), "%s/%s", path, de->d_name);
 
/* Skip pg_control here to back up it last */
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 2a18e94ff4..d30a725f90 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -325,7 +325,6 @@ static void RemovePgTempFilesInDir(const char *tmpdirname, 
bool missing_ok,
   bool unlink_all);
 static void RemovePgTempRelationFiles(const char *tsdirname);
 static void RemovePgTempRelationFilesInDbspace(const char *dbspacedirname);
-static bool looks_like_temp_rel_name(const char *name);
 
 static void walkdir(const char *path,
void (*action) (const char *fname, bool isdir, int elevel),
@@ -3192,7 +3191,7 @@ RemovePgTempRelationFilesInDbspace(const char 
*dbspacedirname)
 }
 
 /* t_, or t__ */
-static bool
+bool
 looks_like_temp_rel_name(const char *name)
 {
int pos;
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl 
b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 455c7fca0d..e6018de054 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -2,9 +2,10 @@ use strict;
 use warnings;
 use Cwd;
 use Config;
+use File::Basename qw(basename dirname);
 use PostgresNode;
 use TestLib;
-use Test::More tests => 87;
+use Test::More tests => 93;
 
 program_help_ok('pg_basebackup');
 program_version_ok('pg_basebackup');
@@ -76,6 +77,18 @@ my $baseUnloggedPath = $node->safe_psql('postgres',
 ok(-f "$pgdata/${baseUnloggedPath}_init", 'unlogged init fork in base');
 ok(-f "$pgdata/$baseUnloggedPath", 'unlogged main fork in base');
 
+# Create files that look like temporary relations to ensure they are ignored.
+my $postgresOid = $node->safe_psql('postgres',
+   q{select oid from pg_database where datname = 'postgres'});
+
+my @tempRelationFiles = qw(t999_999 t_999.1 t999__vm 
t9_9_vm.1);
+
+foreach my $filename (@tempRelationFiles)
+{
+   append_to_file("$pgdata/base/$postgresOid/$filename", 'TEMP_RELATION');
+}
+
+# Run base backup.
 $node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backup", '-X', 'none' ],
'pg_basebackup runs');
 ok(-f "$tempdir/backup/PG_VERSION", 'backup was created');
@@ -112,6 +125,13 @@ ok(-f "$tempdir/backup/${baseUnloggedPath}_init",
 ok(!-f "$tempdir/backup/$baseUnloggedPath",
'unlogged main fork not in backup');
 
+# Temp relations should not be copied.
+foreach my $filename (@tempRelationFiles)
+{
+   ok(!-f "$tempdir/backup/base/$postgresOid/$filename",
+  "base/$postgresOid/$filename not 

Re: [PATCH] Verify Checksums during Basebackups

2018-03-23 Thread Michael Banck
Hi David,

Am Freitag, den 23.03.2018, 10:54 -0400 schrieb David Steele:
> On 3/23/18 5:36 AM, Michael Banck wrote:
> > Am Donnerstag, den 22.03.2018, 12:22 -0400 schrieb David Steele:
> > > 
> > > +if (phdr->pd_checksum != checksum)
> > > 
> > > I've attached a patch that adds basic retry functionality.  It's not
> > > terribly efficient since it rereads the entire buffer for any block
> > > error.  A better way is to keep a bitmap for each block in the buffer,
> > > then on retry compare bitmaps.  If the block is still bad, report it.
> > > If the block was corrected, move on.  If a block was good before but is
> > > bad on retry it can be ignored.
> > 
> > I have to admit I find it a bit convoluted and non-obvious on first
> > reading, but I'll try to check it out some more.
> 
> Yeah, I think I was influenced too much by how pgBackRest does things,
> which doesn't work as well here.  Attached is a simpler version.

This looks much cleaner to me, yeah.

> > I agree that major corruption could make the whole output blow up but I
> > would prefer to keep this feature simple for now, which implies possibly
> >  printing out a lot of WARNING or maybe just stopping after the first
> > one (or first few, dunno).
> 
> In my experience actual block errors are relatively rare, so there
> aren't likely to be more than a few in a file.  More common are
> overwritten or transposed files, rogue files, etc.  These produce a lot
> of output.
> 
> Maybe stop after five?

I'm on board with this, but I have the feeling that this is not a very
common pattern in Postgres, or might not be project style at all.  I
can't remember ever seeing an error message like that.

Anybody know whether we're doing this in a similar fashion elsewhere?


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer



Re: PATCH: Configurable file mode mask

2018-03-23 Thread David Steele
Hi Peter,

On 3/23/18 10:36 AM, Peter Eisentraut wrote:
> I have committed a basic pg_resetwal test suite as part of another
> patch, so please adjust accordingly.
> 
> It looks to me like your pg_resetwal tests contain a bit of useless
> copy-and-paste.  For example, you are not using use Config, nor
> $tempdir, and you run $node->stop right after $node->start.  Please
> clean that up a bit, or comment, if appropriate.

Yeah, definitely too much copy-paste going on.

The various start/stops were intended to ensure that PG actually starts
with the reset WAL.  I see that these tests don't do them, though, so
perhaps that's not a good use of test time.

The pg_rewind tests work for my purposes but it seems worth preserving
the ones I wrote since there is no overlap.

I've attached a patch that integrates my tests with the current tests.
If you don't think they are worth adding then I'll just drop them from
my patchset.

Thanks,
-- 
-David
da...@pgmasters.net
diff --git a/src/bin/pg_resetwal/t/001_basic.pl 
b/src/bin/pg_resetwal/t/001_basic.pl
index 1b157cb555..474ece99c7 100644
--- a/src/bin/pg_resetwal/t/001_basic.pl
+++ b/src/bin/pg_resetwal/t/001_basic.pl
@@ -3,7 +3,7 @@ use warnings;
 
 use PostgresNode;
 use TestLib;
-use Test::More tests => 11;
+use Test::More tests => 16;
 
 program_help_ok('pg_resetwal');
 program_version_ok('pg_resetwal');
@@ -15,3 +15,29 @@ $node->init;
 command_like([ 'pg_resetwal', '-n', $node->data_dir ],
 qr/checkpoint/,
 'pg_resetwal -n produces output');
+
+# Reset WAL after segment has been removed
+my $pgwal = $node->data_dir . '/pg_wal';
+
+unlink("$pgwal/00010001") == 1
+   or BAIL_OUT("unable to remove 00010001");
+
+is_deeply(
+   [sort(slurp_dir($pgwal))], [sort(qw(. .. archive_status))], 'no WAL');
+
+$node->command_ok(['pg_resetwal', '-D', $node->data_dir], 'recreate pg_wal');
+
+is_deeply(
+   [sort(slurp_dir($pgwal))],
+   [sort(qw(. .. archive_status 00010002))],
+   'WAL recreated');
+
+# Reset to specific WAL segment
+$node->command_ok(
+   ['pg_resetwal', '-l', '000700070007', '-D', 
$node->data_dir],
+   'set to specific WAL');
+
+is_deeply(
+   [sort(slurp_dir($pgwal))],
+   [sort(qw(. .. archive_status 000700070007))],
+   'WAL recreated');


Re: PATCH: Exclude unlogged tables from base backups

2018-03-23 Thread David Steele
On 3/23/18 12:14 PM, Teodor Sigaev wrote:
> 
> Thank you, pushed

Thank you, Teodor!  I'll rebase the temp table exclusion patch and
provide an updated patch soon.

-- 
-David
da...@pgmasters.net



Re: [HACKERS] Surjective functional indexes

2018-03-23 Thread Konstantin Knizhnik



On 23.03.2018 18:45, Alvaro Herrera wrote:

Konstantin Knizhnik wrote:


rd_projidx is not a list, it is Bitmapset. It is just one of many bitmap
sets in RelationData:

Yes, but the other bitmapsets are of AttrNumber of the involved column.
The new one is of list_nth() counters for items in the index list.
That seems weird and it scares me -- do we use that coding pattern
elsewhere?  Maybe use an array of OIDs instead?


Using a list or array instead of a bitmap requires much more space...
And bitmaps are much more efficient for many operations: checking for
element presence, combining, intersecting, ...
A bitmap membership check is O(1), so iterating through the list of all
indexes or attributes with a bitmap check doesn't add any essential overhead.


Sorry, I do not understand this: "The new one is of list_nth() counters
for items in the index list"


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: PATCH: Exclude unlogged tables from base backups

2018-03-23 Thread Teodor Sigaev


Thank you, pushed

David Steele wrote:

On 1/29/18 8:10 PM, Masahiko Sawada wrote:

On Tue, Jan 30, 2018 at 5:45 AM, Adam Brightwell


If it is agreed that the temp file exclusion should be submitted as a
separate patch, then I will mark 'ready for committer'.


Agreed, please mark this patch as "Ready for Committer".


Attached is a rebased patch that applies cleanly.

Thanks,



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: [HACKERS] Surjective functional indexes

2018-03-23 Thread Konstantin Knizhnik



On 22.03.2018 23:53, Alvaro Herrera wrote:

The whole IsProjectionFunctionalIndex looks kinda bogus/ugly to me.  Set
the boolean to false, but keep evaluating anyway?  But then, I thought
the idea was to do this based on the reloption, not by comparing the
expression cost to a magical (unmodifiable) value?
The idea is very simple; I tried to explain it in the comments. Sorry if
my explanation was not clear enough.
By default we assume all functional indexes are projections. This is
done by the initial assignment is_projection = true.

Then we check the cost of the index function.
I agree that a "magical (unmodifiable) value" used as the cost threshold
is not a good idea, but I tried to avoid adding yet another configuration
parameter as much as possible.
I do not think that flexibility here is so important. I believe that in
99% of cases functional indexes are projections,
and in all other cases the DBA can explicitly specify it using the
recheck_on_update index option.
That option should override the cost threshold: if the DBA thinks that
recheck is useful for this index, then it should be used despite the
preliminary guess based on the index expression cost.
This is why, after setting "is_projection" to false because the cost of
the index expression is too large, we continue and check the index
options for an explicitly specified value.
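
To illustrate the intended usage (a sketch with a made-up table, using the
recheck_on_update reloption as I understand it from the patch):

    -- cheap expression: treated as a projection by default, so it is
    -- recomputed on UPDATE and the update can stay HOT when the indexed
    -- value is unchanged
    create index doc_title_idx on documents ((lower(title)));

    -- expensive expression: force the recheck anyway, overriding the
    -- cost-based guess
    create index doc_sig_idx on documents ((md5(body)))
        with (recheck_on_update = true);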




In RelationGetIndexAttrBitmap(), indexattrs no longer gets the columns
corresponding to projection indexes.  Isn't that weird/error
prone/confusing?  I think it'd be saner to add these bits to both
bitmaps.
This bitmap is needed only to mark attributes whose modification
prevents HOT update.
Should I rename rd_indexattr to rd_hotindexattr or something else to
avoid confusion?
Please notice that rd_indexattr is used only inside
RelationGetIndexAttrBitmap and is not visible from outside.

It is returned for INDEX_ATTR_BITMAP_HOT IndexAttrBitmapKind.


Please update the comments ending in heapam.c:4188, and generally all
comments that you should update.

Sorry, did you apply projection-7.patch?
I have checked all trailing spaces and other indentation issues.
At least there is no trailing space at heapam.c:4188...



Please keep serial_schedule in sync with parallel_schedule.


Sorry, once again I do not understand your complaint: I have added
func_index to both the serial and parallel schedules.



Also, pgindent.



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: max_memory_per_backend GUC to limit backend's memory usage

2018-03-23 Thread Andres Freund
Hi,

On 2018-03-23 15:58:55 +, Vladimir Sitnikov wrote:
> I've got a problem with PostgreSQL 9.6.5: backend gets killed by OOM
> killer, and it shuts the DB down.
> Of course, the OOM case is to be investigated (MemoryContextStatsDetail,
> etc), however I wonder if DB can be more robust.

Configuring overcommit_memory to not overcommit should do the trick.
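
For reference, on Linux that typically means something like this (a sketch;
the exact ratio depends on your RAM and swap sizing):

    # /etc/sysctl.conf
    vm.overcommit_memory = 2    # strict accounting, no overcommit
    vm.overcommit_ratio = 80    # optional; percent of RAM added to swap
                                # when computing the commit limit

    # apply without a reboot
    sysctl -p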

Greetings,

Andres Freund



max_memory_per_backend GUC to limit backend's memory usage

2018-03-23 Thread Vladimir Sitnikov
Hi,

I've got a problem with PostgreSQL 9.6.5: backend gets killed by OOM
killer, and it shuts the DB down.
Of course, the OOM case is to be investigated (MemoryContextStatsDetail,
etc), however I wonder if DB can be more robust.
The sad thing is a single backend crash results in the DB shutdown, so it
interrupts lots of transactions.

I wonder if a GUC can be implemented, so that it could fail just a single
backend by limiting its memory use.
For instance: max_memory_per_backend=100MiB.
The idea is to increase stability by limiting each process. Of course it
would result in "out of memory" in case a single query requires 100500MiB
(e.g. the planner underestimates the hash join). As far as I understand, it
should be safer to terminate just one bad backend rather than kill all the
processes.

I did some research, and I have not found the discussion of this idea.

Vladimir Rusinov> FWIW, lack of per-connection and/or global memory limit
for work_mem is major PITA
>
https://www.postgresql.org/message-id/CAE1wr-ykMDUFMjucDGqU-s98ARk3oiCfhxrHkajnb3f%3DUp70JA%40mail.gmail.com


Vladimir


Re: [HACKERS] Surjective functional indexes

2018-03-23 Thread Simon Riggs
On 23 March 2018 at 15:39, Konstantin Knizhnik
 wrote:
>
>
> On 22.03.2018 23:37, Alvaro Herrera wrote:
>>
>> The rd_projidx (list of each nth element in the index list that is a
>> projection index) thing looks odd.  Wouldn't it make more sense to have
>> a list of index OIDs that are projection indexes?
>>
>
> rd_projidx is not a list, it is Bitmapset. It is just one of many bitmap
> sets in RelationData:
>
> Bitmapset  *rd_indexattr;/* identifies columns used in indexes */
> Bitmapset  *rd_keyattr;/* cols that can be ref'd by foreign keys
> */
> Bitmapset  *rd_pkattr;/* cols included in primary key */
> Bitmapset  *rd_idattr;/* included in replica identity index */
>
> Yes, it is possible to construct Oid list of projectional indexes instead of
> having bitmap which marks some indexes in projectional, but I am not sure
> that
> it will be more efficient both from CPU and memory footprint point of view
> (in most indexes bitmap will consists of just one integer). In any case, I
> do not think that it can have some measurable impact on performance or
> memory size:
> number of indexes and especially projectional indexes will very rarely
> exceed 10.

Alvaro's comment seems reasonable to me.

The patch adds both rd_projindexattr and rd_projidx.
rd_projindexattr should be a Bitmapset, but rd_projidx looks strange
now that he mentions it.

Above that in RelationData we have other structures that are Lists of
OIDs, so Alvaro's proposal makes sense.

That would simplify the code in ProjectionIsNotChanged() by just
looping over the list of projection indexes rather than the list of
all indexes.

So please could you make the change?

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Surjective functional indexes

2018-03-23 Thread Alvaro Herrera
Konstantin Knizhnik wrote:

> rd_projidx is not a list, it is Bitmapset. It is just one of many bitmap
> sets in RelationData:

Yes, but the other bitmapsets are of AttrNumber of the involved column.
The new one is of list_nth() counters for items in the index list.
That seems weird and it scares me -- do we use that coding pattern
elsewhere?  Maybe use an array of OIDs instead?

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Do I understand commit timestamps correctly?

2018-03-23 Thread Alvaro Herrera
Chapman Flack wrote:
> Hi,
> 
> Can somebody confirm or correct what I (think I)'ve gleaned from
> the code?
> 
> - Commit timestamps are always WAL logged, and so in principle
>   determinable after the fact (with some amount of effort), regardless
>   of the track_commit_timestamp setting. (I guess this must have long
>   been true, to support recovery_target_time.)

Right.

> - The extra machinery turned on by track_commit_timestamp maintains
>   a cache of recent ones so they can be efficiently queried from SQL
>   in normal operation.

Yes.
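
For illustration, once the setting is on the cache can be queried like this
(a minimal sketch; the table is made up):

    -- postgresql.conf: track_commit_timestamp = on   (needs a restart)

    select xmin, pg_xact_commit_timestamp(xmin) as committed_at
    from some_table limit 1;

    -- or, for the last committed transaction overall:
    select * from pg_last_committed_xact();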

> ? Given a base backup and a bunch of WAL from a cluster that had
>   track_commit_timestamps turned off, is it possible (in principle?)
>   to do a PITR with the switch turned on, and have the commit_ts
>   cache get populated (at least from the transactions encountered
>   in the WAL)? Could that be done by changing the setting in
>   postgresql.conf for the recovery, or would it take something more
>   invasive, like poking the value in pg_control? Or would that just
>   make something fall over? Would it require dummying up some commit_ts
>   files first?

I don't remember if this is explicitly supported, but yeah, AFAIR it
should work to just start the "promoted standby" (in your case just a
recovered backup) after setting the option in postgresql.conf.  This
is because StartupCommitTs() activates the commit_ts module just before
starting recovery.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Surjective functional indexes

2018-03-23 Thread Konstantin Knizhnik



On 22.03.2018 23:37, Alvaro Herrera wrote:

The rd_projidx (list of each nth element in the index list that is a
projection index) thing looks odd.  Wouldn't it make more sense to have
a list of index OIDs that are projection indexes?



rd_projidx is not a list, it is a Bitmapset. It is just one of many bitmap 
sets in RelationData:


    Bitmapset  *rd_indexattr;    /* identifies columns used in indexes */
    Bitmapset  *rd_keyattr;        /* cols that can be ref'd by foreign 
keys */

    Bitmapset  *rd_pkattr;        /* cols included in primary key */
    Bitmapset  *rd_idattr;        /* included in replica identity index */

Yes, it is possible to construct an Oid list of projection indexes
instead of having a bitmap which marks some indexes as projections, but
I am not sure that it will be more efficient from either a CPU or a
memory footprint point of view (in most cases the bitmap will consist of
just one integer). In any case, I do not think that it can have any
measurable impact on performance or memory size: the number of indexes,
and especially of projection indexes, will very rarely exceed 10.



--

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: GSOC 2018 Ideas

2018-03-23 Thread Jaime Soler
You may have a look at https://wiki.postgresql.org/wiki/GSoC_2018 and tell
us whether you like any of them.


2018-03-23 15:32 GMT+01:00 Vimal Rathod :

> Hey there,
>
> I am Vimal currently pursuing my undergraduate degree in computer science ,I 
> would love to work with you via
> the GSoC program this summer.
>
> I wanted to know which all projects I can work on. I have good knowledge in 
> c,java,python,SQL and other web development languages.Can anyone suggest me a 
> good project I can contribute to? It would be of great help.
>
> Thanks alot in advance.Hope you reply soon.
>
>


Do I understand commit timestamps correctly?

2018-03-23 Thread Chapman Flack
Hi,

Can somebody confirm or correct what I (think I)'ve gleaned from
the code?

- Commit timestamps are always WAL logged, and so in principle
  determinable after the fact (with some amount of effort), regardless
  of the track_commit_timestamp setting. (I guess this must have long
  been true, to support recovery_target_time.)

- The extra machinery turned on by track_commit_timestamp maintains
  a cache of recent ones so they can be efficiently queried from SQL
  in normal operation.

? Given a base backup and a bunch of WAL from a cluster that had
  track_commit_timestamps turned off, is it possible (in principle?)
  to do a PITR with the switch turned on, and have the commit_ts
  cache get populated (at least from the transactions encountered
  in the WAL)? Could that be done by changing the setting in
  postgresql.conf for the recovery, or would it take something more
  invasive, like poking the value in pg_control? Or would that just
  make something fall over? Would it require dummying up some commit_ts
  files first?

Thanks,
-Chap



Re: [HACKERS] logical decoding of two-phase transactions

2018-03-23 Thread Simon Riggs
On 5 March 2018 at 16:37, Nikhil Sontakke  wrote:

>>
>> I will re-submit with "git format-patch" soon.
>>
> PFA, patches in "format-patch" format.
>
> This patch set also includes changes in the test_decoding plugin along
> with an additional savepoint related test case that was pointed out on
> this thread, upstream.

Reviewing 0001-Cleaning-up-and-addition-of-new-flags-in-ReorderBuff.patch

The change from is_known_as_subxact to rbtxn_is_subxact
loses some meaning, since rbtxn entries with this flag set to false might
still be subxacts; we just don't know yet.

rbtxn_is_serialized refers to RBTXN_SERIALIZED,
so the flag name should be RBTXN_IS_SERIALIZED to match.

Otherwise looks OK to commit


Reviewing 0003-Add-support-for-logging-GID-in-commit-abort-WAL-reco

Looks fine, reworked patch attached
* added changes to xact.h from patch 4 so that this is a whole,
committable patch
* added comments to make abort and commit structs look same

Attached patch is proposed for a separate, early commit as part of this

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


logging-GID-in-commit-abort-WAL.v2.patch
Description: Binary data


Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()

2018-03-23 Thread Andres Freund
On 2018-03-23 11:06:48 +, Simon Riggs wrote:
> Your assumption that I would commit a new patch that was 29 mins old
> is frankly pretty ridiculous, so yes, lets keep calm.

Uh.



Re: [PATCH] Verify Checksums during Basebackups

2018-03-23 Thread David Steele
Hi Michael,

On 3/23/18 5:36 AM, Michael Banck wrote:
> Am Donnerstag, den 22.03.2018, 12:22 -0400 schrieb David Steele:
>>
>> +if (phdr->pd_checksum != checksum)
>>
>> I've attached a patch that adds basic retry functionality.  It's not
>> terribly efficient since it rereads the entire buffer for any block
>> error.  A better way is to keep a bitmap for each block in the buffer,
>> then on retry compare bitmaps.  If the block is still bad, report it.
>> If the block was corrected, move on.  If a block was good before but is
>> bad on retry it can be ignored.
> 
> I have to admit I find it a bit convoluted and non-obvious on first
> reading, but I'll try to check it out some more.

Yeah, I think I was influenced too much by how pgBackRest does things,
which doesn't work as well here.  Attached is a simpler version.

> I think it would be very useful if we could come up with a testcase
> showing this problem, but I guess this will be quite hard to hit
> reproducibly, right?

This was brought up by Robert in [1] when discussing validating
checksums during backup.  I don't know of any way to reproduce this
issue but it seems perfectly possible, if highly unlikely.

>> +ereport(WARNING,
>> +(errmsg("checksum verification failed in file "
>>
>> I'm worried about how verbose this warning could be since there are
>> 131,072 blocks per segment.  It's unlikely to have that many block
>> errors, but users do sometimes put files in PGDATA which look like they
>> should be validated.  Since these warnings all go to the server log it
>> could get pretty bad.
> 
> We only verify checksums of files in tablespaces, and I don't think
> dropping random files in those is supported in any way, as opposed to
> maybe the top-level PGDATA directory. So I would say that this is not a
> real concern.

Perhaps, but a very corrupt file is still going to spew lots of warnings
into the server log.

>> We should either stop warning after the first failure, or aggregate the
>> failures for a file into a single message.
> 
> I agree that major corruption could make the whole output blow up but I
> would prefer to keep this feature simple for now, which implies possibly
>  printing out a lot of WARNING or maybe just stopping after the first
> one (or first few, dunno).

In my experience actual block errors are relatively rare, so there
aren't likely to be more than a few in a file.  More common are
overwritten or transposed files, rogue files, etc.  These produce a lot
of output.

Maybe stop after five?

Regards,
-- 
-David
da...@pgmasters.net

[1]
https://www.postgresql.org/message-id/CA%2BTgmobHd%2B-yVJHofSWg%3Dg%2B%3DA3EiCN2wsAiEyj7dj1hhevNq9Q%40mail.gmail.com
diff --git a/src/backend/replication/basebackup.c 
b/src/backend/replication/basebackup.c
index 9f735a2c07..fe7285f7a2 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1258,6 +1258,7 @@ sendFile(const char *readfilename, const char 
*tarfilename, struct stat *statbuf
int i;
pgoff_t len = 0;
charpage[BLCKSZ];
+   boolblock_retry = false;
size_t  pad;
PageHeader  phdr;
int segmentno = 0;
@@ -1341,6 +1342,50 @@ sendFile(const char *readfilename, const char 
*tarfilename, struct stat *statbuf
phdr = (PageHeader) page;
if (phdr->pd_checksum != checksum)
{
+   /*
+* Retry the block on the first 
failure.  It's possible
+* that we read the first 4K 
page of the block just
+* before postgres updated the 
entire block so it ends
+* up looking torn to us.  We 
only need to retry once
+* because the LSN should be 
updated to something we can
+* ignore on the next pass.  If 
the error happens again
+* then it is a true validation 
failure.
+*/
+   if (block_retry == false)
+   {
+   /* Reread the failed 
block */
+   if (fseek(fp, -(cnt - 
BLCKSZ * i), SEEK_CUR) == -1)
+   {
+   ereport(ERROR,
+   
(errcode_for_file_access(),
+   

Re: PQHost() undefined behavior if connecting string contains both host and hostaddr types

2018-03-23 Thread Haribabu Kommi
On Thu, Mar 22, 2018 at 12:28 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 3/21/18 03:40, Michael Paquier wrote:
> >>> Moreover, I wonder whether we shouldn't remove the branch where
> >>> conn->connhost is NULL.  When would that be the case?  The current
> >>> behavior is to sometimes return the actual host connected to, and
> >>> sometimes the host list.  That doesn't make sense.
> >> Scenarios where the connection is not yet established, in that scenario
> >> the PQhost() can return the provided connection host information.
> >>
> >> Other than the above, it always returns the proper host details.
> > That remark is from me upthread.  In the case of a non-established
> > connection, I think that we ought to return that.
>
> So, if the connection object is NULL, PQhost() returns NULL.  While the
> connection is being established (whatever that means), it returns
> whatever was specified as host.  And when the connection is established,
> it returns the host actually connected to.  That seems pretty crazy.  It
> should do only one or the other.  Especially since there is, AFAICT, no
> way to know at run time whether the value it returned just then is one
> or the other.


OK.

Here I attached the updated patch that returns either the connected
host/hostaddr
or NULL in case the connection is not established.

I removed returning the default host details, because the default host
details are also available via the connhost member itself.

Regards,
Hari Babu
Fujitsu Australia


PQhost-to-return-connected-host-and-hostaddr-details_v3.patch
Description: Binary data


Re: [PATCH] pg_hba.conf : new auth option : clientcert=verify-full

2018-03-23 Thread Julian Markwort
On Sat, 2018-03-17 at 18:24 +0100, Magnus Hagander wrote:
> > The error message "certificate authentication failed for user XYZ:
> > 
> > client certificate contains no user name" is the result of calling
> > 
> > CheckCertAuth when the user presented a certificate without a CN in
> > it.
> 
> That is arguably wrong, since it's actually password authentication
> that fails. That is the authentication type that was picked in
> pg_hba.conf. It's more "certificate validation" that failed.

I think we got confused about this; maybe I didn't grasp it fully
before: CheckCertAuth is currently only called when the auth method cert is
used. So it actually makes sense to say that certificate authentication
failed, I think.

> I agree that the log message is useful. Though it could be good to
> clearly indicate that it was caused specifically because of the
> verify-full, to differentiate it from other cases of the same
> message.
I've modified my patch so it still uses CheckCertAuth, but now a
different message is written to the log when clientcert=verify-full is
used.
For the auth method cert, the function should behave as before.


> For example, what about the scenario where I use GSSAPI
> authentication and clientcert=verify-full. Then we suddenly have
> three usernames (gssapi, certificate and specified) -- how is the
> user supposed to know which one came from the cert and which one came
> from gssapi for example?
The user will only see what's printed in the auth_failed() function in
auth.c with the addition of the logdetail string, which I don't touch
with this patch.
As you said, it makes sense that more detailed information is only
available in the server's log.

I've attached an updated version of the patch.
I'm not sure whether it is preferred to keep patches as short as possible
(mostly with respect to the changed lines in the documentation) or to
organize changes so that the text matches the surrounding column width
and text flow. Also, I've omitted mentions of the current usage
'clientcert=1' - this is still supported in code, but I think telling
new users only about 'clientcert=verify-ca' and 'clientcert=verify-full'
is clearer. Or am I wrong on this one?
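
For example, hypothetical pg_hba.conf entries would then look like this
(made-up addresses and methods, just to show the two spellings):

    # TYPE   DATABASE  USER  ADDRESS         METHOD  OPTIONS
    hostssl  all       all   0.0.0.0/0       md5     clientcert=verify-ca
    hostssl  all       all   192.168.0.0/16  gss     clientcert=verify-full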

Greetings
Julian

diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index 53832d0..11c5961 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -563,10 +563,17 @@ hostnossl  database  user
   
In addition to the method-specific options listed below, there is one
method-independent authentication option clientcert, which
-   can be specified in any hostssl record.  When set
-   to 1, this option requires the client to present a valid
-   (trusted) SSL certificate, in addition to the other requirements of the
-   authentication method.
+   can be specified in any hostssl record.
+   This option can be set to verify-ca or
+   verify-full. Both options require the client
+   to present a valid (trusted) SSL certificate, while
+   verify-full additionally enforces that the
+   CN (Common Name) in the certificate matches
+   the username or an applicable mapping.
+   This behaviour is similar to the cert authentication method
+   ( See  ) but enables pairing
+   the verification of client certificates with any authentication
+   method that supports hostssl entries.
   
  
 
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index 587b430..a3eb180 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -2263,13 +2263,24 @@ pg_dumpall -p 5432 | psql -d postgres -p 5433
(CAs) you trust in a file in the data
directory, set the parameter  in
postgresql.conf to the new file name, and add the
-   authentication option clientcert=1 to the appropriate
+   authentication option clientcert=verify-ca or
+   clientcert=verify-full to the appropriate
hostssl line(s) in pg_hba.conf.
A certificate will then be requested from the client during SSL
connection startup.  (See  for a description
-   of how to set up certificates on the client.)  The server will
-   verify that the client's certificate is signed by one of the trusted
-   certificate authorities.
+   of how to set up certificates on the client.)
+  
+
+  
+   For a hostssl entry with 
+   clientcert=verify-ca, the server will verify
+   that the client's certificate is signed by one of the trusted
+   certificate authorities. If clientcert=verify-full
+   is specified, the server will not only verify the certificate
+   chain, but it will also check whether the username or its
+   mapping match the common name (CN) of the provided certificate.
+   Note that this is always ensured when cert
+   authentication method is used (See ).
   
 
   
@@ -2293,14 +2304,33 @@ pg_dumpall -p 5432 | psql -d postgres -p 5433
client certificates against its CA file, if one is configured  but
it will not insist that a client 

Re: Odd procedure resolution

2018-03-23 Thread Ashutosh Bapat
On Fri, Mar 23, 2018 at 7:53 PM, Tom Lane  wrote:
> Ashutosh Bapat  writes:
>> Incidently the fix looks quite simple. See patch attached.
>
> ISTM this patch effectively proposes to make procedures have their own
> namespace yet still live in pg_proc.  That is the worst of all possible
> worlds IMO.  Somewhere early in this patch series, I complained that
> procedures should be in a different namespace and therefore not be kept
> in pg_proc but in some new catalog.  That argument was rejected on the
> grounds that SQL requires them to be in the same namespace, which I
> wasn't particularly sold on, but that's where we are.  If they are in
> the same namespace, though, we have to live with the consequences of
> that, including ambiguity.  Otherwise there will soon be questions
> like "well, why can't I create both function foo(int) and procedure
> foo(int), seeing that there's no question which of them a particular
> statement intends to call?".
>

That question did cross my mind and I think that's a valid question.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: PATCH: Configurable file mode mask

2018-03-23 Thread Peter Eisentraut
I have committed a basic pg_resetwal test suite as part of another
patch, so please adjust accordingly.

It looks to me like your pg_resetwal tests contain a bit of useless
copy-and-paste.  For example, you are not using use Config, nor
$tempdir, and you run $node->stop right after $node->start.  Please
clean that up a bit, or comment, if appropriate.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-23 Thread Alvaro Herrera
Thanks for these changes.  I'm going over this now, with intention to
push it shortly.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



GSOC 2018 Ideas

2018-03-23 Thread Vimal Rathod
Hey there,

I am Vimal, currently pursuing my undergraduate degree in computer
science, and I would love to work with you via
the GSoC program this summer.

I wanted to know which projects I can work on. I have good
knowledge of C, Java, Python, SQL and other web development languages. Can
anyone suggest a good project I can contribute to? It would be of
great help.

Thanks a lot in advance. Hope you reply soon.


Re: Odd procedure resolution

2018-03-23 Thread Tom Lane
Ashutosh Bapat  writes:
> Incidently the fix looks quite simple. See patch attached.

ISTM this patch effectively proposes to make procedures have their own
namespace yet still live in pg_proc.  That is the worst of all possible
worlds IMO.  Somewhere early in this patch series, I complained that
procedures should be in a different namespace and therefore not be kept
in pg_proc but in some new catalog.  That argument was rejected on the
grounds that SQL requires them to be in the same namespace, which I
wasn't particularly sold on, but that's where we are.  If they are in
the same namespace, though, we have to live with the consequences of
that, including ambiguity.  Otherwise there will soon be questions
like "well, why can't I create both function foo(int) and procedure
foo(int), seeing that there's no question which of them a particular
statement intends to call?".

regards, tom lane



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-03-23 Thread Teodor Sigaev
I suggest a patch where pgbench client sessions are not disconnected because of
serialization or deadlock failures and these failures are mentioned in reports.
In detail:
- a transaction with one of these failures continues to run normally, but its
result is rolled back;

- if there were such failures during script execution, this "transaction" is
marked
appropriately in the logs;
- the numbers of "transactions" with these failures are printed in the progress
output, in aggregation logs and at the end with the other results (overall and
for each script);

Hm, I took a look at both threads about the patch and it seems to me that it's
now overcomplicated. With the recently committed enhancements of pgbench (\if,
\when) it becomes close to impossible to retry a transaction in case of failure.
So the initial approach of just rolling back such a transaction looks more
attractive.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: [HACKERS] A design for amcheck heapam verification

2018-03-23 Thread Andrey Borodin
Hi!

> On 8 Feb 2018, at 22:45, Peter Geoghegan  wrote:
> 
> On Thu, Feb 8, 2018 at 6:05 AM, Andrey Borodin  wrote:
>> I do not see a reason behind hashing the seed.
> 
> It made some sense when I was XOR'ing it to mix. A uniform
> distribution of bits seemed desirable then, since random() won't use
> the most significant bit -- it generates random numbers in the range
> of 0 to 2^31-1. It does seem unnecessary now.
> 
>> Also, I'd like to reformulate this paragraph. I understand what you want to 
>> say, but the sentence is incorrect.
>> + * The Bloom filter behaves non-deterministically when caller passes a 
>> random
>> + * seed value.  This ensures that the same false positives will not occur 
>> from
>> + * one run to the next, which is useful to some callers.
>> The Bloom filter behaves deterministically, but differently. This does not
>> ensure anything, but will probably give something with high probability.
> 
> I agree that that's unclear. I should probably cut it down, and say
> something like "caller can pass a random seed to make it unlikely that
> the same false positives will occur from one run to the next".

I've just flipped the patch to WoA. But once the above issues are fixed, I think 
the patch is ready for committer.

Best regards, Andrey Borodin.


Re: [HACKERS] MERGE SQL Statement for PG11

2018-03-23 Thread Simon Riggs
On 23 March 2018 at 11:26, Pavan Deolasee  wrote:
>
>
> On Fri, Mar 23, 2018 at 6:45 AM, Peter Geoghegan  wrote:
>>
>> On Thu, Mar 22, 2018 at 6:02 PM, Alvaro Herrera 
>> wrote:
>> > Incremental development is a good thing.  Trying to do everything in a
>> > single commit is great when time is infinite or even merely very long,
>> > but if you run out of it, which I'm sure is common, leaving some things
>> > out that can be reasonable implemented in a separate patch is perfectly
>> > acceptable.
>>
>> We're talking about something that took me less than an hour to get
>> working. AFAICT, it's just a matter of tweaking the grammar, and
>> adding a bit of transformWithClause() boilerplate to the start of
>> transformMergeStmt().
>>>
>>>
>>> I quickly implemented CTE support myself (not wCTE support, since
>>> MERGE doesn't use RETURNING), and it wasn't tricky. It seems to work
>>> when I mechanically duplicate the approach taken with other types of
>>> DML statement in the parser. I have written a few tests, and so far it
>>> holds up.

Peter, if you have the code and you consider it important that this
subfeature is in PostgreSQL, why not post the code so we can commit
it?

Why would we repeat what has already been done?

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: FOR EACH ROW triggers on partitioned tables

2018-03-23 Thread Alvaro Herrera
Pushed.  Thanks for all the review.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2018-03-23 Thread Daniel Verite
Kyotaro HORIGUCHI wrote:

> A disucssion on psql batch mode was held in another branch of
> this thread. How do we treat that?

There's a batch mode for pgbench in a patch posted in [1],
with \beginbatch and \endbatch commands, but nothing
for psql AFAICS.
psql is more complicated because currently it uses a 
blocking PQexec() call at its core. Craig mentioned psql
integration in [2] and [3].
Also a script can have inter-query dependencies such as in
   insert into table(...) returning id \gset
   update othertable set col= :id where ...;
which is a problem in batch mode, as we don't want
to send the update before the right value for :id is
known. Whether we want to support these dependencies
and how needs discussion.
For instance we might not support them at all, or
create a synchronization command that collects
all results of queries sent so far, or do it implicitly
when a variable is injected into a query...
This looks like substantial work that might be best
done separately from the libpq patch.

[1]
https://www.postgresql.org/message-id/b4e34135-2bd9-4b8a-94ca-27d760da2...@manitou-mail.org
[2]
https://www.postgresql.org/message-id/CAMsr+YGLhaDkjymLuNVQy4MrSKQoA=F1vO=aN8XQf30N=aq...@mail.gmail.com
[3]
https://www.postgresql.org/message-id/CAMsr+YE6BK4iAaQz=ny3xdnblhnnz_4tp-ptjqbnnpszmgo...@mail.gmail.com


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: [HACKERS] Add support for tuple routing to foreign partitions

2018-03-23 Thread Robert Haas
On Fri, Mar 23, 2018 at 8:22 AM, Etsuro Fujita
 wrote:
>> I think for bulk
>> inserts we'll need an API that says "here's a row, store it or buffer
>> it as you like" and then another API that says "flush any buffered
>> rows to the actual table and perform any necessary cleanup".  Or maybe
>> in postgres_fdw the first API could start a COPY if not already done
>> and send the row, and the second one could end the COPY.
> That's really what I have in mind.

So let's do it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] MERGE SQL Statement for PG11

2018-03-23 Thread Amit Langote
On 2018/03/23 20:07, Pavan Deolasee wrote:
> On Fri, Mar 23, 2018 at 12:57 PM, Amit Langote wrote:
>> Also, it seems that the delta patch I sent in the last email didn't
>> contain all the changes I had to make.  It didn't contain, for example,
>> replacing adjust_and_expand_inherited_tlist() with
>> adjust_partition_tlist().  I guess you'll know when you rebase anyway.
>>
> 
> Yes, I am planning to fix that once the ON CONFLICT patch is
> ready/committed.

OK, thanks.

>> 3. I think the comment above this should be updated to explain why the
>> map_index is being set to the leaf index in case of MERGE instead of the
>> subplan index as is done in case of plain UPDATE:
>>
>> -map_index = resultRelInfo - mtstate->resultRelInfo;
>> -Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
>> -tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
>> +if (mtstate->operation == CMD_MERGE)
>> +{
>> +map_index = resultRelInfo->ri_PartitionLeafIndex;
>> +Assert(mtstate->rootResultRelInfo == NULL);
>> +tupconv_map = TupConvMapForLeaf(proute,
>> +mtstate->resultRelInfo,
>> +map_index);
>> +}
>> +else
>> +{
>>
> 
> Done. I wonder though if we should just always set ri_PartitionLeafIndex
> even for regular UPDATE and always use that to retrieve the map.

In the regular UPDATE case, we'd be looking at a resultRelInfo that's from
the ModifyTableState's per-subplan result rel array and the
TupleConversionMap array perhaps accepts subscripts that are in range 0 to
mtstate->nplans - 1 (not 0 to nparts - 1), so using ri_PartitionLeafIndex
there would be incorrect, I think.

>> 4. Do you think it would be possible at a later date to change this junk
>> attribute to contain something other than "tableoid"?
>>
>> +if (operation == CMD_MERGE)
>> +{
>> +j->jf_otherJunkAttNo =
>> ExecFindJunkAttribute(j, "tableoid");
>> +if
>> (!AttributeNumberIsValid(j->jf_otherJunkAttNo))
>> +elog(ERROR, "could not find junk tableoid
>> column");
>> +
>> +}
>>
>> Currently, it seems ExecMergeMatched will take this OID and look up the
>> partition (its ResultRelInfo) by calling ExecFindPartitionByOid which in
>> turn looks it up in the PartitionTupleRouting struct.  I'm imagining it
>> might be possible to instead return an integer that specifies "a partition
>> number".  Of course, nothing like that exists currently, but just curious
>> if we're going to be "stuck" with this junk attribute always containing
>> "tableoid".  Or maybe putting a "partition number" into the junk attribute
>> is not doable to begin with.
>>
> 
> I am not sure. Wouldn't adding a new junk column require a whole new
> machinery? It might be worth adding it someday to reduce the cost
> associated with the lookups. But I don't want to include the change in this
> already largish patch.

No, I wasn't suggesting that we do that in this patch.  Also, I wasn't
saying that we should store the "partition index" in a *new* junk column,
but rather *instead of* the tableoid.  My question was whether we can
replace the tableoid that we are going to store with this patch in the
MERGE source plan's targetlist with something else in the future.

>> 5. In ExecInitModifyTable, in the if (node->mergeActionList) block:
>>
>> +
>> +/* initialize slot for the existing tuple */
>> +mtstate->mt_existing = ExecInitExtraTupleSlot(estate, NULL);
>>
>> Maybe a good idea to Assert that mt_existing is NULL before.  Also, why
>> not set the slot's descriptor right away if tuple routing is not going to
>> be used.  I did it that way in the ON CONFLICT DO UPDATE patch.
>>
> 
> Yeah, I plan to update this code once the other patch gets in. This change
> was mostly borrowed from your/Alvaro's patch, but I don't think it will be
> part of the MERGE patch if the ON CONFLICT DO UPDATE patch gets in ahead of
> this.

OK.

>> 6. I see that there is a slot called mergeSlot that becomes part of
>> ResultRelInfo of the table (partition) via ri_MergeState.  That means we
>> might end up creating as many slots as there are partitions (* number of
>> actions?).  Can't we have just one, say, mt_mergeproj in ModifyTableState
>> similar to mt_conflproj and just reset its descriptor before use.  I guess
>> reset will have happen before carrying out an action applied to a given
>> partition.  When I tried that (see attached delta), nothing got broken.
>>
>>
> Thanks! It was on my TODO list. So thanks for taking care of it. I've
> included your patch in the main patch. I imagine we can similarly set the
> tuple descriptor for this slot during initialisation if target table is a
> non-partitioned table. But I shall take care of that along 

Re: [HACKERS] Add support for tuple routing to foreign partitions

2018-03-23 Thread Etsuro Fujita

(2018/03/23 21:02), Robert Haas wrote:

On Fri, Mar 23, 2018 at 7:55 AM, Etsuro Fujita
  wrote:

Maybe I'm missing something, but I think the proposed FDW API could be used
for the COPY case as well with some modifications to core.  If so, my
question is: should we support COPY into foreign tables as well?  I think
that if we support COPY tuple routing for foreign partitions, it would be
better to support direct COPY into foreign partitions as well.


Yes, I really, really want to be able to support COPY.  If you think
you can make that work with this API -- let's see it!


OK


3. It looks like we're just doing an INSERT for every row, which is
pretty much an anti-pattern for inserting data into a PostgreSQL
database.  COPY is way faster, and even multi-row inserts are
significantly faster.


I planned to work on a new FDW API for using COPY for COPY tuple routing [1],
but I didn't have time for that in this development cycle, so I'm
re-planning to work on that for PG12.  I'm not sure we can optimize that
insertion using multi-row inserts because tuple routing works row by row, as
you know.  Anyway, I think these would be beyond the scope of the first
version.


My concern is that if we add APIs now that only support single-row
inserts, we'll have to rework them again in order to support multi-row
inserts.  I'd like to avoid that, if possible.


Yeah, but we would have this issue for normal inserts into foreign 
tables.  For the normal case, what I'm thinking of doing to support multi-row 
inserts is to use the DirectModify FDW API.  With this API, I think we could 
support pushing down INSERT with multiple VALUES sublists to the remote, 
but I'm not sure we could extend that to the tuple-routing case.



I think for bulk
inserts we'll need an API that says "here's a row, store it or buffer
it as you like" and then another API that says "flush any buffered
rows to the actual table and perform any necessary cleanup".  Or maybe
in postgres_fdw the first API could start a COPY if not already done
and send the row, and the second one could end the COPY.


That's really what I have in mind.

Best regards,
Etsuro Fujita



Re: [HACKERS] path toward faster partition pruning

2018-03-23 Thread Amit Langote
Thanks for the review.

On 2018/03/21 6:29, Robert Haas wrote:
> On Tue, Mar 20, 2018 at 7:07 AM, Amit Langote
>  wrote:
>> On 2018/03/16 21:55, Amit Langote wrote:
>>> Attached updated patches.
>>
>> Attached is further revised version.
>>
>> Of note is getting rid of PartitionPruneContext usage in the static
>> functions of partprune.c.  Most of the code there ought to only run during
>> planning, so it can access the necessary information from RelOptInfo
>> directly instead of copying it to PartitionPruneContext and then passing
>> it around.
> 
> +if (rte->relkind == RELKIND_PARTITIONED_TABLE)
> +{
> +if (rel->baserestrictinfo != NIL)
> +{
> +live_children = prune_append_rel_partitions(root, rel);
> +did_pruning = true;
> +}
> +}
> 
> Use &&

Fixed in the latest version.

> 
> +case COMBINE_OR:
> +{
> 
> Won't survive pgindent, which currently produces a *massive* diff for
> these patches.

That's gone in the latest patch.

Things for the overall patch should have improved in the latest version.

> +/*
> + * XXX- The following ad-hoc method of pruning only works for 
> list
> + * partitioning.  It checks for each partition if all of its
> + * accepted values appear in ne_datums[].
> + */
> 
> So why are we doing it this way?  How about doing something not
> ad-hoc?  I tried to propose that before.

Hmm, perhaps I should have written a better comment.

What I really meant to say is that pruning using <> operators can be
implemented sanely only for list partitions.  We can prune a given
partition with <> clauses, only if we can find a <> clause for *all*
values that the partition accepts.  Doing so seems doable only for list
partitions where we require enumerating all values that the partition may
contain.
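
A made-up example of the only case this targets:

    create table t (c text) partition by list (c);
    create table t_ab partition of t for values in ('a', 'b');
    create table t_cd partition of t for values in ('c', 'd');

    -- t_ab can be pruned here only because the <> clauses cover *all* of
    -- its accepted values; for range or hash partitions no such finite
    -- enumeration of accepted values exists
    select * from t where c <> 'a' and c <> 'b';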

> + *  Set *value to the constant value obtained by evaluating 'expr'
> + *
> + * Note that we may not be able to evaluate the input expression, in which
> + * case, the function returns false to indicate that *value has not been
> + * set.  True is returned otherwise.
> 
> These comments need updating, since this function (laudibly) no longer
> does any evaluating.  I wonder how this will work for run-time
> pruning, though.

Fixed the comment.

The run-time pruning patch adds some code to this function to be able to
return values for Params; for that, AFAIK, it also adds some state
information to the PartitionPruneContext argument.

> +if (context->partopcintype[partkeyidx] != exprTyp)
> +{
> +Oid new_supfuncid;
> +int16   procnum;
> +
> +
> +procnum = (context->strategy == PARTITION_STRATEGY_HASH)
> +? HASHEXTENDED_PROC
> +: BTORDER_PROC;
> +new_supfuncid = get_opfamily_proc(context->partopfamily[partkeyidx],
> +  context->partopcintype[partkeyidx],
> +  exprTyp, procnum);
> +fmgr_info(new_supfuncid, >partsupfunc[partkeyidx]);
> +}
> 
> What's the point of this, exactly?  Leftover dead code, maybe?

Actually, this *was* an effort to teach the patch to use the correct
comparison function for comparison against partition bounds in case the
clause value is of a different type.

After reading David's comment about this, I concluded that it was in the
wrong place, which is fixed in the latest patch.  The comparison
functions are changed (if needed) in the function that would call
partkey_datum_from_expr, not in partkey_datum_from_expr itself.

> + * Input:
> + *  See the comments above the definition of PartScanKeyInfo to see what
> + *  kind of information is contained in 'keys'.
> 
> There's no such thing as PartScanKeyInfo any more and the function has
> no argument called 'keys'.  None of the functions actual arguments are
> explained.

Sorry, this should be gone in the latest patches.

> +/*
> + * If there are multiple pruning steps, we perform them one after 
> another,
> + * passing the result of one step as input to another.  Based on the type
> + * of pruning step, perform_pruning_step may add or remove partitions 
> from
> + * the set of partitions it receives as the input.
> + */
> 
> The comment sounds great, but the code doesn't work that way; it
> always calls bms_int_members to intersect the new result with any
> previous result.  I'm baffled as to how this manages to DTRT if
> COMBINE_OR is used.  In general I had hoped that the list of pruning
> steps was something over which we were only going to iterate, not
> recurse.  This definitely recurses for the combine steps, but it's
> still (sorta) got the idea of a list of iterable steps.  That's a
> weird mix.

At the top-level (in get_matching_partitions), it is assumed that the
steps in the input list come from implicitly AND'd clauses, so the
intersection between 

Re: [HACKERS] Add support for tuple routing to foreign partitions

2018-03-23 Thread Robert Haas
On Fri, Mar 23, 2018 at 7:55 AM, Etsuro Fujita
 wrote:
> Maybe I'm missing something, but I think the proposed FDW API could be used
> for the COPY case as well with some modifications to core.  If so, my
> question is: should we support COPY into foreign tables as well?  I think
> that if we support COPY tuple routing for foreign partitions, it would be
> better to support direct COPY into foreign partitions as well.

Yes, I really, really want to be able to support COPY.  If you think
you can make that work with this API -- let's see it!

>> 3. It looks like we're just doing an INSERT for every row, which is
>> pretty much an anti-pattern for inserting data into a PostgreSQL
>> database.  COPY is way faster, and even multi-row inserts are
>> significantly faster.
>
> I planned to work on a new FDW API for using COPY for COPY tuple routing [1],
> but I didn't have time for that in this development cycle, so I'm
> re-planning to work on that for PG12.  I'm not sure we can optimize that
> insertion using multi-row inserts because tuple routing works row by row, as
> you know.  Anyway, I think these would be beyond the scope of the first
> version.

My concern is that if we add APIs now that only support single-row
inserts, we'll have to rework them again in order to support multi-row
inserts.  I'd like to avoid that, if possible.  I think for bulk
inserts we'll need an API that says "here's a row, store it or buffer
it as you like" and then another API that says "flush any buffered
rows to the actual table and perform any necessary cleanup".  Or maybe
in postgres_fdw the first API could start a COPY if not already done
and send the row, and the second one could end the COPY.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Re: csv format for psql

2018-03-23 Thread Daniel Verite
Pavel Stehule wrote:

> It should not be hard. All formats can has '|' like now, and csv can have a
> ',' - then if field separator is not explicit, then default field separator
> is used, else specified field separator is used.
> 
> You can see my idea in attached patch

With that patch, consider this sequence:

postgres=# \pset format csv
Output format is csv.
postgres=# \pset fieldsep
Field separator is "|".
postgres=# select 1 as a,2 as b;
a,b
1,2

Here psql reports that fieldsep is "|" and right away is using something
else in the output. That doesn't look good.

You may object that it's fixable by tweaking the output of \pset,
\pset fieldsep, and \? variables so that it knows that the current
output format is going to use a "hidden" default separator, and
then these commands should display that value instead.
But that'd be somewhat playing whack-a-mole, as the following
sequence would now be possible, with '|' being used  as
the separator instead of the ',' reported just above:

postgres=# \pset format csv
Output format is csv.
postgres=# \pset fieldsep
Field separator is ",".
postgres=# \a
Output format is aligned.
postgres=# select 1 as a,2 as b;
 a | b 
---+---
 1 | 2


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: [HACKERS] MERGE SQL Statement for PG11

2018-03-23 Thread Pavan Deolasee
On Fri, Mar 23, 2018 at 4:43 AM, Peter Geoghegan  wrote:

> On Thu, Mar 22, 2018 at 11:42 AM, Pavan Deolasee
>  wrote:
> > A slightly improved version attached.
>
> You still need to remove this change:
>
> > diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
> > index a4574cd533..dbfb5d2a1a 100644
> > --- a/src/include/miscadmin.h
> > +++ b/src/include/miscadmin.h
> > @@ -444,5 +444,6 @@ extern bool has_rolreplication(Oid roleid);
> >  /* in access/transam/xlog.c */
> >  extern bool BackupInProgress(void);
> >  extern void CancelBackup(void);
> > +extern int64 GetXactWALBytes(void);
>

Sigh. Fixed in the recent version.


>
> I see that we're still using two target RTEs in this latest revision,
> v23e -- the new approach to partitioning, which I haven't had time to
> study in more detail, has not produced a change there.


Yes, we continue to use two RTEs because I don't have any brighter idea
than that to handle the case of a partitioned table and a right outer join. As
I explained some time back, this is necessary to ensure that we don't
produce duplicate rows when a partition is joined with the source and then
a second partition is joined again with the source.

Now I don't know if we can run a join query and still have a single RTE,
but that looks improbable and wrong.


This causes
> weird effects, such as the following:
>
> """
> pg@~[20658]=# create table foo(bar int4);
> CREATE TABLE
> pg@~[20658]=# merge into foo f using (select 1 col) dd on f.bar=dd.col
> when matched then update set bar = f.barr + 1;
> ERROR:  column f.barr does not exist
> LINE 1: ...n f.bar=dd.col when matched then update set bar = f.barr + 1...
>  ^
> HINT:  Perhaps you meant to reference the column "f.bar" or the column
> "f.bar".
>
> """
>
> While I think this this particular HINT buglet is pretty harmless, I
> continue to be concerned about the unintended consequences of having
> multiple RTEs for MERGE's target table. Each RTE comes from a
> different lookup path -- the first one goes through setTargetTable()'s
> parserOpenTable() + addRangeTableEntryForRelation() calls. The second
> one goes through transformFromClauseItem(), for the join, which
> ultimately ends up calling transformTableEntry()/addRangeTableEntry().
>


How's it different from running an INSERT query with the target table again
specified in a subquery producing the rows to be inserted? For example,

postgres=# insert into target as t select sid from source s join target t
on t.ttid = s.sid;
ERROR:  column t.ttid does not exist
LINE 1: ...rget as t select sid from source join target t on t.ttid = s...
 ^
HINT:  Perhaps you meant to reference the column "t.tid" or the column
"t.tid".
postgres=#

This produces a very similar-looking HINT to the one in your test above. I am
certain that the "target" table gets two RTEs, exactly via the same code paths
as you discussed above. So if this is not a problem for INSERT, why would it be
a problem for MERGE? Maybe I am missing a point here.



> Consider commit 5f173040, which fixed a privilege escalation security
> bug around multiple name lookup. Could the approach taken by MERGE
> here introduce a similar security issue?
>
> Using GDB, I see two calls to RangeVarGetRelidExtended() when a simple
> MERGE is executed. They both have identical relation arguments, that
> look like this:
>
> (gdb) p *relation
> $4 = {
>   type = T_RangeVar,
>   catalogname = 0x0,
>   schemaname = 0x0,
>   relname = 0x5600ebdcafb0 "foo",
>   inh = 1 '\001',
>   relpersistence = 112 'p',
>   alias = 0x5600ebdcb048,
>   location = 11
> }
>
> This seems like something that needs to be explained, at a minimum.
> Even if I'm completely wrong about there being a security hazard,
> maybe the suggestion that there might be still gives you some idea of
> what I mean about unintended consequences.
>

Ok. I will try to explain it better and also think about the security
hazards.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Add support for tuple routing to foreign partitions

2018-03-23 Thread Etsuro Fujita

(2018/03/23 4:09), Robert Haas wrote:

On Tue, Feb 27, 2018 at 7:01 AM, Etsuro Fujita

Attached is a new version of the patch set.


I took a look at this patch set today but I really don't think we
should commit something so minimal.  It's got at least four issues
that I see:

1. It still doesn't work for COPY, because COPY isn't going to have a
ModifyTableState.  I really think it ought to be possible to come up
with an API that can handle both INSERT and COPY; I don't think it
should be necessary to have two different APIs for those two problems.
Amit managed to do it for regular tables, and I don't really see a
good reason why foreign tables need to be different.


Maybe I'm missing something, but I think the proposed FDW API could be 
used for the COPY case as well with some modifications to core.  If so, 
my question is: should we support COPY into foreign tables as well?  I 
think that if we support COPY tuple routing for foreign partitions, it 
would be better to support direct COPY into foreign partitions as well.
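
To make the distinction concrete, in SQL terms it is roughly the difference
between these two statements (a sketch only; it assumes an existing
postgres_fdw setup where ftp1 is a foreign partition of the local
partitioned table pt):

COPY pt FROM '/tmp/pt_data.csv' WITH (FORMAT csv);    -- tuple routing through the parent
COPY ftp1 FROM '/tmp/pt_data.csv' WITH (FORMAT csv);  -- direct COPY into the foreign partition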



2. I previously asked why we couldn't use the existing APIs for this,
and you gave me some answer, but I can't help noticing that
postgresExecForeignRouting is an exact copy of
postgresExecForeignInsert.  Apparently, some code reuse is indeed
possible here!  Why not reuse the same function instead of calling a
new one?  If the answer is that planSlot might be NULL in this case,
or something like that, then let's just document that.  A needless
proliferation of FDW APIs is really undesirable, as it raises the bar
for every FDW author.


You've got a point!  I'll change the patch that way.


3. It looks like we're just doing an INSERT for every row, which is
pretty much an anti-pattern for inserting data into a PostgreSQL
database.  COPY is way faster, and even multi-row inserts are
significantly faster.


I planned to work on a new FDW API for using COPY for COPY tuple routing 
[1], but I didn't have time for that in this development cycle, so I'm 
re-planning to work on that for PG12.  I'm not sure we can optimize that 
insertion using multi-row inserts because tuple routing works row by 
row, as you know.  Anyway, I think these would be beyond the scope of 
the first version.



4. It doesn't do anything about the UPDATE tuple routing problem
mentioned upthread.

I don't necessarily think that the first patch in this area has to
solve all of those problems, and #4 in particular seems like it might
be a pretty big can of worms.


OK


However, I think that getting INSERT
but not COPY, with bad performance, and with duplicated APIs, is
moving more in the wrong direction than the right one.


Will fix.

Thanks for reviewing the patch!

Best regards,
Etsuro Fujita

[1] 
https://www.postgresql.org/message-id/23990375-45a6-5823-b0aa-a6a7a6a957f0%40lab.ntt.co.jp




Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-23 Thread Jeevan Chalke
Hi Robert,

On pgsql-committers, Andres reported a concern about a test case failure
with installcheck when local settings are in effect.
(Sorry, I have not subscribed to that mailing list and thus am not able to
reply there.)

Attached patch which fixes that.

However, I am not sure whether a stable regression run is expected with
installcheck when there are local settings.
For example, if I have enable_hashagg = false locally, I will definitely
see failures.
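
To illustrate (the table is the one from partition_aggregate.sql; the exact
plan shape below is only what I would expect, not something verified here):

SET enable_hashagg = false;
EXPLAIN (COSTS OFF)
SELECT c, sum(a) FROM pagg_tab_ml GROUP BY c;
-- likely produces Sort + GroupAggregate instead of the HashAggregate
-- recorded in the expected output, so installcheck fails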

ISTM that I am missing Andres' point here.

Thanks

-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 66449694

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the
individual or entity to whom it is addressed. This message contains
information from EnterpriseDB Corporation that may be privileged,
confidential, or exempt from disclosure under applicable law. If you are
not the intended recipient or authorized to receive this for the intended
recipient, any use, dissemination, distribution, retention, archiving, or
copying of this communication is strictly prohibited. If you have received
this e-mail in error, please notify the sender immediately by reply e-mail
and delete this message.
diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index bf8272e..76a8209 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -6,6 +6,8 @@
 SET enable_partitionwise_aggregate TO true;
 -- Enable partitionwise join, which by default is disabled.
 SET enable_partitionwise_join TO true;
+-- Disable parallel plans.
+SET max_parallel_workers_per_gather TO 0;
 --
 -- Tests for list partitioned tables.
 --
@@ -921,6 +923,8 @@ ALTER TABLE pagg_tab_ml_p3 ATTACH PARTITION pagg_tab_ml_p3_s1 FOR VALUES FROM (0
 ALTER TABLE pagg_tab_ml ATTACH PARTITION pagg_tab_ml_p3 FOR VALUES FROM (20) TO (30);
 INSERT INTO pagg_tab_ml SELECT i % 30, i % 10, to_char(i % 4, 'FM') FROM generate_series(0, 2) i;
 ANALYZE pagg_tab_ml;
+-- For Parallel Append
+SET max_parallel_workers_per_gather TO 2;
 -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY
 -- for level 1 only. For subpartitions, GROUP BY clause does not match with
 -- PARTITION KEY, but still we do not see a partial aggregation as array_agg()
@@ -1146,7 +1150,6 @@ SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a, b, c HAVING avg(b) > 7 O
 (12 rows)
 
 -- Parallelism within partitionwise aggregates
-SET max_parallel_workers_per_gather TO 2;
 SET min_parallel_table_scan_size TO '8kB';
 SET parallel_setup_cost TO 0;
 -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY
diff --git a/src/test/regress/sql/partition_aggregate.sql b/src/test/regress/sql/partition_aggregate.sql
index f7b5f5a..c60d7d2 100644
--- a/src/test/regress/sql/partition_aggregate.sql
+++ b/src/test/regress/sql/partition_aggregate.sql
@@ -7,6 +7,8 @@
 SET enable_partitionwise_aggregate TO true;
 -- Enable partitionwise join, which by default is disabled.
 SET enable_partitionwise_join TO true;
+-- Disable parallel plans.
+SET max_parallel_workers_per_gather TO 0;
 
 --
 -- Tests for list partitioned tables.
@@ -206,6 +208,9 @@ ALTER TABLE pagg_tab_ml ATTACH PARTITION pagg_tab_ml_p3 FOR VALUES FROM (20) TO
 INSERT INTO pagg_tab_ml SELECT i % 30, i % 10, to_char(i % 4, 'FM') FROM generate_series(0, 2) i;
 ANALYZE pagg_tab_ml;
 
+-- For Parallel Append
+SET max_parallel_workers_per_gather TO 2;
+
 -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY
 -- for level 1 only. For subpartitions, GROUP BY clause does not match with
 -- PARTITION KEY, but still we do not see a partial aggregation as array_agg()
@@ -238,7 +243,6 @@ SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a, b, c HAVING avg(b) > 7 O
 
 -- Parallelism within partitionwise aggregates
 
-SET max_parallel_workers_per_gather TO 2;
 SET min_parallel_table_scan_size TO '8kB';
 SET parallel_setup_cost TO 0;
 


Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

2018-03-23 Thread Teodor Sigaev

Hi!

The patch seems good, but I found one bug in it: in fact, nobody
checks for a serializable conflict with fastupdate=on:
gininsert()
{
if (GinGetUseFastUpdate())
{
/* two next lines are GinCheckForSerializableConflictIn() */
if (!GinGetUseFastUpdate())
CheckForSerializableConflictIn()
}
}

I changed it to a direct call to CheckForSerializableConflictIn() (see attachment).

I'd like to see fastupdate=on in the tests too; right now the tests cover only
the case without fastupdate. Please add them.
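
A minimal sketch of the kind of setup such a test needs (names made up here;
the real coverage should go into the existing predicate-locking isolation
specs):

CREATE TABLE gin_fastupdate_tbl (id int, vals int4[]);
CREATE INDEX gin_fastupdate_idx ON gin_fastupdate_tbl
    USING gin (vals) WITH (fastupdate = on);
-- insertions now go through the pending list, which is exactly the code
-- path whose conflict check was missing above
INSERT INTO gin_fastupdate_tbl VALUES (1, '{1,2,3}');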


Shubham Barai wrote:



On 16 March 2018 at 03:57, Alexander Korotkov wrote:


On Tue, Mar 13, 2018 at 3:25 PM, Alvaro Herrera wrote:

Alexander Korotkov wrote:

> And what happen if somebody concurrently set (fastupdate = on)?
> Can we miss conflicts because of that?

I think it'd be better to have that option require AccessExclusive lock,
so that it can never be changed concurrently with readers.  Seems to me
that penalizing every single read to cope with this case would be a bad
trade-off.


As Andrey Borodin mentioned, we already do.  Sorry for buzz :)



I have updated the patch based on suggestions.

Regards,
Shubham


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/
commit b69b24ac54f3d31c4c25026aefd2c47ade2ed571
Author: Teodor Sigaev 
Date:   Fri Mar 23 14:25:24 2018 +0300

x

diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index 37070b3b72..095b1192cb 100644
--- a/src/backend/access/gin/ginbtree.c
+++ b/src/backend/access/gin/ginbtree.c
@@ -17,6 +17,7 @@
 #include "access/gin_private.h"
 #include "access/ginxlog.h"
 #include "access/xloginsert.h"
+#include "storage/predicate.h"
 #include "miscadmin.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
@@ -515,6 +516,19 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
 			btree->fillRoot(btree, newrootpg,
 			BufferGetBlockNumber(lbuffer), newlpage,
 			BufferGetBlockNumber(rbuffer), newrpage);
+
+			if (GinPageIsLeaf(BufferGetPage(stack->buffer)))
+			{
+
+PredicateLockPageSplit(btree->index,
+		BufferGetBlockNumber(stack->buffer),
+		BufferGetBlockNumber(lbuffer));
+
+PredicateLockPageSplit(btree->index,
+		BufferGetBlockNumber(stack->buffer),
+		BufferGetBlockNumber(rbuffer));
+			}
+
 		}
 		else
 		{
@@ -524,6 +538,14 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
 			GinPageGetOpaque(newrpage)->rightlink = savedRightLink;
 			GinPageGetOpaque(newlpage)->flags |= GIN_INCOMPLETE_SPLIT;
 			GinPageGetOpaque(newlpage)->rightlink = BufferGetBlockNumber(rbuffer);
+
+			if (GinPageIsLeaf(BufferGetPage(stack->buffer)))
+			{
+
+PredicateLockPageSplit(btree->index,
+		BufferGetBlockNumber(stack->buffer),
+		BufferGetBlockNumber(rbuffer));
+			}
 		}
 
 		/*
diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c
index f9daaba52e..3fb4fc8264 100644
--- a/src/backend/access/gin/gindatapage.c
+++ b/src/backend/access/gin/gindatapage.c
@@ -19,6 +19,7 @@
 #include "access/xloginsert.h"
 #include "lib/ilist.h"
 #include "miscadmin.h"
+#include "storage/predicate.h"
 #include "utils/rel.h"
 
 /*
@@ -1423,7 +1424,7 @@ disassembleLeaf(Page page)
  * Any segments that acquire new items are decoded, and the new items are
  * merged with the old items.
  *
- * Returns true if any new items were added. False means they were all
+ * Returns true if any new items were added. false means they were all
  * duplicates of existing items on the page.
  */
 static bool
@@ -1759,7 +1760,7 @@ leafRepackItems(disassembledLeaf *leaf, ItemPointer remaining)
  */
 BlockNumber
 createPostingTree(Relation index, ItemPointerData *items, uint32 nitems,
-  GinStatsData *buildStats)
+  GinStatsData *buildStats, Buffer entrybuffer)
 {
 	BlockNumber blkno;
 	Buffer		buffer;
@@ -1810,6 +1811,12 @@ createPostingTree(Relation index, ItemPointerData *items, uint32 nitems,
 	page = BufferGetPage(buffer);
 	blkno = BufferGetBlockNumber(buffer);
 
+	/*
+	 * Copy a predicate lock from entry tree leaf (containing posting list)
+	 * to  posting tree.
+	 */
+	PredicateLockPageSplit(index, BufferGetBlockNumber(entrybuffer), blkno);
+
 	START_CRIT_SECTION();
 
 	PageRestoreTempPage(tmppage, page);
@@ -1904,6 +1911,7 @@ ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
 		btree.itemptr = insertdata.items[insertdata.curitem];
		stack = ginFindLeafPage(&btree, false, NULL);
 
+		GinCheckForSerializableConflictIn(btree.index, NULL, stack->buffer);
		ginInsertValue(&btree, stack, &insertdata, buildStats);
 	}
 }
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index 6fe67f346d..63603859bc 100644
--- 

Re: [HACKERS] MERGE SQL Statement for PG11

2018-03-23 Thread Pavan Deolasee
On Fri, Mar 23, 2018 at 6:45 AM, Peter Geoghegan  wrote:

> On Thu, Mar 22, 2018 at 6:02 PM, Alvaro Herrera 
> wrote:
> > Incremental development is a good thing.  Trying to do everything in a
> > single commit is great when time is infinite or even merely very long,
> > but if you run out of it, which I'm sure is common, leaving some things
> > out that can be reasonable implemented in a separate patch is perfectly
> > acceptable.
>
> We're talking about something that took me less than an hour to get
> working. AFAICT, it's just a matter of tweaking the grammar, and
> adding a bit of transformWithClause() boilerplate to the start of
> transformMergeStmt().
>>
>>
>> I quickly implemented CTE support myself (not wCTE support, since
>> MERGE doesn't use RETURNING), and it wasn't tricky. It seems to work
>> when I mechanically duplicate the approach taken with other types of
>> DML statement in the parser. I have written a few tests, and so far it
>> holds up.
>>
>
> Ok, thanks. I started doing something similar, but great if you have
> already implemented. I will focus on other things for now.
>
>

I am sorry. I was under the impression that you're actually writing this
piece of code and hence did not pay much attention till now. I should have
confirmed with you instead of assuming. I think it's a bit too late now,
but I will give it a fair try tomorrow. I don't want to spend too much time
on it though given how close we are to the deadline. As Alvaro said, we can
always revisit this for pg12.


> As I've pointed out on this thread already, I'm often concerned about
> supporting functionality like this because it increases my overall
> confidence in the design. If it was genuinely hard to add WITH clause
> support, then that would probably tell us something about the overall
> design that likely creates problems elsewhere. It's easy to say that
> it isn't worth holding the patch up for WITH clause support, because
> that's true, but it's also beside the point.
>

Understood.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Odd procedure resolution

2018-03-23 Thread Ashutosh Bapat
Incidently the fix looks quite simple. See patch attached.

With this patch we have diffs in the create_procedure test like
  CALL random();  -- error
! ERROR:  random() is not a procedure
  LINE 1: CALL random();
   ^
! HINT:  To call a function, use SELECT.
  CREATE FUNCTION cp_testfunc1(a int) RETURNS int LANGUAGE SQL AS $$
SELECT a $$;
  CREATE TABLE cp_test (a int, b text);
  CREATE PROCEDURE ptest1(x text)
--- 4,13 
   ^
  HINT:  No function matches the given name and argument types. You
might need to add explicit type casts.
  CALL random();  -- error
! ERROR:  function random() does not exist
  LINE 1: CALL random();
   ^
! HINT:  No function matches the given name and argument types. You
might need to add explicit type casts.
  CREATE FUNCTION cp_testfunc1(a int) RETURNS int LANGUAGE SQL AS $$
SELECT a $$;
  CREATE TABLE cp_test (a int, b text);
  CREATE PROCEDURE ptest1(x text)

If we replace "function" with "procedure" the new error messages read
"procedure random() does not exist" "No procedure matches the given
...". Those messages look better than "random() is not a procedure".

But I haven't fixed the error messages in this patch. I need to first
see if the changes are acceptable.


On Fri, Mar 23, 2018 at 3:53 PM, Ashutosh Bapat
 wrote:
> Hi,
> Consider following scenario
>
> create function foo(a int) returns integer as $$begin return a; end;
> $$ language plpgsql;
> create procedure foo(a float) as $$begin end; $$ language plpgsql;
> call foo(1);
> psql:proc_func_resolution.sql:8: ERROR:  foo(integer) is not a procedure
> LINE 1: call foo(1);
>  ^
> HINT:  To call a function, use SELECT.
>
> to me the error message looks confusing. I am using CALL, which means
> I am trying to invoke a "procedure" not a "function" and there exists
> one which can be invoked. If I drop function foo() and try call again,
> it succeeds.
>
> drop function foo(a int);
> DROP FUNCTION
> call foo(1);
> CALL
>
> Functions and Procedures are two different objects and we enforce
> different methods to invoke those, SELECT and CALL resp. So, we should
> be able to filter out one or the other and try to find best candidate
> of a given kind.
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 52dd400..2a8b78d 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -901,7 +901,7 @@ TypeIsVisible(Oid typid)
  */
 FuncCandidateList
 FuncnameGetCandidates(List *names, int nargs, List *argnames,
-	  bool expand_variadic, bool expand_defaults,
+	  bool expand_variadic, bool expand_defaults, char prokind,
 	  bool missing_ok)
 {
 	FuncCandidateList resultList = NULL;
@@ -948,6 +948,10 @@ FuncnameGetCandidates(List *names, int nargs, List *argnames,
 		int		   *argnumbers = NULL;
 		FuncCandidateList newResult;
 
+		/* Match prokind if specific one is requested. */
+		if (prokind != PROKIND_INVALID && prokind != procform->prokind)
+			continue;
+
 		if (OidIsValid(namespaceId))
 		{
 			/* Consider only procs in specified namespace */
@@ -1409,7 +1413,8 @@ FunctionIsVisible(Oid funcid)
 		visible = false;
 
 		clist = FuncnameGetCandidates(list_make1(makeString(proname)),
-	  nargs, NIL, false, false, false);
+	  nargs, NIL, false, false,
+	  PROKIND_INVALID, false);
 
 		for (; clist; clist = clist->next)
 		{
diff --git a/src/backend/catalog/pg_aggregate.c b/src/backend/catalog/pg_aggregate.c
index 50d8d81..27530b0 100644
--- a/src/backend/catalog/pg_aggregate.c
+++ b/src/backend/catalog/pg_aggregate.c
@@ -803,6 +803,7 @@ lookup_agg_function(List *fnName,
 	 */
 	fdresult = func_get_detail(fnName, NIL, NIL,
 			   nargs, input_types, false, false,
+			   PROKIND_INVALID,
			   &fnOid, rettype, &retset,
			   &nvargs, &vatype,
			   &true_oid_array, NULL);
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index ea5d521..5784691 100644
--- a/src/backend/parser/parse_func.c
+++ b/src/backend/parser/parse_func.c
@@ -247,6 +247,7 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
 	fdresult = func_get_detail(funcname, fargs, argnames, nargs,
 			   actual_arg_types,
 			   !func_variadic, true,
+			   proc_call ? PROKIND_PROCEDURE : PROKIND_INVALID,
			   &funcid, &rettype, &retset,
			   &nvargs, &vatype,
			   &declared_arg_types, &argdefaults);
@@ -1316,6 +1317,7 @@ func_get_detail(List *funcname,
 Oid *argtypes,
 bool expand_variadic,
 bool expand_defaults,
+char prokind,
 Oid *funcid,	/* return value */
 Oid *rettype,	/* return value */
 bool *retset,	/* return value */
@@ -1342,7 +1344,7 @@ func_get_detail(List *funcname,
 
 	/* Get list of possible candidates from namespace search */
 	raw_candidates = 

Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-23 Thread Jeevan Chalke
On Thu, Mar 22, 2018 at 10:28 PM, Robert Haas  wrote:

> On Thu, Mar 22, 2018 at 6:15 AM, Jeevan Chalke
>  wrote:
> > Looks cleaner now. Thanks for refactoring it.
> >
> > I have merged these changes and flatten all previuos changes into the
> main
> > patch.
>
> Committed 0001-0005.


Thanks Robert.


>   I made a few further modifications.  These were
> mostly cosmetic, but with two exceptions:
>
> 1. I moved one set_cheapest() call to avoid doing that twice for the
> top-level grouped_rel.
>
> 2. I removed the logic to set partition properties for grouped_rels.
> As far as I can see, there's nothing that needs this.  It would be
> important if we wanted subsequent planning stages to be able to do
> partition-wise stuff, e.g. when doing window functions or setops, or
> at higher query levels.  Maybe we'll have that someday; until then, I
> think this is just a waste of cycles.
>

OK.

The changes related to postgres_fdw which allow pushing aggregates to the
foreign server are not yet committed. Due to this, we will end up getting an
error when we have foreign partitions + aggregation.

Attached here is the 0001 patch (0006 from my earlier patch-set) which adds
support for this, so that we no longer get such an error.

Thanks


> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
From 1173c609f6c6b2b45c0d1ce05533b840f7885717 Mon Sep 17 00:00:00 2001
From: Jeevan Chalke 
Date: Fri, 23 Mar 2018 14:23:53 +0530
Subject: [PATCH] Teach postgres_fdw to push aggregates for child relations
 too.

GetForeignUpperPaths() now takes an extra void parameter which will
be used to pass any additional details required to create an upper
path at the remote server. However, we support only grouping over
remote server today and thus it passes grouping specific details i.e.
GroupPathExtradata, NULL otherwise.

Since we don't know how to get a partially aggregated result from a
remote server, only full aggregation is pushed on the remote server.
---
 contrib/postgres_fdw/expected/postgres_fdw.out | 132 +
 contrib/postgres_fdw/postgres_fdw.c|  48 ++---
 contrib/postgres_fdw/postgres_fdw.h|   2 +
 contrib/postgres_fdw/sql/postgres_fdw.sql  |  51 ++
 doc/src/sgml/fdwhandler.sgml   |   8 +-
 src/backend/optimizer/plan/createplan.c|   7 +-
 src/backend/optimizer/plan/planner.c   |  29 +++---
 src/backend/optimizer/prep/prepunion.c |   2 +-
 src/include/foreign/fdwapi.h   |   3 +-
 src/include/optimizer/planner.h|   3 +-
 10 files changed, 253 insertions(+), 32 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2d6e387..a211aa9 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -7852,3 +7852,135 @@ SELECT t1.a, t1.phv, t2.b, t2.phv FROM (SELECT 't1_phv' phv, * FROM fprt1 WHERE
 (14 rows)
 
 RESET enable_partitionwise_join;
+-- ===
+-- test partitionwise aggregates
+-- ===
+CREATE TABLE pagg_tab (a int, b int, c text) PARTITION BY RANGE(a);
+CREATE TABLE pagg_tab_p1 (LIKE pagg_tab);
+CREATE TABLE pagg_tab_p2 (LIKE pagg_tab);
+CREATE TABLE pagg_tab_p3 (LIKE pagg_tab);
+INSERT INTO pagg_tab_p1 SELECT i % 30, i % 50, to_char(i/30, 'FM') FROM generate_series(1, 3000) i WHERE (i % 30) < 10;
+INSERT INTO pagg_tab_p2 SELECT i % 30, i % 50, to_char(i/30, 'FM') FROM generate_series(1, 3000) i WHERE (i % 30) < 20 and (i % 30) >= 10;
+INSERT INTO pagg_tab_p3 SELECT i % 30, i % 50, to_char(i/30, 'FM') FROM generate_series(1, 3000) i WHERE (i % 30) < 30 and (i % 30) >= 20;
+-- Create foreign partitions
+CREATE FOREIGN TABLE fpagg_tab_p1 PARTITION OF pagg_tab FOR VALUES FROM (0) TO (10) SERVER loopback OPTIONS (table_name 'pagg_tab_p1');
+CREATE FOREIGN TABLE fpagg_tab_p2 PARTITION OF pagg_tab FOR VALUES FROM (10) TO (20) SERVER loopback OPTIONS (table_name 'pagg_tab_p2');;
+CREATE FOREIGN TABLE fpagg_tab_p3 PARTITION OF pagg_tab FOR VALUES FROM (20) TO (30) SERVER loopback OPTIONS (table_name 'pagg_tab_p3');;
+ANALYZE pagg_tab;
+ANALYZE fpagg_tab_p1;
+ANALYZE fpagg_tab_p2;
+ANALYZE fpagg_tab_p3;
+-- When GROUP BY clause matches with PARTITION KEY.
+-- Plan with partitionwise aggregates is disabled
+SET enable_partitionwise_aggregate TO false;
+EXPLAIN (COSTS OFF)
+SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+  QUERY PLAN   
+---
+ Sort
+   Sort Key: fpagg_tab_p1.a
+ 

Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()

2018-03-23 Thread Simon Riggs
On 23 March 2018 at 09:22, Michael Paquier  wrote:
> On Fri, Mar 23, 2018 at 09:04:55AM +, Simon Riggs wrote:
>> So it shows clear benefit for both bulk actions and OLTP, with no 
>> regressions.
>>
>> No objection exists to the approach used in the patch, so I'm now
>> ready to commit this.
>>
>> Last call for objections?
>
> Please hold on.  It is Friday evening here.  First I cannot take the
> time to articulate an answer you are requesting on the other thread
> part.  Second, the latest version of the patch has been sent from Pavan
> a couple of hours ago, and you are proposing to commit it in a message
> sent 29 minutes after the last version has been sent.
>
> Let's cool down a bit and take some time for reviews, please.  You still
> have one week until the end of the commit fest anyway.

Your assumption that I would commit a new patch that was 29 mins old
is frankly pretty ridiculous, so yes, let's keep calm.

Enjoy your weekend and I'll be happy to read your review on Monday.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-23 Thread Etsuro Fujita

(2018/03/22 18:31), Amit Langote wrote:

On 2018/03/20 20:53, Etsuro Fujita wrote:

Here are comments on executor changes in (the latest version of) the patch:

@@ -421,8 +424,18 @@ ExecInsert(ModifyTableState *mtstate,
  ItemPointerData conflictTid;
  boolspecConflict;
  List   *arbiterIndexes;
+PartitionTupleRouting *proute =
+mtstate->mt_partition_tuple_routing;

-arbiterIndexes = node->arbiterIndexes;
+/* Use the appropriate list of arbiter indexes. */
+if (mtstate->mt_partition_tuple_routing != NULL)
+{
Assert(partition_index >= 0 && proute != NULL);
+arbiterIndexes =
+proute->partition_arbiter_indexes[partition_index];
+}
+else
+arbiterIndexes = node->arbiterIndexes;

To handle both cases the same way, I wonder if it would be better to have
the arbiterindexes list in ResultRelInfo as well, as mentioned by Alvaro
upthread, or to re-add mt_arbiterindexes as before and set it to
proute->partition_arbiter_indexes[partition_index] before we get here,
maybe in ExecPrepareTupleRouting, in the case of tuple routing.


It's a good idea.  I somehow missed that Alvaro had already mentioned it.

In HEAD, we now have ri_onConflictSetProj and ri_onConflictSetWhere.  I
propose we name the field ri_onConflictArbiterIndexes as done in the
updated patch.


I like that naming.


@@ -2368,9 +2419,13 @@ ExecInitModifyTable(ModifyTable *node, EState
*estate, int eflags)
  econtext = mtstate->ps.ps_ExprContext;
  relationDesc = resultRelInfo->ri_RelationDesc->rd_att;

-/* initialize slot for the existing tuple */
-mtstate->mt_existing =
-ExecInitExtraTupleSlot(mtstate->ps.state, relationDesc);
+/*
+ * Initialize slot for the existing tuple.  We determine which
+ * tupleDesc to use for this after we have determined which relation
+ * the insert/update will be applied to, possibly after performing
+ * tuple routing.
+ */
+mtstate->mt_existing = ExecInitExtraTupleSlot(mtstate->ps.state,
NULL);

  /* carried forward solely for the benefit of explain */
  mtstate->mt_excludedtlist = node->exclRelTlist;
@@ -2378,8 +2433,16 @@ ExecInitModifyTable(ModifyTable *node, EState
*estate, int eflags)
  /* create target slot for UPDATE SET projection */
  tupDesc = ExecTypeFromTL((List *) node->onConflictSet,
   relationDesc->tdhasoid);
+PinTupleDesc(tupDesc);
+mtstate->mt_conflproj_tupdesc = tupDesc;
+
+/*
+ * Just like the "existing tuple" slot, we'll defer deciding which
+ * tupleDesc to use for this slot to a point where tuple routing has
+ * been performed.
+ */
  mtstate->mt_conflproj =
-ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc);
+ExecInitExtraTupleSlot(mtstate->ps.state, NULL);

If we do ExecInitExtraTupleSlot for mtstate->mt_existing and
mtstate->mt_conflproj in ExecPrepareTupleRouting in the case of tuple
routing, as said above, we wouldn't need this changes.  I think doing that
only in the case of tuple routing and keeping this as-is would be better
because that would save cycles in the normal case.


Hmm, I think we shouldn't be doing ExecInitExtraTupleSlot in
ExecPrepareTupleRouting, because we shouldn't have more than one instance
of mtstate->mt_existing and mtstate->mt_conflproj slots.


Yeah, I think so too.  What I was going to say here is 
ExecSetSlotDescriptor, not ExecInitExtraTupleSlot, as you said below. 
Sorry about the incorrectness.  I guess I was too tired when writing 
that comment.



As you also said above, I think you meant to say here that we do
ExecInitExtraTupleSlot only once for both mtstate->mt_existing and
mtstate->mt_conflproj in ExecInitModifyTable and only do
ExecSetSlotDescriptor in ExecPrepareTupleRouting.


That's right.


I have changed it so
that ExecInitModifyTable now both creates the slot and sets the descriptor
for non-tuple-routing cases and only creates but doesn't set the
descriptor in the tuple-routing case.


IMHO I don't see much value in modifying code as such, because we do 
ExecSetSlotDescriptor for mt_existing and mt_conflproj in 
ExecPrepareTupleRouting for every inserted tuple.  So, I would leave 
that as-is, to keep that simple.



For ExecPrepareTupleRouting to be able to access the tupDesc of the
onConflictSet target list, I've added ri_onConflictSetProjTupDesc which is
set by ExecInitPartitionInfo on first call for a give partition.  This is
also suggested by Pavan in his review.


Seems like a good idea.

Here are some comments on the latest version of the patch:

+   /*
+* Caller must set mtstate->mt_conflproj's tuple 
descriptor to

+   

Odd procedure resolution

2018-03-23 Thread Ashutosh Bapat
Hi,
Consider following scenario

create function foo(a int) returns integer as $$begin return a; end;
$$ language plpgsql;
create procedure foo(a float) as $$begin end; $$ language plpgsql;
call foo(1);
psql:proc_func_resolution.sql:8: ERROR:  foo(integer) is not a procedure
LINE 1: call foo(1);
 ^
HINT:  To call a function, use SELECT.

To me the error message looks confusing. I am using CALL, which means
I am trying to invoke a "procedure", not a "function", and there exists
one which can be invoked. If I drop function foo() and try CALL again,
it succeeds.

drop function foo(a int);
DROP FUNCTION
call foo(1);
CALL

Functions and procedures are two different kinds of objects and we enforce
different methods to invoke them, SELECT and CALL respectively. So we should
be able to filter out one or the other and try to find the best candidate
of a given kind.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Faster inserts with mostly-monotonically increasing values

2018-03-23 Thread Pavan Deolasee
Hi Andrew and Claudio,

Thanks for the review!

On Fri, Mar 23, 2018 at 10:19 AM, Andrew Dunstan <
andrew.duns...@2ndquadrant.com> wrote:

> On Fri, Mar 23, 2018 at 10:20 AM, Claudio Freire 
> wrote:
>
>
> This patch looks in pretty good shape. I have been trying hard to
> think of some failure mode but haven't been able to come up with one.
>
>
Great!


>
> > Some comments
> >
> > +/*
> > + * It's very common to have an index on an auto-incremented or
> > + * monotonically increasing value. In such cases, every insertion
> happens
> > + * towards the end of the index. We try to optimise that case by
> caching
> > + * the right-most block of the index. If our cached block is still
> the
> > + * rightmost block, has enough free space to accommodate a new
> entry and
> > + * the insertion key is greater or equal to the first key in this
> page,
> > + * then we can safely conclude that the new key will be inserted in
> the
> > + * cached block. So we simply search within the cached page and
> insert the
> > + * key at the appropriate location. We call it a fastpath.
> >
> > It should say "the insertion key is strictly greater than the first key"
>
>
Good catch. Fixed.


>
> >
> > Also, "rightmost block" != "rightmost leaf page" ("leaf" being the key
> > difference). So it should say "rightmost leaf page".
>
>
> right.
>
>
Fixed.


> [...]
> >
> > Setting "offset = InvalidOffsetNumber" in that contorted way is
> > unnecessary. You can remove the first assignment and instead
> > initialize unconditionally right after the fastpath block (that
> > doesn't make use of offset anyway):
>
>
> Yes, I noticed that and it's confusing, Just set it at the top.
>
>
Good idea. Changed that way.


> >
> > Having costs in explain tests can be fragile. Better use "explain
> > (costs off)". If you run "make check" continuously in a loop, you
> > should get some failures related to that pretty quickly.
> >
>
> Agree about costs off, but I'm fairly dubious of the value of using
> EXPLAIN at all here. Nothing in the patch should result in any change
> in EXPLAIN output.
>

I agree. I initially added EXPLAIN to ensure that we're doing index-only
scans. But you're correct, we don't need them in the tests itself.


>
> I would probably just have a few regression lines that should be sure
> to exercise the code path and leave it at that.
>
>
I changed the regression tests to include a few more scenarios, basically
using multi-column indexes in different ways and then querying rows with
different orderings. I did not take away the vacuum and I believe it will
actually help the tests by introducing some fuzziness, i.e. if the vacuum
does not do its job, we might execute a different plan and still ensure
that the output remains unchanged.
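
Roughly, the added coverage looks like this (a sketch, not the exact
statements from the patch):

CREATE TABLE fastpath_test (a int, b text, c numeric);
CREATE INDEX fastpath_test_a_b_idx ON fastpath_test (a, b);
-- monotonically increasing keys, so inserts keep landing on the rightmost leaf page
INSERT INTO fastpath_test SELECT g, 'val' || g, g FROM generate_series(1, 10000) g;
VACUUM fastpath_test;
-- read the same data back with different orderings and check the output
SELECT a, b FROM fastpath_test ORDER BY a DESC, b DESC LIMIT 3;
SELECT a, b FROM fastpath_test WHERE a > 9995 ORDER BY a, b;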

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


pg_btree_target_block_v4.patch
Description: Binary data


Re: [PATCH] Verify Checksums during Basebackups

2018-03-23 Thread Michael Banck
Hi David,

thanks for the review!

On Thursday, 2018-03-22 at 12:22 -0400, David Steele wrote:
> On 3/17/18 5:34 PM, Michael Banck wrote:
> > On Fri, Mar 09, 2018 at 10:35:33PM +0100, Michael Banck wrote:
> > I think most people (including those I had off-list discussions about
> > this with) were of the opinion that such an option should be there, so I
> > added an additional option NOVERIFY_CHECKSUMS to the BASE_BACKUP
> > replication command and also an option -k / --no-verify-checksums to
> > pg_basebackup to trigger this.
> > 
> > Updated patch attached.
> 
> +memcpy(page, (buf + BLCKSZ * i), BLCKSZ);
> 
> Why make a copy here?  How about:
> 
> char *page = buf + BLCKSZ * i

Right, ok.

> I know pg_checksum_page manipulates the checksum field but I have found
> it to be safe.
> 
> +if (phdr->pd_checksum != checksum)
> 
> I've attached a patch that adds basic retry functionality.  It's not
> terrible efficient since it rereads the entire buffer for any block
> error.  A better way is to keep a bitmap for each block in the buffer,
> then on retry compare bitmaps.  If the block is still bad, report it.
> If the block was corrected moved on.  If a block was good before but is
> bad on retry it can be ignored.

I have to admit I find it a bit convoluted and non-obvious on first
reading, but I'll try to check it out some more.

I think it would be very useful if we could come up with a testcase
showing this problem, but I guess this will be quite hard to hit
reproducibly, right?

> +ereport(WARNING,
> +(errmsg("checksum verification failed in file "
> 
> I'm worried about how verbose this warning could be since there are
> 131,072 blocks per segment.  It's unlikely to have that many block
> errors, but users do sometimes put files in PGDATA which look like they
> should be validated.  Since these warnings all go to the server log it
> could get pretty bad.

We only verify checksums of files in tablespaces, and I don't think
dropping random files in those is supported in any way, as opposed to
maybe the top-level PGDATA directory. So I would say that this is not a
real concern.

> We should either stop warning after the first failure, or aggregate the
> failures for a file into a single message.

I agree that major corruption could make the whole output blow up, but I
would prefer to keep this feature simple for now, which implies possibly
printing out a lot of WARNINGs, or maybe just stopping after the first
one (or the first few, dunno).


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer



Comment update in BuildTupleFromCStrings()

2018-03-23 Thread Ashutosh Bapat
Hi,
BuildTupleFromCStrings() has comment "/* Call the "in" function for
each non-dropped attribute */". It then calls the in function even
when it's going to set that attribute to NULL.
    if (!TupleDescAttr(tupdesc, i)->attisdropped)
    {
        /* Non-dropped attributes */
        dvalues[i] = InputFunctionCall(&attinmeta->attinfuncs[i],
                                       values[i],
                                       attinmeta->attioparams[i],
                                       attinmeta->atttypmods[i]);
        if (values[i] != NULL)
            nulls[i] = false;
        else
            nulls[i] = true;
    }

If we are setting isnull to true, i.e. it's a NULL value, the dvalues
entry doesn't matter, but we still invoke the corresponding in function,
which looks strange, and the comment doesn't help. But there's code in
make_tuple_from_result_row() which does the same thing and explains why
we need to invoke the in() function even on NULL values. I thought
the same comment applies here. Here's a patch to update the comment in
BuildTupleFromCStrings().
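
As a concrete illustration (not taken from the patch): with a domain, the
input function is what enforces the domain's constraints, so it has to be
called even for NULLs:

CREATE DOMAIN d_not_null AS int NOT NULL;
SELECT NULL::d_not_null;
-- ERROR:  domain d_not_null does not allow null values
-- A row built via BuildTupleFromCStrings() (e.g. through dblink) with a NULL
-- for a d_not_null column has to fail the same way, which only happens
-- because the domain's in() function is invoked even for the NULL value.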

The code in make_tuple_from_result_row() that converts an array of
value strings to a tuple looks quite similar to what
BuildTupleFromCStrings() is doing, with the small difference that
make_tuple_from_result_row() maintains an error context to report the
attribute whose in() function caused an error. Maybe we could pass an
optional error context to the latter and use it.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
From c8faf979d34a04099597bdbbe952ca7a936e5dbe Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Fri, 23 Mar 2018 14:37:46 +0530
Subject: [PATCH] Comment fix in BuildTupleFromCStrings()

Explain in what case we need to invoke in() function even when the
value string is NULL in BuildTupleFromCStrings().

Ashutosh Bapat
---
 src/backend/executor/execTuples.c |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index c46d65c..5089b7f 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -1183,7 +1183,10 @@ BuildTupleFromCStrings(AttInMetadata *attinmeta, char **values)
 	dvalues = (Datum *) palloc(natts * sizeof(Datum));
 	nulls = (bool *) palloc(natts * sizeof(bool));
 
-	/* Call the "in" function for each non-dropped attribute */
+	/*
+	 * Call the "in" function for each non-dropped attribute, even for nulls,
+	 * to support domains.
+	 */
 	for (i = 0; i < natts; i++)
 	{
 		if (!TupleDescAttr(tupdesc, i)->attisdropped)
-- 
1.7.9.5



Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()

2018-03-23 Thread Michael Paquier
On Fri, Mar 23, 2018 at 09:04:55AM +, Simon Riggs wrote:
> So it shows clear benefit for both bulk actions and OLTP, with no regressions.
> 
> No objection exists to the approach used in the patch, so I'm now
> ready to commit this.
> 
> Last call for objections?

Please hold on.  It is Friday evening here.  First I cannot take the
time to articulate an answer you are requesting on the other thread
part.  Second, the latest version of the patch has been sent from Pavan
a couple of hours ago, and you are proposing to commit it in a message
sent 29 minutes after the last version has been sent.

Let's cool down a bit and take some time for reviews, please.  You still 
have one week until the end of the commit fest anyway.

Thanks,
--
Michael




Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()

2018-03-23 Thread Simon Riggs
On 23 March 2018 at 08:35, Pavan Deolasee  wrote:
>
>
> On Fri, Mar 9, 2018 at 8:49 PM, Peter Eisentraut
>  wrote:
>>
>> On 2/1/18 19:21, Simon Riggs wrote:
>> > If we really can't persuade you of that, it doesn't sink the patch. We
>> > can have the WAL pointer itself - it wouldn't save space but it would
>> > at least alleviate the spinlock.
>>
>> Do you want to send in an alternative patch that preserves the WAL
>> pointer and only changes the locking?
>>
>
> Sorry for the delay. Here is an updated patch which now replaces xl_prev
> with xl_curr, thus providing similar safeguards against corrupted or torn
> WAL pages, but still providing benefits of atomic operations.
>
> I repeated the same set of tests and the results are almost similar. These
> tests are done on a different AWS instance though and hence not comparable
> to previous tests. What we do in these tests is essentially call
> ReserveXLogInsertLocation() 1M times to reserve 256 bytes each time, in a
> single select statement (the function is exported and a SQL-callable routine
> is added for the purpose of the tests)
>
> Patched:
> ==
> #clients  #tps
> 1 34.195311
> 2 29.001584
> 4 31.712009
> 8 35.489272
> 16 41.950044
>
> Master:
> ==
> #clients  #tps
> 1 24.128004
> 2 12.326135
> 4 8.334143
> 8 16.035699
> 16 8.502794
>
> So that's pretty good improvement across the spectrum.

So it shows clear benefit for both bulk actions and OLTP, with no regressions.

No objection exists to the approach used in the patch, so I'm now
ready to commit this.

Last call for objections?

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()

2018-03-23 Thread Simon Riggs
On 2 February 2018 at 02:17, Michael Paquier  wrote:
> On Fri, Feb 02, 2018 at 12:21:49AM +, Simon Riggs wrote:
>> Yes, it would be about 99% of the time.
>
> When it comes to recovery, I don't think that 99% is a guarantee
> sufficient.  (Wondering about the maths behind such a number as well.)
>
>> But you have it backwards - we are not assuming that case. That is the
>> only case that has risk - the one where an old WAL record starts at
>> exactly the place the latest one stops. Otherwise the rest of the WAL
>> record will certainly fail the CRC check, since it will effectively
>> have random data in it, as you say.
>
> Your patch assumes that a single WAL segment recycling is fine to
> escape based on the file name, but you need to think beyond that.  It
> seems to me that your assumption is wrong if the tail of a segment gets
> reused after more cycles than a single one, which could happen when
> doing recovery from an archive, where segments used could have junk in
> them.  So you actually *increase* the odds of problems if a segment is
> forcibly switched and archived, then reused in recovery after being
> fetched from an archive.

This seems to be a pivotal point in your argument, yet it is just an assertion.

Please explain for the archive why you think the odds increase in the
way you describe.

Thanks

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Prefix operator for text and spgist support

2018-03-23 Thread Alexander Korotkov
On Thu, Mar 22, 2018 at 7:09 PM, Teodor Sigaev  wrote:

> The patch looks reasonable, but I see some room for improvement:
> spg_text_leaf_consistent() only needs to check with text_startswith() if the
> reconstructed value passed to the leaf consistent function is shorter than
> the given prefix. For example, if level >= the length of the prefix then we
> can guarantee that the fully reconstructed value matches too. But do not miss
> that you may need to return the value for an index-only scan; consult the
> returnData field.
>
> Attached is a rebased and minorly edited version of your patch.


I took a look at this patch.  In addition to Teodor's comments I can note the
following.

* This line looks strange to me.

-   if (strategy > 10)
+   if (strategy > 10 && strategy != RTPrefixStrategyNumber)

It's not because we added the strategy != RTPrefixStrategyNumber condition
there.
It's because we already used a magic number here and now have a mix of a magic
number and a macro constant in one line.  Since we're touching this place
anyway, could we get rid of the magic numbers here?

* I'm a little concerned about the operator name.  We're going to choose the
@^ operator for prefix search without any preliminary discussion.  However,
personally I don't have better ideas :)

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()

2018-03-23 Thread Pavan Deolasee
On Fri, Mar 9, 2018 at 8:49 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 2/1/18 19:21, Simon Riggs wrote:
> > If we really can't persuade you of that, it doesn't sink the patch. We
> > can have the WAL pointer itself - it wouldn't save space but it would
> > at least alleviate the spinlock.
>
> Do you want to send in an alternative patch that preserves the WAL
> pointer and only changes the locking?
>
>
Sorry for the delay. Here is an updated patch which now replaces xl_prev
with xl_curr, thus providing similar safeguards against corrupted or torn
WAL pages, but still providing benefits of atomic operations.

I repeated the same set of tests and the results are almost similar. These
tests are done on a different AWS instance though and hence not comparable
to previous tests. What we do in these tests is essentially
call ReserveXLogInsertLocation() 1M times to reserve 256 bytes each time,
in a single select statement (the function is exported and a SQL-callable
routine is added for the purpose of the tests)

Patched:
==
#clients  #tps
1 34.195311
2 29.001584
4 31.712009
8 35.489272
16 41.950044

Master:
==
#clients  #tps
1 24.128004
2 12.326135
4 8.334143
8 16.035699
16 8.502794

So that's pretty good improvement across the spectrum.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


pg_reduce_wal_contention_v2.patch
Description: Binary data


Re: Protect syscache from bloating with negative cache entries

2018-03-23 Thread Kyotaro HORIGUCHI
Hello.

At Wed, 21 Mar 2018 15:28:07 -0400, David Steele  wrote in 
<43095b16-14fc-e4d8-3310-2b86eaaab...@pgmasters.net>
> On 3/15/18 1:12 AM, Kyotaro HORIGUCHI wrote:
> > At Mon, 12 Mar 2018 17:34:08 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI
> > wrote:
> > The attached is the patch set including this plancache stuff.
> > 
> > 0001- catcache time-based expiration (The origin of this thread)
> > 0002- introduces dynahash pruning feature
> > 0003- implement relcache pruning using 0002
> > 0004- (perhaps) independent from the three above. PoC of
> >   plancache pruning. Details are shown above.
> 
> It looks like this should be marked Needs Review so I have done so.  If
> that's not right please change it back or let me know and I will.

Mmm. I haven't noticed that. Thanks!

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Error detail/hint style fixup

2018-03-23 Thread Daniel Gustafsson
> On 22 Mar 2018, at 22:38, Tom Lane  wrote:

> This stuff seems reasonably non-controversial, so pushed.

Thanks!

> BTW, really the point of what I'd mentioned before was to avoid having
> dblink_res_error constructing a message out of fragments, which it's
> still doing.  I'd thought perhaps we would shove the responsibility for
> mentioning the connection name out to the callers to get rid of that.
> But handling the possibility of an unnamed connection seems like it'd
> complicate the callers considerably.  And as long as we don't actually
> have translation support in that module, it'd just be make-work, so
> I didn't do it.

Right, that would be a lot of complexity for no real gain until dblink is
translated.

cheers ./daniel



Re: Problem while setting the fpw with SIGHUP

2018-03-23 Thread Michael Paquier
On Tue, Mar 20, 2018 at 12:00:47PM +0530, Dilip Kumar wrote:
> Yeah, you are right.  Fixed.

So I have been spending a couple of hours playing with your patch, and
tested various configurations manually, like switching the fpw setting on
and off while running pgbench in parallel.  I have also tested
promotions, etc.

While the patch does its job, it is possible to simplify the calls to
InitXLogInsert() a bit more.  Particularly, the one in CreateCheckpoint()
is basically useless for the checkpointer, still it is useful for the
startup process when triggering an end-of-recovery checkpoint.  So one
additional cleanup would be to move the call in CreateCheckpoint() to
bootstrap.c for the startup process.  In order to test that, please make
sure to create fallback_promote at the root of the data folder before
sending SIGUSR2 to the startup process, which would trigger the pre-9.3
promotion where the end-of-recovery checkpoint is performed by the
startup process itself.

+   /* Initialize the working areas for constructing WAL records. */
+   InitXLogInsert();
Instead of having the same comment for each process calling
InitXLogInsert() multiple times, I think that it would be better to
complete the comment in bootstrap.c where is written "XLOG operations".

Here is a suggestion:
/*
 * Initialize WAL-related operations and enter the main loop of each
 * process.  InitXLogInsert is called for each process which can
 * generate WAL.  While this is wasteful for processes started on
 * a standby, this gives the guarantee that initialization of WAL
 * insertion areas is able to happen in a consistent way and out of
 * any critical sections so as the facility is usable when a promotion
 * is triggered.
 */

What do you think?
--
Michael




Re: Removing useless DISTINCT clauses

2018-03-23 Thread Laurenz Albe
David Rowley wrote:
> > Would it be very difficult to extend that to "if any unique constraints are
> > contained in the DISTINCT clause"?
> 
> Unfortunately, it's quite a bit more work to make that happen. It's
> not just unique constraints that are required to make this work. We
> also need to pay attention to NOT NULL constraints too, as we're
> unable to prove function dependency when the keys have NULLs, as there
> may be any number of rows containing NULL values.
> 
> The problem is that in order to properly invalidate cached plans we
> must record the constraint OIDs which the plan depends on.  We can do
> that for PK and UNIQUE constraints, unfortunately, we can't do it for
> NOT NULL constraints as, at the moment, these are just properties of
> pg_attribute.  For this to be made to work we'd need to store those in
> pg_constraint too, which is more work that I'm going to do for this
> patch.

That makes sense; thanks for explaining.
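
For the archives, a small illustration (made-up table) of why NULL keys
defeat the optimization:

CREATE TABLE uniq_null_demo (a int UNIQUE, b int);
INSERT INTO uniq_null_demo VALUES (NULL, 1), (NULL, 1), (NULL, 2);
SELECT DISTINCT a, b FROM uniq_null_demo;
-- the unique constraint does not make these rows distinct: the correct
-- result has two rows, so the DISTINCT cannot be simplified away without
-- also knowing that "a" is NOT NULL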

Yours,
Laurenz Albe



Re: [HACKERS] path toward faster partition pruning

2018-03-23 Thread David Rowley
On 21 March 2018 at 00:07, Amit Langote  wrote:
> Attached is further revised version.

Hi Amit,

Thanks for sending the v38 patch.

I started reviewing this, but I just ended up starting to hack at the
patch instead. It's still got quite a bit of work to be done, I think;
unfortunately, the cross-type stuff is still pretty broken.
There's not really any sort of checking done to make sure you've found
a valid cross-type hash or comparison function, which results
in errors like:

create table hashp (a int, b numeric) partition by hash(a,b);
create table hashp1 partition of hashp for values with (modulus 4, remainder 0);
create table hashp2 partition of hashp for values with (modulus 4, remainder 1);
create table hashp3 partition of hashp for values with (modulus 4, remainder 2);
create table hashp4 partition of hashp for values with (modulus 4, remainder 3);
explain select * from hashp where a = 3::smallint and b = 1.0;
ERROR:  cache lookup failed for function 0

I'm not really sure if this should be a matter of doing an if
(!OidIsValid(new_supfuncid)) return false; I think the
context->partsupfunc must be pretty broken in cases like:

create table listp (a bigint) partition by list(a);
create table listp1 partition of listp for values in(1);
select * from listp where a <> 1::smallint and a <> 1::bigint;

The current patch simply just remembers the last comparison function
for comparing int8 to int4 and uses that one for the int8 to int2
comparison too.

Probably we need to cache the comparison function's Oid along with the Expr
in the step and use the correct one each time. I'm unsure of how
the fmgr info should be cached, but it looks like it certainly cannot be
cached in the context in an array per partition key. So far I've only
thought of some sort of hash table, but I'm sure there must be a much
better way to do this.

I started hacking on partition.c and ended up changing quite a few
things. I changed get_partitions_for_keys into 3 separate functions,
one each for hash, list and range, and tidied a few things up in that area.
There were a few bugs, for example passing the wrong value for the
size of the array into get_partitions_excluded_by_ne_datums.

I also changed how the Bitmapsets are handled in the step functions
and got rid of the Noop step type completely. I also got rid of the
passing of the srcparts into these functions. I think Robert's idea is
to process the steps in isolation and just combine the partitions
matching each step.

It would be great if we could coordinate our efforts here. I'm posting
this patch now just in case you're working or about to work on this.

In the meantime, I'll continue to drip feed cleanup patches. I'll try
to start writing some comments too, once I figure a few things out...

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v38_drowley_delta1.patch
Description: Binary data


Re: [HACKERS] MERGE SQL Statement for PG11

2018-03-23 Thread Amit Langote
On 2018/03/23 13:57, Pavan Deolasee wrote:
> On Fri, Mar 23, 2018 at 10:00 AM, Amit Langote wrote:
>> I managed to apply it by ignoring the errors, but couldn't get make check
>> to pass; attached regressions.diffs if you want to take a look.
>
> Thanks. Are you sure you're using a clean repo? I suspect you had a previous
> version of the patch applied and hence the apply errors now. I also suspect
> that you may have made a mistake while resolving the conflicts while
> applying the patch (since a file at the same path existed). The failures
> also seem related to a past version of the patch.
> 
> I just checked with a freshly checked out repo and the patches apply
> correctly on the current master and regression passes too.
> http://commitfest.cputube.org/ also reported success overnight.

You're right, I seem to have messed something up.  Sorry about the noise.

Also, it seems that the delta patch I sent in the last email didn't
contain all the changes I had to make.  It didn't contain, for example,
replacing adjust_and_expand_inherited_tlist() with
adjust_partition_tlist().  I guess you'll know when you rebase anyway.

Sorry that this is me coming a bit late to this thread, but I noticed a
few things in the patch that I thought I should comment on.

1. White space errors

$ git diff master --check
src/backend/executor/execPartition.c:737: trailing whitespace.
+/*
src/backend/executor/nodeMerge.c:90: indent with spaces.
+PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
src/backend/executor/nodeMerge.c:116: trailing whitespace.
+
src/backend/executor/nodeMerge.c:565: new blank line at EOF.

2. Sorry if this has been discussed before, but is it OK to use AclMode
like this:

+
+AclMode mt_merge_subcommands;   /* Flags show which cmd types are
+ * present */

3. I think the comment above this should be updated to explain why the
map_index is being set to the leaf index in case of MERGE instead of the
subplan index as is done in case of plain UPDATE:

-map_index = resultRelInfo - mtstate->resultRelInfo;
-Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
-tupconv_map = tupconv_map_for_subplan(mtstate, map_index);
+if (mtstate->operation == CMD_MERGE)
+{
+map_index = resultRelInfo->ri_PartitionLeafIndex;
+Assert(mtstate->rootResultRelInfo == NULL);
+tupconv_map = TupConvMapForLeaf(proute,
+mtstate->resultRelInfo,
+map_index);
+}
+else
+{

4. Do you think it would be possible at a later date to change this junk
attribute to contain something other than "tableoid"?

+if (operation == CMD_MERGE)
+{
+j->jf_otherJunkAttNo =
ExecFindJunkAttribute(j, "tableoid");
+if
(!AttributeNumberIsValid(j->jf_otherJunkAttNo))
+elog(ERROR, "could not find junk tableoid
column");
+
+}

Currently, it seems ExecMergeMatched will take this OID and look up the
partition (its ResultRelInfo) by calling ExecFindPartitionByOid which in
turn looks it up in the PartitionTupleRouting struct.  I'm imagining it
might be possible to instead return an integer that specifies "a partition
number".  Of course, nothing like that exists currently, but just curious
if we're going to be "stuck" with this junk attribute always containing
"tableoid".  Or maybe putting a "partition number" into the junk attribute
is not doable to begin with.

5. In ExecInitModifyTable, in the if (node->mergeActionList) block:

+
+/* initialize slot for the existing tuple */
+mtstate->mt_existing = ExecInitExtraTupleSlot(estate, NULL);

Maybe it's a good idea to Assert that mt_existing is NULL beforehand.  Also, why
not set the slot's descriptor right away if tuple routing is not going to
be used?  I did it that way in the ON CONFLICT DO UPDATE patch.

6. I see that there is a slot called mergeSlot that becomes part of the
ResultRelInfo of the table (partition) via ri_MergeState.  That means we
might end up creating as many slots as there are partitions (* number of
actions?).  Can't we have just one, say, mt_mergeproj in ModifyTableState,
similar to mt_conflproj, and just reset its descriptor before use?  I guess
the reset will have to happen before carrying out an action applied to a given
partition.  When I tried that (see attached delta), nothing got broken.

Thanks,
Amit
diff --git a/src/backend/executor/execPartition.c 
b/src/backend/executor/execPartition.c
index 86596b5a79..936e8e7e5b 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -681,9 +681,6 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
List   *matchedActionStates = NIL;