Re: Tid scan improvements

2019-07-14 Thread Edmund Horner
On Thu, 11 Jul 2019 at 10:22, David Rowley  wrote:
> So it seems that the plan is to insist that TIDs are tuple identifiers
> for all table AMs, for now.  If that changes in the future, then so be
> it, but I don't think that's cause for delaying any work on TID Range
> Scans.  Also from scanning around tableam.h, I see that there's no
> shortage of usages of BlockNumber, so it seems reasonable to assume
> table AMs must use blocks... It's hard to imagine moving away from
> that given that we have shared buffers.
>
> We do appear to have some table AM methods that are optional, although
> I'm not sure where the documentation is about that. For example, in
> get_relation_info() I see:
>
> info->amhasgetbitmap = amroutine->amgetbitmap != NULL &&
> relation->rd_tableam->scan_bitmap_next_block != NULL;
>
> so it appears that at least scan_bitmap_next_block is optional.
>
> I think what I'd do would be to add a table_setscanlimits API method
> for table AM and perhaps have the planner only add TID range scan
> paths if the relation has a non-NULL function pointer for that API
> function.  It would be good to stick a comment at least in tableam.h
> that mentions that the callback is optional.  That might help a bit
> when it comes to writing documentation on each API function and what
> they do.

Hi.  Here's a new patch.

Summary of changes compared to last time:
  - I've added the additional "scan_setlimits" table AM method.  To
check whether it's implemented in the planner, I have added an
additional "has_scan_setlimits" flag to RelOptInfo.  It seems to work.
  - I've also changed nodeTidrangescan to not require anything from heapam now.
  - To simplify the patch and avoid changing heapam, I've removed the
backward scan support (which was needed for FETCH LAST/PRIOR) and made
ExecSupportsBackwardScan return false for this plan type.
  - I've removed the vestigial passing of "direction" through
nodeTidrangescan.  If my understanding is correct, the direction
passed to TidRangeNext will always be forward.  But I did leave FETCH
LAST/PRIOR in the regression tests (after adding SCROLL to the
cursor).

The patch now only targets the simple use case of restricting the
range of a table to scan.  I think it would be nice to eventually
support the other major use cases of ORDER BY ctid ASC/DESC and
MIN/MAX(ctid), but that can be another feature...
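
For illustration, the targeted use case looks something like the sketch
below (hypothetical table t, and assuming the tid comparison operators);
today such ctid quals are evaluated as a filter over a Seq Scan, and with
the patch the planner can choose a TID Range Scan instead:

  -- scan only a physical range of the heap, by block number
  SELECT *
  FROM t
  WHERE ctid >= '(0,1)'::tid
    AND ctid <  '(1000,1)'::tid;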

Edmund


v8-0001-Add-a-new-plan-type-Tid-Range-Scan-to-support-range-.patch
Description: Binary data


Re: A little report on informal commit tag usage

2019-07-14 Thread Thomas Munro
On Mon, Jul 15, 2019 at 5:12 PM Tom Lane  wrote:
> Thomas Munro  writes:
> >   42 Doc

> [...]  I see a lot more than 42 such commit messages in the past
> year, so not sure what you were counting?

I would have tried to exclude the first line messages if I'd thought
of that.  But anyway, the reason for the low Doc number is case
sensitivity. I ran that on a Mac and its lame collation support failed
me in the "sort" step (also -i didn't do what I wanted, but that
wasn't the issue).  Trying again on a FreeBSD box and explicitly setting
LANG for the benefit of anyone else wanting to run this (see end), and
then removing a few obvious false matches, I now get similar numbers
in most fields but a higher "doc" number:

 767 Author
   9 Authors
 144 Backpatch-through
  55 Backpatch
  14 Bug
  14 Co-authored-by
  27 Diagnosed-by
1599 Discussion
 119 doc
  36 docs
 284 Reported-by
   5 Review
   8 Reviewed by
 460 Reviewed-by
   7 Security
   9 Tested-by

git log --since 2018-07-14 | \
  grep -E '^ +[a-zA-Z].*: ' | \
  LANG=en_US.UTF-8 sort | \
  sed 's/:.*//' | \
  LANG=en_US.UTF-8 uniq -ic | \
  grep -v -E '^ *[12] '

-- 
Thomas Munro
https://enterprisedb.com




Creating partitions automatically at least on HASH?

2019-07-14 Thread Fabien COELHO



Hello pgdevs,

sorry if this has already been discussed, but G did not yield anything 
convincing about that.


While looking at HASH partitioning and creating a few, it occurred to 
me that while RANGE and LIST partitions cannot be guessed easily, it would 
be easy to derive the partitions of a HASH-partitioned table for a fixed 
MODULUS, e.g. with


  CREATE TABLE foo(...) PARTITION BY HASH AUTOMATIC (MODULUS 10);
  -- or some other syntax

Postgres could statically derive the 10 subtables, e.g. named foo_$0$ to 
foo_$1$.
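
For comparison, a sketch of what such an AUTOMATIC clause would generate,
which today must be written out by hand (hypothetical column names):

  CREATE TABLE foo (id int, val text) PARTITION BY HASH (id);
  CREATE TABLE foo_0 PARTITION OF foo FOR VALUES WITH (MODULUS 10, REMAINDER 0);
  CREATE TABLE foo_1 PARTITION OF foo FOR VALUES WITH (MODULUS 10, REMAINDER 1);
  -- ... and so on, up to REMAINDER 9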


That would not be a replacement for the feature where one may do something 
funny and doubtful like (MODULUS 2 REMAINDER 0, MODULUS 4 REMAINDER 1, 
MODULUS 4 REMAINDER 3).


The same declarative approach could eventually be considered for RANGE 
with a fixed partition duration and starting and ending points.


This would be a relief on the longer path towards dynamically creating 
partitions, but at a lower cost than a fully dynamic approach.


The ALTER thing would be a little pain.

Thoughts?

--
Fabien.




Re: XLogRecGetFullXid()

2019-07-14 Thread Thomas Munro
On Fri, Jul 12, 2019 at 1:25 PM Thomas Munro  wrote:
> Here is a small patch extracted from the undo log patch set that I'd
> like to discuss separately and commit soon. [...]

Pushed.

-- 
Thomas Munro
https://enterprisedb.com




Re: A little report on informal commit tag usage

2019-07-14 Thread Tom Lane
Thomas Munro  writes:
> Here are the tags that people have used in the past year, in commit messages:

>  763 Author
>9 Authors
>  144 Backpatch-through
>   55 Backpatch
>   14 Bug
>   14 Co-authored-by
>   27 Diagnosed-By
> 1593 Discussion
>   42 Doc
>  284 Reported-By
>5 Review
>8 Reviewed by
>  456 Reviewed-By
>7 Security
>9 Tested-By

One small comment on that --- I'm not sure what you meant to count
in respect to the "Doc" item, but I believe there's a fairly widespread
convention to write "doc:" or some variant in the initial summary line
of commits that touch only documentation.  The point here is to let
release-note writers quickly ignore such commits, since we never list
them as release note items.  Bruce and I, being the usual suspects for
release-note writing, are pretty religious about this but other people
do it too.  I see a lot more than 42 such commit messages in the past
year, so not sure what you were counting?

Anyway, that's not a "tag" in the sense I understand you to be using
(otherwise the entries would look something like "Doc: yes" and be at
the end, which is unhelpful for the purpose).  But it's a related sort
of commit-message convention.

regards, tom lane




Re: refactoring - share str2*int64 functions

2019-07-14 Thread Fabien COELHO


Hello Thomas,


> +extern bool pg_strtoint64(const char *str, bool errorOK, int64 *result);
> +extern uint64 pg_strtouint64(const char *str, char **endptr, int base);
>
> One of these things is not like the other.


Indeed.

I agree that it is unfortunate, and it was bothering me a little as well.

> Let's see... the int64 version is used only by pgbench and is being 
> promoted to common where it can be used by more code.


No and yes.

The pgbench code was a copy of the server-side internal "scanint8", so it is 
used both by pgbench and by the server-side handling of "int8". It is used 
significantly, taking advantage of its versatile error reporting feature 
on both sides.


> With a name like that, wouldn't it make sense to bring it into line with 
> the uint64 interface, and then move pgbench's error reporting stuff back 
> into pgbench?


That would need moving the server-side error handling as well, which I 
would not really be happy with.


> The uint64 one derives its shape from the family of standard functions 
> like strtol() so I think it wins.


Yep, it cannot be changed either.

I do not think that changing the error handling capability is appropriate; 
it is really a feature of the function. The function could try to use an 
internal pg_strtoint64 which would look like the other, unsigned version, 
but then it would not differentiate the various error conditions (out of 
range vs syntax error).


The compromise I can offer is to change the name of the first one, say to 
"pg_scanint8", to reflect its former backend name. Attached is a v4 which 
does the renaming, so as to avoid having similar names with different 
signatures. I also made both error messages identical.


--
Fabien.

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index ba57628f6f..5c3536b117 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -79,6 +79,7 @@
 #include "utils/builtins.h"
 #include "utils/hashutils.h"
 #include "utils/memutils.h"
+#include "common/string.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 8eedb613a1..e8744f054c 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -21,6 +21,7 @@
 #include "catalog/heap.h"
 #include "catalog/pg_type.h"
 #include "commands/trigger.h"
+#include "common/string.h"
 #include "executor/executor.h"
 #include "executor/spi_priv.h"
 #include "miscadmin.h"
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 6c2626ee62..3735268e3a 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -34,6 +34,7 @@
 
 #include "fmgr.h"
 #include "miscadmin.h"
+#include "common/string.h"
 #include "nodes/extensible.h"
 #include "nodes/parsenodes.h"
 #include "nodes/plannodes.h"
diff --git a/src/backend/parser/parse_node.c b/src/backend/parser/parse_node.c
index 1baf7ef31f..9a66ad41bd 100644
--- a/src/backend/parser/parse_node.c
+++ b/src/backend/parser/parse_node.c
@@ -25,10 +25,10 @@
 #include "parser/parse_expr.h"
 #include "parser/parse_relation.h"
 #include "utils/builtins.h"
-#include "utils/int8.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
 #include "utils/varbit.h"
+#include "common/string.h"
 
 
 static void pcb_error_callback(void *arg);
@@ -496,7 +496,7 @@ make_const(ParseState *pstate, Value *value, int location)
 
 		case T_Float:
 			/* could be an oversize integer as well as a float ... */
-			if (scanint8(strVal(value), true, ))
+			if (pg_scanint8(strVal(value), true, ))
 			{
 /*
  * It might actually fit in int32. Probably only INT_MIN can
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index d317fd7006..673f25a014 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -12,6 +12,7 @@
  */
 #include "postgres.h"
 
+#include "common/string.h"
 #include "catalog/pg_publication.h"
 
 #include "replication/logical.h"
@@ -20,7 +21,6 @@
 #include "replication/pgoutput.h"
 
 #include "utils/inval.h"
-#include "utils/int8.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
 #include "utils/varlena.h"
@@ -111,7 +111,7 @@ parse_output_parameters(List *options, uint32 *protocol_version,
 		 errmsg("conflicting or redundant options")));
 			protocol_version_given = true;
 
-			if (!scanint8(strVal(defel->arg), true, ))
+			if (!pg_scanint8(strVal(defel->arg), true, ))
 ereport(ERROR,
 		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 		 errmsg("invalid proto_version")));
diff --git a/src/backend/utils/adt/cash.c b/src/backend/utils/adt/cash.c
index c92e9d5046..618660f372 100644
--- a/src/backend/utils/adt/cash.c
+++ b/src/backend/utils/adt/cash.c
@@ -26,7 +26,6 @@
 #include "libpq/pqformat.h"
 #include "utils/builtins.h"
 #include "utils/cash.h"
-#include "utils/int8.h"
 #include "utils/numeric.h"
 #include 

A little report on informal commit tag usage

2019-07-14 Thread Thomas Munro
On Fri, Jul 12, 2019 at 1:25 PM Michael Paquier  wrote:
> On Thu, Jul 11, 2019 at 09:44:07AM -0400, Tom Lane wrote:
> > I thought we *did* have an agreement, to wit using
> >
> > Discussion: https://postgr.es/m/
> >
> > to link to relevant mail thread(s).  Some people use more tags
> > but that seems inessential to me.
>
> Hehe.  I actually was thinking about advocating for having more of
> them in the commit logs.  I'll just start a new thread about what I
> had in mind.  Perhaps that will lead us nowhere, but let's see.

[Moving to -hackers]

Here are the tags that people have used in the past year, in commit messages:

 763 Author
   9 Authors
 144 Backpatch-through
  55 Backpatch
  14 Bug
  14 Co-authored-by
  27 Diagnosed-By
1593 Discussion
  42 Doc
 284 Reported-By
   5 Review
   8 Reviewed by
 456 Reviewed-By
   7 Security
   9 Tested-By

Other things I've noticed:

* a few people list authors and reviewers in prose in a fairly
mechanical paragraph
* some people put back-patch and bug number information in prose
* a few people list authors and reviewers with full email addresses
* some people repeat tags for multiple values, others make comma separated lists
* some people break long lines of meta-data with newlines
* authors "X and Y" may be an alternative to "X, Y", or imply greater
collaboration

The counts above were produced by case-insensitively sorting and
counting unique stuff that precedes a colon, and then throwing out
those used fewer than three times (these are false matches and typos),
and then throwing out a couple of obvious false matches by hand.
Starting from here:

git log --since 2018-07-14 | \
  grep -E '^ *[A-Z].*: ' | \
  sort -i | \
  sed 's/:.*//' | \
  uniq -ic | \
  grep -v -E '^ *[12] '

-- 
Thomas Munro
https://enterprisedb.com




Re: proposal - patch: psql - sort_by_size

2019-07-14 Thread Pavel Stehule
Hi

On Fri, 12 Jul 2019 at 15:10, Fabien COELHO 
wrote:

>
> Hello Pavel,
>
> > rebased patch attached
>
> I prefer patches with a number rather than a date, if possible. For one
> thing, there may be several updates in one day.
>
> About this version (20180708, probably v3): applies cleanly, compiles,
> make check ok, doc build ok. No tests.
>

attached version 4


> It works for me on a few manual tests against a 11.4 server.
>
> Documentation: if you say "\d*+", then it already applies to \db+ and
> \dP+, so why listing them? Otherwise, state all commands or make it work
> on all commands that have a size?
>
> About the text:
>- remove , before "sorts"
>- ... outputs by decreasing size, when size is displayed.
>- add: When size is not displayed, the output is sorted by names.
>

fixed


> I still think that the object name should be kept as a secondary sort
> criterion, in case of size equality, so that the output is deterministic.
> Having plenty of objects of the same size out of alphabetical order looks
> very strange.
>

fixed
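
For reference, with the variable set the \d queries simply get an additional
ORDER BY on the size function, roughly equivalent to this hand-written sketch
(the secondary sort on schema and name keeps the output deterministic):

  SELECT n.nspname, c.relname,
         pg_catalog.pg_size_pretty(pg_catalog.pg_table_size(c.oid)) AS size
  FROM pg_catalog.pg_class c
       JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
  WHERE c.relkind = 'r'
  ORDER BY pg_catalog.pg_table_size(c.oid) DESC, 1, 2;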

Regards

Pavel

>
> I still do not like much the boolean approach. I understand that the name
> approach has been rejected, and I can understand why.
>
> I've been thinking about another more generic interface, that I'm putting
> here for discussion, I do not claim that it is a good idea. Probably could
> fall under "over engineering", but it might not be much harder to
> implement, and it solves a few potential problems.
>
> The idea is to add an option to \d commands, such as "\echo -n":
>
>\dt+ [-o 1d,2a] ...
>
> meaning do the \dt+, order by column 1 descending, column 2 ascending.
> With this there would be no need for a special variable nor other
> extensions to specify some ordering, whatever the user wishes.
>
> Maybe it could be "\dt+ [-o '1 DESC, 2 ASC'] ..." so that the string
> is roughly used as an ORDER BY specification by the query, but it would be
> longer to specify.
>
> It also solves the issue that if someone wants another sorting order we
> would end with competing boolean variables such as SORT_BY_SIZE,
> SORT_BY_TYPE, SORT_BY_SCHEMA, which would be pretty unpractical. The
> boolean approach works for *one* sorting extension and breaks at the next
> extension.
>
> Also, the boolean does not say that it is a descending order. I could be
> interested in looking at the small tables.
>
> Another benefit for me is that I do not like much variables with side
> effects, whereas with an explicit syntax there would be no such thing, the
> user has what was asked for. Ok, psql is full of them, but I cannot say I
> like it for that.
>
> The approach could be extended to specify a limit, eg \dt -l 10 would
> add a LIMIT 10 on the query.
>
> Also, the implementation could be high enough so that the description
> handlers would not have to deal with it individually, it could return
> the query which would then be completed with SORT/LIMIT clauses before
> being executed, possibly with a default order if none is specified.
>
> --
> Fabien.
>
>
>
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 7789fc6177..40aeb8cef0 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -3973,6 +3973,17 @@ bar
 
   
 
+  
+SORT_BY_SIZE
+
+
+Sorts \d[bimtv]+ and \l+
+outputs by decreasing size (when size is displayed). When size
+is not displayed, then output is sorted by name.
+
+
+  
+
   
SQLSTATE

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 8b4cd53631..9a7f02607c 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -223,6 +223,7 @@ describeTablespaces(const char *pattern, bool verbose)
 	PQExpBufferData buf;
 	PGresult   *res;
 	printQueryOpt myopt = pset.popt;
+	const char *sizefunc = NULL;
 
 	if (pset.sversion < 8)
 	{
@@ -265,9 +266,12 @@ describeTablespaces(const char *pattern, bool verbose)
 		  gettext_noop("Options"));
 
 	if (verbose && pset.sversion >= 90200)
+	{
 		appendPQExpBuffer(,
 		  ",\n  pg_catalog.pg_size_pretty(pg_catalog.pg_tablespace_size(oid)) AS \"%s\"",
 		  gettext_noop("Size"));
+		sizefunc = "pg_catalog.pg_tablespace_size(oid)";
+	}
 
 	if (verbose && pset.sversion >= 80200)
 		appendPQExpBuffer(,
@@ -281,7 +285,10 @@ describeTablespaces(const char *pattern, bool verbose)
 		  NULL, "spcname", NULL,
 		  NULL);
 
-	appendPQExpBufferStr(, "ORDER BY 1;");
+	if (pset.sort_by_size && sizefunc)
+		appendPQExpBuffer(, "ORDER BY %s DESC, 1;", sizefunc);
+	else
+		appendPQExpBufferStr(, "ORDER BY 1;");
 
 	res = PSQLexec(buf.data);
 	termPQExpBuffer();
@@ -863,6 +870,7 @@ listAllDbs(const char *pattern, bool verbose)
 	PGresult   *res;
 	PQExpBufferData buf;
 	printQueryOpt myopt = pset.popt;
+	const char *sizefunc = NULL;
 
 	initPQExpBuffer();
 
@@ -882,12 +890,15 @@ listAllDbs(const char *pattern, bool 

Re: Control your disk usage in PG: Introduction to Disk Quota Extension

2019-07-14 Thread Hubert Zhang
Thanks, Thomas.

On Mon, Jul 8, 2019 at 6:47 AM Thomas Munro  wrote:

> On Mon, Feb 18, 2019 at 7:39 PM Hubert Zhang  wrote:
> > Based on the assumption we use smgr as hook position, hook API option1
> or option2 which is better?
> > Or we could find some balanced API between option1 and option2?
> >
> > Again comments on other better hook positions are also appreciated!
>
> Hi Hubert,
>
> The July Commitfest is now running, and this entry is in "needs
> review" state.  Could you please post a rebased patch?
>
> I have questions about how disk quotas should work and I think we'll
> probably eventually want more hooks than these, but simply adding
> these hooks so extensions can do whatever they want doesn't seem very
> risky for core.  I think it's highly likely that the hook signatures
> will have to change in future releases too, but that seems OK for such
> detailed internal hooks.  As for your question, my first reaction was
> that I preferred your option 1, because SMgrRelation seems quite
> private and there are no existing examples of that object being
> exposed to extensions.  But on reflection, other callbacks don't take
> such a mollycoddling approach.  So my vote is for option 2 "just pass
> all the arguments to the callback", which I understand to be the
> approach of patch you have posted.
>
> +if (smgrcreate_hook)
> +{
> +(*smgrcreate_hook)(reln, forknum, isRedo);
> +}
>
> Usually we don't use curlies for single line if branches.
>
>
I have rebased the patch to v4 and removed the unnecessary braces.
As per your comments, Option 2 is still used in patch v4.

Agree that the diskquota extension may use more hooks in the future.
Currently the behavior of the diskquota extension is that we use smgr hooks to
detect active tables and record them in shared memory. Bgworkers of the
diskquota extension will read these active tables from shared memory and
calculate the latest table sizes and sum them into the size of the schema or
role. If the size of a schema or role exceeds its quota limit, it will be put
into a black list in shared memory. When a new query comes in,
ExecutorCheckPerms_hook will be used to check the black list and cancel the
query if needed.

-- 
Thanks

Hubert Zhang


disk_quota_hooks_v4.patch
Description: Binary data


doc: mention pg_reload_conf() in pg_hba.conf documentation

2019-07-14 Thread Ian Barwick

Hi

I noticed the documentation for pg_hba.conf:

  https://www.postgresql.org/docs/current/auth-pg-hba-conf.html

says:

you will need to signal the postmaster (using pg_ctl reload or kill -HUP) to
make it re-read the file.

It would be useful to mention pg_reload_conf() as another option here, as done
elsewhere in the docs.
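
For reference, the SQL-level alternative the suggested wording would point to
is simply:

  SELECT pg_reload_conf();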

Patch with suggested change attached.


Regards

Ian Barwick


--
 Ian Barwick   https://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
new file mode 100644
index 3ed74d8..c90ca0b
*** a/doc/src/sgml/client-auth.sgml
--- b/doc/src/sgml/client-auth.sgml
*** hostnogssenc databaseSIGHUPSIGHUP
 signal. If you edit the file on an
 active system, you will need to signal the postmaster
!(using pg_ctl reload or kill -HUP) to make it
!re-read the file.

  

--- 650,658 
 SIGHUPSIGHUP
 signal. If you edit the file on an
 active system, you will need to signal the postmaster
!(using pg_ctl reload, kill -HUP
!or by calling the SQL function pg_reload_conf())
!to make it re-read the file.

  



Re: Commitfest 2019-07, the first of five* for PostgreSQL 13

2019-07-14 Thread Thomas Munro
Hello hackers,

Here's a quick update at the end of the second week of CF1.

 status | w1  | w2
+-+-
 Committed  |  32 |  41
 Moved to next CF   |   5 |   6
 Needs review   | 146 | 128
 Ready for Committer|   7 |   9
 Rejected   |   2 |   2
 Returned with feedback |   2 |   2
 Waiting on Author  |  29 |  35
 Withdrawn  |   8 |   8

It looks like we continued our commit rate of around 10/week, punted
one to the next CF, and returned/rejected nothing.

Last week I highlighted 20 'Needs review' patches whose threads hadn't
seen traffic for the longest time as places that could use some
attention if our goal is to move all of these patches closer to their
destiny.  A few of them made some progress and one was committed.
Here are another 20 like that -- these are threads that have been silent
for 24 to 90 days.  That means they mostly apply and pass basic
testing (or I'd probably have reported the failure on the thread and
they wouldn't be on this list).  Which means you can test them!

 2080 | Minimizing pg_stat_statements performanc | {"Raymond Martin"}
 2103 | Fix failure of identity columns if there | {"Laurenz Albe"}
 1472 | SQL/JSON: functions  | {"Fedor
Sigaev","Alexander Korotkov","Nikita Glukhov","Oleg Bartunov"}
 2124 | Introduce spgist quadtree @<(point,circl | {"Matwey V. Kornilov"}
 1306 | pgbench - another attempt at tap test fo | {"Fabien Coelho"}
 2126 | Rearrange postmaster startup order to im | {"Tom Lane"}
 2128 | Fix issues with "x SIMILAR TO y ESCAPE N | {"Tom Lane"}
 2102 | Improve Append/MergeAppend EXPLAIN outpu | {"David Rowley"}
 1774 | Block level parallel vacuum  | {"Masahiko Sawada"}
 2086 | pgbench - extend initialization phase co | {"Fabien Coelho"}
 1348 | BRIN bloom and multi-minmax indexes  | {"Tomas Vondra"}
 2183 | Opclass parameters   | {"Nikita Glukhov"}
 1854 | libpq trace log  | {"Aya Iwata"}
 2147 | Parallel grouping sets   | {"Richard Guo"}
 2148 | vacuumlo: report the number of large obj | {"Timur Birsh"}
 1984 | Fix performance issue in foreign-key-awa | {"David Rowley"}
 1911 | anycompatible and anycompatiblearray pol | {"Pavel Stehule"}
 2048 | WIP: Temporal primary and foreign keys   | {"Paul Jungwirth"}
 2160 | Multi insert in CTAS/MatView | {"Paul Guo","Taylor Vesely"}
 2154 | Race conditions with TAP test for syncre | {"Michael Paquier"}

-- 
Thomas Munro
https://enterprisedb.com




Re: [PATCH] Implement uuid_version()

2019-07-14 Thread Ian Barwick

On 7/14/19 9:40 PM, Peter Eisentraut wrote:

On 2019-07-13 17:13, Fabien COELHO wrote:

What about avoiding a redirection with something like:

Datum (* const pg_random_uuid)(PG_FUNCTION_ARGS) = gen_random_uuid;


That seems very confusing.


Dunno. Possibly. The user does not have to look at the implementation, and
probably such code would deserve a comment.

The point is to avoid one call so as to perform the same (otherwise the
pg_random_uuid would be slightly slower), and to ensure that it behaves
the same, as it would be the very same function by construction.

I've switched the patch to ready anyway.


committed


Small doc tweak suggestion - the pgcrypto docs [1] now say about 
gen_random_uuid():

Returns a version 4 (random) UUID. (Obsolete, this function is now also
included in core PostgreSQL.)

which gives the impression the code contains two versions of this function,
the core one and an obsolete one in pgcrypto. Per the commit message the
situation is actually:

The pgcrypto implementation now internally redirects to the built-in one.

Suggested wording improvement in the attached patch.

[1] https://www.postgresql.org/docs/devel/pgcrypto.html#id-1.11.7.34.9
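
For context, with the function now in core the following works on current
master without installing pgcrypto at all:

  SELECT gen_random_uuid();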


Regards

Ian Barwick

--
 Ian Barwick   https://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/pgcrypto.sgml b/doc/src/sgml/pgcrypto.sgml
new file mode 100644
index 0acd11e..f636703
*** a/doc/src/sgml/pgcrypto.sgml
--- b/doc/src/sgml/pgcrypto.sgml
*** gen_random_bytes(count integer) returns
*** 1132,1139 
  gen_random_uuid() returns uuid
  

!Returns a version 4 (random) UUID. (Obsolete, this function is now also
!included in core PostgreSQL.)

   
  
--- 1132,1141 
  gen_random_uuid() returns uuid
  

!Returns a version 4 (random) UUID. (This function now redirects internally
!to the core PostgreSQL function providing
!the same functionality and is included in the extension for backwards compatibility;
!see  for details.)

   
  


Re: Re: SQL/JSON: functions

2019-07-14 Thread Thomas Munro
On Tue, May 14, 2019 at 12:54 PM Andrew Alsup  wrote:
> On 3/5/19 5:35 PM, Nikita Glukhov wrote:
> > Attached 36th version of the patches rebased onto jsonpath v36.
> While testing this patch a found a few issues:
>
> [1] I was not able to apply the patch to the current HEAD. However, it
> applies cleanly to commit: e988878f85 (NOTE: I did not investigate which
> commit between e988878f85 and HEAD caused problems).

Thanks for that note, which should help other reviewers/testers
looking a this patch set in CF1.  I hope we can eventually get a
rebased patch set, though.

-- 
Thomas Munro
https://enterprisedb.com




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2019-07-14 Thread Tomas Vondra

On Sun, Jul 14, 2019 at 02:38:40PM -0400, James Coleman wrote:

On Tue, Jul 9, 2019 at 10:54 AM Tomas Vondra
 wrote:


On Tue, Jul 09, 2019 at 09:28:42AM -0400, James Coleman wrote:
>On Mon, Jul 8, 2019 at 9:37 PM Tomas Vondra
> wrote:
>>
>> On Mon, Jul 08, 2019 at 12:07:06PM -0400, James Coleman wrote:
>> > ...
>> >
>> >I guess I'm still not following. If (2) is responsible (currently) for
>> >adding an explicit sort, why wouldn't adding an incremental sort be an
>> >example of that responsibility? The subpath that either a Sort or
>> >IncrementalSort is being added on top of (to then feed into the
>> >GatherMerge) is the same in both cases right?
>> >
>> >Unless you're saying that the difference is that since we have a
>> >partial ordering already for incremental sort then incremental sort
>> >falls into the category of "maintaining" existing ordering of the
>> >subpath?
>> >
>>
>> Oh, I think I understand what you're saying. Essentially, we should not
>> be making generate_gather_paths responsible for adding the incremental
>> sort. Instead, we should be looking at places than are adding explicit
>> sort (using create_sort_path) and also consider adding incremental sort.
>>
>> I definitely agree with the second half - we should look at all places
>> that create explicit sorts and make them also consider incremental
>> sorts. That makes sense.
>
>Yep, exactly.
>

If I remember correctly, one of the previous patch versions (in the early
2018 commitfests) actually modified many of those places, but it did that
in a somewhat "naive" way. It simply used incremental sort whenever the
path was partially sorted, or something like that. So it did not use
costing properly. There was an attempt to fix that in the last commitfest
but the costing model was deemed to be a bit too rough and unreliable
(especially the ndistinct estimates can be quite weak), so the agreement
was to try salvaging the patch for PG11 by only considering incremental
sort in "safest" places with greatest gains.

We've significantly improved the costing model since then, and the
implementation likely handles the corner cases much better. But that does
not mean we have to introduce the incremental sort to all those places at
once - it might be wise to split that into separate patches.


Yes, although we haven't added the MCV checking yet; that's on my
mental checklist, but another new area of the codebase for me to
understand, so I've been prioritizing other parts of the patch.



Sure, no problem.


It's not just about picking the right plan - we've kinda ignored what impact these
extra paths might have on planner performance, so maybe we should look at
that too. And the impact might be different for each of those places.

I'll leave that up to you, but I certainly won't insist on doing it all in
one huge patch.


I'm not opposed to handling some of them separately, but I would like
to at least hit the places where it's most likely (for example, with
LIMIT) to improve things. I suppose I'll have to look at all of the
usages of create_sort_path() and try to rank them in terms of
perceived likely value.
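
For illustration, the typical LIMIT case is roughly this sketch (hypothetical
table t with an index on column a only), where an incremental sort on top of
the index scan only needs to sort the small groups of rows sharing the same
value of a:

  CREATE INDEX t_a_idx ON t (a);

  SELECT * FROM t ORDER BY a, b LIMIT 10;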



Yep, makes sense.


>> But I'm not sure it'll address all cases - the problem is that those
>> places add the explicit sort because they need sorted input. Gather
>> Merge does not do that, it only preserves existing ordering of paths.
>>
>> So it's possible the path does not have an explicit sort on top, and
>> gather merge will not know to add it. And once we have the gather merge
>> in place, we can't push the sort "under" it.
>
>That's the explanation I was missing; and it makes sense (to restate:
>sometimes sorting is useful even when not required for correctness of
>the user returned data).
>

Yes, although even when the sorting is required for correctness (because
the user specified ORDER BY) you can do it at different points in the
plan. We'd still produce correct results, but the sort might be done at
the very end without these changes.

For example we might end up with plans

   Incremental Sort (presorted: a, path keys: a,b)
-> Gather Merge (path keys: a)
-> Index Scan (path keys: a)

but with those changes we might push the incremental sort down into the
parallel part:

   Gather Merge (path keys: a,b)
-> Incremental Sort (presorted: a, path keys: a,b)
-> Index Scan (path keys: a)

which is likely better. Perhaps that's what you meant, though.


I was thinking of ordering being useful for grouping/aggregation or
merge joins; I didn't realize the above plan wasn't possible yet, so
that explanation is helpful.


>> And I think I know why is that - while gather_grouping_paths() tries to
>> add explicit sort below the gather merge, there are other places that
>> call generate_gather_paths() that don't do that. In this case it's
>> probably apply_scanjoin_target_to_paths() which simply builds
>>
>>parallel (seq|index) scan + gather merge
>>
>> and that's it. The problem is likely the same - the code does not know
>> 

Re: Change ereport level for QueuePartitionConstraintValidation

2019-07-14 Thread David Rowley
On Mon, 15 Jul 2019 at 11:46, Thomas Munro  wrote:
>
> On Tue, Jul 2, 2019 at 12:17 AM Sergei Kornilov  wrote:
> > This change is discussed as an open item for pg12. Seems we have neither 
> > objections nor agreement. I attached an updated version due to a merge conflict.
> >
> > > Per discussion started here: 
> > > https://www.postgresql.org/message-id/CA%2BTgmoZWSLUjVcc9KBSVvbn%3DU5QRgW1O-MgUX0y5CnLZOA2qyQ%40mail.gmail.com
>
> I took the liberty of setting this to "Ready for Committer" to see if
> we can get a decision one way or another and clear both a Commitfest
> item and a PG12 Open Item.  No committer is signed up, but it looks
> like Amit L wrote the messages in question, Robert committed them, and
> David made arguments for AND against on the referenced thread, so I'm
> CCing them, and retreating to a safe distance.

I think the only argument against it was around lack of ability to
test if the constraint was used to verify no row breaks the partition
bound during the ATTACH PARTITION.

Does anyone feel strongly that we need the test to confirm that the
constraint was used for this?

If nobody feels so strongly about that then I say we can just push
this.   It seems something that's unlikely to get broken, but then you
could probably say that for most things our tests test for.
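
For reference, the case in question is roughly the following sketch
(hypothetical names): because the existing NOT NULL and CHECK constraints
already imply the partition bound, ATTACH PARTITION skips the validation scan
and emits the INFO-level message whose level this patch would change.

  CREATE TABLE parent (i int) PARTITION BY RANGE (i);
  CREATE TABLE part1 (i int NOT NULL, CHECK (i >= 1 AND i < 100));
  ALTER TABLE parent ATTACH PARTITION part1 FOR VALUES FROM (1) TO (100);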

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-14 Thread Tomas Vondra

On Sun, Jul 14, 2019 at 12:13:45PM -0400, Joe Conway wrote:

On 7/13/19 5:58 PM, Tomas Vondra wrote:

On Sat, Jul 13, 2019 at 02:41:34PM -0400, Joe Conway wrote:

[2] also says provides additional support for AES 256. It also mentions
CBC versus XTS -- I came across this elsewhere and it bears discussion:

"Currently, the following pairs of encryption modes are supported:

   AES-256-XTS for contents and AES-256-CTS-CBC for filenames
   AES-128-CBC for contents and AES-128-CTS-CBC for filenames
   Adiantum for both contents and filenames

If unsure, you should use the (AES-256-XTS, AES-256-CTS-CBC) pair.

AES-128-CBC was added only for low-powered embedded devices with crypto
accelerators such as CAAM or CESA that do not support XTS."
---
[2] also states this, which again makes me think in terms of table being
the moral equivalent to a file:

"Unlike dm-crypt, fscrypt operates at the filesystem level rather than
at the block device level. This allows it to encrypt different files
with different keys and to have unencrypted files on the same
filesystem. This is useful for multi-user systems where each user’s
data-at-rest needs to be cryptographically isolated from the others.
However, except for filenames, fscrypt does not encrypt filesystem
metadata."





[5] has this to say which seems independent of mode:

"When encrypting data with a symmetric block cipher, which uses blocks
of n bits, some security concerns begin to appear when the amount of
data encrypted with a single key comes close to 2n/2 blocks, i.e. n*2n/2
bits. With AES, n = 128 (AES-128, AES-192 and AES-256 all use 128-bit
blocks). This means a limit of more than 250 millions of terabytes,
which is sufficiently large not to be a problem. That's precisely why
AES was defined with 128-bit blocks, instead of the more common (at that
time) 64-bit blocks: so that data size is practically unlimited."



FWIW I was a bit confused at first, because the copy paste mangled the
formulas a bit - it should have been 2^(n/2) and n*2^(n/2).


Yeah, sorry about that.


But goes on to say:
"I wouldn't use n*2^(n/2) bits in any sort of recommendation. Once you
reach that number of bits the probability of a collision will grow
quickly and you will be way over 50% probability of a collision by the
time you reach 2*n*2^(n/2) bits. In order to keep the probability of a
collision negligible I recommend encrypting no more than n*2^(n/4) bits
with the same key. In the case of AES that works out to 64GB"

It is hard to say if that recommendation is per key or per key+IV.


Hmm, yeah. The question is what collisions they have in mind? Presumably
it's AES(block1,key) = AES(block2,key) in which case it'd be with fixed
IV, so per key+IV.


Seems likely.


But I did find that files in an encrypted file system are encrypted with
derived keys from a master key, and I view this as analogous to what we
are doing.



My understanding always was that we'd do something like that, i.e. we'd
have a master key (or perhaps multiple of them, for various users), but
the data would be encrypted with secondary (generated) keys, and those
secondary keys would be encrypted by the master key. At least that's
what was proposed at the beginning of this thread by Insung Moon.


In my email I linked the wrong page for [2]. The correct one is here:
[2] https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html

Following that, I think we could end up with three tiers:

1. A master key encryption key (KEK): this is the key supplied by the
  database admin using something akin to ssl_passphrase_command

2. A master data encryption key (MDEK): this is a generated key using a
  cryptographically secure pseudo-random number generator. It is
  encrypted using the KEK, probably with Key Wrap (KW):
  or maybe better Key Wrap with Padding (KWP):

3a. Per table data encryption keys (TDEK): use MDEK and HKDF to generate
   table specific keys.

3b. WAL data encryption keys (WDEK):  Similarly use MDEK and a HKDF to
   generate new keys when needed for WAL (based on the other info we
   need to change WAL keys every 68 GB unless I read that wrong).

I believe that would allow us to have multiple keys but they are
derived securely from the one DEK using available info similar to the
way we intend to use LSN to derive the IVs -- perhaps table.oid for
tables and something else for WAL.

We also need to figure out how/when to generate new WDEK. Maybe every
checkpoint, also meaning we would have to force a checkpoint every 68GB?



I think that very much depends on what exactly the 68GB refers to - key
or key+IV? If key+IV, then I suppose we can use LSN as IV and we would
not need to change checkpoints. But it's not clear to me why we would
need to force checkpoints at all? Surely we can just write a WAL message
about switching to the new key, or something like that?


[HKDF]: https://tools.ietf.org/html/rfc5869
[KW]: https://tools.ietf.org/html/rfc3394
[KWP]: https://tools.ietf.org/html/rfc5649



But 

Re: Using unique btree indexes for pathkeys with one extra column

2019-07-14 Thread David Rowley
On Mon, 15 Jul 2019 at 12:59, Thomas Munro  wrote:
> In the category "doing more tricks with our existing btrees", which
> includes all that difficult stuff like skip scans and incremental
> sort, here's an easier planner-only one:  if you have a unique index
> on (a) possibly "including" (b) and you have a pathkey (a, b), you can
> use an index [only] scan.  That is, if the index is unique, and you
> want exactly one extra column in index order, then you don't need any
> extra sorting to get (a, b) in order.  (If the index is not unique, or
> there is more than one extra trailing column in the pathkey, you need
> the incremental sort patch[1] to use this index).  This was brought to
> my attention by a guru from a different RDBMS complaining about stupid
> stuff that PostgreSQL does and I was triggered to write this message
> as a kind of TODO note...
>
> [1] https://commitfest.postgresql.org/24/1124/

This is one of the problems I've wanted to solve in the various times
I've mentioned the word "UniqueKeys" on this mailing list.

Probably my most detailed explanation is in
https://www.postgresql.org/message-id/CAKJS1f86FgODuUnHiQ25RKeuES4qTqeNxm1QbqJWrBoZxVGLiQ%40mail.gmail.com

Without detecting the UniqueKeys through joins then the optimisation
you mention is limited to just single rel queries, since a join may
duplicate the "a" column and make it so the sort on "b" is no longer
redundant. In my view, limiting this to just single relation queries
is just too restrictive to bother writing any code for, so I think to do
as you mention we need the full-blown thing I mention in the link
above, i.e. tagging a list of UniqueKeys onto RelOptInfo and checking
which ones are still applicable after joins and tagging those onto
join RelOptInfos too.  PathKey redundancy could then take into account
that list of UniqueKeys at the RelOptInfo level.  At the top-level plan,
you can do smarts for ORDER BY / GROUP BY / DISTINCT.
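
For illustration (hypothetical tables t1 and t2): t1.a is unique, but the
join can repeat each t1 row, so the join result is no longer unique on "a"
and a sort on "b" is not redundant there.

  CREATE TABLE t1 (a int PRIMARY KEY, payload text);
  CREATE TABLE t2 (t1_a int REFERENCES t1 (a), b int);

  SELECT t1.a, t2.b
  FROM t1
  JOIN t2 ON t2.t1_a = t1.a
  ORDER BY t1.a, t2.b;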


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: refactoring - share str2*int64 functions

2019-07-14 Thread Thomas Munro
On Sun, Jul 14, 2019 at 3:22 AM Fabien COELHO  wrote:
> Here is the updated patch on which I checked "make check-world".

Thanks!  So, we're moving pg_strtouint64() to a place where frontend
code can use it, and getting rid of some duplication.  I like it.  I
wanted this once before myself[1].

+extern bool pg_strtoint64(const char *str, bool errorOK, int64 *result);
+extern uint64 pg_strtouint64(const char *str, char **endptr, int base);

One of these things is not like the other.  Let's see... the int64
version is used only by pgbench and is being promoted to common where
it can be used by more code.  With a name like that, wouldn't it make
sense to bring it into line with the uint64 interface, and then move
pgbench's error reporting stuff back into pgbench?  The uint64 one
derives its shape from the family of standard functions like strtol()
so I think it wins.

[1] 
https://www.postgresql.org/message-id/CAEepm=2kec8xdbewgdtdobxgqphfw4kcd7bzxr6nmfihjjn...@mail.gmail.com

-- 
Thomas Munro
https://enterprisedb.com




Re: [sqlsmith] Crash in mcv_get_match_bitmap

2019-07-14 Thread Tomas Vondra

OK, attached is a sequence of WIP fixes for the issues discussed here.


1) using column-specific collations (instead of type/default ones)

The collations patch is pretty simple, but I'm not sure it actually does
the right thing, particularly during estimation where it uses collation
from the Var node (varcollid). But looking at 5e0928005, this should use
the same collation as when building the extended statistics (which we
get from the per-column stats, as stored in pg_statistic.stacoll#).

But we don't actually store collations for extended statistics, so we
can either modify pg_statistic_ext_data and store it there, or lookup
the per-column statistic info during estimation, and use that. I kinda
think the first option is the right one, but that'd mean yet another
catversion bump.

OTOH 5e0928005 actually did modify the extended statistics (mvdistinct
and dependencies) to use type->typcollation during building, so maybe we
want to use the default type collation for some reason?


2) proper extraction of Var/Const from opclauses

This is the primary issue discussed in this thread - I've renamed the
function to examine_opclause_expression() so that it kinda resembles
examine_variable() and I've moved it to the "internal" header file. We
still need it from two places so it can't be static, but hopefully this
naming is acceptable.


3) handling of NULL values (Const and MCV items)

Aside from the issue that Const may represent NULL, I've realized the
code might do the wrong thing for NULL in the MCV item itself. It did
treat it as mismatch and update the bitmap, but it might have invoke the
operator procedure anyway (depending on whether it's AND/OR clause,
what's the current value in the bitmap, etc.). This patch should fix
both issues by treating them as mismatch, and skipping the proc call.


4) refactoring of the bitmap updates

This is not a bug per se, but while working on (3) I've realized the
code updating the bitmap is quite repetitive and it does stuff like

  if (is_or)
 bitmap[i] = Max(bitmap[i], match)
  else
 bitmap[i] = Min(bitmap[i], match)

over and over on various places. This moves this into a separate static
function, which I think makes it way more readable. Also, it replaces
the Min/Max with a plain boolean operators (the patch originally used
three states, not true/false, hence the Min/Max).


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
From 1586963befe8fe7de473a28515bd7676fa2d0acd Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sun, 14 Jul 2019 14:19:01 +0200
Subject: [PATCH 1/5] Use proper collations in extended statistics

The extended statistics code was a bit confused about which collation
to use when building the statistics and when computing the estimates.
For building it used a default collation for each data type, while for
estimation it used DEFAULT_COLLATION_OID. That's clearly inconsistent.

Commit 5e0928005 changed how this works for per-column statistics, in
which case we now use collation specified for each column - both for
building the statistics and selectivity estimation. This commit adopts
the same approach for extended statistics.

Note: One issue is that for per-column statistics we store collation
in pg_statistic catalog, but we don't store this in pg_statistic_ext.
So we'd have to either add another column into the catalog (which is
probably the right thing to do) or rely on info from pg_statistic.
But we probably need to add this into pg_statistic_ext, to allow stats
on expressions, or extended statistics with different collations.
---
 src/backend/statistics/mcv.c | 16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index 913a72ff67..2e375edcb4 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -348,7 +348,7 @@ build_mss(VacAttrStats **stats, int numattrs)
elog(ERROR, "cache lookup failed for ordering operator 
for type %u",
 colstat->attrtypid);
 
-   multi_sort_add_dimension(mss, i, type->lt_opr, 
type->typcollation);
+   multi_sort_add_dimension(mss, i, type->lt_opr, 
colstat->attrcollid);
}
 
return mss;
@@ -668,7 +668,7 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats 
**stats)
 
/* sort and deduplicate the data */
ssup[dim].ssup_cxt = CurrentMemoryContext;
-   ssup[dim].ssup_collation = DEFAULT_COLLATION_OID;
+   ssup[dim].ssup_collation = stats[dim]->attrcollid;
ssup[dim].ssup_nulls_first = false;
 
PrepareSortSupportFromOrderingOp(typentry->lt_opr, [dim]);
@@ -1577,8 +1577,6 @@ mcv_get_match_bitmap(PlannerInfo *root, List *clauses,
 
if (ok)
{
-   TypeCacheEntry 

Using unique btree indexes for pathkeys with one extra column

2019-07-14 Thread Thomas Munro
Hello,

In the category "doing more tricks with our existing btrees", which
includes all that difficult stuff like skip scans and incremental
sort, here's an easier planner-only one:  if you have a unique index
on (a) possibly "including" (b) and you have a pathkey (a, b), you can
use an index [only] scan.  That is, if the index is unique, and you
want exactly one extra column in index order, then you don't need any
extra sorting to get (a, b) in order.  (If the index is not unique, or
there is more than one extra trailing column in the pathkey, you need
the incremental sort patch[1] to use this index).  This was brought to
my attention by a guru from a different RDBMS complaining about stupid
stuff that PostgreSQL does and I was triggered to write this message
as a kind of TODO note...

[1] https://commitfest.postgresql.org/24/1124/
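
For illustration, a sketch of the shape described above (hypothetical table
t): because the index is unique on (a), there is at most one row per value of
a, so index order already satisfies the (a, b) pathkey and no extra sort
should be needed.

  CREATE TABLE t (a int, b int, c int);
  CREATE UNIQUE INDEX t_a_incl_b ON t (a) INCLUDE (b);

  SELECT a, b FROM t ORDER BY a, b;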

-- 
Thomas Munro
https://enterprisedb.com




Re: Change ereport level for QueuePartitionConstraintValidation

2019-07-14 Thread Thomas Munro
On Tue, Jul 2, 2019 at 12:17 AM Sergei Kornilov  wrote:
> This change is discussed as an open item for pg12. Seems we have neither objections 
> nor agreement. I attached an updated version due to a merge conflict.
>
> > Per discussion started here: 
> > https://www.postgresql.org/message-id/CA%2BTgmoZWSLUjVcc9KBSVvbn%3DU5QRgW1O-MgUX0y5CnLZOA2qyQ%40mail.gmail.com

I took the liberty of setting this to "Ready for Committer" to see if
we can get a decision one way or another and clear both a Commitfest
item and a PG12 Open Item.  No committer is signed up, but it looks
like Amit L wrote the messages in question, Robert committed them, and
David made arguments for AND against on the referenced thread, so I'm
CCing them, and retreating to a safe distance.

-- 
Thomas Munro
https://enterprisedb.com




Re: Built-in connection pooler

2019-07-14 Thread Thomas Munro
On Mon, Jul 15, 2019 at 7:56 AM Konstantin Knizhnik
 wrote:
> Can you please explain me more precisely how to reproduce the problem
> (how to configure postgres and how to connect to it)?

Maybe it's just that postmaster.c's ConnCreate() always allocates a
struct for port->gss if the feature is enabled, but the equivalent
code in proxy.c's proxy_loop() doesn't?

./configure
  --prefix=$HOME/install/postgres \
  --enable-cassert \
  --enable-debug \
  --enable-depend \
  --with-gssapi \
  --with-icu \
  --with-pam \
  --with-ldap \
  --with-openssl \
  --enable-tap-tests \
  --with-includes="/opt/local/include" \
  --with-libraries="/opt/local/lib" \
  CC="ccache cc" CFLAGS="-O0"

~/install/postgres/bin/psql postgres -h localhost -p 6543

2019-07-15 08:37:25.348 NZST [97972] LOG:  server process (PID 97974)
was terminated by signal 11: Segmentation fault: 11

(lldb) bt
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x000104163e7e
postgres`secure_read(port=0x7fda9ef001d0, ptr=0x0001047ab690,
len=8192) at be-secure.c:164:6
frame #1: 0x00010417760d postgres`pq_recvbuf at pqcomm.c:969:7
frame #2: 0x0001041779ed postgres`pq_getbytes(s="??\x04~?,
len=1) at pqcomm.c:1110:8
frame #3: 0x000104284147
postgres`ProcessStartupPacket(port=0x7fda9ef001d0,
secure_done=true) at postmaster.c:2064:6
...
(lldb) f 0
frame #0: 0x000104163e7e
postgres`secure_read(port=0x7fda9ef001d0, ptr=0x0001047ab690,
len=8192) at be-secure.c:164:6
   161 else
   162 #endif
   163 #ifdef ENABLE_GSS
-> 164 if (port->gss->enc)
   165 {
   166 n = be_gssapi_read(port, ptr, len);
   167 waitfor = WL_SOCKET_READABLE;
(lldb) print port->gss
(pg_gssinfo *) $0 = 0x

> > Obviously your concept of tainted backends (= backends that can't be
> > reused by other connections because they contain non-default session
> > state) is quite simplistic and would help only the very simplest use
> > cases.  Obviously the problems that need to be solved first to do
> > better than that are quite large.  Personally I think we should move
> > all GUCs into the Session struct, put the Session struct into shared
> > memory, and then figure out how to put things like prepared plans into
> > something like Ideriha-san's experimental shared memory context so
> > that they also can be accessed by any process, and then we'll mostly
> > be tackling problems that we'll have to tackle for threads too.  But I
> > think you made the right choice to experiment with just reusing the
> > backends that have no state like that.
>
> That was not actually my idea: it was proposed by Dimitri Fontaine.
> In PgPRO-EE we have another version of builtin connection pooler
> which maintains session context and allows to use GUCs, prepared statements
> and temporary tables in pooled sessions.

Interesting.  Do you serialise/deserialise plans and GUCs using
machinery similar to parallel query, and did you change temporary
tables to use shared buffers?  People have suggested that we do that
to allow temporary tables in parallel queries too.  FWIW I have also
wondered about per (database, user) pools for reusable parallel
workers.

> But the main idea of this patch was to make connection pooler less
> invasive and
> minimize changes in Postgres core. This is why I have implemented proxy.

How do you do it without a proxy?

One idea I've wondered about that doesn't involve feeding all network
IO through an extra process and extra layer of syscalls like this is
to figure out how to give back your PGPROC slot when idle.  If you
don't have one and can't acquire one at the beginning of a
transaction, you wait until one becomes free.  When idle and not in a
transaction you give it back to the pool, perhaps after a slight
delay.  That may be impossible for some reason or other, I don't know.
If it could be done, it'd deal with the size-of-procarray problem
(since we like to scan it) and provide a kind of 'admission control'
(limiting number of transactions that can run), but wouldn't deal with
the amount-of-backend-memory-wasted-on-per-process-caches problem
(maybe solvable by shared caching).

Ok, time for a little bit more testing.  I was curious about the user
experience when there are no free backends.

1.  I tested with connection_proxies=1, max_connections=4, and I began
3 transactions.  Then I tried to make a 4th connection, and it blocked
while trying to connect and the log shows a 'sorry, too many clients
already' message.  Then I committed one of my transactions and the 4th
connection was allowed to proceed.

2.  I tried again, this time with 4 already existing connections and
no transactions.  I began 3 concurrent transactions, and then when I
tried to begin a 4th transaction the BEGIN command blocked until one
of the other transactions committed.

Some thoughts:   Why should case 1 block?  Shouldn't I be allowed to
connect, even though I can't begin a 

Re: [PATCH] Add support for ON UPDATE/DELETE actions on ALTER CONSTRAINT

2019-07-14 Thread Matheus de Oliveira
On Mon, Jul 1, 2019 at 6:21 AM Thomas Munro  wrote:

>
> Hi Matheus,
>
> As the commitfest is starting, could you please send a rebased patch?
>
>
Hi all,

Glad to start working on that again... Follows the rebased version (at
5925e55498).
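
For reference, the extended syntax allows, for example (hypothetical table
and constraint names):

  ALTER TABLE orders
    ALTER CONSTRAINT orders_customer_id_fkey
    ON DELETE CASCADE ON UPDATE RESTRICT
    DEFERRABLE INITIALLY IMMEDIATE;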

Thank you all.

Best regards,
-- 
Matheus de Oliveira
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 90bf19564c..198c640b98 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -55,7 +55,9 @@ ALTER TABLE [ IF EXISTS ] name
 ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }
 ADD table_constraint [ NOT VALID ]
 ADD table_constraint_using_index
-ALTER CONSTRAINT constraint_name [ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
+ALTER CONSTRAINT constraint_name
+  [ ON DELETE action ] [ ON UPDATE action ]
+  [ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
 VALIDATE CONSTRAINT constraint_name
 DROP CONSTRAINT [ IF EXISTS ]  constraint_name [ RESTRICT | CASCADE ]
 DISABLE TRIGGER [ trigger_name | ALL | USER ]
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 0f1a9f0e54..b54c3d67c5 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8979,8 +8979,43 @@ ATExecAlterConstraint(Relation rel, AlterTableCmd *cmd,
  errmsg("constraint \"%s\" of relation \"%s\" is not a foreign key constraint",
 		cmdcon->conname, RelationGetRelationName(rel;
 
+	/*
+	 * Verify for FKCONSTR_ACTION_UNKNOWN, if found, replace by current
+	 * action. We could handle FKCONSTR_ACTION_UNKNOWN bellow, but since
+	 * we already have to handle the case of changing to the same action,
+	 * seems simpler to simple replace FKCONSTR_ACTION_UNKNOWN by the
+	 * current action here.
+	 */
+	if (cmdcon->fk_del_action == FKCONSTR_ACTION_UNKNOWN)
+		cmdcon->fk_del_action = currcon->confdeltype;
+
+	if (cmdcon->fk_upd_action == FKCONSTR_ACTION_UNKNOWN)
+		cmdcon->fk_upd_action = currcon->confupdtype;
+
+	/*
+	 * Do the same for deferrable attributes. But consider that if changed
+	 * only initdeferred attribute and to true, force deferrable to be also
+	 * true. On the other hand, if changed only deferrable attribute and to
+	 * false, force initdeferred to be also false.
+	 */
+	if (!cmdcon->was_deferrable_set)
+		cmdcon->deferrable = cmdcon->initdeferred ? true : currcon->condeferrable;
+
+	if (!cmdcon->was_initdeferred_set)
+		cmdcon->initdeferred = !cmdcon->deferrable ? false : currcon->condeferred;
+
+	/*
+	 * This is a safe check only, should never happen here.
+	 */
+	if (cmdcon->initdeferred && !cmdcon->deferrable)
+		ereport(ERROR,
+(errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("constraint declared INITIALLY DEFERRED must be DEFERRABLE")));
+
 	if (currcon->condeferrable != cmdcon->deferrable ||
-		currcon->condeferred != cmdcon->initdeferred)
+		currcon->condeferred != cmdcon->initdeferred ||
+		currcon->confdeltype != cmdcon->fk_del_action ||
+		currcon->confupdtype != cmdcon->fk_upd_action)
 	{
 		HeapTuple	copyTuple;
 		HeapTuple	tgtuple;
@@ -8998,6 +9033,8 @@ ATExecAlterConstraint(Relation rel, AlterTableCmd *cmd,
 		copy_con = (Form_pg_constraint) GETSTRUCT(copyTuple);
 		copy_con->condeferrable = cmdcon->deferrable;
 		copy_con->condeferred = cmdcon->initdeferred;
+		copy_con->confdeltype = cmdcon->fk_del_action;
+		copy_con->confupdtype = cmdcon->fk_upd_action;
 		CatalogTupleUpdate(conrel, &copyTuple->t_self, copyTuple);
 
 		InvokeObjectPostAlterHook(ConstraintRelationId,
@@ -9034,23 +9071,106 @@ ATExecAlterConstraint(Relation rel, AlterTableCmd *cmd,
 otherrelids = list_append_unique_oid(otherrelids,
 	 tgform->tgrelid);
 
-			/*
-			 * Update deferrability of RI_FKey_noaction_del,
-			 * RI_FKey_noaction_upd, RI_FKey_check_ins and RI_FKey_check_upd
-			 * triggers, but not others; see createForeignKeyActionTriggers
-			 * and CreateFKCheckTrigger.
-			 */
-			if (tgform->tgfoid != F_RI_FKEY_NOACTION_DEL &&
-tgform->tgfoid != F_RI_FKEY_NOACTION_UPD &&
-tgform->tgfoid != F_RI_FKEY_CHECK_INS &&
-tgform->tgfoid != F_RI_FKEY_CHECK_UPD)
-continue;
-
 			copyTuple = heap_copytuple(tgtuple);
 			copy_tg = (Form_pg_trigger) GETSTRUCT(copyTuple);
 
+			/*
+			 * Set deferrability here, but note that it may be overridden below
+			 * for the action triggers (on the referenced table), depending on
+			 * the ON UPDATE/DELETE action used.  For the check triggers (on
+			 * the referencing table) it is kept as is, since the ON
+			 * UPDATE/DELETE actions make no difference for them.
+			 */
 			copy_tg->tgdeferrable = cmdcon->deferrable;
 			copy_tg->tginitdeferred = cmdcon->initdeferred;
+
+			/*
+			 * Set ON DELETE action
+			 */
+			if (tgform->tgfoid == F_RI_FKEY_NOACTION_DEL ||
+tgform->tgfoid == F_RI_FKEY_RESTRICT_DEL ||
+tgform->tgfoid == F_RI_FKEY_CASCADE_DEL 

Re: Built-in connection pooler

2019-07-14 Thread Konstantin Knizhnik




On 14.07.2019 8:03, Thomas Munro wrote:

On Tue, Jul 9, 2019 at 8:30 AM Konstantin Knizhnik
 wrote:

Rebased version of the patch is attached.

Thanks for including nice documentation in the patch, which gives a
good overview of what's going on.  I haven't read any code yet, but I
took it for a quick drive to understand the user experience.  These
are just some first impressions.

I started my server with -c connection_proxies=1 and tried to connect
to port 6543 and the proxy segfaulted on null ptr accessing
port->gss->enc.  I rebuilt without --with-gssapi to get past that.


First of all a lot of thanks for your review.

Sorry, I failed to reproduce the problem with GSSAPI.
I have configured postgres with --with-openssl --with-gssapi
and then tried to connect to the server with psql using the following
connection string:


psql "sslmode=require dbname=postgres port=6543 krbsrvname=POSTGRES"

There are no segfaults, and the port->gss structure is initialized normally.

Can you please explain in more detail how to reproduce the problem
(how to configure postgres and how to connect to it)?




Using SELECT pg_backend_pid() from many different connections, I could
see that they were often being served by the same process (although
sometimes it created an extra one when there didn't seem to be a good
reason for it to do that).  I could see the proxy managing these
connections with SELECT * FROM pg_pooler_state() (I suppose this would
be wrapped in a view with a name like pg_stat_proxies).  I could see
that once I did something like SET foo.bar = 42, a backend became
dedicated to my connection and no other connection could use it.  As
described.  Neat.

Obviously your concept of tainted backends (= backends that can't be
reused by other connections because they contain non-default session
state) is quite simplistic and would help only the very simplest use
cases.  Obviously the problems that need to be solved first to do
better than that are quite large.  Personally I think we should move
all GUCs into the Session struct, put the Session struct into shared
memory, and then figure out how to put things like prepared plans into
something like Ideriha-san's experimental shared memory context so
that they also can be accessed by any process, and then we'll mostly
be tackling problems that we'll have to tackle for threads too.  But I
think you made the right choice to experiment with just reusing the
backends that have no state like that.

That was not actually my idea: it was proposed by Dimitri Fontaine.
In PgPRO-EE we have another version of the built-in connection pooler
which maintains session context and allows GUCs, prepared statements
and temporary tables to be used in pooled sessions.
But the main idea of this patch was to make the connection pooler less
invasive and minimize changes in the Postgres core. This is why I have
implemented a proxy.



On my FreeBSD box (which doesn't have epoll(), so it's latch.c's old
school poll() for now), I see the connection proxy process eating a
lot of CPU and the temperature rising.  I see with truss that it's
doing this as fast as it can:

poll({ 13/POLLIN 17/POLLIN|POLLOUT },2,1000) = 1 (0x1)

Ouch.  I admit that I had the idea to test on FreeBSD because I
noticed the patch introduces EPOLLET and I figured this might have
been tested only on Linux.  FWIW the same happens on a Mac.


Yeah.
This is really a problem. In addition to FreeBSD and macOS, it also
occurs on Win32.
I have to think more about how to solve it. We need to somehow emulate
"edge-triggered" mode on these systems.
The source of the problem is that the proxy reads data from one socket
and writes it to another socket.
If the write socket is busy, we have to wait until it is available. But
at the same time there can be data available for input, so poll will
return immediately unless we remove the read event for that socket.
I am not sure how to do this better without changing the
WaitEventSet API.
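
To illustrate the masking idea at the plain poll(2) level (just a minimal,
self-contained sketch with made-up names, leaving the WaitEventSet question
aside): while the destination socket cannot accept more data we stop asking
for input on the source, and only re-arm the read event once the pending
write has drained, so a level-triggered poll() no longer spins:

#include <poll.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustrative relay state: data flows from src_fd to dst_fd. */
struct relay
{
	int		src_fd;			/* socket we read from */
	int		dst_fd;			/* socket we write to */
	bool	write_blocked;	/* last write on dst_fd would have blocked */
};

/*
 * While the destination cannot take more data, do not ask for input on the
 * source; only wait for the destination to become writable again.
 */
static void
fill_pollfds(const struct relay *r, struct pollfd pfd[2])
{
	pfd[0].fd = r->src_fd;
	pfd[0].events = r->write_blocked ? 0 : POLLIN;

	pfd[1].fd = r->dst_fd;
	pfd[1].events = r->write_blocked ? POLLOUT : 0;
}

int
main(void)
{
	struct relay r = {13, 17, true};
	struct pollfd pfd[2];

	fill_pollfds(&r, pfd);
	printf("src events=%d, dst events=%d\n", pfd[0].events, pfd[1].events);
	return 0;
}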




   C:\projects\postgresql\src\include\../interfaces/libpq/libpq-int.h(33):
fatal error C1083: Cannot open include file: 'pthread-win32.h': No
such file or directory (src/backend/postmaster/proxy.c)
[C:\projects\postgresql\postgres.vcxproj]


I will investigate the problem with the Win32 build once I get a Win32
image for VirtualBox.



These relative includes in proxy.c are part of the problem:

#include "../interfaces/libpq/libpq-fe.h"
#include "../interfaces/libpq/libpq-int.h"

I didn't dig into this much but some first reactions:

I have added

override CPPFLAGS := $(CPPFLAGS) -I$(top_builddir)/src/port -I$(top_srcdir)/src/port

to the Makefile for the postmaster in order to fix this problem (as in
interfaces/libpq/Makefile).
But it looks like that is not enough. As I wrote above, I will try to solve
this problem once I get access to a Win32 environment.



1.  I see that proxy.c uses libpq, and correctly loads it as a dynamic
library just like postgres_fdw.  Unfortunately it's part of core, so
it can't use the same technique as postgres_fdw 

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2019-07-14 Thread James Coleman
On Tue, Jul 9, 2019 at 10:54 AM Tomas Vondra
 wrote:
>
> On Tue, Jul 09, 2019 at 09:28:42AM -0400, James Coleman wrote:
> >On Mon, Jul 8, 2019 at 9:37 PM Tomas Vondra
> > wrote:
> >>
> >> On Mon, Jul 08, 2019 at 12:07:06PM -0400, James Coleman wrote:
> >> > ...
> >> >
> >> >I guess I'm still not following. If (2) is responsible (currently) for
> >> >adding an explicit sort, why wouldn't adding an incremental sort be an
> >> >example of that responsibility? The subpath that either a Sort or
> >> >IncrementalSort is being added on top of (to then feed into the
> >> >GatherMerge) is the same in both cases right?
> >> >
> >> >Unless you're saying that the difference is that since we have a
> >> >partial ordering already for incremental sort then incremental sort
> >> >falls into the category of "maintaining" existing ordering of the
> >> >subpath?
> >> >
> >>
> >> Oh, I think I understand what you're saying. Essentially, we should not
> >> be making generate_gather_paths responsible for adding the incremental
> >> sort. Instead, we should be looking at places than are adding explicit
> >> sort (using create_sort_path) and also consider adding incremental sort.
> >>
> >> I definitely agree with the second half - we should look at all places
> >> that create explicit sorts and make them also consider incremental
> >> sorts. That makes sense.
> >
> >Yep, exactly.
> >
>
> If I remember correctly, one of the previous patch versions (in the early
> 2018 commitfests) actually modified many of those places, but it did that
> in a somewhat "naive" way. It simply used incremental sort whenever the
> path was partially sorted, or something like that. So it did not use
> costing properly. There was an attempt to fix that in the last commitfest
> but the costing model was deemed to be a bit too rough and unreliable
> (especially the ndistinct estimates can be quite weak), so the agreement
> was to try salvaging the patch for PG11 by only considering incremental
> sort in "safest" places with greatest gains.
>
> We've significantly improved the costing model since then, and the
> implementation likely handles the corner cases much better. But that does
> not mean we have to introduce the incremental sort to all those places at
> once - it might be wise to split that into separate patches.

Yes, although we haven't added the MCV checking yet; that's on my
mental checklist, but another new area of the codebase for me to
understand, so I've been prioritizing other parts of the patch.

> It's not just about picking the right plan - we've kinda ignored what impact these
> extra paths might have on planner performance, so maybe we should look at
> that too. And the impact might be different for each of those places.
>
> I'll leave that up to you, but I certainly won't insist on doing it all in
> one huge patch.

I'm not opposed to handling some of them separately, but I would like
to at least hit the places where it's most likely (for example, with
LIMIT) to improve things. I suppose I'll have to look at all of the
usages of create_sort_path() and try to rank them in terms of
perceived likely value.

> >> But I'm not sure it'll address all cases - the problem is that those
> >> places add the explicit sort because they need sorted input. Gather
> >> Merge does not do that, it only preserves existing ordering of paths.
> >>
> >> So it's possible the path does not have an explicit sort on top, and
> >> gather merge will not know to add it. And once we have the gather merge
> >> in place, we can't push a sort "under" it.
> >
> >That's the explanation I was missing; and it makes sense (to restate:
> >sometimes sorting is useful even when not required for correctness of
> >the user returned data).
> >
>
> Yes, although even when the sorting is required for correctness (because
> the user specified ORDER BY) you can do it at different points in the
> plan. We'd still produce correct results, but the sort might be done at
> the very end without these changes.
>
> For example we might end up with plans
>
>Incremental Sort (presorted: a, path keys: a,b)
> -> Gather Merge (path keys: a)
> -> Index Scan (path keys: a)
>
> but with those changes we might push the incremental sort down into the
> parallel part:
>
>Gather Merge (path keys: a,b)
> -> Incremental Sort (presorted: a, path keys: a,b)
> -> Index Scan (path keys: a)
>
> which is likely better. Perhaps that's what you meant, though.

I was thinking of ordering being useful for grouping/aggregation or
merge joins; I didn't realize the above plan wasn't possible yet, so
that explanation is helpful.

> >> And I think I know why is that - while gather_grouping_paths() tries to
> >> add explicit sort below the gather merge, there are other places that
> >> call generate_gather_paths() that don't do that. In this case it's
> >> probably apply_scanjoin_target_to_paths() which simply builds
> >>
> >>parallel (seq|index) scan + gather merge
> >>
> >> and that's it. 

Re: Conflict handling for COPY FROM

2019-07-14 Thread Alvaro Herrera
I think making ERROR a reserved word is a terrible idea, and we don't
need it for this feature anyway.  Use a new option in the parenthesized
options list instead.


error_limit being an integer, please don't use it as a boolean:

if (cstate->error_limit)
 ...

Add an explicit comparison to zero instead, for code readability.
Also, since each error decrements the same variable, it becomes hard to
reason about the state: at the end, are we ending with the exact number
of errors, or did we start with the feature disabled?  I suggest that
it'd make sense to have a boolean indicating whether this feature has
been requested, and the integer is just the remaining allowed problems.
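
Something along these lines, as a purely illustrative standalone sketch
(the names are mine, not the patch's):

#include <stdbool.h>
#include <stdio.h>

/* Keep "was it requested" separate from "how many conflicts remain". */
typedef struct CopyErrorState
{
	bool	skip_conflicts;		/* was the option given at all? */
	long	errors_remaining;	/* conflicts we are still allowed to skip */
} CopyErrorState;

static bool
handle_conflict(CopyErrorState *state)
{
	/* Explicit comparisons, so the reader never has to guess the state. */
	if (!state->skip_conflicts || state->errors_remaining == 0)
		return false;			/* caller should raise the error */

	state->errors_remaining--;
	return true;				/* row skipped, keep copying */
}

int
main(void)
{
	CopyErrorState state = {true, 2};

	for (int i = 1; i <= 4; i++)
		printf("conflict %d: %s\n", i,
			   handle_conflict(&state) ? "skipped" : "error");
	return 0;
}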

Line 3255 or thereabouts contains an excess " char

The "warn about it" comment is obsolete, isn't it?  There's no warning
there.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-14 Thread Joe Conway
On 7/13/19 5:58 PM, Tomas Vondra wrote:
> On Sat, Jul 13, 2019 at 02:41:34PM -0400, Joe Conway wrote:
>>[2] also says provides additional support for AES 256. It also mentions
>>CBC versus XTS -- I came across this elsewhere and it bears discussion:
>>
>>"Currently, the following pairs of encryption modes are supported:
>>
>>AES-256-XTS for contents and AES-256-CTS-CBC for filenames
>>AES-128-CBC for contents and AES-128-CTS-CBC for filenames
>>Adiantum for both contents and filenames
>>
>>If unsure, you should use the (AES-256-XTS, AES-256-CTS-CBC) pair.
>>
>>AES-128-CBC was added only for low-powered embedded devices with crypto
>>accelerators such as CAAM or CESA that do not support XTS."
>>---
>>[2] also states this, which again makes me think in terms of table being
>>the moral equivalent to a file:
>>
>>"Unlike dm-crypt, fscrypt operates at the filesystem level rather than
>>at the block device level. This allows it to encrypt different files
>>with different keys and to have unencrypted files on the same
>>filesystem. This is useful for multi-user systems where each user’s
>>data-at-rest needs to be cryptographically isolated from the others.
>>However, except for filenames, fscrypt does not encrypt filesystem
>>metadata."



>>[5] has this to say which seems independent of mode:
>>
>>"When encrypting data with a symmetric block cipher, which uses blocks
>>of n bits, some security concerns begin to appear when the amount of
>>data encrypted with a single key comes close to 2n/2 blocks, i.e. n*2n/2
>>bits. With AES, n = 128 (AES-128, AES-192 and AES-256 all use 128-bit
>>blocks). This means a limit of more than 250 millions of terabytes,
>>which is sufficiently large not to be a problem. That's precisely why
>>AES was defined with 128-bit blocks, instead of the more common (at that
>>time) 64-bit blocks: so that data size is practically unlimited."
>>
> 
> FWIW I was a bit confused at first, because the copy paste mangled the
> formulas a bit - it should have been 2^(n/2) and n*2^(n/2).

Yeah, sorry about that.

>>But goes on to say:
>>"I wouldn't use n*2^(n/2) bits in any sort of recommendation. Once you
>>reach that number of bits the probability of a collision will grow
>>quickly and you will be way over 50% probability of a collision by the
>>time you reach 2*n*2^(n/2) bits. In order to keep the probability of a
>>collision negligible I recommend encrypting no more than n*2^(n/4) bits
>>with the same key. In the case of AES that works out to 64GB"
>>
>>It is hard to say if that recommendation is per key or per key+IV.
> 
> Hmm, yeah. The question is what collisions they have in mind? Presumably
> it's AES(block1,key) = AES(block2,key) in which case it'd be with fixed
> IV, so per key+IV.

Seems likely.
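
For reference, plugging n = 128 (AES) into that n*2^(n/4) rule, per key:

\[
n \cdot 2^{n/4} = 128 \cdot 2^{32}\ \text{bits} = 2^{39}\ \text{bits}
                = 2^{36}\ \text{bytes} = 64\ \text{GiB} \approx 68.7\ \text{GB}
\]

which also matches the 68 GB figure I use below for WAL key rotation.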

>>But I did find that files in an encrypted file system are encrypted with
>>derived keys from a master key, and I view this as analogous to what we
>>are doing.
>>
> 
> My understanding always was that we'd do something like that, i.e. we'd
> have a master key (or perhaps multiple of them, for various users), but
> the data would be encrypted with secondary (generated) keys, and those
> secondary keys would be encrypted by the master key. At least that's
> what was proposed at the beginning of this thread by Insung Moon.

In my email I linked the wrong page for [2]. The correct one is here:
[2] https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html

Following that, I think we could end up with three tiers:

1. A master key encryption key (KEK): this is the key supplied by the
   database admin using something akin to ssl_passphrase_command

2. A master data encryption key (MDEK): this is a generated key using a
   cryptographically secure pseudo-random number generator. It is
   encrypted using the KEK, probably with Key Wrap (KW):
   or maybe better Key Wrap with Padding (KWP):

3a. Per table data encryption keys (TDEK): use MDEK and HKDF to generate
table specific keys.

3b. WAL data encryption keys (WDEK): similarly, use MDEK and HKDF to
generate new keys when needed for WAL (based on the other info above, we
need to change WAL keys every 68 GB, unless I read that wrong).

I believe that would allow us to have multiple keys, all derived securely
from the one MDEK using available info, similar to the way we intend to
use the LSN to derive the IVs -- perhaps table.oid for tables and
something else for WAL.

We also need to figure out how/when to generate a new WDEK. Maybe at every
checkpoint, which would also mean we have to force a checkpoint every 68GB?

[HKDF]: https://tools.ietf.org/html/rfc5869
[KW]: https://tools.ietf.org/html/rfc3394
[KWP]: https://tools.ietf.org/html/rfc5649
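
To make 3a concrete, here is a rough standalone sketch (my illustration
only, not proposed patch code) of deriving a per-table key from the MDEK
with HKDF via OpenSSL's EVP_PKEY_HKDF interface (1.1.1+); the 32-byte key
sizes, the "PG-TDEK" info label and the function name are all assumptions:

#include <stdio.h>
#include <string.h>
#include <openssl/evp.h>
#include <openssl/kdf.h>

static int
derive_tdek(const unsigned char *mdek, size_t mdek_len,
			unsigned int table_oid,
			unsigned char *tdek, size_t tdek_len)
{
	EVP_PKEY_CTX *pctx = EVP_PKEY_CTX_new_id(EVP_PKEY_HKDF, NULL);
	char		info[64];
	int			ok;

	if (pctx == NULL)
		return 0;

	/* Bind the derived key to its purpose and to this particular table. */
	snprintf(info, sizeof(info), "PG-TDEK-%u", table_oid);

	ok = EVP_PKEY_derive_init(pctx) > 0 &&
		EVP_PKEY_CTX_set_hkdf_md(pctx, EVP_sha256()) > 0 &&
		EVP_PKEY_CTX_set1_hkdf_key(pctx, mdek, mdek_len) > 0 &&
		EVP_PKEY_CTX_add1_hkdf_info(pctx, (unsigned char *) info,
									strlen(info)) > 0 &&
		EVP_PKEY_derive(pctx, tdek, &tdek_len) > 0;

	EVP_PKEY_CTX_free(pctx);
	return ok;
}

int
main(void)
{
	unsigned char mdek[32] = {0};	/* dummy all-zero MDEK, illustration only */
	unsigned char tdek[32];

	if (!derive_tdek(mdek, sizeof(mdek), 16384, tdek, sizeof(tdek)))
		return 1;

	for (size_t i = 0; i < sizeof(tdek); i++)
		printf("%02x", tdek[i]);
	printf("\n");
	return 0;
}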


> But AFAICS the 2-tier key scheme is primarily motivated by operational
> reasons, i.e. effort to rotate the master key etc. So I would not expect
> to find recommendations to use multiple keys in sources primarily
> dealing with cryptography.

It does in [2]


> One extra thing we should 

Re: [PATCH] Implement uuid_version()

2019-07-14 Thread Peter Eisentraut
On 2019-07-13 17:13, Fabien COELHO wrote:
>>> What about avoiding a redirection with something like:
>>>
>>> Datum (* const pg_random_uuid)(PG_FUNCTION_ARGS) = gen_random_uuid;
>>
>> That seems very confusing.
> 
> Dunno. Possibly. The user does not have to look at the implementation, and 
> probably such code would deserve a comment.
> 
> The point is to avoid one call so as to perform the same (otherwise the 
> pg_random_uuid would be slightly slower), and to ensure that it behaves 
> the same, as it would be the very same function by construction.
> 
> I've switched the patch to ready anyway.

committed
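
For the archives, a tiny standalone illustration of the difference Fabien
describes above (hypothetical names, not the committed code): a wrapper
adds an extra call unless the compiler inlines it, while a const function
pointer is simply another name bound to the same function:

#include <stdio.h>

static int
gen_random_value(void)
{
	return 4;				/* stand-in for the real implementation */
}

/* Alternative 1: a thin wrapper -- an extra call unless it gets inlined. */
static int
pg_random_value_wrapper(void)
{
	return gen_random_value();
}

/* Alternative 2: a const function pointer bound once to the same code. */
static int (*const pg_random_value) (void) = gen_random_value;

int
main(void)
{
	printf("%d %d\n", pg_random_value_wrapper(), pg_random_value());
	return 0;
}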

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-14 Thread Joe Conway
On 7/13/19 2:41 PM, Joe Conway wrote:
> [2]
> https://www.postgresql.org/message-id/20190708194733.cztnwhqge4acepzw%40development

BTW I managed to mess up this link. This is what I intended to link
there (from Tomas):

[2] https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html

I am sure I have confused the heck out of everyone reading what I wrote
by that error :-/

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: pg_stat_statements vs. SELECT FOR UPDATE

2019-07-14 Thread Sergei Kornilov
Hi

>  Sergei> PS: my note about comments in tests from my previous review is
>  Sergei> still relevant too.
>
> I changed the comment when committing it.

Great, thank you!

regards, Sergei




Re: pg_stat_statements vs. SELECT FOR UPDATE

2019-07-14 Thread Andrew Gierth
> "Sergei" == Sergei Kornilov  writes:

 Sergei> PS: my note about comments in tests from my previous review is
 Sergei> still relevant too.

I changed the comment when committing it.

-- 
Andrew (irc:RhodiumToad)




Re: Patch to document base64 encoding

2019-07-14 Thread Fabien COELHO



Hello Karl,


It really works in research papers: "Theorem X can be proven by
applying Proposition Y. See Figure 2 for details. Algorithm Z
describes whatever, which is listed in Table W..."


I've not thought about it before, but I suppose the difference is between
declarative and descriptive, the latter being more inviting and allowing
better flow between sentences. Otherwise you're writing in bullet
points.  So it is a question of balance between specification and
narration. In regular prose you're always going to see the "the" unless
the sentence starts with the name.  The trouble is that we can't start
sentences with function names because of capitalization confusion.


Sure. For me "Function" would work as a title before its name, as in "Sir
Samuel", "Doctor Frankenstein", "Mister Bean", "Professor Layton"...
"Function sqrt" also solves the casing issue on the function name, which
is better not capitalized.


--
Fabien.




Re: pgbench - implement strict TPC-B benchmark

2019-07-14 Thread Fabien COELHO


Hello Thomas,

Thanks for the feedback.


+   the account branch has a 15% probability to be in the same branch
as the teller (unless

I would say "... has a 15% probability of being in the same ...".  The
same wording appears further down in the comment.


Fixed.

I see that the parameters you propose match the TPCB 2.0 description[1], 
[...]


Nearly:-(

While re-re-re-re-reading the spec I saw that it is 85%, i.e. people mostly
go to their local teller; I managed to get it backwards. Sigh. Fixed. Hopefully.


I've updated the script a little so that it is closer to the spec. I've also
changed the script definition so that it still works as expected if
someone changes the "nbranches" definition for some reason, even though this
is explicitly discouraged by comments.

I wonder if "strict" is the right name here though.  "tpcb-like-2" at 
least leaves room for someone to propose yet another variant, and still 
includes the "-like" disclaimer, which I interpret as a way of making it 
clear that this benchmark and results produced by it have no official 
TPC audited status.


Hmmm.

The -like suffix is really about the conformance of the script, not the
rest, but that should indeed be clearer. I've expanded the comment and doc
about this with a disclaimer, so that there is no ambiguity about what is
expected to conform, which is only the transaction script.


I have added a comment about the non-conformance of the "int" type used for
balances in the initialization phase.


Also, on second thought, I've changed the name to "standard-tpcb", but I'm
unsure whether it is better than "script-tpcb". There is an incentive to
have a different prefix so that "-b t" would not complain about ambiguity
with "tpcb-like*", which would be a regression. This is why I did not
choose the simple "tpcb". There may be a "standard-tpcb-2" anyway.


I have added a small test run in the TAP test.

On my TODO list is adding an initialization option to change the balance 
type for conformance, by using NUMERIC or integer8.


While thinking about that, an unrelated thought occurred to me: adding a
partitioned initialization variant would be nice, to easily test the
performance impact of partitioning. I should have thought of that as soon as
partitioning was added. Added to my TODO list as well.


--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 09af9f1a7d..6ff557b2b3 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -347,7 +347,8 @@ pgbench  options  d
 An optional integer weight after @ allows to adjust the
 probability of drawing the script.  If not specified, it is set to 1.
 Available built-in scripts are: tpcb-like,
-simple-update and select-only.
+simple-update, select-only and
+standard-tpcb.
 Unambiguous prefixes of built-in names are accepted.
 With special name list, show the list of built-in scripts
 and exit immediately.
@@ -869,6 +870,25 @@ pgbench  options  d
If you select the select-only built-in (also -S),
only the SELECT is issued.
   
+
+  
+   If you select the standard-tpcb built-in, the SQL commands are similar
+   to tpcb-like, but the delta amount is 6 digits,
+   the account branch has an 85% probability of being in the same branch as the teller (unless
+   there is only one branch, in which case it is always the same), and the final account
+   balance is actually extracted by the script.
+  
+
+  
+   
+Disclaimer: with standard-tpcb, only the transaction script
+is expected to conform to the "TPC Benchmark(tm) B revision 2.0" specification.
+Other parts of the benchmark, such as type capabilities, initialization,
+performance data collection, checks on final values, database configuration,
+test duration... may or may not conform to the standard depending on the actual
+run.
+   
+  
  
 
  
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 8b84658ccd..4888beb129 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -193,8 +193,12 @@ int64		random_seed = -1;
  * end of configurable parameters
  */
 
-#define nbranches	1			/* Makes little sense to change this.  Change
- * -s instead */
+/*
+ * It makes little sense to change "nbranches", use -s instead.
+ *
+ * Nevertheless, "ntellers" and "naccounts" must be divisible by "nbranches".
+ */
+#define nbranches	1
 #define ntellers	10
 #define naccounts	10
 
@@ -566,6 +570,61 @@ static const BuiltinScript builtin_script[] =
 		"",
 		"\\set aid random(1, " CppAsString2(naccounts) " * :scale)\n"
 		"SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
+	},
+	{
+		/*---
+		 * Standard conforming TPC-B benchmark script as defined in
+		 * "TPC BENCHMARK (tm) B Standard Specification Revision 2.0"
+		 * from 7 June 1994, see http://www.tpc.org/tpcb/spec/tpcb_current.pdf
+		 *
+	

Re: Conflict handling for COPY FROM

2019-07-14 Thread Anthony Nowocien
Hi,


sorry for answering a bit later than I hoped. Here is my review so far:



Contents

==



This patch starts to address, in my opinion, one of COPY's shortcomings, which
is error handling.  PK and exclusion errors are taken care of, but not
(yet?) other types of errors.

Documentation is updated, "\h copy" as well, and some regression tests are
added.



Initial Run

===



The patch (I've tested v6) applies cleanly.

make: OK

make install: OK

make check: OK

make installcheck: OK



Performance




I've tested the patch on a 1.1GB file with 10 000 000 lines. Each test was
done 15 times on a small local VM. The table has no constraints.

head: 38.93s

head + patch: 38.76s


Another test was on a 0.1GB file with 1 000 000 lines. Each test was done 10
times on a small local VM and the table has a pk.

COPY: 4.550s

COPY CONFLICT: 4.595s

COPY CONFLICT with only one pk error: 10.529s

COPY CONFLICT pk error every 100 lines: 10.859s

COPY CONFLICT pk error every 1000 lines: 10.879s

I have not tested exclusions so far.



Thoughts

==



I find the feature useful in itself. One big question for me is whether it can
be improved later on to handle other types of errors (check constraints,
for example). A "-1" for the error limit would be very useful in my
opinion.

I am also afraid that the name "error_limit" might mislead users into
thinking that all error types are handled. But I do not have a better
suggestion without making this clause much longer...


I've had a short look at the code, but this will need review by someone
else.

Anyway, thanks a lot for taking the time to work on it.



Anthony

On Sun, Jul 14, 2019 at 3:48 AM Thomas Munro  wrote:

> On Fri, Jul 12, 2019 at 1:42 AM Surafel Temesgen 
> wrote:
> > Here is the patch that contains all the comments given, except adding
> > a way to specify ignoring all errors, because specifying a very high
> > number can do the work, and maybe trying to store such badly
> > structured data is a bad idea
>
> Hi Surafel,
>
> FYI GCC warns:
>
> copy.c: In function ‘CopyFrom’:
> copy.c:3383:8: error: ‘dest’ may be used uninitialized in this
> function [-Werror=maybe-uninitialized]
> (void) dest->receiveSlot(myslot, dest);
> ^
> copy.c:2702:16: note: ‘dest’ was declared here
>   DestReceiver *dest;
> ^
>
> --
> Thomas Munro
> https://enterprisedb.com
>
>
>