Re: Comment fix of config_default.pl

2019-07-13 Thread Michael Paquier
On Fri, Jul 12, 2019 at 05:01:41PM +0900, Michael Paquier wrote:
> I would also patch GetFakeConfigure in Solution.pm (no need to send a
> new patch), and I thought that you'd actually do the change.  What do
> you think?

OK, applied now that I have been able to look at it again, after fixing
the portion for GetFakeConfigure.  Thanks!
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Implement uuid_version()

2019-07-13 Thread Jose Luis Tallon

On 13/7/19 8:31, Fabien COELHO wrote:


Hello Jose,


Hello, Fabien

Thanks for taking a look



Got it, and done. Please find attached a v2 patch with the upgrade 
script included.


Patch v2 applies cleanly. Compiles cleanly (after running configure
--with-uuid=...). Local make check ok. Doc build ok.


There are no tests; I'd suggest adding some under sql/ (with matching
expected/ output) that cover all possible return values, including a
call to pg_random_uuid(), which should return 4.


The documentation describes uuid_version(); should it not describe
uuid_version(namespace uuid)?


I'd suggest adding an example.

The extension update script seems ok, but ISTM that 
"uuid-ossp-1.1.sql" should be replaced with an updated 
"uuid-ossp-1.2.sql".


This was quite a naïve approach to the issue on my part, more a "scratch
my own itch" than anything else, but it definitely sparked some
interest. Thanks to all involved.


Considering the later arguments on-list, I plan on submitting a more
elaborate patchset integrating the feedback received so far, along
the following lines:


- uuid type, v4 generation and uuid_version() in core

- Provide a means to rename/supersede extensions keeping backwards
compatibility (i.e. uuid-ossp -> uuid, keep old code working)


- Miscellaneous other functionality

- Drop "dead" code

  ...but I've tried to keep quiet so as not to disturb too much around 
release time.



I intend to continue working on this in late July, aiming for the
following commitfest (once the more "urgent" patches have landed).



Thanks again.

    J.L.






Re: POC: Cleaning up orphaned files using undo logs

2019-07-13 Thread Amit Kapila
On Fri, Jul 12, 2019 at 7:08 PM Robert Haas  wrote:
>
> On Fri, Jul 12, 2019 at 5:40 AM Amit Kapila  wrote:
>
> > Apart from this, the duplicate key (ex. for size queues the size of
> > two requests can be same) handling might need some work.  Basically,
> > either special combine function needs to be written (not sure yet what
> > we should do there) or we always need to ensure that the key is unique
> > like (size + start_urec_ptr).  If the size is the same, then we can
> > decide based on start_urec_ptr.
>
> I think that this problem is somewhat independent of whether we use an
> rbtree or a binaryheap or some other data structure.
>

I think I am missing something then, because what I am talking about is
the below code in rbt_insert:
rbt_insert()
{
..
	cmp = rbt->comparator(data, current, rbt->arg);
	if (cmp == 0)
	{
		/*
		 * Found node with given key.  Apply combiner.
		 */
		rbt->combiner(current, data, rbt->arg);
		*isNew = false;
		return current;
	}
..
}

As you can see, it doesn't add the duplicate key to the tree, which is
not the case with binary_heap as far as I can understand.

>  I would be
> inclined to use XID as a tiebreak for the size queue, so that it's
> more likely to match the ordering of the XID queue, but if that's
> inconvenient, then some other arbitrary value like start_urec_ptr
> should be fine.
>

I think it would be better to use start_urec_ptr, because XID can be
non-unique in our case.  As I explained in one of the emails above [1],
we register the requests for logged and unlogged relations separately,
so XID can be non-unique.
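
For illustration, a comparator along these lines would make every key
unique by breaking size ties on start_urec_ptr (a sketch only; the
RollbackRequest type and field names here are hypothetical, not the
actual patch's):

static int
rollback_request_size_cmp(const void *a, const void *b, void *arg)
{
	const RollbackRequest *ra = (const RollbackRequest *) a;
	const RollbackRequest *rb = (const RollbackRequest *) b;

	if (ra->size != rb->size)
		return (ra->size < rb->size) ? -1 : 1;

	/* sizes equal: fall back to the unique undo record pointer */
	if (ra->start_urec_ptr != rb->start_urec_ptr)
		return (ra->start_urec_ptr < rb->start_urec_ptr) ? -1 : 1;

	return 0;					/* genuinely the same request */
}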

> >
> > I think even if we currently go with a binary heap, it will be
> > possible to change it to rbtree later, but I am fine either way.
>
> Well, I don't see much point in revising all of this logic twice. We
> should pick the way we want it to work and make it work that way.
>

Yeah, I agree.  So, I am assuming here that as you have discussed this
idea with Andres offlist, he is on board with changing it as he has
originally suggested using binary_heap.  Andres, do let us know if you
think differently here.  It would be good if anyone else following the
thread can also weigh in.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1LEKyPZD5Dy4j1u2smUUyMzxgC2YLj8E%2BaJpsvG7sVJYA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: refactoring - share str2*int64 functions

2019-07-13 Thread Thomas Munro
On Mon, Jul 8, 2019 at 3:22 PM Thomas Munro  wrote:
> Here's some semi-automated feedback, noted while going through
> failures on cfbot.cputube.org.  You have a stray editor file
> src/backend/parser/parse_node.c.~1~.  Something is failing to compile
> while doing the temp-install in make check-world, which probably
> indicates that some test or contrib module is using the interface you
> changed?

Please disregard the comment about the ".~1~" file; my mistake.
As for the check-world failure, it's here:

pg_stat_statements.c:1024:11: error: implicit declaration of function
'pg_strtouint64' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
rows = pg_strtouint64(completionTag + 5, NULL, 10);
   ^
Apparently it needs to include common/string.h.
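
Concretely, that means adding to the includes in pg_stat_statements.c,
as the updated patch downthread does:

#include "common/string.h"	/* for pg_strtouint64 */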

-- 
Thomas Munro
https://enterprisedb.com




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-13 Thread Joe Conway
On 7/12/19 2:45 PM, Bruce Momjian wrote:
> On Fri, Jul 12, 2019 at 12:41:19PM -0600, Ryan Lambert wrote:
>> >> I vote for AES 256 rather than 128.
>> >
>> > Why?  This page seems to think 128 is sufficient:
>> >
>> >         https://crypto.stackexchange.com/questions/20/
>> what-are-the-practical-differences-between-256-bit-192-bit-and-128-bit-aes-enc
>> >
>> >         For practical purposes, 128-bit keys are sufficient to ensure
>> security.
>> >         The larger key sizes exist mostly to satisfy some US military
>> >         regulations which call for the existence of several distinct
>> "security
>> >         levels", regardless of whether breaking the lowest level is already
>> far
>> >         beyond existing technology.
>> 
>> After researching AES key sizes a bit more, my vote is (surprisingly?) for
>> AES-128.  My reasoning is about security; I did not consider performance
>> impacts in my decision.
> 
> Thank you for this exhaustive research.

First of all, that is a mischaracterization of the issue. That paper
also states:

"we describe several key derivation attacks of practical complexity on
AES-256 when its number of rounds is reduced to approximately that of
AES-128. The best previously published  attacks  on  such  variants
were  far  from  practical,  requiring  4  related  keys  and  2^120
time  to break a 9-round version of AES-256 [9], and 64 related keys and
2^172time to break a 10-round version of AES-256 ([9], see also [2]). In
this paper we describe an attack on 9-round AES-256 which can findits
complete 256-bit key in 239time by using only the simplest type of
related keys (in which the chosenplaintexts are encrypted under two keys
whose XOR difference can be chosen in many different ways).Our best
attack on 10-round AES-256 requires only two keys and 245time, but it
uses a stronger type ofrelated subkey attack. These attacks can be
extended into a quasi-practical 270attack on 11-round AES,and into a
trivial 226attack on 8-round AES."

Notice 2 key things about this:
1. The attacks described are against a reduced number of rounds. AES256
   is 14 rounds, not 9 or 10.
2. They are "related key" attacks, which can be avoided by not using
   related keys, and we certainly should not be doing that.

Also, please read the links that Sehrope sent earlier if you have not
done so. In particular this one:

https://www.ecrypt.eu.org/csa/documents/PQC-whitepaper.pdf

which says:

"Post-quantum cryptography is an area of cryptography in which systems
are studied under the  security  assumption  that  the  attacker  has  a
 quantum  computer. This  attack  model  is interesting because Shor has
shown a quantum algorithm that breaks RSA, ECC, and finite field
discrete logarithms in polynomial time.  This means that in this model
all commonly used public-key systems are no longer secure.Symmetric
cryptography is also affected but significantly less.  For systems that
do not rely on mathematical structures the main effect is that an
algorithm due to Grover halves the security level, i.e., breaking
AES-128 takes 2^64 quantum operations while current attacks take  2^128
steps.   While  this  is  a  big  change,  it  can  be  managed  quite
easily  by  doubling the key sizes, e.g., by deploying AES-256.  The
operations needed in Grover’s algorithm are inherently  sequential
which  has  led  some  to  doubt  that  even  264quantum  operations
are feasible, but since the remedy of changing to larger key sizes is
very inexpensive it is generally recommended to do so."

In addition, that first paper was written in 2010, yet in 2016 NSA
published one of the other documents referenced by Sehrope:

https://apps.nsa.gov/iaarchive/customcf/openAttachment.cfm?FilePath=/iad/library/ia-guidance/ia-solutions-for-classified/algorithm-guidance/assets/public/upload/CNSA-Suite-and-Quantum-Computing-FAQ.pdf&WpKes=aF6woL7fQp3dJiyWTFKrYn3ZZShmLnzECSjJhf

which states:

"If you have already implemented Suite B algorithms using the larger
(for TOP SECRET) key sizes, you should continue to use those algorithms
and key sizes through this upcoming transition period. In many products
changing to a larger key size can be done via a configuration
change.Implementations using only the algorithms previously approved for
SECRET and below in Suite B should not be used in NSS. In more precise
terms this means that NSS (National Security Systems) should no longer use
 •ECDH and ECDSA with NIST P-256
 •SHA-256
 •AES-128
 •RSA with 2048-bit keys
 •Diffie-Hellman with 2048-bit keys"


I stand by my position. At a minimum, we need a choice of AES128 and AES256.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] Implement uuid_version()

2019-07-13 Thread Peter Eisentraut
On 2019-07-13 08:08, Fabien COELHO wrote:
> About doc: I'd consider "generation" instead of "generating" as a 
> secondary index term.

We do use the "-ing" form for other secondary index terms.  It's useful
because the concatenation of primary and secondary term should usually
make a phrase of some sort.  The alternative would be "generation of",
but that doesn't seem clearly better.

>> (There is also precedent for redirecting the extension function to the
>> internal one by changing the SQL-level function definition using CREATE
>> OR REPLACE FUNCTION ... LANGUAGE INTERNAL.  But that seems more
>> complicated and would require a new extension version.  It could maybe
>> be included if the extension version is changed for other reasons.)
> 
> What about avoiding a redirection with something like:
> 
> Datum (* const pg_random_uuid)(PG_FUNCTION_ARGS) = gen_random_uuid;

That seems very confusing.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: initdb recommendations

2019-07-13 Thread Peter Eisentraut
On 2019-07-11 21:34, Julien Rouhaud wrote:
>> Note that with this change, running initdb without arguments will now
>> error on those platforms: You need to supply either a password or select
>> a different default authentication method.
> Should we state this explicitly in the documentation?  As a
> reference, it's saying:
> 
> The default client authentication setup is such that users can connect
> over the Unix-domain socket to the same database user name as their
> operating system user names (on operating systems that support this,
> which are most modern Unix-like systems, but not Windows) and
> otherwise with a password. To assign a password to the initial
> database superuser, use one of initdb's -W, --pwprompt or --pwfile
> options.

Do you have a suggestion for where to put this and exactly how to phrase
this?

I think the initdb reference page would be more appropriate than
runtime.sgml.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Bad canonicalization for dateranges with 'infinity' bounds

2019-07-13 Thread Thomas Munro
On Fri, May 3, 2019 at 12:49 AM Laurenz Albe  wrote:
> > I propose the attached patch which fixes the problem.

Hi Laurenz,

I agree that the patch makes the code match the documentation.  The
documented behaviour seems to make more sense than the code, since
unpatched master gives this nonsense result when it flips the
inclusive flag but doesn't adjust the value (because it can't):

postgres=# select '(-infinity,infinity]'::daterange @> 'infinity'::date;
 ?column?
----------
 f
(1 row)

-if (!upper.infinite && upper.inclusive)
+if (!(upper.infinite || DATE_NOT_FINITE(upper.val)) && upper.inclusive)

Even though !(X || Y) is equivalent to !X && !Y, by my reading of
range_in(), lower.value can be uninitialised when lower.infinite is
true, and it's also a bit hard to read IMHO, so I'd probably write
that as !upper.infinite && !DATE_NOT_FINITE(upper.val) &&
upper.inclusive.  I don't think it can affect the result but it might
upset Valgrind or similar.
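
Spelled out, with the infinite test first so that upper.val is never
read when the bound is infinite (and thus possibly uninitialised):

if (!upper.infinite && !DATE_NOT_FINITE(upper.val) && upper.inclusive)
{
	/* ... adjust the upper bound as before ... */
}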

-- 
Thomas Munro
https://enterprisedb.com




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-13 Thread Joe Conway
On 7/11/19 9:05 PM, Bruce Momjian wrote:
> On Thu, Jul 11, 2019 at 08:41:52PM -0400, Joe Conway wrote:
>> On 7/11/19 6:37 PM, Bruce Momjian wrote:
>> > Our first implementation will encrypt the entire cluster.  We can later
>> > consider encryption per table or tablespace.  It is unclear if
>> > encrypting different parts of the system with different keys is useful
>> > or feasible.  (This is separate from key rotation.)
>> 
>> I still object strongly to using a single key for the entire database. I
>> think we can use a single key for WAL, but we need some way to split the
>> heap so that multiple keys are used. If not by tablespace, then some
>> other method.
> 
> What do you base this on?

I have stated this more than once, and you and Stephen discussed it
earlier as well, but I will find all the links back into the thread and
references and address them in a separate email.

>> Regardless of the method to split the heap into different keys, I think
>> there should be an option for some tables to not be encrypted. If we
>> decide it must be all or nothing for the first implementation I guess I
>> could live with it but would be very disappointed.
> 
> What does it mean you "could live with it"?  Why do you consider having
> some tables unencrypted important?


I think it is pretty obvious, isn't it? You have talked about the
performance impact. Why would I want to encrypt, for example, a lookup
table, if there is nothing in that table warranting encryption?

I think in many if not most applications the sensitive data is limited
to much less than all of the tables, and I'd rather not take the hit for
those tables.


>> The keys themselves should be in a file which is encrypted by a master
>> key. Obtaining the master key should be patterned after the GUC
>> ssl_passphrase_command.
>> 
>> > We will use CBC AES128 mode for tables/indexes, and CTR AES128 for WAL.
>> > 8k pages will use the LSN as a nonce, which will be encrypted to
>> > generate the initialization vector (IV).  We will not encrypt the first
>> > 16 bytes of each page so the LSN can be used in this way.  The WAL will
>> > use the WAL file segment number as the nonce and the IV will be created
>> > in the same way.
>> 
>> I vote for AES 256 rather than 128.
> 
> Why?  This page seems to think 128 is sufficient:


Addressed in the other email


>> Thinking out loud (and I believe somewhere in this massive thread
>> someone else already said this), if we had a way to flag "key version"
>> at the page level it seems like we could potentially rekey page-by-page
>> while online, locking only one page at a time. We really only need to
>> support 2 key versions and could ping-pong between them as they change.
>> Or maybe this is a crazy idea.
> 
> Yes, we did talk about this.  It is certainly possible, but we would
> still need a tool to guarantee all pages are using the new version, so I
> am not sure what per-page buys us except making the later check faster. 
> I don't see this as a version-1 feature, frankly.

If we allow for, say, 2 versions of the key to exist at any given time,
and if we could store that key version information on each page, we
could change the key from old to new without locking the entire table at
once, just locking one page at a time. Or at least that was my thinking.
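
A rough sketch of what I mean (the key-version page field and helper
functions here are invented for illustration; no such field exists
today):

static void
rekey_page(Buffer buf, int old_version, int new_version)
{
	Page		page;

	/* only one page is locked at any time during the rekey pass */
	LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
	page = BufferGetPage(buf);

	if (PageGetKeyVersion(page) == old_version)
	{
		decrypt_page(page, key_for_version(old_version));
		encrypt_page(page, key_for_version(new_version));
		PageSetKeyVersion(page, new_version);
		MarkBufferDirty(buf);
	}

	LockBuffer(buf, BUFFER_LOCK_UNLOCK);
}

With only two live key versions, successive rekey passes could
ping-pong between them.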

Joe
-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] Implement uuid_version()

2019-07-13 Thread Fabien COELHO



Hello Peter,


About doc: I'd consider "generation" instead of "generating" as a
secondary index term.


We do use the "-ing" form for other secondary index terms.  It's useful
because the concatenation of primary and secondary term should usually
make a phrase of some sort.  The alternative would be "generation of",
but that doesn't seem clearly better.


Ok, fine. I looked but did not find other instances of "generating".


What about avoiding a redirection with something like:

Datum (* const pg_random_uuid)(PG_FUNCTION_ARGS) = gen_random_uuid;


That seems very confusing.


Dunno. Possibly. The user does not have to look at the implementation, and 
probably such code would deserve a comment.


The point is to avoid one level of call so that it performs the same
(otherwise pg_random_uuid would be slightly slower), and to ensure that
it behaves the same, as it would be the very same function by
construction.
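
A self-contained illustration of the idea (plain C, not the pgcrypto
code itself):

#include <stdio.h>

static int gen_random_value(void) { return 4; }

/* a second name bound to the very same function: no wrapper, so no
 * extra call level and identical behaviour by construction */
static int (*const pg_random_value)(void) = gen_random_value;

int
main(void)
{
	printf("%d %d\n", gen_random_value(), pg_random_value());	/* 4 4 */
	return 0;
}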


I've switched the patch to ready anyway.

--
Fabien.




Re: Check-out mutable functions in check constraints

2019-07-13 Thread Tom Lane
Tomas Vondra  writes:
> On Fri, Jul 12, 2019 at 07:59:13PM -0400, Tom Lane wrote:
>> I'm pretty sure this change has been proposed before, and rejected before.
>> Has anybody excavated in the archives for prior discussions?

> Yes, I've done some quick searches like "volatile constraint" and so on.
> There are a couple of relevant discussions:
> 2004: 
> https://www.postgresql.org/message-id/flat/0C3A1AEC-6BE4-11D8-9224-000A95C88220%40myrealbox.com
> 2010: 
> https://www.postgresql.org/message-id/flat/12849.1277918175%40sss.pgh.pa.us#736c8ef9d7810c0bb85f495490fd40f5
> But I don't think the conclusions are particularly clear.
> In the first thread you seem to agree with requiring immutable functions
> for check constraints (and triggers for one-time checks). The second
> thread ended up discussing some new related stuff in SQL standard.

Well, I think that second thread is very relevant here, because
it correctly points out that we are *required by spec* to allow
check constraints of the form CHECK(datecol <= CURRENT_DATE) and
related tests.  See the stuff about "retrospectively deterministic"
predicates in SQL:2003 or later.

I suppose you could imagine writing some messy logic that allowed the
specific cases called out by the spec but not any other non-immutable
function calls.  But that just leaves us with an inconsistent
restriction.  If the spec is allowing this because it can be seen
to be safe, why should we not allow other cases that the user has
taken the trouble to prove to themselves are safe?  (If their proof is
wrong, well, it wouldn't be the first bug in anyone's SQL application.)

regards, tom lane




Re: [PATCH] Implement uuid_version()

2019-07-13 Thread Fabien COELHO


Hello Jose,

Considering the later arguments on-list, I plan on submitting a more
elaborate patchset integrating the feedback received so far, along the
following lines:


- uuid type, v4 generation and uuid_version() in core

- Provide a means to rename/supersede extensions keeping backwards
compatibility (i.e. uuid-ossp -> uuid, keep old code working)


- Miscellaneous other functionality

- Drop "dead" code

  ...but I've tried to keep quiet so as not to disturb too much around 
release time.


I intend to continue working on this in late July, aiming for the
following commitfest (once the more "urgent" patches have landed).


Ok.

I've changed the patch status for this CF to "moved to next CF", and to
"Waiting on author" there.


The idea is to go on in the same thread when you are ready.

--
Fabien.

Re: refactoring - share str2*int64 functions

2019-07-13 Thread Fabien COELHO


Hello Thomas,


pg_stat_statements.c:1024:11: error: implicit declaration of function
'pg_strtouint64' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
   rows = pg_strtouint64(completionTag + 5, NULL, 10);
  ^
Apparently it needs to include common/string.h.


Yep, I gathered that as well, but did not act promptly on your help.
Thanks for it!

Here is the updated patch on which I checked "make check-world".

--
Fabien.

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 00cae04eea..8f40245ac4 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -79,6 +79,7 @@
 #include "utils/builtins.h"
 #include "utils/hashutils.h"
 #include "utils/memutils.h"
+#include "common/string.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 8eedb613a1..e8744f054c 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -21,6 +21,7 @@
 #include "catalog/heap.h"
 #include "catalog/pg_type.h"
 #include "commands/trigger.h"
+#include "common/string.h"
 #include "executor/executor.h"
 #include "executor/spi_priv.h"
 #include "miscadmin.h"
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 6c2626ee62..3735268e3a 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -34,6 +34,7 @@
 
 #include "fmgr.h"
 #include "miscadmin.h"
+#include "common/string.h"
 #include "nodes/extensible.h"
 #include "nodes/parsenodes.h"
 #include "nodes/plannodes.h"
diff --git a/src/backend/parser/parse_node.c b/src/backend/parser/parse_node.c
index 1baf7ef31f..53b89ece39 100644
--- a/src/backend/parser/parse_node.c
+++ b/src/backend/parser/parse_node.c
@@ -25,10 +25,10 @@
 #include "parser/parse_expr.h"
 #include "parser/parse_relation.h"
 #include "utils/builtins.h"
-#include "utils/int8.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
 #include "utils/varbit.h"
+#include "common/string.h"
 
 
 static void pcb_error_callback(void *arg);
@@ -496,7 +496,7 @@ make_const(ParseState *pstate, Value *value, int location)
 
 		case T_Float:
 			/* could be an oversize integer as well as a float ... */
-			if (scanint8(strVal(value), true, &val64))
+			if (pg_strtoint64(strVal(value), true, &val64))
 			{
 /*
  * It might actually fit in int32. Probably only INT_MIN can
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index d317fd7006..70681588a1 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -12,6 +12,7 @@
  */
 #include "postgres.h"
 
+#include "common/string.h"
 #include "catalog/pg_publication.h"
 
 #include "replication/logical.h"
@@ -20,7 +21,6 @@
 #include "replication/pgoutput.h"
 
 #include "utils/inval.h"
-#include "utils/int8.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
 #include "utils/varlena.h"
@@ -111,7 +111,7 @@ parse_output_parameters(List *options, uint32 *protocol_version,
 		 errmsg("conflicting or redundant options")));
 			protocol_version_given = true;
 
-			if (!scanint8(strVal(defel->arg), true, &parsed))
+			if (!pg_strtoint64(strVal(defel->arg), true, &parsed))
 ereport(ERROR,
 		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 		 errmsg("invalid proto_version")));
diff --git a/src/backend/utils/adt/cash.c b/src/backend/utils/adt/cash.c
index c92e9d5046..618660f372 100644
--- a/src/backend/utils/adt/cash.c
+++ b/src/backend/utils/adt/cash.c
@@ -26,7 +26,6 @@
 #include "libpq/pqformat.h"
 #include "utils/builtins.h"
 #include "utils/cash.h"
-#include "utils/int8.h"
 #include "utils/numeric.h"
 #include "utils/pg_locale.h"
 
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 206576d4bd..1f17987e85 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -92,7 +92,6 @@
 #include "utils/datetime.h"
 #include "utils/float.h"
 #include "utils/formatting.h"
-#include "utils/int8.h"
 #include "utils/memutils.h"
 #include "utils/numeric.h"
 #include "utils/pg_locale.h"
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 0ff9394a2f..98e7b52ce9 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -18,12 +18,12 @@
 #include 
 
 #include "common/int.h"
+#include "common/string.h"
 #include "funcapi.h"
 #include "libpq/pqformat.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/supportnodes.h"
 #include "optimizer/optimizer.h"
-#include "utils/int8.h"
 #include "utils/builtins.h"
 
 
@@ -47,89 +47,6 @@ typedef struct
  * Formatting and conversion routines.
  *-*/
 
-/*
- * scanint8 --- try to parse a string into an int8.
- *
- * If errorOK is false, ereport a useful error message if the string is ba

Re: [sqlsmith] Crash in mcv_get_match_bitmap

2019-07-13 Thread Tom Lane
Tomas Vondra  writes:
> On Thu, Jul 11, 2019 at 05:08:22PM +0200, Tomas Vondra wrote:
>> On Wed, Jul 10, 2019 at 06:48:16PM -0400, Tom Lane wrote:
>>> The right way to determine operator semantics is to look to see
>>> whether they are in a btree opclass.  That's what the rest of the
>>> planner does, and there is no good reason for the mcv code to
>>> do it some other way.

>> Hmmm, but that will mean we're unable to estimate operators that are not
>> part of a btree opclass. Which is a bit annoying, because that would also
>> affect equalities (and thus functional dependencies), in which case the
>> type may easily have just hash opclass or something.

After thinking about this more, I may have been analogizing to the wrong
code.  It's necessary to use opclass properties when we're reasoning about
operators in a way that *must* be correct, for instance to conclude that
a partition can be pruned from a query.  But this code is just doing
selectivity estimation, so the correctness standards are a lot lower.
In particular I see that the long-established range-query-detection
code in clauselist_selectivity is looking for operators that have
F_SCALARLTSEL etc. as restriction estimators (in fact I'm guessing you
lifted parts of the mcv code from that, cause it looks pretty similar).
So if we've gotten away with that so far over there, there's probably
no reason not to do likewise here.

I am a little troubled by the point I made about operators possibly
wanting to have a more-specific estimator than scalarltsel, but that
seems like an issue to solve some other time; and if we do change that
logic then clauselist_selectivity needs to change too.

> Here are WIP patches addressing two of the issues:

> 1) determining operator semantics by matching them to btree opclasses

Per above, I'm sort of inclined to drop this, unless you feel better
about doing it like this than the existing way.

> 2) deconstructing OpExpr into Var/Const nodes

deconstruct_opexpr is still failing to verify that the Var is a Var.
I'd try something like

leftop = linitial(expr->args);
while (IsA(leftop, RelabelType))
leftop = ((RelabelType *) leftop)->arg;
// and similarly for rightop
if (IsA(leftop, Var) && IsA(rightop, Const))
// return appropriate results
else if (IsA(leftop, Const) && IsA(rightop, Var))
// return appropriate results
else
// fail

Also, I think deconstruct_opexpr is far too generic a name for what
this is actually doing.  It'd be okay as a static function name
perhaps, but not if you're going to expose it globally.

> a) I don't think unary-argument OpExpr are an issue, because this is
> checked when determining which clauses are compatible (and the code only
> allows the case with 2 arguments).

OK.

> b) Const with constisnull=true - I'm not yet sure what to do about this.
> The easiest fix would be to deem those clauses incompatible, but that
> seems a bit too harsh. The right thing is probably passing the NULL to
> the operator proc (but that means we can't use FunctionCall).

No, because most of the functions in question are strict and will just
crash on null inputs.  Perhaps you could just deem that cases involving
a null Const don't match what you're looking for.

> Now, looking at this code, I wonder if using DEFAULT_COLLATION_OID when
> calling the operator is the right thing. We're using type->typcollation
> when building the stats, so maybe we should do the same thing here.

Yeah, I was wondering that too.  But really you should be using the
column's collation not the type's default collation.  See commit
5e0928005.

regards, tom lane




Re: POC: converting Lists into arrays

2019-07-13 Thread Tom Lane
David Rowley  writes:
> Thanks for the speedy turnaround. I've looked at v8, as far as a diff
> between the two patches and I'm happy.
> I've marked as ready for committer.

So ... last chance for objections?

I see from the cfbot that v8 is already broken (new call of lnext
to be fixed).  Don't really want to keep chasing a moving target,
so unless I hear objections I'm going to adjust the additional
spot(s) and commit this pretty soon, like tomorrow or Monday.

regards, tom lane




Re: initdb recommendations

2019-07-13 Thread Julien Rouhaud
On Sat, Jul 13, 2019 at 2:44 PM Peter Eisentraut
 wrote:
>
> On 2019-07-11 21:34, Julien Rouhaud wrote:
> >> Note that with this change, running initdb without arguments will now
> >> error on those platforms: You need to supply either a password or select
> >> a different default authentication method.
> > Should we state this explicitly in the documentation?  As a
> > reference, it's saying:
> >
> > The default client authentication setup is such that users can connect
> > over the Unix-domain socket to the same database user name as their
> > operating system user names (on operating systems that support this,
> > which are most modern Unix-like systems, but not Windows) and
> > otherwise with a password. To assign a password to the initial
> > database superuser, use one of initdb's -W, --pwprompt or --pwfile
> > options.
>
> Do you have a suggestion for where to put this and exactly how to phrase
> this?
>
> I think the initdb reference page would be more appropriate than
> runtime.sgml.

Yes initdb.sgml seems more suitable.  I was thinking something very
similar to your note, maybe like (also attached if my MUA ruins it):

diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index c47b9139eb..764cf737c7 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -143,6 +143,15 @@ PostgreSQL documentation
 connections.


+   
+
+ Running initdb without arguments on platforms lacking
+ peer or Unix-domain socket connections will exit
+ with an error.  In such environments, you need to either provide a
+ password or choose a different authentication method.
+
+   
+

 Do not use trust unless you trust all local users on your
 system.


Re: Add CREATE DATABASE LOCALE option

2019-07-13 Thread Fabien COELHO



Hello Peter,


I think pg_dump/t/002_pg_dump.pl might be a good place.  Not the easiest
program in the world to work with, admittedly.


Updated patch with test and expanded documentation.


Patch v2 applies cleanly, compiles, make check-world ok with tap enabled. 
Doc gen ok.


The addition looks reasonable.

The second error message about conflicting options could be more
explicit than a terse "conflicting or redundant options".  The user may
expect later options to supersede earlier options, for instance.


About the pg_dump code, I'm wondering whether it is worth generating
LOCALE, as it breaks backward compatibility (e.g. dumping a new db to
load it into an older version).


If it is to be generated, I'd merge the two conditions instead of
nesting:


if (strlen(collate) > 0 && strcmp(collate, ctype) == 0)
    // generate LOCALE

--
Fabien.




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-13 Thread Joe Conway
On 7/13/19 9:38 AM, Joe Conway wrote:
> On 7/11/19 9:05 PM, Bruce Momjian wrote:
>> On Thu, Jul 11, 2019 at 08:41:52PM -0400, Joe Conway wrote:
>>> On 7/11/19 6:37 PM, Bruce Momjian wrote:
>>> > Our first implementation will encrypt the entire cluster.  We can later
>>> > consider encryption per table or tablespace.  It is unclear if
>>> > encrypting different parts of the system with different keys is useful
>>> > or feasible.  (This is separate from key rotation.)
>>> 
>>> I still object strongly to using a single key for the entire database. I
>>> think we can use a single key for WAL, but we need some way to split the
>>> heap so that multiple keys are used. If not by tablespace, then some
>>> other method.
>> 
>> What do you base this on?

Ok, so here we go. See links below. I skimmed through the entire thread
and FWIW it was exhausting.

To some extent this degenerated into a general search for relevant
information:

---
[1] and [2] show that at least some file system encryption uses a
different key per file.
---
[2] also shows that file system encryption uses a KDF (key derivation
function) which we may want to use ourselves. The analogy would be
per-table derived key instead of per file derived key. Note that KDF is
a safe way to derive a key and it is not the same as a "related key"
which was mentioned on another email as an attack vector.
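
As a concrete sketch of that analogy (assuming OpenSSL >= 1.1.0's
EVP_PKEY_HKDF interface; the info-string layout is illustrative only,
not a settled design):

#include <stdio.h>
#include <string.h>
#include <openssl/evp.h>
#include <openssl/kdf.h>

/* Derive a per-table key from the master key with HKDF (RFC 5869). */
static int
derive_table_key(const unsigned char *master_key, int master_len,
				 unsigned int dboid, unsigned int reloid,
				 unsigned char *out, size_t outlen)
{
	EVP_PKEY_CTX *ctx = EVP_PKEY_CTX_new_id(EVP_PKEY_HKDF, NULL);
	char		info[32];
	int			ok;

	snprintf(info, sizeof(info), "tde:%u:%u", dboid, reloid);

	ok = ctx != NULL
		&& EVP_PKEY_derive_init(ctx) > 0
		&& EVP_PKEY_CTX_set_hkdf_md(ctx, EVP_sha256()) > 0
		&& EVP_PKEY_CTX_set1_hkdf_key(ctx, master_key, master_len) > 0
		&& EVP_PKEY_CTX_add1_hkdf_info(ctx, (unsigned char *) info,
									   strlen(info)) > 0
		&& EVP_PKEY_derive(ctx, out, &outlen) > 0;

	EVP_PKEY_CTX_free(ctx);
	return ok;					/* 1 on success, 0 on failure */
}
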
---
[2] also shows support for AES 256. It also mentions
CBC versus XTS -- I came across this elsewhere and it bears discussion:

"Currently, the following pairs of encryption modes are supported:

AES-256-XTS for contents and AES-256-CTS-CBC for filenames
AES-128-CBC for contents and AES-128-CTS-CBC for filenames
Adiantum for both contents and filenames

If unsure, you should use the (AES-256-XTS, AES-256-CTS-CBC) pair.

AES-128-CBC was added only for low-powered embedded devices with crypto
accelerators such as CAAM or CESA that do not support XTS."
---
[2] also states this, which again makes me think in terms of a table
being the moral equivalent of a file:

"Unlike dm-crypt, fscrypt operates at the filesystem level rather than
at the block device level. This allows it to encrypt different files
with different keys and to have unencrypted files on the same
filesystem. This is useful for multi-user systems where each user’s
data-at-rest needs to be cryptographically isolated from the others.
However, except for filenames, fscrypt does not encrypt filesystem
metadata."
---
[3] suggests 68 GB per key and unique IV in GCM mode.
---
[4] specifies 68 GB per key and unique IV in CTR mode -- this applies
directly to our proposal to use CTR for WAL.
---
[5] has this to say which seems independent of mode:

"When encrypting data with a symmetric block cipher, which uses blocks
of n bits, some security concerns begin to appear when the amount of
data encrypted with a single key comes close to 2n/2 blocks, i.e. n*2n/2
bits. With AES, n = 128 (AES-128, AES-192 and AES-256 all use 128-bit
blocks). This means a limit of more than 250 millions of terabytes,
which is sufficiently large not to be a problem. That's precisely why
AES was defined with 128-bit blocks, instead of the more common (at that
time) 64-bit blocks: so that data size is practically unlimited."

But goes on to say:
"I wouldn't use n*2^(n/2) bits in any sort of recommendation. Once you
reach that number of bits the probability of a collision will grow
quickly and you will be way over 50% probability of a collision by the
time you reach 2*n*2^(n/2) bits. In order to keep the probability of a
collision negligible I recommend encrypting no more than n*2^(n/4) bits
with the same key. In the case of AES that works out to 64GB"

It is hard to say if that recommendation is per key or per key+IV.
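
For what it's worth, the 64 GB figure checks out for AES (n = 128):

    n * 2^(n/4) = 128 * 2^32 bits = 2^39 bits = 2^36 bytes = 64 GiB

(2^36 bytes is about 68.7 decimal GB, which is presumably also where
the ~68 GB per key figures in [3] and [4] come from.)
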
---
[6] shows that Azure SQL Database uses AES 256 for TDE. It also seems to
imply a single key is used although at one point it says "transparent
data encryption master key, also known as the transparent data
encryption protector". The term "master key" indicates that they likely
use derived keys under the covers.
---
[7] is a generally useful read about how many of the things we have been
discussing are done in SQL Server
---
[8] was referenced by Sehrope. In addition to support for AES 256 for
long term use, table 5.1 is interesting. It lists CBC mode as "legacy"
but not "future".
---
[9] IETF RFC for KDF
---
[10] IETF RFC for Key wrapping -- this is probably how we should wrap
the master key with the Key Encryption Key (KEK) -- i.e. the outer key
provided by the user or command on postmaster start
---

Based on all of that I cannot find a requirement that we use more than
one key per database.

But I did find that files in an encrypted file system are encrypted with
derived keys from a master key, and I view this as analogous to what we
are doing.

As an aside to the specific question, I also found more evidence that
AES 256 is appropriate.

Joe


[1]
https://www.po

Re: [PATCH] Improve performance of NOTIFY over many databases (issue blocking on AccessExclusiveLock on object 0 of class 1262 of database 0)

2019-07-13 Thread Tom Lane
Martijn van Oosterhout  writes:
> Please find attached updated versions of the patches, I've now tested
> them. Also attached is a reproduction script to verify that they
> actually work.

I looked through these (a bit cursorily).

I'm generally on board with the idea of 0001, but not with the patch
details.  As coded, asyncQueueAdvanceTail is supposing that it can
examine the shared QUEUE_HEAD and QUEUE_TAIL pointers without any
lock whatsoever.  That's probably unsafe, and if it is safe for some
reason, you haven't made the argument why.  Moreover, it seems
unnecessary to make any such assumption.  Why not put back the
advanceTail tests you removed, but adjust them so that advanceTail
isn't set true unless QUEUE_HEAD and QUEUE_TAIL point to different
pages?  (Note that in the existing coding, those tests are made
while holding an appropriate lock, so it's safe to look at those
pointers there.)

It might be a good idea to make a macro encapsulating this new,
more complicated rule for setting advanceTail, instead of relying
on keeping the various call sites in sync.

More attention to comments is also needed.  For instance, you've
made a lie out of the documentation of the tail pointer:

QueuePosition tail; /* the global tail is equivalent to the pos of
 * the "slowest" backend */

It needs to say something like "is <= the pos of the slowest backend",
instead.  I think the explanation of why this algorithm is good could
use more effort, too.
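
Perhaps something like this (a wording sketch only, not necessarily
what the patch should say verbatim):

QueuePosition tail; /* points to a position <= every backend's pos;
 * it may lag the true minimum, since advancing
 * it is deferred until a page boundary */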

Comments for 0002 are about the same: for no explained reason, and
certainly no savings, you've put the notify_all test in an unsafe
place rather than a safe one (viz, two lines down, *after* taking
the relevant lock).  And 0002 needs more commentary about why
its optimization is safe and useful, too.  In particular it's
not obvious why QUEUE_HEAD being on a different page from QUEUE_TAIL
has anything to do with whether we should wake up other backends.

I'm not very persuaded by 0003, mainly because it seems likely to
me that 0001 and 0002 will greatly reduce the possibility that
the early-exit can happen.  So it seems like it's adding cycles
(in a spot where we hold exclusive lock) without a good chance of
saving any cycles.

Taking a step back in hopes of seeing the bigger picture ...
as you already noted, these changes don't really fix the "thundering
herd of wakeups" problem, they just arrange for it to happen
once per SLRU page rather than once per message.  I wonder if we
could improve matters by stealing an idea from the sinval code:
when we're trying to cause advance of the global QUEUE_TAIL, waken
only the slowest backend, and have it waken the next-slowest after
it's done.  In sinval there are some additional provisions to prevent
a nonresponsive backend from delaying matters for other backends,
but I think maybe we don't need that here.  async.c doesn't have
anything equivalent to sinval reset, so there's no chance of
overruling a slow backend's failure to advance its pos pointer,
so there's not much reason not to just wait till it does do so.
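
A generic, self-contained model of the chained wakeup (deliberately not
async.c code; names and structure are invented): each pass wakes only
the single laggard, and the laggard re-runs the pass after advancing.

#include <stdio.h>

#define NBACKENDS 4

static int pos[NBACKENDS] = {7, 3, 9, 3};	/* per-backend queue positions */

/* Return the index of the slowest backend, or -1 if none lags 'head'. */
static int
slowest_backend(int head)
{
	int			min_i = -1;

	for (int i = 0; i < NBACKENDS; i++)
		if (pos[i] < head && (min_i < 0 || pos[i] < pos[min_i]))
			min_i = i;
	return min_i;
}

int
main(void)
{
	int			head = 9;
	int			i;

	/* chained wakeups: each woken backend catches up, then wakes the next */
	while ((i = slowest_backend(head)) >= 0)
	{
		printf("waking backend %d (pos %d)\n", i, pos[i]);
		pos[i] = head;			/* the backend advances its own pointer */
	}
	return 0;
}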

A related idea is to awaken only one backend at a time when we
send a new message (i.e., advance QUEUE_HEAD) but I think that
would likely be bad.  The hazard with the chained-wakeups method
is that a slow backend blocks wakeup of everything else.  We don't
care about that hugely for QUEUE_TAIL advance, because we're just
hoping to free some SLRU space.  But for QUEUE_HEAD advance it'd
mean failing to deliver notifies in a timely way, which we do care
about.  (Also, if I remember correctly, the processing on that side
only requires shared lock so it's less of a problem if many backends
do it at once.)

regards, tom lane




Re: [sqlsmith] Crash in mcv_get_match_bitmap

2019-07-13 Thread Tomas Vondra

On Sat, Jul 13, 2019 at 11:39:55AM -0400, Tom Lane wrote:

Tomas Vondra  writes:

On Thu, Jul 11, 2019 at 05:08:22PM +0200, Tomas Vondra wrote:

On Wed, Jul 10, 2019 at 06:48:16PM -0400, Tom Lane wrote:

The right way to determine operator semantics is to look to see
whether they are in a btree opclass.  That's what the rest of the
planner does, and there is no good reason for the mcv code to
do it some other way.



Hmmm, but that will mean we're unable to estimate operators that are not
part of a btree opclass. Which is a bit annoying, because that would also
affect equalities (and thus functional dependencies), in which case the
type may easily have just hash opclass or something.


After thinking about this more, I may have been analogizing to the wrong
code.  It's necessary to use opclass properties when we're reasoning about
operators in a way that *must* be correct, for instance to conclude that
a partition can be pruned from a query.  But this code is just doing
selectivity estimation, so the correctness standards are a lot lower.
In particular I see that the long-established range-query-detection
code in clauselist_selectivity is looking for operators that have
F_SCALARLTSEL etc. as restriction estimators (in fact I'm guessing you
lifted parts of the mcv code from that, cause it looks pretty similar).
So if we've gotten away with that so far over there, there's probably
no reason not to do likewise here.

I am a little troubled by the point I made about operators possibly
wanting to have a more-specific estimator than scalarltsel, but that
seems like an issue to solve some other time; and if we do change that
logic then clauselist_selectivity needs to change too.


Here are WIP patches addressing two of the issues:



1) determining operator semantics by matching them to btree opclasses


Per above, I'm sort of inclined to drop this, unless you feel better
about doing it like this than the existing way.



OK. TBH I don't have a very strong opinion on this - I always disliked
how we rely on the estimator OIDs in this code, and relying on btree
opclasses seems somewhat more principled. But I'm not sure I understand
all the implications of such a change (and I have some concerns about it
too, per my last message), so I'd revisit that in PG13.


2) deconstructing OpExpr into Var/Const nodes


deconstruct_opexpr is still failing to verify that the Var is a Var.
I'd try something like

leftop = linitial(expr->args);
while (IsA(leftop, RelabelType))
leftop = ((RelabelType *) leftop)->arg;
// and similarly for rightop
if (IsA(leftop, Var) && IsA(rightop, Const))
// return appropriate results
else if (IsA(leftop, Const) && IsA(rightop, Var))
// return appropriate results
else
// fail



Ah, right. The RelabelType might be on top of something that's not Var.


Also, I think deconstruct_opexpr is far too generic a name for what
this is actually doing.  It'd be okay as a static function name
perhaps, but not if you're going to expose it globally.



I agree. I can't quite make it static, because it's used from multiple
places, but I'll move it to extended_stats_internal.h (and I'll see if I
can think of a better name too).


a) I don't think unary-argument OpExpr are an issue, because this is
checked when determining which clauses are compatible (and the code only
allows the case with 2 arguments).


OK.


b) Const with constisnull=true - I'm not yet sure what to do about this.
The easiest fix would be to deem those clauses incompatible, but that
seems a bit too harsh. The right thing is probably passing the NULL to
the operator proc (but that means we can't use FunctionCall).


No, because most of the functions in question are strict and will just
crash on null inputs.  Perhaps you could just deem that cases involving
a null Const don't match what you're looking for.



Makes sense, I'll do that.


Now, looking at this code, I wonder if using DEFAULT_COLLATION_OID when
calling the operator is the right thing. We're using type->typcollation
when building the stats, so maybe we should do the same thing here.


Yeah, I was wondering that too.  But really you should be using the
column's collation not the type's default collation.  See commit
5e0928005.



OK, thanks.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-13 Thread Tomas Vondra

On Sat, Jul 13, 2019 at 02:41:34PM -0400, Joe Conway wrote:

On 7/13/19 9:38 AM, Joe Conway wrote:

On 7/11/19 9:05 PM, Bruce Momjian wrote:

On Thu, Jul 11, 2019 at 08:41:52PM -0400, Joe Conway wrote:

On 7/11/19 6:37 PM, Bruce Momjian wrote:
> Our first implementation will encrypt the entire cluster.  We can later
> consider encryption per table or tablespace.  It is unclear if
> encrypting different parts of the system with different keys is useful
> or feasible.  (This is separate from key rotation.)

I still object strongly to using a single key for the entire database. I
think we can use a single key for WAL, but we need some way to split the
heap so that multiple keys are used. If not by tablespace, then some
other method.


What do you base this on?


Ok, so here we go. See links below. I skimmed through the entire thread
and FWIW it was exhausting.

To some extent this degenerated into a general search for relevant
information:

---
[1] and [2] show that at least some file system encryption uses a
different key per file.
---
[2] also shows that file system encryption uses a KDF (key derivation
function) which we may want to use ourselves. The analogy would be
per-table derived key instead of per file derived key. Note that KDF is
a safe way to derive a key and it is not the same as a "related key"
which was mentioned on another email as an attack vector.
---
[2] also shows support for AES 256. It also mentions
CBC versus XTS -- I came across this elsewhere and it bears discussion:

"Currently, the following pairs of encryption modes are supported:

   AES-256-XTS for contents and AES-256-CTS-CBC for filenames
   AES-128-CBC for contents and AES-128-CTS-CBC for filenames
   Adiantum for both contents and filenames

If unsure, you should use the (AES-256-XTS, AES-256-CTS-CBC) pair.

AES-128-CBC was added only for low-powered embedded devices with crypto
accelerators such as CAAM or CESA that do not support XTS."
---
[2] also states this, which again makes me think in terms of a table
being the moral equivalent of a file:

"Unlike dm-crypt, fscrypt operates at the filesystem level rather than
at the block device level. This allows it to encrypt different files
with different keys and to have unencrypted files on the same
filesystem. This is useful for multi-user systems where each user’s
data-at-rest needs to be cryptographically isolated from the others.
However, except for filenames, fscrypt does not encrypt filesystem
metadata."
---
[3] suggests 68 GB per key and unique IV in GCM mode.
---
[4] specifies 68 GB per key and unique IV in CTR mode -- this applies
directly to our proposal to use CTR for WAL.
---
[5] has this to say which seems independent of mode:

"When encrypting data with a symmetric block cipher, which uses blocks
of n bits, some security concerns begin to appear when the amount of
data encrypted with a single key comes close to 2n/2 blocks, i.e. n*2n/2
bits. With AES, n = 128 (AES-128, AES-192 and AES-256 all use 128-bit
blocks). This means a limit of more than 250 millions of terabytes,
which is sufficiently large not to be a problem. That's precisely why
AES was defined with 128-bit blocks, instead of the more common (at that
time) 64-bit blocks: so that data size is practically unlimited."



FWIW I was a bit confused at first, because the copy paste mangled the
formulas a bit - it should have been 2^(n/2) and n*2^(n/2).


But goes on to say:
"I wouldn't use n*2^(n/2) bits in any sort of recommendation. Once you
reach that number of bits the probability of a collision will grow
quickly and you will be way over 50% probability of a collision by the
time you reach 2*n*2^(n/2) bits. In order to keep the probability of a
collision negligible I recommend encrypting no more than n*2^(n/4) bits
with the same key. In the case of AES that works out to 64GB"

It is hard to say if that recommendation is per key or per key+IV.


Hmm, yeah. The question is what collisions they have in mind? Presumably
it's AES(block1,key) = AES(block2,key) in which case it'd be with fixed
IV, so per key+IV.


---
[6] shows that Azure SQL Database uses AES 256 for TDE. It also seems to
imply a single key is used although at one point it says "transparent
data encryption master key, also known as the transparent data
encryption protector". The term "master key" indicates that they likely
use derived keys under the covers.
---
[7] is a generally useful read about how many of the things we have been
discussing are done in SQL Server
---
[8] was referenced by Sehrope. In addition to support for AES 256 for
long term use, table 5.1 is interesting. It lists CBC mode as "legacy"
but not "future".
---
[9] IETF RFC for KDF
---
[10] IETF RFC for Key wrapping -- this is probably how we should wrap
the master key with the Key Encryption Key (KEK) -- i.e. the outer key
provided by the user or command on postmaster start
---

Based on all of that I cannot find a requirement that we use more than
one key per

Re: Patch to document base64 encoding

2019-07-13 Thread Karl O. Pinc
Hi Fabien,

Attached is doc_base64_v10.patch

On Fri, 12 Jul 2019 15:58:21 +0200 (CEST)
Fabien COELHO  wrote:

> >> I suggested "Function <>decode ...", which is the kind of thing
> >> we do in academic writing to improve precision, because I thought
> >> it could be better:-)  
> >
> > "Function <>decode ..." just does not work in English.  
> 
> It really works in research papers: "Theorem X can be proven by
> applying Proposition Y. See Figure 2 for details. Algorithm Z
> describes whatever, which is listed in Table W..."

I've not thought about it before, but I suppose the difference
is between declarative and descriptive, the latter being
more inviting and allowing better flow between sentences.
Otherwise you're writing in bullet points.  So it is a
question of balance between specification and narration.
In regular prose you're always going to see the "the"
unless the sentence starts with the name.  The trouble
is that we can't start sentences with function names
because of capitalization confusion.

> > I hope that 3 patches will make review easier.  
> 
> Not really. I'm reviewing the 3 patches put together rather than each
> one individually, which would require more work.

I figured with e.g. a separate patch for moving and alphabetizing
that it'd be easier to check that nothing got lost.  Anyhow,
just one patch this time.

> convert: I'd merge the first 2 sentences to state that it converts
> from X to Y. The doc does not say explicitly what happens if a
> character cannot be converted. After testing, an error is raised. The
> example comment could add ", if possible".

Done.  Good idea.  I reworked the whole paragraph to shorten and
clarify since I was altering it anyway.  This does introduce
some inconsistency with wording that appears elsewhere but it seems
worth it because the listentry box was getting overfull.

> to_hex: add "." at the end of the sentence?

I left as-is, without a ".".  The only consistent rule about
periods in the listentrys seems to be that if there's more than
one sentence then there's periods -- I think.  In any case a
lot of them don't have periods and probably don't need
periods.  I don't know what to do and since the original did
not have a period it seems better to leave well enough alone.

> Minor comment: you usually put two spaces between a "." and the first
> word of the next sentence, but not always.

I now always put 2 spaces after the end of a sentence.  But
I've only done this where I've changed text, not when
moving pre-existing text around.  Again, there seems
to be no consistency in the original.  (I believe docbook
redoes all inter-sentence spacing anyway.)

Thanks for the help.

I'll be sure to sign up to review a patch (or patches) when life
permits.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a25c122ac8..131a63b36c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1795,48 +1795,6 @@
   

 
- convert
-
-convert(string bytea,
-src_encoding name,
-dest_encoding name)
-   
-   bytea
-   
-Convert string to dest_encoding.  The
-original encoding is specified by
-src_encoding. The
-string must be valid in this encoding.
-Conversions can be defined by CREATE CONVERSION.
-Also there are some predefined conversions. See  for available conversions.
-   
-   convert('text_in_utf8', 'UTF8', 'LATIN1')
-   text_in_utf8 represented in Latin-1
-   encoding (ISO 8859-1)
-  
-
-  
-   
-
- convert_from
-
-convert_from(string bytea,
-src_encoding name)
-   
-   text
-   
-Convert string to the database encoding.  The original encoding
-is specified by src_encoding. The
-string must be valid in this encoding.
-   
-   convert_from('text_in_utf8', 'UTF8')
-   text_in_utf8 represented in the current database encoding
-  
-
-  
-   
-
  convert_to
 
 convert_to(string text,
@@ -1844,7 +1802,8 @@

bytea

-Convert string to dest_encoding.
+Convert string to dest_encoding.  See  for available conversions.

convert_to('some text', 'UTF8')
some text represented in the UTF8 encoding
@@ -1855,39 +1814,24 @@
 
  decode
 
+
+  base64 encoding
+
 decode(string text,
 format text)

bytea

 Decode binary data from textual representation in string.
-Options for format are same as in encode.
+Options
+for format are same as
+in encode.

decode('MTIzAAE=', 'base64')
\x3132330001
   
 
   
-   
-
- encode
-
-

Re: SHOW CREATE

2019-07-13 Thread Michael Glaesemann



> On 2019-07-05, at 12:14, Corey Huinker  wrote:
> 
> In doing that work, it became clear that the command was serving two masters:
> 1. A desire to see the underlying nuts and bolts of a given database object.
> 2. A desire to essentially make the schema portion of pg_dump a server side 
> command.
> 
> To that end, I see splitting this into two commands, SHOW CREATE and SHOW 
> DUMP.

I like the idea of having these features available via SQL as opposed to 
separate tools. Is it necessary to have specific commands for them? It seems 
they would potentially be more useful as functions, where they'd be available for
all of the programmatic features of the rest of SQL.

Michael Glaesemann
grzm seespotcode net







Re: Avoiding hash join batch explosions with extreme skew and weird stats

2019-07-13 Thread Tomas Vondra

On Wed, Jul 03, 2019 at 02:22:09PM -0700, Melanie Plageman wrote:
> On Tue, Jun 18, 2019 at 3:24 PM Melanie Plageman 
> wrote:
>
>> These questions will probably make a lot more sense with corresponding
>> code, so I will follow up with the second version of the state machine
>> patch once I finish it.
>
> I have changed the state machine and resolved the questions I had
> raised in the previous email. This seems to work for the parallel and
> non-parallel cases. I have not yet rewritten the unmatched outer tuple
> status as a bitmap in a spill file (for ease of debugging).
>
> Before doing that, I wanted to ask what a desirable fallback condition
> would be. In this patch, fallback to hashloop join happens only when
> inserting tuples into the hashtable after batch 0 when inserting
> another tuple from the batch file would exceed work_mem. This means
> you can't increase nbatches, which, I would think is undesirable.

Yes, I think that's undesirable.

> I thought a bit about when fallback should happen. So, let's say that
> we would like to fallback to hashloop join when we have increased
> nbatches X times. At that point, since we do not want to fall back to
> hashloop join for all batches, we have to make a decision. After
> increasing nbatches the Xth time, do we then fall back for all batches
> for which inserting inner tuples exceeds work_mem? Do we use this
> strategy but work_mem + some fudge factor?
>
> Or, do we instead try to determine if data skew led us to increase
> nbatches both times and then determine which batch, given new
> nbatches, contains that data, set fallback to true only for that
> batch, and let all other batches use the existing logic (with no
> fallback option) unless they contain a value which leads to increasing
> nbatches X number of times?

I think we should try to detect the skew and use this hashloop logic
only for the one batch. That's based on the assumption that the hashloop
is less efficient than the regular hashjoin.

We may need to apply it even for some non-skewed (but misestimated)
cases, though. At some point we'd need more than work_mem for BufFiles,
at which point we ought to use this hashloop.
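
To make that concrete, a minimal sketch of the per-batch decision I have
in mind, piggy-backing on the existing growEnabled test in
ExecHashIncreaseNumBatches (the per-batch fallback array is
hypothetical):

    /*
     * If repartitioning moved none (or all) of the in-memory tuples,
     * doubling nbatch again cannot help -- the batch is presumably
     * skewed on a single value.  Flag just this batch for the
     * hashloop strategy instead of growing further.
     */
    if (nfreed == 0 || nfreed == ninmemory)
    {
        hashtable->growEnabled = false;
        hashtable->hashloopFallback[hashtable->curbatch] = true; /* hypothetical */
    }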


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: SHOW CREATE

2019-07-13 Thread David Fetter
On Sat, Jul 13, 2019 at 06:32:41PM -0500, Michael Glaesemann wrote:
> 
> 
> > On 2019–07–05, at 12:14, Corey Huinker  wrote:
> > 
> > In doing that work, it became clear that the command was serving two 
> > masters:
> > 1. A desire to see the underlying nuts and bolts of a given database object.
> > 2. A desire to essentially make the schema portion of pg_dump a server side 
> > command.
> > 
> > To that end, I see splitting this into two commands, SHOW CREATE
> > and SHOW DUMP.
> 
> I like the idea of having these features available via SQL as
> opposed to separate tools. Is it necessary to have specific commands
> for them? It seems they would potentially more useful as functions,
> where they'd be available for all of the programmatic features of
> the rest of SQL.

Having commands for them would help meet people's expectations coming
from other RDBMSs.

On the other hand, making functions could just be done in SQL, which
might hurry the process along.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Conflict handling for COPY FROM

2019-07-13 Thread Thomas Munro
On Fri, Jul 12, 2019 at 1:42 AM Surafel Temesgen  wrote:
> Here is the patch that contains all the comments given, except adding a way
> to specify ignoring all errors, because specifying a high number can do the
> work, and trying to store such badly structured data may be a bad idea

Hi Surafel,

FYI GCC warns:

copy.c: In function ‘CopyFrom’:
copy.c:3383:8: error: ‘dest’ may be used uninitialized in this
function [-Werror=maybe-uninitialized]
(void) dest->receiveSlot(myslot, dest);
^
copy.c:2702:16: note: ‘dest’ was declared here
  DestReceiver *dest;
^

-- 
Thomas Munro
https://enterprisedb.com




Re: pgbench - implement strict TPC-B benchmark

2019-07-13 Thread Thomas Munro
On Tue, Apr 9, 2019 at 3:58 AM Fabien COELHO  wrote:
> The attached patch does $SUBJECT, as a showcase for recently added
> features, including advanced expressions (CASE...), \if, \gset, ending SQL
> commands at ";"...

Hi Fabien,

+   the account branch has a 15% probability to be in the same branch
as the teller (unless

I would say "... has a 15% probability of being in the same ...".  The
same wording appears further down in the comment.

I see that the parameters you propose match the TPCB 2.0
description[1], and the account balance was indeed supposed to be
returned to the driver.  I wonder if "strict" is the right name here
though.  "tpcb-like-2" at least leaves room for someone to propose yet
another variant, and still includes the "-like" disclaimer, which I
interpret as a way of making it clear that this benchmark and results
produced by it have no official TPC audited status.

> There is also a small fix to the doc which describes the tpcb-like
> implementation but gets one variable name wrong: balance -> delta.

Agreed.  I committed that part.  Thanks!

[1] http://www.tpc.org/tpcb/spec/tpcb_current.pdf

-- 
Thomas Munro
https://enterprisedb.com




Re: Bad canonicalization for dateranges with 'infinity' bounds

2019-07-13 Thread Thomas Munro
On Sun, Jul 14, 2019 at 12:44 AM Thomas Munro  wrote:
> Even though !(X || Y) is equivalent to !X && !Y, by my reading of
> range_in(), lower.value can be uninitialised when lower.infinite is
> true, and it's also a bit hard to read IMHO, so I'd probably write
> that as !upper.infinite && !DATE_NOT_FINITE(upper.val) &&
> upper.inclusive.  I don't think it can affect the result but it might
> upset Valgrind or similar.

I take back the bit about reading an uninitialised value (X || Y
doesn't access Y if X is true... duh), but I still think the other way
of putting it is a bit easier to read.  YMMV.

Generally, +1 for this patch.  I'll wait a couple of days for more
feedback to appear.

-- 
Thomas Munro
https://enterprisedb.com




Re: Built-in connection pooler

2019-07-13 Thread Thomas Munro
On Tue, Jul 9, 2019 at 8:30 AM Konstantin Knizhnik
 wrote:
> Rebased version of the patch is attached.

Thanks for including nice documentation in the patch, which gives a
good overview of what's going on.  I haven't read any code yet, but I
took it for a quick drive to understand the user experience.  These
are just some first impressions.

I started my server with -c connection_proxies=1 and tried to connect
to port 6543 and the proxy segfaulted on null ptr accessing
port->gss->enc.  I rebuilt without --with-gssapi to get past that.

Using SELECT pg_backend_pid() from many different connections, I could
see that they were often being served by the same process (although
sometimes it created an extra one when there didn't seem to be a good
reason for it to do that).  I could see the proxy managing these
connections with SELECT * FROM pg_pooler_state() (I suppose this would
be wrapped in a view with a name like pg_stat_proxies).  I could see
that once I did something like SET foo.bar = 42, a backend became
dedicated to my connection and no other connection could use it.  As
described.  Neat.

Obviously your concept of tainted backends (= backends that can't be
reused by other connections because they contain non-default session
state) is quite simplistic and would help only the very simplest use
cases.  Obviously the problems that need to be solved first to do
better than that are quite large.  Personally I think we should move
all GUCs into the Session struct, put the Session struct into shared
memory, and then figure out how to put things like prepared plans into
something like Ideriha-san's experimental shared memory context so
that they also can be accessed by any process, and then we'll mostly
be tackling problems that we'll have to tackle for threads too.  But I
think you made the right choice to experiment with just reusing the
backends that have no state like that.

On my FreeBSD box (which doesn't have epoll(), so it's latch.c's old
school poll() for now), I see the connection proxy process eating a
lot of CPU and the temperature rising.  I see with truss that it's
doing this as fast as it can:

poll({ 13/POLLIN 17/POLLIN|POLLOUT },2,1000) = 1 (0x1)

Ouch.  I admit that I had the idea to test on FreeBSD because I
noticed the patch introduces EPOLLET and I figured this might have
been tested only on Linux: poll() is level-triggered, so a socket that
stays writable makes it return immediately every time, where epoll's
edge-triggered mode would stay quiet.  FWIW the same happens on a Mac.

That's all I had time for today, but I'm planning to poke this some
more, and get a better understand of how this works at an OS level.  I
can see fd passing, IO multiplexing, and other interesting things
happening.  I suspect there are many people on this list who have
thoughts about the architecture we should use to allow a smaller
number of PGPROCs and a larger number of connections, with various
different motivations.

> Thank you, I will look at Takeshi Ideriha's patch.

Cool.

> > Could you please fix these compiler warnings so we can see this
> > running check-world on CI?
> >
> > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.46324
> > https://travis-ci.org/postgresql-cfbot/postgresql/builds/555180678
> >
> Sorry, I do not have access to Windows host, so can not check Win32
> build myself.

  C:\projects\postgresql\src\include\../interfaces/libpq/libpq-int.h(33):
fatal error C1083: Cannot open include file: 'pthread-win32.h': No
such file or directory (src/backend/postmaster/proxy.c)
[C:\projects\postgresql\postgres.vcxproj]

These relative includes in proxy.c are part of the problem:

#include "../interfaces/libpq/libpq-fe.h"
#include "../interfaces/libpq/libpq-int.h"

I didn't dig into this much but some first reactions:

1.  I see that proxy.c uses libpq, and correctly loads it as a dynamic
library just like postgres_fdw.  Unfortunately it's part of core, so
it can't use the same technique as postgres_fdw to add the libpq
headers to the include path.

2.  libpq-int.h isn't supposed to be included by code outside libpq,
and in this case it fails to find pthread-win32.h, which it apparently
expects to find in either the same directory or the include path.  I
didn't look into what exactly is going on (I don't have Windows
either) but I think we can say the root problem is that you shouldn't
be including that from backend code.

-- 
Thomas Munro
https://enterprisedb.com




Fix typos and inconsistencies for HEAD (take 6)

2019-07-13 Thread Alexander Lakhin
Hello hackers,

Please consider fixing the next batch of typos and inconsistencies in
the tree:
6.1. FADVISE_WILLNEED -> POSIX_FADV_WILLNEED
6.2. failOK -> missing_ok
6.3. failOnerror -> failOnSignal
6.4. fakebits -> remove (irrelevant since introduction in 945543d9)
6.5. FastPathGetLockEntry -> FastPathGetRelationLockEntry
6.6. FAST_PATH_HASH_BUCKETS -> FAST_PATH_STRONG_LOCK_HASH_PARTITIONS
6.7. FastPathTransferLocks -> FastPathTransferRelationLocks
6.8. GenericOptionFlags -> remove (unused since 090173a3)
6.9. fetch_data -> fetched_data
6.10. fildes -> fd
6.11. filedescriptors -> file descriptors
6.12. fillatt -> remove (orphaned since 8609d4ab)
6.13. finalfunction -> finalfn
6.14. flail -> fail
6.15. FlushBuffers -> FlushBuffer & rephrase a comment (incorrectly
updated in 6f5c38dc)
6.16. flush_context -> wb_context
6.17. followon -> follow-on
6.18. force_quotes -> remove (orphaned since e18d900d)
6.19. formatstring -> format-string
6.20. formarray, formfloat -> remove (orphaned since a237dd2b)
6.21. found_row_type -> found_whole_row
6.22. freeScanStack -> remove a comment (irrelevant since 2a636834)
6.23. free_segment_counter -> freed_segment_counter
6.24. FreeSpace Map -> FreeSpaceMap
6.25. user friendly-operand -> user-friendly operand
6.26. frozenids -> frozenxids
6.27. fsm_internal.h -> fsm_internals.h
6.28. fsm_size_to_avail_cat -> fsm_space_avail_to_cat
6.29. full_result -> full_results
6.30. FULL_SIZ -> remove (orphaned since 65b731bd)
6.31. funxtions -> functions
6.32. generate_nonunion_plan, generate_union_plan ->
generate_nonunion_paths, generate_union_paths
6.33. getaddinfo -> getaddrinfo
6.34. get_expr, get_indexdef, get_ruledef, get_viewdef, get_triggerdef,
get_userbyid -> pg_get_*
6.35. GetHashPageStatis -> GetHashPageStats
6.36. GetNumShmemAttachedBgworkers -> remove (orphaned since 6bc8ef0b)
6.37. get_one_range_partition_bound_string -> get_range_partbound_string
6.38. getPartitions -> remove a comment (irrelevant since 44c52881)
6.39. GetRecordedFreePage -> GetRecordedFreeSpace
6.40. get_special_varno -> resolve_special_varno
6.41. gig -> GB
6.42. GinITupIsCompressed -> GinItupIsCompressed
6.43. GinPostingListSegmentMaxSize-bytes -> GinPostingListSegmentMaxSize
bytes
6.44. gistfindCorrectParent -> gistFindCorrectParent
6.45. gistinserthere -> gistinserttuple
6.46. GISTstate -> giststate
6.47. GlobalSerializableXmin, SerializableGlobalXmin -> SxactGlobalXmin
6.48. Greenwish -> Greenwich
6.49. groupClauseVars -> groupClauseCommonVars

As a side note, while looking at dt_common.c (fixing 6.47), I've got a
feeling that the datetktbl is largely outdated and thus mostly useless
(e.g. the USSR hasn't existed for almost 30 years).

Best regards,
Alexander
diff --git a/doc/src/sgml/gist.sgml b/doc/src/sgml/gist.sgml
index 763b8cf7fd..c95776eb75 100644
--- a/doc/src/sgml/gist.sgml
+++ b/doc/src/sgml/gist.sgml
@@ -921,7 +921,7 @@ my_fetch(PG_FUNCTION_ARGS)
  * Convert 'fetched_data' into the a Datum of the original datatype.
  */
 
-/* fill *retval from fetch_data. */
+/* fill *retval from fetched_data. */
 gistentryinit(*retval, PointerGetDatum(converted_datum),
   entry->rel, entry->page, entry->offset, FALSE);
 
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 47db7a8a88..d78f706ff7 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -155,7 +155,7 @@ static void TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
  * NB: this is a low-level routine and is NOT the preferred entry point
  * for most uses; functions in transam.c are the intended callers.
  *
- * XXX Think about issuing FADVISE_WILLNEED on pages that we will need,
+ * XXX Think about issuing POSIX_FADV_WILLNEED on pages that we will need,
  * but aren't yet in cache, as well as hinting pages not to fall out of
  * cache yet.
  */
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 05c6ca81b9..0aa12e8dde 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -2399,7 +2399,7 @@ TSParserIsVisible(Oid prsId)
 /*
  * get_ts_dict_oid - find a TS dictionary by possibly qualified name
  *
- * If not found, returns InvalidOid if failOK, else throws error
+ * If not found, returns InvalidOid if missing_ok, else throws error
  */
 Oid
 get_ts_dict_oid(List *names, bool missing_ok)
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 6745a2432e..e2bed96b9b 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -216,9 +216,9 @@ static PROCLOCK *FastPathGetRelationLockEntry(LOCALLOCK *locallock);
 
 /*
  * To make the fast-path lock mechanism work, we must have some way of
- * preventing the use of the fast-path when a conflicting lock might be
- * present.  We partition* the locktag space into FAST_PATH_HASH_BUCKETS
- * partitions, and maintain an integer count of the number of "strong" lockers
+



Re: [PATCH] Implement uuid_version()

2019-07-13 Thread Jose Luis Tallon

On 13/7/19 8:31, Fabien COELHO wrote:
> Hello Jose,

Hello, Fabien

Thanks for taking a look

>> Got it, and done. Please find attached a v2 patch with the upgrade
>> script included.
>
> Patch v2 applies cleanly. Compiles cleanly (once running configure
> --with-uuid=...). Local make check ok. Doc build ok.
>
> There are no tests, I'd suggest adding some under sql & changing
> expected if possible which would return all possible values, including
> calling pg_random_uuid() which should return 4.
>
> Documentation describes uuid_version(), should it not describe
> uuid_version(namespace uuid)?
>
> I'd suggest adding an example.
>
> The extension update script seems ok, but ISTM that
> "uuid-ossp-1.1.sql" should be replaced with an updated
> "uuid-ossp-1.2.sql".

This was a quite naïve approach to the issue on my part, more a "scratch
my own itch" than anything else, but it definitely sparked some
interest. Thanks to all involved.


Considering the later arguments on-list, I plan on submitting a more 
elaborate patchset integrating the feedback received so far, and along 
the following lines:


- uuid type, v4 generation and uuid_version() in core

- Provide a means to rename/supersede extensions keeping backwards 
compatibility (i.e. uuid-ossp -> uuid, keep old code working)


- Miscellaneous other functionality

- Drop "dead" code

  ...but I've tried to keep quiet so as not to disturb too much around 
release time.



I intend to continue working on this in late July, aiming for the 
following commitfest (once more "urgent" patches will have landed)



Thanks again.

    J.L.






Re: POC: Cleaning up orphaned files using undo logs

2019-07-13 Thread Amit Kapila
On Fri, Jul 12, 2019 at 7:08 PM Robert Haas  wrote:
>
> On Fri, Jul 12, 2019 at 5:40 AM Amit Kapila  wrote:
>
> > Apart from this, the duplicate key (ex. for size queues the size of
> > two requests can be same) handling might need some work.  Basically,
> > either special combine function needs to be written (not sure yet what
> > we should do there) or we always need to ensure that the key is unique
> > like (size + start_urec_ptr).  If the size is the same, then we can
> > decide based on start_urec_ptr.
>
> I think that this problem is somewhat independent of whether we use an
> rbtree or a binaryheap or some other data structure.
>

I think then I am missing something because what I am talking about is
below code in rbt_insert:
rbt_insert()
{
..
cmp = rbt->comparator(data, current, rbt->arg);
if (cmp == 0)
{
/*
* Found node with given key.  Apply combiner.
*/
rbt->combiner(current, data, rbt->arg);
*isNew = false;
return current;
}
..
}

If you see, here it doesn't add the duplicate key in the tree and that
is not the case with binary_heap as far as I can understand.

>  I would be
> inclined to use XID as a tiebreak for the size queue, so that it's
> more likely to match the ordering of the XID queue, but if that's
> inconvenient, then some other arbitrary value like start_urec_ptr
> should be fine.
>

I think it would be better to use start_urec_ptr because XID can be
non-unique in our case.  As I explained in one of the emails above [1]
that we register the requests for logged and unlogged relations
separately, so XID can be non-unique.
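
In code form, the comparator for the size queue could fold that tiebreak
in directly, so keys are always unique and rbt_insert never reaches the
combiner.  A rough sketch (UndoRequestNode and its fields are
hypothetical names):

typedef struct UndoRequestNode
{
    RBTNode     rbtnode;        /* rbtree node, must be first */
    uint64      size;           /* amount of undo to be applied */
    UndoRecPtr  start_urec_ptr; /* tiebreak when sizes are equal */
} UndoRequestNode;

static int
undo_size_comparator(const RBTNode *a, const RBTNode *b, void *arg)
{
    const UndoRequestNode *ra = (const UndoRequestNode *) a;
    const UndoRequestNode *rb = (const UndoRequestNode *) b;

    if (ra->size != rb->size)
        return (ra->size > rb->size) ? -1 : 1;  /* larger requests first */
    if (ra->start_urec_ptr != rb->start_urec_ptr)
        return (ra->start_urec_ptr < rb->start_urec_ptr) ? -1 : 1;
    return 0;                   /* genuinely the same request */
}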

> >
> > I think even if we currently go with a binary heap, it will be
> > possible to change it to rbtree later, but I am fine either way.
>
> Well, I don't see much point in revising all of this logic twice. We
> should pick the way we want it to work and make it work that way.
>

Yeah, I agree.  So, I am assuming here that as you have discussed this
idea with Andres offlist, he is on board with changing it as he has
originally suggested using binary_heap.  Andres, do let us know if you
think differently here.  It would be good if anyone else following the
thread can also weigh in.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1LEKyPZD5Dy4j1u2smUUyMzxgC2YLj8E%2BaJpsvG7sVJYA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: refactoring - share str2*int64 functions

2019-07-13 Thread Thomas Munro
On Mon, Jul 8, 2019 at 3:22 PM Thomas Munro  wrote:
> Here's some semi-automated feedback, noted while going through
> failures on cfbot.cputube.org.  You have a stray editor file
> src/backend/parser/parse_node.c.~1~.  Something is failing to compile
> while doing the temp-install in make check-world, which probably
> indicates that some test or contrib module is using the interface you
> changed?

Please disregard the the comment about the ".~1~" file, my mistake.
As for the check-world failure, it's here:

pg_stat_statements.c:1024:11: error: implicit declaration of function
'pg_strtouint64' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
rows = pg_strtouint64(completionTag + 5, NULL, 10);
   ^
Apparently it needs to include common/string.h.

-- 
Thomas Munro
https://enterprisedb.com




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-13 Thread Joe Conway
On 7/12/19 2:45 PM, Bruce Momjian wrote:
> On Fri, Jul 12, 2019 at 12:41:19PM -0600, Ryan Lambert wrote:
>> >> I vote for AES 256 rather than 128.
>> >
>> > Why?  This page seems to think 128 is sufficient:
>> >
>> >         https://crypto.stackexchange.com/questions/20/
>> what-are-the-practical-differences-between-256-bit-192-bit-and-128-bit-aes-enc
>> >
>> >         For practical purposes, 128-bit keys are sufficient to ensure
>> security.
>> >         The larger key sizes exist mostly to satisfy some US military
>> >         regulations which call for the existence of several distinct
>> "security
>> >         levels", regardless of whether breaking the lowest level is already
>> far
>> >         beyond existing technology.
>> 
>> After researching AES key sizes a bit more my vote is (surprisingly?) for
>> AES-128.  My reasoning is about security, I did not consider performance
>> impacts in my decision.
> 
> Thank you for this exhaustive research.

First of all, that is a mischaracterization of the issue. That paper
also states:

"we describe several key derivation attacks of practical complexity on
AES-256 when its number of rounds is reduced to approximately that of
AES-128. The best previously published  attacks  on  such  variants
were  far  from  practical,  requiring  4  related  keys  and  2^120
time  to break a 9-round version of AES-256 [9], and 64 related keys and
2^172time to break a 10-round version of AES-256 ([9], see also [2]). In
this paper we describe an attack on 9-round AES-256 which can findits
complete 256-bit key in 239time by using only the simplest type of
related keys (in which the chosenplaintexts are encrypted under two keys
whose XOR difference can be chosen in many different ways).Our best
attack on 10-round AES-256 requires only two keys and 245time, but it
uses a stronger type ofrelated subkey attack. These attacks can be
extended into a quasi-practical 270attack on 11-round AES,and into a
trivial 226attack on 8-round AES."

Notice 2 key things about this:
1. The attacks described are against a reduced number of rounds. AES256
   is 14 rounds, not 9 or 10.
2. They are "related key" attacks, which can be avoided by not using
   related keys, and we certainly should not be doing that.

Also, please read the links that Sehrope sent earlier if you have not
done so. In particular this one:

https://www.ecrypt.eu.org/csa/documents/PQC-whitepaper.pdf

which says:

"Post-quantum cryptography is an area of cryptography in which systems
are studied under the  security  assumption  that  the  attacker  has  a
 quantum  computer. This  attack  model  is interesting because Shor has
shown a quantum algorithm that breaks RSA, ECC, and finite field
discrete logarithms in polynomial time.  This means that in this model
all commonly used public-key systems are no longer secure.Symmetric
cryptography is also affected but significantly less.  For systems that
do not rely on mathematical structures the main effect is that an
algorithm due to Grover halves the security level, i.e., breaking
AES-128 takes 2^64 quantum operations while current attacks take  2^128
steps.   While  this  is  a  big  change,  it  can  be  managed  quite
easily  by  doubling the key sizes, e.g., by deploying AES-256.  The
operations needed in Grover’s algorithm are inherently  sequential
which  has  led  some  to  doubt  that  even  264quantum  operations
are feasible, but since the remedy of changing to larger key sizes is
very inexpensive it is generally recommended to do so."

In addition, that first paper was written in 2010, yet in 2016 NSA
published one of the other documents referenced by Sehrope:

https://apps.nsa.gov/iaarchive/customcf/openAttachment.cfm?FilePath=/iad/library/ia-guidance/ia-solutions-for-classified/algorithm-guidance/assets/public/upload/CNSA-Suite-and-Quantum-Computing-FAQ.pdf&WpKes=aF6woL7fQp3dJiyWTFKrYn3ZZShmLnzECSjJhf

Which states:

"If you have already implemented Suite B algorithms using the larger
(for TOP SECRET) key sizes, you should continue to use those algorithms
and key sizes through this upcoming transition period. In many products
changing to a larger key size can be done via a configuration
change.Implementations using only the algorithms previously approved for
SECRET and below in Suite B should not be used in NSS. In more precise
terms this means that NSS (National Security Systems) should no longer use
 •ECDH and ECDSA with NIST P-256
 •SHA-256
 •AES-128
 •RSA with 2048-bit keys
 •Diffie-Hellman with 2048-bit keys"


I stand by my position. At a minimum, we need a choice of AES128 and AES256.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [PATCH] Implement uuid_version()

2019-07-13 Thread Peter Eisentraut
On 2019-07-13 08:08, Fabien COELHO wrote:
> About doc: I'd consider "generation" instead of "generating" as a 
> secondary index term.

We do use the "-ing" form for other secondary index terms.  It's useful
because the concatenation of primary and secondary term should usually
make a phrase of some sort.  The alternative would be "generation of",
but that doesn't seem clearly better.

>> (There is also precedent for redirecting the extension function to the
>> internal one by changing the SQL-level function definition using CREATE
>> OR REPLACE FUNCTION ... LANGUAGE INTERNAL.  But that seems more
>> complicated and would require a new extension version.  It could maybe
>> be included if the extension version is changed for other reasons.)
> 
> What about avoiding a redirection with something like:
> 
> Datum (* const pg_random_uuid)(PG_FUNCTION_ARGS) = gen_random_uuid;

That seems very confusing.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: initdb recommendations

2019-07-13 Thread Peter Eisentraut
On 2019-07-11 21:34, Julien Rouhaud wrote:
>> Note that with this change, running initdb without arguments will now
>> error on those platforms: You need to supply either a password or select
>> a different default authentication method.
> Should we make this explicitly stated in the documentation?  As a
> reference, it's saying:
> 
> The default client authentication setup is such that users can connect
> over the Unix-domain socket to the same database user name as their
> operating system user names (on operating systems that support this,
> which are most modern Unix-like systems, but not Windows) and
> otherwise with a password. To assign a password to the initial
> database superuser, use one of initdb's -W, --pwprompt or -- pwfile
> options.

Do you have a suggestion for where to put this and exactly how to phrase
this?

I think the initdb reference page would be more appropriate than
runtime.sgml.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Bad canonicalization for dateranges with 'infinity' bounds

2019-07-13 Thread Thomas Munro
On Fri, May 3, 2019 at 12:49 AM Laurenz Albe  wrote:
> > I propose the attached patch which fixes the problem.

Hi Laurenz,

I agree that the patch makes the code match the documentation.  The
documented behaviour seems to make more sense than the code, since
unpatched master gives this nonsense result when it flips the
inclusive flag but doesn't adjust the value (because it can't):

postgres=# select '(-infinity,infinity]'::daterange @> 'infinity'::date;
 ?column? 
----------
 f
(1 row)

-if (!upper.infinite && upper.inclusive)
+if (!(upper.infinite || DATE_NOT_FINITE(upper.val)) && upper.inclusive)

Even though !(X || Y) is equivalent to !X && !Y, by my reading of
range_in(), lower.value can be uninitialised when lower.infinite is
true, and it's also a bit hard to read IMHO, so I'd probably write
that as !upper.infinite && !DATE_NOT_FINITE(upper.val) &&
upper.inclusive.  I don't think it can affect the result but it might
upset Valgrind or similar.
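
Spelled out in the context of daterange_canonical, the reading I mean
would be (a sketch based on the patch's upper-bound hunk):

    if (!upper.infinite && !DATE_NOT_FINITE(upper.val) && upper.inclusive)
    {
        upper.val = DirectFunctionCall2(date_pli, upper.val, Int32GetDatum(1));
        upper.inclusive = false;
    }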

-- 
Thomas Munro
https://enterprisedb.com




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-13 Thread Joe Conway
On 7/11/19 9:05 PM, Bruce Momjian wrote:
> On Thu, Jul 11, 2019 at 08:41:52PM -0400, Joe Conway wrote:
>> On 7/11/19 6:37 PM, Bruce Momjian wrote:
>> > Our first implementation will encrypt the entire cluster.  We can later
>> > consider encryption per table or tablespace.  It is unclear if
>> > encrypting different parts of the system with different keys is useful
>> > or feasible.  (This is separate from key rotation.)
>> 
>> I still object strongly to using a single key for the entire database. I
>> think we can use a single key for WAL, but we need some way to split the
>> heap so that multiple keys are used. If not by tablespace, then some
>> other method.
> 
> What do you base this on?

I have stated this more than once, and you and Stephen discussed it
earlier as well, but will find all the links back into the thread and
references and address in a separate email.

>> Regardless of the method to split the heap into different keys, I think
>> there should be an option for some tables to not be encrypted. If we
>> decide it must be all or nothing for the first implementation I guess I
>> could live with it but would be very disappointed.
> 
> What does it mean you "could live with it"?  Why do you consider having
> some tables unencrypted important?


I think it is pretty obvious, isn't it? You have talked about the
performance impact. Why would I want to encrypt, for example a lookup
table, if there is nothing in that table warranting encryption?

I think in many if not most applications the sensitive data is limited
to much less than all of the tables, and I'd rather not take the hit for
those tables.


>> The keys themselves should be in a file which is encrypted by a master
>> key. Obtaining the master key should be patterned after the GUC
>> ssl_passphrase_command.
>> 
>> > We will use CBC AES128 mode for tables/indexes, and CTR AES128 for WAL.
>> > 8k pages will use the LSN as a nonce, which will be encrypted to
>> > generate the initialization vector (IV).  We will not encrypt the first
>> > 16 bytes of each pages so the LSN can be used in this way.  The WAL will
>> > use the WAL file segment number as the nonce and the IV will be created
>> > in the same way.
>> 
>> I vote for AES 256 rather than 128.
> 
> Why?  This page seems to think 128 is sufficient:


Addressed in the other email


>> Thinking out loud (and I believe somewhere in this massive thread
>> someone else already said this), if we had a way to flag "key version"
>> at the page level it seems like we could potentially rekey page-by-page
>> while online, locking only one page at a time. We really only need to
>> support 2 key versions and could ping-pong between them as they change.
>> Or maybe this is a crazy idea.
> 
> Yes, we did talk about this.  It is certainly possible, but we would
> still need a tool to guarantee all pages are using the new version, so I
> am not sure what per-page buys us except making the later check faster. 
> I don't see this as a version-1 feature, frankly.

If we allow for, say, 2 versions of the key to exist at any given time,
and if we could store that key version information on each page, we
could change the key from old to new without locking the entire table at
once, just locking one page at a time. Or at least that was my thinking.
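
A hypothetical sketch of the rekey loop, just to show the shape of it
(the page-level key-version accessors are made up; everything else is
the usual buffer API):

    for (BlockNumber blkno = 0; blkno < nblocks; blkno++)
    {
        Buffer      buf = ReadBuffer(rel, blkno);
        Page        page;

        LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
        page = BufferGetPage(buf);
        if (PageGetKeyVersion(page) == old_version)     /* made up */
        {
            /* mark with the new version; the next write of this
             * buffer then encrypts with the new key */
            PageSetKeyVersion(page, new_version);       /* made up */
            MarkBufferDirty(buf);
        }
        UnlockReleaseBuffer(buf);
    }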

Joe
-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [PATCH] Implement uuid_version()

2019-07-13 Thread Fabien COELHO



Hello Peter,

>> About doc: I'd consider "generation" instead of "generating" as a
>> secondary index term.
>
> We do use the "-ing" form for other secondary index terms.  It's useful
> because the concatenation of primary and secondary term should usually
> make a phrase of some sort.  The alternative would be "generation of",
> but that doesn't seem clearly better.

Ok, fine. I looked but did not find other instances of "generating".

>> What about avoiding a redirection with something like:
>>
>> Datum (* const pg_random_uuid)(PG_FUNCTION_ARGS) = gen_random_uuid;
>
> That seems very confusing.

Dunno. Possibly. The user does not have to look at the implementation, and
such code would probably deserve a comment.

The point is to avoid one call so that it performs the same (otherwise
pg_random_uuid would be slightly slower), and to ensure that it behaves the
same, as it would be the very same function by construction.


I've switched the patch to ready anyway.

--
Fabien.




Re: Check-out mutable functions in check constraints

2019-07-13 Thread Tom Lane
Tomas Vondra  writes:
> On Fri, Jul 12, 2019 at 07:59:13PM -0400, Tom Lane wrote:
>> I'm pretty sure this change has been proposed before, and rejected before.
>> Has anybody excavated in the archives for prior discussions?

> Yes, I've done some quick searches like "volatile constraint" and so on.
> There are a couple of relevant discussions:
> 2004: 
> https://www.postgresql.org/message-id/flat/0C3A1AEC-6BE4-11D8-9224-000A95C88220%40myrealbox.com
> 2010: 
> https://www.postgresql.org/message-id/flat/12849.1277918175%40sss.pgh.pa.us#736c8ef9d7810c0bb85f495490fd40f5
> But I don't think the conclusions are particularly clear.
> In the first thread you seem to agree with requiring immutable functions
> for check constraints (and triggers for one-time checks). The second
> thread ended up discussing some new related stuff in SQL standard.

Well, I think that second thread is very relevant here, because
it correctly points out that we are *required by spec* to allow
check constraints of the form CHECK(datecol <= CURRENT_DATE) and
related tests.  See the stuff about "retrospectively deterministic"
predicates in SQL:2003 or later.

I suppose you could imagine writing some messy logic that allowed the
specific cases called out by the spec but not any other non-immutable
function calls.  But that just leaves us with an inconsistent
restriction.  If the spec is allowing this because it can be seen
to be safe, why should we not allow other cases that the user has
taken the trouble to prove to themselves are safe?  (If their proof is
wrong, well, it wouldn't be the first bug in anyone's SQL application.)

regards, tom lane




Re: [PATCH] Implement uuid_version()

2019-07-13 Thread Fabien COELHO


Hello Jose,

> Considering the later arguments on-list, I plan on submitting a more
> elaborate patchset integrating the feedback received so far, and along the
> following lines:
>
> - uuid type, v4 generation and uuid_version() in core
>
> - Provide a means to rename/supersede extensions keeping backwards
> compatibility (i.e. uuid-ossp -> uuid, keep old code working)
>
> - Miscellaneous other functionality
>
> - Drop "dead" code
>
>   ...but I've tried to keep quiet so as not to disturb too much around
> release time.
>
> I intend to continue working on this in late July, aiming for the
> following commitfest (once more "urgent" patches will have landed)


Ok.

I've changed the patch status for this CF to "moved do next CF", and to 
"Waiting on author" there.


The idea is to go on in the same thread when you are ready.

--
Fabien.

Re: refactoring - share str2*int64 functions

2019-07-13 Thread Fabien COELHO


Hello Thomas,

> pg_stat_statements.c:1024:11: error: implicit declaration of function
> 'pg_strtouint64' is invalid in C99
> [-Werror,-Wimplicit-function-declaration]
>    rows = pg_strtouint64(completionTag + 5, NULL, 10);
>   ^
> Apparently it needs to include common/string.h.

Yep, I gathered that as well, but did not act promptly on your help.
Thanks for it!

Here is the updated patch on which I checked "make check-world".

--
Fabien.

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 00cae04eea..8f40245ac4 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -79,6 +79,7 @@
 #include "utils/builtins.h"
 #include "utils/hashutils.h"
 #include "utils/memutils.h"
+#include "common/string.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 8eedb613a1..e8744f054c 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -21,6 +21,7 @@
 #include "catalog/heap.h"
 #include "catalog/pg_type.h"
 #include "commands/trigger.h"
+#include "common/string.h"
 #include "executor/executor.h"
 #include "executor/spi_priv.h"
 #include "miscadmin.h"
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 6c2626ee62..3735268e3a 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -34,6 +34,7 @@
 
 #include "fmgr.h"
 #include "miscadmin.h"
+#include "common/string.h"
 #include "nodes/extensible.h"
 #include "nodes/parsenodes.h"
 #include "nodes/plannodes.h"
diff --git a/src/backend/parser/parse_node.c b/src/backend/parser/parse_node.c
index 1baf7ef31f..53b89ece39 100644
--- a/src/backend/parser/parse_node.c
+++ b/src/backend/parser/parse_node.c
@@ -25,10 +25,10 @@
 #include "parser/parse_expr.h"
 #include "parser/parse_relation.h"
 #include "utils/builtins.h"
-#include "utils/int8.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
 #include "utils/varbit.h"
+#include "common/string.h"
 
 
 static void pcb_error_callback(void *arg);
@@ -496,7 +496,7 @@ make_const(ParseState *pstate, Value *value, int location)
 
 		case T_Float:
 			/* could be an oversize integer as well as a float ... */
-			if (scanint8(strVal(value), true, &val64))
+			if (pg_strtoint64(strVal(value), true, &val64))
 			{
 /*
  * It might actually fit in int32. Probably only INT_MIN can
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index d317fd7006..70681588a1 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -12,6 +12,7 @@
  */
 #include "postgres.h"
 
+#include "common/string.h"
 #include "catalog/pg_publication.h"
 
 #include "replication/logical.h"
@@ -20,7 +21,6 @@
 #include "replication/pgoutput.h"
 
 #include "utils/inval.h"
-#include "utils/int8.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
 #include "utils/varlena.h"
@@ -111,7 +111,7 @@ parse_output_parameters(List *options, uint32 *protocol_version,
 		 errmsg("conflicting or redundant options")));
 			protocol_version_given = true;
 
-			if (!scanint8(strVal(defel->arg), true, &parsed))
+			if (!pg_strtoint64(strVal(defel->arg), true, &parsed))
 ereport(ERROR,
 		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 		 errmsg("invalid proto_version")));
diff --git a/src/backend/utils/adt/cash.c b/src/backend/utils/adt/cash.c
index c92e9d5046..618660f372 100644
--- a/src/backend/utils/adt/cash.c
+++ b/src/backend/utils/adt/cash.c
@@ -26,7 +26,6 @@
 #include "libpq/pqformat.h"
 #include "utils/builtins.h"
 #include "utils/cash.h"
-#include "utils/int8.h"
 #include "utils/numeric.h"
 #include "utils/pg_locale.h"
 
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 206576d4bd..1f17987e85 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -92,7 +92,6 @@
 #include "utils/datetime.h"
 #include "utils/float.h"
 #include "utils/formatting.h"
-#include "utils/int8.h"
 #include "utils/memutils.h"
 #include "utils/numeric.h"
 #include "utils/pg_locale.h"
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 0ff9394a2f..98e7b52ce9 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -18,12 +18,12 @@
 #include 
 
 #include "common/int.h"
+#include "common/string.h"
 #include "funcapi.h"
 #include "libpq/pqformat.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/supportnodes.h"
 #include "optimizer/optimizer.h"
-#include "utils/int8.h"
 #include "utils/builtins.h"
 
 
@@ -47,89 +47,6 @@ typedef struct
  * Formatting and conversion routines.
  *-*/
 
-/*
- * scanint8 --- try to parse a string into an int8.
- *
- * If errorOK is false, ereport a useful error message if the string is ba

Re: [sqlsmith] Crash in mcv_get_match_bitmap

2019-07-13 Thread Tom Lane
Tomas Vondra  writes:
> On Thu, Jul 11, 2019 at 05:08:22PM +0200, Tomas Vondra wrote:
>> On Wed, Jul 10, 2019 at 06:48:16PM -0400, Tom Lane wrote:
>>> The right way to determine operator semantics is to look to see
>>> whether they are in a btree opclass.  That's what the rest of the
>>> planner does, and there is no good reason for the mcv code to
>>> do it some other way.

>> Hmmm, but that will mean we're unable to estimate operators that are not
>> part of a btree opclass. Which is a bit annoying, because that would also
>> affect equalities (and thus functional dependencies), in which case the
>> type may easily have just hash opclass or something.

After thinking about this more, I may have been analogizing to the wrong
code.  It's necessary to use opclass properties when we're reasoning about
operators in a way that *must* be correct, for instance to conclude that
a partition can be pruned from a query.  But this code is just doing
selectivity estimation, so the correctness standards are a lot lower.
In particular I see that the long-established range-query-detection
code in clauselist_selectivity is looking for operators that have
F_SCALARLTSEL etc. as restriction estimators (in fact I'm guessing you
lifted parts of the mcv code from that, cause it looks pretty similar).
So if we've gotten away with that so far over there, there's probably
no reason not to do likewise here.

I am a little troubled by the point I made about operators possibly
wanting to have a more-specific estimator than scalarltsel, but that
seems like an issue to solve some other time; and if we do change that
logic then clauselist_selectivity needs to change too.

> Here are WIP patches addressing two of the issues:

> 1) determining operator semantics by matching them to btree opclasses

Per above, I'm sort of inclined to drop this, unless you feel better
about doing it like this than the existing way.

> 2) deconstructing OpExpr into Var/Const nodes

deconstruct_opexpr is still failing to verify that the Var is a Var.
I'd try something like

    leftop = linitial(expr->args);
    while (IsA(leftop, RelabelType))
        leftop = (Node *) ((RelabelType *) leftop)->arg;
    // and similarly for rightop
    if (IsA(leftop, Var) && IsA(rightop, Const))
        // return appropriate results
    else if (IsA(leftop, Const) && IsA(rightop, Var))
        // return appropriate results
    else
        // fail

Also, I think deconstruct_opexpr is far too generic a name for what
this is actually doing.  It'd be okay as a static function name
perhaps, but not if you're going to expose it globally.

> a) I don't think unary-argument OpExpr are an issue, because this is
> checked when determining which clauses are compatible (and the code only
> allows the case with 2 arguments).

OK.

> b) Const with constisnull=true - I'm not yet sure what to do about this.
> The easiest fix would be to deem those clauses incompatible, but that
> seems a bit too harsh. The right thing is probably passing the NULL to
> the operator proc (but that means we can't use FunctionCall).

No, because most of the functions in question are strict and will just
crash on null inputs.  Perhaps you could just deem that cases involving
a null Const don't match what you're looking for.
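
In code form that's just an early bail-out in the clause-compatibility
check, right after the Var/Const deconstruction (sketch; the variable
name is assumed):

    if (cst->constisnull)
        return false;   /* clauses with a null Const don't match MCV items */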

> Now, looking at this code, I wonder if using DEFAULT_COLLATION_OID when
> calling the operator is the right thing. We're using type->typcollation
> when building the stats, so maybe we should do the same thing here.

Yeah, I was wondering that too.  But really you should be using the
column's collation not the type's default collation.  See commit
5e0928005.

regards, tom lane




Re: POC: converting Lists into arrays

2019-07-13 Thread Tom Lane
David Rowley  writes:
> Thanks for the speedy turnaround. I've looked at v8, as far as a diff
> between the two patches and I'm happy.
> I've marked as ready for committer.

So ... last chance for objections?

I see from the cfbot that v8 is already broken (new call of lnext
to be fixed).  Don't really want to keep chasing a moving target,
so unless I hear objections I'm going to adjust the additional
spot(s) and commit this pretty soon, like tomorrow or Monday.

regards, tom lane




Re: initdb recommendations

2019-07-13 Thread Julien Rouhaud
On Sat, Jul 13, 2019 at 2:44 PM Peter Eisentraut
 wrote:
>
> On 2019-07-11 21:34, Julien Rouhaud wrote:
> >> Note that with this change, running initdb without arguments will now
> >> error on those platforms: You need to supply either a password or select
> >> a different default authentication method.
> > Should we make this explicitly stated in the documentation?  As a
> > reference, it's saying:
> >
> > The default client authentication setup is such that users can connect
> > over the Unix-domain socket to the same database user name as their
> > operating system user names (on operating systems that support this,
> > which are most modern Unix-like systems, but not Windows) and
> > otherwise with a password. To assign a password to the initial
> > database superuser, use one of initdb's -W, --pwprompt or -- pwfile
> > options.
>
> Do you have a suggestion for where to put this and exactly how to phrase
> this?
>
> I think the initdb reference page would be more appropriate than
> runtime.sgml.

Yes, initdb.sgml seems more suitable.  I was thinking something very
similar to your note, maybe like (also attached if my MUA ruins it):

diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index c47b9139eb..764cf737c7 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -143,6 +143,15 @@ PostgreSQL documentation
 connections.

 
+   
+
+ Running initdb without arguments on platforms lacking
+ peer or Unix-domain socket connections will exit
+ with an error.  In such environments, you need to either provide a
+ password or choose a different authentication method.
+
+   
+

 Do not use trust unless you trust all local users on your
 system.


Re: Add CREATE DATABASE LOCALE option

2019-07-13 Thread Fabien COELHO



Hello Peter,

>> I think pg_dump/t/002_pg_dump.pl might be a good place.  Not the easiest
>> program in the world to work with, admittedly.
>
> Updated patch with test and expanded documentation.

Patch v2 applies cleanly, compiles, make check-world ok with tap enabled.
Doc gen ok.

The addition looks reasonable.

The second error message about conflicting options could be more explicit
than a terse "conflicting or redundant options"? The user may expect later
options to supersede earlier options, for instance.

About the pg_dump code, I'm wondering whether it is worth generating
LOCALE, as it breaks backward compatibility (e.g. dumping a new db to load
it into an older version).

If it is to be generated, I'd merge the two conditions instead of
nesting:

  if (strlen(collate) > 0 && strcmp(collate, ctype) == 0)
      // generate LOCALE
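
For instance, fleshed out with the usual dumpDatabase() output calls
(a sketch; buffer names as in pg_dump.c):

  if (strlen(collate) > 0 && strcmp(collate, ctype) == 0)
  {
      appendPQExpBufferStr(creaQry, " LOCALE = ");
      appendStringLiteralAH(creaQry, collate, fout);
  }
  else
  {
      if (strlen(collate) > 0)
      {
          appendPQExpBufferStr(creaQry, " LC_COLLATE = ");
          appendStringLiteralAH(creaQry, collate, fout);
      }
      if (strlen(ctype) > 0)
      {
          appendPQExpBufferStr(creaQry, " LC_CTYPE = ");
          appendStringLiteralAH(creaQry, ctype, fout);
      }
  }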

--
Fabien.




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-13 Thread Joe Conway
On 7/13/19 9:38 AM, Joe Conway wrote:
> On 7/11/19 9:05 PM, Bruce Momjian wrote:
>> On Thu, Jul 11, 2019 at 08:41:52PM -0400, Joe Conway wrote:
>>> On 7/11/19 6:37 PM, Bruce Momjian wrote:
>>> > Our first implementation will encrypt the entire cluster.  We can later
>>> > consider encryption per table or tablespace.  It is unclear if
>>> > encrypting different parts of the system with different keys is useful
>>> > or feasible.  (This is separate from key rotation.)
>>> 
>>> I still object strongly to using a single key for the entire database. I
>>> think we can use a single key for WAL, but we need some way to split the
>>> heap so that multiple keys are used. If not by tablespace, then some
>>> other method.
>> 
>> What do you base this on?

Ok, so here we go. See links below. I skimmed through the entire thread
and FWIW it was exhausting.

To some extent this degenerated into a general search for relevant
information:

---
[1] and [2] show that at least some file system encryption uses a
different key per file.
---
[2] also shows that file system encryption uses a KDF (key derivation
function) which we may want to use ourselves. The analogy would be
per-table derived key instead of per file derived key. Note that KDF is
a safe way to derive a key and it is not the same as a "related key"
which was mentioned on another email as an attack vector.
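As a concrete illustration, with OpenSSL 1.1.1's HKDF interface the
per-table derivation could look roughly like this (a sketch only;
using the table OID as the HKDF "info" input is my assumption, not
something any of the cited designs mandate):

#include <stdint.h>
#include <stddef.h>
#include <openssl/evp.h>
#include <openssl/kdf.h>

/* Derive one per-table key from the master key.  Returns 1 on success. */
static int
derive_table_key(unsigned char *master_key, size_t master_len,
                 uint32_t table_oid,
                 unsigned char *out, size_t out_len)
{
    EVP_PKEY_CTX *pctx = EVP_PKEY_CTX_new_id(EVP_PKEY_HKDF, NULL);
    int         ok;

    ok = pctx != NULL &&
        EVP_PKEY_derive_init(pctx) > 0 &&
        EVP_PKEY_CTX_set_hkdf_md(pctx, EVP_sha256()) > 0 &&
        EVP_PKEY_CTX_set1_hkdf_key(pctx, master_key, (int) master_len) > 0 &&
        EVP_PKEY_CTX_add1_hkdf_info(pctx, (unsigned char *) &table_oid,
                                    sizeof(table_oid)) > 0 &&
        EVP_PKEY_derive(pctx, out, &out_len) > 0;

    EVP_PKEY_CTX_free(pctx);
    return ok;
}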
---
[2] also says it provides additional support for AES 256. It also mentions
CBC versus XTS -- I came across this elsewhere and it bears discussion:

"Currently, the following pairs of encryption modes are supported:

AES-256-XTS for contents and AES-256-CTS-CBC for filenames
AES-128-CBC for contents and AES-128-CTS-CBC for filenames
Adiantum for both contents and filenames

If unsure, you should use the (AES-256-XTS, AES-256-CTS-CBC) pair.

AES-128-CBC was added only for low-powered embedded devices with crypto
accelerators such as CAAM or CESA that do not support XTS."
---
[2] also states this, which again makes me think in terms of a table being
the moral equivalent of a file:

"Unlike dm-crypt, fscrypt operates at the filesystem level rather than
at the block device level. This allows it to encrypt different files
with different keys and to have unencrypted files on the same
filesystem. This is useful for multi-user systems where each user’s
data-at-rest needs to be cryptographically isolated from the others.
However, except for filenames, fscrypt does not encrypt filesystem
metadata."
---
[3] suggests 68 GB per key and unique IV in GCM mode.
---
[4] specifies 68 GB per key and unique IV in CTR mode -- this applies
directly to our proposal to use CTR for WAL.
---
[5] has this to say which seems independent of mode:

"When encrypting data with a symmetric block cipher, which uses blocks
of n bits, some security concerns begin to appear when the amount of
data encrypted with a single key comes close to 2n/2 blocks, i.e. n*2n/2
bits. With AES, n = 128 (AES-128, AES-192 and AES-256 all use 128-bit
blocks). This means a limit of more than 250 millions of terabytes,
which is sufficiently large not to be a problem. That's precisely why
AES was defined with 128-bit blocks, instead of the more common (at that
time) 64-bit blocks: so that data size is practically unlimited."

But goes on to say:
"I wouldn't use n*2^(n/2) bits in any sort of recommendation. Once you
reach that number of bits the probability of a collision will grow
quickly and you will be way over 50% probability of a collision by the
time you reach 2*n*2^(n/2) bits. In order to keep the probability of a
collision negligible I recommend encrypting no more than n*2^(n/4) bits
with the same key. In the case of AES that works out to 64GB"

It is hard to say if that recommendation is per key or per key+IV.
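(For AES, n = 128, so that last recommendation works out to n*2^(n/4)
bits = 128 * 2^32 bits = 2^39 bits = 2^36 bytes = 64 GiB, i.e. about
68 GB in decimal units -- consistent with the figures in [3] and [4]
above.)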
---
[6] shows that Azure SQL Database uses AES 256 for TDE. It also seems to
imply a single key is used although at one point it says "transparent
data encryption master key, also known as the transparent data
encryption protector". The term "master key" indicates that they likely
use derived keys under the covers.
---
[7] is generally useful read about how many of the things we have been
discussing are done in SQL Server
---
[8] was referenced by Sehrope. In addition to support for AES 256 for
long term use, table 5.1 is interesting. It lists CBC mode as "legacy"
but not "future".
---
[9] IETF RFC for KDF
---
[10] IETF RFC for Key wrapping -- this is probably how we should wrap
the master key with the Key Encryption Key (KEK) -- i.e. the outer key
provided by the user or command on postmaster start
---

Based on all of that I cannot find a requirement that we use more than
one key per database.

But I did find that files in an encrypted file system are encrypted with
derived keys from a master key, and I view this as analogous to what we
are doing.

As an aside to the specific question, I also found more evidence that
AES 256 is appropriate.

Joe


[1]
https://www.po

Re: [PATCH] Improve performance of NOTIFY over many databases (issue blocking on AccessExclusiveLock on object 0 of class 1262 of database 0)

2019-07-13 Thread Tom Lane
Martijn van Oosterhout  writes:
> Please find attached updated versions of the patches, I've now tested
> them. Also attached is a reproduction script to verify that they
> actually work.

I looked through these (a bit cursorily).

I'm generally on board with the idea of 0001, but not with the patch
details.  As coded, asyncQueueAdvanceTail is supposing that it can
examine the shared QUEUE_HEAD and QUEUE_TAIL pointers without any
lock whatsoever.  That's probably unsafe, and if it is safe for some
reason, you haven't made the argument why.  Moreover, it seems
unnecessary to make any such assumption.  Why not put back the
advanceTail tests you removed, but adjust them so that advanceTail
isn't set true unless QUEUE_HEAD and QUEUE_TAIL point to different
pages?  (Note that in the existing coding, those tests are made
while holding an appropriate lock, so it's safe to look at those
pointers there.)

It might be a good idea to make a macro encapsulating this new,
more complicated rule for setting advanceTail, instead of relying
on keeping the various call sites in sync.
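
Something along these lines, say (the macro name is invented; the
QUEUE_* macros are the existing ones in async.c, and this must only be
evaluated while holding AsyncQueueLock):

/* worth waking anyone to advance the tail only if a page can be freed */
#define QUEUE_TAIL_ADVANCE_USEFUL(min) \
    (QUEUE_POS_EQUAL((min), QUEUE_TAIL) && \
     QUEUE_POS_PAGE(QUEUE_HEAD) != QUEUE_POS_PAGE(QUEUE_TAIL))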

More attention to comments is also needed.  For instance, you've
made a lie out of the documentation of the tail pointer:

QueuePosition tail; /* the global tail is equivalent to the pos of
 * the "slowest" backend */

It needs to say something like "is <= the pos of the slowest backend",
instead.  I think the explanation of why this algorithm is good could
use more effort, too.

Comments for 0002 are about the same: for no explained reason, and
certainly no savings, you've put the notify_all test in an unsafe
place rather than a safe one (viz, two lines down, *after* taking
the relevant lock).  And 0002 needs more commentary about why
its optimization is safe and useful, too.  In particular it's
not obvious why QUEUE_HEAD being on a different page from QUEUE_TAIL
has anything to do with whether we should wake up other backends.

I'm not very persuaded by 0003, mainly because it seems likely to
me that 0001 and 0002 will greatly reduce the possibility that
the early-exit can happen.  So it seems like it's adding cycles
(in a spot where we hold exclusive lock) without a good chance of
saving any cycles.

Taking a step back in hopes of seeing the bigger picture ...
as you already noted, these changes don't really fix the "thundering
herd of wakeups" problem, they just arrange for it to happen
once per SLRU page rather than once per message.  I wonder if we
could improve matters by stealing an idea from the sinval code:
when we're trying to cause advance of the global QUEUE_TAIL, waken
only the slowest backend, and have it waken the next-slowest after
it's done.  In sinval there are some additional provisions to prevent
a nonresponsive backend from delaying matters for other backends,
but I think maybe we don't need that here.  async.c doesn't have
anything equivalent to sinval reset, so there's no chance of
overruling a slow backend's failure to advance its pos pointer,
so there's not much reason not to just wait till it does do so.
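
In pseudocode, the shape of it would be roughly this (every helper name
here is invented, just to illustrate the idea):

    /* Run by a backend right after it has advanced its own pos. */
    if (QUEUE_POS_EQUAL(oldpos, QUEUE_TAIL))
    {
        int     next;

        asyncQueueAdvanceTail();        /* tail can move up now */
        next = find_slowest_backend();  /* invented: min pos, if any */
        if (next != InvalidBackendId)
            SignalBackend(next);        /* pass the baton onward */
    }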

A related idea is to awaken only one backend at a time when we
send a new message (i.e., advance QUEUE_HEAD) but I think that
would likely be bad.  The hazard with the chained-wakeups method
is that a slow backend blocks wakeup of everything else.  We don't
care about that hugely for QUEUE_TAIL advance, because we're just
hoping to free some SLRU space.  But for QUEUE_HEAD advance it'd
mean failing to deliver notifies in a timely way, which we do care
about.  (Also, if I remember correctly, the processing on that side
only requires shared lock so it's less of a problem if many backends
do it at once.)

regards, tom lane




Re: [sqlsmith] Crash in mcv_get_match_bitmap

2019-07-13 Thread Tomas Vondra

On Sat, Jul 13, 2019 at 11:39:55AM -0400, Tom Lane wrote:

Tomas Vondra  writes:

On Thu, Jul 11, 2019 at 05:08:22PM +0200, Tomas Vondra wrote:

On Wed, Jul 10, 2019 at 06:48:16PM -0400, Tom Lane wrote:

The right way to determine operator semantics is to look to see
whether they are in a btree opclass.  That's what the rest of the
planner does, and there is no good reason for the mcv code to
do it some other way.



Hmmm, but that will mean we're unable to estimate operators that are not
part of a btree opclass. Which is a bit annoying, because that would also
affect equalities (and thus functional dependencies), in which case the
type may easily have just a hash opclass or something.


After thinking about this more, I may have been analogizing to the wrong
code.  It's necessary to use opclass properties when we're reasoning about
operators in a way that *must* be correct, for instance to conclude that
a partition can be pruned from a query.  But this code is just doing
selectivity estimation, so the correctness standards are a lot lower.
In particular I see that the long-established range-query-detection
code in clauselist_selectivity is looking for operators that have
F_SCALARLTSEL etc. as restriction estimators (in fact I'm guessing you
lifted parts of the mcv code from that, cause it looks pretty similar).
So if we've gotten away with that so far over there, there's probably
no reason not to do likewise here.

I am a little troubled by the point I made about operators possibly
wanting to have a more-specific estimator than scalarltsel, but that
seems like an issue to solve some other time; and if we do change that
logic then clauselist_selectivity needs to change too.


Here are WIP patches addressing two of the issues:



1) determining operator semantics by matching them to btree opclasses


Per above, I'm sort of inclined to drop this, unless you feel better
about doing it like this than the existing way.



OK. TBH I don't have a very strong opinion on this - I always disliked
how we rely on the estimator OIDs in this code, and relying on btree
opclasses seems somewhat more principled. But I'm not sure I understand
all the implications of such change (and I have some concerns about it
too, per my last message), so I'd revisit that in PG13.
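
For the record, what I have in mind for that is roughly the following
(sketch; the function name is made up):

    /* Classify an operator by btree strategy, not estimator OID. */
    static int
    mcv_operator_strategy(Oid opno)
    {
        List       *interps = get_op_btree_interpretation(opno);
        int         strategy = InvalidStrategy;

        if (interps != NIL)
        {
            OpBtreeInterpretation *interp = linitial(interps);

            strategy = interp->strategy;   /* BTLessStrategyNumber etc. */
        }

        list_free_deep(interps);
        return strategy;
    }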


2) deconstructing OpExpr into Var/Const nodes


deconstruct_opexpr is still failing to verify that the Var is a Var.
I'd try something like

    leftop = linitial(expr->args);
    while (IsA(leftop, RelabelType))
        leftop = (Node *) ((RelabelType *) leftop)->arg;
    /* ... and similarly for rightop ... */
    if (IsA(leftop, Var) && IsA(rightop, Const))
        /* return appropriate results */
    else if (IsA(leftop, Const) && IsA(rightop, Var))
        /* return appropriate results */
    else
        /* fail */



Ah, right. The RelabelType might be on top of something that's not Var.


Also, I think deconstruct_opexpr is far too generic a name for what
this is actually doing.  It'd be okay as a static function name
perhaps, but not if you're going to expose it globally.



I agree. I can't quite make it static, because it's used from multiple
places, but I'll move it to extended_stats_internal.h (and I'll see if I
can think of a better name too).


a) I don't think unary-argument OpExprs are an issue, because this is
checked when determining which clauses are compatible (and the code only
allows the case with 2 arguments).


OK.


b) Const with constisnull=true - I'm not yet sure what to do about this.
The easiest fix would be to deem those clauses incompatible, but that
seems a bit too harsh. The right thing is probably passing the NULL to
the operator proc (but that means we can't use FunctionCall).


No, because most of the functions in question are strict and will just
crash on null inputs.  Perhaps you could just deem that cases involving
a null Const don't match what you're looking for.



Makes sense, I'll do that.
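
Putting both of those fixes together, the helper might end up looking
something like this (a sketch -- the name examine_opclause_args is made
up, see above about renaming):

    static bool
    examine_opclause_args(OpExpr *expr, Var **varp, Const **cstp,
                          bool *varonleftp)
    {
        Node       *leftop,
                   *rightop;

        if (list_length(expr->args) != 2)
            return false;

        leftop = linitial(expr->args);
        rightop = lsecond(expr->args);

        /* strip RelabelType before checking the node types */
        while (IsA(leftop, RelabelType))
            leftop = (Node *) ((RelabelType *) leftop)->arg;
        while (IsA(rightop, RelabelType))
            rightop = (Node *) ((RelabelType *) rightop)->arg;

        if (IsA(leftop, Var) && IsA(rightop, Const))
        {
            *varp = (Var *) leftop;
            *cstp = (Const *) rightop;
            *varonleftp = true;
        }
        else if (IsA(leftop, Const) && IsA(rightop, Var))
        {
            *varp = (Var *) rightop;
            *cstp = (Const *) leftop;
            *varonleftp = false;
        }
        else
            return false;

        /* deem clauses with a null Const incompatible, per above */
        if ((*cstp)->constisnull)
            return false;

        return true;
    }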


Now, looking at this code, I wonder if using DEFAULT_COLLATION_OID when
calling the operator is the right thing. We're using type->typcollation
when building the stats, so maybe we should do the same thing here.


Yeah, I was wondering that too.  But really you should be using the
column's collation not the type's default collation.  See commit
5e0928005.



OK, thanks.
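
So the call would presumably become something like this (sketch; the
local variable names are assumed):

    /* use the column's collation, per commit 5e0928005 */
    match = DatumGetBool(FunctionCall2Coll(&opproc,
                                           var->varcollid,
                                           values[i],
                                           cst->constvalue));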


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-13 Thread Tomas Vondra

On Sat, Jul 13, 2019 at 02:41:34PM -0400, Joe Conway wrote:

On 7/13/19 9:38 AM, Joe Conway wrote:

On 7/11/19 9:05 PM, Bruce Momjian wrote:

On Thu, Jul 11, 2019 at 08:41:52PM -0400, Joe Conway wrote:

On 7/11/19 6:37 PM, Bruce Momjian wrote:
> Our first implementation will encrypt the entire cluster.  We can later
> consider encryption per table or tablespace.  It is unclear if
> encrypting different parts of the system with different keys is useful
> or feasible.  (This is separate from key rotation.)

I still object strongly to using a single key for the entire database. I
think we can use a single key for WAL, but we need some way to split the
heap so that multiple keys are used. If not by tablespace, then some
other method.


What do you base this on?


Ok, so here we go. See links below. I skimmed through the entire thread
and FWIW it was exhausting.

To some extent this degenerated into a general search for relevant
information:

---
[1] and [2] show that at least some file system encryption uses a
different key per file.
---
[2] also shows that file system encryption uses a KDF (key derivation
function) which we may want to use ourselves. The analogy would be
per-table derived key instead of per file derived key. Note that KDF is
a safe way to derive a key and it is not the same as a "related key"
which was mentioned on another email as an attack vector.
---
[2] also shows additional support for AES 256. It also mentions
CBC versus XTS -- I came across this elsewhere and it bears discussion:

"Currently, the following pairs of encryption modes are supported:

   AES-256-XTS for contents and AES-256-CTS-CBC for filenames
   AES-128-CBC for contents and AES-128-CTS-CBC for filenames
   Adiantum for both contents and filenames

If unsure, you should use the (AES-256-XTS, AES-256-CTS-CBC) pair.

AES-128-CBC was added only for low-powered embedded devices with crypto
accelerators such as CAAM or CESA that do not support XTS."
---
[2] also states this, which again makes me think in terms of a table
being the moral equivalent of a file:

"Unlike dm-crypt, fscrypt operates at the filesystem level rather than
at the block device level. This allows it to encrypt different files
with different keys and to have unencrypted files on the same
filesystem. This is useful for multi-user systems where each user’s
data-at-rest needs to be cryptographically isolated from the others.
However, except for filenames, fscrypt does not encrypt filesystem
metadata."
---
[3] suggests 68 GB per key and unique IV in GCM mode.
---
[4] specifies 68 GB per key and unique IV in CTR mode -- this applies
directly to our proposal to use CTR for WAL.
---
[5] has this to say which seems independent of mode:

"When encrypting data with a symmetric block cipher, which uses blocks
of n bits, some security concerns begin to appear when the amount of
data encrypted with a single key comes close to 2n/2 blocks, i.e. n*2n/2
bits. With AES, n = 128 (AES-128, AES-192 and AES-256 all use 128-bit
blocks). This means a limit of more than 250 millions of terabytes,
which is sufficiently large not to be a problem. That's precisely why
AES was defined with 128-bit blocks, instead of the more common (at that
time) 64-bit blocks: so that data size is practically unlimited."



FWIW I was a bit confused at first, because the copy paste mangled the
formulas a bit - it should have been 2^(n/2) and n*2^(n/2).


But it goes on to say:
"I wouldn't use n*2^(n/2) bits in any sort of recommendation. Once you
reach that number of bits the probability of a collision will grow
quickly and you will be way over 50% probability of a collision by the
time you reach 2*n*2^(n/2) bits. In order to keep the probability of a
collision negligible I recommend encrypting no more than n*2^(n/4) bits
with the same key. In the case of AES that works out to 64GB"

It is hard to say if that recommendation is per key or per key+IV.


Hmm, yeah. The question is what collisions they have in mind? Presumably
it's AES(block1,key) = AES(block2,key) in which case it'd be with fixed
IV, so per key+IV.
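
FWIW the 64GB figure itself checks out under the n*2^(n/4) rule:

    n = 128:  n * 2^(n/4) = 2^7 * 2^32 = 2^39 bits = 2^36 bytes = 64 GiB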


---
[6] shows that Azure SQL Database uses AES 256 for TDE. It also seems to
imply a single key is used although at one point it says "transparent
data encryption master key, also known as the transparent data
encryption protector". The term "master key" indicates that they likely
use derived keys under the covers.
---
[7] is a generally useful read about how many of the things we have been
discussing are done in SQL Server
---
[8] was referenced by Sehrope. In addition to support for AES 256 for
long term use, table 5.1 is interesting. It lists CBC mode as "legacy"
but not "future".
---
[9] IETF RFC for KDF
---
[10] IETF RFC for Key wrapping -- this is probably how we should wrap
the master key with the Key Encryption Key (KEK) -- i.e. the outer key
provided by the user or command on postmaster start
---

Based on all of that I cannot find a requirement that we use more than
one key per

Re: Patch to document base64 encoding

2019-07-13 Thread Karl O. Pinc
Hi Fabien,

Attached is doc_base64_v10.patch

On Fri, 12 Jul 2019 15:58:21 +0200 (CEST)
Fabien COELHO  wrote:

> >> I suggested "Function <>decode ...", which is the kind of thing
> >> we do in academic writing to improve precision, because I thought
> >> it could be better:-)  
> >
> > "Function <>decode ..." just does not work in English.  
> 
> It really works in research papers: "Theorem X can be proven by
> applying Proposition Y. See Figure 2 for details. Algorithm Z
> describes whatever, which is listed in Table W..."

I've not thought about it before but I suppose the difference
is between declarative and descriptive, the latter being
more inviting and better allowing for flow between sentences.
Otherwise you're writing in bullet points.  So it is a
question of balance between specification and narration.
In regular prose you're always going to see the "the"
unless the sentence starts with the name.  The trouble
is that we can't start sentences with function names
because of capitalization confusion.

> > I hope that 3 patches will make review easier.  
> 
> Not really. I'm reviewing the 3 patches put together rather than each
> one individually, which would require more work.

I figured with e.g. a separate patch for moving and alphabetizing
that it'd be easier to check that nothing got lost.  Anyhow,
just one patch this time.

> convert: I'd merge the 2 first sentences to state that it converts
> from X to Y. The doc does not say explicitly what happens if a
> character cannot be converted. After testing, an error is raised. The
> example comment could add ", if possible".

Done.  Good idea.  I reworked the whole paragraph to shorten and
clarify since I was altering it anyway.  This does introduce
some inconsistency with wording that appears elsewhere but it seems
worth it because the listentry box was getting overfull.

> to_hex: add "." at the end of the sentence?

I left as-is, without a ".".  The only consistent rule about
periods in the listentrys seems to be that if there's more than
one sentence then there's periods -- I think.  In any case a
lot of them don't have periods and probably don't need
periods.  I don't know what to do and since the original did
not have a period it seems better to leave well enough alone.

> Minor comment: you usually put two spaces between a "." and the first
> word of the next sentence, but not always.

I now always put 2 spaces after the end of a sentence.  But
I've only done this where I've changed text, not when
moving pre-existing text around.  Again, there seems
to be no consistency in the original.  (I believe docbook
redoes all inter-sentence spacing anyway.)

Thanks for the help.

I'll be sure to sign up to review a patch (or patches) when life
permits.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a25c122ac8..131a63b36c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1795,48 +1795,6 @@
   

 
- convert
-
-convert(string bytea,
-src_encoding name,
-dest_encoding name)
-   
-   bytea
-   
-Convert string to dest_encoding.  The
-original encoding is specified by
-src_encoding. The
-string must be valid in this encoding.
-Conversions can be defined by CREATE CONVERSION.
-Also there are some predefined conversions. See  for available conversions.
-   
-   convert('text_in_utf8', 'UTF8', 'LATIN1')
-   text_in_utf8 represented in Latin-1
-   encoding (ISO 8859-1)
-  
-
-  
-   
-
- convert_from
-
-convert_from(string bytea,
-src_encoding name)
-   
-   text
-   
-Convert string to the database encoding.  The original encoding
-is specified by src_encoding. The
-string must be valid in this encoding.
-   
-   convert_from('text_in_utf8', 'UTF8')
-   text_in_utf8 represented in the current database encoding
-  
-
-  
-   
-
  convert_to
 
 convert_to(string text,
@@ -1844,7 +1802,8 @@

bytea

-Convert string to dest_encoding.
+Convert string to dest_encoding.  See  for available conversions.

convert_to('some text', 'UTF8')
some text represented in the UTF8 encoding
@@ -1855,39 +1814,24 @@
 
  decode
 
+
+  base64 encoding
+
 decode(string text,
 format text)

bytea

 Decode binary data from textual representation in string.
-Options for format are same as in encode.
+Options
+for format are same as
+in encode.

decode('MTIzAAE=', 'base64')
\x3132330001
   
 
   
-   
-
- encode
-
-

Re: SHOW CREATE

2019-07-13 Thread Michael Glaesemann



> On 2019-07-05, at 12:14, Corey Huinker  wrote:
> 
> In doing that work, it became clear that the command was serving two masters:
> 1. A desire to see the underlying nuts and bolts of a given database object.
> 2. A desire to essentially make the schema portion of pg_dump a server side 
> command.
> 
> To that end, I see splitting this into two commands, SHOW CREATE and SHOW 
> DUMP.

I like the idea of having these features available via SQL as opposed to 
separate tools. Is it necessary to have specific commands for them? It seems 
they would potentially more useful as functions, where they'd be available for 
all of the programmatic features of the rest of SQL.

Michael Glaesemann
grzm seespotcode net







Re: Avoiding hash join batch explosions with extreme skew and weird stats

2019-07-13 Thread Tomas Vondra

On Wed, Jul 03, 2019 at 02:22:09PM -0700, Melanie Plageman wrote:

On Tue, Jun 18, 2019 at 3:24 PM Melanie Plageman 
wrote:



These questions will probably make a lot more sense with corresponding
code, so I will follow up with the second version of the state machine
patch once I finish it.



I have changed the state machine and resolved the questions I had
raised in the previous email. This seems to work for the parallel and
non-parallel cases. I have not yet rewritten the unmatched outer tuple
status as a bitmap in a spill file (for ease of debugging).

Before doing that, I wanted to ask what a desirable fallback condition
would be. In this patch, fallback to hashloop join happens only when,
while loading a batch after batch 0 into the hashtable, inserting
another tuple from the batch file would exceed work_mem. This means
you can't increase nbatches, which, I would think, is undesirable.



Yes, I think that's undesirable.


I thought a bit about when fallback should happen. So, let's say that
we would like to fall back to hashloop join when we have increased
nbatches X times. At that point, since we do not want to fall back to
hashloop join for all batches, we have to make a decision. After
increasing nbatches the Xth time, do we then fall back for all batches
for which inserting inner tuples exceeds work_mem? Do we use this
strategy but with work_mem + some fudge factor?

Or, do we instead try to determine if data skew led us to increase
nbatches both times and then determine which batch, given new
nbatches, contains that data, set fallback to true only for that
batch, and let all other batches use the existing logic (with no
fallback option) unless they contain a value which leads to increasing
nbatches X number of times?



I think we should try to detect the skew and use this hashloop logic
only for the one batch. That's based on the assumption that the hashloop
is less efficient than the regular hashjoin.

We may need to apply it even for some non-skewed (but misestimated)
cases, though. At some point we'd need more than work_mem for BufFiles,
at which point we ought to use this hashloop.
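
In pseudocode, the per-batch decision might look roughly like this (all
the field names here are invented, just to illustrate the idea):

    /* After the Xth nbatch increase, flag only oversized batches. */
    if (hashtable->nbatch_increases >= FALLBACK_THRESHOLD)
    {
        int     i;

        for (i = curbatch; i < hashtable->nbatch; i++)
        {
            if (hashtable->batch_inner_bytes[i] > work_mem * 1024L)
                hashtable->hashloop_fallback[i] = true;
        }
    }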


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: SHOW CREATE

2019-07-13 Thread David Fetter
On Sat, Jul 13, 2019 at 06:32:41PM -0500, Michael Glaesemann wrote:
> 
> 
> > On 2019-07-05, at 12:14, Corey Huinker  wrote:
> > 
> > In doing that work, it became clear that the command was serving two 
> > masters:
> > 1. A desire to see the underlying nuts and bolts of a given database object.
> > 2. A desire to essentially make the schema portion of pg_dump a server side 
> > command.
> > 
> > To that end, I see splitting this into two commands, SHOW CREATE
> > and SHOW DUMP.
> 
> I like the idea of having these features available via SQL as
> opposed to separate tools. Is it necessary to have specific commands
> for them? It seems they would potentially be more useful as functions,
> where they'd be available for all of the programmatic features of
> the rest of SQL.

Having commands for them would help meet people's expectations coming
from other RDBMSs.

On the other hand, making functions could just be done in SQL, which
might hurry the process along.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Conflict handling for COPY FROM

2019-07-13 Thread Thomas Munro
On Fri, Jul 12, 2019 at 1:42 AM Surafel Temesgen  wrote:
> Here is the patch that contains all the comments given, except adding a way
> to specify ignoring all errors, because specifying a very high number can do
> the same job, and trying to store such badly structured data may be a bad
> idea

Hi Surafel,

FYI GCC warns:

copy.c: In function ‘CopyFrom’:
copy.c:3383:8: error: ‘dest’ may be used uninitialized in this
function [-Werror=maybe-uninitialized]
(void) dest->receiveSlot(myslot, dest);
^
copy.c:2702:16: note: ‘dest’ was declared here
  DestReceiver *dest;
^
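
Assuming dest really is always assigned before use on the live paths,
the minimal fix is presumably just initializing it:

    DestReceiver *dest = NULL;      /* keep compiler quiet */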

-- 
Thomas Munro
https://enterprisedb.com




Re: pgbench - implement strict TPC-B benchmark

2019-07-13 Thread Thomas Munro
On Tue, Apr 9, 2019 at 3:58 AM Fabien COELHO  wrote:
> The attached patch does $SUBJECT, as a showcase for recently added
> features, including advanced expressions (CASE...), \if, \gset, ending SQL
> commands at ";"...

Hi Fabien,

+   the account branch has a 15% probability to be in the same branch
as the teller (unless

I would say "... has a 15% probability of being in the same ...".  The
same wording appears further down in the comment.

I see that the parameters you propose match the TPCB 2.0
description[1], and the account balance was indeed supposed to be
returned to the driver.  I wonder if "strict" is the right name here
though.  "tpcb-like-2" at least leaves room for someone to propose yet
another variant, and still includes the "-like" disclaimer, which I
interpret as a way of making it clear that this benchmark and results
produced by it have no official TPC audited status.

> There is also a small fix to the doc which describes the tpcb-like
> implementation but gets one variable name wrong: balance -> delta.

Agreed.  I committed that part.  Thanks!

[1] http://www.tpc.org/tpcb/spec/tpcb_current.pdf

-- 
Thomas Munro
https://enterprisedb.com




Re: Bad canonicalization for dateranges with 'infinity' bounds

2019-07-13 Thread Thomas Munro
On Sun, Jul 14, 2019 at 12:44 AM Thomas Munro  wrote:
> Even though !(X || Y) is equivalent to !X && !Y, by my reading of
> range_in(), lower.value can be uninitialised when lower.infinite is
> true, and it's also a bit hard to read IMHO, so I'd probably write
> that as !upper.infinite && !DATE_NOT_FINITE(upper.val) &&
> upper.inclusive.  I don't think it can affect the result but it might
> upset Valgrind or similar.

I take back the bit about reading an uninitialised value (X || Y
doesn't access Y if X is true... duh), but I still think the other way
of putting it is a bit easier to read.  YMMV.

Generally, +1 for this patch.  I'll wait a couple of days for more
feedback to appear.

-- 
Thomas Munro
https://enterprisedb.com




Re: Built-in connection pooler

2019-07-13 Thread Thomas Munro
On Tue, Jul 9, 2019 at 8:30 AM Konstantin Knizhnik
 wrote:
> Rebased version of the patch is attached.

Thanks for including nice documentation in the patch, which gives a
good overview of what's going on.  I haven't read any code yet, but I
took it for a quick drive to understand the user experience.  These
are just some first impressions.

I started my server with -c connection_proxies=1 and tried to connect
to port 6543 and the proxy segfaulted on null ptr accessing
port->gss->enc.  I rebuilt without --with-gssapi to get past that.

Using SELECT pg_backend_pid() from many different connections, I could
see that they were often being served by the same process (although
sometimes it created an extra one when there didn't seem to be a good
reason for it to do that).  I could see the proxy managing these
connections with SELECT * FROM pg_pooler_state() (I suppose this would
be wrapped in a view with a name like pg_stat_proxies).  I could see
that once I did something like SET foo.bar = 42, a backend became
dedicated to my connection and no other connection could use it.  As
described.  Neat.

Obviously your concept of tainted backends (= backends that can't be
reused by other connections because they contain non-default session
state) is quite simplistic and would help only the very simplest use
cases.  Obviously the problems that need to be solved first to do
better than that are quite large.  Personally I think we should move
all GUCs into the Session struct, put the Session struct into shared
memory, and then figure out how to put things like prepared plans into
something like Ideriha-san's experimental shared memory context so
that they also can be accessed by any process, and then we'll mostly
be tackling problems that we'll have to tackle for threads too.  But I
think you made the right choice to experiment with just reusing the
backends that have no state like that.

On my FreeBSD box (which doesn't have epoll(), so it's latch.c's old
school poll() for now), I see the connection proxy process eating a
lot of CPU and the temperature rising.  I see with truss that it's
doing this as fast as it can:

poll({ 13/POLLIN 17/POLLIN|POLLOUT },2,1000) = 1 (0x1)

Ouch.  I admit that I had the idea to test on FreeBSD because I
noticed the patch introduces EPOLLET and I figured this might have
been tested only on Linux.  FWIW the same happens on a Mac.
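
If the cause is what it looks like -- POLLOUT requested unconditionally,
so poll() returns immediately whenever the socket's send buffer has
room -- the usual shape of the fix is to ask for POLLOUT only while
there is output actually pending, roughly (hypothetical field name):

    short       events = POLLIN;

    if (chan->has_pending_output)   /* invented bookkeeping flag */
        events |= POLLOUT;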

That's all I had time for today, but I'm planning to poke this some
more, and get a better understand of how this works at an OS level.  I
can see fd passing, IO multiplexing, and other interesting things
happening.  I suspect there are many people on this list who have
thoughts about the architecture we should use to allow a smaller
number of PGPROCs and a larger number of connections, with various
different motivations.

> Thank you, I will look at Takeshi Ideriha's patch.

Cool.

> > Could you please fix these compiler warnings so we can see this
> > running check-world on CI?
> >
> > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.46324
> > https://travis-ci.org/postgresql-cfbot/postgresql/builds/555180678
> >
> Sorry, I do not have access to Windows host, so can not check Win32
> build myself.

  C:\projects\postgresql\src\include\../interfaces/libpq/libpq-int.h(33):
fatal error C1083: Cannot open include file: 'pthread-win32.h': No
such file or directory (src/backend/postmaster/proxy.c)
[C:\projects\postgresql\postgres.vcxproj]

These relative includes in proxy.c are part of the problem:

#include "../interfaces/libpq/libpq-fe.h"
#include "../interfaces/libpq/libpq-int.h"

I didn't dig into this much but some first reactions:

1.  I see that proxy.c uses libpq, and correctly loads it as a dynamic
library just like postgres_fdw.  Unfortunately it's part of core, so
it can't use the same technique as postgres_fdw to add the libpq
headers to the include path.

2.  libpq-int.h isn't supposed to be included by code outside libpq,
and in this case it fails to find pthread-win32.h, which it apparently
expects to find in either the same directory or the include path.  I
didn't look into what exactly is going on (I don't have Windows
either) but I think we can say the root problem is that you shouldn't
be including that from backend code.

-- 
Thomas Munro
https://enterprisedb.com




Fix typos and inconsistencies for HEAD (take 6)

2019-07-13 Thread Alexander Lakhin
Hello hackers,

Please consider fixing the next batch of typos and inconsistencies in
the tree:
6.1. FADVISE_WILLNEED -> POSIX_FADV_WILLNEED
6.2. failOK -> missing_ok
6.3. failOnerror -> failOnSignal
6.4. fakebits -> remove (irrelevant since introduction in 945543d9)
6.5. FastPathGetLockEntry -> FastPathGetRelationLockEntry
6.6. FAST_PATH_HASH_BUCKETS -> FAST_PATH_STRONG_LOCK_HASH_PARTITIONS
6.7. FastPathTransferLocks -> FastPathTransferRelationLocks
6.8. GenericOptionFlags -> remove (unused since 090173a3)
6.9. fetch_data -> fetched_data
6.10. fildes -> fd
6.11. filedescriptors -> file descriptors
6.12. fillatt -> remove (orphaned since 8609d4ab)
6.13. finalfunction -> finalfn
6.14. flail -> fail
6.15. FlushBuffers -> FlushBuffer & rephrase a comment (incorrectly
updated in 6f5c38dc)
6.16. flush_context -> wb_context
6.17. followon -> follow-on
6.18. force_quotes -> remove (orphaned since e18d900d)
6.19. formatstring -> format-string
6.20. formarray, formfloat -> remove (orphaned since a237dd2b)
6.21. found_row_type -> found_whole_row
6.22. freeScanStack -> remove a comment (irrelevant since 2a636834)
6.23. free_segment_counter -> freed_segment_counter
6.24. FreeSpace Map -> FreeSpaceMap
6.25. user friendly-operand -> user-friendly operand
6.26. frozenids -> frozenxids
6.27. fsm_internal.h -> fsm_internals.h
6.28. fsm_size_to_avail_cat -> fsm_space_avail_to_cat
6.29. full_result -> full_results
6.30. FULL_SIZ -> remove (orphaned since 65b731bd)
6.31. funxtions -> functions
6.32. generate_nonunion_plan, generate_union_plan ->
generate_nonunion_paths, generate_union_paths
6.33. getaddinfo -> getaddrinfo
6.34. get_expr, get_indexdef, get_ruledef, get_viewdef, get_triggerdef,
get_userbyid -> pg_get_*
6.35. GetHashPageStatis -> GetHashPageStats
6.36. GetNumShmemAttachedBgworkers -> remove (orphaned since 6bc8ef0b)
6.37. get_one_range_partition_bound_string -> get_range_partbound_string
6.38. getPartitions -> remove a comment (irrelevant since 44c52881)
6.39. GetRecordedFreePage -> GetRecordedFreeSpace
6.40. get_special_varno -> resolve_special_varno
6.41. gig -> GB
6.42. GinITupIsCompressed -> GinItupIsCompressed
6.43. GinPostingListSegmentMaxSize-bytes -> GinPostingListSegmentMaxSize
bytes
6.44. gistfindCorrectParent -> gistFindCorrectParent
6.45. gistinserthere -> gistinserttuple
6.46. GISTstate -> giststate
6.47. GlobalSerializableXmin, SerializableGlobalXmin -> SxactGlobalXmin
6.48. Greenwish -> Greenwich
6.49. groupClauseVars -> groupClauseCommonVars

As a side note, while looking at dt_common.c (fixing 6.47), I've got a
feeling that the datetktbl is largely outdated and thus mostly useless
(e.g. the USSR hasn't existed for almost 30 years).

Best regards,
Alexander
diff --git a/doc/src/sgml/gist.sgml b/doc/src/sgml/gist.sgml
index 763b8cf7fd..c95776eb75 100644
--- a/doc/src/sgml/gist.sgml
+++ b/doc/src/sgml/gist.sgml
@@ -921,7 +921,7 @@ my_fetch(PG_FUNCTION_ARGS)
  * Convert 'fetched_data' into the a Datum of the original datatype.
  */
 
-/* fill *retval from fetch_data. */
+/* fill *retval from fetched_data. */
 gistentryinit(*retval, PointerGetDatum(converted_datum),
   entry->rel, entry->page, entry->offset, FALSE);
 
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 47db7a8a88..d78f706ff7 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -155,7 +155,7 @@ static void TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
  * NB: this is a low-level routine and is NOT the preferred entry point
  * for most uses; functions in transam.c are the intended callers.
  *
- * XXX Think about issuing FADVISE_WILLNEED on pages that we will need,
+ * XXX Think about issuing POSIX_FADV_WILLNEED on pages that we will need,
  * but aren't yet in cache, as well as hinting pages not to fall out of
  * cache yet.
  */
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 05c6ca81b9..0aa12e8dde 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -2399,7 +2399,7 @@ TSParserIsVisible(Oid prsId)
 /*
  * get_ts_dict_oid - find a TS dictionary by possibly qualified name
  *
- * If not found, returns InvalidOid if failOK, else throws error
+ * If not found, returns InvalidOid if missing_ok, else throws error
  */
 Oid
 get_ts_dict_oid(List *names, bool missing_ok)
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 6745a2432e..e2bed96b9b 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -216,9 +216,9 @@ static PROCLOCK *FastPathGetRelationLockEntry(LOCALLOCK *locallock);
 
 /*
  * To make the fast-path lock mechanism work, we must have some way of
- * preventing the use of the fast-path when a conflicting lock might be
- * present.  We partition* the locktag space into FAST_PATH_HASH_BUCKETS
- * partitions, and maintain an integer count of the number of "strong" lockers
+