Re: Efficient output for integer types

2019-09-17 Thread David Fetter
On Wed, Sep 18, 2019 at 05:42:01AM +0200, David Fetter wrote:
> On Tue, Sep 17, 2019 at 09:01:57AM +0200, David Fetter wrote:
> > On Tue, Sep 17, 2019 at 08:55:05AM +0200, David Fetter wrote:
> > > On Sun, Sep 15, 2019 at 09:18:49AM +0200, David Fetter wrote:
> > > > Folks,
> > > > 
> > > > Please find attached a couple of patches intended to $subject.
> > > > 
> > > > This patch set cut the time to copy ten million rows of randomly sized
> > > > int8s (10 of them) by about a third, so at least for that case, it's
> > > > pretty decent.
> > > 
> > > Added int4 output, removed the sprintf stuff, as it didn't seem to
> > > help in any cases I was testing.
> > 
> > Found a couple of "whiles" that should have been "ifs."
> 
> Factored out some inefficient functions and made the guts use the more
> efficient function.

Fix copy-paste-o that introduced some unneeded 64-bit math.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From b9b2e2dac6f5c6a15cf4161ff135d201ea52a207 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Sun, 15 Sep 2019 00:06:29 -0700
Subject: [PATCH v5] Make int4 and int8 operations more efficient
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.21.0"

This is a multi-part message in MIME format.
--2.21.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


- Output routines now do more digits per iteration, and
- Code determines the number of decimal digits in int4/int8 efficiently
- Split off pg_ltoa_n from pg_ltoa
- Use same to make other functions shorter

diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c
index 651ade14dd..17ca533b87 100644
--- a/src/backend/access/common/printsimple.c
+++ b/src/backend/access/common/printsimple.c
@@ -112,7 +112,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self)
 			case INT8OID:
 {
 	int64		num = DatumGetInt64(value);
-	char		str[23];	/* sign, 21 digits and '\0' */
+	char		str[MAXINT8LEN];
 
 	pg_lltoa(num, str);
 	pq_sendcountedtext(&buf, str, strlen(str), false);
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 580043233b..3818dbaa85 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -39,6 +39,8 @@ jsonpath_scan.c: FLEX_NO_BACKUP=yes
 # jsonpath_scan is compiled as part of jsonpath_gram
 jsonpath_gram.o: jsonpath_scan.c
 
+numutils.o: CFLAGS += $(PERMIT_DECLARATION_AFTER_STATEMENT)
+
 # jsonpath_gram.c and jsonpath_scan.c are in the distribution tarball,
 # so they are not cleaned here.
 clean distclean maintainer-clean:
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 0ff9394a2f..6230807906 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -27,8 +27,6 @@
 #include "utils/builtins.h"
 
 
-#define MAXINT8LEN		25
-
 typedef struct
 {
 	int64		current;
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 70138feb29..8ef9fac717 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -20,6 +20,58 @@
 
 #include "common/int.h"
 #include "utils/builtins.h"
+#include "port/pg_bitutils.h"
+
+/*
+ * A table of all two-digit numbers. This is used to speed up decimal digit
+ * generation by copying pairs of digits into the final output.
+ */
+static const char DIGIT_TABLE[200] = {
+	'0', '0', '0', '1', '0', '2', '0', '3', '0', '4', '0', '5', '0', '6', '0', '7', '0', '8', '0', '9',
+	'1', '0', '1', '1', '1', '2', '1', '3', '1', '4', '1', '5', '1', '6', '1', '7', '1', '8', '1', '9',
+	'2', '0', '2', '1', '2', '2', '2', '3', '2', '4', '2', '5', '2', '6', '2', '7', '2', '8', '2', '9',
+	'3', '0', '3', '1', '3', '2', '3', '3', '3', '4', '3', '5', '3', '6', '3', '7', '3', '8', '3', '9',
+	'4', '0', '4', '1', '4', '2', '4', '3', '4', '4', '4', '5', '4', '6', '4', '7', '4', '8', '4', '9',
+	'5', '0', '5', '1', '5', '2', '5', '3', '5', '4', '5', '5', '5', '6', '5', '7', '5', '8', '5', '9',
+	'6', '0', '6', '1', '6', '2', '6', '3', '6', '4', '6', '5', '6', '6', '6', '7', '6', '8', '6', '9',
+	'7', '0', '7', '1', '7', '2', '7', '3', '7', '4', '7', '5', '7', '6', '7', '7', '7', '8', '7', '9',
+	'8', '0', '8', '1', '8', '2', '8', '3', '8', '4', '8', '5', '8', '6', '8', '7', '8', '8', '8', '9',
+	'9', '0', '9', '1', '9', '2', '9', '3', '9', '4', '9', '5', '9', '6', '9', '7', '9', '8', '9', '9'
+};
+
+/*
+ * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10
+ */
+static inline uint32
+decimalLength32(const uint32 v)
+{
+	uint32			t;
+	static uint64	PowersOfTen[] =
+	{1,10,100,
+	 1000,  10000,    100000,
+	 1000000,  10000000,  100000000,
+	 1000000000};
+
+	t = (pg_leftmost_one_pos32(v) + 1)*1233/4096;
+	return t + (v >= 

Re: pgbench - allow to create partitioned tables

2019-09-17 Thread Fabien COELHO



Hello Erikjan,


[pgbench-init-partitioned-9.patch]


Turns out this patch needed a dos2unix treatment.


It's easy to do but it takes time to figure it out (I'm dumb).  I for one 
would be happy to receive patches not so encumbered :)


AFAICR this is usually because your mailer does not conform to the MIME spec, 
which *requires* that text files be sent with \r\n line terminations. My 
mailer does that for text/x-diff, and your mailer should translate the EOLs 
back for your platform, but it does not, so you have to do it manually.


I could edit my /etc/mime.types file to switch patch files to some binary 
mime type, but it may have side effects on my system, so I refrained.


Hoping that mailer writers read and conform to MIME seems desperate.

Last time this discussion occurred there was no obvious solution besides my 
switching to another bug-compatible mailer, which is not really 
convenient for me. ISTM that the "patch" command accepts these files with 
warnings.


--
Fabien.




Re: [DOC] Document auto vacuum interruption

2019-09-17 Thread Amit Kapila
On Tue, Sep 17, 2019 at 5:48 PM James Coleman  wrote:
>
> On Tue, Sep 17, 2019 at 2:21 AM Amit Kapila  wrote:
> >
> >
> > Let me know what you think of attached?  I think we can back-patch
> > this patch.  What do you think?  Does anyone else have an opinion on
> > this patch especially if we see any problem in back-patching this?
>
> The attached looks great!
>
> I was working on HEAD for the patch, but this concern has been an
> issue for quite a long time. We were running into it on 9.6 in
> production, for example. And given how frequently it seems like there
> are large-scale production issues related to auto vacuum, I think any
> amount of back patching we can do to make that footgun less likely
> would be a good thing.
>

Okay, I will commit this tomorrow unless someone has any comments or objections.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: scorpionfly needs more semaphores

2019-09-17 Thread jungle boogie

Thus said Tom Lane  on Wed, 18 Sep 2019 00:33:19 -0400

Thomas Munro  writes:

We're seeing occasional failures like this:
running bootstrap script ... 2019-09-13 12:11:26.882 PDT [64926]
FATAL:  could not create semaphores: No space left on device
2019-09-13 12:11:26.882 PDT [64926] DETAIL:  Failed system call was
semget(5728001, 17, 03600).



I think you should switch to using "named" POSIX semaphores by
building with USE_NAMED_POSIX_SEMAPHORES (then it'll create a
squillion little files under /tmp and mmap() them), or increase the
number of SysV semaphores you can create with sysctl[1], or finish
writing your operating system[2] so you can switch to "unnamed" POSIX
semaphores :-)


I'd recommend the second option.  Since the discussion in [1],
we've fixed our docs for OpenBSD to say

 In OpenBSD 3.3 and later, IPC parameters can be adjusted using sysctl,
 for example:
 # sysctl kern.seminfo.semmni=100
 To make these settings persist over reboots, modify /etc/sysctl.conf.
 You will usually want to increase kern.seminfo.semmni and
 kern.seminfo.semmns, as OpenBSD's default settings for these are
 uncomfortably small.



Thanks, Thomas and Tom for reaching out to me. I certainly don't want to 
recompile my kernel, as I basically run -current OpenBSD via snapshots.


That said, I've made the adjustment to the sysctl:

$ sysctl | ag kern.seminfo.semmni
kern.seminfo.semmni=100




Scorpionfly also seems to be having problems with its git repo breaking on
a regular basis.  I have no idea what's up with that.


That is a mystery to me as well. 9.4 stable seems to be the branch with 
the most problems:

https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=scorpionfly&br=REL9_4_STABLE


My cronjobs:
0 */6 * * * cd /home/pgbuilder/bin/REL10 && ./run_build.pl  --verbose
0 */12 * * * cd /home/pgbuilder/bin/REL10 && ./run_branches.pl  --run-all


I'm willing to make more tweaks to prevent these false positives, so 
feel free to continue monitoring to see how things work out over the 
next several builds.




regards, tom lane









Re: scorpionfly needs more semaphores

2019-09-17 Thread Tom Lane
Thomas Munro  writes:
> We're seeing occasional failures like this:
> running bootstrap script ... 2019-09-13 12:11:26.882 PDT [64926]
> FATAL:  could not create semaphores: No space left on device
> 2019-09-13 12:11:26.882 PDT [64926] DETAIL:  Failed system call was
> semget(5728001, 17, 03600).

> I think you should switch to using "named" POSIX semaphores by
> building with USE_NAMED_POSIX_SEMAPHORES (then it'll create a
> squillion little files under /tmp and mmap() them), or increase the
> number of SysV semaphores you can create with sysctl[1], or finish
> writing your operating system[2] so you can switch to "unnamed" POSIX
> semaphores :-)

I'd recommend the second option.  Since the discussion in [1],
we've fixed our docs for OpenBSD to say

In OpenBSD 3.3 and later, IPC parameters can be adjusted using sysctl,
for example:
# sysctl kern.seminfo.semmni=100
To make these settings persist over reboots, modify /etc/sysctl.conf.
You will usually want to increase kern.seminfo.semmni and
kern.seminfo.semmns, as OpenBSD's default settings for these are
uncomfortably small.

Scorpionfly also seems to be having problems with its git repo breaking on
a regular basis.  I have no idea what's up with that.

regards, tom lane




Re: dropdb --force

2019-09-17 Thread Pavel Stehule
On Wed 18. 9. 2019 at 5:59, vignesh C wrote:

> On Wed, Sep 18, 2019 at 8:30 AM Pavel Stehule 
> wrote:
> >
> >
> Hi Pavel,
>
> One Comment:
> In the documentation we say drop database will fail after 60 seconds
>   
> FORCE
> 
>  
>   Attempt to terminate all existing connections to the target database.
>  
>  
>   This will fail, if current user has no permissions to terminate other
>   connections. Required permissions are the same as with
>   pg_terminate_backend, described
>   in .
>
>   This will also fail, if the connections do not terminate in 60
> seconds.
>  
> 
>
>

This is not valid. With the FORCE flag the client connections are closed immediately


>
> But in TerminateOtherDBBackends:
> foreach (lc, pids)
> + {
> + int pid = lfirst_int(lc);
> +
> + (void) kill(pid, SIGTERM); /* ignore any error */
> + }
> +
> + /* sleep 100ms */
> + pg_usleep(100 * 1000L);
> + }
>
> We check for any connected backends after sending kill signal in
> CountOtherDBBackends and throw error immediately.
>
> I had also tested this scenario to get the following error immediately:
> test=# drop database (force) test1;
> ERROR:  database "test1" is being accessed by other users
> DETAIL:  There is 1 other session using the database.
>
>
sure - you cannot kill your own session


> I feel some change is required to keep documentation and code in sync.
>

I am waiting for Tom's reply about the necessary rights. But the doc part is
not in sync, and should be fixed.

Pavel

>
> Regards,
> Vignesh
> EnterpriseDB: http://www.enterprisedb.com
>


Re: range test for hash index?

2019-09-17 Thread Amit Kapila
On Mon, Sep 16, 2019 at 11:24 PM Paul A Jungwirth
 wrote:
>
> On Mon, Sep 16, 2019 at 5:28 AM Amit Kapila  wrote:
> > I don't see this function on the master branch.  Is this function name
> > correct?  Are you looking at some different branch?
>
> Sorry about that! You're right, I was on my multirange branch. But I
> see the same thing on latest master (but calling hash_range instead of
> hash_range_internal).
>

No problem, attached is a patch with a proposed commit message.  I
will wait for a few days to see if Heikki/Jeff or anyone else responds
back, otherwise will commit and backpatch this early next week.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


0001-Fix-oversight-in-commit-4429f6a9e3e12bb4af6e3677fbc7.patch
Description: Binary data


Re: dropdb --force

2019-09-17 Thread vignesh C
On Wed, Sep 18, 2019 at 8:30 AM Pavel Stehule  wrote:
>
>
Hi Pavel,

One Comment:
In the documentation we say drop database will fail after 60 seconds
  
FORCE

 
  Attempt to terminate all existing connections to the target database.
 
 
  This will fail, if current user has no permissions to terminate other
  connections. Required permissions are the same as with
  pg_terminate_backend, described
  in .

  This will also fail, if the connections do not terminate in 60 seconds.
 

   


But in TerminateOtherDBBackends:
foreach (lc, pids)
+ {
+ int pid = lfirst_int(lc);
+
+ (void) kill(pid, SIGTERM); /* ignore any error */
+ }
+
+ /* sleep 100ms */
+ pg_usleep(100 * 1000L);
+ }

We check for any connected backends after sending kill signal in
CountOtherDBBackends and throw error immediately.

I had also tested this scenario to get the following error immediately:
test=# drop database (force) test1;
ERROR:  database "test1" is being accessed by other users
DETAIL:  There is 1 other session using the database.

I feel some change is required to keep documentation and code in sync.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
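
A self-contained sketch of the signal-and-recheck sequence quoted above may
help make the mismatch concrete. The wrapper function name is hypothetical
and this is not the actual patch; only kill(), pg_usleep() and
CountOtherDBBackends() come from the quoted fragment and existing backend
code:

#include "postgres.h"

#include <signal.h>

#include "miscadmin.h"
#include "nodes/pg_list.h"
#include "storage/procarray.h"

/* Hypothetical wrapper, for illustration only. */
static void
terminate_other_db_backends_once(Oid db_id, List *pids)
{
    ListCell   *lc;
    int         notherbackends;
    int         npreparedxacts;

    /* Ask every other backend connected to the database to exit. */
    foreach(lc, pids)
    {
        int         pid = lfirst_int(lc);

        (void) kill(pid, SIGTERM);      /* ignore any error */
    }

    /* Wait 100ms, once, for the signalled backends to go away. */
    pg_usleep(100 * 1000L);

    /*
     * Re-check.  Because there is no 60-second retry loop, a backend that
     * is slow to exit makes the command fail immediately -- which is the
     * documentation/code mismatch described above.
     */
    if (CountOtherDBBackends(db_id, &notherbackends, &npreparedxacts))
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_IN_USE),
                 errmsg("database is being accessed by other users"),
                 errdetail("There are %d other sessions using the database.",
                           notherbackends)));
}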




Re: Efficient output for integer types

2019-09-17 Thread David Fetter
On Tue, Sep 17, 2019 at 09:01:57AM +0200, David Fetter wrote:
> On Tue, Sep 17, 2019 at 08:55:05AM +0200, David Fetter wrote:
> > On Sun, Sep 15, 2019 at 09:18:49AM +0200, David Fetter wrote:
> > > Folks,
> > > 
> > > Please find attached a couple of patches intended to $subject.
> > > 
> > > This patch set cut the time to copy ten million rows of randomly sized
> > > int8s (10 of them) by about a third, so at least for that case, it's
> > > pretty decent.
> > 
> > Added int4 output, removed the sprintf stuff, as it didn't seem to
> > help in any cases I was testing.
> 
> Found a couple of "whiles" that should have been "ifs."

Factored out some inefficient functions and made the guts use the more
efficient function.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From 8ba1ed73a3a96b947e591c9edd22acb89bfa096b Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Sun, 15 Sep 2019 00:06:29 -0700
Subject: [PATCH v4] Make int4 and int8 operations more efficient
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.21.0"

This is a multi-part message in MIME format.
--2.21.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


- Output routines now do more digits per iteration, and
- Code determines the number of decimal digits in int4/int8 efficiently
- Split off pg_ltoa_n from pg_ltoa
- Use same to make other functions shorter

diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c
index 651ade14dd..17ca533b87 100644
--- a/src/backend/access/common/printsimple.c
+++ b/src/backend/access/common/printsimple.c
@@ -112,7 +112,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self)
 			case INT8OID:
 {
 	int64		num = DatumGetInt64(value);
-	char		str[23];	/* sign, 21 digits and '\0' */
+	char		str[MAXINT8LEN];
 
 	pg_lltoa(num, str);
 	pq_sendcountedtext(&buf, str, strlen(str), false);
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 580043233b..3818dbaa85 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -39,6 +39,8 @@ jsonpath_scan.c: FLEX_NO_BACKUP=yes
 # jsonpath_scan is compiled as part of jsonpath_gram
 jsonpath_gram.o: jsonpath_scan.c
 
+numutils.o: CFLAGS += $(PERMIT_DECLARATION_AFTER_STATEMENT)
+
 # jsonpath_gram.c and jsonpath_scan.c are in the distribution tarball,
 # so they are not cleaned here.
 clean distclean maintainer-clean:
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 0ff9394a2f..6230807906 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -27,8 +27,6 @@
 #include "utils/builtins.h"
 
 
-#define MAXINT8LEN		25
-
 typedef struct
 {
 	int64		current;
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 70138feb29..47280b68d6 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -20,6 +20,58 @@
 
 #include "common/int.h"
 #include "utils/builtins.h"
+#include "port/pg_bitutils.h"
+
+/*
+ * A table of all two-digit numbers. This is used to speed up decimal digit
+ * generation by copying pairs of digits into the final output.
+ */
+static const char DIGIT_TABLE[200] = {
+	'0', '0', '0', '1', '0', '2', '0', '3', '0', '4', '0', '5', '0', '6', '0', '7', '0', '8', '0', '9',
+	'1', '0', '1', '1', '1', '2', '1', '3', '1', '4', '1', '5', '1', '6', '1', '7', '1', '8', '1', '9',
+	'2', '0', '2', '1', '2', '2', '2', '3', '2', '4', '2', '5', '2', '6', '2', '7', '2', '8', '2', '9',
+	'3', '0', '3', '1', '3', '2', '3', '3', '3', '4', '3', '5', '3', '6', '3', '7', '3', '8', '3', '9',
+	'4', '0', '4', '1', '4', '2', '4', '3', '4', '4', '4', '5', '4', '6', '4', '7', '4', '8', '4', '9',
+	'5', '0', '5', '1', '5', '2', '5', '3', '5', '4', '5', '5', '5', '6', '5', '7', '5', '8', '5', '9',
+	'6', '0', '6', '1', '6', '2', '6', '3', '6', '4', '6', '5', '6', '6', '6', '7', '6', '8', '6', '9',
+	'7', '0', '7', '1', '7', '2', '7', '3', '7', '4', '7', '5', '7', '6', '7', '7', '7', '8', '7', '9',
+	'8', '0', '8', '1', '8', '2', '8', '3', '8', '4', '8', '5', '8', '6', '8', '7', '8', '8', '8', '9',
+	'9', '0', '9', '1', '9', '2', '9', '3', '9', '4', '9', '5', '9', '6', '9', '7', '9', '8', '9', '9'
+};
+
+/*
+ * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10
+ */
+static inline uint32
+decimalLength32(const uint32 v)
+{
+	uint32			t;
+	static uint64	PowersOfTen[] =
+	{1,10,100,
+	 1000,  10000,    100000,
+	 1000000,  10000000,  100000000,
+	 1000000000};
+
+	t = (pg_leftmost_one_pos32(v) + 1)*1233/4096;
+	return t + (v >= PowersOfTen[t]);
+}
+
+static inline uint32
+decimalLength64(const uint64 v)
+{
+	uint32			t;
+	static uint64	PowersOfTen[] =
+	{1,10,
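
For readers skimming the patch, here is a standalone sketch of the technique
it relies on: decimalLength32()'s bit hack sizes the output exactly, and the
two-digit table lets the conversion emit a pair of digits per division. The
code below is illustrative C99 only (function names invented, __builtin_clz
standing in for pg_leftmost_one_pos32), not the patch itself:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

static const char DIGIT_TABLE[200] =
    "00010203040506070809"
    "10111213141516171819"
    "20212223242526272829"
    "30313233343536373839"
    "40414243444546474849"
    "50515253545556575859"
    "60616263646566676869"
    "70717273747576777879"
    "80818283848586878889"
    "90919293949596979899";

/* Number of decimal digits in v, via the integer-log10 bit hack. */
static int
decimal_length32(uint32_t v)
{
    static const uint32_t powers_of_ten[] = {
        1, 10, 100, 1000, 10000, 100000,
        1000000, 10000000, 100000000, 1000000000
    };
    int         bits;
    int         t;

    if (v == 0)
        return 1;
    bits = 32 - __builtin_clz(v);   /* pg_leftmost_one_pos32(v) + 1 */
    t = (bits * 1233) / 4096;       /* ~= bits * log10(2) */
    return t + (v >= powers_of_ten[t]);
}

/* Write v into buf (no trailing NUL), two digits per iteration. */
static int
uint32_to_decimal(uint32_t v, char *buf)
{
    int         len = decimal_length32(v);
    int         pos = len;

    while (v >= 100)
    {
        uint32_t    two = (v % 100) << 1;

        v /= 100;
        pos -= 2;
        memcpy(buf + pos, DIGIT_TABLE + two, 2);
    }
    if (v >= 10)
        memcpy(buf, DIGIT_TABLE + (v << 1), 2);
    else
        buf[0] = (char) ('0' + v);
    return len;
}

int
main(void)
{
    char        buf[16];
    uint32_t    samples[] = {0, 7, 42, 1000000000, 4294967295u};

    for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
    {
        int         n = uint32_to_decimal(samples[i], buf);

        printf("%u -> %.*s (%d digits)\n", (unsigned) samples[i], n, buf, n);
    }
    return 0;
}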

Re: dropdb --force

2019-09-17 Thread Pavel Stehule
On Wed 18. 9. 2019 at 4:57, Pavel Stehule
wrote:

>
>
> On Wed 18. 9. 2019 at 4:53, Ryan Lambert
> wrote:
>
>> Hi Pavel,
>> I took a quick look through the patch, I'll try to build and test it
>> tomorrow.
>>
>>
>> --- a/src/include/nodes/parsenodes.h
>> +++ b/src/include/nodes/parsenodes.h
>> @@ -3145,6 +3145,7 @@ typedef struct DropdbStmt
>> NodeTag type;
>> char   *dbname; /* database to drop */
>> bool missing_ok; /* skip error if db is missing? */
>> + List   *options; /* currently only FORCE is supported */
>> } DropdbStmt;
>>
>> Why put FORCE as the single item in an options list?  A bool var seems
>> like it would be more clear and consistent.
>>
>
> see the discussion - it was one of Tom's objections. It is there to allow
> possible future enhancements or modifications. It can look strange, because
> right now there is only one option, but it allows very easy modifications.
> Moreover, it is consistent with a lot of other pg commands.
>

I can imagine that somebody will write support for concurrent dropping - so
this list will get longer

>
>
>> - * DROP DATABASE [ IF EXISTS ]
>> + * DROP DATABASE [ ( FORCE ) ] [ IF EXISTS ]
>>
>> Why is it `[ ( FORCE ) ]` instead of `[ FORCE ]`?
>> There are also places in the code that seem like extra () are around
>> FORCE.  Like here:
>>
>
> It was discussed before. FORCE is not a reserved keyword, so putting it
> inside a parenthesized list is for safety against possible collisions.
>
>
> + appendPQExpBuffer(&sql, "DROP DATABASE %s%s%s;",
>> +  (force ? " (FORCE) " : ""),
>> +  (if_exists ? "IF EXISTS " : ""), fmtId(dbname));
>>
>> And here:
>>
>> +$node->safe_psql('postgres', 'CREATE DATABASE foobar2');
>> +$node->issues_sql_like(
>> + [ 'dropdb', '--force', 'foobar2' ],
>> + qr/statement: DROP DATABASE (FORCE) foobar2/,
>> + 'SQL DROP DATABASE (FORCE) run');
>> +
>>
>> I'll try to build and test the patch tomorrow.
>>
>> Thanks,
>>
>> *Ryan Lambert*
>>
>>>



Re: dropdb --force

2019-09-17 Thread Pavel Stehule
On Wed 18. 9. 2019 at 4:53, Ryan Lambert
wrote:

> Hi Pavel,
> I took a quick look through the patch, I'll try to build and test it
> tomorrow.
>
>
> --- a/src/include/nodes/parsenodes.h
> +++ b/src/include/nodes/parsenodes.h
> @@ -3145,6 +3145,7 @@ typedef struct DropdbStmt
> NodeTag type;
> char   *dbname; /* database to drop */
> bool missing_ok; /* skip error if db is missing? */
> + List   *options; /* currently only FORCE is supported */
> } DropdbStmt;
>
> Why put FORCE as the single item in an options list?  A bool var seems
> like it would be more clear and consistent.
>

see the discussion - it was one of Tom's objections. It is there to allow
possible future enhancements or modifications. It can look strange, because
right now there is only one option, but it allows very easy modifications.
Moreover, it is consistent with a lot of other pg commands.


> - * DROP DATABASE [ IF EXISTS ]
> + * DROP DATABASE [ ( FORCE ) ] [ IF EXISTS ]
>
> Why is it `[ ( FORCE ) ]` instead of `[ FORCE ]`?
> There are also places in the code that seem like extra () are around
> FORCE.  Like here:
>

It was discussed before. FORCE is not a reserved keyword, so putting it
inside a parenthesized list is for safety against possible collisions.


> + appendPQExpBuffer(&sql, "DROP DATABASE %s%s%s;",
> +  (force ? " (FORCE) " : ""),
> +  (if_exists ? "IF EXISTS " : ""), fmtId(dbname));
>
> And here:
>
> +$node->safe_psql('postgres', 'CREATE DATABASE foobar2');
> +$node->issues_sql_like(
> + [ 'dropdb', '--force', 'foobar2' ],
> + qr/statement: DROP DATABASE (FORCE) foobar2/,
> + 'SQL DROP DATABASE (FORCE) run');
> +
>
> I'll try to build and test the patch tomorrow.
>
> Thanks,
>
> *Ryan Lambert*
>
>>
>>>


Re: dropdb --force

2019-09-17 Thread Ryan Lambert
Hi Pavel,
I took a quick look through the patch, I'll try to build and test it
tomorrow.


--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3145,6 +3145,7 @@ typedef struct DropdbStmt
NodeTag type;
char   *dbname; /* database to drop */
bool missing_ok; /* skip error if db is missing? */
+ List   *options; /* currently only FORCE is supported */
} DropdbStmt;

Why put FORCE as the single item in an options list?  A bool var seems like
it would be more clear and consistent.

- * DROP DATABASE [ IF EXISTS ]
+ * DROP DATABASE [ ( FORCE ) ] [ IF EXISTS ]

Why is it `[ ( FORCE ) ]` instead of `[ FORCE ]`?
There are also places in the code that seem like extra () are around
FORCE.  Like here:

+ appendPQExpBuffer(&sql, "DROP DATABASE %s%s%s;",
+  (force ? " (FORCE) " : ""),
+  (if_exists ? "IF EXISTS " : ""), fmtId(dbname));

And here:

+$node->safe_psql('postgres', 'CREATE DATABASE foobar2');
+$node->issues_sql_like(
+ [ 'dropdb', '--force', 'foobar2' ],
+ qr/statement: DROP DATABASE (FORCE) foobar2/,
+ 'SQL DROP DATABASE (FORCE) run');
+

I'll try to build and test the patch tomorrow.

Thanks,

*Ryan Lambert*

>
>>


Re: [HACKERS] CLUSTER command progress monitor

2019-09-17 Thread Michael Paquier
On Tue, Sep 17, 2019 at 10:50:22PM -0300, Alvaro Herrera wrote:
> On 2019-Sep-18, Michael Paquier wrote:
>> So, with the clock ticking and the release getting close by, what do
>> we do for this set of issues?  REINDEX, CREATE INDEX and CLUSTER all
>> try to build indexes and the current infrastructure is not really
>> adapted to hold all that.  Robert, Alvaro and Peter E, do you have any
>> comments to offer?
> 
> Which part of it is not already fixed?

I can still see at least two problems.  There is one issue with
pgstat_progress_update_param() which gets called in reindex_table() 
for a progress phase of CLUSTER, and this even if
REINDEXOPT_REPORT_PROGRESS is not set in the options.  Also it seems
to me that the calls to pgstat_progress_start_command() and
pgstat_progress_end_command() are at incorrect locations for
reindex_index() and that those should be one level higher on the stack
to avoid any kind of interactions with another command whose progress
has already started.
--
Michael


signature.asc
Description: PGP signature
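
To make the kind of interaction being described concrete, here is a sketch of
the clobbering scenario. The caller below is hypothetical; only the
pgstat_progress_* calls and the PROGRESS_* constants are real backend APIs,
and this is not a proposed fix:

#include "postgres.h"

#include "commands/progress.h"
#include "pgstat.h"

/* Hypothetical caller, illustrating the problem only. */
static void
cluster_one_relation(Oid tableOid, Oid indexOid)
{
    /* CLUSTER starts its own progress reporting ... */
    pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
    pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
                                 PROGRESS_CLUSTER_PHASE_REBUILD_INDEX);

    /*
     * ... but a shared index-build path that unconditionally starts
     * CREATE INDEX/REINDEX progress overwrites the backend's single
     * progress slot, and its end_command() then clears the slot entirely.
     */
    pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX, indexOid);
    pgstat_progress_end_command();

    /* CLUSTER's follow-up updates now go to the void. */
    pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN, 0);

    pgstat_progress_end_command();
}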


Re: Define jsonpath functions as stable

2019-09-17 Thread Jonathan S. Katz
On 9/17/19 10:00 PM, Chapman Flack wrote:
> On 09/17/19 21:13, Jonathan S. Katz wrote:
> 
>> to), which describes being able to use 

Re: Define jsonpath functions as stable

2019-09-17 Thread Chapman Flack
On 09/17/19 21:13, Jonathan S. Katz wrote:

> to), which describes being able to use 

Re: [HACKERS] CLUSTER command progress monitor

2019-09-17 Thread Alvaro Herrera
On 2019-Sep-18, Michael Paquier wrote:

> So, with the clock ticking and the release getting close by, what do
> we do for this set of issues?  REINDEX, CREATE INDEX and CLUSTER all
> try to build indexes and the current infrastructure is not really
> adapted to hold all that.  Robert, Alvaro and Peter E, do you have any
> comments to offer?

Which part of it is not already fixed?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pg_rewind docs correction

2019-09-17 Thread Michael Paquier
On Tue, Sep 17, 2019 at 08:38:18AM -0400, James Coleman wrote:
> I don't agree that that's a valid equivalency. I myself spent a lot of
> time trying to understand how this could possibly be true a while
> back, and even looked at source code to be certain. I've asked other
> people and found the same confusion.
> 
> As I read it the 2nd sentence doesn't actually tell you the
> differences; it makes a quick attempt at summarizing *how* the first
> sentence is true, but if the first sentence isn't accurate, then it's
> hard to read the 2nd one as helping.

Well, then it comes back to the part where I am used to the existing
docs :)

> If you'd prefer something less detailed at that point in
> the docs, then something along the lines of "results in a data
> directory state which can then be safely replayed from the source" or
> some such.

Actually this is a good suggestion, and could replace the first
sentence of this paragraph.

> The docs shouldn't be correct just for someone who already understands
> the intricacies. And the end user shouldn't have to read the "how it
> works" (which incidentally is kinda hidden at the bottom underneath
> the CLI args -- perhaps we could move that?) to extrapolate things in
> the primary documentation.

Perhaps.  This doc page is not that long either.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2019-09-17 Thread Michael Paquier
On Tue, Sep 17, 2019 at 01:06:18PM +0900, Michael Paquier wrote:
> On Tue, Sep 17, 2019 at 09:23:45AM +0530, Amit Kapila wrote:
>> If you want to use the same size array, then you might want to just
>> memset the previous array rather than first freeing it and then again
>> allocating it.  This is not a big point, so any which way is fine.
> 
> Sure.  This is less expensive though, so changed it the way you
> are suggesting on my local branch.

I am attaching an updated patch for now that I would like to commit.
Are there more comments about the shape of the patch, the name of the
columns for the function, etc.?
--
Michael
diff --git a/contrib/pageinspect/expected/page.out b/contrib/pageinspect/expected/page.out
index 6a09d46a57..c8f6e1b4c6 100644
--- a/contrib/pageinspect/expected/page.out
+++ b/contrib/pageinspect/expected/page.out
@@ -86,80 +86,55 @@ SELECT * FROM fsm_page_contents(get_raw_page('test1', 'fsm', 0));
 -- always be the same in all test runs. we show raw flags by
 -- default: HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID.
 VACUUM FREEZE test1;
-SELECT t_infomask, t_infomask2, flags
+SELECT t_infomask, t_infomask2, raw_flags, combined_flags
 FROM heap_page_items(get_raw_page('test1', 0)),
- LATERAL heap_tuple_infomask_flags(t_infomask, t_infomask2) m(flags);
- t_infomask | t_infomask2 |   flags   
-+-+---
-   2816 |   2 | {HEAP_XMAX_INVALID,HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID}
+ LATERAL heap_tuple_infomask_flags(t_infomask, t_infomask2) m(raw_flags, combined_flags);
+ t_infomask | t_infomask2 | raw_flags |   combined_flags   
++-+---+
+   2816 |   2 | {HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID} | {HEAP_XMIN_FROZEN}
 (1 row)
 
 -- output the decoded flag HEAP_XMIN_FROZEN instead
-SELECT t_infomask, t_infomask2, flags
+SELECT t_infomask, t_infomask2, raw_flags, combined_flags
 FROM heap_page_items(get_raw_page('test1', 0)),
- LATERAL heap_tuple_infomask_flags(t_infomask, t_infomask2, true) m(flags);
- t_infomask | t_infomask2 |flags 
-+-+--
-   2816 |   2 | {HEAP_XMAX_INVALID,HEAP_XMIN_FROZEN}
+ LATERAL heap_tuple_infomask_flags(t_infomask, t_infomask2) m(raw_flags, combined_flags);
+ t_infomask | t_infomask2 | raw_flags |   combined_flags   
++-+---+
+   2816 |   2 | {HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID} | {HEAP_XMIN_FROZEN}
 (1 row)
 
 -- tests for decoding of combined flags
 -- HEAP_XMAX_SHR_LOCK = (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)
-SELECT heap_tuple_infomask_flags(x'0050'::int, 0, true);
- heap_tuple_infomask_flags 

- {HEAP_XMAX_SHR_LOCK}
-(1 row)
-
-SELECT heap_tuple_infomask_flags(x'0050'::int, 0, false);
-  heap_tuple_infomask_flags  
--
- {HEAP_XMAX_EXCL_LOCK,HEAP_XMAX_KEYSHR_LOCK}
+SELECT * FROM heap_tuple_infomask_flags(x'0050'::int, 0);
+  raw_flags  |combined_flags
+-+--
+ {HEAP_XMAX_KEYSHR_LOCK,HEAP_XMAX_EXCL_LOCK} | {HEAP_XMAX_SHR_LOCK}
 (1 row)
 
 -- HEAP_XMIN_FROZEN = (HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID)
-SELECT heap_tuple_infomask_flags(x'0300'::int, 0, true);
- heap_tuple_infomask_flags 

- {HEAP_XMIN_FROZEN}
-(1 row)
-
-SELECT heap_tuple_infomask_flags(x'0300'::int, 0, false);
-heap_tuple_infomask_flags
--
- {HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID}
+SELECT * FROM heap_tuple_infomask_flags(x'0300'::int, 0);
+raw_flags|   combined_flags   
+-+
+ {HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID} | {HEAP_XMIN_FROZEN}
 (1 row)
 
 -- HEAP_MOVED = (HEAP_MOVED_IN | HEAP_MOVED_OFF)
-SELECT heap_tuple_infomask_flags(x'C000'::int, 0, true);
- heap_tuple_infomask_flags 

- {HEAP_MOVED}
+SELECT * FROM heap_tuple_infomask_flags(x'C000'::int, 0);
+   raw_flags| combined_flags 
++
+ {HEAP_MOVED_OFF,HEAP_MOVED_IN} | {HEAP_MOVED}
 (1 row)
 
-SELECT heap_tuple_infomask_flags(x'C000'::int, 0, false);
-   heap_tuple_infomask_flags
-
- {HEAP_MOVED_IN,HEAP_MOVED_OFF}
-(1 row)
-
--- HEAP_LOCKED_UPGRADED = (HEAP_XMAX_IS_MULTI | HEAP_XMAX_LOCK_ONLY)
-SELECT heap_tuple_infomask_flags(x'1080'::int, 0, 
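
The raw-versus-combined distinction the function makes can be reproduced
outside the backend. The standalone program below uses the infomask bit
values from access/htup_details.h and the same sample masks as the
regression output above; it is illustrative only and not part of
pageinspect:

#include <stdio.h>
#include <stdint.h>

#define HEAP_XMAX_KEYSHR_LOCK   0x0010
#define HEAP_XMAX_EXCL_LOCK     0x0040
#define HEAP_XMIN_COMMITTED     0x0100
#define HEAP_XMIN_INVALID       0x0200
#define HEAP_XMAX_INVALID       0x0800
#define HEAP_MOVED_OFF          0x4000
#define HEAP_MOVED_IN           0x8000

#define HEAP_XMAX_SHR_LOCK  (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)
#define HEAP_XMIN_FROZEN    (HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID)
#define HEAP_MOVED          (HEAP_MOVED_OFF | HEAP_MOVED_IN)

/* Print the raw flag bits, then the combined flags they imply. */
static void
decode(uint16_t infomask)
{
    printf("0x%04X: raw {", (unsigned) infomask);
    if (infomask & HEAP_XMAX_KEYSHR_LOCK)
        printf(" HEAP_XMAX_KEYSHR_LOCK");
    if (infomask & HEAP_XMAX_EXCL_LOCK)
        printf(" HEAP_XMAX_EXCL_LOCK");
    if (infomask & HEAP_XMIN_COMMITTED)
        printf(" HEAP_XMIN_COMMITTED");
    if (infomask & HEAP_XMIN_INVALID)
        printf(" HEAP_XMIN_INVALID");
    if (infomask & HEAP_XMAX_INVALID)
        printf(" HEAP_XMAX_INVALID");
    if (infomask & HEAP_MOVED_OFF)
        printf(" HEAP_MOVED_OFF");
    if (infomask & HEAP_MOVED_IN)
        printf(" HEAP_MOVED_IN");
    printf(" }, combined {");
    if ((infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_FROZEN)
        printf(" HEAP_XMIN_FROZEN");
    if ((infomask & HEAP_XMAX_SHR_LOCK) == HEAP_XMAX_SHR_LOCK)
        printf(" HEAP_XMAX_SHR_LOCK");
    if ((infomask & HEAP_MOVED) == HEAP_MOVED)
        printf(" HEAP_MOVED");
    printf(" }\n");
}

int
main(void)
{
    decode(2816);       /* the VACUUM FREEZE tuple above (0x0B00) */
    decode(0x0050);     /* -> HEAP_XMAX_SHR_LOCK */
    decode(0x0300);     /* -> HEAP_XMIN_FROZEN */
    decode(0xC000);     /* -> HEAP_MOVED */
    return 0;
}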

Re: Compiler warnings with MinGW

2019-09-17 Thread Michael Paquier
On Tue, Sep 17, 2019 at 12:00:39PM +0200, Peter Eisentraut wrote:
> committed

Thanks, Peter.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] CLUSTER command progress monitor

2019-09-17 Thread Michael Paquier
On Sat, Sep 14, 2019 at 11:45:47AM +0900, Michael Paquier wrote:
> I have provided a short summary of the two issues on the open item
> page (https://wiki.postgresql.org/wiki/PostgreSQL_12_Open_Items) as
> the open item was too much evasive.  Here is a copy-paste for the
> archives of what I wrote:
> 1) A progress may be started while another one is already in progress.
> Hence, if progress gets stopped the previously-started state is
> removed, causing all follow-up updates to not happen.
> 2) Progress updates happening in a code path shared between those
> three commands may clobber a previous state present.
> 
> Regarding 1) and based on what I found in the code, you can blame
> REINDEX reporting which has added progress_start calls in code paths
> which are also taken by CREATE INDEX and CLUSTER, causing their
> progress reporting to go to the void.  In order to fix this one we
> could do what I summarized in [1].
> 
> [1]: https://www.postgresql.org/message-id/20190905010316.gb14...@paquier.xyz

So, with the clock ticking and the release getting close by, what do
we do for this set of issues?  REINDEX, CREATE INDEX and CLUSTER all
try to build indexes and the current infrastructure is not really
adapted to hold all that.  Robert, Alvaro and Peter E, do you have any
comments to offer?
--
Michael


signature.asc
Description: PGP signature


PostgreSQL 12 RC1 + GA Dates

2019-09-17 Thread Michael Paquier
Hi all,

Based on the current status of the open items and where we are at in
the release cycle, the date for the first release candidate of
PostgreSQL 12 will be 2019-09-26.

If all goes well with RC1, the PostgreSQL 12.0 GA release will be
2019-10-03.  This is subject to change if we find any issues during
the RC1 period that indicate we need to make an additional release
candidate prior to going GA.

To the entire community, patch authors, reviewers and committers,
thank you for all of your hard work on developing PostgreSQL 12!

On behalf of the RMT,
--
Michael


signature.asc
Description: PGP signature


Re: refactoring - share str2*int64 functions

2019-09-17 Thread Michael Paquier
On Tue, Sep 17, 2019 at 11:29:13AM +0900, Michael Paquier wrote:
> In order to unify the parsing interface and put all the conversion
> routines in a single place, I still think that the patch has value so
> I would still keep it (with a fix for the queryId parsing of course),
> but there is much more to it.

As of now, here is an updated patch which takes the path to not
complicate the refactored APIs and fixes the issue with queryID in
readfuncs.c.  Thoughts?
--
Michael
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 221b47298c..7af05fc009 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -1022,7 +1022,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 		/* parse command tag to retrieve the number of affected rows. */
 		if (completionTag &&
 			strncmp(completionTag, "COPY ", 5) == 0)
-			rows = pg_strtouint64(completionTag + 5, NULL, 10);
+			rows = pg_strtouint64_check(completionTag + 5);
 		else
 			rows = 0;
 
diff --git a/contrib/spi/refint.c b/contrib/spi/refint.c
index adf0490f85..b063f1e122 100644
--- a/contrib/spi/refint.c
+++ b/contrib/spi/refint.c
@@ -306,7 +306,7 @@ check_foreign_key(PG_FUNCTION_ARGS)
 		/* internal error */
 		elog(ERROR, "check_foreign_key: too short %d (< 5) list of arguments", nargs);
 
-	nrefs = pg_strtoint32(args[0]);
+	nrefs = pg_strtoint32_check(args[0]);
 	if (nrefs < 1)
 		/* internal error */
 		elog(ERROR, "check_foreign_key: %d (< 1) number of references specified", nrefs);
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 2c0ae395ba..1272b4ee04 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2338,8 +2338,7 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
 	CreateTableAsStmt *ctastmt = (CreateTableAsStmt *) stmt->utilityStmt;
 
 	if (strncmp(completionTag, "SELECT ", 7) == 0)
-		_SPI_current->processed =
-			pg_strtouint64(completionTag + 7, NULL, 10);
+		_SPI_current->processed = pg_strtouint64_check(completionTag + 7);
 	else
 	{
 		/*
@@ -2361,8 +2360,7 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
 else if (IsA(stmt->utilityStmt, CopyStmt))
 {
 	Assert(strncmp(completionTag, "COPY ", 5) == 0);
-	_SPI_current->processed = pg_strtouint64(completionTag + 5,
-			 NULL, 10);
+	_SPI_current->processed = pg_strtouint64_check(completionTag + 5);
 }
 			}
 
diff --git a/src/backend/libpq/pqmq.c b/src/backend/libpq/pqmq.c
index a9bd47d937..ac2bf2f2c4 100644
--- a/src/backend/libpq/pqmq.c
+++ b/src/backend/libpq/pqmq.c
@@ -286,10 +286,10 @@ pq_parse_errornotice(StringInfo msg, ErrorData *edata)
 edata->hint = pstrdup(value);
 break;
 			case PG_DIAG_STATEMENT_POSITION:
-edata->cursorpos = pg_strtoint32(value);
+edata->cursorpos = pg_strtoint32_check(value);
 break;
 			case PG_DIAG_INTERNAL_POSITION:
-edata->internalpos = pg_strtoint32(value);
+edata->internalpos = pg_strtoint32_check(value);
 break;
 			case PG_DIAG_INTERNAL_QUERY:
 edata->internalquery = pstrdup(value);
@@ -316,7 +316,7 @@ pq_parse_errornotice(StringInfo msg, ErrorData *edata)
 edata->filename = pstrdup(value);
 break;
 			case PG_DIAG_SOURCE_LINE:
-edata->lineno = pg_strtoint32(value);
+edata->lineno = pg_strtoint32_check(value);
 break;
 			case PG_DIAG_SOURCE_FUNCTION:
 edata->funcname = pstrdup(value);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 764e3bb90c..4ed69f3d2b 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -80,7 +80,7 @@
 #define READ_UINT64_FIELD(fldname) \
 	token = pg_strtok();		/* skip :fldname */ \
 	token = pg_strtok();		/* get field value */ \
-	local_node->fldname = pg_strtouint64(token, NULL, 10)
+	local_node->fldname = strtouint64(token)
 
 /* Read a long integer field (anything written as ":fldname %ld") */
 #define READ_LONG_FIELD(fldname) \
@@ -189,6 +189,27 @@
 #define nullable_string(token,length)  \
 	((length) == 0 ? NULL : debackslash(token, length))
 
+/*
+ * pg_strtouint64
+ *	  Converts 'str' into an unsigned 64-bit integer.
+ *
+ * This has an identical API to strtoul(3) with base 10, except that
+ * it will handle 64-bit ints even where "long" is narrower than that.
+ *
+ * This has the advantage to not consider as a syntax failure any characters
+ * present in the string after the conversion is done.
+ */
+static uint64
+strtouint64(const char *str)
+{
+#ifdef _MSC_VER/* MSVC only */
+   return _strtoui64(str, NULL, 10);
+#elif defined(HAVE_STRTOULL) && SIZEOF_LONG < 8
+   return strtoull(str, NULL, 10);
+#else
+   return strtoul(str, NULL, 10);
+#endif
+}
 
 /*
  * _readBitmapset
diff --git a/src/backend/parser/parse_node.c b/src/backend/parser/parse_node.c
index 1baf7ef31f..cc846a71f1 100644
--- 
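
As a standalone illustration of the conversion being centralized here --
pulling the row count out of a completion tag such as "COPY 4567", where
text after the digits must not be treated as a syntax error -- consider the
following; it is a hypothetical example, not PostgreSQL code:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(void)
{
    const char *completionTag = "COPY 4567";
    uint64_t    rows = 0;

    if (strncmp(completionTag, "COPY ", 5) == 0)
        rows = strtoull(completionTag + 5, NULL, 10);   /* stops at first non-digit */

    printf("rows = %" PRIu64 "\n", rows);   /* prints: rows = 4567 */
    return 0;
}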

Re: Define jsonpath functions as stable

2019-09-17 Thread Jonathan S. Katz
On 9/17/19 6:40 PM, Tom Lane wrote:
> "Jonathan S. Katz"  writes:
>> v2 attached. Thanks!
> 
> I whacked this around some (well, quite a bit actually);

So I see :) Thanks.

> notably,
> I thought we'd better describe things that are in our engine but
> not XQuery, as well as vice-versa.

Yeah, that makes sense. Overall it reads really well. One question I had
in my head (and probably should have asked) was answered around the \w
character class wrt collation.

> After a re-read of the XQuery spec, it seems to me that the character
> entry form that they have and we don't is actually 

Re: subscriptionCheck failures on nightjar

2019-09-17 Thread Michael Paquier
On Tue, Sep 17, 2019 at 09:45:10PM +0200, Tomas Vondra wrote:
> FWIW I agree with Andres that there probably is an actual bug. The file
> should not just disappear like this, it's clearly unexpected so the
> PANIC does not seem entirely inappropriate.

Agreed.

> I've tried reproducing the  issue on my local systems, with the extra
> sleeps between fsyncs and so on, but I haven't managed to trigger it so
> far :-(

On my side, I have let this thing run for a couple of hours with a
patched version to include a sleep between the rename and the sync but
I could not reproduce it either:
#!/bin/bash
attempt=0
while true; do
attempt=$((attempt+1))
echo "Attempt $attempt"
cd $HOME/postgres/src/test/recovery/
PROVE_TESTS=t/006_logical_decoding.pl make check > /dev/null 2>&1
ERRNUM=$?
if [ $ERRNUM != 0 ]; then
echo "Failed at attempt $attempt"
exit $ERRNUM
fi
done
> Yes, it should be moved to the older section - it's clearly a v11 bug.

And agreed.
--
Michael


signature.asc
Description: PGP signature


scorpionfly needs more semaphores

2019-09-17 Thread Thomas Munro
Hi,

We're seeing occasional failures like this:

running bootstrap script ... 2019-09-13 12:11:26.882 PDT [64926]
FATAL:  could not create semaphores: No space left on device
2019-09-13 12:11:26.882 PDT [64926] DETAIL:  Failed system call was
semget(5728001, 17, 03600).

I think you should switch to using "named" POSIX semaphores by
building with USE_NAMED_POSIX_SEMAPHORES (then it'll create a
squillion little files under /tmp and mmap() them), or increase the
number of SysV semaphores you can create with sysctl[1], or finish
writing your operating system[2] so you can switch to "unnamed" POSIX
semaphores :-)

[1] https://www.postgresql.org/message-id/flat/27582.1546928073%40sss.pgh.pa.us
[2] https://github.com/openbsd/src/blob/master/lib/librthread/rthread_sem.c#L112

-- 
Thomas Munro
https://enterprisedb.com




Re: Define jsonpath functions as stable

2019-09-17 Thread Tom Lane
"Jonathan S. Katz"  writes:
> v2 attached. Thanks!

I whacked this around some (well, quite a bit actually); notably,
I thought we'd better describe things that are in our engine but
not XQuery, as well as vice-versa.

After a re-read of the XQuery spec, it seems to me that the character
entry form that they have and we don't is actually 

Re: Attempt to consolidate reading of XLOG page

2019-09-17 Thread Alvaro Herrera
I was confused by the struct name XLogSegment -- the struct is used to
represent a WAL segment while it's kept open, rather than just a WAL
segment in abstract.  Also, now that we've renamed everything to use the
term WAL, it seems wrong to use the name XLog for new structs.  I
propose the name WALOpenSegment for the struct, which solves both
problems.  (Its initializer function would get the name
WALOpenSegmentInit.)

Now, the patch introduces a callback for XLogRead, the type of which is
called XLogOpenSegment.  If we rename it from XLog to WAL, both names
end up the same.  I propose to rename the function type to
WALSegmentOpen, which in a "noun-verb" view of the world, represents the
action of opening a WAL segment.

I attach a patch for all this renaming, on top of your series.

I wonder if each of those WALSegmentOpen callbacks should reset [at
least some members of] the struct; they're already in charge of setting
->file, and apparently we're leaving the responsibility of setting the
rest of the members to XLogRead.  That seems weird.  Maybe we should say
that the CB should only open the segment and not touch the struct at all
and XLogRead is in charge of everything.  Perhaps the other way around
-- the CB should set everything correctly ... I'm not sure which is
best.  But having half here and half there seems a recipe for confusion
and bugs.


Another thing I didn't like much is that everything seems to assume that
the only error possible from XLogRead is a read error.  Maybe that's
okay, because it seems to be the current reality, but it seemed odd.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 2a80bc5823..3e03d72a18 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -100,7 +100,7 @@ XLogReaderAllocate(int wal_segment_size, XLogPageReadCB pagereadfunc,
 	}
 
 	/* Initialize segment pointer. */
-	XLogSegmentInit(&state->seg, wal_segment_size);
+	WALOpenSegmentInit(&state->seg, wal_segment_size);
 
 	state->read_page = pagereadfunc;
 	/* system_identifier initialized to zeroes above */
@@ -1006,7 +1006,7 @@ out:
  * Initialize the passed segment pointer.
  */
 void
-XLogSegmentInit(XLogSegment *seg, int size)
+WALOpenSegmentInit(WALOpenSegment *seg, int size)
 {
 	seg->file = -1;
 	seg->num = 0;
@@ -1032,7 +1032,7 @@ XLogSegmentInit(XLogSegment *seg, int size)
  */
 bool
 XLogRead(char *buf, XLogRecPtr startptr, Size count,
-		 TimeLineID *tli, XLogSegment *seg, XLogOpenSegment openSegment)
+		 TimeLineID *tli, WALOpenSegment *seg, WALSegmentOpen openSegment)
 {
 	char	   *p;
 	XLogRecPtr	recptr;
@@ -1120,7 +1120,7 @@ XLogRead(char *buf, XLogRecPtr startptr, Size count,
  * Backend-specific code to handle errors encountered by XLogRead().
  */
 void
-XLogReadProcessError(XLogSegment *seg)
+XLogReadProcessError(WALOpenSegment *seg)
 {
 	if (errno != 0)
 	{
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 5b0ba39f17..2b9758c3ce 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -776,13 +776,15 @@ XLogReadDetermineTimeline(XLogReaderState *state, XLogRecPtr wantPage, uint32 wa
  */
 static void
 read_local_xlog_page_open_segment(XLogSegNo nextSegNo, TimeLineID *tli,
-  XLogSegment *seg)
+  WALOpenSegment *seg)
 {
 	char		path[MAXPGPATH];
 
 	XLogFilePath(path, *tli, nextSegNo, seg->size);
 	seg->file = BasicOpenFile(path, O_RDONLY | PG_BINARY);
 
+	/* XXX should we clear the other fields of WALOpenSegment?? */
+
 	if (seg->file < 0)
 	{
 		if (errno == ENOENT)
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 072d46d853..a5ca834020 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -128,7 +128,7 @@ bool		log_replication_commands = false;
  */
 bool		wake_wal_senders = false;
 
-static XLogSegment *sendSeg = NULL;
+static WALOpenSegment *sendSeg = NULL;
 
 /*
  * These variables keep track of the state of the timeline we're currently
@@ -248,7 +248,7 @@ static TimeOffset LagTrackerRead(int head, XLogRecPtr lsn, TimestampTz now);
 static bool TransactionIdInRecentPast(TransactionId xid, uint32 epoch);
 
 static void WalSndOpenSegment(XLogSegNo nextSegNo, TimeLineID *tli,
-			  XLogSegment *seg);
+			  WALOpenSegment *seg);
 
 
 /* Initialize walsender process before entering the main command loop */
@@ -279,8 +279,9 @@ InitWalSender(void)
 	lag_tracker = MemoryContextAllocZero(TopMemoryContext, sizeof(LagTracker));
 
 	/* Make sure we can remember the current read position in XLOG. */
-	sendSeg = (XLogSegment *) MemoryContextAlloc(TopMemoryContext, sizeof(XLogSegment));
-	XLogSegmentInit(sendSeg, wal_segment_size);
+	sendSeg = (WALOpenSegment *)
+		
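
One way to express the "callback initializes everything" option discussed
above, based on the WALOpenSegment fields visible in the patch (file, num,
size). The function name is invented and this is not part of the posted
series:

#include "postgres.h"

#include <fcntl.h>

#include "access/xlog_internal.h"
#include "access/xlogreader.h"
#include "storage/fd.h"

static void
wal_segment_open_full_init(XLogSegNo nextSegNo, TimeLineID *tli,
                           WALOpenSegment *seg)
{
    char        path[MAXPGPATH];

    XLogFilePath(path, *tli, nextSegNo, seg->size);

    /* The callback, not XLogRead(), owns the whole struct. */
    seg->file = BasicOpenFile(path, O_RDONLY | PG_BINARY);
    seg->num = nextSegNo;
    /* seg->size stays as set by WALOpenSegmentInit() */

    if (seg->file < 0)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not open file \"%s\": %m", path)));
}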

Re: some PostgreSQL 12 release notes comments

2019-09-17 Thread Jonathan S. Katz
On 9/17/19 4:10 PM, Peter Eisentraut wrote:
> On 2019-09-17 21:55, Jonathan S. Katz wrote:
>>> * Encrypted TCP/IP connections using GSSAPI encryption
>>
>> +1, though I would s/GSSAPI encryption/ with s/GSSAPI authentication/
> 
> But you're not encrypting the communication using GSSAPI authentication,
> you're encrypting it using GSSAPI encryption.

Ah, I also did a s/using/for/ in my head, but that's still not correct.
+1 to your suggested wording with no alterations.

 * Discovery of LDAP servers if PostgreSQL is built with OpenLDAP
>>>
>>> I would remove the "if" part from the major features list, since it's a
>>> qualification of minor importance.  Instead I'd write something like
>>>
>>> * Discovery of LDAP servers using DNS SRV
>>>
>>> which is a clearer concept that people can easily recognize.
>>
>> I agree it's clearer, I'm not sure if the OpenLDAP semantic above
>> changes things? I'm not sure the relative frequency of PostgreSQL being
>> built with OpenLDAP vs. other LDAP libs.
> 
> I suppose it's not-Windows vs. Windows.
> 
> It's OK if we mention OpenLDAP in the release notes, but it doesn't seem
> to belong in the major features section.

Well, if you're a Windows user, you may then be disappointed later on to
find out it does not work :)

That said, I'm ok with the more concise wording.

Jonathan



signature.asc
Description: OpenPGP digital signature


Re: some PostgreSQL 12 release notes comments

2019-09-17 Thread Tom Lane
Peter Eisentraut  writes:
> * Add GSSAPI encryption support (Robbie Harwood, Stephen Frost)

>   This allows TCP/IP connections to be encrypted when using GSSAPI
>   authentication without having to set up a separate encryption facility
>   like SSL.

Hmm, does that imply that you don't have to have compiled --with-openssl,
or just that you don't have to bother with setting up SSL certificates?
But you already don't have to do the latter.  I'd be the first to admit
that I know nothing about GSSAPI, but this text still doesn't enlighten
me about why I should learn.

>> * Discovery of LDAP servers if PostgreSQL is built with OpenLDAP

> I would remove the "if" part from the major features list, since it's a
> qualification of minor importance.

OK

>> * Allow data type name to use non-C collations

> I'm not sure why that is listed in the "Migration" section.

I think Bruce was reacting to this comment in the commit log for
478cacb50:

Prior to v12, if you used a collation-sensitive regex feature in a
pattern handled by processSQLNamePattern() (for instance, \d '\\w+'
in psql), the behavior you got matched the database's default collation.
Since commit 586b98fdf you'd usually get C-collation behavior, because
the catalog "name"-type columns are now marked as COLLATE "C".  Add
explicit COLLATE specifications to restore the prior behavior.

(Note for whoever writes the v12 release notes: the need for this shows
that while 586b98fdf preserved pre-v12 behavior of "name" columns for
simple comparison operators, it changed the behavior of regex operators
on those columns.  Although this patch fixes it for pattern matches
generated by our own tools, user-written queries will still be affected.
So we'd better mention this issue as a compatibility item.)

The existing text for this item doesn't make that aspect clear
enough, perhaps.

regards, tom lane




Re: another look at macOS SIP

2019-09-17 Thread Tom Lane
Peter Eisentraut  writes:
> On 2019-09-17 21:43, Tom Lane wrote:
>> Yeah, that's what I was suggesting.  "Use another copy of Perl" doesn't
>> seem like an acceptable answer, or at least it's hardly better than
>> "turn off SIP".

> In my mind, the Perl aspect of this is the most trivial part of the
> problem.  "brew install perl" is probably faster than writing out this
> email.  But I suppose everyone has their own workflows.

There's a not-insignificant contingent that don't wish to install
either homebrew or macports.

regards, tom lane




Re: Allow an alias to be attached directly to a JOIN ... USING

2019-09-17 Thread Peter Eisentraut
On 2019-09-17 19:37, Alvaro Herrera wrote:
> On 2019-Aug-01, Thomas Munro wrote:
> 
>> Indeed, that seems like a problem, and it's a good question.  You can
>> see this on unpatched master with SELECT x.filler FROM
>> (pgbench_tellers AS t JOIN b USING (bid)) AS x.
> 
> I'm not sure I understand why that problem is a blocker for this patch.

I tried to analyze the spec for what the behavior should be here, but I
got totally lost.  I'll give it another look.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: another look at macOS SIP

2019-09-17 Thread Peter Eisentraut
On 2019-09-17 21:43, Tom Lane wrote:
> Yeah, that's what I was suggesting.  "Use another copy of Perl" doesn't
> seem like an acceptable answer, or at least it's hardly better than
> "turn off SIP".

In my mind, the Perl aspect of this is the most trivial part of the
problem.  "brew install perl" is probably faster than writing out this
email.  But I suppose everyone has their own workflows.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: ERROR: multixact X from before cutoff Y found to be still running

2019-09-17 Thread Bossart, Nathan
On 9/6/19, 10:26 AM, "Robert Haas"  wrote:
> On Thu, Sep 5, 2019 at 4:08 PM Bossart, Nathan  wrote:
>> Right, the v2 patch will effectively ramp-down the freezemin as your
>> freeze_max_age gets smaller, while the v1 patch will set the effective
>> freezemin to zero as soon as your multixact age passes the threshold.
>> I think what is unclear to me is whether this ramp-down behavior is
>> the intended functionality or we should be doing something similar to
>> what we do for regular transaction IDs (i.e. force freezemin to zero
>> right after it hits the "oldest xmin is far in the past" threshold).
>> The comment above MultiXactMemberFreezeThreshold() explains things
>> pretty well, but AFAICT it is more geared towards influencing
>> autovacuum scheduling.  I agree that v2 is safer from the standpoint
>> that it changes as little as possible, though.
>
> I don't presently have a view on fixing the actual bug here, but I can
> certainly confirm that I intended MultiXactMemberFreezeThreshold() to
> ratchet up the pressure gradually rather than all at once, and my
> suspicion is that this behavior may be good to retain, but I'm not
> sure.

Thanks for the detailed background information.  FWIW I am now in
favor of the v2 patch.

Nathan



Re: some PostgreSQL 12 release notes comments

2019-09-17 Thread Peter Eisentraut
On 2019-09-17 21:55, Jonathan S. Katz wrote:
>> * Encrypted TCP/IP connections using GSSAPI encryption
> 
> +1, though I would s/GSSAPI encryption/ with s/GSSAPI authentication/

But you're not encrypting the communication using GSSAPI authentication,
you're encrypting it using GSSAPI encryption.

>>> * Discovery of LDAP servers if PostgreSQL is built with OpenLDAP
>>
>> I would remove the "if" part from the major features list, since it's a
>> qualification of minor importance.  Instead I'd write something like
>>
>> * Discovery of LDAP servers using DNS SRV
>>
>> which is a clearer concept that people can easily recognize.
> 
> I agree it's clearer, I'm not sure if the OpenLDAP semantic above
> changes things? I'm not sure the relative frequency of PostgreSQL being
> built with OpenLDAP vs. other LDAP libs.

I suppose it's not-Windows vs. Windows.

It's OK if we mention OpenLDAP in the release notes, but it doesn't seem
to belong in the major features section.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Small const correctness patch

2019-09-17 Thread Peter Eisentraut
On 2019-08-08 08:46, Mark G wrote:
> Please find attached a trivial patch making a few arrays const (in
> addition to the data they point to).

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[WIP] UNNEST(REFCURSOR): allowing SELECT to consume data from a REFCURSOR

2019-09-17 Thread Dent John
Hi folks,

Prompted originally by a post by Roman Pekar [1], I wanted to share a
revised version of a patch that allows REFCURSOR results to be consumed as
data in a regular SQL query, as well as my thoughts on how to improve the
area as a whole.

In order to be clear about the purpose and how I see it fitting into a
broader context, I’ve started a new thread and I’d welcome discussion
about it.

Background
----------

The ambition of this contribution is to make PostgreSQL able to
efficiently support procedural language functions that either produce or
consume sets (or both).

PostgreSQL already has some support for this in functions that return
SETOFs. However, as my review [3] identified, there are some gaps in
PostgreSQL’s current capability, as well as scope for extension to
improve its overall capability.

This first patch addresses only a small part of the overall ambition, but
I wanted to share both the patch and the overall ambition as work in
progress, and I’d welcome comments on both. (The patch is still based on
12beta2.)

Problems to be solved
---------------------

1. Allow procedural languages (PLs) to efficiently consume sets of records

PostgreSQL does allow PL functions to consume sets, however it does so by
feeding records to the function, one row per function invocation.
REFCURSORs, however, can be supplied as input parameters and their content
consumed by the function as it wishes, but only if the PL supports the
REFCURSOR concept.

Typically, PLs do allow SQL queries to be executed within the PL function
[5, 6, 7]. However REFCURSOR results cannot be effectively consumed in a
regular SQL SELECT, significantly limiting their use.

2. Allow plpgsql functions to efficiently return sets of records

By ‘efficiently’, I mean that a large result set should not be required to
be staged before the executor is able to process it. Staging is not an
issue for small sets, but for large sets, and especially if they are
subject to further processing, intermediate staging is a performance
disadvantage.

PostgreSQL already has some support for this in functions that return
SETOFs. At present, plpgsql cannot take advantage of this support while
also achieving the efficiency criteria because, as the documentation [4]
notes, all returned data is staged before it is returned.

Addressing this limitation could also be of benefit to other PLs, however
a quick survey finds at least PL Python is already well-adapted to
efficiently return SETOFs.

3. Allow optimisation of a returned query

plpgsql offers a syntactic shortcut to return the results of a SQL query
directly. Despite appearing to return a query, the RETURN QUERY syntax
actually returns the /results/ of the query [4]. This means the optimiser
has no opportunity to influence its execution, such as by pushing down
expressions into its WHERE clause, or taking advantage of alternative
indexes to modify its sort order.

Other PLs are similarly constrained. Most PLs lack plpgsql’s syntactic
sugar, but even though some PLs are better able to efficiently return
SETOFs row-by-row, the optimiser cannot see “inside” the query that the PL
executes even if its intent is to return the results directly.

Only SQL language functions are afforded the luxury of integration into
the outer statement’s plan [8], but even SQL language functions are
somewhat constrained in the types of dynamism that are permitted.

4. Allow a set-returning query to be supplied as an input parameter

It is possible to supply a scalar value, or a function call that returns a
scalar value, as an input parameter. And, with problems 1 & 2 addressed,
sets can be supplied as input parameters. However, a literal set-returning
SQL query cannot be supplied as a parameter (not without PostgreSQL
invoking the ‘calling’ function for each row in the set). REFCURSORs
cannot be constructed natively from SQL.

A simplistic response would provide a trivial constructor for REFCURSORs,
accepting the query as a text parameter. However it is quite unnatural to
supply SQL in textual form, more especially to do so safely. So the
challenge is to allow a set-returning subquery to be provided as a
parameter in literal form.

Why are these problems important?
---------------------------------

My personal wish is for PostgreSQL to offer a feature set that is
consistent with itself and without arbitrary limitation. A better argument
might be that it is desirable to match the features of other RDBMSs [9],
or for reason of the use cases they address [10] that are new or push the
boundaries of what PostgreSQL can do [1], or that are important such as
fulfilling a DW/ETL need [11], or to more directly address approaches
touted by NoSQL such as Map Reduce [12].

Design and implementation
-------------------------

1. Set returning function (SRF) for REFCURSOR

Tackling first problems 1 and (part of) 2, it seems easy and obvious to
allow that REFCURSORs can be consumed in a SELECT query.

PostgreSQL already allows an array to be consumed one record per entry via
the UNNEST(anyarray)

Re: [HACKERS] Restricting maximum keep segments by repslots

2019-09-17 Thread Alvaro Herrera
Hello

I have a couple of API-level reservations about this patch series.

Firstly, "behind" when used as a noun refers to buttocks.  Therefore,
the ReplicationSlotsEnumerateBehinds function name seems funny (I think
when used as a preposition you wouldn't put it in plural).  I don't
suggest a substitute name, because the API itself doesn't convince me; I
think it would be sufficient to have it return a single slot name,
perhaps the one that is behind the most ... or maybe the one that is
behind the least?  This simplifies a lot of code (in particular you do
away with the bunch of statics, right?), and I don't think the warning
message loses anything, because for details the user should really look
into the monitoring view anyway.

I didn't like GetLsnAvailability() returning a string either.  It seems
more reasonable to me to define a enum with possible return states, and
have the enum value be expanded to some string in
pg_get_replication_slots().
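
For illustration, the suggestion could look roughly like this (a sketch only;
the names here are invented, not taken from the patch):

	typedef enum LsnAvailability
	{
		LSNAVAIL_STREAMING,
		LSNAVAIL_KEEPING,
		LSNAVAIL_USELESS
	} LsnAvailability;

	/* expand the enum to text only at the SQL-visible boundary */
	static const char *
	lsn_availability_name(LsnAvailability v)
	{
		switch (v)
		{
			case LSNAVAIL_STREAMING:
				return "streaming";
			case LSNAVAIL_KEEPING:
				return "keeping";
			case LSNAVAIL_USELESS:
				return "useless";
		}
		return "unknown";		/* keep the compiler quiet */
	}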

In the same function, I think that setting restBytes to -1 when
"useless" is bad style.  I would just leave that variable alone when the
returned status is not one that receives the number of bytes.  So the
caller is only entitled to read the value if the returned enum value is
such-and-such ("keeping" and "streaming" I think).

I'm somewhat uncomfortable with the API change to GetOldestKeepSegment
in 0002.  Can't its caller do the math itself instead?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: some PostgreSQL 12 release notes comments

2019-09-17 Thread Jonathan S. Katz
On 9/17/19 1:09 PM, Peter Eisentraut wrote:
>> * Client- and server-side encryption for authentication using GSSAPI
> 
> This is on the wire encryption, so I don't know why it says client-side
> and server-side.  Proposal:
> 
> * Encrypted TCP/IP connections using GSSAPI encryption

+1, though I would s/GSSAPI encryption/ with s/GSSAPI authentication/

> in the major features section, and later
> 
> * Add GSSAPI encryption support (Robbie Harwood, Stephen Frost)

Perhaps "* Add encrypted connection support for GSSAPI authentication
(Robbie Harwood, Stephen Frost)"

>   This allows TCP/IP connections to be encrypted when using GSSAPI
>   authentication without having to set up a separate encryption facility
>   like SSL.

+1.

>> * Discovery of LDAP servers if PostgreSQL is built with OpenLDAP
> 
> I would remove the "if" part from the major features list, since it's a
> qualification of minor importance.  Instead I'd write something like
> 
> * Discovery of LDAP servers using DNS SRV
> 
> which is a clearer concept that people can easily recognize.

I agree it's clearer, I'm not sure if the OpenLDAP semantic above
changes things? I'm not sure of the relative frequency of PostgreSQL being
built with OpenLDAP vs. other LDAP libs.

Regardless, I do like your change and would +1 it.

Would you like me to make a patch for it or are you planning to?

>> * Allow data type name to use non-C collations
> 
> I'm not sure why that is listed in the "Migration" section.
> 
> It's also a bit confusing as a release note item relative to PostgreSQL
> 11.  I believe the changes were that "name" was made collation aware and
> that the collation was set to "C" in the system catalogs (which is a
> separate item later).  This group of items could use a reshuffling.

I can't make an informed opinion on this one, so I defer to the experts.

Thanks!

Jonathan





Re: pgbench - allow to create partitioned tables

2019-09-17 Thread Erikjan Rijkers

On 2019-09-17 20:49, Fabien COELHO wrote:

Attached v9:

[pgbench-init-partitioned-9.patch]


Turns out this patch needed a dos2unix treatment.

It's easy to do but it takes time to figure it out (I'm dumb).  I for 
one would be happy to receive patches not so encumbered :)



thanks,

Erik Rijkers




Re: subscriptionCheck failures on nightjar

2019-09-17 Thread Tomas Vondra

On Tue, Sep 17, 2019 at 12:39:33PM -0400, Tom Lane wrote:

Robert Haas  writes:

On Mon, Aug 26, 2019 at 9:29 AM Tomas Vondra
 wrote:

This is one of the remaining open items, and we don't seem to be moving
forward with it :-(



Why exactly is this an open item, anyway?


The reason it's still here is that Andres expressed a concern that
there might be more than meets the eye in this.  What meets the eye
is that PANICing on file-not-found is not appropriate here, but Andres
seemed to think that the file not being present might reflect an
actual bug not just an expectable race condition [1].

Personally I'd be happy just to treat it as an expectable case and
fix the code to not PANIC on file-not-found.



FWIW I agree with Andres that there probably is an actual bug. The file
should not just disappear like this, it's clearly unexpected so the
PANIC does not seem entirely inappropriate.

I've tried reproducing the issue on my local systems, with the extra
sleeps between fsyncs and so on, but I haven't managed to trigger it so
far :-(


In either case, it probably belongs in the "older bugs" section;
nightjar is showing the same failure on v11 from time to time.



Yes, it should be moved to the older section - it's clearly a v11 bug.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: another look at macOS SIP

2019-09-17 Thread Tom Lane
Andres Freund  writes:
> On 2019-09-10 19:14:19 +0200, Peter Eisentraut wrote:
>> There is a minor second issue, namely that /usr/bin/perl also filters
>> out DYLD_* environment variables.  This can be worked around again by
>> using a third-party installation of Perl.

> Hm, could we just have perl code set DYLD_* again? I assume we don't
> need prove itself to have it set, and for the testscripts we could just
> set it in TestLib.pm or such?

Yeah, that's what I was suggesting.  "Use another copy of Perl" doesn't
seem like an acceptable answer, or at least it's hardly better than
"turn off SIP".

regards, tom lane




Re: pgbench - allow to create partitioned tables

2019-09-17 Thread Fabien COELHO


Attached v9:

 - remove the PART_UNKNOWN and use partitions = -1 to tell
   that there is an error, and partitions >= 1 to print info
 - use search_path to find at most one pgbench_accounts
   It still uses left join because I still think that it is appropriate.
   I added a lateral to avoid repeating the array_position call
   to manage the search_path, and use explicit pg_catalog everywhere.
 - left the wrongly repeated test name as is
 - somehow use pg_checksums tablespace creation method, however:
   - I kept testing that mkdir succeeds
   - I kept escaping single quotes, if the path contains a "'"
   so the only difference is that on some msys platform it may
   avoid some unclear issue.
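
For reference, with the patch applied the new switches would be exercised at
initialization time along these lines (hypothetical invocation; scale,
partition count and database name are made up):

  pgbench -i -s 100 --partitions=16 --partition-method=hash bench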

--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index c857aa3cba..e3a0abb4c7 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -306,6 +306,31 @@ pgbench  options  d
   
  
 
+ 
+  --partitions=NUM
+  
+   
+Create a partitioned pgbench_accounts table with
+NUM partitions of nearly equal size for
+the scaled number of accounts.
+Default is 0, meaning no partitioning.
+   
+  
+ 
+
+ 
+  --partition-method=NAME
+  
+   
+Create a partitioned pgbench_accounts table with
+NAME method.
+Expected values are range or hash.
+This option requires that --partitions is set to non-zero.
+If unspecified, default is range.
+   
+  
+ 
+
  
   --tablespace=tablespace
   
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index ed7652bfbf..0385932208 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -186,6 +186,15 @@ int64		latency_limit = 0;
 char	   *tablespace = NULL;
 char	   *index_tablespace = NULL;
 
+/* partitioning for pgbench_accounts table, 0 for no partitioning, -1 for bad */
+static int 		partitions = 0;
+
+typedef enum { PART_NONE, PART_RANGE, PART_HASH }
+  partition_method_t;
+
+static partition_method_t partition_method = PART_NONE;
+static const char *PARTITION_METHOD[] = { "none", "range", "hash" };
+
 /* random seed used to initialize base_random_sequence */
 int64		random_seed = -1;
 
@@ -617,6 +626,9 @@ usage(void)
 		   "  --foreign-keys   create foreign key constraints between tables\n"
 		   "  --index-tablespace=TABLESPACE\n"
 		   "   create indexes in the specified tablespace\n"
+		   "  --partitions=NUM partition pgbench_accounts in NUM parts (default: 0)\n"
+		   "  --partition-method=(range|hash)\n"
+		   "   partition pgbench_accounts with this method (default: range)\n"
 		   "  --tablespace=TABLESPACE  create tables in the specified tablespace\n"
 		   "  --unlogged-tablescreate tables as unlogged tables\n"
 		   "\nOptions to select what to run:\n"
@@ -3601,6 +3613,17 @@ initDropTables(PGconn *con)
 	 "pgbench_tellers");
 }
 
+/*
+ * add fillfactor percent option if not 100.
+ */
+static void
+append_fillfactor(char *opts, int len)
+{
+	if (fillfactor < 100)
+		snprintf(opts + strlen(opts), len - strlen(opts),
+ " with (fillfactor=%d)", fillfactor);
+}
+
 /*
  * Create pgbench's standard tables
  */
@@ -3664,9 +3687,15 @@ initCreateTables(PGconn *con)
 
 		/* Construct new create table statement. */
 		opts[0] = '\0';
-		if (ddl->declare_fillfactor)
+
+		/* Partition pgbench_accounts table */
+		if (partitions >= 1 && strcmp(ddl->table, "pgbench_accounts") == 0)
 			snprintf(opts + strlen(opts), sizeof(opts) - strlen(opts),
-	 " with (fillfactor=%d)", fillfactor);
+	 " partition by %s (aid)", PARTITION_METHOD[partition_method]);
+		else if (ddl->declare_fillfactor)
+			/* fillfactor is only expected on actual tables */
+			append_fillfactor(opts, sizeof(opts));
+
 		if (tablespace != NULL)
 		{
 			char	   *escape_tablespace;
@@ -3686,6 +3715,57 @@ initCreateTables(PGconn *con)
 
 		executeStatement(con, buffer);
 	}
+
+	/* if needed, pgbench_accounts partitions must be created manually */
+	if (partitions >= 1)
+	{
+		char		ff[64];
+
+		ff[0] = '\0';
+		append_fillfactor(ff, sizeof(ff));
+
+		fprintf(stderr, "creating %d partitions...\n", partitions);
+
+		for (int p = 1; p <= partitions; p++)
+		{
+			char		query[256];
+
+			if (partition_method == PART_RANGE)
+			{
+int64		part_size = (naccounts * (int64) scale + partitions - 1) / partitions;
+char		minvalue[32], maxvalue[32];
+
+/* For RANGE, we use open-ended partitions at the beginning and end */
+if (p == 1)
+	sprintf(minvalue, "minvalue");
+else
+	sprintf(minvalue, INT64_FORMAT, (p - 1) * part_size + 1);
+
+if (p < partitions)
+	sprintf(maxvalue, INT64_FORMAT, p * part_size + 1);
+else
+	sprintf(maxvalue, "maxvalue");
+
+snprintf(query, sizeof(query),
+		 "create%s table pgbench_accounts_%d\n"
+		 "  partition of pgbench_accounts\n"
+		 "  

Re: Implementing Incremental View Maintenance

2019-09-17 Thread Paul Draper
Have you had any thoughts for more than two joined tables?

Either there needs to be a quadratic number of joins, or intermediate join
results need to be stored and reused.
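
For concreteness, one standard way to write the delta for three joined
relations A, B, C (my notation, not from the patch), writing A' = A + delta_A
for the post-modification state, is:

  delta_mv = delta_A x B       x C
           + A'      x delta_B x C
           + A'      x B'      x delta_C

Each of the N terms is itself an (N-1)-way join, which is where the roughly
quadratic number of joins comes from, unless the intermediate join results are
stored and reused.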

On Tue, Sep 17, 2019 at 8:50 AM Yugo Nagata  wrote:

> Hi Paul,
>
> Thank you for your suggestion.
>
> On Sun, 15 Sep 2019 11:52:22 -0600
> Paul Draper  wrote:
>
> > As I understand it, the current patch performs immediate IVM using AFTER
> > STATEMENT trigger transition tables.
> >
> > However, multiple tables can be modified *before* AFTER STATEMENT
> triggers
> > are fired.
> >
> > CREATE TABLE example1 (a int);
> > CREATE TABLE example2 (a int);
> >
> > CREATE INCREMENTAL MATERIALIZED VIEW mv AS
> > SELECT example1.a, example2.a
> > FROM example1 JOIN example2 ON a;
> >
> > WITH
> >   insert1 AS (INSERT INTO example1 VALUES (1)),
> >   insert2 AS (INSERT INTO example2 VALUES (1))
> > SELECT NULL;
> >
> > Changes to example1 are visible in an AFTER STATEMENT trigger on
> example2,
> > and vice versa. Would this not result in the (1, 1) tuple being
> > "double-counted"?
> >
> > IVM needs to either:
> >
> > (1) Evaluate deltas "serially' (e.g. EACH ROW triggers)
> >
> > (2) Have simultaneous access to multiple deltas:
> > delta_mv = example1 x delta_example2 + example2 x delta_example1 -
> > delta_example1 x delta_example2
> >
> > This latter method is the "logged" approach that has been discussed for
> > deferred evaluation.
> >
> > tl;dr It seems that AFTER STATEMENT triggers required a deferred-like
> > implementation anyway.
>
> You are right, the latest patch doesn't support the situation where
> multiple tables are modified in a query. I noticed this when working
> on self-join, which also virtually needs to handle multiple-table
> modification.
>
> I am now working on this issue and the next patch will be able to handle
> this situation. I plan to submit the patch during this month. Roughly
> speaking, in the new implementation, AFTER STATEMENT triggers are used to
> collect information about modified tables and their changes (= transition
> tables),
> and then only the last trigger updates the view. This will avoid the
> double-counting. I think this implementation would also be a base for a
> deferred approach implementation in the future, where "logs" are used instead
> of transition tables.
>
> Regards,
> Yugo Nagata
>
> --
> Yugo Nagata 
>


Re: Psql patch to show access methods info

2019-09-17 Thread Alvaro Herrera
It seems strange that there's a way to display AMs, and a way to display
ops and procs in an opfamily; but there's no way to list what opfamilies
exist (possibly given an AM as pattern).  Should we add that too?  We
had \dAf in the original submission, but that seems to have been lost along
the way, not sure why.

I think \dAf is just as critical as \dAo; the former lets you know which
opfamilies you can use in CREATE INDEX, while the latter lets you know
which operators would be helped by such an index.  (But, really, only if
the opfamily name is printed in \d of the index, which we currently
don't print unless it's non-default ... which is an omission that
perhaps we should consider fixing).

On the other hand, from a user perspective, what you really want to know
is: what opfamilies exist for datatype T, and what operators are
supported by the opfamily I have chosen?  The current patch doesn't
really help you find that out.
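
(For what it's worth, answering the first question today means writing a
catalog query by hand, something along these lines:

  SELECT am.amname, opf.opfname
    FROM pg_opfamily opf
    JOIN pg_am am ON am.oid = opf.opfmethod
    JOIN pg_amop amop ON amop.amopfamily = opf.oid
   WHERE amop.amoplefttype = 'int4'::regtype
   GROUP BY am.amname, opf.opfname
   ORDER BY 1, 2;

which is exactly the sort of thing a \dAf-style command could hide.)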

I think \dAp isn't terribly informative from a user perspective.  The
support procs are just an opfamily implementation detail.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Define jsonpath functions as stable

2019-09-17 Thread Jonathan S. Katz
On 9/17/19 12:09 PM, Erik Rijkers wrote:
> On 2019-09-17 17:38, Jonathan S. Katz wrote:
>> [regex.patch]

Thanks for the review!

> "Several other parts of the SQL standard
> also define LIKE_REGEX equivalents that refer
> to this implementation, including the
> SQL/JSON path like_regex filter."
> 
> As I understand this text, 'concept' seems better.
> I'd drop 'also', too.

I rewrote this to be:

"Several other parts of the SQL standard refer to the LIKE_REGEX
specification to define similar operations, including..."

> 2.
> 'whereas the POSIX will those'  should be
> 'whereas POSIX will regard those'
>  or maybe 'read those'

I used "treat those"

> 
> 3.
> + The SQL/JSON standard borrows its definition for how regular
> expressions
> + from the LIKE_REGEX operator, which in turns
> uses the
> + XQuery standard.
> That sentence needs the verb 'work', no?  'for how regular expressions
> work [..]'
> Or alternatively drop 'how'.

I dropped the "how".

v2 attached. Thanks!
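
(For anyone following along, the like_regex example used in the attached docs
can be tried directly with the v12 jsonpath functions, e.g.:

  SELECT jsonb_path_query('["apple", "pear", "orange"]'::jsonb,
                          '$[*] ? (@ like_regex "^[aeiou]" flag "i")');

which returns "apple" and "orange" but not "pear".)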

Jonathan
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2b4fe0cb59..c867ea13de 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -5968,6 +5968,88 @@ SELECT regexp_match('abc01234xyz', 
'(?:(.*?)(\d+)(.*)){1,1}');


 
+   
+   Differences with LIKE_REGEX
+
+   
+The LIKE_REGEX operator is specified starting with
+the SQL:2008 standard to provide a more robust specification for
+comparisons using regular expressions. Several other parts of the SQL
+standard refer to the LIKE_REGEX specification
+to define similar operations, including the
+SQL/JSON path
+ like_regex filter.
+   
+   
+The SQL standard states that regular expressions are evaluated according to
+the XQuery standard for regular expressions. While POSIX regular
+expressions are similar to XQuery regular expressions, there exist some
+differences where the behavior deviates from what is defined for
+LIKE_REGEX. Notably, regular expressions evaluated for
+LIKE_REGEX are defined to work on Unicode encoded 
strings,
+whereas POSIX regular expressions can work on strings of any encoding.
+   
+   
+Other differences include:
+
+ 
+  
+   Character class subtraction is not supported (for example, using the
+   following to search for only consonants:
+   [a-z-[aeiou]]).
+  
+ 
+ 
+  
+   The LIKE_REGEX specification states that a single
+   . should match a Windows newline
+   (\r\n) whereas POSIX will treat those as two separate
+   characters.
+  
+ 
+ 
+  
+   The format #NN where NN
+   represents two hex digits used for character class elements is not
+   supported. The same character class elements can be used with POSIX by
+   specifying \xNN.
+  
+ 
+ 
+  
+   Character class elements using \p{UnicodeProperty}
+   or the inverse \P{UnicodeProperty} are not supported.
+  
+ 
+ 
+  
+   Character class shorthands \i,
+   \I, \c, and \C
+   are not supported.
+  
+ 
+ 
+  
+   The specification for LIKE_REGEX may allow for more
+   characters for the \w character class shorthand, and
+   by extensions, excludes more characters with the complement
+   \W. As PostgreSQL depends on the underlying system's
+   locale, this may cause the behavior of \w and
+   \W to be equivalent to what POSIX provides.
+  
+ 
+ 
+  
+   The x flag in PostgreSQL extends on the specification
+   for LIKE_REGEX by allowing for comments specified
+   with #.
+  
+ 
+
+   
+
+  
+
 
 
   
@@ -11872,6 +11954,38 @@ table2-mapping
 

 
+   
+Regular Expressions
+
+
+ SQL/JSON path expressions support the ability to match text using regular
+ expressions with the like_regex filter. For example,
+ the following SQL/JSON path query would case-insensitively match all
+ strings in an array that start with a vowel:
+
+'$[*] ? (@ like_regex "^[aeiou]" flag "i")'
+
+
+
+
+ The SQL/JSON standard borrows its definition for regular expressions
+ from the LIKE_REGEX operator, which in turn uses the
+ XQuery standard. PostgreSQL does not support the
+ LIKE_REGEX operator as it currently implements
+ POSIX regular expressions.
+
+
+
+ For its implementation of the SQL/JSON path like_regex
+ filter, PostgreSQL uses its POSIX implementation to evaluate the
+ regular expressions. While similar to the SQL standard specification for
+ the LIKE_REGEX operator, there are some noted
+ differences that you can read about in
+ .
+
+
+   
+

SQL/JSON Path Operators and Methods
 
@@ -12114,9 +12228,10 @@ table2-mapping
 like_regex
 
   Tests pattern matching with POSIX regular expressions
-  (see ).  Supported flags
-  are i, s, m,
-  x, and q.
+

Re: another look at macOS SIP

2019-09-17 Thread Andres Freund
Hi,

On 2019-09-10 19:14:19 +0200, Peter Eisentraut wrote:
> I think the way forward here is to get rid of all uses of system() for
> calling between PostgreSQL programs.  There are only a handful of those,
> and we already have well-tested replacement code like spawn_process() in
> pg_regress.c that could be used.  (Perhaps we could also use that
> opportunity to get rid of the need for shell quoting?)

Yea, I think that'd be good, regardless of SIP.


> There is a minor second issue, namely that /usr/bin/perl also filters
> out DYLD_* environment variables.  This can be worked around again by
> using a third-party installation of Perl.  You just need to make sure
> that the "prove" program calls that installation instead of the system
> one.  (I just manually edited the shebang line.  There is probably a
> proper way to do it.)

Hm, could we just have perl code set DYLD_* again? I assume we don't
need prove itself to have it set, and for the testscripts we could just
set it in TestLib.pm or such?


Greetings,

Andres Freund




Re: block-level incremental backup

2019-09-17 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Tue, Sep 17, 2019 at 12:09 PM Stephen Frost  wrote:
> > We need to be sure that we can detect if the WAL level has ever been set
> > to minimal between a full and an incremental and, if so, either refuse
> > to run the incremental, or promote it to a full, or make it a
> > checksum-based incremental instead of trusting the WAL stream.
> 
> Sure. What about checksum collisions?

Certainly possible, of course, but a sha256 of each file is at least
somewhat better than, say, our page-level checksums.  I do agree that
having the option to just say "promote it to a full", or "do a
byte-by-byte comparison against the prior backed up file" would be
useful for those who are concerned about sha256 collision probabilities.

Having a cross-check of "does this X% of files that we decided not to
back up due to whatever really still match what we think is in the
backup?" is definitely a valuable feature and one which I'd hope we get
to at some point.

> > Ok..  I can understand that but I don't get how these changes to
> > pg_basebackup will help facilitate that.  If they don't and what you're
> > talking about here is independent, then great, that clarifies things,
> > but if you're saying that these changes to pg_basebackup are to help
> > with backing up directly into those Enterprise systems then I'm just
> > asking for some help in understanding how- what's the use-case here that
> > we're adding to pg_basebackup that makes it work with these Enterprise
> > systems?
> >
> > I'm not trying to be difficult here, I'm just trying to understand.
> 
> Man, I feel like we're totally drifting off into the weeds here.  I'm
> not arguing that these changes to pg_basebackup will help enterprise
> users except insofar as those users want incremental backup.  All of
> this discussion started with this comment from you:
> 
> "Having a system of keeping track of which backups are full and which
> are differential in an overall system also gives you the ability to do
> things like expiration in a sensible way, including handling WAL
> expiration."
> 
> All I was doing was saying that for an enterprise user, the overall
> system might be something entirely outside of our control, like
> NetBackup or Tivoli. Therefore, whatever functionality we provide to
> do that kind of thing should be able to be used in such contexts. That
> hardly seems like a controversial proposition.

And all I was trying to understand was how what pg_basebackup does in
this context is really different from what can be done with pgbackrest,
since you brought it up:

"True, but I'm not sure that functionality belongs in core. It
certainly needs to be possible for out-of-core code to do this part of
the work if desired, because people want to integrate with enterprise
backup systems, and we can't come in and say, well, you back up
everything else using Netbackup or Tivoli, but for PostgreSQL you have
to use pg_backrest. I mean, maybe you can win that argument, but I
know I can't."

What it sounds like you're arguing here is that what pg_basebackup
"has" going for it is that it specifically doesn't include any kind of
expiration management, and that's somehow helpful to people
who want to use Enterprise backup solutions.  Maybe that's what you were
getting at, in which case, I'm sorry for misunderstanding and dragging
it out, and thanks for helping me understand.

> > How would that tool work, if it's to be able to work regardless of where
> > the WAL is actually stored..?  Today, pg_archivecleanup just works
> > against a POSIX filesystem- are you thinking that the tool would have a
> > pluggable storage system, so that it could work with, say, a POSIX
> > filesystem, or a CIFS mount, or a s3-like system?
> 
> Again, I was making a general statement about design goals -- "we
> should try to work nicely with enterprise backup products" -- not
> proposing a specific design for a specific thing. I don't think the
> idea of some pluggability in that area is a bad one, but it's not even
> slightly what this thread is about.

Well, I agree with you, as I said up-thread, that this seemed to be
going in a different and perhaps not entirely relevant direction.

> > Provided the WAL level is at the level that you need it to be that will
> > be true for things which are actually supported with PITR, replication
> > to standby servers, et al.  I can see how it might come across as an
> > overreaction but this strikes me as a pretty glaring issue and I worry
> > that if it was overlooked until now that there'll be other more subtle
> > issues, and backups are just plain complicated to get right, just to
> > begin with already, something that I don't think people appreciate until
> > they've been dealing with them for quite a while.
> 
> Permit me to be unpersuaded. If it was such a glaring issue, and if
> experience is the key to spotting such issues, then why didn't YOU
> spot it?

I'm not designing the 

Re: Allow an alias to be attached directly to a JOIN ... USING

2019-09-17 Thread Alvaro Herrera
On 2019-Aug-01, Thomas Munro wrote:

> Indeed, that seems like a problem, and it's a good question.  You can
> see this on unpatched master with SELECT x.filler FROM
> (pgbench_tellers AS t JOIN b USING (bid)) AS x.

I'm not sure I understand why that problem is a blocker for this patch.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




some PostgreSQL 12 release notes comments

2019-09-17 Thread Peter Eisentraut
> * Client- and server-side encryption for authentication using GSSAPI

This is on the wire encryption, so I don't know why it says client-side
and server-side.  Proposal:

* Encrypted TCP/IP connections using GSSAPI encryption

in the major features section, and later

* Add GSSAPI encryption support (Robbie Harwood, Stephen Frost)

  This allows TCP/IP connections to be encrypted when using GSSAPI
  authentication without having to set up a separate encryption facility
  like SSL.


> * Discovery of LDAP servers if PostgreSQL is built with OpenLDAP

I would remove the "if" part from the major features list, since it's a
qualification of minor importance.  Instead I'd write something like

* Discovery of LDAP servers using DNS SRV

which is a clearer concept that people can easily recognize.


> * Allow data type name to use non-C collations

I'm not sure why that is listed in the "Migration" section.

It's also a bit confusing as a release note item relative to PostgreSQL
11.  I believe the changes were that "name" was made collation aware and
that the collation was set to "C" in the system catalogs (which is a
separate item later).  This group of items could use a reshuffling.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Leakproofness of texteq()/textne()

2019-09-17 Thread Tom Lane
Peter Eisentraut  writes:
> On 2019-09-16 06:24, Tom Lane wrote:
>> So it seems that the consensus is that it's okay to mark these
>> functions leakproof, because if any of the errors they throw
>> are truly reachable for other than data-corruption reasons,
>> we would wish to try to prevent such errors.  (Maybe through
>> upstream validity checks?  Hard to say how we'd do it exactly,
>> when we don't have an idea what the problem is.)

> Yeah, it seems like as we expand our Unicode capabilities, we will see
> more cases like "it could fail here in theory, but it shouldn't happen
> for normal data", and the answer can't be to call all that untrusted or
> leaky.  It's the job of the database software to sort that out.
> Obviously, it will require careful evaluation in each case.

Here's a proposed patch to mark functions that depend on varstr_cmp
as leakproof.  I think we can apply this to HEAD and then close the
open item as "won't fix for v12".
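
(The effect of the pg_proc.dat changes is easy to confirm against a freshly
initdb'd cluster with a quick catalog query -- illustrative only, not part of
the patch:

  SELECT proname, proleakproof
    FROM pg_proc
   WHERE proname IN ('namelttext', 'btnametextcmp', 'textltname')
   ORDER BY proname;

all of which should then show proleakproof = t.)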

regards, tom lane

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index d36156f..8c18294 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -1463,6 +1463,12 @@ check_collation_set(Oid collid)
  *	to allow null-termination for inputs to strcoll().
  * Returns an integer less than, equal to, or greater than zero, indicating
  * whether arg1 is less than, equal to, or greater than arg2.
+ *
+ * Note: many functions that depend on this are marked leakproof; therefore,
+ * avoid reporting the actual contents of the input when throwing errors.
+ * All errors herein should be things that can't happen except on corrupt
+ * data, anyway; otherwise we will have trouble with indexing strings that
+ * would cause them.
  */
 int
 varstr_cmp(const char *arg1, int len1, const char *arg2, int len2, Oid collid)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index e6645f1..5e69dee 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -681,44 +681,44 @@
   proname => 'nameeqtext', proleakproof => 't', prorettype => 'bool',
   proargtypes => 'name text', prosrc => 'nameeqtext' },
 { oid => '241',
-  proname => 'namelttext', prorettype => 'bool', proargtypes => 'name text',
-  prosrc => 'namelttext' },
+  proname => 'namelttext', proleakproof => 't', prorettype => 'bool',
+  proargtypes => 'name text', prosrc => 'namelttext' },
 { oid => '242',
-  proname => 'nameletext', prorettype => 'bool', proargtypes => 'name text',
-  prosrc => 'nameletext' },
+  proname => 'nameletext', proleakproof => 't', prorettype => 'bool',
+  proargtypes => 'name text', prosrc => 'nameletext' },
 { oid => '243',
-  proname => 'namegetext', prorettype => 'bool', proargtypes => 'name text',
-  prosrc => 'namegetext' },
+  proname => 'namegetext', proleakproof => 't', prorettype => 'bool',
+  proargtypes => 'name text', prosrc => 'namegetext' },
 { oid => '244',
-  proname => 'namegttext', prorettype => 'bool', proargtypes => 'name text',
-  prosrc => 'namegttext' },
+  proname => 'namegttext', proleakproof => 't', prorettype => 'bool',
+  proargtypes => 'name text', prosrc => 'namegttext' },
 { oid => '245',
   proname => 'namenetext', proleakproof => 't', prorettype => 'bool',
   proargtypes => 'name text', prosrc => 'namenetext' },
 { oid => '246', descr => 'less-equal-greater',
-  proname => 'btnametextcmp', prorettype => 'int4', proargtypes => 'name text',
-  prosrc => 'btnametextcmp' },
+  proname => 'btnametextcmp', proleakproof => 't', prorettype => 'int4',
+  proargtypes => 'name text', prosrc => 'btnametextcmp' },
 { oid => '247',
   proname => 'texteqname', proleakproof => 't', prorettype => 'bool',
   proargtypes => 'text name', prosrc => 'texteqname' },
 { oid => '248',
-  proname => 'textltname', prorettype => 'bool', proargtypes => 'text name',
-  prosrc => 'textltname' },
+  proname => 'textltname', proleakproof => 't', prorettype => 'bool',
+  proargtypes => 'text name', prosrc => 'textltname' },
 { oid => '249',
-  proname => 'textlename', prorettype => 'bool', proargtypes => 'text name',
-  prosrc => 'textlename' },
+  proname => 'textlename', proleakproof => 't', prorettype => 'bool',
+  proargtypes => 'text name', prosrc => 'textlename' },
 { oid => '250',
-  proname => 'textgename', prorettype => 'bool', proargtypes => 'text name',
-  prosrc => 'textgename' },
+  proname => 'textgename', proleakproof => 't', prorettype => 'bool',
+  proargtypes => 'text name', prosrc => 'textgename' },
 { oid => '251',
-  proname => 'textgtname', prorettype => 'bool', proargtypes => 'text name',
-  prosrc => 'textgtname' },
+  proname => 'textgtname', proleakproof => 't', prorettype => 'bool',
+  proargtypes => 'text name', prosrc => 'textgtname' },
 { oid => '252',
   proname => 'textnename', proleakproof => 't', prorettype => 'bool',
   proargtypes => 'text name', prosrc => 'textnename' },
 { oid => '253', descr => 'less-equal-greater',
-  proname 

Re: block-level incremental backup

2019-09-17 Thread Robert Haas
On Tue, Sep 17, 2019 at 12:09 PM Stephen Frost  wrote:
> We need to be sure that we can detect if the WAL level has ever been set
> to minimal between a full and an incremental and, if so, either refuse
> to run the incremental, or promote it to a full, or make it a
> checksum-based incremental instead of trusting the WAL stream.

Sure. What about checksum collisions?

> Just to be clear, I see your points and I like the general idea of
> finding solutions, but it seems like the issues are likely to be pretty
> complex and I'm not sure that's being appreciated very well.

Definitely possible, but it's more helpful if you can point out the
actual issues.

> Ok..  I can understand that but I don't get how these changes to
> pg_basebackup will help facilitate that.  If they don't and what you're
> talking about here is independent, then great, that clarifies things,
> but if you're saying that these changes to pg_basebackup are to help
> with backing up directly into those Enterprise systems then I'm just
> asking for some help in understanding how- what's the use-case here that
> we're adding to pg_basebackup that makes it work with these Enterprise
> systems?
>
> I'm not trying to be difficult here, I'm just trying to understand.

Man, I feel like we're totally drifting off into the weeds here.  I'm
not arguing that these changes to pg_basebackup will help enterprise
users except insofar as those users want incremental backup.  All of
this discussion started with this comment from you:

"Having a system of keeping track of which backups are full and which
are differential in an overall system also gives you the ability to do
things like expiration in a sensible way, including handling WAL
expiration."

All I was doing was saying that for an enterprise user, the overall
system might be something entirely outside of our control, like
NetBackup or Tivoli. Therefore, whatever functionality we provide to
do that kind of thing should be able to be used in such contexts. That
hardly seems like a controversial proposition.

> How would that tool work, if it's to be able to work regardless of where
> the WAL is actually stored..?  Today, pg_archivecleanup just works
> against a POSIX filesystem- are you thinking that the tool would have a
> pluggable storage system, so that it could work with, say, a POSIX
> filesystem, or a CIFS mount, or a s3-like system?

Again, I was making a general statement about design goals -- "we
should try to work nicely with enterprise backup products" -- not
proposing a specific design for a specific thing. I don't think the
idea of some pluggability in that area is a bad one, but it's not even
slightly what this thread is about.

> Provided the WAL level is at the level that you need it to be that will
> be true for things which are actually supported with PITR, replication
> to standby servers, et al.  I can see how it might come across as an
> overreaction but this strikes me as a pretty glaring issue and I worry
> that if it was overlooked until now that there'll be other more subtle
> issues, and backups are just plain complicated to get right, just to
> begin with already, something that I don't think people appreciate until
> they've been dealing with them for quite a while.

Permit me to be unpersuaded. If it was such a glaring issue, and if
experience is the key to spotting such issues, then why didn't YOU
spot it?

I'm not arguing that this stuff isn't hard. It is. Nor am I arguing
that I didn't screw up. I did. But designs need to be accepted or
rejected based on facts, not FUD. You've raised some good technical
points and if you've got more concerns, I'd like to hear them, but I
don't think arguing vaguely that a certain approach will probably run
into trouble gets us anywhere.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




strong memory leak in plpgsql from handled rollback and lazy cast

2019-09-17 Thread Pavel Stehule
Hi

While testing a hypothesis I wrote some buggy code. It was a surprise how fast
I lost all free memory:

do $$
begin
  for i in 1..300
  loop
begin
  -- do some error
  if i then end if;
exception when others then
  -- do nothing
end;
  end loop;
end;
$$;

The problem is somewhere in the implicit casting inside the IF statement. When
I use explicit casting -

  IF i::boolean THEN

then there is no memory leak.

Regards

Pavel


Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2019-09-17 Thread Anastasia Lubennikova

16.09.2019 21:58, Peter Geoghegan wrote:

On Mon, Sep 16, 2019 at 8:48 AM Anastasia Lubennikova
 wrote:

I tested patch with nbtree_wal_test, and found out that the real issue is
not the dedup WAL records themselves, but the full page writes that they 
trigger.
Here are test results (config is standard, except fsync=off to speedup tests):

'FPW on' and 'FPW off' are tests on v14.
NO_IMAGE is the test on v14 with REGBUF_NO_IMAGE in bt_dedup_one_page().

I think that is makes sense to focus on synthetic cases without
FPWs/FPIs from checkpoints. At least for now.


With random insertions into btree it's highly possible that deduplication will 
often be
the first write after checkpoint, and thus will trigger FPW, even if only a few 
tuples were compressed.

<...>

I think that the problem here is that you didn't copy this old code
from _bt_split() over to _bt_dedup_one_page():

 /*
  * Copy the original page's LSN into leftpage, which will become the
  * updated version of the page.  We need this because XLogInsert will
  * examine the LSN and possibly dump it in a page image.
  */
 PageSetLSN(leftpage, PageGetLSN(origpage));
 isleaf = P_ISLEAF(oopaque);

Note that this happens at the start of _bt_split() -- the temp page
buffer based on origpage starts out with the same LSN as origpage.
This is an important step of the WAL volume optimization used by
_bt_split().


That's it. I suspected that such an enormous amount of FPWs reflected some bug.

That's why there is no significant difference with the log_newpage_buffer()
approach.
And that's why "lazy" deduplication doesn't help to decrease amount of WAL.

My point was that the problem is extra FPWs, so it doesn't matter
whether we deduplicate just several entries to free enough space or all 
of them.



The term "lazy deduplication" is seriously overloaded here. I think
that this could cause miscommunications. Let me list the possible
meanings of that term here:

1. First of all, the basic approach to deduplication is already lazy,
unlike GIN, in the sense that _bt_dedup_one_page() is called to avoid
a page split. I'm 100% sure that we both think that that works well
compared to an eager approach (like GIN's).

Sure.

2. Second of all, there is the need to incrementally WAL log. It looks
like v14 does that well, in that it doesn't create
"xlrec_dedup.n_intervals" space when it isn't truly needed. That's
good.

In v12-v15 I mostly concentrated on this feature.
The last version looks good to me.


3. Third, there is incremental writing of the page itself -- avoiding
using a temp buffer. Not sure where I stand on this.


I think it's a good idea.  memmove must be much faster than copying 
items tuple by tuple.

I'll send next patch by the end of the week.


4. Finally, there is the possibility that we could make deduplication
incremental, in order to avoid work that won't be needed altogether --
this would probably be combined with 3. Not sure where I stand on
this, either.

We should try to be careful when using these terms, as there is a very
real danger of talking past each other.


Another, more realistic approach is to make deduplication less intensive:
if the freed space is less than some threshold, fall back to not changing the
page at all and not generating an xlog record.

I see that v14 uses the "dedupInterval" struct, which provides a
logical description of a deduplicated set of tuples. That general
approach is at least 95% of what I wanted from the
_bt_dedup_one_page() WAL-logging.


Probably that was the reason why the patch became faster after I added
BT_COMPRESS_THRESHOLD in early versions:
not because deduplication itself is CPU-bound or something, but because the WAL
load decreased.

I think so too -- BT_COMPRESS_THRESHOLD definitely makes compression
faster as things are. I am not against bringing back
BT_COMPRESS_THRESHOLD. I just don't want to do it right now because I
think that it's a distraction. It may hide problems that we want to
fix. Like the PageSetLSN() problem I mentioned just now, and maybe
others.

We will definitely need to have page space accounting that's a bit
similar to nbtsplitloc.c, to avoid the case where a leaf page is 100%
full (or has 4 bytes left, or something). That happens regularly now.
That must start with teaching _bt_dedup_one_page() about how much
space it will free. Basing it on the number of items on the page or
whatever is not going to work that well.

I think that it would be possible to have something like
BT_COMPRESS_THRESHOLD to prevent thrashing, and *also* make the
deduplication incremental, in the sense that it can give up on
deduplication when it frees enough space (i.e. something like v13's
0002-* patch). I said that these two things are closely related, which
is true, but it's also true that they don't overlap.

Don't forget the reason why I removed BT_COMPRESS_THRESHOLD: Doing so
made a handful of specific indexes (mostly from TPC-H) significantly
smaller. I never tried to debug the 

Re: subscriptionCheck failures on nightjar

2019-09-17 Thread Tom Lane
Robert Haas  writes:
> On Mon, Aug 26, 2019 at 9:29 AM Tomas Vondra
>  wrote:
>> This is one of the remaining open items, and we don't seem to be moving
>> forward with it :-(

> Why exactly is this an open item, anyway?

The reason it's still here is that Andres expressed a concern that
there might be more than meets the eye in this.  What meets the eye
is that PANICing on file-not-found is not appropriate here, but Andres
seemed to think that the file not being present might reflect an
actual bug not just an expectable race condition [1].

Personally I'd be happy just to treat it as an expectable case and
fix the code to not PANIC on file-not-found.

In either case, it probably belongs in the "older bugs" section;
nightjar is showing the same failure on v11 from time to time.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/20190213215147.cjbymfojf6xndr4t%40alap3.anarazel.de




Re: [PATCH][PROPOSAL] Add enum releation option type

2019-09-17 Thread Alvaro Herrera
I decided I didn't dislike that patch all that much anyway, so I cleaned
it up a little bit and here's v8.

The add_enum_reloption stuff is still broken.  Please fix it and
resubmit.  I'm marking this Waiting on Author now.
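
For reference, once that is fixed a caller would presumably register an enum
option roughly like this (hypothetical access-method code; only
add_enum_reloption and relopt_enum_elt_def come from the attached patch):

	/* in some access method's option setup code */
	static relopt_enum_elt_def myBufferingValues[] =
	{
		{"auto", 0},
		{"on", 1},
		{"off", 2},
		{(const char *) NULL}	/* list terminator */
	};

	add_enum_reloption(RELOPT_KIND_GIST, "my_buffering",
					   "Buffering mode for a hypothetical AM.",
					   myBufferingValues, 0 /* default: auto */);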

Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 20f4ed3c38..89a5775229 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -433,7 +433,25 @@ static relopt_real realRelOpts[] =
 	{{NULL}}
 };
 
-static relopt_string stringRelOpts[] =
+/* values from GistOptBufferingMode */
+relopt_enum_elt_def gistBufferingOptValues[] =
+{
+	{"auto", GIST_OPTION_BUFFERING_AUTO},
+	{"on", GIST_OPTION_BUFFERING_ON},
+	{"off", GIST_OPTION_BUFFERING_OFF},
+	{(const char *) NULL}		/* list terminator */
+};
+
+/* values from ViewOptCheckOption */
+relopt_enum_elt_def viewCheckOptValues[] =
+{
+	/* no value for NOT_SET */
+	{"local", VIEW_OPTION_CHECK_OPTION_LOCAL},
+	{"cascaded", VIEW_OPTION_CHECK_OPTION_CASCADED},
+	{(const char *) NULL}		/* list terminator */
+};
+
+static relopt_enum enumRelOpts[] =
 {
 	{
 		{
@@ -442,10 +460,9 @@ static relopt_string stringRelOpts[] =
 			RELOPT_KIND_GIST,
 			AccessExclusiveLock
 		},
-		4,
-		false,
-		gistValidateBufferingOption,
-		"auto"
+		gistBufferingOptValues,
+		GIST_OPTION_BUFFERING_AUTO,
+		"Valid values are \"on\", \"off\", and \"auto\"."
 	},
 	{
 		{
@@ -454,15 +471,20 @@ static relopt_string stringRelOpts[] =
 			RELOPT_KIND_VIEW,
 			AccessExclusiveLock
 		},
-		0,
-		true,
-		validateWithCheckOption,
-		NULL
+		viewCheckOptValues,
+		VIEW_OPTION_CHECK_OPTION_NOT_SET,
+		"Valid values are \"local\" and \"cascaded\"."
 	},
 	/* list terminator */
 	{{NULL}}
 };
 
+static relopt_string stringRelOpts[] =
+{
+	/* list terminator */
+	{{NULL}}
+};
+
 static relopt_gen **relOpts = NULL;
 static bits32 last_assigned_kind = RELOPT_KIND_LAST_DEFAULT;
 
@@ -505,6 +527,12 @@ initialize_reloptions(void)
    realRelOpts[i].gen.lockmode));
 		j++;
 	}
+	for (i = 0; enumRelOpts[i].gen.name; i++)
+	{
+		Assert(DoLockModesConflict(enumRelOpts[i].gen.lockmode,
+   enumRelOpts[i].gen.lockmode));
+		j++;
+	}
 	for (i = 0; stringRelOpts[i].gen.name; i++)
 	{
 		Assert(DoLockModesConflict(stringRelOpts[i].gen.lockmode,
@@ -543,6 +571,14 @@ initialize_reloptions(void)
 		j++;
 	}
 
+	for (i = 0; enumRelOpts[i].gen.name; i++)
+	{
+		relOpts[j] = &enumRelOpts[i].gen;
+		relOpts[j]->type = RELOPT_TYPE_ENUM;
+		relOpts[j]->namelen = strlen(relOpts[j]->name);
+		j++;
+	}
+
 	for (i = 0; stringRelOpts[i].gen.name; i++)
 	{
 		relOpts[j] = &stringRelOpts[i].gen;
@@ -640,6 +676,9 @@ allocate_reloption(bits32 kinds, int type, const char *name, const char *desc)
 		case RELOPT_TYPE_REAL:
 			size = sizeof(relopt_real);
 			break;
+		case RELOPT_TYPE_ENUM:
+			size = sizeof(relopt_enum);
+			break;
 		case RELOPT_TYPE_STRING:
 			size = sizeof(relopt_string);
 			break;
@@ -718,6 +757,24 @@ add_real_reloption(bits32 kinds, const char *name, const char *desc, double defa
 	add_reloption((relopt_gen *) newoption);
 }
 
+/*
+ * add_enum_reloption
+ *		Add a new enum reloption
+ */
+void
+add_enum_reloption(bits32 kinds, const char *name, const char *desc,
+   relopt_enum_elt_def *members, int default_val)
+{
+	relopt_enum *newoption;
+
+	newoption = (relopt_enum *) allocate_reloption(kinds, RELOPT_TYPE_ENUM,
+   name, desc);
+	newoption->members = members;
+	newoption->default_val = default_val;
+
+	add_reloption((relopt_gen *) newoption);
+}
+
 /*
  * add_string_reloption
  *		Add a new string reloption
@@ -1234,6 +1291,36 @@ parse_one_reloption(relopt_value *option, char *text_str, int text_len,
 	   optreal->min, optreal->max)));
 			}
 			break;
+		case RELOPT_TYPE_ENUM:
+			{
+relopt_enum *optenum = (relopt_enum *) option->gen;
+relopt_enum_elt_def *elt;
+
+parsed = false;
+for (elt = optenum->members; elt->string_val; elt++)
+{
+	if (pg_strcasecmp(value, elt->string_val) == 0)
+	{
+		option->values.enum_val = elt->symbol_val;
+		parsed = true;
+		break;
+	}
+}
+if (validate && !parsed)
+	ereport(ERROR,
+			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			 errmsg("invalid value for \"%s\" option",
+	option->gen->name),
+			 errdetail("%s", optenum->detailmsg)));
+
+/*
+ * If value is not among the allowed string values, but we are
+ * not asked to validate, just use the default numeric value.
+ */
+if (!parsed)
+	option->values.enum_val = optenum->default_val;
+			}
+			break;
 		case RELOPT_TYPE_STRING:
 			{
 relopt_string *optstring = (relopt_string *) option->gen;
@@ -1327,6 +1414,11 @@ fillRelOptions(void *rdopts, Size basesize,
 			options[i].values.real_val :
 			((relopt_real *) options[i].gen)->default_val;
 		

Re: Define jsonpath functions as stable

2019-09-17 Thread Erik Rijkers

On 2019-09-17 17:38, Jonathan S. Katz wrote:

On 9/16/19 6:39 PM, Jonathan S. Katz wrote:
[regex.patch]


A few things/typos caught my eye:

1.
'implementation' seems the wrong word in sentence:

"Several other parts of the SQL standard
also define LIKE_REGEX equivalents that refer
to this implementation, including the
SQL/JSON path like_regex filter."

As I understand this text, 'concept' seems better.
I'd drop 'also', too.

2.
'whereas the POSIX will those'  should be
'whereas POSIX will regard those'
 or maybe 'read those'

3.
+ The SQL/JSON standard borrows its definition for how regular 
expressions
+ from the LIKE_REGEX operator, which in turns 
uses the

+ XQuery standard.
That sentence needs the verb 'work', no?  'for how regular expressions 
work [..]'

Or alternatively drop 'how'.


thanks,

Erik Rijkers






Re: block-level incremental backup

2019-09-17 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Mon, Sep 16, 2019 at 3:38 PM Stephen Frost  wrote:
> > As discussed nearby, not everything that needs to be included in the
> > backup is actually going to be in the WAL though, right?  How would that
> > ever be able to handle the case where someone starts the server under
> > wal_level = logical, takes a full backup, then restarts with wal_level =
> > minimal, writes out a bunch of new data, and then restarts back to
> > wal_level = logical and takes an incremental?
> 
> Fair point. I think the WAL-scanning approach can only work if
> wal_level > minimal. But, I also think that few people run with
> wal_level = minimal in this era where the default has been changed to
> replica; and I think we can detect the WAL level in use while scanning
> WAL. It can only change at a checkpoint.

We need to be sure that we can detect if the WAL level has ever been set
to minimal between a full and an incremental and, if so, either refuse
to run the incremental, or promote it to a full, or make it a
checksum-based incremental instead of trusting the WAL stream.

I'm also glad that we ended up changing the default though and I do hope
that there's relatively few people running with minimal and that there's
even fewer who play around with flipping it back and forth.

> > On larger systems, so many of the files are 1GB in size that checking
> > the file size is quite close to meaningless.  Yes, having to checksum
> > all of the files definitely adds to the cost of taking the backup, but
> > to avoid it we need strong assurances that a given file hasn't been
> > changed since our last full backup.  WAL, today at least, isn't quite
> > that, and timestamps can possibly be fooled with, so if you'd like to be
> > particularly careful, there doesn't seem to be a lot of alternatives.
> 
> I see your points, but it feels like you're trying to talk down the
> WAL-based approach over what seem to me to be fairly manageable corner
> cases.

Just to be clear, I see your points and I like the general idea of
finding solutions, but it seems like the issues are likely to be pretty
complex and I'm not sure that's being appreciated very well.

> > I'm not asking you to be an expert on those systems, just to help me
> > understand the statements you're making.  How is backing up to a
> > pgbackrest repo different than running a pg_basebackup in the context of
> > using some other Enterprise backup system?  In both cases, you'll have a
> > full copy of the backup (presumably compressed) somewhere out on a disk
> > or filesystem which is then backed up by the Enterprise tool.
> 
> Well, I think that what people really want is to be able to backup
> straight into the enterprise tool, without an intermediate step.

Ok..  I can understand that but I don't get how these changes to
pg_basebackup will help facilitate that.  If they don't and what you're
talking about here is independent, then great, that clarifies things,
but if you're saying that these changes to pg_basebackup are to help
with backing up directly into those Enterprise systems then I'm just
asking for some help in understanding how- what's the use-case here that
we're adding to pg_basebackup that makes it work with these Enterprise
systems?

I'm not trying to be difficult here, I'm just trying to understand.

> My basic point here is: As with practically all PostgreSQL
> development, I think we should try to expose capabilities and avoid
> making policy on behalf of users.
> 
> I'm not objecting to the idea of having tools that can help users
> figure out how much WAL they need to retain -- but insofar as we can
> do it, such tools should work regardless of where that WAL is actually
> stored. 

How would that tool work, if it's to be able to work regardless of where
the WAL is actually stored..?  Today, pg_archivecleanup just works
against a POSIX filesystem- are you thinking that the tool would have a
pluggable storage system, so that it could work with, say, a POSIX
filesystem, or a CIFS mount, or a s3-like system?

> I dislike the idea that PostgreSQL would provide something
> akin to a "pgbackrest repository" in core, or I at least I think it
> would be important that we're careful about how much functionality
> gets tied to the presence and use of such a thing, because, at least
> based on my experience working at EnterpriseDB, larger customers often
> don't want to do it that way.

This seems largely independent of the above discussion, but since we're
discussing it, I've certainly had various experiences in this area too-
some larger customers would like to use an s3-like store (which
pgbackrest already supports and will be supporting others going forward
as it has a pluggable storage mechanism for the repo...), and then
there's customers who would like to point their Enterprise backup
solution at a directory on disk to back it up (which pgbackrest also
supports, as mentioned previously), and lastly there's 

pg_upgrade check fails on Solaris 10

2019-09-17 Thread Marina Polyakova

Hello, hackers!

We got an error for pg_upgrade check on the branch REL_11_STABLE (commit 
40ad4202513c72f5c1beeb03e26dfbc8890770c0) on Solaris 10 because IIUC the 
argument to the sed command is not enclosed in quotation marks (see 
[1]):


$ gmake -C src/bin/pg_upgrade/ check
<...>
MAKE=gmake 
bindir="/home/buildfarm/mpolyakova/postgrespro_REL_11_STABLE/inst/bin" 
libdir="/home/buildfarm/mpolyakova/postgrespro_REL_11_STABLE/inst/lib" 
EXTRA_REGRESS_OPTS="" /bin/sh test.sh --install

test.sh: MSYS/MINGW/: not found
gmake: *** [check] Error 1
gmake: Leaving directory 
`/home/buildfarm/mpolyakova/postgrespro_REL_11_STABLE/src/bin/pg_upgrade'

$ sed: command garbled: s/

Attached diff.patch fixes the problem.

About the system: SunOS, Release 5.10, KernelID Generic_141444-09.
About the used shell: according to the manual, it comes from the package 
SUNWcsu.


Thanks to Victor Wagner for his help to investigate this issue.

[1] $ man sh
<...>
Quoting
The following characters have a special meaning to the shell and cause 
termination of a word unless quoted:

;  &  (  )  |  ^  <  >  newline  space  tab
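
In other words, because the caret is in that list, the unquoted form is split
into two commands, roughly like this (reconstructed from the errors shown
above):

  $ uname -s | sed s/^MSYS/MINGW/
  sed: command garbled: s/
  MSYS/MINGW/: not found

  $ uname -s | sed 's/^MSYS/MINGW/'
  SunOS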

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/bin/pg_upgrade/test.sh b/src/bin/pg_upgrade/test.sh
index 424d89c3feeced91531920e919a2d9e06cc4baa6..f258983b18f8276cfef3db0a425d59fead6493d9 100644
--- a/src/bin/pg_upgrade/test.sh
+++ b/src/bin/pg_upgrade/test.sh
@@ -33,7 +33,7 @@ standard_initdb() {
 
 # What flavor of host are we on?
 # Treat MINGW* (msys1) and MSYS* (msys2) the same.
-testhost=`uname -s | sed s/^MSYS/MINGW/`
+testhost=`uname -s | sed 's/^MSYS/MINGW/'`
 
 # Establish how the server will listen for connections
 case $testhost in


Re: subscriptionCheck failures on nightjar

2019-09-17 Thread Robert Haas
On Mon, Aug 26, 2019 at 9:29 AM Tomas Vondra
 wrote:
> This is one of the remaining open items, and we don't seem to be moving
> forward with it :-(

Why exactly is this an open item, anyway?

I don't find any discussion on the thread which makes a clear argument
that this problem originated with v12. If it didn't, it's still a bug
and it still ought to be fixed at some point, but it's not a
release-blocker.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: logical decoding : exceeded maxAllocatedDescs for .spill files

2019-09-17 Thread Andres Freund
Hi,

On 2019-09-14 14:34:21 -0400, Tom Lane wrote:
> Amit Khandekar  writes:
> > On Fri, 13 Sep 2019 at 22:01, Robert Haas  wrote:
> >> On Fri, Sep 13, 2019 at 12:14 PM Tom Lane  wrote:
> >>> Again, though, the advice that's been given here is that we should
> >>> fix logical decoding to use the VFD API as it stands, not change
> >>> that API.  I concur with that.
> 
> >> A reasonable position.  So I guess logical decoding has to track the
> >> file position itself, but perhaps use the VFD layer for managing FD
> >> pooling.
> 
> > Yeah, something like the attached patch. I think this tracking of
> > offsets would have been cleaner if we add in-built support in VFD. But
> > yeah, for back branches at least, we need to handle it outside of VFD.
> > Or maybe we would add it if we find one more use-case.
> 
> Again, we had that and removed it, for what seem to me to be solid
> reasons.  It adds cycles when we're forced to close/reopen a file,
> and it also adds failure modes that we could do without (ie, failure
> of either the ftell or the lseek, which are particularly nasty because
> they shouldn't happen according to the VFD abstraction).  I do not
> think there is going to be any argument strong enough to make us
> put it back, especially not for non-mainstream callers like logical
> decoding.

Yea, I think that's the right call. Avoiding kernel seeks is quite
worthwhile, and we shouldn't undo it just because of this use case. And
that'll become more and more important performance-wise (and has already
done so, with all the Intel fixes making syscalls much slower).

I could see an argument for adding a separate generic layer providing
position tracking on top of the VFD abstraction, however. Seems quite
possible that there's some other parts of the system that could benefit
from using VFDs rather than plain fds. And they'd probably also need the
positional tracking.

Greetings,

Andres Freund




Re: Define jsonpath functions as stable

2019-09-17 Thread Jonathan S. Katz
On 9/16/19 6:39 PM, Jonathan S. Katz wrote:

> My main question is "where" -- I'm thinking somewhere in the JSON
> path[2] section. After reading your email 3 times, I may have enough
> knowledge to attempt some documentation on the regexp in JSON path.

Here is said attempt to document. Notes:

- I centered it around the specification for LIKE_REGEX, which uses
XQuery, but primarily noted where our implementation of POSIX regexes
differs from what is specified for LIKE_REGEX vis-a-vis XQuery

- I put the pith of the documentation in a subsection off of "POSIX
regular expressions"

- I noted that LIKE_REGEX is specified in SQL:2008, which I read on the
Internet(tm) but was not able to confirm in the spec as I do not have a copy

- For my explanation about the "x" flag differences, I talked about how
we extended it, but I could not capture how Tom described the nuances above.

- From the SQL/JSON path docs, I added a section on regular expressions
stating what the behavior is, and referring back to the main regex docs

- I removed the "x" flag being supported for like_regex in JSON path

I also presume it needs a bit of wordsmithing / accuracy checks, but
hope it's a good start and does not require a massive rewrite.

Thanks,

Jonathan
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2b4fe0cb59..74dbfd9e46 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -5968,6 +5968,88 @@ SELECT regexp_match('abc01234xyz', 
'(?:(.*?)(\d+)(.*)){1,1}');


 
+   
+   Differences with LIKE_REGEX
+
+   
+The LIKE_REGEX operator is specified starting with
+the SQL:2008 standard to provide a more robust specification for
+comparisons using regular expressions. Several other parts of the SQL
+standard also define LIKE_REGEX equivalents that refer
+to this implementation, including the
+SQL/JSON path
+ like_regex filter.
+   
+   
+The SQL standard states that regular expressions are evaluated according to
+the XQuery standard for regular expressions. While POSIX regular
+expressions are similar to XQuery regular expressions, there exist some
+differences where the behavior deviates from what is defined for
+LIKE_REGEX. Notably, regular expressions evaluated for
+LIKE_REGEX are defined to work on Unicode encoded 
strings,
+whereas POSIX regular expressions can work on strings of any encoding.
+   
+   
+Other differences include:
+
+ 
+  
+   Character class subtraction is not supported (for example, using the
+   following to search for only consonants:
+   [a-z-[aeiou]]).
+  
+ 
+ 
+  
+   The LIKE_REGEX specification states that a single
+   . should match a Windows newline
+   (\r\n) whereas POSIX treats those as two separate
+   characters.
+  
+ 
+ 
+  
+   The format #NN where NN
+   represents two hex digits used for character class elements is not
+   supported. The same character class elements can be used with POSIX by
+   specifying \xNN.
+  
+ 
+ 
+  
+   Character class elements using \p{UnicodeProperty}
+   or the inverse \P{UnicodeProperty} are not supported.
+  
+ 
+ 
+  
+   Character class shorthands \i,
+   \I, \c, and \C
+   are not supported.
+  
+ 
+ 
+  
+   The specification for LIKE_REGEX may allow for more
+   characters for the \w character class shorthand, and
+   by extension, excludes more characters with the complement
+   \W. As PostgreSQL depends on the underlying system's
+   locale, this may cause the behavior of \w and
+   \W to be equivalent to what POSIX provides.
+  
+ 
+ 
+  
+   The x flag in PostgreSQL extends on the specification
+   for LIKE_REGEX by allowing for comments specified
+   with #.
+  
+ 
+
+   
+
+  
+
 
 
   
@@ -11872,6 +11954,38 @@ table2-mapping
 

 
+   
+Regular Expressions
+
+
+ SQL/JSON path expressions support the ability to match text using regular
+ expressions with the like_regex filter. For example,
+ the following SQL/JSON path query would case-insensitively match all
+ strings in an array that start with a vowel:
+
+'$[*] ? (@ like_regex "^[aeiou]" flag "i")'
+
+
+
+
+ The SQL/JSON standard borrows its definition of how regular expressions
+ work from the LIKE_REGEX operator, which in turn uses the
+ XQuery standard. PostgreSQL does not support the
+ LIKE_REGEX operator as it currently implements
+ POSIX regular expressions.
+
+
+
+ For its implementation of the SQL/JSON path like_regex
+ filter, PostgreSQL uses its POSIX implementation to evaluate the
+ regular expressions. While similar to the SQL standard specification for
+ the LIKE_REGEX operator, there are some noted
+ differences that you can read about in
+ .
+
+
+   
+

SQL/JSON 
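
For illustration, a minimal usage sketch of the like_regex filter documented
in the patch above (assuming the PostgreSQL 12 jsonb path functions; the
sample data is made up):

-- case-insensitively match array elements that start with a vowel
SELECT jsonb_path_query_array(
    '["apple", "Orange", "pear"]'::jsonb,
    '$[*] ? (@ like_regex "^[aeiou]" flag "i")'
);
-- expected result: ["apple", "Orange"]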

Re: Nondeterministic collations vs. text_pattern_ops

2019-09-17 Thread Tom Lane
Peter Eisentraut  writes:
> On 2019-09-17 01:13, Tom Lane wrote:
>> Whilst poking at the leakproofness-of-texteq issue, I realized
>> that there's an independent problem caused by the nondeterminism
>> patch.  To wit, that the text_pattern_ops btree opclass uses
>> texteq as its equality operator, even though that operator is
>> no longer guaranteed to be bitwise equality.  That means that
>> depending on which collation happens to get attached to the
>> operator, equality might be inconsistent with the other members
>> of the opclass, leading to who-knows-what bad results.

> You can't create a text_pattern_ops index on a column with
> nondeterministic collation:

> create collation c1 (provider = icu, locale = 'und', deterministic = false);
> create table t1 (a int, b text collate c1);
> create index on t1 (b text_pattern_ops);
> ERROR:  nondeterministic collations are not supported for operator class
> "text_pattern_ops"

Oh!  I'd seen that error message, but not realized that it'd get
thrown during index creation.

> There is some discussion in internal_text_pattern_compare().

I don't much like doing it that way: looking up the determinism property
of the collation over again in every single comparison seems pretty
expensive, plus the operator is way exceeding its actual knowledge
of the call context by throwing an error phrased that way.

> Are there other cases we need to consider?

I think that disallowing indexes with this combination of opclass and
collation may actually be sufficient.  A query can request equality
using any collation, but it won't get matched to an index with a
different collation, so I think we're safe against index-related
problems if we have that restriction.

AFAIR, the only other place in the system where non-default opclasses
can be invoked is ORDER BY.  Somebody could write

ORDER BY textcol COLLATE "nondeterm" USING ~<~

However, I think we're actually okay on that one, because although
the equality opclass member is out of sync with the rest, it won't
get consulted during a sort.  internal_text_pattern_compare will
throw an error for this, but I don't believe it actually needs to.

My recommendation is to get rid of the run-time checks and instead
put a hack like this into DefineIndex or some minion thereof:

    if ((opclass == TEXT_PATTERN_BTREE_CLASS_OID ||
         opclass == VARCHAR_PATTERN_BTREE_CLASS_OID ||
         opclass == BPCHAR_PATTERN_BTREE_CLASS_OID) &&
        !get_collation_isdeterministic(collid))
        ereport(ERROR, ...);

Hard-wiring that is ugly; maybe someday we'd wish to expose "doesn't
allow nondeterminism" as a more generally-available opclass property.
But without some other examples that have a need for it, I think it's
not worth the work to create infrastructure for that.  It's not like
there are no other hard-wired legacy behaviors in DefineIndex...

> I notice that there is a hash opclass text_pattern_ops, which I'd
> actually never heard of until now, and I don't see documented.  What
> would we need to do about that?

Huh.  I wonder why that's there --- I can't see a reason why it'd
behave differently from the regular hash text_ops.  Maybe we feared
that someday it would need to be different?  Anyway, I think we can
just ignore it for this purpose.

> Would it help if one created COLLATE "C" indexes instead of
> text_pattern_ops?  What are the tradeoffs between the two approaches?

Of course, the pattern_ops opclasses long predate our COLLATE support.
I suspect if we'd had COLLATE we never would have invented them.
I believe the pattern_ops are equivalent to a COLLATE "C" index, both
theoretically and in terms of the optimizations we know about making.
There might be some minor performance difference from not having to
look up the collation, but not much if we've done the collate-is-c
fast paths properly.

regards, tom lane
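
For readers following along, a minimal sketch (assuming an ICU-enabled
build; the collation name is made up) of why texteq can no longer be assumed
to mean bitwise equality, which is what text_pattern_ops relies on:

-- a case-insensitive, nondeterministic collation
CREATE COLLATION case_insensitive
    (provider = icu, locale = 'und-u-ks-level2', deterministic = false);

-- texteq under this collation reports equality for byte-wise different strings
SELECT 'abc' = 'ABC' COLLATE case_insensitive;   -- true

-- while the pattern operators still compare bytes
SELECT 'ABC' ~<~ 'abc';                          -- true: 'A' (0x41) sorts before 'a' (0x61)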




Re: patch: psql - enforce constant width of last column

2019-09-17 Thread Pavel Stehule
On Tue, Sep 17, 2019 at 5:06 PM Ahsan Hadi  wrote:

> Hi Pavel,
>
> I have been trying to reproduce the case of badly displayed last columns
> of a query result set. I played around with the legal values for the psql
> border variable but was not able to find a case where the last columns are
> badly displayed. Can you please share an example that I can use to reproduce
> this problem? I will try out your patch once I am able to reproduce it.
>

you need to use pspg, with the vertical cursor active.

https://github.com/okbob/pspg

\pset border 1
\pset linestyle ascii
\pset pager always

select * from generate_series(1,3);

Regards

Pavel


> Thanks,
>
> -- Ahsan
>
>
> On Mon, Sep 9, 2019 at 2:32 PM Pavel Stehule 
> wrote:
>
>> Hi
>>
>> When I played with vertical cursor support, the last columns were
>> displayed badly when border was not 2. Only with border 2 does psql
>> display the last column with the same width for each row.
>>
>> I think we can force column width alignment for any border style today
>> (for the aligned and wrapped styles), or at minimum this behavior can be
>> optional.
>>
>> I wrote a patch with pset option "final_spaces", but I don't see a reason
>> why we trim rows today.
>>
>> Regards
>>
>> Pavel
>>
>


Re: Feature request: binary NOTIFY

2019-09-17 Thread Pavel Stehule
On Tue, Sep 17, 2019 at 4:10 PM Tom Lane  wrote:

> Pavel Stehule  writes:
> On Tue, Sep 17, 2019 at 10:01 AM Mitar  wrote:
> >> I might have missed prior discussions about this, but I wonder if it
> >> would be possible to support binary payloads for NOTIFY/LISTEN? Again
> >> and again I find it very limiting with just text (have to base64
> >> encode data, or convert it to JSON).
>
> > I think so is not any problem to pass binary data already.
>
> Yeah it is ... the internal async-queue data structure assumes
> null-terminated strings.  What's a lot worse, so does the
> wire protocol's NotificationResponse message, as well as every
> existing client that can read it.  (For instance, libpq's exposed
> API for notify messages hands back the payload as a null-terminated
> string.)  I don't think this is going to happen.
>

ok, thank you for correction.

Regards

Pavel

>
> regards, tom lane
>


Re: patch: psql - enforce constant width of last column

2019-09-17 Thread Ahsan Hadi
Hi Pavel,

I have been trying to reproduce the case of badly displayed last columns
of a query result set. I played around with the legal values for the psql
border variable but was not able to find a case where the last columns are
badly displayed. Can you please share an example that I can use to reproduce
this problem? I will try out your patch once I am able to reproduce it.

Thanks,

-- Ahsan


On Mon, Sep 9, 2019 at 2:32 PM Pavel Stehule 
wrote:

> Hi
>
> When I played with vertical cursor support, the last columns were
> displayed badly when border was not 2. Only with border 2 does psql
> display the last column with the same width for each row.
>
> I think we can force column width alignment for any border style today
> (for the aligned and wrapped styles), or at minimum this behavior can be
> optional.
>
> I wrote a patch with pset option "final_spaces", but I don't see a reason
> why we trim rows today.
>
> Regards
>
> Pavel
>


Re: block-level incremental backup

2019-09-17 Thread Robert Haas
On Mon, Sep 16, 2019 at 3:38 PM Stephen Frost  wrote:
> As discussed nearby, not everything that needs to be included in the
> backup is actually going to be in the WAL though, right?  How would that
> ever be able to handle the case where someone starts the server under
> wal_level = logical, takes a full backup, then restarts with wal_level =
> minimal, writes out a bunch of new data, and then restarts back to
> wal_level = logical and takes an incremental?

Fair point. I think the WAL-scanning approach can only work if
wal_level > minimal. But, I also think that few people run with
wal_level = minimal in this era where the default has been changed to
replica; and I think we can detect the WAL level in use while scanning
WAL. It can only change at a checkpoint.

> On larger systems, so many of the files are 1GB in size that checking
> the file size is quite close to meaningless.  Yes, having to checksum
> all of the files definitely adds to the cost of taking the backup, but
> to avoid it we need strong assurances that a given file hasn't been
> changed since our last full backup.  WAL, today at least, isn't quite
> that, and timestamps can possibly be fooled with, so if you'd like to be
> particularly careful, there doesn't seem to be a lot of alternatives.

I see your points, but it feels like you're trying to talk down the
WAL-based approach over what seem to me to be fairly manageable corner
cases.

> I'm not asking you to be an expert on those systems, just to help me
> understand the statements you're making.  How is backing up to a
> pgbackrest repo different than running a pg_basebackup in the context of
> using some other Enterprise backup system?  In both cases, you'll have a
> full copy of the backup (presumably compressed) somewhere out on a disk
> or filesystem which is then backed up by the Enterprise tool.

Well, I think that what people really want is to be able to backup
straight into the enterprise tool, without an intermediate step.

My basic point here is: As with practically all PostgreSQL
development, I think we should try to expose capabilities and avoid
making policy on behalf of users.

I'm not objecting to the idea of having tools that can help users
figure out how much WAL they need to retain -- but insofar as we can
do it, such tools should work regardless of where that WAL is actually
stored. I dislike the idea that PostgreSQL would provide something
akin to a "pgbackrest repository" in core, or I at least I think it
would be important that we're careful about how much functionality
gets tied to the presence and use of such a thing, because, at least
based on my experience working at EnterpriseDB, larger customers often
don't want to do it that way.

> That's not great, of course, which is why there are trade-offs to be
> made, one of which typically involves using timestamps, but doing so
> quite carefully, to perform the file exclusion.  Other ideas are great
> but it seems like WAL isn't really a great idea unless we make some
> changes there and we, as in PG, haven't got a robust "we know this file
> changed as of this point" to work from.  I worry that we're putting too
> much faith into a system to do something independent of what it was
> actually built and designed to do, and thinking that because we could
> trust it for X, we can trust it for Y.

That seems like a considerable overreaction to me based on the
problems reported thus far. The fact is, WAL was originally intended
for crash recovery and has subsequently been generalized to be usable
for point-in-time recovery, standby servers, and logical decoding.
It's clearly established at this point as the canonical way that you
know what in the database has changed, which is the same need that we
have for incremental backup.

At any rate, the same criticism can be leveled - IMHO with a lot more
validity - at timestamps. Last-modification timestamps are completely
outside of our control; they are owned by the OS and various operating
systems can and do have varying behavior. They can go backwards when
things have changed; they can go forwards when things have not
changed. They were clearly not intended to meet this kind of
requirement. If anything, they were intended for that purpose far less
than WAL, which was actually designed for a requirement in this
general ballpark, if not this thing precisely.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: pgbench - allow to create partitioned tables

2019-09-17 Thread Amit Kapila
On Tue, Sep 17, 2019 at 6:38 PM Fabien COELHO  wrote:
>
>
> >> It is used at one place where we can set PART_NONE without much loss.
> >> Having lesser invalid values makes code easier to follow.
> >
> > Looking more closely at this case:
> > + else if (PQntuples(res) != 1)
> > + {
> > + /* unsure because multiple (or no) pgbench_accounts found */
> > + partition_method = PART_UNKNOWN;
> > + partitions = 0;
> > + }
> >
> > Is it ever possible to have multiple pgbench_accounts considering we
> > have unique index on (relname, relnamespace) for pg_class?
>
> The issue is that it is not directly obvious which relnamespace will be
> used by the queries which rely on non schema qualified "pgbench_accounts".
>

It seems to me the patch already uses namespace in the query, so this
should not be a problem here.  The part of query is as below:
+ res = PQexec(con,
+ "select p.partstrat, count(p.partrelid) "
+ "from pg_catalog.pg_class as c "

This uses pg_catalog, so it should not have multiple entries for
"pgbench_accounts".


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Feature request: binary NOTIFY

2019-09-17 Thread Tom Lane
Pavel Stehule  writes:
> On Tue, Sep 17, 2019 at 10:01 AM Mitar  wrote:
>> I might have missed prior discussions about this, but I wonder if it
>> would be possible to support binary payloads for NOTIFY/LISTEN? Again
>> and again I find it very limiting with just text (have to base64
>> encode data, or convert it to JSON).

> I think so is not any problem to pass binary data already.

Yeah it is ... the internal async-queue data structure assumes
null-terminated strings.  What's a lot worse, so does the
wire protocol's NotificationResponse message, as well as every
existing client that can read it.  (For instance, libpq's exposed
API for notify messages hands back the payload as a null-terminated
string.)  I don't think this is going to happen.

regards, tom lane
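
A workaround sketch for today's text-only payloads (the channel name and
bytes are made up): base64-encode on the sending side and decode in the
listener.

-- sender
SELECT pg_notify('my_channel', encode('\x0001ff'::bytea, 'base64'));

-- listener, after receiving the payload string 'AAH/'
SELECT decode('AAH/', 'base64');   -- yields \x0001ff again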




Re: [proposal] de-TOAST'ing using a iterator

2019-09-17 Thread Tomas Vondra

On Mon, Sep 16, 2019 at 06:22:51PM -0300, Alvaro Herrera wrote:

On 2019-Sep-10, Binguo Bao wrote:


+/*
+ * Support for de-TOASTing a toasted value iteratively. "need" is a pointer
+ * between the beginning and end of the iterator's ToastBuffer. The macro
+ * de-TOASTs all bytes before "need" into the iterator's ToastBuffer.
+ */
+#define PG_DETOAST_ITERATE(iter, need)                                        \
+   do {                                                                       \
+       Assert((need) >= (iter)->buf->buf && (need) <= (iter)->buf->capacity); \
+       while (!(iter)->done && (need) >= (iter)->buf->limit) {                \
+           detoast_iterate(iter);                                             \
+       }                                                                      \
+   } while (0)
 /* WARNING -- unaligned pointer */
 #define PG_DETOAST_DATUM_PACKED(datum) \
pg_detoast_datum_packed((struct varlena *) DatumGetPointer(datum))


In broad terms this patch looks pretty good to me.  I only have a small
quibble with this API definition in fmgr.h -- namely that it forces us
to export the definition of all the structs (that could otherwise be
private to toast_internals.h) in order to satisfy callers of this macro.
I am wondering if it would be possible to change detoast_iterate and
PG_DETOAST_ITERATE in a way that those details remain hidden -- I am
thinking something like "if this returns NULL, then iteration has
finished"; and relieve the macro from doing the "->buf->buf" and
"->buf->limit" checks.  I think changing that would require a change in
how the rest of the code is structured around this (the textpos internal
function), but seems like it would be better overall.

(AFAICS that would enable us to expose much less about the
iterator-related structs to detoast.h -- you should be able to move the
struct defs to toast_internals.h)

Then again, it might be just wishful thinking, but it seems worth
considering at least.



I do agree hiding the exact struct definition would be nice. IMHO if the
only reason for exposing it is the PG_DETOAST_ITERATE() macro (or rather
the references to buf fields in it) then we can simply provide functions
to return those fields.

Granted, that may have impact on performance, but I'm not sure it'll be
even measurable. Also, the other detoast macros right before this new
one are ultimately just function calls.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: pgbench - allow to create partitioned tables

2019-09-17 Thread Fabien COELHO


Hello Amit,


Why can't we change it as attached?


I think that your version works, but I do not much like the condition for
the normal case which is implicitly assumed. The solution I took has 3
clear-cut cases: 1 error against a server without partition support,
detect multiple pgbench_accounts table -- argh, and then the normal
expected case, whether partitioned or not. Your solution has 4 cases
because of the last implicit zero-row select that relies on default, which
would need some explanations.


Why?


Hmmm. This is a coding-philosophy question:-)

To be nice to the code reader?

You have several if cases, but the last one is to keep the default *which 
means something*. ISTM that the default is kept in two cases: when there 
is a pgbench_accounts without partitioning, and when no pgbench_accounts 
was found, in which case the defaults are plain false. I could be okay with
the default saying "we do not know", but for me having all cases explicitly
covered in one place helps understand the behavior of the code.


Here, we are fetching the partitioning information. If it exists, then 
we remember that to display for later, otherwise, the default should 
apply.


Yep, but the default is also kept if nothing is found, whereas the left 
join solution would give one row when found and empty when not found, 
which for me are quite distinct cases.



Oh no, I am not generally against using left join, but here it appears
like using it without much need.  If nothing else, it will consume
more cycles to fetch one extra row when we can avoid it.


As pointed out, the left join allows to distinguish "not found" from "not 
partitioned" logically, even if no explicit use of that is done 
afterwards.
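
A rough illustration (not the exact patch query) of how the LEFT JOIN keeps
"not found" and "not partitioned" distinguishable:

SELECT c.relname, p.partstrat, count(i.inhrelid) AS partitions
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_partitioned_table p ON p.partrelid = c.oid
LEFT JOIN pg_catalog.pg_inherits i ON i.inhparent = c.oid
WHERE c.relname = 'pgbench_accounts'
GROUP BY c.relname, p.partstrat;
-- zero rows: no pgbench_accounts found
-- one row, NULL partstrat, 0 partitions: found but not partitioned
-- one row, partstrat set, partitions > 0: found and partitioned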



Irrespective of whether we use left join or not, I think the below
change from my patch is important.
- /* only print partitioning information if some partitioning was detected */
- if (partition_method != PART_NONE && partition_method != PART_UNKNOWN)
+ /* print partitioning information only if there exists any partition */
+ if (partitions > 0)

Basically, it would be good if we just rely on 'partitions' to decide
whether we have partitions or not.


Could be, although I was thinking of telling the user that we do not know 
on unknown. I'll think about this one.


In the case at hand, I find that getting one row in all cases is pretty
elegant because there is just one piece of code for handling them all.


Hmm, I would be fine if you can show some other place in code where
such a method is used


No problem:-) Although there are no other catalog queries in "pgbench", 
there are plenty in "psql" and "pg_dump", and also in some other commands, 
and they often rely on "LEFT" joins:


  sh> grep LEFT src/bin/psql/*.c | wc -l   # 58
  sh> grep LEFT src/bin/pg_dump/*.c | wc -l# 54

Note that there are no "RIGHT" nor "FULL" joins…


What is the need of using regress_pgbench_tap_1_ts in this test?


I wanted to check that tablespace options work appropriately with
partition tables, as I changed the create table stuff significantly, and
just using "pg_default" is kind of cheating.


I think your change will be tested if there is a '--tablespace'
option.


Yes. There is just one, really.

Even if you want to test with a non-default tablespace, adding an
additional test would make more sense rather than changing an existing one
which is testing a valid thing.


Tom tends to think that there are already too many tests, so I try to keep 
them as compact/combined as possible. Moreover, the spirit of this test is 
to cover "all possible options", so it made also sense to add the new 
options there, and it achieves both coverage and testing my changes with 
an explicit tablespace.


Also, there is an existing way to create tablespace location in 
"src/bin/pg_checksums/t/002_actions".  I think we can use the same. I 
don't find any problem with your way, but why having multiple ways of 
doing same thing in code.  We need to test this on windows also once as 
this involves some path creation which might vary, although I don't 
think there should be any problem in that especially if we use existing 
way.


Ok, I'll look at the pg_checksums way to do that.


- 'pgbench scale 1 initialization');
+ 'pgbench scale 1 initialization with options');

Similar to the above, it is not clear to me why we need to change this?


Because I noticed that it had the same description as the previous one, 
so I made the test name distinct and more precise, while I was adding 
options on it.


Hmmm. Keeping the same name was really a copy-paste error, and I wanted to
avoid a distinct commit for such a very minor thing.



Good observation, but better be done separately.  I think in general
the more unrelated changes are present in patch, the more time it
takes to review.


Then let's keep the same name.


One more comment:
+typedef enum { PART_NONE, PART_RANGE, PART_HASH, PART_UNKNOWN }
+  partition_method_t;

See, if we can eliminate 

Re: range test for hash index?

2019-09-17 Thread Mahendra Singh
Hello,

I've done some code coverage testing by running make check-world. It
doesn't show any difference in the test coverage. The patch looks good to
me.

-- 
Thanks & Regards,
Mahendra Thalor
EnterpriseDB: http://www.enterprisedb.com


Re: ICU for global collation

2019-09-17 Thread Daniel Verite
 Hi,

When trying databases defined with ICU locales, I see that backends
that serve such databases seem to have their LC_CTYPE inherited from
the environment (as opposed to a per-database fixed value).

That's a problem for the backend code that depends on libc functions
that themselves depend on LC_CTYPE, such as the full text search parser
and dictionaries.

For instance, if you start the instance with a C locale
(LC_ALL=C pg_ctl ...) and try to use FTS in an ICU UTF-8 database,
it doesn't work:

template1=# create database "fr-utf8" 
  template 'template0' encoding UTF8
  locale 'fr'
  collation_provider 'icu';

template1=# \c fr-utf8
You are now connected to database "fr-utf8" as user "daniel".

fr-utf8=# show lc_ctype;
 lc_ctype 
--
 fr
(1 row)

fr-utf8=# select to_tsvector('été');
ERROR:  invalid multibyte character for locale
HINT:  The server's LC_CTYPE locale is probably incompatible with the
database encoding.

If I peek into the "real" LC_CTYPE when connected to this database,
I can see it's "C":

fr-utf8=# create extension plperl;
CREATE EXTENSION

fr-utf8=# create function lc_ctype() returns text as '$ENV{LC_CTYPE};'
  language plperl;
CREATE FUNCTION

fr-utf8=# select lc_ctype();
 lc_ctype 
--
 C


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: pgbench - allow to create partitioned tables

2019-09-17 Thread Fabien COELHO



Hello Amit,


One more comment:
+typedef enum { PART_NONE, PART_RANGE, PART_HASH, PART_UNKNOWN }
+  partition_method_t;

See, if we can eliminate PART_UNKNOWN.


I'm not very happy with this one, but I wanted to differentiate "we do 
know that it is not partitioned" from "we do not know if it is 
partitioned", and I did not have a better idea.



 I don't see much use of same.


Although it is not used afterwards, we could display the partitioning 
information differently between the two cases. This is not done because I 
did not want to add more lines on the "normal" case.



It is used at one place where we can set PART_NONE without much loss.
Having lesser invalid values makes code easier to follow.


Looking more closely at this case:
+ else if (PQntuples(res) != 1)
+ {
+ /* unsure because multiple (or no) pgbench_accounts found */
+ partition_method = PART_UNKNOWN;
+ partitions = 0;
+ }

Is it ever possible to have multiple pgbench_accounts considering we
have unique index on (relname, relnamespace) for pg_class?


The issue is that it is not directly obvious which relnamespace will be 
used by the queries which rely on non schema qualified "pgbench_accounts". 
Each schema could theoretically hold a pgbench_accounts table. As this is 
pretty unlikely, I did not attempt to add complexity to resolve taking 
into account the search_path, but just skipped to unknown in this case, 
which I expect nobody would hit in normal circumstances.


Another possible and unlikely issue is that pgbench_accounts could have 
been deleted but not pgbench_branches which is used earlier to get the 
current "scale". If so, the queries will fail later on anyway.


--
Fabien.




Re: pg_rewind docs correction

2019-09-17 Thread James Coleman
On Tue, Sep 17, 2019 at 3:51 AM Michael Paquier  wrote:
>
> On Sun, Sep 15, 2019 at 10:36:04AM -0400, James Coleman wrote:
> > On Sun, Sep 15, 2019 at 10:25 AM Michael Paquier  
> > wrote:
> >> +The rewind operation is not expected to result in a consistent data
> >> +directory state either internally to the node or with respect to the 
> >> rest
> >> +of the cluster. Instead the resulting data directory will only be 
> >> consistent
> >> +after WAL replay has completed to at least the LSN at which changed 
> >> blocks
> >> +copied from the source were originally written on the source.
> >>
> >> That's not necessarily true.  pg_rewind enforces in the control file
> >> of the target the minimum consistency LSN to be
> >> pg_current_wal_insert_lsn() when using a live source or the last
> >> checkpoint LSN for a stopped source, so while that sounds true from
> >> the point of view of all the blocks copied, the control file may still
> >> cause a complain that the target recovering has not reached its
> >> consistent point even if all the blocks are already at a position
> >> not-so-far from what has been registered in the control file.
> >
> > I could just say "after WAL replay has completed to a consistent state"?
>
> I still would not change this paragraph.  The first sentence means
> that we have an equivalency, because that's the case if you think
> about it as we make sure that the target is able to sync with the
> source, and the target gets into a state where it as an on-disk state
> equivalent to the target up to the minimum consistency point defined
> in the control file once the tool has done its work (this last point
> is too precise to be included in a global description to be honest).
> And the second sentence makes clear what are the actual diffs are.
> >> +   the point at which the WAL timelines of the source and target diverged 
> >> plus
> >> +   the current state on the source of any blocks changed on the target 
> >> after
> >> +   that divergence. While only changed blocks from existing relation 
> >> files are
> >>
> >> And here we could mention that all the blocks copied from the source
> >> are the ones which are found in the WAL records of the target until
> >> the end of WAL of its timeline.  Still, that's basically what is
> >> mentioned in the first part of "How It Works", which explains things
> >> better.  I honestly don't really see that all this paragraph is an
> >> improvement over the simplicity of the original when it comes to
> >> understand the global idea of what pg_rewind does.
> >
> > The problem with the original is that while simple, it's actually
> > incorrect in that simplicity. Pg_rewind does *not* result in the data
> > directory on the target matching the data directory on the source.
>
> That's not what I get from the original docs, but I may be too much
> used to it.

I don't agree that that's a valid equivalency. I myself spent a lot of
time trying to understand how this could possibly be true a while
back, and even looked at source code to be certain. I've asked other
people and found the same confusion.

As I read it the second sentence doesn't actually tell you the
differences; it makes a quick attempt at summarizing *how* the first
sentence is true, but if the first sentence isn't accurate, then it's
hard to read the 2nd one as helping.

If you'd prefer something less detailed at that point in
the docs, then something along the lines of "results in a data
directory state which can then be safely replayed from the source" or
some such.

The docs shouldn't be written only for someone who already understands
the intricacies. And the end user shouldn't have to read the "how it
works" (which incidentally is kinda hidden at the bottom underneath
the CLI args -- perhaps we could move that?) to extrapolate things in
the primary documentation.

James Coleman




Re: [DOC] Document auto vacuum interruption

2019-09-17 Thread James Coleman
On Tue, Sep 17, 2019 at 2:21 AM Amit Kapila  wrote:
>
> On Fri, Sep 13, 2019 at 11:59 PM James Coleman  wrote:
> >
> > On Sat, Aug 31, 2019 at 10:51 PM Amit Kapila  
> > wrote:
> > >
> >
> > Updated patch attached. I changed the wording to be about conflicting
> > locks rather than a single lock type, added a link to the conflicting
> > locks table, and fixed a few sgml syntax issues in the original.
> >
>
> I see error while compiling this patch on HEAD.  See the below error:
> /usr/bin/xmllint --path . --noout --valid postgres.sgml
> postgres.sgml:833: element xref: validity error : IDREF attribute
> linkend references an unknown ID
> "mvcc-locking-tables-table-lock-compatibility"
> make: *** [check] Error 4
>
> The tag id mvcc-locking-tables-table-lock-compatibility is wrong.

My apologies; I'd fixed that on my local copy before sending my last
email, but I must have somehow grabbed the wrong patch file to attach
to the email.

> The
> other problem I see is the wrong wording in one of the literals. I
> have fixed both of these issues and slightly tweaked one of the
> sentence.  See the updated patch attached.  On which version, are you
> preparing this patch?  I see both HEAD and 9.4 has the problems fixed
> by me.
>
> Let me know what you think of attached?  I think we can back-patch
> this patch.  What do you think?  Does anyone else have an opinion on
> this patch especially if we see any problem in back-patching this?

The attached looks great!

I was working on HEAD for the patch, but this concern has been an
issue for quite a long time. We were running into it on 9.6 in
production, for example. And given how frequently it seems like there
are large-scale production issues related to auto vacuum, I think any
amount of back patching we can do to make that footgun less likely
would be a good thing.

James Coleman
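
As a side note, a diagnostic sketch related to the behavior being documented
(the table name is made up): checking whether an autovacuum worker currently
holds a lock on a given table that a pending DDL request would conflict with.

SELECT a.pid, a.query, l.mode, l.granted
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.relation = 'my_table'::regclass
  AND a.query LIKE 'autovacuum:%';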




Re: pgbench - allow to create partitioned tables

2019-09-17 Thread Amit Kapila
On Tue, Sep 17, 2019 at 4:24 PM Amit Kapila  wrote:
>
> On Sat, Sep 14, 2019 at 6:35 PM Fabien COELHO  wrote:
>
> One more comment:
> +typedef enum { PART_NONE, PART_RANGE, PART_HASH, PART_UNKNOWN }
> +  partition_method_t;
>
> See, if we can eliminate PART_UNKNOWN.  I don't see much use of same.
> It is used at one place where we can set PART_NONE without much loss.
> Having lesser invalid values makes code easier to follow.
>

Looking more closely at this case:
+ else if (PQntuples(res) != 1)
+ {
+ /* unsure because multiple (or no) pgbench_accounts found */
+ partition_method = PART_UNKNOWN;
+ partitions = 0;
+ }

Is it ever possible to have multiple pgbench_accounts considering we
have unique index on (relname, relnamespace) for pg_class?



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: progress report for ANALYZE

2019-09-17 Thread vignesh C
On Thu, Sep 5, 2019 at 2:31 AM Alvaro Herrera  wrote:
>
> There were some minor problems in v5 -- bogus Docbook as well as
> outdated rules.out, small "git diff --check" complaint about whitespace.
> This v6 (on today's master) fixes those, no other changes.
>
+ 
+   The command is preparing to begin scanning the heap.  This phase is
+   expected to be very brief.
+ 
In the above, there is an extra space after "."; is this intentional? I
noticed that in a lot of places there are two spaces and sometimes a
single space after a period across this file.

For example, in the following there is a single space after ".":
 Time when this process' current transaction was started, or null
  if no transaction is active. If the current
  query is the first of its transaction, this column is equal to the
  query_start column.
 

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
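
For context, a monitoring sketch for the feature under review, assuming the
progress view name used by this patch series (pg_stat_progress_analyze) ends
up unchanged:

-- watch a running ANALYZE from another session
SELECT pid, relid::regclass, phase
FROM pg_stat_progress_analyze;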




Re: dropdb --force

2019-09-17 Thread Pavel Stehule
Hi

I started work on this patch. I changed syntax to

DROP DATABASE [ ( FORCE , ..) ] [IF EXISTS ...]

and now I try to fix all other points from Tom's list
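
For reference, a hedged sketch (the database name is made up) of what the
option is meant to replace, next to the proposed form from this patch
version:

-- today: terminate the sessions by hand (needs pg_terminate_backend rights),
-- then drop
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'mydb' AND pid <> pg_backend_pid();
DROP DATABASE mydb;

-- proposed: one statement
DROP DATABASE (FORCE) IF EXISTS mydb;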

On Tue, Sep 17, 2019 at 12:15 PM Tom Lane  wrote:

> Pavel Stehule  writes:
> > [ drop-database-force-20190708.patch ]
>
> I took a brief look at this, but I don't think it's really close to
> being committable.
>
> * The documentation claims FORCE will fail if you don't have privileges
> to terminate the other session(s) in the target DB.  This is a lie; the
> code issues kill() without any regard for such niceties.  You could
> perhaps make that better by going through pg_signal_backend().
>

This is a question. There are two possible requirements about the necessary rights:

a) rights for other process termination
b) database owner can drop database

I understand Tom's objection very well, but requiring ROLE_SIGNAL_BACKEND
or being a superuser looks too strong and not very practical.

If I am the owner of a database, and I have the right to drop this database,
why can I not kick out some other user that blocks dropping the database?

What do you think about it? If I use pg_signal_backend, then the necessary
requirement for using DROP ... FORCE would be having SIGNAL_BACKEND rights.


I am sending updated version

Regards


Pavel



>
> * You've hacked CountOtherDBBackends to do something that's completely
> outside the charter one would expect from its name, and not even
> bothered to update its header comment.  This requires more attention
> to not confusing future hackers; I'd say you can't even use that
> function name anymore.
>
> * You've also ignored the existing code in CountOtherDBBackends
> that is careful *not* to issue a kill() while holding the critical
> ProcArrayLock.  That problem would get enormously worse if you
> tried to sub in pg_signal_backend() there, since that function
> may do catalog accesses --- it's pretty likely that you could
> get actual deadlocks, never mind just trashing performance.
>
> * I really dislike the addition of more hard-wired delays and
> timeouts to dropdb().  It's bad enough that CountOtherDBBackends
> has got that.  Two layers of timeout are a rickety mess, and
> who's to say that a 1-minute timeout is appropriate for anything?
>
> * I'm concerned that the proposed syntax is not future-proof.
> FORCE is not a reserved word, and we surely don't want to make
> it one; but just appending it to the end of the command without
> any decoration seems like a recipe for problems if anybody wants
> to add other options later.  (Possible examples: RESTRICT/CASCADE,
> or a user-defined timeout.)  Maybe some parentheses would help?
> Or possibly I'm being overly paranoid, but ...
>
>
> I hadn't been paying any attention to this thread before now,
> but I'd assumed from the thread title that the idea was to implement
> any attempted kills in the dropdb app, not on the backend side.
> (As indeed it looks like the first version did.)  Maybe it would be
> better to go back to that, instead of putting dubious behaviors
> into the core server.
>
> regards, tom lane
>
>
>
>
>
diff --git a/doc/src/sgml/ref/drop_database.sgml b/doc/src/sgml/ref/drop_database.sgml
index 3ac06c984a..4a230418c1 100644
--- a/doc/src/sgml/ref/drop_database.sgml
+++ b/doc/src/sgml/ref/drop_database.sgml
@@ -21,7 +21,11 @@ PostgreSQL documentation
 
  
 
-DROP DATABASE [ IF EXISTS ] name
+DROP DATABASE [ ( option [, ...] ) ] [ IF EXISTS ] name
+
+where option can be one of:
+
+FORCE
 
  
 
@@ -32,9 +36,11 @@ DROP DATABASE [ IF EXISTS ] name
DROP DATABASE drops a database. It removes the
catalog entries for the database and deletes the directory
containing the data.  It can only be executed by the database owner.
-   Also, it cannot be executed while you or anyone else are connected
-   to the target database.  (Connect to postgres or any
-   other database to issue this command.)
+   Also, it cannot be executed while you are connected to the target database.
+   (Connect to postgres or any other database to issue this
+   command.)
+   If anyone else is connected to the target database, this command will fail
+   - unless you use the FORCE option described below.
   
 
   
@@ -64,6 +70,24 @@ DROP DATABASE [ IF EXISTS ] name
  
 

+
+   
+FORCE
+
+ 
+  Attempt to terminate all existing connections to the target database.
+ 
+ 
+  This will fail if the current user has no permission to terminate other
+  connections. Required permissions are the same as with 
+  pg_terminate_backend, described
+  in .
+
+  This will also fail if the connections do not terminate in 60 seconds.
+ 
+
+   
+
   
  
 
diff --git a/doc/src/sgml/ref/dropdb.sgml b/doc/src/sgml/ref/dropdb.sgml
index 3fbdb33716..5d10ad1c92 100644
--- a/doc/src/sgml/ref/dropdb.sgml
+++ b/doc/src/sgml/ref/dropdb.sgml
@@ -86,6 +86,20 @@ PostgreSQL documentation
   
  
 
+ 
+  -f
+  

Re: Support for CALL statement in ecpg

2019-09-17 Thread Ashutosh Sharma
On Tue, Sep 17, 2019 at 1:06 PM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> I don't find this patch in any commit fest.  Seems like a good addition.
>

Thanks for the consideration. Will add an entry for it in the commit fest.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com


About copy to view

2019-09-17 Thread Jinhua Luo
Hi,

Why does a view need an INSTEAD OF trigger to be the target of COPY?
With the default view rule, an INSERT would be successful, so it should
also work for COPY.

The reason for asking this question is that I need to COPY into the view
using the "replica" session role. But an INSTEAD OF trigger on a view could
not be set to "enable always" or "enable replica", because "alter table"
would error out saying it's not a base table, e.g.

tmp=# alter table foobar2_view enable always trigger foobar2_view_trigger;
ERROR:  "foobar2_view" is not a table or foreign table

Help please, thanks.

Regards,
Jinhua Luo
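
For context, a minimal sketch (names made up) of the INSTEAD OF trigger
setup being discussed, which does make a view a valid COPY target:

CREATE TABLE foobar2 (data text);
CREATE VIEW foobar2_view AS SELECT data FROM foobar2;

CREATE FUNCTION foobar2_view_ins() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO foobar2 (data) VALUES (NEW.data);
    RETURN NEW;
END;
$$;

CREATE TRIGGER foobar2_view_trigger
    INSTEAD OF INSERT ON foobar2_view
    FOR EACH ROW EXECUTE FUNCTION foobar2_view_ins();

-- COPY into the view now routes rows through the trigger
COPY foobar2_view (data) FROM STDIN;
moe
\.

The remaining question in the thread is how to keep such a trigger firing
under session_replication_role = replica.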




pause recovery if pitr target not reached

2019-09-17 Thread Leif Gunnar Erlandsen
This patch allows PostgreSQL to pause recovery before PITR target is reached 
if recovery_target_time is specified.

Missing WAL files could then be restored from backup and applied on the next restart.

Today PostgreSQL opens the database read/write on a new timeline even when
the PITR target is not reached.

make check is run with this patch with the result "All 192 tests passed."
The source used is from version 12beta4.

For both examples below "recovery_target_time = '2019-09-17 09:24:00'"

_
Log from todays behavior:

[20875] LOG:  starting point-in-time recovery to 2019-09-17 09:24:00+02
[20875] LOG:  restored log file "00010002" from archive
[20875] LOG:  redo starts at 0/228
[20875] LOG:  consistent recovery state reached at 0/2000100
[20870] LOG:  database system is ready to accept read only connections
[20875] LOG:  restored log file "00010003" from archive
[20875] LOG:  restored log file "00010004" from archive
cp: cannot stat '/var/lib/pgsql/12/archivedwal/00010005': No 
such file or directory
[20875] LOG:  redo done at 0/40080C8
[20875] LOG:  last completed transaction was at log time 2019-09-17 
09:13:10.524645+02
[20875] LOG:  restored log file "00010004" from archive
cp: cannot stat '/var/lib/pgsql/12/archivedwal/0002.history': No such file 
or directory
[20875] LOG:  selected new timeline ID: 2
[20875] LOG:  archive recovery complete
cp: cannot stat '/var/lib/pgsql/12/archivedwal/0001.history': No such file 
or directory
[20870] LOG:  database system is ready to accept connections


And with patched source:

[20899] LOG:  starting point-in-time recovery to 2019-09-17 09:24:00+02
[20899] LOG:  restored log file "00010002" from archive
[20899] LOG:  redo starts at 0/228
[20899] LOG:  consistent recovery state reached at 0/20002B0
[20895] LOG:  database system is ready to accept read only connections
[20899] LOG:  restored log file "00010003" from archive
[20899] LOG:  restored log file "00010004" from archive
cp: cannot stat '/var/lib/pgsql/12m/archivedwal/00010005': No 
such file or directory
[20899] LOG:  Recovery target not reached but next WAL record culd not be read
[20899] LOG:  redo done at 0/4007D40
[20899] LOG:  last completed transaction was at log time 2019-09-17 
09:13:10.539546+02
[20899] LOG:  recovery has paused
[20899] HINT:  Execute pg_wal_replay_resume() to continue.


You could restore WAL in several steps and when target is reached you get this 
log

[21943] LOG:  starting point-in-time recovery to 2019-09-17 09:24:00+02
[21943] LOG:  restored log file "00010005" from archive
[21943] LOG:  redo starts at 0/5003C38
[21943] LOG:  consistent recovery state reached at 0/600
[21941] LOG:  database system is ready to accept read only connections
[21943] LOG:  restored log file "00010006" from archive
[21943] LOG:  recovery stopping before commit of transaction 859, time 
2019-09-17 09:24:02.58576+02
[21943] LOG:  recovery has paused
[21943] HINT:  Execute pg_wal_replay_resume() to continue.

Execute pg_wal_replay_resume() as hinted.

[21943] LOG:  redo done at 0/6001830
[21943] LOG:  last completed transaction was at log time 2019-09-17 
09:23:57.496945+02
cp: cannot stat '/var/lib/pgsql/12m/archivedwal/0002.history': No such file 
or directory
[21943] LOG:  selected new timeline ID: 2
[21943] LOG:  archive recovery complete
cp: cannot stat '/var/lib/pgsql/12m/archivedwal/0001.history': No such file 
or directory
[21941] LOG:  database system is ready to accept connections





Leif Gunnar Erlandsen


0001-pause-recovery-if-pitr-target-not-reached.patch
Description: Binary data
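
For operators following along, a small sketch of working with the paused
state shown in the logs above:

-- check whether replay is waiting
SELECT pg_is_wal_replay_paused();

-- after restoring the missing WAL segments into the archive, continue
SELECT pg_wal_replay_resume();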


Re: psql - improve test coverage from 41% to 88%

2019-09-17 Thread vignesh C
On Thu, Sep 12, 2019 at 4:49 PM Fabien COELHO  wrote:
>
>
> >> Here is a v5.
>
> > Few more in icommand_checks subroutine:
> > Few unwanted code can be removed.
>
> Indeed, more debug and test code.
>
> Attached v6 fixes these, and I checked for remaining scrubs without
> finding any.
>
Few comments:
+ [ 'START TRANSACTION', [ qr{ISOLATION LEVEL}, qr{(?!BEGIN)} ] ],
+ [ 'TABLE', [ qr{ONLY} ] ], # hmmm...
+ [ 'TRUNCATE', [ qr{CONTINUE IDENTITY} ] ],
+ [ 'UNLISTEN', [ ] ],

We can remove # hmmm... if not required

+ [ 'UPDATE', [ qr{RETURNING} ] ],
+ [ 'VACUUM', [ qr{FREEZE} ] ],
+ [ 'VALUES', [ qr{ORDER BY} ] ],
+ [ 'WITH', [ qr{RECURSIVE} ] ], # SELECT duplicate?
+);

We can remove # SELECT duplicate? if not required

+
+psql('--log-file=/dev/null', 0, "SELECT 5432 AS pg\n",
+ [ qr/\b5432\b/ ], $EMPTY, 'psql -L null');
+
+psql('', 0, "\\copy public.regress_psql_tap_1_t1(data) FROM PROGRAM
'echo moe'\n",
+ [ qr/COPY 1\b/ ], $EMPTY, 'psql copy echo');
+psql('', 0, "\\copy public.regress_psql_tap_1_t1(data) TO PROGRAM 'cat'\n",
+ [ qr/COPY 1\b/ ], $EMPTY, 'psql copy cat'); # :-)
+
+END_UNIX_ZONE:

We can remove # :-) if not required


Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Zedstore - compressed in-core columnar storage

2019-09-17 Thread Ashutosh Sharma
On Thu, Aug 29, 2019 at 5:39 PM Heikki Linnakangas  wrote:
>
> On 29/08/2019 14:30, Ashutosh Sharma wrote:
> >
> > On Wed, Aug 28, 2019 at 5:30 AM Alexandra Wang  > > wrote:
> >
> > You are correct that we currently go through each item in the leaf
> > page that
> > contains the given tid, specifically, the logic to retrieve all the
> > attribute
> > items inside a ZSAttStream is now moved to decode_attstream() in the
> > latest
> > code, and then in zsbt_attr_fetch() we again loop through each item we
> > previously retrieved from decode_attstream() and look for the given
> > tid.
> >
> >
> > Okay. Any idea why this new way of storing attribute data as streams
> > (lowerstream and upperstream) has been chosen just for the attributes
> > but not for tids. Are only attribute blocks compressed but not the tids
> > blocks?
>
> Right, only attribute blocks are currently compressed. Tid blocks need
> to be modified when there are UPDATEs or DELETE, so I think having to
> decompress and recompress them would be more costly. Also, there is no
> user data on the TID tree, and the Simple-8b encoded codewords used to
> represent the TIDs are already pretty compact. I'm not sure how much
> gain you would get from passing it through a general purpose compressor.
>
> I could be wrong though. We could certainly try it out, and see how it
> performs.
>
> > One
> > optimization we can to is to tell decode_attstream() to stop
> > decoding at the
> > tid we are interested in. We can also apply other tricks to speed up the
> > lookups in the page, for fixed length attribute, it is easy to do
> > binary search
> > instead of linear search, and for variable length attribute, we can
> > probably
> > try something that we didn't think of yet.
> >
> >
> > I think we can probably ask decode_attstream() to stop once it has found
> > the tid that we are searching for but then we only need to do that for
> > Index Scans.
>
> I've been thinking that we should add a few "bookmarks" on long streams,
> so that you could skip e.g. to the midpoint in a stream. It's a tradeoff
> though; when you add more information for random access, it makes the
> representation less compact.
>
> > Zedstore currently implement update as delete+insert, hence the old
> > tid is not
> > reused. We don't store the tuple in our UNDO log, and we only store the
> > transaction information in the UNDO log. Reusing the tid of the old
> > tuple means
> > putting the old tuple in the UNDO log, which we have not implemented
> > yet.
> >
> > OKay, so that means performing update on a non-key attribute would also
> > require changes in the index table. In short, HOT update is currently
> > not possible with zedstore table. Am I right?
>
> That's right. There's a lot of potential gain for doing HOT updates. For
> example, if you UPDATE one column on every row on a table, ideally you
> would only modify the attribute tree containing that column. But that
> hasn't been implemented.

Thanks Heikki for your reply. After quite some time, today I got a chance
to look back into the code. I could see that you have changed the
tuple insertion and update mechanism a bit. As per the latest changes
all the tuples being inserted/updated in a transaction are spooled
into a hash table and then flushed at the time of transaction commit
and probably due to this change, I could see that the server crashes
when trying to perform an UPDATE operation on a zedstore table having
1 million (10 lakh) records. See the example below:

create table t1(a int, b int) using zedstore;
insert into t1 select i, i+10 from generate_series(1, 100) i;
postgres=# update t1 set b = 200;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Above update statement crashed due to some extensive memory leak.

Further, the UPDATE operation on a zedstore table is very slow. I think
that's because in the case of a zedstore table we have to update all the
btree data structures even if only one column is updated, and that really
sucks. Please let me know if there is some other reason for it.

I also found some typos when going through the writeup in
zedstore_internal.h and thought of correcting those. Attached is the
patch with the changes.

Thanks,
-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
diff --git a/src/include/access/zedstore_internal.h b/src/include/access/zedstore_internal.h
index 5330c70..c2b726b 100644
--- a/src/include/access/zedstore_internal.h
+++ b/src/include/access/zedstore_internal.h
@@ -177,7 +177,7 @@ ZSBtreeInternalPageIsFull(Page page)
  * Attribute B-tree leaf page layout
  *
  * Leaf pages in the attribute trees don't follow the normal page layout
- * with line pointers and items. They use the standard page page header,
+ * 

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2019-09-17 Thread Surafel Temesgen
Hi Alexey,
Here are a few comments.
On Sat, Aug 31, 2019 at 11:54 PM  wrote:

> Hi hackers,
>
>
> Anyway, new patch, which adds SET TABLESPACE to REINDEX is attached and
> this functionality seems really useful, so I will be very appreciate if
> someone will take a look on it.
>

* There is a NOWAIT option in ALTER INDEX; is there a reason not to have a
similar option here?
* The SET TABLESPACE command is not documented
* There are multiple checks for whether the relation is a temporary table
of another session, one in check_relation_is_movable and another done independently

*+ char *tablespacename;

Calling it new_tablespacename would make it consistent with other places.

* The patch didn't apply cleanly: http://cfbot.cputube.org/patch_24_2269.log

regards

Surafel


Re: pgbench - allow to create partitioned tables

2019-09-17 Thread Amit Kapila
On Sat, Sep 14, 2019 at 6:35 PM Fabien COELHO  wrote:
>  I'm ensuring that there is always a one-line answer, whether it is
>  partitioned or not. Maybe the count(*) should be count(something in p) to
>  get 0 instead of 1 on non-partitioned tables, though, but this is hidden
>  in the display anyway.
> >>>
> >>> Sure, but I feel the code will be simplified.  I see no reason for
> >>> using left join here.
> >>
> >> Without a left join, the query result is empty if there are no partitions,
> >> whereas there is one line with it. This fact simplifies managing the query
> >> result afterwards because we are always expecting 1 row in the "normal"
> >> case, whether partitioned or not.
> >
> > Why can't we change it as attached?
>
> I think that your version works, but I do not much like the condition for
> the normal case which is implicitly assumed. The solution I took has 3
> clear-cut cases: 1 error against a server without partition support,
> detect multiple pgbench_accounts tables -- argh, and then the normal
> expected case, whether partitioned or not. Your solution has 4 cases
> because of the last implicit zero-row select that relies on default, which
> would need some explanations.
>

Why?  Here, we are fetching the partitioning information. If it
exists, then we remember it to display later; otherwise, the
default should apply.

> > I find using left join to always get one row as an ugly way to
> > manipulate the results later.
>
> Hmmm. It is really a matter of taste. I do not share your distaste for left
> join on principle.
>

Oh no, I am not generally against using left join, but here it appears
like using it without much need.  If nothing else, it will consume
more cycles to fetch one extra row when we can avoid it.
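
For illustration, a rough sketch of the two query shapes being debated
(hypothetical, not the exact query in the patch), assuming the probed table
is pgbench_accounts:

-- LEFT JOIN form: always returns exactly one row; partstrat is NULL and
-- the partition count is 0 when the table is not partitioned.
SELECT p.partstrat, count(i.inhrelid) AS partitions
FROM pg_class c
LEFT JOIN pg_partitioned_table p ON p.partrelid = c.oid
LEFT JOIN pg_inherits i ON i.inhparent = c.oid
WHERE c.relname = 'pgbench_accounts'
GROUP BY p.partstrat;

-- Plain JOIN form: returns zero rows for a non-partitioned table, so the
-- caller has to treat "no row" as the default, unpartitioned case.
SELECT p.partstrat, count(i.inhrelid) AS partitions
FROM pg_partitioned_table p
JOIN pg_class c ON c.oid = p.partrelid
LEFT JOIN pg_inherits i ON i.inhparent = c.oid
WHERE c.relname = 'pgbench_accounts'
GROUP BY p.partstrat;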

Irrespective of whether we use left join or not, I think the below
change from my patch is important.
- /* only print partitioning information if some partitioning was detected */
- if (partition_method != PART_NONE && partition_method != PART_UNKNOWN)
+ /* print partitioning information only if there exists any partition */
+ if (partitions > 0)

Basically, it would be good if we just rely on 'partitions' to decide
whether we have partitions or not.

> In the case at hand, I find that getting one row in all
> cases pretty elegant because there is just one code for handling them all.
>

Hmm, I would be fine if you can show some other place in code where
such a method is used or if someone else also shares your viewpoint.

>
> > What is the need of using regress_pgbench_tap_1_ts in this test?
>
> I wanted to check that tablespace options work appropriately with
> partition tables, as I changed the create table stuff significantly, and
> just using "pg_default" is kind of cheating.
>

I think your change will be tested if there is a '--tablespace'
option.  Even if you want to test with a non-default tablespace, adding
an additional test would make more sense than changing an existing one
which is testing a valid thing.  Also, there is an existing way to create
a tablespace location in "src/bin/pg_checksums/t/002_actions".  I think we
can use the same.  I don't find any problem with your way, but why have
multiple ways of doing the same thing in the code?  We also need to test
this on Windows once, as this involves some path creation which might
vary, although I don't think there should be any problem there, especially
if we use the existing way.

> > I think we don't need to change existing tests unless required for the
> > new functionality.
>
> I do agree, but there was a motivation behind the addition.
>
> > *
> > - 'pgbench scale 1 initialization');
> > + 'pgbench scale 1 initialization with options');
> >
> > Similar to the above, it is not clear to me why we need to change this?
>
> Because I noticed that it had the same description as the previous one, so
> I made the test name distinct and more precise, while I was adding options
> on it.
>

Good observation, but that had better be done separately.  I think, in
general, the more unrelated changes are present in a patch, the more time
it takes to review.

One more comment:
+typedef enum { PART_NONE, PART_RANGE, PART_HASH, PART_UNKNOWN }
+  partition_method_t;

See if we can eliminate PART_UNKNOWN.  I don't see much use for it.
It is used in one place where we can set PART_NONE without much loss.
Having fewer invalid values makes the code easier to follow.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Psql patch to show access methods info

2019-09-17 Thread vignesh C
On Sat, Sep 14, 2019 at 1:45 PM Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:
>
> On Sat, Sep 14, 2019 at 10:39 AM Alexander Korotkov
>  wrote:
> > On Sat, Sep 14, 2019 at 12:36 AM Alvaro Herrera
> >  wrote:
> > > On 2019-Aug-06, Alexander Korotkov wrote:
> > >
> > > > Revised patch is attached.  Changes to \dA+ command are reverted.  It
> > > > also contains some minor improvements.
> > > >
> > > > Second patch looks problematic for me, because it provides index
> > > > description alternative to \d+.  IMHO, if there is something really
> > > > useful to display about index, we should keep it in \d+.  So, I
> > > > propose to postpone this.
> > >
> > > Are you saying that we should mark this entire CF entry as Returned with
> > > Feedback?  Or do you see a subset of your latest 0001 as a commitable
> > > patch?
> >
> > Still hope to commit 0001.  Please, don't mark RFC for now.
>
> Sorry, I meant don't mark it RWF for now :)
>
A few comments:
+
+\dA+
+                           List of access methods
+  Name  | Type  |   Handler    |               Description
+--------+-------+--------------+-----------------------------------------
+ brin   | Index | brinhandler  | block range index (BRIN) access method

We can add a test for "\dA+ brin btree".

When we specify multiple arguments along with \dA+, as in:
\dA+ brin btree
we should display a message like \d+ does: extra argument "btree" ignored.

postgres=# \dA+ brin btree
                 List of access methods
 Name | Type  |   Handler   |               Description
------+-------+-------------+-----------------------------------------
 brin | Index | brinhandler | block range index (BRIN) access method
(1 row)

As in the case of \d+, we get the message:
postgres=# \d+ t1 t2
                                     Table "public.t1"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 c1     | integer |           |          |         | plain   |              |
Access method: heap

\d+: extra argument "t2" ignored

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


Re: Implementing Incremental View Maintenance

2019-09-17 Thread Yugo Nagata
Hi Paul,

Thank you for your suggestion.

On Sun, 15 Sep 2019 11:52:22 -0600
Paul Draper  wrote:

> As I understand it, the current patch performs immediate IVM using AFTER
> STATEMENT trigger transition tables.
> 
> However, multiple tables can be modified *before* AFTER STATEMENT triggers
> are fired.
> 
> CREATE TABLE example1 (a int);
> CREATE TABLE example2 (a int);
> 
> CREATE INCREMENTAL MATERIALIZED VIEW mv AS
> SELECT example1.a, example2.a
> FROM example1 JOIN example2 ON a;
> 
> WITH
>   insert1 AS (INSERT INTO example1 VALUES (1)),
>   insert2 AS (INSERT INTO example2 VALUES (1))
> SELECT NULL;
> 
> Changes to example1 are visible in an AFTER STATEMENT trigger on example2,
> and vice versa. Would this not result in the (1, 1) tuple being
> "double-counted"?
> 
> IVM needs to either:
> 
> (1) Evaluate deltas "serially' (e.g. EACH ROW triggers)
> 
> (2) Have simultaneous access to multiple deltas:
> delta_mv = example1 x delta_example2 + example2 x delta_example1 -
> delta_example1 x delta_example2
> 
> This latter method is the "logged" approach that has been discussed for
> deferred evaluation.
> 
> tl;dr It seems that AFTER STATEMENT triggers required a deferred-like
> implementation anyway.

You are right, the latest patch doesn't support the situation where
multiple tables are modified in a query. I noticed this when working
on self-join, which also effectively needs to handle multiple table
modifications.

I am now working on this issue and the next patch will be able to handle
this situation. I plan to submit the patch during this month. Roughly
speaking, in the new implementation, AFTER STATEMENT triggers are used to
collect information about the modified tables and their changes
(= transition tables), and then only the last trigger updates the view.
This will avoid the double-counting. I think this implementation would
also be a base for a deferred-approach implementation in the future,
where "logs" are used instead of transition tables.

Regards,
Yugo Nagata

-- 
Yugo Nagata 




Re: Compiler warnings with MinGW

2019-09-17 Thread Peter Eisentraut
On 2019-09-09 14:24, Magnus Hagander wrote:
> On Sat, Sep 7, 2019 at 4:58 AM Michael Paquier  > wrote:
> 
> On Sat, Sep 07, 2019 at 12:11:25AM +0200, Peter Eisentraut wrote:
> > I'm not sure exactly what the upstream of mingw is these days, but I
> > think the original issue that led to 811be893 has long been fixed [0],
> > and the other stuff in mingwcompat.c is also no longer relevant
> [1].  I
> > think mingwcompat.c could be removed altogether.  I'm not sure to what
> > extent we need to support 5+ year old mingw versions.
> 
> On HEAD I would not be against removing that as this leads to a
> cleanup of our code.  For MSVC, we only support VS 2013~ on HEAD, so
> saying that we don't support MinGW older than what was proposed 5
> years ago sounds sensible.
> 
> +1, definitely. 

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: block-level incremental backup

2019-09-17 Thread Amit Kapila
On Mon, Sep 16, 2019 at 7:00 PM Robert Haas  wrote:
>
> On Mon, Sep 16, 2019 at 4:31 AM Amit Kapila  wrote:
> > This seems to be a blocking problem for the LSN based design.
>
> Well, only the simplest version of it, I think.
>
> > Can we think of using creation time for file?  Basically, if the file
> > creation time is later than backup-labels "START TIME:", then include
> > that file entirely.  I think one big point against this is clock skew
> > like what if somebody tinkers with the clock.  And also, this can
> > cover cases like
> > what Jeevan has pointed but might not cover other cases which we found
> > problematic.
>
> Well that would mean, for example, that if you copied the data
> directory from one machine to another, the next "incremental" backup
> would turn into a full backup. That sucks. And in other situations,
> like resetting the clock, it could mean that you end up with a corrupt
> backup without any real ability for PostgreSQL to detect it. I'm not
> saying that it is impossible to create a practically useful system
> based on file time stamps, but I really don't like it.
>
> > I think the operations covered by WAL flag XLR_SPECIAL_REL_UPDATE will
> > have similar problems.
>
> I'm not sure quite what you mean by that.  Can you elaborate? It
> appears to me that the XLR_SPECIAL_REL_UPDATE operations are all
> things that create files, remove files, or truncate files, and the
> sketch in my previous email would handle the first two of those cases
> correctly.  See below for the third.
>
> > One related point is how do incremental backups handle the case where
> > vacuum truncates the relation partially?  Basically, with current
> > patch/design, it doesn't appear that such information can be passed
> > via incremental backup.  I am not sure if this is a problem, but it
> > would be good if we can somehow handle this.
>
> As to this, if you're taking a full backup of a particular file,
> there's no problem.  If you're taking a partial backup of a particular
> file, you need to include the current length of the file and the
> identity and contents of each modified block.  Then you're fine.
>

Right, this should address that point.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: block-level incremental backup

2019-09-17 Thread Amit Kapila
On Mon, Sep 16, 2019 at 11:09 PM Stephen Frost  wrote:
>
> Greetings,
>
> * Robert Haas (robertmh...@gmail.com) wrote:
> > On Mon, Sep 16, 2019 at 9:30 AM Robert Haas  wrote:
> > > > Isn't some operations where at the end we directly call heap_sync
> > > > without writing WAL will have a similar problem as well?
> > >
> > > Maybe.  Can you give an example?
> >
> > Looking through the code, I found two cases where we do this.  One is
> > a bulk insert operation with wal_level = minimal, and the other is
> > CLUSTER or VACUUM FULL with wal_level = minimal. In both of these
> > cases we are generating new blocks whose LSNs will be 0/0. So, I think
> > we need a rule that if the server is asked to back up all blocks in a
> > file with LSNs > some threshold LSN, it must also include any blocks
> > whose LSN is 0/0. Those blocks are either uninitialized or are
> > populated without WAL logging, so they always need to be copied.
>
> I'm not sure I see a way around it, but this seems pretty unfortunate -
> every single incremental backup will have all of those included even
> though the full backup likely also does.
>

Yeah, this is quite unfortunate.  One more thing to note is that the
same is true for other operations like 'create index' (e.g. nbtree
bypasses the buffer manager while creating the index, doesn't write WAL
for wal_level=minimal, and then syncs at the end).
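
For illustration, one way to spot such blocks from SQL (a sketch using the
pageinspect extension against a hypothetical relation t1, assuming the
default 8kB block size):

-- Blocks written while bypassing WAL show up with an LSN of 0/0, so an
-- LSN-threshold-based incremental backup has to copy them unconditionally.
CREATE EXTENSION IF NOT EXISTS pageinspect;

SELECT blkno,
       (page_header(get_raw_page('t1', blkno::int))).lsn
FROM generate_series(0, pg_relation_size('t1') / 8192 - 1) AS blkno;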

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: using explicit_bzero

2019-09-17 Thread Peter Eisentraut
On 2019-09-09 17:18, Andres Freund wrote:
> I think all this implementation actually guarantees is that bzero2 is
> read, but not that the copy is not elided. In practice that's *probably*
> good enough, but a compiler could just check whether bzero_p points to
> memset.

Are you saying that the replacement implementation we provide is not
good enough?  If so, I'm happy to look at alternatives.  But that's the
design from OpenSSH, so if that is wrong, then there are bigger
problems.  We could also take the OpenBSD implementation, but that has a
GCC-ish dependency, so we would probably want the OpenSSH implementation
as a fallback anyway.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Pulling up direct-correlated ANY_SUBLINK

2019-09-17 Thread Richard Guo
On Thu, Sep 12, 2019 at 11:35 PM Antonin Houska  wrote:

> Richard Guo  wrote:
>
> > On Wed, Sep 11, 2019 at 3:25 PM Antonin Houska 
> > wrote:
> >
> >
> > Nevertheless, I don't know how to overcome the problems that I
> > mentioned
> > upthread.
> >
> >
> > Do you mean the problem "the WHERE clause of the subquery didn't
> > participate in the SEMI JOIN evaluation"? Good news is it has been
> > fixed
> > by commit 043f6ff0 as I mentioned upthread.
>
> Do you say that my old patch (rebased) no longer breaks the regression
> tests?
>

I think so.


>
> (I noticed your other email in the thread which seems to indicate that
> you're no longer interested in working on the feature, but I am asking
> out of curiosity.)
>

Tom pointed out that even if we pull up the subquery with the help of
LATERAL, we cannot make sure we will end up with a better plan, since
LATERAL pretty much constrains things to use a nestloop. Hmm, I think
what he said makes sense.
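
(For readers following along, a minimal example of the kind of
directly-correlated ANY sublink under discussion, over hypothetical tables
a and b:

SELECT * FROM a
WHERE a.x = ANY (SELECT b.y FROM b WHERE b.z = a.z);

Because the sub-select references a.z from the outer query, it is not
pulled up into a semi-join today; pulling it up via LATERAL would largely
force a nested loop, which is Tom's point above.)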

Thanks
Richard


Re: Nondeterministic collations vs. text_pattern_ops

2019-09-17 Thread Peter Eisentraut
On 2019-09-17 01:13, Tom Lane wrote:
> Whilst poking at the leakproofness-of-texteq issue, I realized
> that there's an independent problem caused by the nondeterminism
> patch.  To wit, that the text_pattern_ops btree opclass uses
> texteq as its equality operator, even though that operator is
> no longer guaranteed to be bitwise equality.  That means that
> depending on which collation happens to get attached to the
> operator, equality might be inconsistent with the other members
> of the opclass, leading to who-knows-what bad results.

You can't create a text_pattern_ops index on a column with
nondeterministic collation:

create collation c1 (provider = icu, locale = 'und', deterministic = false);
create table t1 (a int, b text collate c1);
create index on t1 (b text_pattern_ops);
ERROR:  nondeterministic collations are not supported for operator class
"text_pattern_ops"

There is some discussion in internal_text_pattern_compare().

Are there other cases we need to consider?

I notice that there is a hash opclass text_pattern_ops, which I'd
actually never heard of until now and which I don't see documented.  What
would we need to do about that?

> The obvious fix for this is to invent separate new equality operators,
> but that's actually rather disastrous for performance, because
> text_pattern_ops indexes would no longer be able to use WHERE clauses
> using plain equality.  That also feeds into whether equality clauses
> deduced from equivalence classes will work for them (nope, not any
> more).  People using such indexes are just about certain to be
> bitterly unhappy.

Would it help if one created COLLATE "C" indexes instead of
text_pattern_ops?  What are the tradeoffs between the two approaches?
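
For reference, a minimal sketch of the COLLATE "C" alternative, reusing the
t1/c1 example from above:

-- A "C"-collation index compares by byte order, so it can serve prefix
-- pattern searches much like text_pattern_ops:
create index on t1 (b collate "C");

-- The query must ask for the "C" collation explicitly to match the index:
select * from t1 where b collate "C" like 'abc%';

Such an index also supports ordinary comparisons and ORDER BY under the "C"
collation, whereas text_pattern_ops reuses plain texteq for equality, which
is the inconsistency described above.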

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Feature request: binary NOTIFY

2019-09-17 Thread Pavel Stehule
Hi

On Tue, Sep 17, 2019 at 10:01, Mitar  wrote:

> Hi!
>
> I might have missed prior discussions about this, but I wonder if it
> would be possible to support binary payloads for NOTIFY/LISTEN? Again
> and again I find it very limiting with just text (have to base64
> encode data, or convert it to JSON).
>

I think there is no problem with passing binary data already. The text type
"text" and the binary type "bytea" are internally very similar.

But the message doesn't carry any info about the type, so you have to ensure
that clients understand the message and take the data in binary format.

You can overload the pg_notify function for the bytea format.

It is not possible to use the NOTIFY statement itself with binary data,
because that statement doesn't allow parametrization.
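
For illustration, a minimal sketch of such an overload (hypothetical, not
part of core); note that it still has to serialize the bytes into the
existing text-only payload, here via base64:

-- Listeners must decode(payload, 'base64') on their side.
CREATE FUNCTION pg_notify(channel text, payload bytea) RETURNS void
LANGUAGE sql AS $$
  SELECT pg_catalog.pg_notify(channel, encode(payload, 'base64'));
$$;

SELECT pg_notify('my_channel', '\x0001ff'::bytea);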

Pavel

>
>
> Mitar
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m
>
>
>


Feature request: binary NOTIFY

2019-09-17 Thread Mitar
Hi!

I might have missed prior discussions about this, but I wonder if it
would be possible to support binary payloads for NOTIFY/LISTEN? Again
and again I find it very limiting with just text (have to base64
encode data, or convert it to JSON).


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m




Re: pg_rewind docs correction

2019-09-17 Thread Michael Paquier
On Sun, Sep 15, 2019 at 10:36:04AM -0400, James Coleman wrote:
> On Sun, Sep 15, 2019 at 10:25 AM Michael Paquier  wrote:
>> +The rewind operation is not expected to result in a consistent data
>> +directory state either internally to the node or with respect to the rest
>> +of the cluster. Instead the resulting data directory will only be consistent
>> +after WAL replay has completed to at least the LSN at which changed blocks
>> +copied from the source were originally written on the source.
>>
>> That's not necessarily true.  pg_rewind enforces in the control file
>> of the target the minimum consistency LSN to be
>> pg_current_wal_insert_lsn() when using a live source or the last
>> checkpoint LSN for a stopped source, so while that sounds true from
>> the point of view of all the blocks copied, the control file may still
>> cause a complain that the target recovering has not reached its
>> consistent point even if all the blocks are already at a position
>> not-so-far from what has been registered in the control file.
> 
> I could just say "after WAL replay has completed to a consistent state"?

I still would not change this paragraph.  The first sentence means
that we have an equivalency, because that's the case if you think
about it: we make sure that the target is able to sync with the
source, and the target gets into a state where it has an on-disk state
equivalent to the source up to the minimum consistency point defined
in the control file once the tool has done its work (this last point
is too precise to be included in a global description, to be honest).
And the second sentence makes clear what the actual diffs are.

>> +   the point at which the WAL timelines of the source and target diverged plus
>> +   the current state on the source of any blocks changed on the target after
>> +   that divergence. While only changed blocks from existing relation files are
>>
>> And here we could mention that all the blocks copied from the source
>> are the ones which are found in the WAL records of the target until
>> the end of WAL of its timeline.  Still, that's basically what is
>> mentioned in the first part of "How It Works", which explains things
>> better.  I honestly don't really see that all this paragraph is an
>> improvement over the simplicity of the original when it comes to
>> understanding the global idea of what pg_rewind does.
> 
> The problem with the original is that while simple, it's actually
> incorrect in that simplicity. Pg_rewind does *not* result in the data
> directory on the target matching the data directory on the source.

That's not what I get from the original docs, but I may be too much
used to it.

>> +  
>> +Because pg_rewind copies configuration files
>> +entirely from the source, correcting recovery configuration options before
>> +restarting the server is necessary if you intend to re-introduce the target
>> +as a replica of the source. If you restart the server after the rewind
>> +operation has finished but without configuring recovery, the target will
>> +again diverge from the primary.
>> +   
>>
>> No objections regarding that part.  Now it seems to me that we had
>> better apply that to the last part of "How it works" instead?  I kind
>> of agree that the last paragraph could provide more details regarding
>> the risks of overwriting the wanted configuration.  The existing docs
>> also mention that pg_rewind only creates a backup_label file to start
>> recovery, perhaps we could mention up to which point recovery happens
>> in this section?  There is a bit more here than just "apply the WAL".
> 
> I'll look to see if there's a better place to put this.

Thanks.  From what I can see, we could further improve the doc part
about how the tool works in detail, especially regarding the
configuration files which may get overwritten, and be more precise
about that.
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Improve performance of NOTIFY over many databases (v2)

2019-09-17 Thread Martijn van Oosterhout
Hoi Tom,

On Mon, 16 Sep 2019 at 15:33, Tom Lane  wrote:
>
> Martijn van Oosterhout  writes:
> > I think I like the idea of having SignalBackend do the waking up a
> > slow backend but I'm not enthused by the "lets wake up (at once)
> > everyone that is behind". That's one of the issues I was explicitly
> > trying to solve. If there are any significant number of "slow"
> > backends then we get the "thundering herd" again.
>
> But do we care?  With asyncQueueAdvanceTail gone from the listeners,
> there's no longer an exclusive lock for them to contend on.  And,
> again, I failed to see any significant contention even in HEAD as it
> stands; so I'm unconvinced that you're solving a live problem.

You're right, they only acquire a shared lock which is much less of a
problem. And I forgot that we're still reducing the load from a few
hundred signals and exclusive locks per NOTIFY to perhaps a dozen
shared locks every thousand messages. You'd be hard pressed to
demonstrate there's a real problem here.

So I think your patch is fine as is.

Looking at the release cycle it looks like the earliest either of
these patches will appear in a release is PG13, right?

Thanks again.
-- 
Martijn van Oosterhout  http://svana.org/kleptog/




Re: Support for CALL statement in ecpg

2019-09-17 Thread Peter Eisentraut
I don't find this patch in any commit fest.  Seems like a good addition.
-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Leakproofness of texteq()/textne()

2019-09-17 Thread Peter Eisentraut
On 2019-09-16 06:24, Tom Lane wrote:
> So it seems that the consensus is that it's okay to mark these
> functions leakproof, because if any of the errors they throw
> are truly reachable for other than data-corruption reasons,
> we would wish to try to prevent such errors.  (Maybe through
> upstream validity checks?  Hard to say how we'd do it exactly,
> when we don't have an idea what the problem is.)

Yeah, it seems like as we expand our Unicode capabilities, we will see
more cases like "it could fail here in theory, but it shouldn't happen
for normal data", and the answer can't be to call all that untrusted or
leaky.  It's the job of the database software to sort that out.
Obviously, it will require careful evaluation in each case.

> My inclination is to do the proleakproof changes in HEAD, but
> not touch v12.  The inconsistency in leakproof markings in v12
> is annoying but it's not a regression or security hazard, so
> I'm thinking it's not worth a late catversion bump to fix.

Sounds good, unless we do another catversion bump.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add "password_protocol" connection parameter to libpq

2019-09-17 Thread Michael Paquier
On Sat, Sep 14, 2019 at 08:42:53AM -0700, Jeff Davis wrote:
> On Fri, 2019-09-06 at 16:05 +0900, Michael Paquier wrote:
>> Actually, it looks that the handling of channel_bound is incorrect.
>> If the server sends AUTH_REQ_SASL and libpq processes it, then the
>> flag gets already set.  Now let's imagine that the server is a rogue
>> one and sends AUTH_REQ_OK just after AUTH_REQ_SASL, then your check
>> would pass.  It seems to me that we should switch the flag once we are
>> sure that the exchange is completed, aka with AUTH_REQ_SASL_FIN when
>> the final message is done within pg_SASL_continue().
> 
> Thank you! Fixed. I now track whether channel binding is selected, and
> also whether SASL actually finished successfully.

Ah, I see.  So you have added an extra flag "sasl_finished" to make
sure of that, and still kept "channel_bound" around.  Hence the two
flags have to be set to make sure that the SASL exchange has been
finished and that channel binding has been enabled.  "channel_bound"
is linked to the selected mechanism when the exchange begins, meaning
that it could be possible to do the same check with the selected
mechanism directly from fe_scram_state instead.  "sasl_finished" is
linked to the state where the SASL exchange is finished, so this
basically maps to checking for FE_SCRAM_FINISHED.  Instead of
those two flags, wouldn't it be cleaner to add an extra routine to
fe-auth-scram.c which does the same sanity checks, say
pg_fe_scram_check_state()?  This would need to be careful about three
things, taking as input an opaque pointer to fe_scram_state if channel
binding is required:
- The data is not NULL.
- The sasl mechanism selected is SCRAM-SHA-256-PLUS.
- The state is FE_SCRAM_FINISHED.

What do you think?  There is no need to save down the connection
parameter value into fe_scram_state.

>> +# SSL not in use; channel binding still can't work
>> +reset_pg_hba($node, 'scram-sha-256');
>> +$ENV{"PGCHANNELBINDING"} = 'require';
>> +test_login($node, 'saslpreptest4a_role', "a", 2);
>> Worth testing md5 here?
> 
> I added a new md5 test in the ssl test suite. Testing it in the non-SSL 
> path doesn't seem like it adds much.

Good idea.

+   if (conn->channel_binding[0] == 'r' && /* require */
+   strcmp(selected_mechanism, SCRAM_SHA_256_NAME) == 0)
+   {
+   printfPQExpBuffer(&conn->errorMessage,
+ libpq_gettext("SASL authentication mechanism SCRAM-SHA-256 selected but channel binding is required\n"));
+   goto error;
+   }
Nit here as there are only two mechanisms handled: I would rather
cause the error if the selected mechanism does not match
SCRAM-SHA-256-PLUS, instead of complaining if the selected mechanism
matches SCRAM-SHA-256.  Either way is actually fine :)

+   printfPQExpBuffer(&conn->errorMessage,
+ libpq_gettext("Channel binding required but not offered by server\n"));
+   result = false;
Should that be "completed by server" instead?

+   if (areq == AUTH_REQ_SASL_FIN)
+   conn->sasl_finished = true;
This should have a comment about the why it is done if you go this way
with the two flags added to PGconn.

+   if (conn->channel_binding[0] == 'r' && /* require */
+   !conn->ssl_in_use)
+   {
+   printfPQExpBuffer(&conn->errorMessage,
+ libpq_gettext("Channel binding required, but SSL not in use\n"));
+   goto error;
+   }
Isn't this unnecessary?  If SSL is not in use but the server publishes
SCRAM-SHA-256-PLUS, libpq complains.  If the server sends only
SCRAM-SHA-256 but channel binding is required, we complain further down with
"SASL authentication mechanism SCRAM-SHA-256 selected but channel binding
is required".  Or do you have in mind that this error message is better?

I think that pgindent would complain about the comment block in
check_expected_areq().
--
Michael


signature.asc
Description: PGP signature

