Re: pgbench - use pg logging capabilities

2020-01-10 Thread Michael Paquier
On Fri, Jan 10, 2020 at 05:39:40PM +0100, Fabien COELHO wrote:
> So IMHO the current situation is fine, except for the __variable name.  So
> ISTM that the attached is the simplest and most reasonable option to fix this.

I'd rather hear more from others at this point.  Peter's opinion, as
the main author behind logging.c/h, would be good to have here.
--
Michael




Re: pg_basebackup fails on databases with high OIDs

2020-01-10 Thread Peter Eisentraut

On 2020-01-06 21:00, Magnus Hagander wrote:

+0.5 to avoid calling OidInputFunctionCall()


Or just directly using atol() instead of atoi()? Well maybe not
directly but in a small wrapper that verifies it's not bigger than an
unsigned?

Unlike in cases where we use oidin etc, we are dealing with data that
is "mostly trusted" here, aren't we? Meaning we could call atol() on
it, and throw an error if it overflows, and be done with it?
Subdirectories in the data directory aren't exactly "untrusted end-user
data"...


Yeah, it looks like we are using strtoul() without additional error 
checking in similar situations, so here is a patch doing it like that.
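
For reference, a minimal standalone sketch of the kind of small wrapper Magnus describes might look like the following (purely hypothetical -- the patch below just casts strtoul()'s result to Oid without extra checks): parse the directory name and reject garbage or anything that doesn't fit in an unsigned 32-bit OID.

```
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a data-directory subdirectory name as an OID; bail out on bad input. */
static uint32_t
dirname_to_oid(const char *str)
{
	char	   *endptr;
	unsigned long val;

	errno = 0;
	val = strtoul(str, &endptr, 10);
	if (errno != 0 || endptr == str || *endptr != '\0' || val > UINT32_MAX)
	{
		fprintf(stderr, "invalid OID in directory name: \"%s\"\n", str);
		exit(EXIT_FAILURE);
	}
	return (uint32_t) val;
}

int
main(void)
{
	/* 3000000000 is larger than INT32_MAX but a perfectly valid OID */
	printf("%u\n", (unsigned) dirname_to_oid("3000000000"));
	return 0;
}
```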


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 8a4d22b95bb9f54e6834bc5285c4d84582e6f128 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Sat, 11 Jan 2020 08:16:21 +0100
Subject: [PATCH v2] Fix base backup with database OIDs larger than INT32_MAX

The use of pg_atoi() for parsing a string into an Oid fails for values
larger than INT32_MAX, since OIDs are unsigned.  Instead, use plain
strtoul().  While this has less error checking, the contents of the
data directory are expected to be trustworthy, so we don't need to go
out of our way to do full error checking.

Discussion: 
https://www.postgresql.org/message-id/flat/dea47fc8-6c89-a2b1-07e3-754ff1ab094b%402ndquadrant.com
---
 src/backend/replication/basebackup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index a73893237a..ae535a0565 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1316,7 +1316,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
 
 			if (!sizeonly)
 				sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf,
-								true, isDbDir ? pg_atoi(lastDir + 1, sizeof(Oid), 0) : InvalidOid);
+								true, isDbDir ? (Oid) strtoul(lastDir + 1, NULL, 10) : InvalidOid);
 
 			if (sent || sizeonly)
 			{
-- 
2.24.1



Re: remove some STATUS_* symbols

2020-01-10 Thread Peter Eisentraut

On 2020-01-10 06:23, Michael Paquier wrote:

On Thu, Jan 09, 2020 at 11:15:08AM +0100, Peter Eisentraut wrote:

You mean put the subsequent GrantLock() calls into LockCheckConflicts()? That
would technically save some duplicate code, but it seems weird, because
LockCheckConflicts() is notionally a read-only function that shouldn't
change state.


Nah.  I was thinking about the first part of the "if" clause that
LockCheckConflicts() is part of here:

    if (lockMethodTable->conflictTab[lockmode] & lock->waitMask)
        status = STATUS_FOUND;
    else
        status = LockCheckConflicts(lockMethodTable, lockmode,
                                    lock, proclock);

But now that I look at it closely, it messes up heavily with
ProcSleep() ;)


OK, pushed as it was then.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: ALTER TABLE support for dropping generation expression

2020-01-10 Thread Peter Eisentraut

On 2020-01-10 13:20, Sergei Kornilov wrote:

Thank you, but I am late: the patch has another merge conflict.

The conflict seems trivial and the patch looks fine to me.


Here is another patch version.  I have resolved the conflict and also 
added a check that you don't drop the generation expression from an 
inherited column.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From d59ecf25468bfd10cc7ddf941faf2fb1092cb01c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Sat, 11 Jan 2020 07:33:49 +0100
Subject: [PATCH v3] ALTER TABLE ... ALTER COLUMN ... DROP EXPRESSION

Add an ALTER TABLE subcommand for dropping the generated property from
a column, per SQL standard.

Reviewed-by: Sergei Kornilov 
Discussion: 
https://www.postgresql.org/message-id/flat/2f7f1d9c-946e-0453-d841-4f38eb9d69b6%402ndquadrant.com
---
 doc/src/sgml/ref/alter_table.sgml   |  18 +++
 src/backend/catalog/sql_features.txt|   2 +-
 src/backend/commands/tablecmds.c| 161 +++-
 src/backend/parser/gram.y   |  20 ++-
 src/bin/psql/tab-complete.c |   2 +-
 src/include/nodes/parsenodes.h  |   1 +
 src/include/parser/kwlist.h |   1 +
 src/test/regress/expected/generated.out |  87 +
 src/test/regress/sql/generated.sql  |  38 ++
 9 files changed, 326 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 8403c797e2..4bf449587c 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -46,6 +46,7 @@
 ALTER [ COLUMN ] column_name SET DEFAULT expression
 ALTER [ COLUMN ] column_name DROP DEFAULT
 ALTER [ COLUMN ] column_name { SET | DROP } NOT NULL
+ALTER [ COLUMN ] column_name DROP EXPRESSION [ IF EXISTS ]
 ALTER [ COLUMN ] column_name ADD GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY [ ( sequence_options ) ]
 ALTER [ COLUMN ] column_name { SET GENERATED { ALWAYS | BY DEFAULT } | SET sequence_option | RESTART [ [ WITH ] restart ] } [...]
 ALTER [ COLUMN ] column_name DROP IDENTITY [ IF EXISTS ]
@@ -241,6 +242,23 @@ Description
 

 
+   
+DROP EXPRESSION [ IF EXISTS ]
+
+ 
+  This form turns a stored generated column into a normal base column.
+  Existing data in the columns is retained, but future changes will no
+  longer apply the generation expression.
+ 
+
+ 
+  If DROP EXPRESSION IF EXISTS is specified and the
+  column is not a stored generated column, no error is thrown.  In this
+  case a notice is issued instead.
+ 
+
+   
+

 ADD GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY
 SET GENERATED { ALWAYS | BY DEFAULT }
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index ab3e381cff..9f840ddfd2 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -252,7 +252,7 @@ F381	Extended schema manipulation	03	ALTER TABLE statement: DROP CONSTRAINT clause
 F382   Alter column data type  YES 
 F383   Set column not null clause  YES 
 F384   Drop identity property clause   YES 
-F385   Drop column generation expression clauseNO  
+F385   Drop column generation expression clauseYES 
 F386   Set identity column generation clause   YES 
 F391   Long identifiersYES 
 F392   Unicode escapes in identifiers  YES 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 421bc28727..2ec3fc5014 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -388,6 +388,8 @@ static ObjectAddress ATExecAddIdentity(Relation rel, const char *colName,
 static ObjectAddress ATExecSetIdentity(Relation rel, const char *colName,
 									   Node *def, LOCKMODE lockmode);
 static ObjectAddress ATExecDropIdentity(Relation rel, const char *colName, bool missing_ok, LOCKMODE lockmode);
+static void ATPrepDropExpression(Relation rel, AlterTableCmd *cmd, bool recursing);
+static ObjectAddress ATExecDropExpression(Relation rel, const char *colName, bool missing_ok, LOCKMODE lockmode);
 static ObjectAddress ATExecSetStatistics(Relation rel, const char *colName, int16 colNum,
 										 Node *newValue, LOCKMODE lockmode);
 static ObjectAddress ATExecSetOptions(Relation rel, const char *colName,
@@ -3672,6 +3674,7 @@ AlterTableGetLockLevel(List *cmds)
case AT_AddIdentity:
case AT_DropIdentity:
case AT_SetIdentity:
+   case AT_DropExpression:
  

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

2020-01-10 Thread Tom Lane
Amit Kapila  writes:
> On Fri, Jan 10, 2020 at 9:31 AM Amit Kapila  wrote:
>> ... So, we have the below
>> options:
>> (a) remove this test entirely from all branches and once we found the
>> memory leak problem in back-branches, then consider adding it again
>> without max_files_per_process restriction.
>> (b) keep this test without max_files_per_process restriction till v11
>> and once the memory leak issue in v10 is found, we can back-patch to
>> v10 as well.

> I am planning to go with option (a) and attached are patches to revert
> the entire test on HEAD and back branches.  I am planning to commit
> these by Tuesday unless someone has a better idea.

Makes sense to me.  We've certainly found out something interesting
from this test, but not what it was expecting to find ;-).  I think
that there could be scope for two sorts of successor tests:

* I still like my idea of directly constraining max_safe_fds through
some sort of debug option.  But to my mind, we want to run the entire
regression suite with that restriction, not just one small test.

* The seeming bug in v10 suggests that we aren't testing large enough
logical-decoding cases, or at least aren't noticing leaks in that
area.  I'm not sure what a good design is for testing that.  I'm not
thrilled with just using a larger (and slower) test case, but it's
not clear to me how else to attack it.

regards, tom lane




Re: logical decoding : exceeded maxAllocatedDescs for .spill files

2020-01-10 Thread Amit Kapila
On Fri, Jan 10, 2020 at 9:31 AM Amit Kapila  wrote:
>
> On Fri, Jan 10, 2020 at 6:10 AM Tom Lane  wrote:
> >
> > I wrote:
> > >   ReorderBuffer: 223302560 total in 26995 blocks; 7056 free (3 
> > > chunks); 223295504 used
> >
> > > The test case is only inserting 50K fairly-short rows, so this seems
> > > like an unreasonable amount of memory to be consuming for that; and
> > > even if you think it's reasonable, it clearly isn't going to scale
> > > to large production transactions.
> >
> > > Now, the good news is that v11 and later get through
> > > 006_logical_decoding.pl just fine under the same restriction.
> > > So we did something in v11 to fix this excessive memory consumption.
> > > However, unless we're willing to back-port whatever that was, this
> > > test case is clearly consuming excessive resources for the v10 branch.
> >
> > I dug around a little in the git history for backend/replication/logical/,
> > and while I find several commit messages mentioning memory leaks and
> > faulty spill logic, they all claim to have been back-patched as far
> > as 9.4.
> >
> > It seems reasonably likely to me that this result is telling us about
> > an actual bug, ie, faulty back-patching of one or more of those fixes
> > into v10 and perhaps earlier branches.
> >
>
> I think it would be good to narrow down this problem, but it seems we
> can do this separately.   I think to avoid forgetting about this, can
> we track it somewhere as an open issue (In Older Bugs section of
> PostgreSQL 12 Open Items or some other place)?
>
> It seems to me that this test has found a problem in back-branches, so
> we might want to keep it after removing the max_files_per_process
> restriction.  However, unless we narrow down this memory leak it is
> not a good idea to keep it at least not in v10.  So, we have the below
> options:
> (a) remove this test entirely from all branches and once we found the
> memory leak problem in back-branches, then consider adding it again
> without max_files_per_process restriction.
> (b) keep this test without max_files_per_process restriction till v11
> and once the memory leak issue in v10 is found, we can back-patch to
> v10 as well.
>

I am planning to go with option (a) and attached are patches to revert
the entire test on HEAD and back branches.  I am planning to commit
these by Tuesday unless someone has a better idea.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


HEAD-0001-Revert-test-added-by-commit-d207038053.patch
Description: Binary data


v12-0001-Revert-test-added-by-commit-d207038053.patch
Description: Binary data


Re: 12.1 not useable: clientlib fails after a dozen queries (GSSAPI ?)

2020-01-10 Thread Tom Lane
Here's a draft patch that cleans up all the logic errors I could find.

I also expanded the previous patch for the kerberos test so that it
verifies that we can upload a nontrivial amount of data to the server,
as well as download.

I also spent a fair amount of effort on removing cosmetic differences
between the comparable logic in be-secure-gssapi.c and fe-secure-gssapi.c,
such that the two files can now be diff'd to confirm that be_gssapi_write
and be_gssapi_read implement identical logic to pg_GSS_write and
pg_GSS_read.  (They did not, before :-(.)

This does not deal with the problem that libpq shouldn't be using
static data space for this purpose.  It seems reasonable to me to
leave that for a separate patch.

This passes tests for me, on my FreeBSD build with lo0 mtu = 1500.
It wouldn't hurt to get some more mileage on it though.  Peter,
I didn't follow how to set up the "packet radio speed" environment
that you mentioned, but if you could beat on this with that setup
it would surely be useful.

regards, tom lane

diff --git a/src/backend/libpq/be-secure-gssapi.c b/src/backend/libpq/be-secure-gssapi.c
index b02d3dd..9a598e9 100644
--- a/src/backend/libpq/be-secure-gssapi.c
+++ b/src/backend/libpq/be-secure-gssapi.c
@@ -5,7 +5,6 @@
  *
  * Portions Copyright (c) 2018-2020, PostgreSQL Global Development Group
  *
- *
  * IDENTIFICATION
  *  src/backend/libpq/be-secure-gssapi.c
  *
@@ -28,10 +27,10 @@
  * Handle the encryption/decryption of data using GSSAPI.
  *
  * In the encrypted data stream on the wire, we break up the data
- * into packets where each packet starts with a sizeof(uint32)-byte
- * length (not allowed to be larger than the buffer sizes defined
- * below) and then the encrypted data of that length immediately
- * following.
+ * into packets where each packet starts with a uint32-size length
+ * word (in network byte order), then encrypted data of that length
+ * immediately following.  Decryption yields the same data stream
+ * that would appear when not using encryption.
  *
  * Encrypted data typically ends up being larger than the same data
  * unencrypted, so we use fixed-size buffers for handling the
@@ -42,8 +41,11 @@
  *
  * NOTE: The client and server have to agree on the max packet size,
  * because we have to pass an entire packet to GSSAPI at a time and we
- * don't want the other side to send arbitrairly huge packets as we
+ * don't want the other side to send arbitrarily huge packets as we
  * would have to allocate memory for them to then pass them to GSSAPI.
+ *
+ * Therefore, these two #define's are effectively part of the protocol
+ * spec and can't ever be changed.
  */
 #define PQ_GSS_SEND_BUFFER_SIZE 16384
 #define PQ_GSS_RECV_BUFFER_SIZE 16384
@@ -69,23 +71,31 @@ uint32		max_packet_size;	/* Maximum size we can encrypt and fit the
  * results into our output buffer */
 
 /*
- * Attempt to write len bytes of data from ptr along a GSSAPI-encrypted connection.
+ * Attempt to write len bytes of data from ptr to a GSSAPI-encrypted connection.
  *
- * Connection must be fully established (including authentication step) before
- * calling.  Returns the bytes actually consumed once complete.  Data is
- * internally buffered; in the case of an incomplete write, the amount of data we
- * processed (encrypted into our output buffer to be sent) will be returned.  If
- * an error occurs or we would block, a negative value is returned and errno is
- * set appropriately.
+ * The connection must be already set up for GSSAPI encryption (i.e., GSSAPI
+ * transport negotiation is complete).
  *
- * To continue writing in the case of EWOULDBLOCK and similar, call this function
- * again with matching ptr and len parameters.
+ * Returns the number of data bytes consumed, or on failure, returns -1
+ * with errno set appropriately.  (For fatal errors, we may just elog and
+ * exit, if errno wouldn't be sufficient to describe the error.)  For
+ * retryable errors, caller should call again (passing the same data)
+ * once the socket is ready.
+ *
+ * Data is internally buffered; in the case of an incomplete write, the
+ * return value is the amount of caller data we consumed, not the amount
+ * physically sent.
  */
 ssize_t
 be_gssapi_write(Port *port, void *ptr, size_t len)
 {
+	OM_uint32	major,
+				minor;
+	gss_buffer_desc input,
+				output;
 	size_t		bytes_to_encrypt = len;
 	size_t		bytes_encrypted = 0;
+	gss_ctx_id_t gctx = port->gss->ctx;
 
 	/*
 	 * Loop through encrypting data and sending it out until
@@ -95,13 +105,8 @@ be_gssapi_write(Port *port, void *ptr, size_t len)
 	 */
 	while (bytes_to_encrypt || PqGSSSendPointer)
 	{
-		OM_uint32	major,
-					minor;
-		gss_buffer_desc input,
-					output;
 		int			conf_state = 0;
 		uint32		netlen;
-		pg_gssinfo *gss = port->gss;
 
 		/*
 		 * Check if we have data in the encrypted output buffer that needs to
@@ -117,17 +122,14 @@ be_gssapi_write(Port *port, void *ptr, size_t len)
 			if (ret <= 0)
 			{

Re: [HACKERS] Block level parallel vacuum

2020-01-10 Thread Masahiko Sawada
On Fri, 10 Jan 2020 at 20:54, Mahendra Singh Thalor  wrote:
>
> On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov  wrote:
> >
> > Hi
> > Thank you for update! I looked again
> >
> > (vacuum_indexes_leader)
> > +   /* Skip the indexes that can be processed by parallel workers */
> > +   if (!skip_index)
> > +   continue;
> >
> > Isn't the variable name skip_index confusing here? Maybe rename it to
> > something like can_parallel?
>
> I also agree with your point.

I don't think the change is a good idea.

-   bool        skip_index = (get_indstats(lps->lvshared, i) == NULL ||
-                             skip_parallel_vacuum_index(Irel[i], lps->lvshared));
+   bool        can_parallel = (get_indstats(lps->lvshared, i) == NULL ||
+                               skip_parallel_vacuum_index(Irel[i],
+                                                          lps->lvshared));

The above condition is true when the index can *not* do parallel index
vacuum. How about renaming it to skipped_index and changing the comment to
something like "We are interested in only the indexes skipped by parallel vacuum"?

>
> >
> > Another question about behavior on temporary tables. Use case: the user
> > commands just "vacuum;" to vacuum the entire database (and has enough
> > maintenance workers). Vacuum starts fine in parallel, but on the first
> > temporary table we hit:
> >
> > +   if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0)
> > +   {
> > +       ereport(WARNING,
> > +               (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
> > +                       RelationGetRelationName(onerel))));
> > +       params->nworkers = -1;
> > +   }
> >
> > And therefore we turn off the parallel vacuum for the remaining
> > tables... Can we improve this case?
>
>
> Good point.
> Yes, we should improve this. I tried to fix this.

+1
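
For illustration, the general shape of such a fix could be to save and restore the requested worker count around each relation instead of clobbering it for the rest of the command. The sketch below is standalone and hypothetical (VacuumParams here is a stand-in, not the real struct), just to show the save/restore idea:

```
#include <stdio.h>

typedef struct VacuumParams
{
	int			nworkers;		/* requested parallel degree, -1 = disabled */
} VacuumParams;

/* Vacuum one relation, disabling parallelism only for temporary tables. */
static void
vacuum_one_rel(VacuumParams *params, int rel_is_temp)
{
	int			save_nworkers = params->nworkers;

	if (rel_is_temp)
		params->nworkers = -1;	/* parallel vacuum unsupported for temp tables */

	printf("vacuuming with nworkers = %d\n", params->nworkers);

	params->nworkers = save_nworkers;	/* restore for the next relation */
}

int
main(void)
{
	VacuumParams params = {2};

	vacuum_one_rel(&params, 1);	/* temp table: runs serially */
	vacuum_one_rel(&params, 0);	/* regular table: still parallel */
	return 0;
}
```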

Regards,


-- 
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Setting min/max TLS protocol in clientside libpq

2020-01-10 Thread Michael Paquier
On Fri, Jan 10, 2020 at 12:01:36AM +0100, Daniel Gustafsson wrote:
> I looked into this and it turns out that OpenSSL does nothing to prevent the
> caller from setting a nonsensical protocol range like min=tlsv1.3,max=tlsv1.1.
> Thus, it's quite easy to screw up the backend server config and get it to 
> start
> properly, but with quite unrelated error messages as a result on connection.

FWIW, here is the error produced, and that's confusing:
$ psql -d "host=localhost sslmode=require"
psql: error: could not connect to server: SSL error: tlsv1 alert internal error

> Since I think this needs to be dealt with for both backend and frontend (if
> this is accepted), I removed it from this patch to return to it in a separate
> thread.

HEAD and the back branches only care about the backend, so I think that we
should address this part first, as your patch would, I guess, reuse the
interface we end up using for the backend.  Looking at OpenSSL, I
agree that there is no internal logic to perform sanity checks on the
min/max bounds.  Still I can see that OpenSSL 1.1.0 has added some
"get" routines for SSL_CTX_set_min/max_proto_version:
https://www.openssl.org/docs/man1.1.0/man3/SSL_CTX_set_min_proto_version.html

Heuh.  It would be ideal to rely only on OpenSSL for that part
to bring some sanity, and to compare the results fetched from the SSL
context, so that we don't have to worry about special cases with the
GUC reload if the parameter is not set or the parameter value is not
supported.  Now, OpenSSL <= 1.0.2 cannot do that, and you can only get
the values back after doing the set, so adding the compatibility
argument it is much more tempting to use our
ssl_protocol_version_to_openssl() wrapper and complain only if:
- both the min and max map to supported values, and
- the min/max are incompatible with each other.
And the check needs to be done before attempting to set the min/max
protocols, so that we don't end up with an incorrect intermediate state.
Daniel, are you planning to start a new thread?
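
To make the intended check concrete, here is a rough standalone sketch (my own illustration, not a patch); the enum values stand in for the mapped OpenSSL protocol numbers, where 0 means "no bound" and -1 means "unsupported by this build":

```
#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for mapped OpenSSL protocol versions; only ordering matters. */
enum tls_ver
{
	TLS_UNSUPPORTED = -1,		/* e.g. TLSv1.3 with an old OpenSSL */
	TLS_ANY = 0,				/* "no bound" */
	TLS_V1 = 1,
	TLS_V1_1 = 2,
	TLS_V1_2 = 3,
	TLS_V1_3 = 4
};

/* Validate the range before calling SSL_CTX_set_{min,max}_proto_version(). */
static bool
tls_range_is_valid(int min_ver, int max_ver)
{
	if (min_ver == TLS_UNSUPPORTED || max_ver == TLS_UNSUPPORTED)
		return false;			/* value not supported by this OpenSSL */
	if (min_ver != TLS_ANY && max_ver != TLS_ANY && min_ver > max_ver)
		return false;			/* inverted range, e.g. min=1.3, max=1.1 */
	return true;
}

int
main(void)
{
	printf("%d\n", tls_range_is_valid(TLS_V1_3, TLS_V1_1));	/* 0: reject */
	printf("%d\n", tls_range_is_valid(TLS_V1_2, TLS_ANY));		/* 1: accept */
	return 0;
}
```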

> One thing I noticed when looking at it is that we now have sha2_openssl.c and
> openssl_protocol.c in src/common.  For easier visual grouping of OpenSSL
> functionality, it makes sense to me to rename sha2_openssl.c to 
> openssl_sha2.c,
> but that might just be pointless churn.

Databases like consistency, and so do I, so no issues from me with doing a
rename of the sha2.c file.  That makes sense with the addition of the
new file.
--
Michael




Re: Amcheck: do rightlink verification with lock coupling

2020-01-10 Thread Peter Geoghegan
On Fri, Jan 10, 2020 at 5:45 PM Tomas Vondra
 wrote:
> Peter, any opinion on this proposed amcheck patch? In the other thread
> [1] you seemed to agree this is worth checking, and Alvaro's proposal to
> make this check optional seems like a reasonable compromise with respect
> to the locking.

It's a good idea, and it probably doesn't even need to be made
optional -- lock coupling to the right is safe on a primary, and
should also be safe on standbys (though I should triple check the REDO
routines to be sure). The patch only does lock coupling when it proves
necessary, which ought to only happen when there is a concurrent page
split, which ought to be infrequent. Maybe there is no need to
compromise.

I'm curious why Andrey's corruption problems were not detected by the
cross-page amcheck test, though. We compare the first non-pivot tuple
on the right sibling leaf page with the last one on the target page,
towards the end of bt_target_page_check() -- isn't that almost as good
as what you have here in practice? I probably would have added
something like this myself earlier, if I had reason to think that
verification would be a lot more effective that way.

To be clear, I believe that Andrey wrote this patch for a reason -- I
assume that it makes a noticeable and consistent difference. I would
like to gain a better understanding of why that was for my own
benefit, though. For example, it might be that page deletion was a
factor that made the test I mentioned less effective. I care about the
specifics.

-- 
Peter Geoghegan




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-01-10 Thread Alvaro Herrera
On 2020-Jan-10, Alvaro Herrera wrote:

> From 7d671806584fff71067c8bde38b2f642ba1331a9 Mon Sep 17 00:00:00 2001
> From: Dilip Kumar 
> Date: Wed, 20 Nov 2019 16:41:13 +0530
> Subject: [PATCH v6 10/12] Enable streaming for all subscription TAP tests

This patch turns a lot of tests into the streamed mode.  While it's
great that streaming mode is tested, we should add new tests for it
rather than failing to keep tests for the non-streamed mode.  I suggest
that we add two versions of each test, one for each mode.  Maybe the way
to do that is to create some subroutine that can be called twice.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Disallow cancellation of waiting for synchronous replication

2020-01-10 Thread Bruce Momjian
On Thu, Jan  2, 2020 at 10:26:16PM +0500, Andrey Borodin wrote:
> 
> 
> > On 2 Jan 2020, at 19:13, Robert Haas  wrote:
> > 
> > On Sun, Dec 29, 2019 at 4:13 AM Andrey Borodin  wrote:
> >> Not loosing data - is a nice property of the database either.
> > 
> > Sure, but there's more than one way to fix that problem, as I pointed
> > out in my first response.
> Sorry, it took some more reading iterations of your message for me to 
> understand the problem you are writing about.
> 
> You proposed two solutions:
> 1. The client analyzes the warning and understands that the data is not
> actually committed.  This, as you pointed out, does not solve the problem:
> data is lost for another client, who never saw the warning.
> Actually, the "client" is a stateless collection of connections unable to
> communicate with each other by any means besides the database. They cannot
> share information about uncommitted transactions (they would need a
> database, thus a chicken-and-egg problem).

Actually, it might be worse than that.  In my reading of
RecordTransactionCommit(), we do this:

write to WAL
flush WAL (durable)
make visible to other backends
replicate
communicate to the client

I think this means we make the transaction commit visible to all
backends _before_ we replicate it, and potentially wait until we get a
replication reply to return SUCCESS to the client.  This means other
clients are acting on data that is durable on the local machine, but not
on the replicated machine, even if synchronous_standby_names is set.

I feel this topic needs a lot more thought before we consider changing
anything.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: How to retain lesser paths at add_path()?

2020-01-10 Thread Tomas Vondra

Hi,

I wonder what is the status of this patch/thread. There was quite a bit
of discussion about possible approaches, but we currently don't have any
patch for review, AFAICS. Not sure what's the plan?

So "needs review" status seems wrong, and considering we haven't seen
any patch since August (so in the past two CFs) I propose marking it as
returned with feedback. Any objections?

FWIW I think we may well need more elaborate logic for deciding which paths
to keep, but I'd prefer re-adding it to the CF when we actually have a new
patch.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Add FOREIGN to ALTER TABLE in pg_dump

2020-01-10 Thread Tomas Vondra

Hi,

On Thu, Sep 26, 2019 at 01:47:28PM +, Luis Carril wrote:


I don't disagree with adding FOREIGN, though.

Your patch is failing the pg_dump TAP tests.  Please use
configure --enable-tap-tests, fix the problems, then resubmit.

Fixed, I've attached a new version.



This seems like a fairly small and non-controversial patch (I agree with
Alvaro that having the optional FOREIGN won't hurt). So barring
objections I'll polish it a bit and push sometime next week.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Amcheck: do rightlink verification with lock coupling

2020-01-10 Thread Tomas Vondra

On Thu, Sep 12, 2019 at 10:16:12AM -0300, Alvaro Herrera wrote:

On 2019-Sep-12, Andrey Borodin wrote:


This patch violates one of amcheck's design principles: the current code
never takes more than one page lock. I do not know: should we keep
this rule, or should we do the deeper check?


The check does seem worthwhile to me.

As far as I know, in btree you can lock the right sibling of a page
while keeping lock on the page itself, without risk of deadlock.  So I'm
not sure what's the issue with that.  This is in the README:

 In most cases we release our lock and pin on a page before attempting
 to acquire pin and lock on the page we are moving to.  In a few places
 it is necessary to lock the next page before releasing the current one.
 This is safe when moving right or up, but not when moving left or down
 (else we'd create the possibility of deadlocks).

I suppose Peter was concerned about being able to run amcheck without
causing any trouble at all for concurrent operation; maybe we can retain
that property by making this check optional.



Peter, any opinion on this proposed amcheck patch? In the other thread
[1] you seemed to agree this is worth checking, and Alvaro's proposal to
make this check optional seems like a reasonable compromise with respect
to the locking.

[1] 
https://www.postgresql.org/message-id/flat/da9b33ac-53cb-4643-96d4-7a0bbc037...@yandex-team.ru

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after dropping temp table schema

2020-01-10 Thread Michael Paquier
On Fri, Jan 10, 2020 at 05:54:21PM +0530, Mahendra Singh Thalor wrote:
> Thanks for the patch. I am not getting any crash, but \d is not showing
> any temp table if we drop the temp schema and create a temp table again.

That's expected.  As discussed on this thread, the schema has been
dropped by a superuser and there are cases where it is helpful to do
so, so the relation you have created after DROP SCHEMA relies on an
inconsistent session state.  If you actually try to use \d with a
relation name that matches the one you just created, psql would just
show nothing for the namespace name.
--
Michael




Re: [Logical Replication] TRAP: FailedAssertion("rel->rd_rel->relreplident == REPLICA_IDENTITY_DEFAULT || rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL || rel->rd_rel->relreplident == REPLICA_IDE

2020-01-10 Thread Michael Paquier
On Fri, Jan 10, 2020 at 11:01:13AM +0530, Dilip Kumar wrote:
> On Fri, Jan 10, 2020 at 10:31 AM Michael Paquier  wrote:
>> and there is a gap with the regression
>> tests.  So combining all that I get the attached patch (origin point
>> is 665d1fa).  Thoughts?
> 
> LGTM

Thanks for the lookup.  I'll look at that again in a couple of days
and hopefully wrap it by then.
--
Michael




Re: [HACKERS] WIP: Aggregation push-down

2020-01-10 Thread Tomas Vondra

Hi,

I've been looking at the last version (v14) of this patch series,
submitted way back in July and unfortunately quiet since then. Antonin
is probably right one of the reasons for the lack of reviews is that it
requires interest/experience with planner.

Initially it was also a bit hard to understand what the benefits are
(and the patch shifted a bit), which is now mostly addressed by the
README in the last patch. The trouble is that's hidden in the patch and
so not immediately obvious to people considering reviewing it :-( Tom
posted a nice summary in November 2018, but it was perhaps focused on
the internals.

So my first suggestion is to write a short summary with an example of how it
affects practical queries (plan changes, speedups), etc.

My second suggestion is to have meaningful commit messages for each part
of the patch series, explaining why we need that particular part. It
might have been explained somewhere in the thread, but as a reviewer I
really don't want to reverse-engineer the whole thread.


Now, regarding individual parts of the patch:


1) v14-0001-Introduce-RelInfoList-structure.patch
-

- I'm not entirely sure why we need this change. We had the list+hash
  before, so I assume we do this because we need the output functions?

- The RelInfoList naming is pretty confusing, because it's actually not
  a list but a combination of list+hash. And the embedded list is called
  items, to make it yet a bit more confusing. I suggest we call this
  just RelInfo or RelInfoLookup or something else not including List.

- RelInfoEntry seems severely under-documented, particularly the data
  part (which is void *, making it even harder to understand what it's for
  without having to read the functions). AFAICS it's just simply a link
  to the embedded list, but maybe I'm wrong.

- I suggest we move find_join_rel/add_rel_info/add_join_rel in relnode.c
  right before build_join_rel. This makes diff clearer/smaller and
  visual diffs nicer.


2) v14-0002-Introduce-make_join_rel_common-function.patch
-

I see no point in keeping this change in a separate patch, as it pretty
much just renames make_join_rel to make_join_rel_common and then adds


  make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
  {
  return make_join_rel_common(root, rel1, rel2);
  }

which seems rather trivial and useless on its own. I'd just merge it
into 0003 where we use the _common function more extensively.


3) v14-0003-Aggregate-push-down-basic-functionality.patch
-

I haven't done much review on this yet, but I've done some testing and
benchmarking so let me share at least that. The queries I used were of
the type I mentioned earlier in this thread - essentially a star schema,
i.e. fact table referencing dimensions, with aggregation per columns in
the dimension. So something like

  SELECT d.c, sum(f) FROM fact JOIN dimension d ON (d.id = f.d_id)
  GROUP BY d.c;

where "fact" table is much much larger than the dimension.

Attached is a script I used for testing with a bunch of queries and a
number of parameters (parallelism, size of dimension, size of fact, ...)
and a spreadsheed summarizing the results.

Overall, the results seem pretty good - in many cases the queries get
about 2x faster and I haven't seen any regressions. That's pretty nice.

One annoying thing is that this only works for non-parallel queries.
That is, I saw no improvements with parallel queries. It was still
faster than the non-parallel query with aggregate pushdown, but the
parallel query did not improve.

An example of timing looks like this:

  non-parallel (pushdown = off): 918 ms
  non-parallel (pushdown = on):  561 ms
  parallel (pushdown = off): 313 ms
  parallel (pushdown = on):  313 ms

The non-parallel query gets faster because after enabling push-down the
plan changes (and we get partial aggregate below the join). With
parallel query that does not happen, the plans stay the same.

I'm not claiming this is a bug - we end up picking the fastest execution
plan (i.e. we don't make a mistake by e.g. switching to the non-parallel
one with pushdown).

My understanding is that the aggregate pushdown can't (currently?) be
used with queries that use partial aggregate because of parallelism.
That is we can either end up with a plan like this:

  Finalize Aggregate
    -> Join
         -> Partial Aggregate
         -> ...

or a parallel plan like this:

  Finalize Aggregate
    -> Gather
         -> Partial Aggregate
              -> Join
                   -> ...
                   -> ...

but we currently don't support a combination of both

  Finalize Aggregate
    -> Gather
         -> Join
              -> Partial Aggregate
                   -> ...

I wonder if that's actually even possible and what would it take to make
it work?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Make autovacuum sort tables in descending order of xid_age

2020-01-10 Thread David Kimura
On Thu, Jan 9, 2020 at 12:07 PM Robert Haas  wrote:
>
> On Thu, Dec 12, 2019 at 2:26 PM David Fetter  wrote:
> > > I wonder if you might add information about table size, table changes,
> > > and bloat to your RelFrozenXidAge struct and modify rfxa_comparator to
> > > use a heuristic to cost the (age, size, bloat, changed) grouping and
> > > sort on that cost, such that really large bloated tables with old xids
> > > might get vacuumed before smaller, less bloated tables that have
> > > even older xids. Sorting the tables based purely on xid_age seems to
> > > ignore other factors that are worth considering.  I do not have a
> > > formula for how those four factors should be weighted in the heuristic,
> > > but you are implicitly assigning three of them a weight of zero in
> > > your current patch.
> >
> > I think it's vastly premature to come up with complex sorting systems
> > right now.  Just sorting in descending order of age should either have
> > or not have positive effects.
>
> A lot of previous efforts to improve autovacuum scheduling have fallen
> down precisely because they did something that was so simple that it
> was doomed to regress as many cases as it improved, so I wouldn't be
> too quick to dismiss Mark's suggestion. In general, sorting by XID age
> seems like it should be better, but it's not hard to come up with a
> counterexample: suppose table T1 is going to wrap around in 4 hours
> and takes 4 hours to vacuum, but table T2 is going to wrap around in 2
> hours and takes 1 hour to vacuum.

Ah, so the primary purpose of this patch is to add smarts about when
autovacuum is triggered to handle wraparound?

> I've had the thought for a while now that perhaps we ought to try to
> estimate the rate of XID consumption, because without that it's really
> hard to make smart decisions.

Very interesting.

Could there be value in making this feature more preventative, perhaps
by triggering emergency autovacuum earlier based on some combination
of these heuristics rather than autovacuum_freeze_max_age alone?
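
For illustration only, a weighted comparator along the lines Mark suggested could fold size into the sort key; the struct and the weights below are made up for the example and are not taken from the posted patch:

```
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-table stats, not the patch's RelFrozenXidAge struct. */
typedef struct TableVacInfo
{
	uint32_t	xid_age;		/* how close the table is to wraparound */
	uint64_t	relpages;		/* rough proxy for vacuum cost/benefit */
} TableVacInfo;

/* Made-up weighting: age dominates, but size can break near-ties. */
static double
vac_priority(const TableVacInfo *t)
{
	return (double) t->xid_age + 0.001 * (double) t->relpages;
}

/* qsort comparator: higher priority sorts first (descending order). */
static int
vac_priority_cmp(const void *a, const void *b)
{
	double		pa = vac_priority(a);
	double		pb = vac_priority(b);

	return (pa < pb) - (pa > pb);
}

int
main(void)
{
	TableVacInfo tabs[] = {{1000000, 10}, {999000, 5000000}};

	qsort(tabs, 2, sizeof(TableVacInfo), vac_priority_cmp);
	/* the huge, slightly "younger" table outranks the tiny older one */
	printf("first: age=%u pages=%llu\n",
		   (unsigned) tabs[0].xid_age, (unsigned long long) tabs[0].relpages);
	return 0;
}
```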

Thanks,
David




Re: Implementing Incremental View Maintenance

2020-01-10 Thread nuko yokohama
A LIMIT clause without ORDER BY should be prohibited when creating
incremental materialized views.

In SQL, the result of a LIMIT clause without ORDER BY is undefined.
If the LIMIT clause is allowed when creating an incremental materialized
view, incorrect results will be obtained when the view is updated after
updating the source table.

```
[ec2-user@ip-10-0-1-10 ivm]$ psql --version
psql (PostgreSQL) 13devel-ivm-3bf6953688153fa72dd48478a77e37cf3111a1ee
[ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f limit-problem.sql
DROP TABLE IF EXISTS test CASCADE;
psql:limit-problem.sql:1: NOTICE:  drop cascades to materialized view test_imv
DROP TABLE
CREATE TABLE test (id int primary key, data text);
CREATE TABLE
INSERT INTO test VALUES (generate_series(1, 10), 'foo');
INSERT 0 10
CREATE INCREMENTAL MATERIALIZED VIEW test_imv AS SELECT * FROM test LIMIT 1;
SELECT 1
                            Materialized view "public.test_imv"
    Column     |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description
---------------+---------+-----------+----------+---------+----------+--------------+-------------
 id            | integer |           |          |         | plain    |              |
 data          | text    |           |          |         | extended |              |
 __ivm_count__ | bigint  |           |          |         | plain    |              |
View definition:
 SELECT test.id,
    test.data
   FROM test
 LIMIT 1;
Access method: heap
Incremental view maintenance: yes

SELECT * FROM test LIMIT 1;
 id | data
----+------
  1 | foo
(1 row)

TABLE test_imv;
 id | data
----+------
  1 | foo
(1 row)

UPDATE test SET data = 'bar' WHERE id = 1;
UPDATE 1
SELECT * FROM test LIMIT 1;
 id | data
----+------
  2 | foo
(1 row)

TABLE test_imv;
 id | data
----+------
  1 | bar
(1 row)

DELETE FROM test WHERE id = 1;
DELETE 1
SELECT * FROM test LIMIT 1;
 id | data
----+------
  2 | foo
(1 row)

TABLE test_imv;
 id | data
----+------
(0 rows)
```

The ORDER BY clause is already not allowed when executing CREATE INCREMENTAL
MATERIALIZED VIEW.
We propose not to allow LIMIT clauses either.
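
To make the proposal concrete, a parse-time rejection could mirror the existing ORDER BY restriction. The sketch below is only a guess at the shape of such a check (the function name and call site are assumptions, not part of the posted IVM patch); Query->limitCount/limitOffset are non-NULL when LIMIT/OFFSET were specified:

```
/* Hypothetical sketch; would live somewhere in the CREATE ... MATERIALIZED
 * VIEW code path, alongside the existing ORDER BY restriction. */
#include "postgres.h"

#include "nodes/parsenodes.h"

static void
check_ivm_no_limit_clause(const Query *query)
{
	if (query->limitCount != NULL || query->limitOffset != NULL)
		ereport(ERROR,
				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
				 errmsg("LIMIT/OFFSET is not supported on incrementally maintainable materialized views")));
}
```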


On Thu, 27 Dec 2018 at 21:57, Yugo Nagata  wrote:

> Hi,
>
> I would like to implement Incremental View Maintenance (IVM) on
> PostgreSQL.
> IVM is a technique to maintain materialized views which computes and
> applies
> only the incremental changes to the materialized views rather than
> recomputing the contents as the current REFRESH command does.
>
> I had a presentation on our PoC implementation of IVM at PGConf.eu 2018
> [1].
> Our implementation uses row OIDs to compute deltas for materialized
> views.
> The basic idea is that if we have information about which rows in base
> tables
> are contributing to generate a certain row in a matview then we can
> identify
> the affected rows when a base table is updated. This is based on an idea of
> Dr. Masunaga [2] who is a member of our group and inspired from ID-based
> approach[3].
>
> In our implementation, the mapping of the row OIDs of the materialized view
> and the base tables are stored in "OID map". When a base relation is
> modified,
> AFTER trigger is executed and the delta is recorded in delta tables using
> the transition table feature. The accual udpate of the matview is triggerd
> by REFRESH command with INCREMENTALLY option.
>
> However, we realize problems of our implementation. First, WITH OIDS will
> be removed since PG12, so OIDs are no longer available. Besides this, it
> would
> be hard to implement this since it needs many changes of executor nodes to
> collect base tables's OIDs during execuing a query. Also, the cost of
> maintaining
> OID map would be high.
>
> For these reasons, we started to think about implementing IVM without
> relying on OIDs and did some more surveying.
>
> We also looked at Kevin Grittner's discussion [4] on incremental matview
> maintenance.  In this discussion, Kevin proposed to use the counting
> algorithm [5] to handle projection views (using DISTINCT) properly. This
> algorithm needs an additional system column, count_t, in materialized views
> and in the delta tables of base tables.
>
> However, the discussion about IVM has now stopped, so we would like to
> restart it and make progress.
>
>
> Through our PoC implementation and surveys, I think we need to consider at
> least the following points for implementing IVM.
>
> 1. How to extract changes on base tables
>
> I think there would be at least two approaches for it.
>
>  - Using transition table in AFTER triggers
>  - Extracting changes from WAL using logical decoding
>
> In our PoC implementation, we used AFTER triggers and transition tables,
> but using logical decoding might be better from the point of view of the
> performance of base table modification.
>
> If we can represent a change of UPDATE on a base table as query-like
> rather than
> OLD and NEW, it may be possible to update the materialized view directly
> instead
> of performing delete & insert.
>
>
> 2. How to compute the delta to be applied to materialized vie

Re: Avoid full GIN index scan when possible

2020-01-10 Thread Alexander Korotkov
On Fri, Jan 10, 2020 at 7:36 PM Alexander Korotkov
 wrote:
>
> On Fri, Jan 10, 2020 at 6:31 PM Tom Lane  wrote:
>>
>> Alexander Korotkov  writes:
>> > So, I think v10 is a version of patch, which can be committed after
>> > some cleanup.  And we can try doing better nulls handling in a separate
>> > patch.
>>
>> The cfbot reports that this doesn't pass regression testing.
>> I haven't looked into why not.
>
>
> Thank you for noticing.  I'll take care of it.


In v10 I had fixed a bug with nulls handling, but it appears that the
test contained a wrong expected result.  I've modified this test so that
it directly compares sequential scan results with bitmap index scan
results.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-Avoid-GIN-full-scan-for-empty-ALL-keys-v11.patch
Description: Binary data


Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after dropping temp table schema

2020-01-10 Thread Michael Paquier
On Fri, Jan 10, 2020 at 09:50:58AM -0500, Tom Lane wrote:
> This doesn't seem like a very good idea to me.  Is there any
> evidence that it's fixing an actual problem?  What if the table
> you're skipping is holding back datfrozenxid?

That's the point I wanted to make sure of: we don't, because autovacuum
has never actually been able to do that, and because the cluster is
put in this state by a superuser after issuing DROP SCHEMA on its
temporary schema, which allows many fancy things based on the
inconsistent state the session is in.  See for example
REL_10_STABLE, where GetTempNamespaceBackendId() would return
InvalidBackendId when the namespace does not exist, so the drop is
skipped.  246a6c8 (designed to track if a backend slot is using a temp
namespace or not, allowing cleanup of orphaned tables if the namespace
is around, still not used yet by the session it is assigned to) has
changed the logic, accidentally actually, to also allow an orphaned
temp table to be dropped even if its namespace does not exist
anymore.

If we say that it's fine for autovacuum to allow the drop of such
inconsistent pg_class entries, then we would need to either remove or
relax the assertion in relcache.c:1123 (RelationBuildDesc, should only
autovacuum be allowed to do so?) to begin to allow autovacuum to
remove temp relations.  However, this does not sound like a correct
thing to do IMO.  So, note that if autovacuum is allowed to do so, you
partially defeat the purpose of the assertion added by
debcec7d in relcache.c.  Another noticeable thing is that if
autovacuum does the pg_class entry drops, the on-disk files for the
temp relations would remain until the cluster is restarted.
--
Michael




Re: Removing pg_pltemplate and creating "trustable" extensions

2020-01-10 Thread Tom Lane
Stephen Frost  writes:
> To be clear, I was advocating for a NEW DB-level privilege ('INSTALL' or
> 'CREATE EXTENSION' if we could make that work), so that we have it be
> distinct from CREATE (which, today, really means 'CREATE SCHEMA').

I still say this is wrong, or at least pointless, because it'd be a
right that any DB owner could grant to himself.  If we're to have any
meaningful access control on extension installation, the privilege
would have to be attached to some other object ... and there's no clear
candidate for what.  As someone noted awhile back, if we could somehow
attach ACLs to potentially-installable extensions, that might be an
interesting avenue to pursue.  That's well beyond what I'm willing
to pursue for v13, though.

In the meantime, though, this idea as stated doesn't do anything except
let a DB owner grant install privileges to someone else.  I'm not even
convinced that we want that, or that anyone needs it (I can recall zero
such requests related to PLs in the past).  And for sure it does not
belong in a minimal implementation of this feature.

regards, tom lane




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-01-10 Thread Alvaro Herrera
Here's a rebase of this patch series.  I didn't change anything except

1. disregard what was 0005, since I already pushed it.
2. roll 0003 into 0002.
3. rebase 0007 (now 0005) to account for the reorderbuffer changes.

(I did notice that 0005 adds a new boolean any_data_sent, which is
silly -- it should be another txn_flags bit.)

However, tests don't pass for me; notably, test_decoding crashes.
OTOH I noticed that the streamed transaction support in test_decoding
writes the XID to the output, which is going to make it useless for
regression testing.  It probably should not emit the numerical values.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: 12.1 not useable: clientlib fails after a dozen queries (GSSAPI ?)

2020-01-10 Thread Peter
On Fri, Jan 10, 2020 at 12:59:09PM -0500, Tom Lane wrote:
! [ redirecting to -hackers ]

! I modified the kerberos test so that it tries a query with a less
! negligible amount of data, and what I find is:
! 
! * passes on Fedora 30, with either default or 1500 mtu
! * passes on FreeBSD 12.0 with default mtu
! * FAILS on FreeBSD 12.0 with mtu = 1500

So, it is not related only to VIMAGE @ R11.3, and, more important to
me, it is not only happening in my kitchen. Thank You very much :)

! OTOH, I also find that there's some hysteresis in the behavior:
! once it's failed, reverting the mtu to the default setting doesn't
! necessarily make subsequent runs pass.  It's really hard to explain
! that behavior if it's our bug.

That's affirmative. Made me go astray a few times when trying to
isolate it.

On Fri, Jan 10, 2020 at 02:25:22PM -0500, Tom Lane wrote:
! Ah, I have it: whoever wrote pg_GSS_read() failed to pay attention
! to the fact that setting errno is a critical part of its API.
! Sometimes it returns -1 while leaving errno in a state that causes

Wow, that's fast. My probability-guess this morning was either some
hard-coded 8192-byte buffer, or something taking an [EWOULDBLOCK] for
OK. Then I decided to not look into the code, as You will be much
faster anyway, and there are other pieces of software where I do
not have such a competent peer to talk to...

Anyway, thanks a lot!
PMc




Re: 12.1 not useable: clientlib fails after a dozen queries (GSSAPI ?)

2020-01-10 Thread Stephen Frost
Greetings,

On Fri, Jan 10, 2020 at 15:58 Tom Lane  wrote:

> Stephen Frost  writes:
> > Ah-hah.  Not sure if that was Robbie or myself (probably me, really,
> > since I rewrote a great deal of that code).  I agree that the regression
> > tests don't test with very much data, but I tested pushing quite a bit
> > of data through and didn't see any issues with my testing.  Apparently I
> > was getting pretty lucky. :/
>
> You were *very* lucky, because this code is absolutely full of mistakes
> related to incomplete reads, inadequate or outright wrong error handling,
> etc.


I guess so..  I specifically remember running into problems transferring
large data sets before I rewrote the code but after doing so it was
reliable (for me anyway...).

> I was nearly done cleaning that up, when it sank into me that
> fe-secure-gssapi.c uses static buffers for partially-read or
> partially-encoded data.  That means that any client trying to use
> multiple GSSAPI-encrypted connections is very likely to see breakage
> due to different connections trying to use the same buffers concurrently.


Ughhh. That’s a completely valid point and one I should have thought of.

> I wonder whether that doesn't explain the complaints mentioned upthread
> from the Ruby folks.


No - the segfault issue has been demonstrated to reproduce without any
PG code involved at all, and it also involved threads with only
one connection, at least as I recall (on my phone atm).

> (be-secure-gssapi.c is coded identically, but there it's OK since
> any backend only has one client connection to deal with.)


Right...  I actually wrote the backend code first and then largely copied
it to the front end, and then adjusted it, but obviously insufficiently as
I had been thinking of just the one connection that the backend has to deal
with.

Thanks!

Stephen


Re: 12.1 not useable: clientlib fails after a dozen queries (GSSAPI ?)

2020-01-10 Thread Tom Lane
Stephen Frost  writes:
> Ah-hah.  Not sure if that was Robbie or myself (probably me, really,
> since I rewrote a great deal of that code).  I agree that the regression
> tests don't test with very much data, but I tested pushing quite a bit
> of data through and didn't see any issues with my testing.  Apparently I
> was getting pretty lucky. :/

You were *very* lucky, because this code is absolutely full of mistakes
related to incomplete reads, inadequate or outright wrong error handling,
etc.

I was nearly done cleaning that up, when it sank into me that
fe-secure-gssapi.c uses static buffers for partially-read or
partially-encoded data.  That means that any client trying to use
multiple GSSAPI-encrypted connections is very likely to see breakage
due to different connections trying to use the same buffers concurrently.
I wonder whether that doesn't explain the complaints mentioned upthread
from the Ruby folks.

(be-secure-gssapi.c is coded identically, but there it's OK since
any backend only has one client connection to deal with.)
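
To illustrate the hazard (the buffer names are borrowed from the real file, but this is only a sketch, not the eventual fix): file-level statics give one copy of the partial-read state per process, while hanging the same fields off a per-connection object keeps concurrent connections independent.

```
#define PQ_GSS_RECV_BUFFER_SIZE 16384

/* Problematic pattern: one copy of the partial-read state per process. */
static char PqGSSRecvBuffer[PQ_GSS_RECV_BUFFER_SIZE];
static int	PqGSSRecvLength;

/* Safer pattern: one copy per connection object. */
typedef struct GSSEncState
{
	char		recv_buffer[PQ_GSS_RECV_BUFFER_SIZE];
	int			recv_length;
} GSSEncState;

typedef struct FakeConn
{
	GSSEncState gss_enc;		/* each connection owns its own state */
} FakeConn;

int
main(void)
{
	FakeConn	c1,
				c2;

	c1.gss_enc.recv_length = 0;
	c2.gss_enc.recv_length = 0;	/* c1 and c2 no longer trample each other */
	(void) PqGSSRecvBuffer;
	(void) PqGSSRecvLength;
	return 0;
}
```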

>> I'll have a patch in a little while.

> That's fantastic, thanks!

This is gonna take longer than I thought.

regards, tom lane




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-01-10 Thread Alvaro Herrera
I pushed 0005 (the rbtxn flags thing) after some light editing.
It's been around for long enough ...

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Removing pg_pltemplate and creating "trustable" extensions

2020-01-10 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Fri, Jan 10, 2020 at 2:40 PM Tom Lane  wrote:
> > Well, the other direction we could go here, which I guess is what
> > you are arguing for, is to forget the new default role and just
> > say that marking an extension trusted allows it to be installed by
> > DB owners, full stop.  That's nice and simple and creates no
> > backwards-compatibility issues.  If we later decide that we want
> > a default role, or any other rules about who-can-install, we might
> > feel like this was a mistake --- but the backwards-compatibility issues
> > we'd incur by changing it later are exactly the same as what we'd have
> > today if we do something different from this.  The only difference
> > is that there'd be more extensions affected later (assuming we mark
> > more things trusted).
> 
> I agree with your analysis, but I'm still inclined to feel that the
> new pre-defined roll is a win.
> 
> Generally, decoupled permissions are better. Being able to grant
> someone either A or B or both or neither is usually superior to having
> to grant either both permissions or neither.

Right- I like the idea of decoupled permissions too.

To be clear, I was advocating for a NEW DB-level privilege ('INSTALL' or
'CREATE EXTENSION' if we could make that work), so that we have it be
distinct from CREATE (which, today, really means 'CREATE SCHEMA').

I'd be willing to accept making this part of DB-level 'CREATE' rights if
there is a huge amount of push-back about burning a privilege bit for
it, but, as discussed up-thread, I don't think we should really be
stressing ourselves about that.

I do like the idea of having it be decoupled from explicit DB ownership,
so that a DB owner (or superuser) could say "I want this role to be able
to install extensions, but NOT run ALTER DATABASE", and optionally even
include ADMIN so that it could be further delegated (and also because
then it'd be just like the rest of our GRANT privilege system, and I
like that..).

Thanks,

Stephen




Re: 12.1 not useable: clientlib fails after a dozen queries (GSSAPI ?)

2020-01-10 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> I wrote:
> > I haven't run it further to ground than that, but there's definitely
> > something fishy here.  Based on just these results one would be hard
> > pressed to say if it's our bug or FreeBSD's, but your report that you
> > don't see the failure with PG 11 makes it sound like our problem.
> 
> Ah, I have it: whoever wrote pg_GSS_read() failed to pay attention
> to the fact that setting errno is a critical part of its API.
> Sometimes it returns -1 while leaving errno in a state that causes
> pqReadData to draw the wrong conclusions.  In particular that can
> happen when it reads an incomplete packet, and that's very timing
> dependent, which is why this is so ticklish to reproduce.

Ah-hah.  Not sure if that was Robbie or myself (probably me, really,
since I rewrote a great deal of that code).  I agree that the regression
tests don't test with very much data, but I tested pushing quite a bit
of data through and didn't see any issues with my testing.  Apparently I
was getting pretty lucky. :/

> I'll have a patch in a little while.

That's fantastic, thanks!

Stephen




Re: Removing pg_pltemplate and creating "trustable" extensions

2020-01-10 Thread Robert Haas
On Fri, Jan 10, 2020 at 2:40 PM Tom Lane  wrote:
> Well, the other direction we could go here, which I guess is what
> you are arguing for, is to forget the new default role and just
> say that marking an extension trusted allows it to be installed by
> DB owners, full stop.  That's nice and simple and creates no
> backwards-compatibility issues.  If we later decide that we want
> a default role, or any other rules about who-can-install, we might
> feel like this was a mistake --- but the backwards-compatibility issues
> we'd incur by changing it later are exactly the same as what we'd have
> today if we do something different from this.  The only difference
> is that there'd be more extensions affected later (assuming we mark
> more things trusted).

I agree with your analysis, but I'm still inclined to feel that the
new pre-defined role is a win.

Generally, decoupled permissions are better. Being able to grant
someone either A or B or both or neither is usually superior to having
to grant either both permissions or neither.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: our checks for read-only queries are not great

2020-01-10 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Fri, Jan 10, 2020 at 9:30 AM Tom Lane  wrote:
> > If somebody comes up with a patch
> > that causes "pg_dumpall -g" to include ALTER SYSTEM SET commands for
> > whatever is in postgresql.auto.conf (not an unreasonable idea BTW),
> > will you then decide that ALTER SYSTEM SET is no longer read-only?
> > Or, perhaps, reject such a patch on the grounds that it breaks this
> > arbitrary definition of read-only-ness?
> 
> I would vote to reject such a patch as a confused muddle. I mean,
> generally, the expectation right now is that if you move your data
> from the current cluster to a new one by pg_dump, pg_upgrade, or even
> by promoting a standby, you're responsible for making sure that
> postgresql.conf and postgresql.auto.conf get copied over separately.

I really don't like that the ALTER SYSTEM SET/postgresql.auto.conf stuff
has to be handled in some way external to logical export/import, or
external to pg_upgrade (particularly in link mode), so I generally see
where Tom's coming from with that suggestion.

In general, I don't think people should be expected to hand-muck around
with anything in the data directory.

> In the last case, the backup that created the standby will have copied
> the postgresql.conf from the master as it existed at that time, but
> propagating any subsequent changes is up to you. Now, if we now decide
> to shove ALTER SYSTEM SET commands into pg_dumpall output, then
> suddenly you're changing that rule, and it's not very clear what the
> new rule is.

I'd like a rule of "users don't muck with the data directory", and we
are nearly there when you include sensible packaging such as what Debian
provides- by moving postrgesql.conf, log files, etc, outside of the data
directory.  For things that can't be moved out of the data directory
though, like postgresql.auto.conf, we should be handling those
transparently to the user.

I agree that there are some interesting cases to consider here though-
like doing a pg_dumpall against a standby resulting in something
different than if you did it against the primary because the
postgresql.auto.conf is different between them (something that I'm
generally supportive of having, and it seems everyone else is too).  I
think having an option to control if the postgresql.auto.conf settings
are included or not in the pg_dumpall output would be a reasonable way
to deal with that though.

> Now, our current approach is fairly arguable. Given that GUCs on
> databases, users, functions, etc. are stored in the catalogs and
> subject to backup, restore, replication, etc., one might well expect
> that global settings would be handled the same way. I tend to think
> that would be nicer, though it would require solving the problem of
> how to back out bad changes that make the database not start up.
> Regardless of what you or anybody thinks about that, though, it's not
> how it works today and would require some serious engineering if we
> wanted to make it happen.

This sounds an awful lot like the arguments that I tried to make when
ALTER SYSTEM SET was first going in, but what's done is done and there's
not much to do but make the best of it as I can't imagine there'd be
much support for ripping it out.

I don't really agree about the need for 'some serious engineering'
either, but having an option for it, sure.

I do also tend to agree with Tom about making ALTER SYSTEM SET be
prohibited in explicitly read-only transactions, but still allowing it
to be run against replicas as that's a handy thing to be able to do.

> > As another example, do we need to consider that replacing pg_hba.conf
> > via pg_write_file should be allowed in a "read only" transaction?
> 
> I don't really see what the problem with that is. It bothers me a lot
> more that CLUSTER can be run in a read-only transaction -- which
> actually changes stuff inside the database, even if not necessarily in
> a user-visible way -- than it does that somebody might be able to use
> the database to change something that isn't really part of the
> database anyway. And pg_hba.conf, like postgresql.conf, is largely
> treated as an input to the database rather than part of it.

I don't like that CLUSTER can be run in a read-only transaction either
(though it seems like downthread maybe some people are fine with
that..).  I'm also coming around to the idea that pg_write_file()
probably shouldn't be allowed either, and probably not COPY TO either
(except to stdout, since that's a result, not a change operation).

> Somebody could create a user-defined function that launches a
> satellite into orbit and that is, I would argue, a write operation in
> the truest sense. You have changed the state of the world in a lasting
> way, and you cannot take it back. But, it's not writing to the
> database, so as far as read-only transactions are concerned, who
> cares?

I suppose there's another thing to think about in this discussion, which
are FDWs, if t

Re: Removing pg_pltemplate and creating "trustable" extensions

2020-01-10 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Don't see how this follows.  It's somewhat accidental I think that
>> the existing behavior is tied to DB ownership.  That's just because
>> at the time, that's the only sort of privilege we had that seemed
>> intermediate between superuser and Joe User.  If we were designing
>> the behavior today, with default roles already a done deal for
>> handing out possibly-dangerous privileges, I think there's no
>> question that we'd be setting up this privilege as a default role
>> rather than tying it to DB ownership.  We don't make DB ownership
>> a prerequisite to creating other sorts of functions, yet other
>> functions can be just as dangerous in some cases as C functions.

> I suppose I'll just have to say that I disagree.  I see a lot of value
> in having a level between superuser and Joe User, and DB owner looks
> pretty natural as exactly that, particularly for creating database-level
> objects like extensions.

Well, the other direction we could go here, which I guess is what
you are arguing for, is to forget the new default role and just
say that marking an extension trusted allows it to be installed by
DB owners, full stop.  That's nice and simple and creates no
backwards-compatibility issues.  If we later decide that we want
a default role, or any other rules about who-can-install, we might
feel like this was a mistake --- but the backwards-compatibility issues
we'd incur by changing it later are exactly the same as what we'd have
today if we do something different from this.  The only difference
is that there'd be more extensions affected later (assuming we mark
more things trusted).

I'm willing to go with this solution if it'll end the argument.
Robert, Peter, what do you think?

regards, tom lane




Re: Removing pg_pltemplate and creating "trustable" extensions

2020-01-10 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> > ... and that backs up my position that we are setting up this
> > privilege at the wrong level by using a default role which a superuser must
> > grant independently from DB ownership.
> 
> Don't see how this follows.  It's somewhat accidental I think that
> the existing behavior is tied to DB ownership.  That's just because
> at the time, that's the only sort of privilege we had that seemed
> intermediate between superuser and Joe User.  If we were designing
> the behavior today, with default roles already a done deal for
> handing out possibly-dangerous privileges, I think there's no
> question that we'd be setting up this privilege as a default role
> rather than tying it to DB ownership.  We don't make DB ownership
> a prerequisite to creating other sorts of functions, yet other
> functions can be just as dangerous in some cases as C functions.

I suppose I'll just have to say that I disagree.  I see a lot of value
in having a level between superuser and Joe User, and DB owner looks
pretty natural as exactly that, particularly for creating database-level
objects like extensions.

If anything, I tend to think we need more levels, not less- like a level
that's "cluster owner" or something along those lines, that's also
independent from "superuser" but would allow creating of cluster-level
objects like databases and roles (with the right to then GRANT the
ability to create those objects to other roles, if they wish).

I don't really see default roles as a better alternative to a privilege
hierarchy, but rather as a way for controlling access to things that
don't really fall into the hierarchy.  Maybe for cluster-level things
like what I hint at above they'd be better, but for database-level
objects, where you might decide you want to give a user access to create
something in database X but not in database Y?  Doesn't seem to fit very
well to me.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: 12.1 not useable: clientlib fails after a dozen queries (GSSAPI ?)

2020-01-10 Thread Tom Lane
I wrote:
> I haven't run it further to ground than that, but there's definitely
> something fishy here.  Based on just these results one would be hard
> pressed to say if it's our bug or FreeBSD's, but your report that you
> don't see the failure with PG 11 makes it sound like our problem.

Ah, I have it: whoever wrote pg_GSS_read() failed to pay attention
to the fact that setting errno is a critical part of its API.
Sometimes it returns -1 while leaving errno in a state that causes
pqReadData to draw the wrong conclusions.  In particular that can
happen when it reads an incomplete packet, and that's very timing
dependent, which is why this is so ticklish to reproduce.
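
For anyone following along, here is a minimal standalone sketch of the
errno discipline being described.  All names are hypothetical and this is
not the actual libpq fix; it only illustrates returning -1 with errno set
explicitly when an incomplete packet has been buffered, so that the caller
does not act on whatever stale errno value happens to be lying around.

#include <errno.h>
#include <stdio.h>
#include <sys/types.h>

#define READ_INCOMPLETE (-2)    /* hypothetical internal status code */

/*
 * Stand-in for the decrypt step: here it always reports that only part
 * of an encrypted packet has arrived so far.
 */
static ssize_t
decrypt_available_bytes(void *buf, size_t len)
{
    (void) buf;
    (void) len;
    return READ_INCOMPLETE;
}

/*
 * The important part: whenever -1 is returned, errno must be set
 * deliberately, because the caller treats (-1, errno) as the API.
 */
static ssize_t
secure_read_sketch(void *buf, size_t len)
{
    ssize_t     ret = decrypt_available_bytes(buf, len);

    if (ret == READ_INCOMPLETE)
    {
        errno = EWOULDBLOCK;    /* "no complete data yet, retry later" */
        return -1;
    }
    return ret;
}

int
main(void)
{
    char        buf[16];

    if (secure_read_sketch(buf, sizeof(buf)) < 0 && errno == EWOULDBLOCK)
        printf("caller would wait for more data and retry\n");
    return 0;
}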

I'll have a patch in a little while.

regards, tom lane




Re: our checks for read-only queries are not great

2020-01-10 Thread Robert Haas
On Fri, Jan 10, 2020 at 9:30 AM Tom Lane  wrote:
> If somebody comes up with a patch
> that causes "pg_dumpall -g" to include ALTER SYSTEM SET commands for
> whatever is in postgresql.auto.conf (not an unreasonable idea BTW),
> will you then decide that ALTER SYSTEM SET is no longer read-only?
> Or, perhaps, reject such a patch on the grounds that it breaks this
> arbitrary definition of read-only-ness?

I would vote to reject such a patch as a confused muddle. I mean,
generally, the expectation right now is that if you move your data
from the current cluster to a new one by pg_dump, pg_upgrade, or even
by promoting a standby, you're responsible for making sure that
postgresql.conf and postgresql.auto.conf get copied over separately.
In the last case, the backup that created the standby will have copied
the postgresql.conf from the master as it existed at that time, but
propagating any subsequent changes is up to you. Now, if we now decide
to shove ALTER SYSTEM SET commands into pg_dumpall output, then
suddenly you're changing that rule, and it's not very clear what the
new rule is.

Now, our current approach is fairly arguable. Given that GUCs on
databases, users, functions, etc. are stored in the catalogs and
subject to backup, restore, replication, etc., one might well expect
that global settings would be handled the same way. I tend to think
that would be nicer, though it would require solving the problem of
how to back out bad changes that make the database not start up.
Regardless of what you or anybody thinks about that, though, it's not
how it works today and would require some serious engineering if we
wanted to make it happen.

> As another example, do we need to consider that replacing pg_hba.conf
> via pg_write_file should be allowed in a "read only" transaction?

I don't really see what the problem with that is. It bothers me a lot
more that CLUSTER can be run in a read-only transaction -- which
actually changes stuff inside the database, even if not necessarily in
a user-visible way -- than it does that somebody might be able to use
the database to change something that isn't really part of the
database anyway. And pg_hba.conf, like postgresql.conf, is largely
treated as an input to the database rather than part of it.

Somebody could create a user-defined function that launches a
satellite into orbit and that is, I would argue, a write operation in
the truest sense. You have changed the state of the world in a lasting
way, and you cannot take it back. But, it's not writing to the
database, so as far as read-only transactions are concerned, who
cares?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Removing pg_pltemplate and creating "trustable" extensions

2020-01-10 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > So I'm at a loss for why there is this insistence on a default role and
> > a superuser-explicit-granting based approach that goes beyond "is it
> > installed on the filesystem?" and "is it marked as trusted?".
> 
> Okay, so it seems like we're down to just this one point of contention.

I agree that this seems to be the crux of the contention- though I need
to revisit what I wrote above because I didn't cover everything
relevant.  I'm suggesting that having:

- Extension installed on the filesystem
- Extension is marked as trusted
- Calling user has been made a DB owner (which, from a bare initdb,
  requires a superuser to take action to make happen)

is what would be needed to install a trusted extension.

> You feel that the superuser can control what is in the extension library
> directory and that that ought to be sufficient control.  I disagree
> with that, for two reasons:
> 
> * ISTM that that's assuming that the DBA and the sysadmin are the same
> person (or at least hold identical views on this subject).  In many
> installations it'd only be root who has control over what's in that
> directory, and I don't think it's unreasonable for the DBA to wish
> to be able to exercise additional filtering.

I don't think we should start conflating roles by saying things like
"DBA" because it's not clear if that's "PG superuser" or "DB owner" or
"DB user".  In many, many, many installations the DBA is *not* the
superuser- in fact, pretty much every cloud-provider installation of PG
is that way.

Even so though, I don't agree with this particular rationale as, at
least largely in my experience, the sysadmin isn't going to go
installing things on their own- they're going to install what they've
been asked to, and it'll be a PG superuser or DB owner or DB user
doing the asking (and hopefully they'll consult with whomever is
appropriate before installing anything anyway).  The idea that
additional filtering on that is needed strikes me as highly unlikely.

Now, that said, I'm not strictly against the idea of allowing a
superuser, if they wish, to do additional filtering of what's allowed
(though I think there's a fair bit of complication in coming up with a
sensible way for them to do so, but that's an independent discussion).
I still don't think that the privilege to install a trusted extension
should be done through a default role though, or that it needs to be
independent from the DB owner role.

> * The point of a default role would be for the DBA to be able to
> control which database users can install extensions.  Even if the
> DBA has full authority over the extension library, that would not
> provide control over who can install, only over what is available
> for any of them to install.

In my approach, a superuser absolutely still has control over which
database users can install extensions, by virtue of being in control
over which users are DB owners, and further by being able to GRANT out
the right to install extensions, in specific databases, to specific
users.  If we want to have a mechanism where superusers can further
whitelist or blacklist extensions across the cluster, that's fine, but,
again, that's largely orthogonal to what I'm talking about.

Also, consider this- with the default role approach, is a user granted
that role allowed to create extensions in *every* database they can
connect to, or do they need some additional privilege, like CREATE
rights on the database, which is then under the purview of the database
owner?  What if the superuser wishes to allow a given role the ability
to install extensions only in a specific database?  That strikes me as
a highly likely use case ("you can install extensions in this other
database, but not in the 'postgres' database that I use for my
monitoring and other stuff") that isn't addressed at all with this
default role approach (looking at the patch, it seems to switch to the
superuser role to actually create objects, of course, so the caller
doesn't need create rights in the database), but is with mine- and
done so in a natural, intuitive, way that works just like the rest of
our privilege system.

I've always viewed the privilege system in PG as a hierarchy of
privileges-

superuser   - ultimate owner, able to do everything
DB owner- controls create/use/modify for objects in the database,
  and altering of the database itself
schema owner- controls create/use/modify for objects in a schema, and
  altering of the schema itself
table owner - controls access/use/modify for the table, and altering of
  table itself

Now we're moving outside of that to start using default roles to control
who can create objects in a database, and that's what I don't agree
with.  If the superuser doesn't feel that a particular role should be
able to create extensions in a given database, there is a simple and
well understood solution- don't have tha

Re: 12.1 not useable: clientlib fails after a dozen queries (GSSAPI ?)

2020-01-10 Thread Tom Lane
[ redirecting to -hackers ]

Peter  writes:
> But I just recognize something of interest (which I had taken for
> granted when importing the database): the flaw does NOT appear when
> accessing the database from the server's local system (with TCP and
> GSSAPI encryption active). Only from remote system.
> But then, if I go on the local system, and change the mtu:
> # ifconfig lo0 mtu 1500
> and restart the server, then I get the exact same errors locally.

Oh-ho, that is interesting.

Looking at our regression tests for gssenc, I observe that they
only try to transport a negligible amount of data (viz, single-row
boolean results).  So we'd not notice any problem that involved
multiple-packet messages.

I modified the kerberos test so that it tries a query with a less
negligible amount of data, and what I find is:

* passes on Fedora 30, with either default or 1500 mtu
* passes on FreeBSD 12.0 with default mtu
* FAILS on FreeBSD 12.0 with mtu = 1500

I haven't run it further to ground than that, but there's definitely
something fishy here.  Based on just these results one would be hard
pressed to say if it's our bug or FreeBSD's, but your report that you
don't see the failure with PG 11 makes it sound like our problem.

OTOH, I also find that there's some hysteresis in the behavior:
once it's failed, reverting the mtu to the default setting doesn't
necessarily make subsequent runs pass.  It's really hard to explain
that behavior if it's our bug.

I tested today's HEAD of our code with up-to-date FreeBSD 12.0-RELEASE-p12
running on amd64 bare metal, no jails or emulators or VIMAGE or anything.

Attached are proposed test patch, as well as client-side regression log
output from a failure.  (There's no evidence of distress in the
postmaster log, same as your report.)

regards, tom lane

diff --git a/src/test/kerberos/t/001_auth.pl b/src/test/kerberos/t/001_auth.pl
index e3eb052..0fc18e3 100644
--- a/src/test/kerberos/t/001_auth.pl
+++ b/src/test/kerberos/t/001_auth.pl
@@ -19,7 +19,7 @@ use Test::More;
 
 if ($ENV{with_gssapi} eq 'yes')
 {
-	plan tests => 12;
+	plan tests => 15;
 }
 else
 {
@@ -195,6 +195,29 @@ sub test_access
 	return;
 }
 
+# As above, but test for an arbitrary query result.
+sub test_query
+{
+	my ($node, $role, $query, $expected, $gssencmode, $test_name) = @_;
+
+	# need to connect over TCP/IP for Kerberos
+	my ($res, $stdoutres, $stderrres) = $node->psql(
+		'postgres',
+		"$query",
+		extra_params => [
+			'-XAtd',
+			$node->connstr('postgres')
+			  . " host=$host hostaddr=$hostaddr $gssencmode",
+			'-U',
+			$role
+		]);
+
+	is($res, 0, $test_name);
+	like($stdoutres, $expected, $test_name);
+	is($stderrres, "", $test_name);
+	return;
+}
+
 unlink($node->data_dir . '/pg_hba.conf');
 $node->append_conf('pg_hba.conf',
 	qq{host all all $hostaddr/32 gss map=mymap});
@@ -231,6 +254,15 @@ test_access(
 	"gssencmode=require",
 	"succeeds with GSS-encrypted access required with host hba");
 
+# Test that we can transport a reasonable amount of data.
+test_query(
+	$node,
+	"test1",
+	'SELECT * FROM generate_series(1, 10);',
+	qr/^1\n.*\n1024\n.*\n\n.*\n10$/s,
+	"gssencmode=require",
+	"transporting 100K lines works");
+
 unlink($node->data_dir . '/pg_hba.conf');
 $node->append_conf('pg_hba.conf',
 	qq{hostgssenc all all $hostaddr/32 gss map=mymap});
1..15
# Checking port 56615
# Found port 56615
# setting up Kerberos
# Running: /usr/local/bin/krb5-config --version
# Running: /usr/local/sbin/kdb5_util create -s -P secret0
Loading random data
Initializing database 
'/usr/home/tgl/pgsql/src/test/kerberos/tmp_check/krb5kdc/principal' for realm 
'EXAMPLE.COM',
master key name 'K/m...@example.com'
# Running: /usr/local/sbin/kadmin.local -q addprinc -pw secret1 test1
WARNING: no policy specified for te...@example.com; defaulting to no policy
Authenticating as principal test1/ad...@example.com with password.
Principal "te...@example.com" created.
# Running: /usr/local/sbin/kadmin.local -q addprinc -randkey 
postgres/auth-test-localhost.postgresql.example.com
WARNING: no policy specified for 
postgres/auth-test-localhost.postgresql.example@example.com; defaulting to 
no policy
Authenticating as principal test1/ad...@example.com with password.
Principal "postgres/auth-test-localhost.postgresql.example@example.com" 
created.
# Running: /usr/local/sbin/kadmin.local -q ktadd -k 
/usr/home/tgl/pgsql/src/test/kerberos/tmp_check/krb5.keytab 
postgres/auth-test-localhost.postgresql.example.com
Authenticating as principal test1/ad...@example.com with password.
Entry for principal postgres/auth-test-localhost.postgresql.example.com with 
kvno 2, encryption type aes256-cts-hmac-sha1-96 added to keytab 
WRFILE:/usr/home/tgl/pgsql/src/test/kerberos/tmp_check/krb5.keytab.
Entry for principal postgres/auth-test-localhost.postgresql.example.com with 
kvno 2, encryption type aes128-cts-hmac-sha1-96 added to keytab 
WRFILE:/usr/home/tgl/pgsql/src/test/kerberos/tmp_check/krb

Re: Add pg_file_sync() to adminpack

2020-01-10 Thread Fujii Masao
On Fri, Jan 10, 2020 at 8:16 PM Michael Paquier  wrote:
>
> On Fri, Jan 10, 2020 at 06:50:12PM +0900, Fujii Masao wrote:
> > I changed the doc that way. Thanks for the review!

Thanks for the review!

> + 
> +  pg_file_sync fsyncs the specified file or directory
> +  named by filename.  Returns true on success,
> +  an error is thrown otherwise (e.g., the specified file is not present).
> + 
> What's the point of having a function that returns a boolean if it
> just returns true all the time?  Wouldn't it be better to have a set
> of semantics closer to the unlink() part, where the call of stat()
> fails with an ERROR for (errno != ENOENT) and the fsync call returns
> false with a WARNING?

I'm not sure if returning false with WARNING only in some error cases
is really a good idea or not. At least for me, it's more intuitive to
return true on success and emit an ERROR otherwise. I'd like to hear
more opinions about this. Also if returning true on success is rather
confusing, we can change its return type to void.
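
Purely as an illustration of those semantics (throw an ERROR on any
failure, and possibly return void), a rough, hypothetical sketch follows.
my_file_sync is a made-up name, and the real adminpack implementation
would likely reuse the backend's fsync helpers rather than raw
open()/fsync(); this is only a sketch of the error-handling behavior
being discussed.

#include "postgres.h"

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

#include "fmgr.h"
#include "utils/builtins.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(my_file_sync);

Datum
my_file_sync(PG_FUNCTION_ARGS)
{
    char       *path = text_to_cstring(PG_GETARG_TEXT_PP(0));
    struct stat st;
    int         fd;

    /* any stat() failure, including ENOENT, is reported as an ERROR */
    if (stat(path, &st) < 0)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not stat \"%s\": %m", path)));

    /* works for plain files and, on most platforms, for directories too */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not open \"%s\": %m", path)));

    if (fsync(fd) != 0)
    {
        int         save_errno = errno;

        close(fd);
        errno = save_errno;
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not fsync \"%s\": %m", path)));
    }
    close(fd);

    PG_RETURN_VOID();
}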

> +SELECT pg_file_sync('global'); -- sync directory
> + pg_file_sync
> +--
> + t
> +(1 row)
> installcheck deployments may not like that.

Could you elaborate why? But if it's not good to sync an existing directory
in the regression test, we may need to give up testing the sync of a directory.
Another idea is to add another function like pg_mkdir() into adminpack
and, for the test, use a directory newly created by that function.
Or is there a better idea?

Regards,

-- 
Fujii Masao




Re: pgbench - use pg logging capabilities

2020-01-10 Thread Fabien COELHO


Michaël,


So I think that the current situation is a good thing at least for debug.


If you look at some of my messages on other threads, you would likely
notice that my mood of the day is to not design things which try to
outsmart a user's expectations :)


I'm not following you.

ISTM that the user expectation is that the message is printed when the level 
requires it, and that the performance impact is minimal otherwise.


I'm not aiming at anything different.


So I would stand on the position to just remove those likely/unlikely
parts if we want this logging to be generic.


It is unclear to me whether your point is about the whole "if", or only 
the compiler directive itself (i.e. "likely" and "unlikely").


I'll assume the former. I do not think we should "want" logging to be 
generic per se, but only if it makes sense from a performance and feature 
point of view.


For the normal case (standard level, no debug), there is basically no 
difference because the message is going to be printed anyway: either it is 
check+call+work, or call+check+work. Anything is fine. The directive would 
help the compiler reorder instructions so that the usual case does not incur 
a jump.


For debug messages, things are different: removing the external test & 
unlikely would have a detrimental effect on performance when not 
debugging, which is most of the time, because you would pay the cost of 
evaluating arguments and a call/return cycle on each message anyway. That 
can be bad if a debug message is placed in some critical place.
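
As a hedged illustration of that point (generic names, not the actual
src/common/logging.h macros), the usual pattern checks the level in a
macro at the call site, so the arguments are never evaluated when the
message is filtered out:

#include <stdarg.h>
#include <stdio.h>

enum demo_log_level {DEMO_DEBUG = 1, DEMO_INFO, DEMO_WARNING};

/* a variable the macro can test directly, without any function call */
static enum demo_log_level demo_log_current_level = DEMO_INFO;

static void
demo_log_generic(enum demo_log_level level, const char *fmt, ...)
{
    va_list     ap;

    (void) level;               /* a real implementation would add a prefix */
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
}

/*
 * The level test happens before the call: when debug output is disabled,
 * neither the (possibly expensive) arguments nor the call/return cycle
 * are paid for.
 */
#define demo_log_debug(...) \
    do { \
        if (demo_log_current_level <= DEMO_DEBUG) \
            demo_log_generic(DEMO_DEBUG, __VA_ARGS__); \
    } while (0)

static int
expensive_summary(void)
{
    fprintf(stderr, "expensive_summary() evaluated\n");
    return 42;
}

int
main(void)
{
    /* at the default INFO level, expensive_summary() is never called */
    demo_log_debug("state = %d\n", expensive_summary());
    return 0;
}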


So the right place for the debug check is early. Once this is done, 
then why not do the same for all other levels, for consistency? This is the 
current situation.


If the check is moved inside the call, then there is a performance benefit 
to repeat it for debug, which is a pain because then it would be there 
twice in that case, and it creates an exception. The fact that some macros 
are simplified is not very useful because this is not really user visible.


So IMHO the current situation is fine, but the __variable name. So ISTM 
that the attached is the simplest and more reasonable option to fix this.



For other levels, they are on by default AND would not be placed at critical
performance points, so the whole effort of avoiding the call is moot.

I agree with Tom that the __pg_log_level variable name violates naming conventions.


My own taste would be to still keep the variable local to logging.c,
and use a "get"-like routine to be consistent with the "set" part.  I
don't have to be right, let's see where this discussion leads us.


This would defeat the point of avoiding a function call, if a function 
call is needed to check whether the other function call is needed:-)


Hence the macro.


(I mentioned that upthread, but I don't think it is a good idea to
discuss about a redesign of those routines on a thread about pgbench
based on $subject.  All the main players are here so it likely does
not matter, but..)


Yep. I hesitated to be the one to do it, and ISTM that the problem is 
small enough so that it can be resolved without a new thread. I may be 
naïvely wrong:-)


--
Fabien.diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 39c1a243d5..e7b33bb8e0 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3295,7 +3295,7 @@ executeMetaCommand(CState *st, instr_time *now)
 	argc = command->argc;
 	argv = command->argv;
 
-	if (unlikely(__pg_log_level <= PG_LOG_DEBUG))
+	if (unlikely(pg_log_current_level <= PG_LOG_DEBUG))
 	{
 		PQExpBufferData	buf;
 
diff --git a/src/common/logging.c b/src/common/logging.c
index c78ae793b8..05827c52cc 100644
--- a/src/common/logging.c
+++ b/src/common/logging.c
@@ -13,7 +13,7 @@
 
 #include "common/logging.h"
 
-enum pg_log_level __pg_log_level;
+enum pg_log_level pg_log_current_level;
 
 static const char *progname;
 static int	log_flags;
@@ -45,7 +45,7 @@ pg_logging_init(const char *argv0)
 	setvbuf(stderr, NULL, _IONBF, 0);
 
 	progname = get_progname(argv0);
-	__pg_log_level = PG_LOG_INFO;
+	pg_log_current_level = PG_LOG_INFO;
 
 	if (pg_color_env)
 	{
@@ -107,7 +107,7 @@ pg_logging_config(int new_flags)
 void
 pg_logging_set_level(enum pg_log_level new_level)
 {
-	__pg_log_level = new_level;
+	pg_log_current_level = new_level;
 }
 
 void
diff --git a/src/include/common/logging.h b/src/include/common/logging.h
index 028149c7a1..487ce21cf7 100644
--- a/src/include/common/logging.h
+++ b/src/include/common/logging.h
@@ -55,7 +55,7 @@ enum pg_log_level
 	PG_LOG_OFF,
 };
 
-extern enum pg_log_level __pg_log_level;
+extern enum pg_log_level pg_log_current_level;
 
 /*
  * Kind of a hack to be able to produce the psql output exactly as required by
@@ -73,23 +73,23 @@ void		pg_log_generic(enum pg_log_level level, const char *pg_restrict fmt,...) p
 void		pg_log_generic_v(enum pg_log_level level, const char *pg_restrict fmt, va_list ap) pg_attribute_printf(2, 0);
 
 #define pg_log_fatal(...) do { \
-		if (likely(__pg_log_level

Re: Avoid full GIN index scan when possible

2020-01-10 Thread Alexander Korotkov
On Fri, Jan 10, 2020 at 6:31 PM Tom Lane  wrote:

> Alexander Korotkov  writes:
> > So, I think v10 is a version of patch, which can be committed after
> > some cleanup.  And we can try doing better nulls handling in a separate
> > patch.
>
> The cfbot reports that this doesn't pass regression testing.
> I haven't looked into why not.
>

Thank you for noticing.  I'll take care of it.

 --
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [WIP] UNNEST(REFCURSOR): allowing SELECT to consume data from a REFCURSOR

2020-01-10 Thread Daniel Verite
Dent John wrote:

> Yes. That’s at least true if unnest(x) is used in the FROM. If it’s used in
> the SELECT, actually it can get the performance benefit right away

At a quick glance, I don't see it called in the select-list  in any
of the regression tests. When trying it, it appears to crash (segfault):

postgres=# begin;
BEGIN

postgres=# declare c cursor for select oid::int as i, relname::text as r from
pg_class;
DECLARE CURSOR

postgres=# select unnest('c'::refcursor);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

The build is configured with:
./configure --enable-debug --with-icu --with-perl --enable-depend


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Avoid full GIN index scan when possible

2020-01-10 Thread Tom Lane
Alexander Korotkov  writes:
> So, I think v10 is a version of patch, which can be committed after
> some cleanup.  And we can try doing better nulls handling in a separate
> patch.

The cfbot reports that this doesn't pass regression testing.
I haven't looked into why not.

regards, tom lane




Re: Fixing parallel make of libpq

2020-01-10 Thread Tom Lane
Peter Eisentraut  writes:
> On 2020-01-09 15:17, Tom Lane wrote:
>> 1) Changing from an "|"-style dependency to a plain dependency seems
>> like a semantics change.  I've never been totally clear on the
>> difference though.  I think Peter introduced our use of the "|" style,
>> so maybe he can comment.

> If you have a phony target as a prerequisite of a real-file target, you 
> should make that an order-only ("|") prerequisite.  Otherwise the 
> real-file target rules will *always* be run, on account of the phony 
> target prerequisite.

OK, got that.  But that doesn't directly answer the question of whether
it's wrong to use a phony target as an order-only prerequisite of
another phony target.  Grepping around for other possible issues,
I see that you recently added

update-unicode: | submake-generated-headers submake-libpgport
$(MAKE) -C src/common/unicode $@
$(MAKE) -C contrib/unaccent $@

Doesn't that also have parallel-make hazards, if libpq does?

regards, tom lane




Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

2020-01-10 Thread Tom Lane
Michael Paquier  writes:
> [ patch to skip tables if get_namespace_name fails ]

This doesn't seem like a very good idea to me.  Is there any
evidence that it's fixing an actual problem?  What if the table
you're skipping is holding back datfrozenxid?

regards, tom lane




Re: our checks for read-only queries are not great

2020-01-10 Thread Tom Lane
Peter Eisentraut  writes:
> I don't really remember, but that was basically the opinion I had 
> arrived at as I was reading through this current thread.  Roughly 
> speaking, anything that changes the database state (data or schema) in a 
> way that would be reflected in a pg_dump output is not read-only.

OK, although I'd put some emphasis on "roughly speaking".

> ALTER SYSTEM is read only in my mind.

I'm still having trouble with this conclusion.  I think it can only
be justified by a very narrow reading of "reflected in pg_dump"
that relies on the specific factorization we've chosen for upgrade
operations, ie that postgresql.conf mods have to be carried across
by hand.  But that's mostly historical baggage, rather than a sane
basis for defining "read only".  If somebody comes up with a patch
that causes "pg_dumpall -g" to include ALTER SYSTEM SET commands for
whatever is in postgresql.auto.conf (not an unreasonable idea BTW),
will you then decide that ALTER SYSTEM SET is no longer read-only?
Or, perhaps, reject such a patch on the grounds that it breaks this
arbitrary definition of read-only-ness?

As another example, do we need to consider that replacing pg_hba.conf
via pg_write_file should be allowed in a "read only" transaction?

These conclusions seem obviously silly to me, but perhaps YMMV.

regards, tom lane




standby apply lag on inactive servers

2020-01-10 Thread Alvaro Herrera
A customer of ours complained that if you have an inactive primary,
monitoring the apply lag on a standby reports monotonically increasing
lag.  The reason for this is that the apply lag is only updated on
COMMIT records, which of course don't occur in inactive servers.
But CHECKPOINT records do occur, so the WAL insert pointer continues to
move forward, which is what causes the spurious lag.

(I think newer releases are protected from this problem because they
don't emit checkpoints during periods of inactivity.  I didn't verify
this.)

This patch fixes the problem by using the checkpoint timestamp to update
the lag tracker in the standby.  This requires a little change in where
this update is invoked, because previously it was done only for the XACT
rmgr; this makes the patch a little bigger than it should be.

-- 
Álvaro HerreraPostgreSQL Expert, https://www.2ndQuadrant.com/
>From 859cd9db48cdde68c7767359e7cdae5b72a3011f Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 10 Jan 2020 10:37:44 -0300
Subject: [PATCH] Use checkpoint timestamp to update apply lag tracker

Previously we only used transaction commit records, which makes the wal
pointer in standby lag behind when the primary is idle.
---
 src/backend/access/transam/xlog.c | 37 ++-
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7f4f784c0e..59679b31ec 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5562,8 +5562,8 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
  *
  * If the record contains a timestamp, returns true, and saves the timestamp
  * in *recordXtime. If the record type has no timestamp, returns false.
- * Currently, only transaction commit/abort records and restore points contain
- * timestamps.
+ * Currently, only transaction commit/abort records, restore points and
+ * checkpoints contain timestamps.
  */
 static bool
 getRecordTimestamp(XLogReaderState *record, TimestampTz *recordXtime)
@@ -5577,6 +5577,13 @@ getRecordTimestamp(XLogReaderState *record, TimestampTz *recordXtime)
 		*recordXtime = ((xl_restore_point *) XLogRecGetData(record))->rp_time;
 		return true;
 	}
+	if (rmid == RM_XLOG_ID && (info == XLOG_CHECKPOINT_ONLINE ||
+			   info == XLOG_CHECKPOINT_SHUTDOWN))
+	{
+		*recordXtime =
+			time_t_to_timestamptz(((CheckPoint *) XLogRecGetData(record))->time);
+		return true;
+	}
 	if (rmid == RM_XACT_ID && (xact_info == XLOG_XACT_COMMIT ||
 			   xact_info == XLOG_XACT_COMMIT_PREPARED))
 	{
@@ -5745,9 +5752,6 @@ recoveryStopsBefore(XLogReaderState *record)
 
 /*
  * Same as recoveryStopsBefore, but called after applying the record.
- *
- * We also track the timestamp of the latest applied COMMIT/ABORT
- * record in XLogCtl->recoveryLastXTime.
  */
 static bool
 recoveryStopsAfter(XLogReaderState *record)
@@ -5755,7 +5759,6 @@ recoveryStopsAfter(XLogReaderState *record)
 	uint8		info;
 	uint8		xact_info;
 	uint8		rmid;
-	TimestampTz recordXtime;
 
 	/*
 	 * Ignore recovery target settings when not in archive recovery (meaning
@@ -5823,10 +5826,6 @@ recoveryStopsAfter(XLogReaderState *record)
 	{
 		TransactionId recordXid;
 
-		/* Update the last applied transaction timestamp */
-		if (getRecordTimestamp(record, &recordXtime))
-			SetLatestXTime(recordXtime);
-
 		/* Extract the XID of the committed/aborted transaction */
 		if (xact_info == XLOG_XACT_COMMIT_PREPARED)
 		{
@@ -5865,7 +5864,7 @@ recoveryStopsAfter(XLogReaderState *record)
 		{
 			recoveryStopAfter = true;
 			recoveryStopXid = recordXid;
-			recoveryStopTime = recordXtime;
+			getRecordTimestamp(record, &recoveryStopTime);
 			recoveryStopLSN = InvalidXLogRecPtr;
 			recoveryStopName[0] = '\0';
 
@@ -6049,6 +6048,19 @@ recoveryApplyDelay(XLogReaderState *record)
 	return true;
 }
 
+/*
+ * Track the timestamp of the latest applied time-bearing WAL record in
+ * XLogCtl->recoveryLastXTime.
+ */
+static void
+updateRecoveryXTime(XLogReaderState *record)
+{
+	TimestampTz	recordXtime;
+
+	if (getRecordTimestamp(record, &recordXtime))
+		SetLatestXTime(recordXtime);
+}
+
 /*
  * Save timestamp of latest processed commit/abort record.
  *
@@ -7255,6 +7267,9 @@ StartupXLOG(void)
 		WalSndWakeup();
 }
 
+/* update last recovery time */
+updateRecoveryXTime(xlogreader);
+
 /* Exit loop if we reached inclusive recovery target */
 if (recoveryStopsAfter(xlogreader))
 {
-- 
2.20.1



Re: standby recovery fails (tablespace related) (tentative patch and discussion)

2020-01-10 Thread Alvaro Herrera
On 2020-Jan-09, Alvaro Herrera wrote:

> I looked at this a little while and was bothered by the perl changes; it
> seems out of place to have RecursiveCopy be thinking about tablespaces,
> which is way out of its league.  So I rewrote that to use a callback:
> the PostgresNode code passes a callback that's in charge to handle the
> case of a symlink.  Things look much more in place with that.  I didn't
> verify that all places that should use this are filled.
> 
> In 0002 I found adding a new function unnecessary: we can keep backwards
> compat by checking 'ref' of the third argument.  With that we don't have
> to add a new function.  (POD changes pending.)

I forgot to add that something in these changes is broken (probably the
symlink handling callback) so the tests fail, but I couldn't stay away
from my daughter's birthday long enough to figure out what or how.  I'm
on something else today, so if one of you can research and submit fixed
versions, that'd be great.

Thanks,

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: our checks for read-only queries are not great

2020-01-10 Thread Robert Haas
On Fri, Jan 10, 2020 at 7:23 AM Peter Eisentraut
 wrote:
> I don't really remember, but that was basically the opinion I had
> arrived at as I was reading through this current thread.  Roughly
> speaking, anything that changes the database state (data or schema) in a
> way that would be reflected in a pg_dump output is not read-only.

This rule very nearly matches the current behavior: it explains why
temp table operations are allowed, and why ALTER SYSTEM is allowed,
and why REINDEX etc. are allowed. However, there's a notable
exception: PREPARE, COMMIT PREPARED, and ROLLBACK PREPARED are allowed
in a read-only transaction. Under the "doesn't change pg_dump output"
criterion, the first and third ones should be permitted but COMMIT
PREPARED should be denied, except maybe if the prepared transaction
didn't do any writes (and in that case, why did we bother preparing
it?). Despite that, this rule does a way better job explaining the
current behavior than anything else suggested so far.

Accordingly, here's v3, with comments adjusted to match this new
explanation for the current behavior. This seems way better than what
I had before, because it actually explains why stuff is the way it is
rather than just appealing to history.

BTW, there's a pending patch that allows CLUSTER to change the
tablespace of an object while rewriting it. If we want to be strict
about it, that variant would need to be disallowed in read only mode,
under this definition. (I also think that it's lousy syntax and ought
to be spelled ALTER TABLE x SET TABLESPACE foo, CLUSTER or something
like that rather than anything beginning with CLUSTER, but I seem to
be losing that argument.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


v3-0001-Fix-problems-with-read-only-query-checks-and-refa.patch
Description: Binary data


Re: [HACKERS] Block level parallel vacuum

2020-01-10 Thread Sergei Kornilov
Hello

> Yes, we should improve this. I tried to fix this. Attaching a delta
> patch that is fixing both the comments.

Thank you, I have no objections.

I think that the status of the CF entry is outdated and the most appropriate status for 
this patch is "Ready for Committer". Changed. I also added an annotation with a 
link to the recently summarized results.

regards, Sergei




Re: Fixing parallel make of libpq

2020-01-10 Thread Peter Eisentraut

On 2020-01-09 15:17, Tom Lane wrote:

1) Changing from an "|"-style dependency to a plain dependency seems
like a semantics change.  I've never been totally clear on the
difference though.  I think Peter introduced our use of the "|" style,
so maybe he can comment.


If you have a phony target as a prerequisite of a real-file target, you 
should make that an order-only ("|") prerequisite.  Otherwise the 
real-file target rules will *always* be run, on account of the phony 
target prerequisite.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [Proposal] Global temporary tables

2020-01-10 Thread Konstantin Knizhnik



On 09.01.2020 19:30, Tomas Vondra wrote:

3 Still no one commented on GTT's transaction information 
processing, they include

3.1 Should gtt's frozenxid need to be care?
3.2 gtt’s clog clean
3.3 How to deal with "too old" gtt data



No idea what to do about this.



I wonder what is specific to GTT here?
The same problem takes place for normal (local) temp tables, doesn't it?



Not sure. TBH I'm not sure I understand what the issue actually is. 


Just open a session, create a temporary table and insert some data into it.
Then in another session run 2^31 transactions (on my desktop it takes 
about 2 hours).

Since temp tables are not processed by vacuum, the database is stalled:

 ERROR:  database is not accepting commands to avoid wraparound data 
loss in database "postgres"


It seems to be quite dubious behavior, and it is strange to me that 
nobody complains about it.
We discuss many issues related to temp tables (statistics, parallel 
queries, ...) which seem to be less critical.


But this problem is not specific to GTT - it can be reproduced with 
normal (local) temp tables.

This is why I wonder why do we need to solve it in GTT patch.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

2020-01-10 Thread Mahendra Singh Thalor
On Fri, 10 Jan 2020 at 16:37, Michael Paquier  wrote:
>
> On Fri, Jan 10, 2020 at 05:01:25PM +0900, Michael Paquier wrote:
> > This makes me wonder how much we should try to outsmart somebody who
> > puts the catalogs in such an inconsistent state.  Hmm.  Perhaps at the
> > end autovacuum should just ignore such entries and just don't help the
> > user at all as this also comes with its own issues with the storage
> > level as well as smgr.c uses rd_backend.  And if the user plays with
> > temporary namespaces like that with superuser rights, he likely knows
> > what he is doing.  Perhaps not :D, in which case autovacuum may not be
> > the best thing to decide that.  I still think we should make the log
> > of autovacuum.c for orphaned relations more careful with its coding
> > though, and fix it with the previous patch.  The documentation of
> > isTempNamespaceInUse() could gain in clarity, just a nit from me while
> > looking at the surroundings.  And actually I found an issue with its
> > logic, as the routine would not consider a temp namespace in use for a
> > session's own MyBackendId.  As that's only used for autovacuum, this
> > has no consequence, but let's be correct in hte long run.
> >
> > And this gives the attached after a closer lookup.  Thoughts?
>
> Thinking more about it, this has a race condition if a temporary
> schema is removed after collecting the OIDs in the drop phase.  So the
> updated attached is actually much more conservative and does not need
> an update of the log message, without giving up on the improvements
> done in v11~.  In 9.4~10, the code of the second phase relies on
> GetTempNamespaceBackendId() which causes an orphaned relation to not
> be dropped in the event of a missing namespace.  I'll just leave that
> alone for a couple of days now..
> --

Thanks for the patch. I am not getting any crash, but \d is not showing
any temp table if we drop the temp schema and then create a temp table again.

postgres=# create temporary table test1 (a int);
CREATE TABLE
postgres=# \d
  List of relations
  Schema   | Name  | Type  |  Owner
---+---+---+--
 pg_temp_3 | test1 | table | mahendra
(1 row)

postgres=# drop schema pg_temp_3 cascade ;
NOTICE:  drop cascades to table test1
DROP SCHEMA
postgres=# \d
Did not find any relations.
postgres=# create temporary table test1 (a int);
CREATE TABLE
postgres=# \d
Did not find any relations.
postgres=#

Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com




Re: our checks for read-only queries are not great

2020-01-10 Thread Peter Eisentraut

On 2020-01-09 21:52, Tom Lane wrote:

Peter might remember more clearly, but I have a feeling that we
concluded that the intent of the spec was for read-only-ness to
disallow globally-visible changes in the visible database contents.


I don't really remember, but that was basically the opinion I had 
arrived at as I was reading through this current thread.  Roughly 
speaking, anything that changes the database state (data or schema) in a 
way that would be reflected in a pg_dump output is not read-only.



VACUUM, for example, does not cause any visible change, so it
should be admissible.  REINDEX ditto.  (We ignore here the possibility
of such things causing, say, a change in the order in which rows are
returned, since that's beneath the spec's notice to begin with.)


agreed


ANALYZE ditto, except to the extent that if you look at pg_stats
you might see something different --- but again, system catalog
contents are outside the spec's purview.


agreed


You could extend this line of argument, perhaps, far enough to justify
ALTER SYSTEM SET as well.  But I don't like that because some GUCs have
visible effects on the results that an ordinary query minding its own
business can get.  Timezone is perhaps the poster child there, or
search_path.  If we were to subdivide the GUCs into "affects
implementation details only" vs "can affect query semantics",
I'd hold still for allowing ALTER SYSTEM SET on the former group.
Doubt it's worth the trouble to distinguish, though.


ALTER SYSTEM is read only in my mind.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Created feature for to_date() conversion using patterns 'YYYY-WW', 'YYYY-WW-D', 'YYYY-MM-W' and 'YYYY-MM-W-D'

2020-01-10 Thread Mark Lorenz
Updated the chg_to_date_wwd.patch with additional tests (because it 
works not only for the 'D' pattern but also for all day patterns like 'Day' 
or 'DY'). Added the necessary documentation change.


(The fix_to_char_wwd.patch from 
f4e740a8de3ad1e762a28f6ff253ea4f%40four-two.de is still up-to-date)From 4e35bd88bef1916e7d11ad0776b3075e3183f7d0 Mon Sep 17 00:00:00 2001
From: Mark Lorenz 
Date: Fri, 20 Dec 2019 14:30:41 +0100
Subject: [PATCH 1/3] change to_date()/to_timestamp() behaviour with
 '-WW-D' pattern

Currently, the D part is ignored at non-ISO week pattern. Now the D
pattern works as well.

Added regression tests
---
 src/backend/utils/adt/formatting.c | 44 +++---
 src/backend/utils/adt/timestamp.c  | 84 ++
 src/include/utils/timestamp.h  |  6 ++
 src/test/regress/expected/horology.out | 26 +++-
 src/test/regress/sql/horology.sql  |  4 ++
 5 files changed, 156 insertions(+), 8 deletions(-)

diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 8fcbc22..603c51c 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -4494,21 +4494,51 @@ do_to_timestamp(text *date_txt, text *fmt, bool std,
 			fmask |= DTK_DATE_M;
 		}
 		else
-			tmfc.ddd = (tmfc.ww - 1) * 7 + 1;
+		{
+			/*
+			 * If tmfc.d is not set, then the date is left at the beginning of
+			 * the week (Sunday).
+			 */
+			if (tmfc.d)
+weekdate2date(tmfc.ww, tmfc.d, &tm->tm_year, &tm->tm_mon, &tm->tm_mday);
+			else
+week2date(tmfc.ww, &tm->tm_year, &tm->tm_mon, &tm->tm_mday);
+			fmask |= DTK_DATE_M;
+		}
+	}
+
+	if (tmfc.mm)
+	{
+		tm->tm_mon = tmfc.mm;
+		fmask |= DTK_M(MONTH);
 	}
 
 	if (tmfc.w)
-		tmfc.dd = (tmfc.w - 1) * 7 + 1;
+	{
+		/* if tmfc.mm is set, the date can be calculated */
+		if (tmfc.mm)
+		{
+			/*
+			 * If tmfc.d is not set, then the date is left at the beginning of
+			 * the week (Sunday).
+			 */
+			if (tmfc.d)
+monthweekdate2date(tmfc.mm, tmfc.w, tmfc.d, &tm->tm_year, &tm->tm_mon, &tm->tm_mday);
+			else
+monthweek2date(tmfc.mm, tmfc.w, &tm->tm_year, &tm->tm_mon, &tm->tm_mday);
+
+			fmask |= DTK_DATE_M;
+			tmfc.dd = tm->tm_mday;
+		}
+		else
+			tmfc.dd = (tmfc.w - 1) * 7 + 1;
+	}
+
 	if (tmfc.dd)
 	{
 		tm->tm_mday = tmfc.dd;
 		fmask |= DTK_M(DAY);
 	}
-	if (tmfc.mm)
-	{
-		tm->tm_mon = tmfc.mm;
-		fmask |= DTK_M(MONTH);
-	}
 
 	if (tmfc.ddd && (tm->tm_mon <= 1 || tm->tm_mday <= 1))
 	{
diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index 945b8f8..3e2f499 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -4264,6 +4264,90 @@ interval_trunc(PG_FUNCTION_ARGS)
 	PG_RETURN_INTERVAL_P(result);
 }
 
+/* monthweek2j()
+ *
+ *	Return the Julian day which corresponds to the first day (Sunday) of the given month/year and week.
+ *	Julian days are used to convert between ISO week dates and Gregorian dates.
+ */
+int
+monthweek2j(int year, int month, int week)
+{
+	int			day0,
+day1;
+
+	/* first day of given month */
+	day1 = date2j(year, month, 1);
+
+	/* day0 == offset to first day of week (Sunday) */
+	day0 = j2day(day1);
+
+	return ((week - 1) * 7) + (day1 - day0);
+}
+
+/* monthweek2date()
+ * Convert week of month and year number to date.
+ */
+void
+monthweek2date(int month, int wom, int *year, int *mon, int *mday)
+{
+	j2date(monthweek2j(*year, month, wom), year, mon, mday);
+}
+
+/* monthweek2date()
+ *
+ *	Convert a week of month date (year, month, week of month) into a Gregorian date.
+ *	Gregorian day of week sent so weekday strings can be supplied.
+ *	Populates year, mon, and mday with the correct Gregorian values.
+ */
+void
+monthweekdate2date(int month, int wom, int wday, int *year, int *mon, int *mday)
+{
+	int 		jday;
+
+	jday = monthweek2j(*year, month, wom);
+	jday += wday - 1;
+
+	j2date(jday, year, mon, mday);
+}
+
+/* week2j()
+ *
+ *	Return the Julian day which corresponds to the first day (Sunday) of the given year and week.
+ *	Julian days are used to convert between ISO week dates and Gregorian dates.
+ */
+int
+week2j(int year, int week)
+{
+	/* calculating the Julian Day from first month of current year */
+	return monthweek2j(year, 1, week);
+}
+
+/* week2date()
+ * Convert week of year number to date.
+ */
+void
+week2date(int woy, int *year, int *mon, int *mday)
+{
+	j2date(week2j(*year, woy), year, mon, mday);
+}
+
+/* weekdate2date()
+ *
+ *	Convert a week date (year, week) into a Gregorian date.
+ *	Gregorian day of week sent so weekday strings can be supplied.
+ *	Populates year, mon, and mday with the correct Gregorian values.
+ */
+void
+weekdate2date(int woy, int wday, int *year, int *mon, int *mday)
+{
+	int			jday;
+
+	jday = week2j(*year, woy);
+	jday += wday - 1;
+
+	j2date(jday, year, mon, mday);
+}
+
 /* isoweek2j()
  *
  *	Return the Julian day which corresponds to the first day (Monday) of the given ISO 8601 year and week.
diff --git a/src/include

Re: ALTER TABLE support for dropping generation expression

2020-01-10 Thread Sergei Kornilov
Hello

Thank you, but I am late: the patch has another merge conflict.

The conflict seems trivial and the patch looks fine to me.

regards, Sergei




Re: Minimal logical decoding on standbys

2020-01-10 Thread Rahila Syed
Hi Amit,

Can you please rebase the patches as they don't apply on latest master?

Thank you,
Rahila Syed


On Thu, 26 Dec 2019 at 16:36, Amit Khandekar  wrote:

> On Tue, 24 Dec 2019 at 14:02, Amit Khandekar 
> wrote:
> >
> > On Thu, 19 Dec 2019 at 01:02, Rahila Syed 
> wrote:
> > >
> > > Hi,
> > >
> > >> Hi, do you consistently get this failure on your machine ? I am not
> > >> able to get this failure, but I am going to analyze when/how this can
> > >> fail. Thanks
> > >>
> > > Yes, I am getting it each time I run make -C src/test/recovery/ check
> PROVE_TESTS=t/018_standby_logical_decoding_xmins.pl
> > > Also, there aren't any errors in logs indicating the cause.
> >
> > Thanks for the reproduction. Finally I could reproduce the behaviour.
> > It occurs once in 7-8 runs of the test on my machine. The issue is :
> > on master, the catalog_xmin does not immediately get updated. It
> > happens only after the hot standby feedback reaches on master. And I
> > haven't used wait_for_xmins() for these failing cases. I should use
> > that. Working on the same ...
>
> As mentioned above, I have used wait_for_xmins() so that we can wait
> for the xmins to be updated after hot standby feedback is processed.
> In one of the 3 scenarios where it failed for you, I removed the check
> at the second place because it was redundant. At the 3rd place, I did
> some appropriate changes with detailed comments. Please check.
> Basically we are checking that the master's phys catalog_xmin has
> advanced but not beyond standby's logical catalog_xmin. And for making
> sure the master's xmins are updated, I call txid_current() and then
> wait for the master's xmin to advance after hot-standby_feedback, and
> in this way I make sure the xmin/catalog_xmins are now up-to-date
> because of hot-standby-feedback, so that we can check whether the
> master's physical slot catalog_xmin has reached the value of standby's
> catalog_xmin but not gone past it.
>
> I have also moved the "wal_receiver_status_interval = 1" setting from
> master to standby. It was wrongly kept in master. This now reduces the
> test time by half, on my machine.
>
> Attached patch set v5 has only the test changes. Please check if now
> the test fails for you.
>
> >
> > --
> > Thanks,
> > -Amit Khandekar
> > EnterpriseDB Corporation
> > The Postgres Database Company
>
>
>
> --
> Thanks,
> -Amit Khandekar
> EnterpriseDB Corporation
> The Postgres Database Company
>


-- 
Rahila Syed
Performance Engineer
2ndQuadrant
http://www.2ndQuadrant.com 
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Block level parallel vacuum

2020-01-10 Thread Mahendra Singh Thalor
On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov  wrote:
>
> Hi
> Thank you for the update! I looked again.
>
> (vacuum_indexes_leader)
> +   /* Skip the indexes that can be processed by parallel workers 
> */
> +   if (!skip_index)
> +   continue;
>
> Isn't the variable name skip_index confusing here? Maybe rename it to something
> like can_parallel?

I also agree with your point.

>
> Another question about behavior on temporary tables. Use case: the user
> runs just "vacuum;" to vacuum the entire database (and has enough maintenance
> workers). Vacuum starts fine in parallel, but on the first temporary table we hit:
>
> +   if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0)
> +   {
> +   ereport(WARNING,
> +   (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
> +   RelationGetRelationName(onerel))));
> +   params->nworkers = -1;
> +   }
>
> And therefore we turn off the parallel vacuum for the remaining tables... Can 
> we improve this case?

Good point.
Yes, we should improve this; I tried to fix it.  Attaching a delta
patch that addresses both comments.
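To illustrate the idea with a hedged sketch (this is not the attached delta patch; the local variable is an assumption for illustration), the check could derive a per-relation worker count instead of overwriting params->nworkers, so that only the temporary table itself loses parallelism:

    /*
     * Sketch: keep params->nworkers intact so later non-temporary tables
     * can still be vacuumed in parallel; only this relation falls back to
     * a serial vacuum.
     */
    int         nworkers = params->nworkers;

    if (RelationUsesLocalBuffers(onerel) && nworkers >= 0)
    {
        ereport(WARNING,
                (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
                        RelationGetRelationName(onerel))));
        nworkers = -1;      /* disable parallelism for this relation only */
    }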

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com


v44-0002-delta_Allow-vacuum-command-to-process-indexes-in-parallel.patch
Description: Binary data


Re: Add pg_file_sync() to adminpack

2020-01-10 Thread Julien Rouhaud
On Fri, Jan 10, 2020 at 10:50 AM Fujii Masao  wrote:
>
> On Thu, Jan 9, 2020 at 10:39 PM Julien Rouhaud  wrote:
> >
> > I think that pg_write_server_files should be allowed to call that
> > function by default.
>
> But pg_write_server_files users are not allowed to execute
> other functions like pg_file_write() by default. So doing that
> change only for pg_file_sync() looks strange to me.

Ah indeed.  I'm wondering whether that's an oversight of the original
default role patch or intentional.




Re: Add pg_file_sync() to adminpack

2020-01-10 Thread Michael Paquier
On Fri, Jan 10, 2020 at 06:50:12PM +0900, Fujii Masao wrote:
> I changed the doc that way. Thanks for the review!

+ 
+  pg_file_sync fsyncs the specified file or directory
+  named by filename.  Returns true on success,
+  an error is thrown otherwise (e.g., the specified file is not present).
+ 
What's the point of having a function that returns a boolean if it
just returns true all the time?  Wouldn't it be better to have a set
of semantics closer to the unlink() part, where the call of stat()
fails with an ERROR for (errno != ENOENT) and the fsync call returns
false with a WARNING?
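For concreteness, a rough sketch of those semantics could look like this (a hypothetical variant, not the submitted patch: it assumes fsync_fname_ext() is exported from fd.c as discussed upthread, and it omits adminpack's usual path checks):

    #include "postgres.h"

    #include <sys/stat.h>

    #include "fmgr.h"
    #include "storage/fd.h"
    #include "utils/builtins.h"

    PG_FUNCTION_INFO_V1(pg_file_sync);

    Datum
    pg_file_sync(PG_FUNCTION_ARGS)
    {
        char       *filename = text_to_cstring(PG_GETARG_TEXT_PP(0));
        struct stat fst;

        if (stat(filename, &fst) < 0)
        {
            if (errno == ENOENT)
                PG_RETURN_BOOL(false);      /* nothing to sync, no hard error */
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("could not stat file \"%s\": %m", filename)));
        }

        /* fsync_fname_ext() emits a WARNING and returns -1 on failure */
        PG_RETURN_BOOL(fsync_fname_ext(filename, S_ISDIR(fst.st_mode),
                                       false, WARNING) == 0);
    }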

+SELECT pg_file_sync('global'); -- sync directory
+ pg_file_sync
+--
+ t
+(1 row)
installcheck deployments may not like that.
--
Michael


signature.asc
Description: PGP signature


Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

2020-01-10 Thread Michael Paquier
On Fri, Jan 10, 2020 at 05:01:25PM +0900, Michael Paquier wrote:
> This makes me wonder how much we should try to outsmart somebody who
> puts the catalogs in such an inconsistent state.  Hmm.  Perhaps in the
> end autovacuum should just ignore such entries and not help the
> user at all, as this also comes with its own issues at the storage
> level, since smgr.c uses rd_backend.  And if the user plays with
> temporary namespaces like that with superuser rights, he likely knows
> what he is doing.  Perhaps not :D, in which case autovacuum may not be
> the best thing to decide that.  I still think we should make the log
> of autovacuum.c for orphaned relations more careful with its coding
> though, and fix it with the previous patch.  The documentation of
> isTempNamespaceInUse() could gain in clarity, just a nit from me while
> looking at the surroundings.  And actually I found an issue with its
> logic, as the routine would not consider a temp namespace in use for a
> session's own MyBackendId.  As that's only used for autovacuum, this
> has no consequence, but let's be correct in the long run.
> 
> And this gives the attached after a closer lookup.  Thoughts?

Thinking more about it, this has a race condition if a temporary
schema is removed after collecting the OIDs in the drop phase.  So the
attached updated version is actually much more conservative and does not need
an update of the log message, without giving up on the improvements
done in v11~.  In 9.4~10, the code of the second phase relies on
GetTempNamespaceBackendId() which causes an orphaned relation to not
be dropped in the event of a missing namespace.  I'll just leave that
alone for a couple of days now..
--
Michael
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index c82f9fc4b5..0987986a8f 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -3235,10 +3235,14 @@ isTempNamespaceInUse(Oid namespaceId)
 
 	backendId = GetTempNamespaceBackendId(namespaceId);
 
-	if (backendId == InvalidBackendId ||
-		backendId == MyBackendId)
+	/* No such temporary namespace? */
+	if (backendId == InvalidBackendId)
 		return false;
 
+	/* Is the namespace used by this backend? */
+	if (backendId == MyBackendId)
+		return true;
+
 	/* Is the backend alive? */
 	proc = BackendIdGetProc(backendId);
 	if (proc == NULL)
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index f0e40e36af..9eb8132a37 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -2071,9 +2071,11 @@ do_autovacuum(void)
 		{
 			/*
 			 * We just ignore it if the owning backend is still active and
-			 * using the temporary schema.
+			 * using the temporary schema.  If the namespace does not exist
+			 * ignore the entry.
 			 */
-			if (!isTempNamespaceInUse(classForm->relnamespace))
+			if (!isTempNamespaceInUse(classForm->relnamespace) &&
+get_namespace_name(classForm->relnamespace) != NULL)
 			{
 /*
  * The table seems to be orphaned -- although it might be that
@@ -2202,6 +2204,7 @@ do_autovacuum(void)
 		Oid			relid = lfirst_oid(cell);
 		Form_pg_class classForm;
 		ObjectAddress object;
+		char	   *nspname;
 
 		/*
 		 * Check for user-requested abort.
@@ -2243,7 +2246,15 @@ do_autovacuum(void)
 			continue;
 		}
 
-		if (isTempNamespaceInUse(classForm->relnamespace))
+		nspname = get_namespace_name(classForm->relnamespace);
+
+		/*
+		 * Nothing to do for a relation with a missing namespace.  This
+		 * check is the same as above when building the list of orphan
+		 * relations.
+		 */
+		if (isTempNamespaceInUse(classForm->relnamespace) ||
+			nspname == NULL)
 		{
 			UnlockRelationOid(relid, AccessExclusiveLock);
 			continue;
@@ -2253,8 +2264,7 @@ do_autovacuum(void)
 		ereport(LOG,
 (errmsg("autovacuum: dropping orphan temp table \"%s.%s.%s\"",
 		get_database_name(MyDatabaseId),
-		get_namespace_name(classForm->relnamespace),
-		NameStr(classForm->relname))));
+		nspname, NameStr(classForm->relname))));
 
 		object.classId = RelationRelationId;
 		object.objectId = relid;


signature.asc
Description: PGP signature


Re: [HACKERS] Block level parallel vacuum

2020-01-10 Thread Sergei Kornilov
Hi
Thank you for the update! I looked again.

(vacuum_indexes_leader)
+   /* Skip the indexes that can be processed by parallel workers */
+   if (!skip_index)
+   continue;

Isn't the variable name skip_index confusing here? Maybe rename it to something
like can_parallel?

Another question about behavior on temporary tables. Use case: the user
runs just "vacuum;" to vacuum the entire database (and has enough maintenance
workers). Vacuum starts fine in parallel, but on the first temporary table we hit:

+   if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0)
+   {
+   ereport(WARNING,
+   (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
+   RelationGetRelationName(onerel))));
+   params->nworkers = -1;
+   }

And therefore we turn off the parallel vacuum for the remaining tables... Can 
we improve this case?

regards, Sergei




Re: Add pg_file_sync() to adminpack

2020-01-10 Thread Fujii Masao
On Thu, Jan 9, 2020 at 10:39 PM Julien Rouhaud  wrote:
>
> On Thu, Jan 9, 2020 at 7:31 AM Fujii Masao  wrote:
> >
> > On Mon, Jan 6, 2020 at 3:42 PM Michael Paquier  wrote:
> > >
> > > On Mon, Jan 06, 2020 at 03:20:13PM +0900, Arthur Zakirov wrote:
> > > > It isn't case if a file doesn't exist. But if there are no permissions 
> > > > on
> > > > the file:
> > > >
> > > > PANIC:  could not open file "testfile": Permissions denied
> > > > server closed the connection unexpectedly
> > > >
> > > > It could be fixed by implementing a function like 
> > > > pg_file_sync_internal() or
> > > > by making the function fsync_fname_ext() external.
> > >
> > > The patch uses stat() to make sure that the file exists and has no
> > > issues.  Though it could be a problem with any kind of TOCTOU-like
> > > issues (looking at you, Windows, for ENOPERM), so I agree that it
> > > would make more sense to use pg_fsync() here with a fd opened first.
> >
> > I agree that it's not good for pg_file_sync() to cause a PANIC.
> > I updated the patch so that pg_file_sync() uses fsync_fname_ext()
> > instead of fsync_fname() as Arthur suggested.
> >
> > It's one idea to make pg_file_sync() open the file and directly call
> > pg_fsync(). But fsync_fname_ext() already has such code and
> > I'd like to avoid the code duplication.
>
> This looks good to me.
>
> > Attached is the updated version of the patch.
>
> +
> + pg_catalog.pg_file_sync(filename 
> text)
> + boolean
> + 
> +  Sync a file or directory
> + 
> +
>
> "Flush to disk" looks better than "sync" here.

I changed the doc that way. Thanks for the review!

> I think that pg_write_server_files should be allowed to call that
> function by default.

But pg_write_server_files users are not allowed to execute
other functions like pg_file_write() by default. So doing that
change only for pg_file_sync() looks strange to me.

Regards,

-- 
Fujii Masao


pg_file_sync_v3.patch
Description: Binary data


Re: range_agg

2020-01-10 Thread Pavel Stehule
Hi

On Sat, 4 Jan 2020 at 6:29, Paul A Jungwirth <
p...@illuminatedcomputing.com> wrote:

> On Fri, Dec 20, 2019 at 11:29 AM Alvaro Herrera
>  wrote:
> > > Should I just give up on implicit casts and require you to specify
> > > one? That makes it a little more annoying to mix range & multirange
> > > types, but personally I'm okay with that. This is my preferred
> > > approach.
> >
> > +1
>
> Here is a patch adding the casts, rebasing on the latest master, and
> incorporating Alvaro's changes. Per his earlier suggestion I combined
> it all into one patch file, which also makes it easier to apply
> rebases & updates.
>

This patch was applied cleanly and all tests passed



>
> My work on adding casts also removes the @+ / @- / @* operators and
> adds + / - / * operators where both parameters are multiranges. I
> retained other operators with mixed range/multirange parameters, both
> because there are already range operators with mixed range/scalar
> parameters (e.g. <@), and because it seemed like the objection to @+ /
> @- / @* was not mixed parameters per se, but rather their
> unguessability. Since the other operators are the same as the existing
> range operators, they don't share that problem.
>

looks good


>
> This still leaves the question of how best to format the docs for
> these operators. I like being able to combine all the <@ variations
> (e.g.) into one table row, but if that is too ugly I could give them
> separate rows instead. Giving them all their own row consumes a lot of
> vertical space though, and to me that makes the docs more tedious to
> read & browse, so it's harder to grasp all the available range-related
> operations at a glance.
>

I have a similar opinion - maybe it is better to document range and
multirange separately. In some places the removed operators like @+ still appear.


> I'm skeptical of changing pg_type.typtype from 'm' to 'r'. A
> multirange isn't a range, so why should we give it the same type? Also
> won't this break any queries that are using that column to find range
> types? What is the motivation to use the same typtype for both ranges
> and multiranges? (There is plenty I don't understand here, e.g. why we
> have both typtype and typcategory, so maybe there is a good reason I'm
> missing.)
>

If you can share TYPTYPE_RANGE in the code for multiranges, then it should be
'r'. If not, then it needs its own value.


> I experimented with setting pg_type.typelem to the multirange's range
> type, but it seemed to break a lot of things, and reading the code I
> saw some places that treat a non-zero typelem as synonymous with being
> an array. So I'm reluctant to make this change also, especially when
> it is just as easy to query pg_range to get a multirange's range type.
>

OK, it is unfortunate, but it is true. Please put this note somewhere in the
code.


> Also range types themselves don't set typelem to their base type, and
> it seems like we'd want to treat ranges and multiranges the same way
> here.
>
> Alvaro also suggested renaming pg_range.mltrngtypid to
> pg_range.rngmultitypid, so it shares the same "rng" prefix as the
> other columns in this table. Having a different prefix does stand out.
> I've included that change in this patch too.
>

Personally, I have no comments on the implemented functionality.

>
> Yours,
> Paul
>


Re: [Proposal] Global temporary tables

2020-01-10 Thread Konstantin Knizhnik




On 09.01.2020 19:48, Tomas Vondra wrote:


The most complex and challenging task is to support GTT for all kinds
of indexes. Unfortunately I cannot propose a good universal
solution for it.
Just patching all existing index implementations seems to be the only
choice.




I haven't looked at the indexing issue closely, but IMO we need to
ensure that every session sees/uses only indexes on GTT that were
defined before the session started using the table.


Why? It contradicts the behavior of normal tables.
Assume that you have active clients and at some point the DBA
recognizes that they are spending too much time scanning some GTT.
The DBA can create an index for this GTT, but if existing clients are not
able to use this index, do we then need to somehow make these clients
restart their sessions?
In my patch I have implemented building indexes for GTT on demand: if
an accessed index on a GTT is not yet initialized, it is filled with the
session's local data.
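For what it's worth, a minimal sketch of that on-demand approach could look like the code below.  RELATION_IS_GLOBAL_TEMP() and GttIndexIsInitialized() are assumed helper names for illustration only, not the actual patch:

    /* Hypothetical hook called before using an index of a GTT. */
    static void
    gtt_index_check_init(Relation heapRel, Relation indexRel)
    {
        if (RELATION_IS_GLOBAL_TEMP(indexRel) &&
            !GttIndexIsInitialized(indexRel))
        {
            IndexInfo  *indexInfo = BuildIndexInfo(indexRel);

            /* First access from this session: fill the index with local data. */
            index_build(heapRel, indexRel, indexInfo, false, false);
        }
    }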


Can't we track which indexes a particular session sees, somehow?


Statistics are another important case.
But once again I do not completely understand why we want to address
all these issues with statistics in the first version of the patch.


I think the question is which "issues with statistics" you mean. I'm sure
we can ignore some of them, e.g. the one with parallel workers not
having any stats (assuming we consider functions using GTT to be
parallel restricted).


If we do not use shared buffers for GTT then parallel processing of GTT
is not possible at all, so there is no problem with statistics for
parallel workers.




I think someone pointed out pushing stuff directly into the cache is
rather problematic, but I don't recall the details.

I have not encountered any problems, so if you can point me to what is
wrong with this approach, I will think about an alternative solution.



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: pgbench - use pg logging capabilities

2020-01-10 Thread Michael Paquier
On Fri, Jan 10, 2020 at 08:52:17AM +0100, Fabien COELHO wrote:
> Compared to dealing with the level inside the call, the use of the level
> variable avoids a call-test-return cycle in this case, and the unlikely
> should help the compiler reorder instructions so that no actual branch is
> taken under the common case.
> 
> So I think that the current situation is a good thing at least for debug.

If you look at some of my messages on other threads, you would likely
notice that my mood of the day is to not design things which try to
outsmart a user's expectations :)

So I would stand on the position to just remove those likely/unlikely
parts if we want this logging to be generic.

> For other levels, they are on by default AND would not be placed at critical
> performance points, so the whole effort of avoiding the call is moot.
> 
> I agree with Tom that the __pg_log_level variable name violates the usual naming conventions.

My own taste would be to still keep the variable local to logging.c,
and use a "get"-like routine to be consistent with the "set" part.  I
don't have to be right, let's see where this discussion leads us.
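For example, the "get" counterpart could be as small as the following sketch (assuming __pg_log_level stays, or becomes, private to logging.c; this only illustrates the idea):

    /* common/logging.h (sketch) */
    extern enum pg_log_level pg_logging_get_level(void);

    /* common/logging.c (sketch) */
    enum pg_log_level
    pg_logging_get_level(void)
    {
        return __pg_log_level;
    }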

(I mentioned that upthread, but I don't think it is a good idea to
discuss a redesign of those routines on a thread about pgbench
based on $subject.  All the main players are here so it likely does
not matter, but..)
--
Michael


signature.asc
Description: PGP signature


Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

2020-01-10 Thread Michael Paquier
On Fri, Jan 10, 2020 at 11:56:37AM +0530, Mahendra Singh Thalor wrote:
> I reviewed and tested the patch. After applying the patch, I am getting another
> assert failure.
>
> I think, before committing the 1st patch, we should also fix this crash and
> then commit all the patches.

I have somewhat managed to break my environment for a couple of days,
so I got zero testing done with assertions and missed this one.
Thanks for the lookup!  The environment has since been fixed.

This code path uses an assertion that becomes incorrect once you are
able to create in pg_class temporary relations which rely on a
temporary schema that does not exist anymore because it has been
dropped, and that's what you are doing.  The assertion does not
originally concern only autovacuum, as it would fail each time a
session tries to build a relation descriptor for its cache with a
relation using a non-existing namespace.  I have not really dug into
whether that's actually possible to trigger..  Anyway.

So, on the one hand, allowing orphaned temporary relations to be
dropped even if their schema does not exist is what autovacuum now
does more aggressively, and that can help to avoid having to clean up
orphaned catalog entries yourself, along with their toast entries,
etc.  And this approach makes the assertion lose its meaning for
autovacuum.

On the other hand, keeping this assertion makes sure that we never try
to load incorrect relcache entries, and just makes autovacuum less
aggressive by ignoring orphaned entries with incorrect namespace
references, though fixing the cluster then means manual manipulation
of the catalogs.  This is something I understood we'd like to avoid as
much as possible, while keeping autovacuum aggressive on the removal,
as that can ease the life of people fixing a cluster.  So this would
bring us back to a point intermediate of 246a6c8.

This makes me wonder how much we should try to outsmart somebody who
puts the catalogs in such an inconsistent state.  Hmm.  Perhaps in the
end autovacuum should just ignore such entries and not help the
user at all, as this also comes with its own issues at the storage
level, since smgr.c uses rd_backend.  And if the user plays with
temporary namespaces like that with superuser rights, he likely knows
what he is doing.  Perhaps not :D, in which case autovacuum may not be
the best thing to decide that.  I still think we should make the log
of autovacuum.c for orphaned relations more careful with its coding
though, and fix it with the previous patch.  The documentation of
isTempNamespaceInUse() could gain in clarity, just a nit from me while
looking at the surroundings.  And actually I found an issue with its
logic, as the routine would not consider a temp namespace in use for a
session's own MyBackendId.  As that's only used for autovacuum, this
has no consequence, but let's be correct in the long run.

And this gives the attached after a closer lookup.  Thoughts?
--
Michael
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index c82f9fc4b5..0987986a8f 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -3235,10 +3235,14 @@ isTempNamespaceInUse(Oid namespaceId)
 
 	backendId = GetTempNamespaceBackendId(namespaceId);
 
-	if (backendId == InvalidBackendId ||
-		backendId == MyBackendId)
+	/* No such temporary namespace? */
+	if (backendId == InvalidBackendId)
 		return false;
 
+	/* Is the namespace used by this backend? */
+	if (backendId == MyBackendId)
+		return true;
+
 	/* Is the backend alive? */
 	proc = BackendIdGetProc(backendId);
 	if (proc == NULL)
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index f0e40e36af..d8702d650a 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -2071,9 +2071,11 @@ do_autovacuum(void)
 		{
 			/*
 			 * We just ignore it if the owning backend is still active and
-			 * using the temporary schema.
+			 * using the temporary schema.  If the namespace does not exist
+			 * ignore the entry.
 			 */
-			if (!isTempNamespaceInUse(classForm->relnamespace))
+			if (!isTempNamespaceInUse(classForm->relnamespace) &&
+get_namespace_name(classForm->relnamespace) != NULL)
 			{
 /*
  * The table seems to be orphaned -- although it might be that
@@ -2202,6 +2204,7 @@ do_autovacuum(void)
 		Oid			relid = lfirst_oid(cell);
 		Form_pg_class classForm;
 		ObjectAddress object;
+		char	   *nspname;
 
 		/*
 		 * Check for user-requested abort.
@@ -2249,12 +2252,18 @@ do_autovacuum(void)
 			continue;
 		}
 
-		/* OK, let's delete it */
-		ereport(LOG,
-(errmsg("autovacuum: dropping orphan temp table \"%s.%s.%s\"",
-		get_database_name(MyDatabaseId),
-		get_namespace_name(classForm->relnamespace),
-		NameStr(classForm->relname))));
+		nspname = get_namespace_name(classForm->relnamespace);
+
+		if (nspname != NULL)
+			ere