Re: allow granting CLUSTER, REFRESH MATERIALIZED VIEW, and REINDEX

2022-12-16 Thread Nathan Bossart
On Thu, Dec 15, 2022 at 10:10:43AM -0800, Jeff Davis wrote:
> The proposal to skip privilege checks for partitions would be
> consistent with INSERT, SELECT, REINDEX that flow through to the
> underlying partitions regardless of permissions/ownership (and even
> RLS). It would be very minor behavior change on 15 for this weird case
> of superuser-owned partitions, but I doubt anyone would be relying on
> that.

I've attached a work-in-progress patch that aims to accomplish this.
Instead of skipping the privilege checks, I added logic to trawl through
pg_inherits and pg_class to check whether the user has privileges for the
partitioned table or for the main relation of a TOAST table.  This means
that MAINTAIN on a partitioned table is enough to execute maintenance
commands on all the partitions, and MAINTAIN on a main relation is enough
to execute maintenance commands on its TOAST table.  Also, the maintenance
commands that flow through to the partitions or the TOAST table should no
longer error due to permissions when the user only has MAINTAIN on the
partitioned table or main relation.
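
For illustration, a minimal sketch of the intended behavior (MAINTAIN is
the privilege proposed in this thread, not something available in
released versions; the table and role names are made up):

    CREATE ROLE maint_role;
    CREATE TABLE measurement (logdate date) PARTITION BY RANGE (logdate);
    CREATE TABLE measurement_y2022 PARTITION OF measurement
        FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');
    GRANT MAINTAIN ON measurement TO maint_role;

    SET ROLE maint_role;
    -- Should recurse to measurement_y2022 without a permission failure,
    -- even though nothing was granted on the partition directly.
    VACUUM measurement;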

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 79ccddce55..022b5e9bd3 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -119,6 +119,25 @@ get_partition_parent_worker(Relation inhRel, Oid relid, bool *detach_pending)
 	return result;
 }
 
+/*
+ * get_partition_root
+ *		Return the topmost ancestor of the given partition, or InvalidOid
+ *		if the relation is not a partition.
+ */
+Oid
+get_partition_root(Oid relid)
+{
+	List	   *ancestors = get_partition_ancestors(relid);
+	Oid			root = InvalidOid;
+
+	if (list_length(ancestors) > 0)
+		root = llast_oid(ancestors);
+
+	list_free(ancestors);
+
+	return root;
+}
+
 /*
  * get_partition_ancestors
  *		Obtain ancestors of given relation
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 9bc10729b0..1ca2a65405 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -14,6 +14,7 @@
  */
 #include "postgres.h"
 
+#include "access/genam.h"
 #include "access/heapam.h"
 #include "access/toast_compression.h"
 #include "access/xact.h"
@@ -32,6 +33,7 @@
 #include "nodes/makefuncs.h"
 #include "storage/lock.h"
 #include "utils/builtins.h"
+#include "utils/fmgroids.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
 
@@ -413,3 +415,35 @@ needs_toast_table(Relation rel)
 	/* Otherwise, let the AM decide. */
 	return table_relation_needs_toast_table(rel);
 }
+
+/*
+ * get_toast_parent
+ *		Return the OID of the relation that owns the given TOAST table, or
+ *		InvalidOid if there is none.
+ */
+Oid
+get_toast_parent(Oid relid)
+{
+	Relation	rel;
+	SysScanDesc scan;
+	ScanKeyData key[1];
+	Oid			result = InvalidOid;
+	HeapTuple	tuple;
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+
+	ScanKeyInit(&key[0],
+				Anum_pg_class_reltoastrelid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(relid));
+
+	scan = systable_beginscan(rel, InvalidOid, false, NULL, 1, key);
+	tuple = systable_getnext(scan);
+	if (HeapTupleIsValid(tuple))
+		result = ((Form_pg_class) GETSTRUCT(tuple))->oid;
+	systable_endscan(scan);
+
+	table_close(rel, AccessShareLock);
+
+	return result;
+}
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 8966b75bd1..04681ce494 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -80,6 +80,7 @@ static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 static List *get_tables_to_cluster(MemoryContext cluster_context);
 static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
 			   Oid indexOid);
+static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
 
 
 /*---
@@ -366,8 +367,7 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 	if (recheck)
 	{
 		/* Check that the user still has privileges for the relation */
-		if (!object_ownercheck(RelationRelationId, tableOid, save_userid) &&
-			pg_class_aclcheck(tableOid, save_userid, ACL_MAINTAIN) != ACLCHECK_OK)
+		if (!cluster_is_permitted_for_relation(tableOid, save_userid))
 		{
 			relation_close(OldHeap, AccessExclusiveLock);
 			goto out;
@@ -1646,8 +1646,7 @@ get_tables_to_cluster(MemoryContext cluster_context)
 
 		index = (Form_pg_index) GETSTRUCT(indexTuple);
 
-		if (!object_ownercheck(RelationRelationId, index->indrelid, GetUserId()) &&
-			pg_class_aclcheck(index->indrelid, GetUserId(), ACL_MAINTAIN) != ACLCHECK_OK)
+		if (!cluster_is_permitted_for_relation(index->indrelid, GetUserId()))
 			continue;
 
 		/* Use a permanent memory context for the result list */
@@ -1695,12 +1694,11 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
 		if (get_rel_relkind(indexrelid) != RELKIND_INDEX)
 			continue;
 
-		/* Silently skip partitions which the user has no access to. */
-		if (!object_ownercheck(RelationRelationId, relid, GetUserId()) &&
-			pg_class_aclcheck(relid, GetUserId(), ACL_MAINTAIN) != ACLCHECK_OK &&
-			

Re: Raising the SCRAM iteration count

2022-12-16 Thread Michael Paquier
On Thu, Dec 15, 2022 at 12:09:15PM +0100, Daniel Gustafsson wrote:
>> On 15 Dec 2022, at 00:52, Michael Paquier  wrote:
>>conn->in_hot_standby = PG_BOOL_UNKNOWN;
>> +   conn->scram_iterations = SCRAM_DEFAULT_ITERATIONS;
>> 
>> s/SCRAM_DEFAULT_ITERATIONS/SCRAM_SHA_256_DEFAULT_ITERATIONS/ and
>> s/scram_iterations/scram_sha_256_iterations/ perhaps?  
> 
> Distinct members in the conn object is only of interest if there is a way for
> the user to select a different password method in \password right?  I can
> rename it now but I think doing too much here is premature, awaiting work on
> \password (should that materialize) seems reasonable no?

You could do that already, somewhat indirectly, with
password_encryption, assuming it ever supports more than one mode that
influences how the password is built.  If you wish to keep it named
this way, it is no big deal for me either way, so feel free to use
what you think is best based on the state of HEAD.  That said, I'd
place more value on consistency with the backend in terms of naming.

>> @@ -692,7 +697,7 @@ mock_scram_secret(const char *username, int *iterations, 
>> char **salt,
>>encoded_salt[encoded_len] = '\0';
>> 
>>*salt = encoded_salt;
>> -   *iterations = SCRAM_DEFAULT_ITERATIONS;
>> +   *iterations = scram_sha256_iterations;
>> 
>> This looks incorrect to me?  The mock authentication is here to
>> produce a realistic verifier, still it will fail.  It seems to me that
>> we'd better stick to the default in all the cases.
> 
> For avoiding revealing anything, I think a case can be argued for both.  I've
> reverted back to the default though.
> 
> I also renamed the GUC sha_256 to match terminology we use.

+   "SET password_encryption='scram-sha-256';
+SET scram_sha_256_iterations=10;
Maybe use a lower value to keep the test cheap?
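
For context, a usage sketch of the proposed GUC (scram_sha_256_iterations
is the name taken from the patch; alice is a made-up role):

    CREATE ROLE alice LOGIN;
    SET password_encryption = 'scram-sha-256';
    SET scram_sha_256_iterations = 1024;      -- below the 4096 default
    ALTER ROLE alice PASSWORD 'new-secret';   -- verifier built with 1024 iterations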

+time of encryption. In order to make use of a changed value, new
+password must be set.
"A new password must be set".

Superuser-only GUCs should be documented as such, or do you intend to
make it user-settable like I suggested upthread :) ?
--
Michael




Re: Refactor SCRAM code to dynamically handle hash type and key length

2022-12-16 Thread Michael Paquier
On Thu, Dec 15, 2022 at 04:59:52AM +0900, Michael Paquier wrote:
> However, that's only half of the picture.  The key length and the hash
> type (or just the hash type to know what's the digest/key length to
> use but that's more invasive) still need to be sent across the
> internal routines of SCRAM and attached to the state data of the
> frontend and the backend or we won't be able to do the hash and HMAC
> computations dependent on that.

Attached is a patch to do exactly that, and as a result v2 is half the
size of v1:
- SCRAM_KEY_LEN is now named SCRAM_MAX_KEY_LEN, with a note that it
must be kept in sync with the maximum digest size of the supported
hash methods.  This is used to size all the internal buffers of the
SCRAM routines.
- SCRAM_SHA_256_KEY_LEN is used to track the key length for
SCRAM-SHA-256, the one initialized with the state data.
- No changes in the internals; the buffers are just resized based on
the defined maximum.

I'd like to move on with that in the next couple of days (I still need
to study the other areas of the code to see what else could be made
more pluggable), so let me know if there are any objections.
--
Michael
From 0e01ec72ebfdf71bafd7434ea19c2dcb17164f1d Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Sat, 17 Dec 2022 12:06:37 +0900
Subject: [PATCH v2] Remove dependency on hash type and key length in internal
 SCRAM code

SCRAM_KEY_LEN had a hard dependency on SHA-256, making it difficult to
add more hash methods to SCRAM, as many internal buffers were sized
based on it.  A second issue is that SHA-256 was assumed as the
computation method to use all the time.

This commit renames SCRAM_KEY_LEN to the more generic SCRAM_MAX_KEY_LEN,
which is used to size the buffers of the internal SCRAM routines and
centrally tracks the maximum size necessary across all the supported
hash methods.  A second change is that the key length (SHA digest
length) and hash type are now tracked by the state data in the backend
and the frontend, with the common portions extended to take these as
arguments in the internal SCRAM routines.
---
 src/include/common/scram-common.h|  31 --
 src/include/libpq/scram.h|   6 +-
 src/backend/libpq/auth-scram.c   | 137 ---
 src/backend/libpq/crypt.c|  10 +-
 src/common/scram-common.c|  83 +---
 src/interfaces/libpq/fe-auth-scram.c |  65 -
 6 files changed, 201 insertions(+), 131 deletions(-)

diff --git a/src/include/common/scram-common.h b/src/include/common/scram-common.h
index 4acf2a78ad..953d30ac54 100644
--- a/src/include/common/scram-common.h
+++ b/src/include/common/scram-common.h
@@ -21,7 +21,13 @@
 #define SCRAM_SHA_256_PLUS_NAME "SCRAM-SHA-256-PLUS"	/* with channel binding */
 
 /* Length of SCRAM keys (client and server) */
-#define SCRAM_KEY_LEN			PG_SHA256_DIGEST_LENGTH
+#define SCRAM_SHA_256_KEY_LEN	PG_SHA256_DIGEST_LENGTH
+
+/*
+ * Size of buffers used internally by SCRAM routines, that should be the
+ * maximum of SCRAM_SHA_*_KEY_LEN among the hash methods supported.
+ */
+#define SCRAM_MAX_KEY_LEN	SCRAM_SHA_256_KEY_LEN
 
 /*
  * Size of random nonce generated in the authentication exchange.  This
@@ -43,17 +49,22 @@
  */
 #define SCRAM_DEFAULT_ITERATIONS	4096
 
-extern int	scram_SaltedPassword(const char *password, const char *salt,
- int saltlen, int iterations, uint8 *result,
- const char **errstr);
-extern int	scram_H(const uint8 *input, int len, uint8 *result,
+extern int	scram_SaltedPassword(const char *password,
+ pg_cryptohash_type hash_type, int key_length,
+ const char *salt, int saltlen, int iterations,
+ uint8 *result, const char **errstr);
+extern int	scram_H(const uint8 *input, pg_cryptohash_type hash_type,
+	int key_length, uint8 *result,
 	const char **errstr);
-extern int	scram_ClientKey(const uint8 *salted_password, uint8 *result,
-			const char **errstr);
-extern int	scram_ServerKey(const uint8 *salted_password, uint8 *result,
-			const char **errstr);
+extern int	scram_ClientKey(const uint8 *salted_password,
+			pg_cryptohash_type hash_type, int key_length,
+			uint8 *result, const char **errstr);
+extern int	scram_ServerKey(const uint8 *salted_password,
+			pg_cryptohash_type hash_type, int key_length,
+			uint8 *result, const char **errstr);
 
-extern char *scram_build_secret(const char *salt, int saltlen, int iterations,
+extern char *scram_build_secret(pg_cryptohash_type hash_type, int key_length,
+const char *salt, int saltlen, int iterations,
 const char *password, const char **errstr);
 
 #endif			/* SCRAM_COMMON_H */
diff --git a/src/include/libpq/scram.h b/src/include/libpq/scram.h
index c51e848c24..b29501ef96 100644
--- a/src/include/libpq/scram.h
+++ b/src/include/libpq/scram.h
@@ -13,6 +13,7 @@
 #ifndef PG_SCRAM_H
 #define PG_SCRAM_H
 
+#include "common/cryptohash.h"

Re: New strategies for freezing, advancing relfrozenxid early

2022-12-16 Thread Peter Geoghegan
On Thu, Dec 15, 2022 at 11:59 PM Nikita Malakhov  wrote:
> I've found this discussion very interesting, in view of vacuuming
> TOAST tables is always a problem because these tables tend to
> bloat very quickly with dead data - just to remind, all TOAST-able
> columns of the relation use the same TOAST table which is one
> for the relation, and TOASTed data are not updated - there are
> only insert and delete operations.

I don't think that it would be any different from any other table that
happened to have lots of inserts and deletes, such as the table
described here:

https://wiki.postgresql.org/wiki/Freezing/skipping_strategies_patch:_motivating_examples#Mixed_inserts_and_deletes

In the real world, a table like this would probably consist of some
completely static data, combined with other data that is constantly
deleted and re-inserted -- probably only a small fraction of the table
at any one time. I would expect such a table to work quite well,
because the static pages would all become frozen (at least after a
while), leaving behind only the tuples that are deleted quickly, most
of the time. VACUUM would have a decent chance of noticing that it
will be cheap to advance relfrozenxid in earlier VACUUM operations, as
bloat is cleaned up -- even a VACUUM that happens long before the
point that autovacuum.c will launch an antiwraparound autovacuum has a
decent chance of it. That's not a new idea, really; the
pgbench_branches example from the Wiki page looks like that already,
and even works on Postgres 15.

Here is the part that's new: the pressure to advance relfrozenxid
grows gradually, as table age grows. If table age is still very young,
then we'll only do it if the number of "extra" scanned pages is < 5%
of rel_pages -- only when the added cost is very low (again, like the
pgbench_branches example, mostly). Once table age gets about halfway
towards the point that antiwraparound autovacuuming is required, VACUUM
gradually starts caring less about the cost and more about the need to
advance relfrozenxid. Ideally that will happen before an antiwraparound
autovacuum is actually required.
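
(To make that schedule concrete, here is my own formalization, not the
patch's actual formula: with a = table age as a fraction of the
anti-wraparound trigger age, the tolerated share of "extra" scanned
pages could ramp like

    f(a) = 0.05 + 0.95 * max(0, min(1, (a - 0.5) / 0.5))

staying at 5% of rel_pages while a < 0.5 and removing the cost cap
entirely as a approaches 1.)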

I'm not sure how much this would help with bloat. I suspect that it
could make a big difference with the right workload. If you always
need frequent autovacuums, just to deal with bloat, then there is
never a good time to run an aggressive antiwraparound autovacuum. An
aggressive AV will probably end up taking much longer than the typical
autovacuum that deals with bloat. While the aggressive AV will remove
as much bloat as any other AV, in theory, that might not help much. If
the aggressive AV takes as long as (say) 5 regular autovacuums would
have taken, and if you really needed those 5 separate autovacuums to
run, just to deal with the bloat, then that's a real problem.  The
aggressive AV effectively causes bloat with such a workload.

-- 
Peter Geoghegan




Re: [PoC] Federated Authn/z with OAUTHBEARER

2022-12-16 Thread Jacob Champion
On Mon, Dec 12, 2022 at 9:06 PM Andrey Chudnovsky
 wrote:
> If your concern is extension not honoring the DBA configured values:
> Would a server-side logic to prefer HBA value over extension-provided
> resolve this concern?

Yeah. It also seals the role of the extension here as "optional".

> We are definitely biased towards the cloud deployment scenarios, where
> direct access to .hba files is usually not offered at all.
> Let's find the middle ground here.

Sure. I don't want to make this difficult in cloud scenarios --
obviously I'd like for Timescale Cloud to be able to make use of this
too. But if we make this easy for a lone DBA (who doesn't have any
institutional power with the providers) to use correctly and securely,
then it should follow that the providers who _do_ have power and
resources will have an easy time of it as well. The reverse isn't
necessarily true. So I'm definitely planning to focus on the DBA case
first.

> A separate reason for creating this pre-authentication hook is further
> extensibility to support more metadata.
> Specifically when we add support for OAUTH flows to libpq, server-side
> extensions can help bridge the gap between the identity provider
> implementation and OAUTH/OIDC specs.
> For example, that could allow the Github extension to provide an OIDC
> discovery document.
>
> I definitely see identity providers as institutional actors here which
> can be given some power through the extension hooks to customize the
> behavior within the framework.

We'll probably have to make some compromises in this area, but I think
they should be carefully considered exceptions and not a core feature
of the mechanism. The gaps you point out are just fragmentation, and
adding custom extensions to deal with it leads to further
fragmentation instead of providing pressure on providers to just
implement the specs. Worst case, we open up new exciting security
flaws, and then no one can analyze them independently because no one
other than the provider knows how the two sides work together anymore.

Don't get me wrong; it would be naive to proceed as if the OAUTHBEARER
spec were perfect, because it's clearly not. But if we need to make
extensions to it, we can participate in IETF discussions and make our
case publicly for review, rather than enshrining MS/GitHub/Google/etc.
versions of the RFC and enabling that proliferation as a Postgres core
feature.

> Obtaining a token is an asynchronous process with a human in the loop.
> Not sure if expecting a hook function to return a token synchronously
> is the best option here.
> Can that be an optional return value of the hook in cases when a token
> can be obtained synchronously?

I don't think the hook is generally going to be able to return a token
synchronously, and I expect the final design to be async-first. As far
as I know, this will need to be solved for the builtin flows as well
(you don't want a synchronous HTTP call to block your PQconnectPoll
architecture), so the hook should be able to make use of whatever
solution we land on for that.

This is hand-wavy, and I don't expect it to be easy to solve. I just
don't think we have to solve it twice.

Have a good end to the year!
--Jacob




Re: New strategies for freezing, advancing relfrozenxid early

2022-12-16 Thread Peter Geoghegan
On Thu, Dec 15, 2022 at 11:48 PM John Naylor
 wrote:
> Thanks for this. This is the kind of concrete, data-based evidence that I 
> find much more convincing, or at least easy to reason about.

I'm glad to hear that it helped. It's always difficult to judge where
other people are coming from, especially when it's not clear how much
context is shared. Face time would have helped here, too.

> One motivating example mentioned is the append-only table. If we detected 
> that case, which I assume we can because autovacuum_vacuum_insert_* GUCs 
> exist, we could use that information as one way to drive eager freezing 
> independently of size. At least in theory -- it's very possible size will be 
> a necessary part of the decision, but it's less clear that it's as useful as 
> a user-tunable knob.

I am not strongly opposed to that idea, though I have my doubts about
it. I have thought about it already, and it wouldn't be hard to get
the information to vacuumlazy.c (I plan on doing it as part of related
work on antiwraparound autovacuum, in fact [1]). I'm skeptical of the
general idea that autovacuum.c has enough reliable information to give
detailed recommendations as to how vacuumlazy.c should process the
table.

I have pointed out several major flaws with the autovacuum.c dead
tuple accounting in the past [2][3], but I also think that there are
significant problems with the tuples inserted accounting. Basically, I
think that there are effects which are arguably an example of the
inspection paradox [4]. Insert-based autovacuums occur on a timeline
determined by the "inserted since last autovacuum" statistics. These
statistics are (in part) maintained by autovacuum/VACUUM itself. Which
has no specific understanding of how it might end up chasing its own
tail.

Let me be more concrete about what I mean about autovacuum chasing its
own tail. The autovacuum_vacuum_insert_threshold mechanism works by
triggering an autovacuum whenever the number of tuples inserted since
the last autovacuum/VACUUM reaches a certain threshold -- usually some
fixed proportion of pg_class.reltuples. But the
tuples-inserted-since-last-VACUUM counter gets reset at the end of
VACUUM, not at the start. Whereas VACUUM itself processes only the
subset of pages that needed to be vacuumed at the start of the VACUUM.
There is no attempt to compensate for that disparity. This *isn't*
really a measure of "unvacuumed tuples" (you'd need to compensate to
get that).

This "at the start vs at the end" difference won't matter at all with
smaller tables. And even in larger tables we might hope that the
effect would kind of average out. But what about cases where one
particular VACUUM operation takes an unusually long time, out of a
sequence of successive VACUUMs that run against the same table? For
example, the sequence that you see on the Wiki page, when Postgres
HEAD autovacuum does an aggressive VACUUM on one occasion, which takes
dramatically longer [5].

Notice that the sequence in [5] shows that the patch does one more
autovacuum operation in total, compared to HEAD/master. That's a lot
more -- we're talking about VACUUMs that each take 40+ minutes. That
can be explained by the fact that VACUUM (quite naturally) resets the
"tuples inserted since last VACUUM" at the end of that unusually long
running aggressive autovacuum -- just like any other VACUUM would.
That seems very weird to me. If (say) we happened to have a much
higher vacuum_freeze_table_age setting, then we wouldn't have had an
aggressive VACUUM until much later on (or never, because the benchmark
would just end). And the VACUUM that was aggressive would have been a
regular VACUUM instead, and would therefore have completed far sooner,
and would therefore have had a *totally* different cadence, compared
to what we actually saw -- it becomes distorted in a way that outlasts
the aggressive VACUUM.

With a far higher vacuum_freeze_table_age, we might have even managed
to do two regular autovacuums in the same period that it took a single
aggressive VACUUM to run in (that's not too far from what actually
happened with the patch). The *second* regular autovacuum would then
end up resetting the "inserted since last VACUUM" counter to 0 at the
same time as the long running aggressive VACUUM actually did so (same
wall clock time, same time since the start of the benchmark). Notice
that we'll have done much less useful work (on cleaning up bloat and
setting newer pages all-visible) with the "one long aggressive mode
VACUUM" setup/scenario -- we'll be way behind -- but the statistics
will nevertheless look about the same as they do in the "two fast
autovacuums instead of one slow autovacuum" counterfactual scenario.

In short, autovacuum.c fails to appreciate that a lot of stuff about
the table changes when VACUUM runs. Time hasn't stood still -- the
table was modified and extended throughout. So autovacuum.c hasn't
compensated for how VACUUM actually performed, and, in effect, forgets
how far 

Re: On login trigger: take three

2022-12-16 Thread Nikita Malakhov
Hi,

Mikhail, I've checked the patch and the previous discussion;
the issue mentioned earlier still applies:
>The example trigger from the documentation

+DECLARE
+  hour integer = EXTRACT('hour' FROM current_time);
+  rec boolean;
+BEGIN
+-- 1. Forbid logging in between 2AM and 4AM.
+IF hour BETWEEN 2 AND 4 THEN
+  RAISE EXCEPTION 'Login forbidden';
+END IF;

>can be bypassed with PGOPTIONS='-c timezone=...'. Probably this is
>nothing new and concerns any SECURITY DEFINER function, but still...

along with
+IF hour BETWEEN 8 AND 20 THEN
It seems to be a minor security issue; I'm mentioning it just in case
you haven't noticed it.
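
To illustrate why the check is bypassable: EXTRACT('hour' FROM
current_time) follows the session TimeZone, which the client can pick
before the trigger ever runs (e.g. via PGOPTIONS='-c timezone=...'):

    SET timezone = 'UTC';
    SELECT EXTRACT('hour' FROM current_time);  -- some hour h
    SET timezone = 'UTC+6';
    SELECT EXTRACT('hour' FROM current_time);  -- h shifts, possibly out of 2..4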

On Fri, Dec 16, 2022 at 9:14 PM Mikhail Gribkov  wrote:

> Hi hackers,
> Attached v33 is a new rebase of the flagless version of the patch.  As
> there were no objections at first glance, I’ll try to post it to the
> upcoming commitfest, so a brief recap of all the patch details is below.
>
> v33-On_client_login_event_trigger
> The patch introduces a trigger on login event, allowing to fire some
> actions right on the user connection. This can be useful for  logging or
> connection check purposes as well as for some personalization of the
> environment. Usage details are described in the documentation included, but
> shortly usage is the same as for other triggers: create function returning
> event_trigger and then create event trigger on login event.
>
> The patch is prepared to be applied to the master branch and was tested
> when applied after commit e52f8b301ed54aac5162b185b43f5f1e44b6b17e by
> Thomas Munro (Date: Fri Dec 16 17:36:22 2022 +1300).
> Regression tests and documentation are included.
>
> A couple of words about what I changed compared to the previous author's
> version, and why.
> First, the (en/dis)abling GUC was removed from the patch because it is
> ideologically a separate feature, and it is now discussed and supported
> in a separate thread by Daniel Gustafsson:
>
> https://www.postgresql.org/message-id/flat/9140106E-F9BF-4D85-8FC8-F2D3C094A6D9%40yesql.se
> Second, I have removed the dathasloginevt flag. The flag was initially
> added to the patch for performance reasons and it did the job, although
> it was quite a clumsy construct that caused tons of bugs and could
> potentially lead to tons more as PostgreSQL evolves. With the flag
> removed, I found that the performance drop is not that significant.
>
> And this is an aspect I should describe in more detail.
> The possible performance drop is expected as an increase in the time
> needed to log in a user NOT using a login trigger.
> First of all, the method of the performance check:
> echo 'select 1;' > ./tst.sql
> pgbench -n -C -T3 -f tst.sql -U postgres postgres
> The output value "average connection time" is the one I use to compare
> performance.
> Now, the results:
> * master branch: 0.641 ms
> * patched version: 0.644 ms
> No significant difference here, which is just what was expected. Based
> on the patch design, a performance drop can be expected when there are
> lots of event triggers created, just not the login one. Thus I created
> 1000 drop event triggers (the script for creating triggers is attached
> too) in the database and repeated the test:
> * master branch: 0.646 ms
> * patched version: 0.754 ms
> For 2000 triggers, the patched version's connection time further
> increases to 0.862 ms. Thus we have a login time rise of about 16.5%
> per 1000 event triggers in the database. It is a statistically
> noticeable value, but I don't think it's a critical one we should be
> afraid of.
> N.B. The exact values of the login times differ slightly from the ones
> I posted in the previous email. Well, that's the repeatability level we
> have. This convinces me even more that the observed level of
> performance drop is acceptable.
> --
>  best regards,
> Mikhail A. Gribkov
>
> e-mail: youzh...@gmail.com
> http://www.flickr.com/photos/youzhick/albums
> http://www.strava.com/athletes/5085772
> phone: +7(916)604-71-12
> Telegram: @youzhick
>


-- 
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/


Re: Error-safe user functions

2022-12-16 Thread Tom Lane
I wrote:
> I'm going to step back from this for now and get on with other work,
> but before that I thought there was one more input function I should
> look at: xml_in, because xml.c is such a hairy can of worms.

Pushed that.  For the record, my list of input functions still needing
attention stands at

Core:

jsonpath_in
regclassin
regcollationin
regconfigin
regdictionaryin
regnamespacein
regoperatorin
regoperin
regprocedurein
regprocin
regrolein
regtypein
tsqueryin
tsvectorin

Contrib:

hstore:
hstore_in
intarray:
bqarr_in
isn:
ean13_in
isbn_in
ismn_in
issn_in
upc_in
ltree:
ltree_in
lquery_in
ltxtq_in
seg:
seg_in

The reg* functions probably need a unified plan as to how far
down we want to push non-error behavior.  The rest of these
I think just require turning the crank along the same lines
as in functions already dealt with.
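
(For anyone following along, the already-converted functions can be
exercised from SQL with the pg_input_is_valid() helper added as part of
this soft-error work, e.g.

    SELECT pg_input_is_valid('{"a":1', 'json');   -- false rather than ERROR
    SELECT pg_input_is_valid('1234', 'integer');  -- true

while the not-yet-converted input functions listed above still raise a
hard error on bad input.)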

While it'd be good to get all of these done before v16 feature
freeze, I can't see that any of them represent blockers for
building features based on soft input error handling.

regards, tom lane




Re: On login trigger: take three

2022-12-16 Thread Mikhail Gribkov
Hi hackers,
Attached v33 is a new rebase of the flagless version of the patch.  As
there were no objections at first glance, I’ll try to post it to the
upcoming commitfest, so a brief recap of all the patch details is below.

v33-On_client_login_event_trigger
The patch introduces a trigger on the login event, allowing some actions
to be fired right on user connection. This can be useful for logging or
connection-check purposes, as well as for some personalization of the
environment. Usage details are described in the included documentation,
but in short, usage is the same as for other triggers: create a function
returning event_trigger, then create an event trigger on the login event.
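
For readers new to the feature, a rough usage sketch (the "login" event
name is taken from the patch; the function body is just an example):

    CREATE FUNCTION on_login_proc() RETURNS event_trigger AS $$
    BEGIN
        RAISE NOTICE 'welcome, %', session_user;
    END;
    $$ LANGUAGE plpgsql;

    CREATE EVENT TRIGGER on_login_trigger ON login
        EXECUTE FUNCTION on_login_proc();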

The patch is prepared to be applied to the master branch and was tested
when applied after commit e52f8b301ed54aac5162b185b43f5f1e44b6b17e by
Thomas Munro (Date: Fri Dec 16 17:36:22 2022 +1300).
Regression tests and documentation are included.

A couple of words about what I changed compared to the previous author's
version, and why.
First, the (en/dis)abling GUC was removed from the patch because it is
ideologically a separate feature, and it is now discussed and supported
in a separate thread by Daniel Gustafsson:
https://www.postgresql.org/message-id/flat/9140106E-F9BF-4D85-8FC8-F2D3C094A6D9%40yesql.se
Second, I have removed the dathasloginevt flag. The flag was initially
added to the patch for performance reasons and it did the job, although
it was quite a clumsy construct that caused tons of bugs and could
potentially lead to tons more as PostgreSQL evolves. With the flag
removed, I found that the performance drop is not that significant.

And this is an aspect I should describe in more detail.
The possible performance drop is expected as an increase in the time
needed to log in a user NOT using a login trigger.
First of all, the method of the performance check:
echo 'select 1;' > ./tst.sql
pgbench -n -C -T3 -f tst.sql -U postgres postgres
The output value "average connection time" is the one I use to compare
performance.
Now, the results:
* master branch: 0.641 ms
* patched version: 0.644 ms
No significant difference here, which is just what was expected. Based
on the patch design, a performance drop can be expected when there are
lots of event triggers created, just not the login one. Thus I created
1000 drop event triggers (the script for creating triggers is attached
too) in the database and repeated the test:
* master branch: 0.646 ms
* patched version: 0.754 ms
For 2000 triggers, the patched version's connection time further
increases to 0.862 ms. Thus we have a login time rise of about 16.5%
per 1000 event triggers in the database. It is a statistically
noticeable value, but I don't think it's a critical one we should be
afraid of.
N.B. The exact values of the login times differ slightly from the ones
I posted in the previous email. Well, that's the repeatability level we
have. This convinces me even more that the observed level of
performance drop is acceptable.
--
 best regards,
Mikhail A. Gribkov

e-mail: youzh...@gmail.com
http://www.flickr.com/photos/youzhick/albums
http://www.strava.com/athletes/5085772
phone: +7(916)604-71-12
Telegram: @youzhick


v33-On_client_login_event_trigger.patch
Description: Binary data


create_1000_triggers.sql
Description: Binary data


Re: pg_upgrade: Make testing different transfer modes easier

2022-12-16 Thread Peter Eisentraut

On 14.12.22 10:40, Daniel Gustafsson wrote:

On 14 Dec 2022, at 08:04, Peter Eisentraut  
wrote:

On 07.12.22 17:33, Peter Eisentraut wrote:

I think if we want to make this configurable on the fly, an environment
variable would be much easier, like
 my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';


Here is an updated patch set that incorporates this idea.


I would prefer a small note about it in src/bin/pg_upgrade/TESTING to document
it outside of the code, but otherwise LGTM.

+   $mode,
'--check'
],

...

-   '-p', $oldnode->port, '-P', $newnode->port
+   '-p', $oldnode->port, '-P', $newnode->port,
+   $mode,
],

Minor nitpick, but while in there should we take the opportunity to add a
trailing comma on the other two array declarations which now end with --check?
It's good Perl practice and will make the code consistent.


committed with these changes





Re: Minimal logical decoding on standbys

2022-12-16 Thread Drouvot, Bertrand

Hi,

On 12/16/22 5:38 PM, Robert Haas wrote:

On Fri, Dec 16, 2022 at 10:08 AM Drouvot, Bertrand
 wrote:

After 1489b1ce728 the name mayConflictInLogicalDecoding seems odd. Seems
it should be a riff on snapshotConflictHorizon?


Gotcha, what about logicalSnapshotConflictThreat?


logicalConflictPossible? checkDecodingConflict?

I think we should try to keep this to three words if we can. There's
not likely to be enough value in a fourth word to make up for the
downside of being more verbose.




Yeah, agreed. I'd vote for logicalConflictPossible then.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2022-12-16 Thread Robert Haas
On Fri, Dec 16, 2022 at 10:08 AM Drouvot, Bertrand
 wrote:
> > After 1489b1ce728 the name mayConflictInLogicalDecoding seems odd. Seems
> > it should be a riff on snapshotConflictHorizon?
>
> Gotcha, what about logicalSnapshotConflictThreat?

logicalConflictPossible? checkDecodingConflict?

I think we should try to keep this to three words if we can. There's
not likely to be enough value in a fourth word to make up for the
downside of being more verbose.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Privileges on PUBLICATION

2022-12-16 Thread Antonin Houska
Antonin Houska  wrote:

> Antonin Houska  wrote:
> 
> > Peter Eisentraut  wrote:
> > 
> > > On 04.11.22 08:28, Antonin Houska wrote:
> > > > I thought about the whole concept a bit more and I doubt if the 
> > > > PUBLICATION
> > > > privilege is the best approach. In particular, the user specified in 
> > > > CREATE
> > > > SUBSCRIPTION ... CONNECTION ... (say "subscription user") needs to have 
> > > > SELECT
> > > > privilege on the tables replicated. So if the DBA excludes some columns 
> > > > from
> > > > the publication's column list and sets the (publication) privileges in 
> > > > such a
> > > > way that the user cannot get the column values via other publications, 
> > > > the
> > > > user still can connect to the database directly and get values of the 
> > > > excluded
> > > > columns.
> > > 
> > > Why are the SELECT privileges needed?  Maybe that's something to think 
> > > about
> > > and maybe change.
> > 
> > I haven't noticed an explanation in comments nor did I search in the mailing
> > list archives, but the question makes sense: the REPLICATION attribute of a
> > role is sufficient for streaming replication, so why should the logical
> > replication require additional privileges?
> > 
> > Technically the SELECT privilege is needed because the sync worker does
> > actually execute SELECT query on the published tables. However, I realize 
> > now
> > that it's not checked by the output plugin. Thus if SELECT is revoked from 
> > the
> > "subscription user" after the table has been synchronized, the replication
> > continues to work. So the necessity for the SELECT privilege might be an
> > omission rather than a design choice. (Even the documentation says that the
> > SELECT privilege is needed only for the initial synchronization [1], however
> > it does not tell why.)
> > 
> > > > As an alternative to the publication privileges, I think that the CREATE
> > > > SUBSCRIPTION command could grant ACL_SELECT automatically to the 
> > > > subscription
> > > > user on the individual columns contained in the publication column 
> > > > list, and
> > > > DROP SUBSCRIPTION would revoke that privilege.
> > > 
> > > I think that approach is weird and unusual.  Privileges and object 
> > > creation
> > > should be separate operations.
> > 
> > ok. Another approach would be to skip the check for the SELECT privilege (as
> > well as the check for the USAGE privilege on the corresponding schema) if
> > given column is being accessed via a publication which has it on its column
> > list and if the subscription user has the USAGE privilege on that 
> > publication.
> > 
> > So far I wasn't sure if we can do that because, if pg_upgrade grants the 
> > USAGE
> > privilege on all publications to the "public" role, the DBAs who relied on 
> > the
> > SELECT privileges might not notice that any role having the REPLICATION
> > attribute can access all the published tables after the upgrade. (pg_upgrade
> > can hardly do anything else because it has no information on the 
> > "subscription
> > users", so it cannot convert the SELECT privilege on tables to the USAGE
> > privileges on publications.)
> > 
> > But now that I see that the logical replication doesn't check the SELECT
> > privilege properly anyway, I think we can get rid of it.
> 
> The attached version tries to do that - as you can see in 0001, the SELECT
> privilege is not required for the walsender process.
> 
> I also added PUBLICATION_NAMES option to the COPY TO command so that the
> publisher knows which publications are subject to the ACL check. Only data of
> those publications are returned to the subscriber. (In the previous patch
> version the ACL checks were performed on the subscriber side, but I that's not
> ideal in terms of security.)
> 
> I also added the regression tests for publications, enhanced psql (the \dRp+
> command) so that it displays the publication ACL and added a few missing
> pieces of documentation.
> 

This is v4. The patch had to be rebased due to commit 369f09e420.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com



publication_privileges_v04.tgz
Description: application/gzip


Re: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32

2022-12-16 Thread Daniel Gustafsson
> On 16 Dec 2022, at 16:09, Daniel Watzinger  wrote:

> first-time contributor here. I certainly hope I got the patch
> creation and email workflow right. Let me know if anything can be
> improved, as I'm eager to learn.

Welcome!  The patch seems to be in binary format or using some form of
non-standard encoding? Can you re-send in plain text format?

--
Daniel Gustafsson   https://vmware.com/





pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32

2022-12-16 Thread Daniel Watzinger
Hi there,

first-time contributor here. I certainly hope I got the patch
creation and email workflow right. Let me know if anything can be
improved, as I'm eager to learn. Regression tests (check) were 
successful on native Win32 MSVC as well as Debian. Here comes the 
patch and corresponding commit text.

During archive initialization pg_backup_custom.c determines if the file
pointer it should read from or write to is seekable. pg_dump
uses this information to rewrite the custom output format's TOC
enriched with known offsets into the archive on close. pg_restore uses
seeking to speed up file operations when searching for specific
blocks within the archive.

The seekable property of a file pointer is currently checked by
invoking ftello and subsequently fseeko. Both calls succeed
on Windows platforms if the underlying file descriptor represents a
terminal handle or an anonymous or named pipe. Obviously, these type
of devices do not support seeking. In the case of pg_dump, this
leads to the TOC being appended to the end of the output when attempting
to rewrite known offsets. Furthermore, pg_restore may try to seek to
known file offsets if the custom format archive's TOC supports it
and subsequently fails to locate blocks.

This commit improves the detection of the seekable property by checking
a descriptor's file type (st_mode) and filtering character special
devices and pipes. The current customized implementation of fstat on
Windows platforms (_pgfstat64) erroneously marks terminal and pipe
handles as regular files (_S_IFREG). This was improved on by
utilizing WinAPI functionality (GetFileType) to correctly distinguish
and flag descriptors based on their native OS handle's file type.

Daniel

---
 src/bin/pg_dump/pg_backup_archiver.c | 12 +
 src/include/port/win32_port.h|  6 +++
 src/port/win32stat.c | 68 
 3 files changed, 67 insertions(+), 19 deletions(-)

Re: Minimal logical decoding on standbys

2022-12-16 Thread Drouvot, Bertrand

Hi,

On 12/16/22 2:51 PM, Andres Freund wrote:

Hi,

On 2022-12-16 11:33:50 +0100, Drouvot, Bertrand wrote:

@@ -235,13 +236,14 @@ typedef struct xl_btree_delete
TransactionId snapshotConflictHorizon;
uint16  ndeleted;
uint16  nupdated;
+   bool    mayConflictInLogicalDecoding;


After 1489b1ce728 the name mayConflictInLogicalDecoding seems odd. Seems
it should be a riff on snapshotConflictHorizon?



Gotcha, what about logicalSnapshotConflictThreat?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2022-12-16 Thread Andres Freund
Hi,

On 2022-12-16 11:33:50 +0100, Drouvot, Bertrand wrote:
> @@ -235,13 +236,14 @@ typedef struct xl_btree_delete
>   TransactionId snapshotConflictHorizon;
>   uint16  ndeleted;
>   uint16  nupdated;
> + bool    mayConflictInLogicalDecoding;

After 1489b1ce728 the name mayConflictInLogicalDecoding seems odd. Seems
it should be a riff on snapshotConflictHorizon?

Greetings,

Andres Freund




Re: pg_upgrade allows itself to be run twice

2022-12-16 Thread Justin Pryzby
On Thu, Dec 01, 2022 at 10:30:16AM +0100, Peter Eisentraut wrote:
> On 01.11.22 14:07, Justin Pryzby wrote:
> > On Tue, Nov 01, 2022 at 01:54:35PM +0100, Peter Eisentraut wrote:
> > > On 07.07.22 08:22, Justin Pryzby wrote:
> > > > > This one comes from NextOID in the control data file after a fresh
> > > > > initdb, and GetNewObjectId() would enforce that in a postmaster
> > > > > environment to be FirstNormalObjectId when assigning the first user
> > > > > OID.  Would you imply an extra step at the end of initdb to update the
> > > > > control data file of the new cluster to reflect FirstNormalObjectId?
> > > > I added a call to reset xlog, similar to what's in pg_upgrade.
> > > > Unfortunately, I don't see an easy way to silence it.
> > > 
> > > I think it would be better to update the control file directly instead of
> > > going through pg_resetwal.  (See src/include/common/controldata_utils.h 
> > > for
> > > the required functions.)
> > > 
> > > However, I don't know whether we need to add special provisions that guard
> > > against people using postgres --single in complicated ways.  Many consider
> > > the single-user mode deprecated outside of initdb use.
> > 
> > Thanks for looking.

To be clear, I don't think it's worth doing anything special just to
avoid creating special OIDs if someone runs postgres --single
immediately after initdb.

However, setting FirstNormalObjectId in initdb itself (rather than in
the next invocation of postgres, if that isn't in single-user mode) was
the mechanism for avoiding the original problem, namely that pg_upgrade
can be run twice if there are no non-system relations.  That still
seems desirable to fix somehow.

> I think the above is a "returned with feedback" at this point.
> 
> > One other thing I noticed (by accident!) is that pg_upgrade doesn't
> > prevent itself from trying to upgrade a cluster on top of itself:

> > | command: cp -Rf "pg15.dat/pg_xact" "pg15.dat/pg_xact" >> 
> > "pg15.dat/pg_upgrade_output.d/20221101T055916.486/log/pg_upgrade_utility.log"
> >  2>&1
> > | cp: cannot stat 'pg15.dat/pg_xact': No such file or directory
> > 
> > This may be of little concern since it's upgrading a version to itself, 
> > which
> > only applies to developers.
> 
> I think this would be worth addressing nonetheless, for robustness.  For
> comparison, "cp" and "mv" will error if you give source and destination that
> are the same file.

I addressed this in 002.

-- 
Justin
>From ecc7a9a4698eb138e63adcf23302c8e7d43ab96e Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 6 Jul 2022 08:55:11 -0500
Subject: [PATCH 1/2] wip: initdb should advance the OID counter to FirstNormal

Otherwise, a cluster started in single-user mode immediately after
initdb can create objects in the range of system OIDs, and pg_upgrade
might be able to be run twice (if the cluster has no relations/objects).
---
 src/backend/access/transam/varsup.c |  5 ++---
 src/bin/initdb/initdb.c | 10 ++
 src/bin/pg_upgrade/check.c  |  5 +
 src/bin/pg_upgrade/pg_upgrade.c | 13 -
 4 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index 849a7ce9d6d..d93974fcaa5 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -543,8 +543,7 @@ GetNewObjectId(void)
 	 *
 	 * During initdb, we start the OID generator at FirstGenbkiObjectId, so we
 	 * only wrap if before that point when in bootstrap or standalone mode.
-	 * The first time through this routine after normal postmaster start, the
-	 * counter will be forced up to FirstNormalObjectId.  This mechanism
+	 * This mechanism
 	 * leaves the OIDs between FirstGenbkiObjectId and FirstNormalObjectId
 	 * available for automatic assignment during initdb, while ensuring they
 	 * will never conflict with user-assigned OIDs.
@@ -553,7 +552,7 @@ GetNewObjectId(void)
 	{
 		if (IsPostmasterEnvironment)
 		{
-			/* wraparound, or first post-initdb assignment, in normal mode */
+			/* wraparound */
 			ShmemVariableCache->nextOid = FirstNormalObjectId;
 			ShmemVariableCache->oidCount = 0;
 		}
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 7c391aaf0b1..1ac602da7f3 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -61,11 +61,13 @@
 #include "sys/mman.h"
 #endif
 
+#include "access/transam.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_authid_d.h"
 #include "catalog/pg_class_d.h" /* pgrminclude ignore */
 #include "catalog/pg_collation_d.h"
 #include "catalog/pg_database_d.h"	/* pgrminclude ignore */
+#include "common/controldata_utils.h"
 #include "common/file_perm.h"
 #include "common/file_utils.h"
 #include "common/logging.h"
@@ -2725,6 +2727,14 @@ initialize_data_directory(void)
 
 	PG_CMD_CLOSE;
 
+	/* Set FirstNormal OID */
+	{
+		bool	crc_ok;
+		ControlFileData *cfd = get_controlfile(pg_data, &crc_ok);
+		cfd->checkPointCopy.nextOid = 

(non) translatable string splicing

2022-12-16 Thread Justin Pryzby
I was surprised to see that this has been here for a few years (since
77517ba59) without complaints or inquiries from translators.

src/bin/pg_upgrade/option.c:	check_required_directory(&old_cluster.bindir, "PGBINOLD", false,
src/bin/pg_upgrade/option.c-				 "-b", _("old cluster binaries reside"), false);
src/bin/pg_upgrade/option.c:	check_required_directory(&new_cluster.bindir, "PGBINNEW", false,
src/bin/pg_upgrade/option.c-				 "-B", _("new cluster binaries reside"), true);
src/bin/pg_upgrade/option.c:	check_required_directory(&old_cluster.pgdata, "PGDATAOLD", false,
src/bin/pg_upgrade/option.c-				 "-d", _("old cluster data resides"), false);
src/bin/pg_upgrade/option.c:	check_required_directory(&new_cluster.pgdata, "PGDATANEW", false,
src/bin/pg_upgrade/option.c-				 "-D", _("new cluster data resides"), false);
src/bin/pg_upgrade/option.c:	check_required_directory(&user_opts.socketdir, "PGSOCKETDIR", true,
src/bin/pg_upgrade/option.c-				 "-s", _("sockets will be created"), false);

src/bin/pg_upgrade/option.c:		pg_fatal("You must identify the directory where the %s.\n"
src/bin/pg_upgrade/option.c-				 "Please use the %s command-line option or the %s environment variable.",
src/bin/pg_upgrade/option.c-				 description, cmdLineOption, envVarName);

Due to incomplete translation, that allows some pretty fancy output,
like:
| You must identify the directory where the residen los binarios del clúster 
antiguo.

That commit also does this a couple times:

+_(" which is an index on \"%s.%s\""),

-- 
Justin




Feature request to add rows extracted using pg_dump utility

2022-12-16 Thread vinay kumar
Hi Team,

Good Day!

My name is Vinay Kumar. I am checking with you to see if you can add the
number of rows extracted by the pg_dump utility for tables when using
verbose mode.

This will help end users to validate the amount of data extracted.

I understand that pg_class.reltuples can help identify the number of rows
but it's only an estimated value.
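
For example (with mytable standing in for any user table):

    SELECT reltuples FROM pg_class WHERE relname = 'mytable';  -- estimate only
    SELECT count(*) FROM mytable;  -- exact, but requires a full scan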

Thanks & Regards,
Vinay Kumar Dumpa


Re: Perform streaming logical transactions by background workers and parallel apply

2022-12-16 Thread Amit Kapila
On Fri, Dec 16, 2022 at 2:47 PM houzj.f...@fujitsu.com
 wrote:
>
> > ---
> > +   active_workers = list_copy(ParallelApplyWorkerPool);
> > +
> > +   foreach(lc, active_workers)
> > +   {
> > +   int slot_no;
> > +   uint16  generation;
> > +   ParallelApplyWorkerInfo *winfo =
> > (ParallelApplyWorkerInfo *) lfirst(lc);
> > +
> > +   LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
> > +   napplyworkers =
> > logicalrep_pa_worker_count(MyLogicalRepWorker->subid);
> > +   LWLockRelease(LogicalRepWorkerLock);
> > +
> > +   if (napplyworkers <=
> > max_parallel_apply_workers_per_subscription / 2)
> > +   return;
> > +
> >
> > Calling logicalrep_pa_worker_count() with lwlock for each worker seems
> > not efficient to me. I think we can get the number of workers once at
> > the top of this function and return if it's already lower than the
> > maximum pool size. Otherwise, we attempt to stop extra workers.
>
> How about we directly check the length of the worker pool list here?
> That seems simpler and doesn't need a lock.
>

I don't see any problem with that. Also, if such a check is safe then
can't we use the same in pa_free_worker() as well? BTW, shouldn't
pa_stop_idle_workers() try to free/stop workers unless the active
number reaches below max_parallel_apply_workers_per_subscription?

-- 
With Regards,
Amit Kapila.




Re: [PATCH] Add native windows on arm64 support

2022-12-16 Thread Niyas Sait

On 12/12/2022 23:56, Michael Paquier wrote:

On Mon, Dec 12, 2022 at 01:38:37PM +, Niyas Sait wrote:

On 05/12/2022 18:14, Andres Freund wrote:
I think the old build system specific part is really minimal in the patch. I
can strip out those if that's preferred.

Removing all the changes from src/tools/msvc/ is an approach that
works for me.



I've attached a new version (v7) that removes the changes supporting
the old MSVC build system.



--
NiyasFrom 50e8affb5bcb3f6a50a223053fc92145a5709e82 Mon Sep 17 00:00:00 2001
From: Niyas Sait 
Date: Fri, 16 Dec 2022 10:45:56 +
Subject: [PATCH v7] Enable postgres native build for windows-arm64 platform

- Add support for meson build
- Add arm64 definition of spin_delay function
- Exclude arm_acle.h import for MSVC
---
 doc/src/sgml/install-windows.sgml |  3 ++-
 meson.build   | 33 +++
 src/include/storage/s_lock.h  | 20 +--
 src/port/pg_crc32c_armv8.c|  2 ++
 src/tools/msvc/gendef.pl  |  8 
 5 files changed, 46 insertions(+), 20 deletions(-)

diff --git a/doc/src/sgml/install-windows.sgml b/doc/src/sgml/install-windows.sgml
index bbd4960e7b..3f865d7d3b 100644
--- a/doc/src/sgml/install-windows.sgml
+++ b/doc/src/sgml/install-windows.sgml
@@ -352,7 +352,8 @@ $ENV{MSBFLAGS}="/m";
   Special Considerations for 64-Bit Windows
 
   
-   PostgreSQL will only build for the x64 architecture on 64-bit Windows.
+   PostgreSQL will only build for the x64 and ARM64 architectures on 64-bit
+   Windows.
   
 
   
diff --git a/meson.build b/meson.build
index 725e10d815..e354ad7650 100644
--- a/meson.build
+++ b/meson.build
@@ -1944,7 +1944,13 @@ int main(void)
 
 elif host_cpu == 'arm' or host_cpu == 'aarch64'
 
-  prog = '''
+  if cc.get_id() == 'msvc'
+cdata.set('USE_ARMV8_CRC32C', false)
+cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1)
+have_optimized_crc = true
+  else
+
+prog = '''
 #include <arm_acle.h>
 
 int main(void)
@@ -1960,18 +1966,19 @@ int main(void)
 }
 '''
 
-  if cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd without -march=armv8-a+crc',
-  args: test_c_args)
-# Use ARM CRC Extension unconditionally
-cdata.set('USE_ARMV8_CRC32C', 1)
-have_optimized_crc = true
-  elif cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd with -march=armv8-a+crc',
-  args: test_c_args + ['-march=armv8-a+crc'])
-# Use ARM CRC Extension, with runtime check
-cflags_crc += '-march=armv8-a+crc'
-cdata.set('USE_ARMV8_CRC32C', false)
-cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1)
-have_optimized_crc = true
+if cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd without -march=armv8-a+crc',
+args: test_c_args)
+  # Use ARM CRC Extension unconditionally
+  cdata.set('USE_ARMV8_CRC32C', 1)
+  have_optimized_crc = true
+elif cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd with -march=armv8-a+crc',
+args: test_c_args + ['-march=armv8-a+crc'])
+  # Use ARM CRC Extension, with runtime check
+  cflags_crc += '-march=armv8-a+crc'
+  cdata.set('USE_ARMV8_CRC32C', false)
+  cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1)
+  have_optimized_crc = true
+endif
   endif
 endif
 
diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h
index 8b19ab160f..94d2380393 100644
--- a/src/include/storage/s_lock.h
+++ b/src/include/storage/s_lock.h
@@ -707,15 +707,31 @@ typedef LONG slock_t;
 
 #define SPIN_DELAY() spin_delay()
 
-/* If using Visual C++ on Win64, inline assembly is unavailable.
- * Use a _mm_pause intrinsic instead of rep nop.
+/*
+ * If using Visual C++ on Win64, inline assembly is unavailable.
+ * Use architecture specific intrinsics.
  */
 #if defined(_WIN64)
+/*
+ * For Arm64, use __isb intrinsic. See aarch64 inline assembly definition for details.
+ */
+#ifdef _M_ARM64
+static __forceinline void
+spin_delay(void)
+{
+/* Reference: https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics#BarrierRestrictions */
+   __isb(_ARM64_BARRIER_SY);
+}
+#else
+/*
+ * For x64, use _mm_pause intrinsic instead of rep nop.
+ */
 static __forceinline void
 spin_delay(void)
 {
_mm_pause();
 }
+#endif
 #else
 static __forceinline void
 spin_delay(void)
diff --git a/src/port/pg_crc32c_armv8.c b/src/port/pg_crc32c_armv8.c
index 9e301f96f6..981718752f 100644
--- a/src/port/pg_crc32c_armv8.c
+++ b/src/port/pg_crc32c_armv8.c
@@ -14,7 +14,9 @@
  */
 #include "c.h"
 
+#ifndef _MSC_VER
 #include <arm_acle.h>
+#endif
 
 #include "port/pg_crc32c.h"
 
diff --git a/src/tools/msvc/gendef.pl b/src/tools/msvc/gendef.pl
index d6bed1ce15..4882d37faf 100644
--- a/src/tools/msvc/gendef.pl
+++ b/src/tools/msvc/gendef.pl
@@ -120,9 +120,9 @@ sub writedef
{
my $isdata = $def->{$f} eq 'data';
 
-   # Strip the leading underscore for win32, but not x64
+   # Strip 

Re: Perform streaming logical transactions by background workers and parallel apply

2022-12-16 Thread Amit Kapila
On Thu, Dec 15, 2022 at 6:29 PM Amit Kapila  wrote:
>

I have noticed that the origin information of a rollback is not
restored after a restart of the server, so the apply worker will send
the old origin information in that case. It seems we need the below
change in XactLogAbortRecord(). What do you think?

diff --git a/src/backend/access/transam/xact.c
b/src/backend/access/transam/xact.c
index 419fac5d6f..1b047133db 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -5880,11 +5880,10 @@ XactLogAbortRecord(TimestampTz abort_time,
}

/*
-* Dump transaction origin information only for abort prepared. We need
-* this during recovery to update the replication origin progress.
+* Dump transaction origin information. We need this during recovery to
+* update the replication origin progress.
 */
-   if ((replorigin_session_origin != InvalidRepOriginId) &&
-   TransactionIdIsValid(twophase_xid))
+   if (replorigin_session_origin != InvalidRepOriginId)
{
xl_xinfo.xinfo |= XACT_XINFO_HAS_ORIGIN;

@@ -5941,8 +5940,8 @@ XactLogAbortRecord(TimestampTz abort_time,
if (xl_xinfo.xinfo & XACT_XINFO_HAS_ORIGIN)
XLogRegisterData((char *) (&xl_origin), sizeof(xl_xact_origin));

-   if (TransactionIdIsValid(twophase_xid))
-   XLogSetRecordFlags(XLOG_INCLUDE_ORIGIN);
+   /* include the replication origin */
+   XLogSetRecordFlags(XLOG_INCLUDE_ORIGIN);


-- 
With Regards,
Amit Kapila.




Re: Minimal logical decoding on standbys

2022-12-16 Thread Drouvot, Bertrand

Hi,

On 12/15/22 11:31 AM, Drouvot, Bertrand wrote:
> Hi,
>
> On 12/14/22 4:55 PM, Robert Haas wrote:
>> On Wed, Dec 14, 2022 at 8:06 AM Drouvot, Bertrand wrote:
>>
>> Other comments:
>>
>> +    if RelationIsAccessibleInLogicalDecoding(rel)
>> +    xlrec.flags |= VISIBILITYMAP_ON_CATALOG_ACCESSIBLE_IN_LOGICAL_DECODING;
>>
>> This is a few parentheses short of where it should be. Hilariously it
>> still compiles because there are parentheses in the macro definition.
>
> Oops, thanks, will fix.

Fixed in v32 attached.
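
For clarity, the corrected form presumably just adds the call parentheses:

	if (RelationIsAccessibleInLogicalDecoding(rel))
		xlrec.flags |= VISIBILITYMAP_ON_CATALOG_ACCESSIBLE_IN_LOGICAL_DECODING;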





>> +    xlrec.onCatalogAccessibleInLogicalDecoding = RelationIsAccessibleInLogicalDecoding(relation);
>>
>> These lines are quite long. I think we should consider (1) picking a
>> shorter name for the xlrec field and, if such lines are still going to
>> routinely exceed 80 characters, (2) splitting them into two lines, with
>> the second one indented to match pgindent's preferences in such cases,
>> which I think is something like this:
>>
>> xlrec.onCatalogAccessibleInLogicalDecoding =
>>     RelationIsAccessibleInLogicalDecoding(relation);
>>
>> As far as renaming, I think we could at least remove the onCatalog part
>> from the identifier, as that doesn't seem to be adding much. And maybe
>> we could even think of changing it to something like
>> logicalDecodingConflict or even decodingConflict, which would shave
>> off a bunch more characters.
>
> I'm not sure I like the decodingConflict proposal. Indeed, there might be
> no conflict (depending on the xids comparison).
>
> What about "checkForConflict"?

In v32 attached, it is renamed to mayConflictInLogicalDecoding (I think it's
important that the name reflects that it is linked to logical decoding and to
the "uncertainty" of the conflict). What do you think?





>> +    if (heapRelation->rd_options)
>> +    isusercatalog = ((StdRdOptions *) (heapRelation)->rd_options)->user_catalog_table;
>>
>> Couldn't you get rid of the if statement here and also the
>> initialization at the top of the function and just write isusercatalog
>> = RelationIsUsedAsCatalogTable(heapRelation)? Or even just get rid of
>> the variable entirely and pass
>> RelationIsUsedAsCatalogTable(heapRelation) as the argument to
>> UpdateIndexRelation directly?
>
> Yeah, that's better, will do, thanks!

Fixed in v32 attached.

> While at it, I'm not sure that isusercatalog should be visible in pg_index.
> I mean, this information could be retrieved with a join on pg_class (on the
> table the index is linked to), hence the weirdness of having it visible.
> I did not check how difficult it would be to make it "invisible", though.
> What do you think?

It's still visible in v32 attached.
On second thought, it does not seem like a "real" concern to me.



>> I think this could use some test cases demonstrating that
>> indisusercatalog gets set correctly in all the relevant cases: table
>> is created with user_catalog_table = true/false, reloption is changed,
>> reloptions are reset, new index is added later, etc.
>
> v31 already provides a few checks:
>
> - After index creation on a relation with user_catalog_table = true
> - Propagation is done correctly after a user_catalog_table RESET
> - Propagation is done correctly after an ALTER SET user_catalog_table = true
> - Propagation is done correctly after an ALTER SET user_catalog_table = false
>
> In v32, I can add a check for index creation after each of the last 3 cases
> mentioned above, and one for when a table is created with
> user_catalog_table = false.

v32 attached adds the checks mentioned above.

v32 does not change anything linked to the alignment discussion, as I think
this will depend on its outcome.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

From 40764ded906d166757267b7e85fdc5568fa7a018 Mon Sep 17 00:00:00 2001
From: bdrouvotAWS 
Date: Fri, 16 Dec 2022 09:38:44 +
Subject: [PATCH v32 6/6] Fixing Walsender corner case with logical decoding on
 standby.

The problem is that WalSndWaitForWal() waits for the *replay* LSN to
increase, but gets woken up by walreceiver when new WAL has been
flushed. This means that walsenders will typically get woken up at the
same time as the startup process, so by the time the logical walsender
checks GetXLogReplayRecPtr() it's unlikely that the startup process has
already replayed the record and updated XLogCtl->lastReplayedEndRecPtr.

Introduce a new condition variable to fix this corner case.
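
For readers unfamiliar with the mechanism, a minimal sketch of the usual
condition-variable pattern (names, placement, and the wait-event constant
are hypothetical; the actual wiring is in the diff below):

	/* shared state, e.g. alongside lastReplayedEndRecPtr (hypothetical) */
	ConditionVariable replayedCV;

	/* startup process: after advancing lastReplayedEndRecPtr */
	ConditionVariableBroadcast(&replayedCV);

	/* logical walsender: wait until replay passes loc */
	ConditionVariablePrepareToSleep(&replayedCV);
	while (GetXLogReplayRecPtr(NULL) < loc)
		ConditionVariableSleep(&replayedCV, WAIT_EVENT_WAL_SENDER_WAIT_REPLAY);
	ConditionVariableCancelSleep();
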
---
 src/backend/access/transam/xlogrecovery.c | 28 
 src/backend/replication/walsender.c   | 31 +--
 src/backend/utils/activity/wait_event.c   |  3 +++
 src/include/access/xlogrecovery.h |  3 +++
 src/include/replication/walsender.h   |  1 +
 src/include/utils/wait_event.h|  1 +
 6 files changed, 59 insertions(+), 8 deletions(-)
  41.2% src/backend/access/transam/
  48.5% src/backend/replication/
   3.6% src/backend/utils/activity/
   3.4% src/include/access/

diff --git a/src/backend/access/transam/xlogrecovery.c 

RE: Perform streaming logical transactions by background workers and parallel apply

2022-12-16 Thread houzj.f...@fujitsu.com
On Friday, December 16, 2022 3:08 PM Masahiko Sawada wrote:

> Here are some minor comments:

Thanks for the comments!

> ---
> +pa_has_spooled_message_pending()
> +{
> +   PartialFileSetState fileset_state;
> +
> +   fileset_state = pa_get_fileset_state();
> +
> +   if (fileset_state != FS_UNKNOWN)
> +   return true;
> +   else
> +   return false;
> +}
> 
> I think we can simply do:
> 
> return (fileset_state != FS_UNKNOWN);

Will change.

> 
> Or do we need this function in the first place? I think we can do in
> LogicalParallelApplyLoop() like:

My intention was to not expose the file state in the main loop, so it may be
better to keep this function.

> ---
> +   active_workers = list_copy(ParallelApplyWorkerPool);
> +
> +   foreach(lc, active_workers)
> +   {
> +   int slot_no;
> +   uint16  generation;
> +   ParallelApplyWorkerInfo *winfo =
> (ParallelApplyWorkerInfo *) lfirst(lc);
> +
> +   LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
> +   napplyworkers =
> logicalrep_pa_worker_count(MyLogicalRepWorker->subid);
> +   LWLockRelease(LogicalRepWorkerLock);
> +
> +   if (napplyworkers <=
> max_parallel_apply_workers_per_subscription / 2)
> +   return;
> +
> 
> Calling logicalrep_pa_worker_count() with lwlock for each worker seems
> not efficient to me. I think we can get the number of workers once at
> the top of this function and return if it's already lower than the
> maximum pool size. Otherwise, we attempt to stop extra workers.

How about directly checking the length of the worker pool list here? That
seems simpler and wouldn't need the lock.
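
For illustration, a minimal sketch of that simplification (assuming the
pool list is only modified by the leader apply worker, so reading its
length locally would not need LogicalRepWorkerLock):

	/* hypothetical replacement for the per-iteration count-and-lock */
	if (list_length(ParallelApplyWorkerPool) <=
		max_parallel_apply_workers_per_subscription / 2)
		return;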

> ---
> +bool
> +pa_free_worker(ParallelApplyWorkerInfo *winfo, TransactionId xid)
> +{
> 
> 
> Is there any reason why this function has the XID as a separate
> argument? It seems to me that since we always call this function with
> 'winfo' and 'winfo->shared->xid', we can remove xid from the function
> argument.
> 
> ---
> +   /* Initialize shared memory area. */
> +   SpinLockAcquire(&winfo->shared->mutex);
> +   winfo->shared->xact_state = PARALLEL_TRANS_UNKNOWN;
> +   winfo->shared->xid = xid;
> +   SpinLockRelease(&winfo->shared->mutex);
> 
> It's practically no problem but is there any reason why some fields of
> ParallelApplyWorkerInfo are initialized in pa_setup_dsm() whereas some
> fields are done here?

We could be reusing an old worker from the pool here, in which case we need to
update these fields with the new streaming transaction's information.

I will address the other comments, except the ones above that are still under
discussion.

Best regards,
Hou zj



Re: Inconsistency in reporting checkpointer stats

2022-12-16 Thread Kyotaro Horiguchi
At Wed, 14 Dec 2022 16:54:53 +0530, Bharath Rupireddy wrote:
> Indeed PendingCheckpointerStats.buf_written_checkpoints needs to count
> buffer writes in SlruInternalWritePage(). However, does it need to be
> done immediately there? The stats will not be visible to the users
> until the next pgstat_report_checkpointer(). Incrementing
> buf_written_checkpoints in BufferSync() makes sense as the
> pgstat_report_checkpointer() gets called in there via
> CheckpointWriteDelay() and it becomes visible to the user immediately.
> Have you checked if pgstat_report_checkpointer() gets called while the
> checkpoint calls SlruInternalWritePage()? If not, then you can just
> assign ckpt_bufs_written to buf_written_checkpoints in
> LogCheckpointEnd() like its other friends
> checkpoint_write_time and checkpoint_sync_time.

If I'm understanding Bharath correctly, that results in double counting
by BufferSync. If we want to keep the realtime-reporting nature of
BufferSync, it would have to give up incrementing CheckPointerStats'
counter. Such a separation seems rather stupid and quite bug-prone.

In the first place, I don't like that we count the same thing twice.
Couldn't we count the number in just one of them?

If we remove CheckPointerStats.ckpt_bufs_written, CreateCheckPoint can
get the final number as the difference between the start and end values
of *the shared stats*. As long as a checkpoint runs in a single process,
the trace info in BufferSync will work fine; assuming single-process
checkpointing, there should be no problem doing that. (The current
shared-stats update for the checkpointer assumes a single process
anyway.)

Otherwise, in exchange for giving up the real-time nature, we can count
the number only in CheckPointerStats.ckpt_bufs_written.
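
For concreteness, a rough sketch of the first idea (untested; it uses the
shared-stats fetch API, and assumes the pending counters get flushed to
the shared stats during the checkpoint, e.g. via CheckpointWriteDelay()):

	/* hypothetically, in CreateCheckPoint() */
	PgStat_Counter bufs_before;

	bufs_before = pgstat_fetch_stat_checkpointer()->buf_written_checkpoints;

	/* ... BufferSync() runs, updating only the shared stats ... */

	CheckpointStats.ckpt_bufs_written =
		pgstat_fetch_stat_checkpointer()->buf_written_checkpoints - bufs_before;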

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [PATCH] psql: \dn+ to show size of each schema (and \dA+ for AMs)

2022-12-16 Thread Pavel Stehule
On Thu, Dec 15, 2022 at 17:13, Justin Pryzby wrote:

> Rebased on c727f511b.
>
> This patch record was "closed for lack of interest", but I think what's
> actually needed is committer review of which approach to take.
>
>  - add backend functions but do not modify psql?
>  - add to psql slash-plus commands?
>  - introduce psql double-plus commands for new options?
>  - change pre-existing psql plus commands to only show size with
>    double-plus?
>  - go back to the original, two-line client-side sum()?
>
> Until then, the patchset is organized with those questions in mind.
>

+1

This format makes sense to me.

Regards

Pavel

>
> --
> Justin
>


Re: New strategies for freezing, advancing relfrozenxid early

2022-12-16 Thread Nikita Malakhov
Hi!

I've found this discussion very interesting, because vacuuming
TOAST tables is always a problem: they tend to bloat very quickly
with dead data. As a reminder, all TOAST-able columns of a relation
share the relation's single TOAST table, and TOASTed data is not
updated in place - there are only insert and delete operations.

Have you tested it with large and constantly used TOAST tables?
How would it work with the current TOAST implementation?

We propose a different approach to the TOAST mechanics [1],
and a new vacuum would be very promising.

Thank you!

[1] https://commitfest.postgresql.org/41/3490/

On Fri, Dec 16, 2022 at 10:48 AM John Naylor wrote:

>
> On Wed, Dec 14, 2022 at 6:07 AM Peter Geoghegan  wrote:
> >
> > At the suggestion of Jeff, I wrote a Wiki page that shows motivating
> > examples for the patch series:
> >
> >
> https://wiki.postgresql.org/wiki/Freezing/skipping_strategies_patch:_motivating_examples
> >
> > These are all cases where VACUUM currently doesn't do the right thing
> > around freezing, in a way that is greatly ameliorated by the patch.
> > Perhaps this will help other hackers to understand the motivation
> > behind some of these mechanisms. There are plenty of details that only
> > make sense in the context of a certain kind of table, with certain
> > performance characteristics that the design is sensitive to, and seeks
> > to take advantage of in one way or another.
>
> Thanks for this. This is the kind of concrete, data-based evidence that I
> find much more convincing, or at least easy to reason about. I'd actually
> recommend opening future discussions with this kind of analysis --
> even before coding, it's possible to indicate what a design is *intended*
> to achieve. And reviewers can likewise bring up cases of their own in a
> concrete fashion.
>
> On Wed, Dec 14, 2022 at 12:16 AM Peter Geoghegan  wrote:
>
> > At the very least, a given VACUUM operation has to choose its freezing
> > strategy based on how it expects the table will look when it's done
> > vacuuming the table, and how that will impact the next VACUUM against
> > the same table. Without that, then vacuuming an append-only table will
> > fall into a pattern of setting pages all-visible in one vacuum, and
> > then freezing those same pages all-frozen in the very next vacuum
> > because there are too many. Which makes little sense; we're far better
> > off freezing the pages at the earliest opportunity instead.
>
> That makes sense, but I wonder if we can actually be more specific: One
> motivating example mentioned is the append-only table. If we detected that
> case, which I assume we can because autovacuum_vacuum_insert_* GUCs exist,
> we could use that information as one way to drive eager freezing
> independently of size. At least in theory -- it's very possible size will
> be a necessary part of the decision, but it's less clear that it's as
> useful as a user-tunable knob.
>
> If we then ignored the append-only case when evaluating a freezing policy,
> maybe other ideas will fall out. I don't have a well-thought out idea about
> policy or knobs, but it's worth thinking about.
>
> Aside from that, I've only given the patches a brief reading. Having seen
> the VM snapshot in practice (under "Scanned pages, visibility map snapshot"
> in the wiki page), it's neat to see fewer pages being scanned. Prefetching
> not only seems superior to SKIP_PAGES_THRESHOLD, but anticipates
> asynchronous IO. Keeping only one VM snapshot page in memory makes perfect
> sense.
>
> I do have a cosmetic, but broad-reaching, nitpick about terms regarding
> "skipping strategy". That's phrased as a kind of negative -- what we're
> *not* doing. Many times I had to pause and compute in my head what we're
> *doing*, i.e. the "scanning strategy". For example, I wonder if the VM
> strategies would be easier to read as:
>
> VMSNAP_SKIP_ALL_VISIBLE -> VMSNAP_SCAN_LAZY
> VMSNAP_SKIP_ALL_FROZEN -> VMSNAP_SCAN_EAGER
> VMSNAP_SKIP_NONE -> VMSNAP_SCAN_ALL
>
> Notice here they're listed in order of increasing eagerness.
>
> --
> John Naylor
> EDB: http://www.enterprisedb.com
>


-- 
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/