Re: Should vacuum process config file reload more often

2023-03-08 Thread John Naylor
On Thu, Mar 9, 2023 at 12:42 AM Jim Nasby  wrote:
>
> Doesn't the dead tuple space grow as needed? Last I looked we don't
> allocate up to 1GB right off the bat.

Incorrect.

> Of course, if the patch that eliminates the 1GB vacuum limit gets
> committed the situation will be even worse.

If you're referring to the proposed tid store, I'd be interested in seeing
a reproducible test case with an m_w_m over 1GB where it makes things worse
than the current state of affairs.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Raising the SCRAM iteration count

2023-03-08 Thread Michael Paquier
On Wed, Mar 08, 2023 at 05:21:20PM +0900, Michael Paquier wrote:
> On Wed, Mar 08, 2023 at 09:07:36AM +0100, Daniel Gustafsson wrote:
>> AFAIK a TAP test with psql_interactive is the only way to do this so that's
>> what I've implemented.

I cannot think of a better idea than what you have here, so I am
marking this patch as ready for committer.  I am wondering how stable
logic based on a 5s timer would be...
--
Michael




Re: Testing autovacuum wraparound (including failsafe)

2023-03-08 Thread Michael Paquier
On Tue, Mar 07, 2023 at 09:21:00PM -0800, Peter Geoghegan wrote:
> Surely your tap test should be using single user mode?  Perhaps you
> missed the obnoxious HINT, that's part of the WARNING that the test
> parses?  ;-)

I may be missing something, but you cannot directly use a "postgres"
command in a TAP test, can you?  See 1a9d802, which fixed a problem
when TAP tests run with a privileged account on Windows.
--
Michael




Re: Add pg_walinspect function with block info columns

2023-03-08 Thread Michael Paquier
On Thu, Mar 09, 2023 at 09:52:57AM +0530, Bharath Rupireddy wrote:
> Hm, then, +1 for v3.

Okay, thanks.  Let's use that, then.
--
Michael




Re: pg_upgrade and logical replication

2023-03-08 Thread Amit Kapila
On Wed, Mar 8, 2023 at 12:26 PM Julien Rouhaud  wrote:
>
> On Sat, 4 Mar 2023, 14:13 Amit Kapila,  wrote:
>>
>>
>> > For the publisher nodes, that may be something nice to support (I'm 
>> > assuming it
>> > could be useful for more complex replication setups) but I'm not 
>> > interested in
>> > that at the moment as my goal is to reduce downtime for major upgrade of
>> > physical replica, thus *not* doing pg_upgrade of the primary node, whether
>> > physical or logical.  I don't see why it couldn't be done later on, if/when
>> > someone has a use case for it.
>> >
>>
>> I thought there is value if we provide a way to upgrade both publisher
>> and subscriber.
>
>
> it's still unclear to me whether it's actually achievable on the publisher 
> side, as running pg_upgrade leaves a "hole" in the WAL stream and resets the 
> timeline, among other possible difficulties. Now I don't know much about 
> logical replication internals so I'm clearly not the best person to answer 
> those questions.
>

I think that is the part we need to analyze and see what the
challenges are there. One part of the challenge is that we need to
preserve slots that have some WAL locations, like restart_lsn and
confirmed_flush, and we need WAL from those locations for decoding. I
haven't analyzed this, but isn't it possible that on a clean shutdown
we confirm that all the WAL has been sent and confirmed by the logical
subscriber, in which case truncating WAL in pg_upgrade shouldn't be a
problem?

>> Now, you came up with a use case linking it to a
>> physical replica where allowing an upgrade of only subscriber nodes is
>> useful. It is possible that users find your steps easy to perform and
>> didn't find them error-prone, but it may be better to get some
>> validation of the same. I haven't yet analyzed all the steps in
>> detail but let's see what others think.
>
>
> It's been quite some time since and no one seemed to chime in or object. IMO 
> doing a major version upgrade with limited downtime (so something faster than 
> stopping postgres and running pg_upgrade) has always been difficult and never 
> prevented anyone from doing it, so I don't think that it should be a blocker 
> for what I'm suggesting here, especially since the current behavior of 
> pg_upgrade on a subscriber node is IMHO broken.
>
> Is there something that can be done for pg16? I was thinking that having a 
> fix for the normal and easy case could be acceptable: only allowing 
> pg_upgrade to optionally, and not by default, preserve the subscription 
> relations IFF all subscriptions only have tables in ready state. Different 
> states should be transient, and it's easy to check as a user beforehand and 
> also easy to check during pg_upgrade, so it seems like an acceptable 
> limitation (which I personally see as a good sanity check, but YMMV). It 
> could be lifted in later releases if wanted anyway.
>
> It's unclear to me whether this limited scope would also require to preserve 
> the replication origins, but having looked at the code I don't think it would 
> be much of a problem as the local LSN doesn't have to be preserved.
>

I think we need to preserve replication origins, as they help us
determine the WAL location from which to start streaming after the
upgrade. If we don't preserve them, then from which location will the
subscriber start streaming? We don't want to replicate WAL that
has already been sent.
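
As an aside, the "all relations in ready state" condition from Julien's
proposal above is easy to express against the catalogs. Here is a minimal
libpq-style sketch of such a check (the query and function name are
illustrative only, not from any posted patch):

#include <stdbool.h>
#include <stdio.h>
#include "libpq-fe.h"

/*
 * Sketch: return true only if every subscription relation has reached
 * the 'ready' state ('r' in pg_subscription_rel.srsubstate), i.e. the
 * precondition proposed above for preserving subscription relations
 * across pg_upgrade.
 */
static bool
all_subscription_rels_ready(PGconn *conn)
{
	PGresult   *res;
	bool		ready;

	res = PQexec(conn,
				 "SELECT 1 FROM pg_catalog.pg_subscription_rel "
				 "WHERE srsubstate <> 'r' LIMIT 1");
	ready = (PQresultStatus(res) == PGRES_TUPLES_OK &&
			 PQntuples(res) == 0);
	PQclear(res);
	return ready;
}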

-- 
With Regards,
Amit Kapila.




Re: [PoC] Let libpq reject unexpected authentication requests

2023-03-08 Thread Michael Paquier
On Mon, Mar 06, 2023 at 04:02:25PM -0800, Jacob Champion wrote:
> On Fri, Mar 3, 2023 at 6:35 PM Michael Paquier  wrote:
>> I was refreshing my mind with 0001 yesterday, and except for the two
>> parts where we need to worry about AUTH_REQ_OK being sent too early
>> and the business with gssenc, this is rather straightforward.  It
>> also looks like the participants of the thread are OK with the
>> design you are proposing (list of keywords, potentially negative
>> patterns).  I think that I can get this part merged for this CF, at
>> least, not sure about the rest :p
> 
> Thanks! Is there anything that would make the sslcertmode patch more
> palatable? Or any particular areas of concern?

I have been reviewing 0001, finishing with the attached, and that's
nice work.  My notes are below.

pqDropServerData() is in charge of cleaning up the transient data of a
connection between different attempts.  Shouldn't client_finished_auth
be reset to false there?  No connection parameters should be reset in
this code path, but this state is different.  It does not seem
possible that we could reach
pqDropServerData() after client_finished_auth has been set to true,
but that feels safer.  I was tempted first to do that as well in
makeEmptyPGconn(), but we do a memset(0) there, so there is no point
in doing that anyway ;)

require_auth needs a cleanup in freePGconn().

+   case AUTH_REQ_SCM_CREDS:
+   return libpq_gettext("server requested UNIX socket credentials");
I am not really cool with the fact that this would fail and that we
offer no options to control that.  Now, this involves servers up to
9.1, which is also a very good reason to rip out this code entirely.
For now, I think that we'd better allow this option, and discuss the
removal of that in a separate thread.

pgindent has been complaining on the StaticAssertDecl() in fe-auth.c:
src/interfaces/libpq/fe-auth.c: Error@847: Unbalanced parens
Warning@847: Extra )
Warning@847: Extra )
Warning@848: Extra )

From what I can see, this comes from the use of {0} within the
expression itself.  I don't really want to dig into why pg_bsd_indent
thinks this is a bad idea, so let's just move the StaticAssertDecl() a
bit, like in the attached.  The result is the same.

As of the "sensitive" cases of the patch:
- I don't really think that we have to care much about cases like
"none,scram" meaning that a SASL exchange hastily downgraded to
AUTH_REQ_OK by the server would be a success, as "none" means that the
client is basically OK with trust-level.  This said, "none" could be a
dangerous option in some cases, while useful in others.
- SSPI is the default connection setup for the TAP tests on Windows.
We could stick a small test somewhere, perhaps, certainly not in
src/test/authentication/.
- SASL/SCRAM is indeed a problem on its own.  My guess is that we
should let channel_binding do the job for SASL, or introduce a new
option to decide which sasl mechanisms are authorized.  At the end,
using "scram-sha-256" as the keyword is fine by me as we use that even
for HBA files, so that's quite known now, I hope.
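
To make the semantics concrete, here is a minimal sketch of client-side
usage, based on the behavior the patch documents (require_auth is the
option under review, not yet a committed libpq keyword):

#include <stdio.h>
#include "libpq-fe.h"

int
main(void)
{
	/*
	 * With require_auth=scram-sha-256, the connection is expected to
	 * fail unless the server actually runs a SCRAM exchange; with
	 * "!password" instead, any method except plaintext password would
	 * be acceptable.
	 */
	PGconn	   *conn = PQconnectdb("host=localhost dbname=postgres "
								   "require_auth=scram-sha-256");

	if (PQstatus(conn) != CONNECTION_OK)
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
	PQfinish(conn);
	return 0;
}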
--
Michael
From 4c4eaf6989e60676df4a597e7831abf35133cfc5 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 9 Mar 2023 15:26:39 +0900
Subject: [PATCH v15] libpq: let client reject unexpected auth methods

The require_auth connection option allows the client to choose a list of
acceptable authentication types for use with the server. There is no
negotiation: if the server does not present one of the allowed
authentication requests, the connection fails. Additionally, all methods
in the list may be negated, e.g. '!password', in which case the server
must NOT use the listed authentication type. The special method "none"
allows/disallows the use of unauthenticated connections (but it does not
govern transport-level authentication via TLS or GSSAPI).

Internally, the patch expands the role of check_expected_areq() to
ensure that the incoming request is compatible with conn->require_auth.
It also introduces a new flag, conn->client_finished_auth, which is set
by various authentication routines when the client side of the handshake
is finished. This signals to check_expected_areq() that an OK message
from the server is expected, and allows the client to complain if the
server forgoes authentication entirely.

(Since the client can't generally prove that the server is actually
doing the work of authentication, this last part is mostly useful for
SCRAM without channel binding. It could also provide a client with a
decent signal that, at the very least, it's not connecting to a database
with trust auth, and so the connection can be tied to the client in a
later audit.)

Deficiencies:
- This is unlikely to be very forwards-compatible at the moment,
  especially with SASL/SCRAM.
- SSPI support is "implemented" but untested.
- require_auth=none,scram-sha-256 currently allows the server to leave a
  SCRAM exchange unfinished. This is not 

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-08 Thread Amit Kapila
On Thu, Mar 9, 2023 at 10:37 AM Peter Smith  wrote:
>
> On Thu, Mar 9, 2023 at 3:49 PM Amit Kapila  wrote:
> >
> > On Wed, Mar 8, 2023 at 8:44 PM Önder Kalacı  wrote:
> > >
> > >>
> > >> I felt that once you remove the create publication/subscription/wait
> > >> for sync steps, the test execution might become faster and save some
> > >> time in the local execution, cfbot and the various machines in
> > >> buildfarm. If the execution time will not reduce, then no need to
> > >> change.
> > >>
> > >
> > > So, as I noted earlier, there are different schemas. As far as I count, 
> > > there are at least
> > > 7 different table definitions. I think all tables having the same name 
> > > are maybe confusing?
> > >
> > > Even if I try to group the same table definitions, and avoid create 
> > > publication/subscription/wait
> > > for sync steps, the total execution time of the test drops only ~5%. As 
> > > far as I test, that does not
> > > seem to be the bottleneck for the tests.
> > >
> > > Well, I'm really not sure if it is really worth doing that. I think 
> > > having each test independent of each
> > > other is really much easier to follow.
> > >
> >
> > This new test takes ~9s on my machine whereas most other tests in
> > subscription/t take roughly 2-5s. I feel we should try to reduce the
> > test timing without sacrificing much of the functionality or code
> > coverage.  I think if possible we should try to reduce setup/teardown
> > cost for each separate test by combining them where possible. I have a
> > few comments on tests which also might help to optimize these tests.
> >
>
> To avoid culling useful tests just because they take too long to run, I
> have often thought we should separate some of the useful (but costly)
> subscription tests from the other mainstream tests. Then they won't
> cost any extra time for the build-farm, but at least we can still run
> them on-demand using the PG_TEST_EXTRA [1] approach if we really want to.
>

I don't think that is relevant here. It is mostly about removing
duplicate work we are doing in tests. I don't see anything in the
tests that should require a long time to complete.

-- 
With Regards,
Amit Kapila.




Cross-database SERIALIZABLE safe snapshots

2023-03-08 Thread Thomas Munro
Here is a feature idea that emerged from a pgsql-bugs thread[1] that I
am kicking into the next commitfest.  Example:

s1: \c db1
s1: CREATE TABLE t (i int);
s1: BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
s1: INSERT INTO t VALUES (42);

s2: \c db2
s2: BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE;
s2: SELECT * FROM x;

I don't know of any reason why s2 should have to wait, or
alternatively, without DEFERRABLE, why it shouldn't immediately drop
from SSI to SI (that is, opt out of predicate locking and go faster).
This change relies on the fact that PostgreSQL doesn't allow any kind
of cross-database access, except for shared catalogs, and all catalogs
are already exempt from SSI.  I have updated this new version of the
patch to explain that very clearly at the place where that exemption
happens, so that future hackers reconsidering it will see that we rely
on that fact elsewhere.

[1] 
https://www.postgresql.org/message-id/flat/17368-98a4f99e8e4b4402%40postgresql.org
From 2156a432b3629d1b7f8bf2fe42f641f8845baf48 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Thu, 9 Mar 2023 18:12:17 +1300
Subject: [PATCH] Optimize cross-database SERIALIZABLE safe snapshots.

SERIALIZABLE READ ONLY [DEFERRABLE] transactions can benefit from "safe
snapshots", where they detect that no serializable anomalies could be
created, so we can silently drop to the cheaper REPEATABLE READ (in
other words from SSI to SI).

Since transactions connected to different databases can't access each
other's data at all, except for catalogs where SSI doesn't apply anyway,
there is no point in waiting for transactions in other databases.
Filter them out while populating our possibleUnsafeConflicts lists.

This means that non-DEFERRABLE safe snapshots might opt out immediately
or sooner than before, and DEFERRABLE safe snapshots might not have to
wait as long to get started, in scenarios where more than one database
is using SERIALIZABLE transactions.

Discussion: https://postgr.es/m/17116-d6ca217acc180e30%40postgresql.org

diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 23461b46f6..343a1497f3 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -489,6 +489,13 @@ static void ReleasePredicateLocksLocal(void);
 /*
  * Does this relation participate in predicate locking? Temporary and system
  * relations are exempt.
+ *
+ * Note that GetSerializableTransactionSnapshotInt() relies on the latter
+ * exemption when considering separate databases to be fully partitioned from
+ * each other for the purposes of the safe snapshot optimization.  If SSI is
+ * ever applied to system catalogs, and in particular shared catalogs,
+ * databases would no longer be sufficiently isolated and that would need
+ * to be reconsidered.
  */
 static inline bool
 PredicateLockingNeededForRelation(Relation relation)
@@ -1213,6 +1220,7 @@ InitPredicateLocks(void)
 		PredXact->OldCommittedSxact->flags = SXACT_FLAG_COMMITTED;
 		PredXact->OldCommittedSxact->pid = 0;
 		PredXact->OldCommittedSxact->pgprocno = INVALID_PGPROCNO;
+		PredXact->OldCommittedSxact->database = InvalidOid;
 	}
 	/* This never changes, so let's keep a local copy. */
 	OldCommittedSxact = PredXact->OldCommittedSxact;
@@ -1795,6 +1803,7 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 	sxact->xmin = snapshot->xmin;
 	sxact->pid = MyProcPid;
 	sxact->pgprocno = MyProc->pgprocno;
+	sxact->database = MyDatabaseId;
 	dlist_init(&sxact->predicateLocks);
 	dlist_node_init(&sxact->finishedLink);
 	sxact->flags = 0;
@@ -1814,7 +1823,17 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 		{
 			othersxact = dlist_container(SERIALIZABLEXACT, xactLink, iter.cur);
 
-			if (!SxactIsCommitted(othersxact)
+			/*
+			 * We can't possibly have an unsafe conflict with a transaction in
+			 * another database.  The only possible overlap is on shared
+			 * catalogs, but we don't support SSI for shared catalogs.  The
+			 * invalid database case covers 2PC, because we don't yet record
+			 * database OIDs in the 2PC information.  We also filter out doomed
+			 * transactions as they can't possibly commit.
+			 */
+			if ((othersxact->database == InvalidOid ||
+				 othersxact->database == MyDatabaseId)
+				&& !SxactIsCommitted(othersxact)
 				&& !SxactIsDoomed(othersxact)
 				&& !SxactIsReadOnly(othersxact))
 			{
@@ -1824,9 +1843,9 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 
 		/*
 		 * If we didn't find any possibly unsafe conflicts because every
-		 * uncommitted writable transaction turned out to be doomed, then we
-		 * can "opt out" immediately.  See comments above the earlier check for
-		 * PredXact->WritableSxactCount == 0.
+		 * uncommitted writable transaction turned out to be doomed or in
+		 * another database, then we can "opt out" immediately.  See comments
+		 * above the earlier check for PredXact->WritableSxactCount == 0.
 		 */
 		if 

Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-03-08 Thread Amit Kapila
On Wed, Mar 8, 2023 at 9:20 AM Masahiko Sawada  wrote:
>
> On Wed, Mar 8, 2023 at 3:30 AM Nathan Bossart  
> wrote:
> >
>
> >  IMO the current set of
> > trade-offs (e.g., unavoidable bloat and WAL buildup) would make this
> > feature virtually unusable for a lot of workloads, so it's probably worth
> > exploring an alternative approach.
>
> It might require more engineering effort for alternative approaches
> such as the one I proposed, but the feature could become better from
> the user's perspective. I also think it would be worth exploring if
> we've not done so yet.
>

Fair enough. I think as of now most people think that we should
consider alternative approaches for this feature. The two ideas at a
high level are that the apply worker itself first flushes the decoded
WAL (maybe only when time-delay is configured), or that we have a
separate walreceiver process as we do for standbys. I think we need to
analyze the pros and cons of each of those approaches and see if they
would be useful even for other things on the apply side.

-- 
With Regards,
Amit Kapila.




Re: Rework LogicalOutputPluginWriterUpdateProgress

2023-03-08 Thread Peter Smith
Here are some review comments for v6-0001

==
General.

1.
There are lots of new comments saying:
/* don't call update progress, we didn't really make any */

but is the wording "call update progress" meaningful?

Should that be written as something more like:
/* No progress has been made so there is no need to call
UpdateProgressAndKeepalive. */

==

2. rollback_prepared_cb_wrapper

  /*
  * If the plugin support two-phase commits then rollback prepared callback
  * is mandatory
+ *
+ * FIXME: This should have been caught much earlier.
  */
  if (ctx->callbacks.rollback_prepared_cb == NULL)
  ereport(ERROR,

~

Why is this seemingly unrelated FIXME still in the patch? I thought it
was pointed out a while ago (see [1], comment #8) that this would be
deleted.

~~~

4.

@@ -1370,6 +1377,8 @@ stream_abort_cb_wrapper(ReorderBuffer *cache,
ReorderBufferTXN *txn,

  /* Pop the error context stack */
  error_context_stack = errcallback.previous;
+
+ UpdateProgressAndKeepalive(ctx, (txn->toptxn == NULL));
 }

~

Are the double parentheses necessary?

~~~

5. UpdateProgressAndKeepalive

I had previously suggested (See [2] comment #3) that the code might be
simplified if the "is_keepalive_threshold_exceeded(ctx)" check was
pushed down into this function, but it seems like nobody else gave any
opinion for/against that idea yet... so the question still stands.

==
src/backend/replication/walsender.c

6. WalSndUpdateProgressAndKeepalive

Since the 'ctx' is unused here, it might be nicer to annotate that to
make it clear it is deliberate and suppress any possible warnings
about unused params.

e.g. something like:

WalSndUpdateProgressAndKeepalive(
pg_attribute_unused() LogicalDecodingContext *ctx,
XLogRecPtr lsn,
TransactionId xid,
bool did_write,
bool finished_xact)

--
[1] 
https://www.postgresql.org/message-id/OS3PR01MB6275C6CA7C0C23730A319EAD9%40OS3PR01MB6275.jpnprd01.prod.outlook.com
[2] 
https://www.postgresql.org/message-id/CAHut%2BPt3ZEMo-KTF%3D5KJSU%2BHdWJD19GPGGCKOmBeM47484Ychw%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-08 Thread Peter Smith
On Thu, Mar 9, 2023 at 3:49 PM Amit Kapila  wrote:
>
> On Wed, Mar 8, 2023 at 8:44 PM Önder Kalacı  wrote:
> >
> >>
> >> I felt that once you remove the create publication/subscription/wait
> >> for sync steps, the test execution might become faster and save some
> >> time in the local execution, cfbot and the various machines in
> >> buildfarm. If the execution time will not reduce, then no need to
> >> change.
> >>
> >
> > So, as I noted earlier, there are different schemas. As far as I count, 
> > there are at least
> > 7 different table definitions. I think all tables having the same name are 
> > maybe confusing?
> >
> > Even if I try to group the same table definitions, and avoid create 
> > publication/subscription/wait
> > for sync steps, the total execution time of the test drops only ~5%. As far 
> > as I test, that does not
> > seem to be the bottleneck for the tests.
> >
> > Well, I'm really not sure if it is really worth doing that. I think having 
> > each test independent of each
> > other is really much easier to follow.
> >
>
> This new test takes ~9s on my machine whereas most other tests in
> subscription/t take roughly 2-5s. I feel we should try to reduce the
> test timing without sacrificing much of the functionality or code
> coverage.  I think if possible we should try to reduce setup/teardown
> cost for each separate test by combining them where possible. I have a
> few comments on tests which also might help to optimize these tests.
>

To avoid culling useful tests just because they take too long to run, I
have often thought we should separate some of the useful (but costly)
subscription tests from the other mainstream tests. Then they won't
cost any extra time for the build-farm, but at least we can still run
them on-demand using the PG_TEST_EXTRA [1] approach if we really want to.

--
[1] https://www.postgresql.org/docs/devel/regress-run.html

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-08 Thread Amit Kapila
On Wed, Mar 8, 2023 at 8:44 PM Önder Kalacı  wrote:
>
>>
>> I felt that once you remove the create publication/subscription/wait
>> for sync steps, the test execution might become faster and save some
>> time in the local execution, cfbot and the various machines in
>> buildfarm. If the execution time will not reduce, then no need to
>> change.
>>
>
> So, as I noted earlier, there are different schemas. As far as I count, there 
> are at least
> 7 different table definitions. I think all tables having the same name are 
> maybe confusing?
>
> Even if I try to group the same table definitions, and avoid create 
> publication/subscription/wait
> for sync steps, the total execution time of the test drops only ~5%. As far 
> as I test, that does not
> seem to be the bottleneck for the tests.
>
> Well, I'm really not sure if it is really worth doing that. I think having 
> each test independent of each
> other is really much easier to follow.
>

This new test takes ~9s on my machine whereas most other tests in
subscription/t take roughly 2-5s. I feel we should try to reduce the
test timing without sacrificing much of the functionality or code
coverage.  I think if possible we should try to reduce setup/teardown
cost for each separate test by combining them where possible. I have a
few comments on tests which also might help to optimize these tests.

1.
+# Testcase start: SUBSCRIPTION USES INDEX
+#
+# Basic test where the subscriber uses index
+# and only updates 1 row and deletes
+# 1 other row
...
...
+# Testcase start: SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS
+#
+# Basic test where the subscriber uses index
+# and updates 50 rows

...
+# Testcase start: SUBSCRIPTION USES INDEX WITH MULTIPLE COLUMNS
+#
+# Basic test where the subscriber uses index
+# and deletes 200 rows

I think to a good extent these tests overlap each other. I think we
can have just one test for the index with multiple columns that
updates multiple rows and have both updates and deletes.

2.
+# Testcase start: SUBSCRIPTION DOES NOT USE PARTIAL INDEX
...
...
+# Testcase start: SUBSCRIPTION DOES NOT USE INDEXES WITH ONLY EXPRESSIONS

Instead of having separate tests where we do all setups again, I think
it would be better if we create both the indexes in one test and show
that none of them is used.

3.
+# now, the update could either use the test_replica_id_full_idy or
test_replica_id_full_idy index
+# it is not possible for user to control which index to use

The name of the second index is wrong in the above comment.

4.
+# Testcase start: SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN

As we have removed enable_indexscan check, you should remove this test.

5. In general, the line length seems to vary a lot for different
multi-line comments. Though we are not strict about that, it will look
better if there is consistency (let's have an ~80-column line length
for each comment).

-- 
With Regards,
Amit Kapila.




Re: proposal - get_extension_version function

2023-03-08 Thread Pavel Stehule
On Wed, Mar 8, 2023 at 11:43 PM Tom Lane  wrote:

> Jacob Champion  writes:
> > What I'm trying to pin down is the project's position on the reverse
> > -- binary version X and SQL version X+1 -- because that seems
> > generally unmaintainable, and I don't understand why an author would
> > pay that tax if they could just avoid it by bailing out entirely. (If
> > an author wants to allow that, great, but does everyone have to?)
>
> Hard to say.  Our experience with the standard contrib modules is that
> it really isn't much additional trouble; but perhaps more-complex modules
> would have more interdependencies between functions.  In any case,
> I fail to see the need for basing things on a catalog lookup rather
> than embedding API version numbers in relevant C symbols.
>

How can you check it? There is no callback now.

Regards

Pavel




>
> regards, tom lane
>
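
For readers following along, one way to read Tom's suggestion of
"embedding API version numbers in relevant C symbols" is to version the
symbol itself, so an SQL upgrade script can only bind to a matching
binary. A sketch with hypothetical names (this illustrates the idea, not
an existing convention):

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

/*
 * The version-2 SQL script declares its function
 * AS 'myext', 'myext_frobnicate_v2', so installing it against a
 * version-1 shared library fails at symbol lookup instead of
 * misbehaving at runtime.  All names here are hypothetical.
 */
PG_FUNCTION_INFO_V1(myext_frobnicate_v2);

Datum
myext_frobnicate_v2(PG_FUNCTION_ARGS)
{
	PG_RETURN_INT32(PG_GETARG_INT32(0) + 1);
}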


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-08 Thread vignesh C
On Wed, 8 Mar 2023 at 21:46, Önder Kalacı  wrote:
>
> Hi Vignesh C,
>
>>
>>
>> Few comments
>> 1) Maybe this change is not required:
>> fallback if no other solution is possible.  If a replica identity other
>> than full is set on the publisher side, a replica identity
>> -   comprising the same or fewer columns must also be set on the subscriber
>> -   side.  See  for details 
>> on
>> +   comprising the same or fewer columns must also be set on the
>> subscriber side.
>> +   See  for details on
>
>
> Yes, fixed.
>>
>>
>> 2) Variable declaration and the assignment can be split so that the
>> readability is better:
>> +
>> +   boolisUsableIndex =
>> +   IsIndexUsableForReplicaIdentityFull(indexInfo);
>> +
>> +   index_close(indexRelation, AccessShareLock);
>> +
>
>
> Hmm, can you please elaborate more on this? The declaration
> and assignment are already on different lines.
>
> ps: pgindent changed this line a bit. Does that look better?

I thought of changing it to something like below:
bool isUsableIndex;
Oid idxoid = lfirst_oid(lc);
Relation indexRelation = index_open(idxoid, AccessShareLock);
IndexInfo  *indexInfo = BuildIndexInfo(indexRelation);

isUsableIndex = IsIndexUsableForReplicaIdentityFull(indexInfo);

Regards,
Vignesh




Re: Add pg_walinspect function with block info columns

2023-03-08 Thread Bharath Rupireddy
On Thu, Mar 9, 2023 at 7:34 AM Kyotaro Horiguchi
 wrote:
>
> > In short, my choice would still be simplicity here with v3, I guess.
>
> FWIW, I slightly prefer v3 for the reason I mentioned above.

Hm, then, +1 for v3.

FWIW, I quickly tried to hit the case where a single WAL record has
max_block_id = XLR_MAX_BLOCK_ID with both FPIs and block data, but I
couldn't. I could generate WAL records with 45K FPIs and 10mn of block
data, and the total palloc'd length in the block_id for loop never
crossed 8K.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: meson vs make: missing/inconsistent ENV

2023-03-08 Thread Michael Paquier
On Wed, Mar 08, 2023 at 05:59:13PM -0800, Andres Freund wrote:
> I now pushed a fix for the two obvious cases pointed out by Justin.

Thanks!
--
Michael




Re: allow_in_place_tablespaces vs. pg_basebackup

2023-03-08 Thread Kyotaro Horiguchi
At Thu, 09 Mar 2023 11:53:26 +0900 (JST), I wrote
> It turned out to be not as simple as I thought, though...

The error message and the location where the error condition is
checked don't match, but making the messages more generic may not be
helpful for users...

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-08 Thread Amit Kapila
On Thu, Mar 9, 2023 at 6:34 AM Peter Smith  wrote:
>
> 4. build_replindex_scan_key
>
> >
> > Based on the discussions below, I kept as-is. I really don't want to do 
> > unrelated
> > changes in this patch, as I also got several feedback for not doing it,
> >
>
> Hmm, although this code pre-existed I don’t consider this one as
> "unrelated changes" because the patch introduced the new "if
> (!AttributeNumberIsValid(table_attno))" which changed things.  As I
> wrote to Amit yesterday [2] IMO it would be better to do the 'opttype'
> assignment *after* the potential 'continue' otherwise there is every
> chance that the assignment is just redundant. And if you move the
> assignment where it belongs, then you might as well declare the
> variable in the more appropriate place at the same time – i.e. with
> 'opfamily' declaration. Anyway, I've given my reason a couple of times
> now, so if you don't want to change it I won't about it debate
> anymore.
>

I agree with this reasoning.

-- 
With Regards,
Amit Kapila.




Re: allow_in_place_tablespaces vs. pg_basebackup

2023-03-08 Thread Kyotaro Horiguchi
At Thu, 09 Mar 2023 10:58:41 +0900 (JST), I wrote
> behavior for that purpose. I believe it is reasonable to make
> basebackup error-out when it encounters an in-place tablespace
> directory when TABLESPACE_MAP is activated.

It turned out to be not as simple as I thought, though...

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 543d4d897a..c33a51772a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8241,7 +8241,8 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
  */
 void
 do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces,
-   BackupState *state, StringInfo tblspcmapfile)
+   BackupState *state, StringInfo tblspcmapfile,
+   bool allow_inplace_tsps)
 {
 	bool		backup_started_in_recovery;
 
@@ -8434,6 +8435,7 @@ do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces,
 			int			rllen;
 			StringInfoData escapedpath;
 			char	   *s;
+			PGFileType  ftype;
 
 			/* Skip anything that doesn't look like a tablespace */
 			if (strspn(de->d_name, "0123456789") != strlen(de->d_name))
@@ -8446,7 +8448,14 @@ do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces,
 			 * we sometimes use allow_in_place_tablespaces to create
 			 * directories directly under pg_tblspc, which would fail below.
 			 */
-			if (get_dirent_type(fullpath, de, false, ERROR) != PGFILETYPE_LNK)
+			ftype = get_dirent_type(fullpath, de, false, ERROR);
+
+			/* reject in-place tablespace directories when not allowed */
+			if (ftype == PGFILETYPE_DIR && !allow_inplace_tsps)
+				ereport(ERROR,
+						errmsg("cannot handle in-place tablespace directories when TABLESPACE_MAP is activated"),
+						errhint("If you are using pg_basebackup, try -Fp instead."));
+			if (ftype != PGFILETYPE_LNK)
 continue;
 
 			rllen = readlink(fullpath, linkpath, sizeof(linkpath));
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index c07daa874f..face85c1bf 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -99,7 +99,8 @@ pg_backup_start(PG_FUNCTION_ARGS)
 	MemoryContextSwitchTo(oldcontext);
 
 	register_persistent_abort_backup_handler();
-	do_pg_backup_start(backupidstr, fast, NULL, backup_state, tablespace_map);
+	do_pg_backup_start(backupidstr, fast, NULL, backup_state, tablespace_map,
+	   true);
 
 	PG_RETURN_LSN(backup_state->startpoint);
 }
diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c
index 6efdefb591..c69e91a4ff 100644
--- a/src/backend/backup/basebackup.c
+++ b/src/backend/backup/basebackup.c
@@ -259,8 +259,10 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
 	tablespace_map = makeStringInfo();
 
 	basebackup_progress_wait_checkpoint();
+
+	/* don't allow in-place tablespace when sending the tablespace map file */
 	do_pg_backup_start(opt->label, opt->fastcheckpoint, &tablespaces,
-	   backup_state, tablespace_map);
+	   backup_state, tablespace_map, !opt->sendtblspcmapfile);
 
 	state.startptr = backup_state->startpoint;
 	state.starttli = backup_state->starttli;
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index cfe5409738..0a64a84b71 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -280,7 +280,8 @@ typedef enum SessionBackupState
 
 extern void do_pg_backup_start(const char *backupidstr, bool fast,
 			   List **tablespaces, BackupState *state,
-			   StringInfo tblspcmapfile);
+			   StringInfo tblspcmapfile,
+			   bool allow_inplace_tsps);
 extern void do_pg_backup_stop(BackupState *state, bool waitforarchive);
 extern void do_pg_abort_backup(int code, Datum arg);
 extern void register_persistent_abort_backup_handler(void);


Re: allow_in_place_tablespaces vs. pg_basebackup

2023-03-08 Thread Thomas Munro
On Thu, Mar 9, 2023 at 2:58 PM Kyotaro Horiguchi
 wrote:
> At Wed, 8 Mar 2023 16:30:05 -0500, Robert Haas  wrote 
> in
> > I'm not sure how messy it's going to be to clean this up. I think each
> > tablespace really needs to go into a separate ${TSOID}.tar file,
> > because we've got tons of code both on the server side and in
> > pg_basebackup that assumes that's how things work, and having in-place
> > tablespaces be a rare exception to that rule seems really unappealing.
> > This of course also implies that files in an in-place tablespace must
> > not go missing from the backup altogether.
>
> The doc clearly describes the purpose of in-place tablespaces and the
> potential confusion they can cause for backup tools not excluding
> pg_basebackup. Does this not provide sufficient information?
>
> https://www.postgresql.org/docs/devel/runtime-config-developer.html

Yeah.  We knew that this didn't work (it was discussed in a couple of
other threads), but you might be right that an error would be better
for now.  It's absolutely not a user-facing mode of operation; it was
intended just for the replication tests.  Clearly it might be useful
for testing purposes in the backup area too, which is probably how
Robert got here.  I will think about what changes would be needed as I
am looking at backup code currently, but I'm not sure when I'll be
able to post a patch, so don't let that stop anyone else who sees how
to get it working...




Re: Allow logical replication to copy tables in binary format

2023-03-08 Thread Amit Kapila
On Wed, Mar 8, 2023 at 6:17 PM Melih Mutlu  wrote:
>
> On 7 Mar 2023 Tue at 04:10 Amit Kapila  wrote:
>>
>> As per what I could read in this thread, most people prefer to use the
>> existing binary option rather than inventing a new way (option) to
>> binary copy in the initial sync phase. Do you agree?
>
>
> I agree.
> What do you think about the version checks? I removed any kind of check since 
> it’s currently a different option. Should we check the publisher version 
> before doing binary copy to ensure that the publisher node supports the 
> binary option of the COPY command?
>

It is not clear to me which version check you wanted to add, because we
have had a binary option in COPY since before logical replication
existed. I feel we need a publisher version 14 check, as that is where
we started to support binary mode transfer in logical replication.
See the check in the function libpqrcv_startstreaming().
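
A sketch of the kind of check being suggested, modeled on the existing
one in libpqrcv_startstreaming() (function name, message, and placement
are assumptions, not from a posted patch):

#include "postgres.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"

/*
 * Refuse binary table copy when the publisher predates binary-mode
 * support in logical replication (version 14).
 */
static void
check_publisher_supports_binary_copy(void)
{
	if (walrcv_server_version(LogRepWorkerWalRcvConn) < 140000)
		ereport(ERROR,
				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
				 errmsg("binary copy of tables requires a publisher of version 14 or newer")));
}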

-- 
With Regards,
Amit Kapila.




RE: Support logical replication of DDLs

2023-03-08 Thread wangw.f...@fujitsu.com
On Mon, Mar 6, 2023 18:17 PM Wang, Wei/王 威  wrote:
> On Mon, Mar 6, 2023 14:34 AM Ajin Cherian  wrote:
> > Changes are in patch 1 and patch 2
> 
> Thanks for updating the patch set.
> 
> Here are some comments:

Here are some more comments for the v-75-0002* patch:

1. In the function deparse_AlterRelation
+   if ((sub->address.objectId != relId &&
+sub->address.objectId != InvalidOid) &&
+   !(subcmd->subtype == AT_AddConstraint &&
+ subcmd->recurse) &&
+   istable)
+   continue;
I think when we execute the command "ALTER TABLE ... CLUSTER ON" (subtype is
AT_ClusterOn), this command will be skipped for parsing. I think we need to
parse this command here.

I think we are skipping some needed parsing due to this check, such as [1].#1
and the AT_ClusterOn command mentioned above. After reading the thread, I think
the purpose of this check is to fix the bug in [2] (see the last point in [2]).
I think maybe we could modify this check to `continue` when
sub->address.objectId and relId are inconsistent and sub->address.objectId is a
child (inherited or partition) table. What do you think?

~~~

2. In the function deparse_CreateStmt
I think when we execute the following command:
`CREATE TABLE tbl (a int GENERATED ALWAYS AS (1) STORED);`
the deparsed result is :
`CREATE  TABLE  public.tbl (a pg_catalog.int4 STORAGE plain GENERATED 
ALWAYS AS 1 STORED);`
I think the parentheses around generation_expr (I mean `1`) are missing,
which would cause a syntax error.

~~~

3. In the function deparse_IndexStmt
I think we missed parsing of options [NULLS NOT DISTINCT] in the following
command:
```
CREATE UNIQUE INDEX ... ON table_name ... NULLS NOT DISTINCT;
```
I think we could check this option via node->nulls_not_distinct.

[1] - 
https://www.postgresql.org/message-id/OS3PR01MB6275FE40496DA47C0A3369289EB69%40OS3PR01MB6275.jpnprd01.prod.outlook.com
[2] - 
https://www.postgresql.org/message-id/CAAD30UJ25nTPiVc0RTnsVbhHSNrnoqoackf9%2B%2BNa%2BR-QN6dRkw%40mail.gmail.com

Regards,
Wang wei


Re: meson vs make: missing/inconsistent ENV

2023-03-08 Thread Andres Freund
Hi,

On 2023-03-09 09:36:52 +0900, Michael Paquier wrote:
> On Sun, Feb 26, 2023 at 03:21:04PM -0800, Andres Freund wrote:
> > On 2023-02-26 16:52:39 -0600, Justin Pryzby wrote:
> >> Also, e6927270c added ZSTD to src/bin/pg_basebackup/meson.build, but
> >> it's not in ./Makefile ??  Maybe that was for consistency with other
> >> places, or pre-emptive in case the tap tests want to do tests involving
> >> the ZSTD tool.  But it'd be better if ./Makefile had it too.
> > 
> > I suspect I just over-eagerly added it when the pg_basebackup zstd support
> > went in, using the GZIP_PROGRAM/LZ4 cases as a template. And foolishly
> > assuming a newly added compression method would be tested.
> 
> Leaving the discussion about the perl warnings aside for the moment,
> these still need to be adjusted.  Justin, would you like to write a
> patch with everything you have found?

I now pushed a fix for the two obvious cases pointed out by Justin.

Greetings,

Andres Freund




Re: allow_in_place_tablespaces vs. pg_basebackup

2023-03-08 Thread Kyotaro Horiguchi
At Wed, 8 Mar 2023 16:30:05 -0500, Robert Haas  wrote in 
> Commit 7170f2159fb21b62c263acd458d781e2f3c3f8bb, which introduced
> in-place tablespaces, didn't make any adjustments to pg_basebackup.
> The resulting behavior is pretty bizarre.
> 
> If you take a plain-format backup using pg_basebackup -Fp, then the
> files in the in-place tablespace are backed up, but if you take a
> tar-format backup using pg_basebackup -Ft, then they aren't.
>
> I had at first wondered how this could even be possible, since after
> all a plain format backup is little more than a tar-format backup that
> pg_basebackup chooses to extract for you. The answer turns out to be
> that a tar-format backup activates the server's TABLESPACE_MAP option,
> and a plain-format backup doesn't, and so pg_basebackup can handle
> this case differently depending on the value of that flag, and does.
> Even for a plain-format backup, the server's behavior is not really
> correct, because I think what's happening is that the tablespace files
> are getting included in the base.tar file generated by the server,
> rather than in a separate ${TSOID}.tar file as would normally happen
> for a user-defined tablespace, but because the tar files get extracted
> before the user lays eyes on them, the user-visible consequences are
> masked, at least in the cases I've tested so far.

In my understanding, in-place tablespaces are a feature for testing
purposes only. Therefore, the feature was intentionally designed to
have minimal impact on the code and to provide the minimum necessary
behavior for that purpose. I believe it is reasonable to make
basebackup error out when it encounters an in-place tablespace
directory while TABLESPACE_MAP is activated.

> I'm not sure how messy it's going to be to clean this up. I think each
> tablespace really needs to go into a separate ${TSOID}.tar file,
> because we've got tons of code both on the server side and in
> pg_basebackup that assumes that's how things work, and having in-place
> tablespaces be a rare exception to that rule seems really unappealing.
> This of course also implies that files in an in-place tablespace must
> not go missing from the backup altogether.

The doc clearly describes the purpose of in-place tablespaces and the
potential confusion they can cause for backup tools, pg_basebackup
included. Does this not provide sufficient information?

https://www.postgresql.org/docs/devel/runtime-config-developer.html

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-03-08 Thread Melanie Plageman
Thank you to all reviewers!

This email is in answer to the three reviews
done since my last version. Attached is v2, and inline below are replies
to all the review comments.

The main difference between this version and the previous version is
that I've added a guc, buffer_usage_limit, and the VACUUM option
BUFFER_USAGE_LIMIT is now specified as a size (in kB, MB, etc).

I currently only use the guc value for VACUUM, but it is meant to be
used for all buffer access strategies and is configurable at the session
level.

I would prefer that we had the option of resizing the buffer access
strategy object per table being autovacuumed. Since autovacuum reloads
the config file between tables, this would be quite possible.

I started implementing this, but stopped because the code is not really
in a good state for that.

In fact, I'm not very happy with my implementation at all because I
think given the current structure of vacuum() and vacuum_rel(), it will
potentially make the code more confusing.

I don't like how autovacuum and vacuum use vacuum_rel() and vacuum()
differently (autovacuum always calls vacuum() with a list containing a
single relation). And vacuum() takes buffer access strategy as a
parameter, supposedly so that autovacuum can change the buffer access
strategy object per call, but it doesn't do that. And then vacuum() and
vacuum_rel() go and access VacuumParams at various places with no rhyme
or reason -- seemingly just based on the random availability of other
objects whose state they would like to check on. So, IMO, in adding a
"buffers" parameter to VacuumParams, I am asking for confusion in
autovacuum code with table-level VacuumParams containing a value for
buffers that isn't used.


On Mon, Feb 27, 2023 at 4:21 PM Andres Freund  wrote:
> On 2023-02-22 16:32:53 -0500, Melanie Plageman wrote:
> > The first patch in the set is to free the BufferAccessStrategy object
> > that is made in do_autovacuum() -- I don't see when the memory context
> > it is allocated in is destroyed, so it seems like it might be a leak?
>
> The backend shuts down just after, so that's not a real issue. Not that it'd
> hurt to fix it.

I've dropped that patch from the set.

> > > I can see that making sense, particularly if we were to later extend
> > > this to other users of ringbuffers. E.g. COPY's use of the ringbuffer
> > > makes loading of data > 16MB but also << s_b vastly slower, but it can
> > > still be very important to use if there's lots of parallel processes
> > > loading data.
> > >
> > > Maybe BUFFER_USAGE_LIMIT, with a value from -1 to N, with -1 indicating
> > > the default value, 0 preventing use of a buffer access strategy, and
> > > 1..N indicating the size in blocks?
>
> > I have found the implementation you suggested very hard to use.
> > The attached fourth patch in the set implements it the way you suggest.
> > I had to figure out what number to set the BUFFER_USAGE_LIMIT to -- and,
> > since I don't specify shared buffers in units of nbuffer, it's pretty
> > annoying to have to figure out a valid number.
>
> I think we should be able to parse it in a similar way to how we parse
> shared_buffers. You could even implement this as a GUC that is then set by
> VACUUM (similar to how VACUUM FREEZE is implemented).

in the attached v2, I've used parse_int() to do this.


> > I think that it would be better to have it be either a percentage of
> > shared buffers or a size in units of bytes/kb/mb like that of shared
> > buffers.
>
> I don't think a percentage of shared buffers works particularly well - you
> very quickly run into the ringbuffer becoming impractically big.

It is now a size.

> > > Would we want to set an upper limit lower than implied by the memory
> > > limit for the BufferAccessStrategy allocation?
> > >
> > So, I was wondering what you thought about NBuffers / 8 (the current
> > limit). Does it make sense?
>
> That seems *way* too big. Imagine how large allocations we'd end up with a
> shared_buffers size of a few TB.
>
> I'd probably make it a hard error at 1GB and a silent cap at NBuffers / 2 or
> such.

Well, as I mentioned NBuffers / 8 is the current GetAccessStrategy()
cap.

In the attached patchset, I have introduced a hard cap of 16GB which is
enforced for the VACUUM option and for the buffer_usage_limit guc. I
kept the "silent cap" at NBuffers / 8 but am open to changing it to
NBuffers / 2 if we think it is okay for its silent cap to be different
than GetAccessStrategy()'s cap.
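
To make the caps concrete, the clamping described above would amount to
something like the following sketch (the kB input, the 16GB hard cap, and
the NBuffers / 8 silent cap are the assumptions stated in this email, not
settled design; buffer_usage_limit_kb is a hypothetical variable):

	int			limit_buffers;

	if (buffer_usage_limit_kb > 16 * 1024 * 1024)	/* 16GB hard cap */
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 errmsg("buffer_usage_limit must be at most 16GB")));

	/* convert kB to buffers, then clamp silently like GetAccessStrategy() */
	limit_buffers = buffer_usage_limit_kb / (BLCKSZ / 1024);
	limit_buffers = Min(limit_buffers, NBuffers / 8);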


> > @@ -547,7 +547,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
> >  	state->targetcontext = AllocSetContextCreate(CurrentMemoryContext,
> >  												 "amcheck context",
> >  												 ALLOCSET_DEFAULT_SIZES);
> > - state->checkstrategy = 

Re: [PATCH] Align GSS and TLS error handling in PQconnectPoll()

2023-03-08 Thread Michael Paquier
On Fri, Feb 17, 2023 at 09:01:43AM -0800, Jacob Champion wrote:
> On Thu, Feb 16, 2023 at 10:59 PM Michael Paquier  wrote:
>> I am adding Stephen Frost
>> in CC to see if he has any comments about all this part of the logic
>> with gssencmode.
> 
> Sounds good.

Hearing nothing on this part, perhaps we should just move on and
adjust the behavior on HEAD?  That seems like one step in the right
direction.  If this brews right, we could always discuss a backpatch
at some point, if necessary.

Thoughts from others?
--
Michael




Re: Add SHELL_EXIT_CODE to psql

2023-03-08 Thread Justin Pryzby
On Wed, Feb 22, 2023 at 03:04:33PM -0500, Corey Huinker wrote:
> +
> +/*
> + * Return the POSIX exit code (0 to 255) that corresponds to the argument.
> + * The argument is a return code returned by wait(2) or waitpid(2), which
> + * also applies to pclose(3) and system(3).
> + */
> +int
> +wait_result_to_exit_code(int exit_status)
> +{
> + if (WIFEXITED(exit_status))
> + return WEXITSTATUS(exit_status);
> + if (WIFSIGNALED(exit_status))
> + return WTERMSIG(exit_status);
> + return 0;
> +}

This fails to distinguish between exiting with (say) code 1 and being
killed by signal 1.

> - if (ferror(fd))
> + exit_code = ferror(fd);
> + if (exit_code)

And this adds even more ambiguity with internal library/system calls, as
Tom said.

> + if (close_exit_code && !exit_code)
> + {
> + error = true;
> + exit_code = close_exit_code;
> + if (close_exit_code == -1)
> + pg_log_error("%s: %m", cmd);

I think if an error occurs in pclose(), then it should *always* be
reported.  Knowing that we somehow failed while running the command is
more important than knowing how the command ran when we had a failure
while running it.

Note that for some tools, a nonzero exit code can be normal.  Like diff
and grep.

The exit status is one byte.  I think you should define the status
variable along the lines of:

 - 0 if successful; or,
 - a positive number 1..255 indicating its exit status. or,
 - a negative number N indicating it was terminated by signal -N; or,
 - 256 if an internal error occurred (like pclose/ferror);
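
Expressed in code, that convention might look like the sketch below (the
256 sentinel and the negative-signal encoding are the proposal above,
not existing psql behavior):

#include <sys/wait.h>

/*
 * Sketch of the proposed mapping: 0 on success, 1..255 for a normal
 * exit status, -N for termination by signal N, and 256 for an internal
 * error such as a pclose()/ferror() failure.
 */
static int
wait_result_to_shell_status(int wstatus)
{
	if (wstatus == -1)
		return 256;				/* pclose()/system() itself failed */
	if (WIFEXITED(wstatus))
		return WEXITSTATUS(wstatus);
	if (WIFSIGNALED(wstatus))
		return -WTERMSIG(wstatus);
	return 256;					/* anything else is an internal error */
}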

See bash(1).  This would be a good behavior to start with, since it
ought to be familiar to everyone, and if it's good enough to write/run
shell scripts in, then it's got to be good enough for psql to run a
single command in.  I'm not sure why the shell uses 126-127 specially,
though...

EXIT STATUS
   The exit status of an executed command is the value returned by the
   waitpid system call or equivalent function.  Exit statuses fall
   between 0 and 255, though, as explained below, the shell may use
   values above 125 specially.  Exit statuses from shell builtins and
   compound commands are also limited to this range.  Under certain
   circumstances, the shell will use special values to indicate
   specific failure modes.

   For the shell's purposes, a command which exits with a zero exit
   status has succeeded.  An exit status of zero indicates success.
   A non-zero exit status indicates failure.  When a command terminates
   on a fatal signal N, bash uses the value of 128+N as the exit
   status.

   If a command is not found, the child process created to execute it
   returns a status of 127.  If a command is found but is not
   executable, the return status is 126.

   If a command fails because of an error during expansion or
   redirection, the exit status is greater than zero.

-- 
Justin




Re: Add pg_walinspect function with block info columns

2023-03-08 Thread Michael Paquier
On Thu, Mar 09, 2023 at 09:46:12AM +0900, Kyotaro Horiguchi wrote:
> Although I'm not strongly opposed to pfreeing them, I'm not sure I
> like the way the patch frees them.  The lifetimes of all of raw_data,
> raw_page and flags are within a block.  They can be freed
> unconditionally after they are actually used, and the scope of the
> pointer variables can be properly narrowed.

The thing is that you cannot keep them inside each individual blocks
because they have to be freed once their values are stored in the
tuplestore, which is why I guess Bharath has done things this way.
After sleeping on that, I tend to prefer the simplicity of v3 where we
keep track of the block and fpi data in each of their respective
blocks.  It means that we lose track of them each time we go to a
different block, but the memory context reset done after each record
means that scanning through a large WAL history will not cause a leak
across the function call.

The worst scenario with v3 is a record that makes use of all the 32
blocks with a hell of a lot of block data in each one of them, which is
possible in theory, but very unlikely in practice except if someone
uses a custom rmgr to generate crazily-shaped WAL records.  I am aware
of the fact that it is possible to generate such records if you are
really willing to do so, aka this thread:
https://www.postgresql.org/message-id/flat/CAEze2WgGiw+LZt+vHf8tWqB_6VxeLsMeoAuod0N=ij1q17n...@mail.gmail.com

In short, my choice would still be simplicity here with v3, I guess.
--
Michael




Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-08 Thread Peter Smith
Here are my review comments for v37-0001.

==
General - should/must.

1.
In my previous review [1] (comment #1) I wrote that only some of the
"should" were misleading and gave examples where to change. But I
didn't say that *every* usage of that word was wrong, so your global
replace of "should" to "must" has modified a couple of places in
unexpected ways.

Details are in subsequent review comments below -- see #2b, #3, #5.

==
doc/src/sgml/logical-replication.sgml

2.
A published table must have a “replica identity” configured in order
to be able to replicate UPDATE and DELETE operations, so that
appropriate rows to update or delete can be identified on the
subscriber side. By default, this is the primary key, if there is one.
Another unique index (with certain additional requirements) can also
be set to be the replica identity. If the table does not have any
suitable key, then it can be set to replica identity “full”, which
means the entire row becomes the key. When replica identity “full” is
specified, indexes can be used on the subscriber side for searching
the rows. Candidate indexes must be btree, non-partial, and have at
least one column reference (i.e. cannot consist of only expressions).
These restrictions on the non-unique index properties adheres to some
of the restrictions that are enforced for primary keys. Internally, we
follow a similar approach for supporting index scans within logical
replication scope. If there are no such suitable indexes, the search
on the subscriber side can be very inefficient, therefore replica
identity “full” must only be used as a fallback if no other solution
is possible. If a replica identity other than “full” is set on the
publisher side, a replica identity comprising the same or fewer
columns must also be set on the subscriber side. See REPLICA IDENTITY
for details on how to set the replica identity. If a table without a
replica identity is added to a publication that replicates UPDATE or
DELETE operations then subsequent UPDATE or DELETE operations will
cause an error on the publisher. INSERT operations can proceed
regardless of any replica identity.

~~

2a.
My previous review [1] (comment #2) was not fixed quite as suggested.

Please change:
"adheres to" --> "adhere to"

~~

2b. should/must

This should/must change was OK as it was before, because here it is only advice.

Please change this back how it was:
"must only be used as a fallback" --> "should only be used as a fallback"

==
src/backend/executor/execReplication.c

3. build_replindex_scan_key

 /*
  * Setup a ScanKey for a search in the relation 'rel' for a tuple 'key' that
  * is setup to match 'rel' (*NOT* idxrel!).
  *
- * Returns whether any column contains NULLs.
+ * Returns how many columns must be used for the index scan.
+ *

~

This should/must change does not seem quite right.

SUGGESTION (reworded)
Returns how many columns to use for the index scan.

~~~

4. build_replindex_scan_key

>
> Based on the discussions below, I kept as-is. I really don't want to do 
> unrelated
> changes in this patch, as I also got several feedback for not doing it,
>

Hmm, although this code pre-existed, I don't consider this one an
"unrelated change", because the patch introduced the new "if
(!AttributeNumberIsValid(table_attno))" which changed things.  As I
wrote to Amit yesterday [2], IMO it would be better to do the 'opttype'
assignment *after* the potential 'continue', otherwise there is every
chance that the assignment is just redundant. And if you move the
assignment to where it belongs, then you might as well declare the
variable in the more appropriate place at the same time -- i.e. with
the 'opfamily' declaration. Anyway, I've given my reason a couple of
times now, so if you don't want to change it I won't debate it anymore.
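
To be concrete, the arrangement I have in mind is roughly this (just a
sketch -- identifier names follow the patch's build_replindex_scan_key()
and the surrounding code is elided):

for (index_attoff = 0;
     index_attoff < IndexRelationGetNumberOfKeyAttributes(idxrel);
     index_attoff++)
{
    Oid         opfamily;
    Oid         optype;         /* declared alongside opfamily... */
    AttrNumber  table_attno = indkey->values[index_attoff];

    /* Expression index attributes have no table column to scan with. */
    if (!AttributeNumberIsValid(table_attno))
        continue;

    /* ...and assigned only once we know the attribute is usable. */
    optype = get_opclass_input_type(opclass->values[index_attoff]);
    opfamily = get_opclass_family(opclass->values[index_attoff]);
    ...
}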

==
src/backend/replication/logical/relation.c

5. FindUsableIndexForReplicaIdentityFull

+ * XXX: There are no fundamental problems for supporting non-btree indexes.
+ * We mostly need to relax the limitations in RelationFindReplTupleByIndex().
+ * For partial indexes, the required changes are likely to be larger. If
+ * none of the tuples satisfy the expression for the index scan, we must
+ * fall-back to sequential execution, which might not be a good idea in some
+ * cases.

The should/must change (the one in the XXX comment) does not seem quite right.

SUGGESTION
"we must fall-back to sequential execution" --> "we fallback to
sequential execution"

==
.../subscription/t/032_subscribe_use_index.pl

FYI, I get a TAP test failure (note: this is when only patch 0001 is applied)

t/031_column_list.pl ... ok
t/032_subscribe_use_index.pl ... 19/?
#   Failed test 'ensure subscriber has not used index with
enable_indexscan=false'
#   at t/032_subscribe_use_index.pl line 806.
#  got: '1'
# expected: '0'
t/032_subscribe_use_index.pl ... 21/? # Looks like you failed 1 test of 22.
t/032_subscribe_use_index.pl ... Dubious, test returned 1 

Re: Add error functions: erf() and erfc()

2023-03-08 Thread Thomas Munro
On Thu, Mar 9, 2023 at 1:16 PM Dean Rasheed  wrote:
> On Wed, 8 Mar 2023 at 23:24, Thomas Munro  wrote:
> > ... But Oracle, and I think
> > several other analytics-focused SQL systems, can do it in a very easy
> > built-in way.  I think to get at that you probably need the t CDF, and
> > in there[2] I see... Γ().  Huh.
>
> Hmm, possibly having the gamma function would be independently useful
> for other things too. I don't want to get side-tracked though.

I guess if we did want to add some nice easy-to-use hypothesis testing
tools to PostgreSQL, then perhaps gamma wouldn't actually be needed
from SQL, but it might be used inside C code for something higher
level like tcdf()[1], or even very high level like
t_test_independent_agg(s1, s2) etc.  Anyway, just thought I'd mention
those in passing, as I see they arrived together; sorry for getting
off topic.

[1] 
https://stats.stackexchange.com/questions/394978/how-to-approximate-the-student-t-cdf-at-a-point-without-the-hypergeometric-funct
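
(FWIW, erf() by itself already gets you the normal CDF, which is the
piece a Kolmogorov-Smirnov test for random_normal() needs; a minimal
sketch, assuming the proposed erf() is available:

-- standard normal CDF: Phi(x) = (1 + erf(x / sqrt(2))) / 2
SELECT x::float8, 0.5 * (1 + erf(x / sqrt(2))) AS normal_cdf
FROM generate_series(-3, 3) AS x;

It's the t CDF that additionally drags in Γ().)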




Re: Add pg_walinspect function with block info columns

2023-03-08 Thread Kyotaro Horiguchi
At Wed, 8 Mar 2023 20:18:06 +0530, Bharath Rupireddy 
 wrote in 
> On Wed, Mar 8, 2023 at 4:23 PM Michael Paquier  wrote:
> >
> > On Wed, Mar 08, 2023 at 04:01:56PM +0530, Bharath Rupireddy wrote:
> > > I understand that performance is critical here but we need to ensure
> > > memory is used wisely. Therefore, I'd still vote to free at least the
> > > major contributors here, that is, pfree(raw_data);, pfree(raw_page);
> > > and pfree(flags); right after they are done using. I'm sure pfree()s
> > > don't hurt more than resetting memory context for every block_id.
> >
> > Okay by me to have intermediate pfrees between each block scanned if
> > you feel strongly about it.
> 
> Thanks. Attached v4 with that change.

Although I'm not strongly opposed to pfreeing them, I'm not sure I
like the way the patch frees them.  The lifetimes of raw_data,
raw_page and flags are all contained within a single block.  They can
be freed unconditionally right after they are actually used, and the
scope of the pointer variables can be properly narrowed.
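
Something like this, roughly (only a sketch; the names follow the
patch):

for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
{
    ...
    if (XLogRecHasBlockData(record, block_id))
    {
        /* scope limited to this branch, freed right after its last use */
        bytea  *raw_data = ...;

        /* ... build the output column from raw_data ... */
        pfree(raw_data);
    }
    ...
}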

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


Re: meson vs make: missing/inconsistent ENV

2023-03-08 Thread Michael Paquier
On Sun, Feb 26, 2023 at 03:21:04PM -0800, Andres Freund wrote:
> On 2023-02-26 16:52:39 -0600, Justin Pryzby wrote:
>> Also, e6927270c added ZSTD to src/bin/pg_basebackup/meson.build, but
>> it's not in ./Makefile ??  Maybe that was for consistency with other
>> places, or pre-emptive in case the tap tests want to do tests involving
>> the ZSTD tool.  But it'd be better if ./Makefile had it too.
> 
> I suspect I just over-eagerly added it when the pg_basebackup zstd support
> went in, using the GZIP_PROGRAM/LZ4 cases as a template. And foolishly
> assuming a newly added compression method would be tested.

Leaving the discussion about the Perl warnings aside for the moment,
these still need to be adjusted.  Justin, would you like to write a
patch with everything you have found?
--
Michael


signature.asc
Description: PGP signature


Re: Track IO times in pg_stat_io

2023-03-08 Thread Andres Freund
Hi,

On 2023-03-08 12:55:34 +0100, Drouvot, Bertrand wrote:
> On 3/7/23 7:47 PM, Andres Freund wrote:
> > On 2023-03-07 13:43:28 -0500, Melanie Plageman wrote:
> > > > Now I've a second thought: what do you think about resetting the 
> > > > related number
> > > > of operations and *_time fields when enabling/disabling 
> > > > track_io_timing? (And mention it in the doc).
> > > > 
> > > > That way it'd prevent bad interpretation (at least as far as the time per 
> > > > operation metrics are concerned).
> > > > 
> > > > Thinking that way as we'd lose some (most?) of the benefits of the new *_time 
> > > > columns
> > > > if one can't "trust" their related operations and/or one is not 
> > > > sampling pg_stat_io frequently enough (to discard the samples
> > > > where the track_io_timing changes occur).
> > > > 
> > > > But well, resetting the operations could also lead to bad 
> > > > interpretation about the operations...
> > > > 
> > > > Not sure about which approach I like the most yet, what do you think?
> > > 
> > > Oh, this is an interesting idea. I think you are right about the
> > > synchronization issues making the statistics untrustworthy and, thus,
> > > unusable.
> > 
> > No, I don't think we can do that. It can be enabled on a per-session basis.
> 
> Oh right. So it's even less clear to me how one would make use of
> those new *_time fields, given that:
> 
> - pg_stat_io is "global" across all sessions. So, even if one session is
> doing some "testing" and needs to turn track_io_timing on, it cannot
> even be sure it's only reflecting its own testing (as other sessions
> may have turned it on too).

I think for 17 we should provide access to per-existing-connection pg_stat_io
stats, and also provide a database aggregated version. Neither should be
particularly hard.


> - There is the risk mentioned above of bad interpretations for the "time per 
> operation" metrics.
> 
> - Even if pg_stat_io is sampled frequently enough, one does not know
> which samples contain track_io_timing changes (at the cluster or
> session level).

You'd just make the same use of them you do with pg_stat_database.blks_read
etc today.

I don't think it's particularly useful to use the time to calculate "per IO"
costs - they can vary *drastically* due to kernel level buffering. The point
of having the time available is that it provides information that the number
of operations doesn't provide.
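
I.e. you'd consume it something like this (a sketch -- take a snapshot,
take another one later, and diff the two; the read_time column name is
from the patch under discussion):

SELECT backend_type, object, context, reads, read_time
FROM pg_stat_io
WHERE reads > 0;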


> > I think we simply shouldn't do anything here. This is a pre-existing issue.
> 
> Oh, never thought about it. You mean like for pg_stat_database.blks_read and 
> pg_stat_database.blk_read_time for example?

Yes.

Greetings,

Andres Freund




Re: lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)

2023-03-08 Thread Michael Paquier
On Thu, Mar 09, 2023 at 12:39:08AM +0100, Tomas Vondra wrote:
> IMO we should fix that. We have a bunch of buildfarm members running on
> Ubuntu 18.04 (or older) - it's true none of them seems to be running TAP
> tests. But considering how trivial the fix is ...
> 
> Barring objections, I'll push a fix early next week.

+1, better to change that, thanks.  Actually, would --rm be OK even on
Windows?  As far as I can see, the CI detects a LZ4 command for the
VS2019 environment but not the liblz4 libraries that would be needed
to trigger the set of tests.
--
Michael


signature.asc
Description: PGP signature


Re: Add error functions: erf() and erfc()

2023-03-08 Thread Nathan Bossart
On Thu, Mar 09, 2023 at 12:27:47AM +, Dean Rasheed wrote:
> On Thu, 9 Mar 2023 at 00:13, Nathan Bossart  wrote:
>> I'm also wondering about whether we need the isinf() checks.  IIUC that
>> should never happen, but maybe you added that "just in case."
> 
> I copied those from dtanh(), otherwise I probably wouldn't have
> bothered. erf() is always in the range [-1, 1], just like tanh(), so
> it should never overflow, but maybe it can happen in a broken
> implementation.

Okay.  This looks good to me, then.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Add error functions: erf() and erfc()

2023-03-08 Thread Dean Rasheed
On Thu, 9 Mar 2023 at 00:13, Nathan Bossart  wrote:
>
> On Wed, Mar 08, 2023 at 11:29:12PM +, Dean Rasheed wrote:
> > On Wed, 8 Mar 2023 at 20:11, Nathan Bossart  
> > wrote:
> >> The man pages for these seem to indicate that underflow can occur.  Do we
> >> need to check for that?
> >
> > No, I don't think so. The docs indicate that if an underflow occurs,
> > the correct result (after rounding) should be returned, so I think we
> > should just return that result (as we do for tanh(), for example).
>
> Makes sense.
>
> I'm also wondering about whether we need the isinf() checks.  IIUC that
> should never happen, but maybe you added that "just in case."
>

I copied those from dtanh(), otherwise I probably wouldn't have
bothered. erf() is always in the range [-1, 1], just like tanh(), so
it should never overflow, but maybe it can happen in a broken
implementation.

Regards,
Dean




Re: Should vacuum process config file reload more often

2023-03-08 Thread Andres Freund
Hi,

On 2023-03-08 11:42:31 -0600, Jim Nasby wrote:
> On 3/2/23 1:36 AM, Masahiko Sawada wrote:
> 
> > > > > For example, I guess we will need to take care of changes of
> > > > > maintenance_work_mem. Currently we initialize the dead tuple space at
> > > > > the beginning of lazy vacuum, but perhaps we would need to
> > > > > enlarge/shrink it based on the new value?
> Doesn't the dead tuple space grow as needed? Last I looked we don't allocate
> up to 1GB right off the bat.
> > > > I don't think we need to do anything about that initially, just because 
> > > > the
> > > > config can be changed in a more granular way, doesn't mean we have to 
> > > > react to
> > > > every change for the current operation.
> > > Perhaps we can mention in the docs that a change to maintenance_work_mem
> > > will not take effect in the middle of vacuuming a table. But, Ithink it 
> > > probably
> > > isn't needed.
> > Agreed.
> 
> I disagree that there's no need for this. Sure, if maintenance_work_mem
> is 10MB then it's no big deal to just abandon your current vacuum and start
> a new one, but the index vacuuming phase with maintenance_work_mem set to
> say 500MB can take quite a while. Forcing a user to either suck it up or
> throw away everything done in that phase isn't terribly good.
> 
> Of course, if the patch that eliminates the 1GB vacuum limit gets committed
> the situation will be even worse.
> 
> While it'd be nice to also honor maintenance_work_mem getting set lower, I
> don't see any need to go through heroics to accomplish that. Simply
> recording the change and honoring it for future attempts to grow the memory
> and on future passes through the heap would be plenty.
> 
> All that said, don't let these suggestions get in the way of committing
> this. Just having the ability to tweak cost parameters would be a win.

Nobody said anything about it not being useful to react to m_w_m changes, just
that it's not required to make some progress.  So I really don't understand
what the point of your comment is.

Greetings,

Andres Freund




Re: Add documentation for coverage reports with meson

2023-03-08 Thread Michael Paquier
On Wed, Mar 08, 2023 at 05:23:48PM +0100, Peter Eisentraut wrote:
> The "cd" command needs to be moved after the meson commands, and the meson
> commands need to have a -C builddir option.

Still that's not mandatory, is it?  The compile and test commands of
meson work as well if you are located at the root of the build
directory, AFAIK.

> So it should be like
> 
> 
> meson setup -Db_coverage=true ... OTHER OPTIONS ... builddir/
> meson compile -C builddir
> meson test -C builddir
> cd builddir/
> ninja coverage-html
> 
> 
> Otherwise, this looks good to me.

Anyway, this works as well and I don't have any arguments against
that.  So I have used your flow, and applied the patch.  I have
actually switched my own scripts to rely more on -C, removing direct
calls to ninja ;p

Thanks for the feedback.
--
Michael


signature.asc
Description: PGP signature


Re: psql \watch 2nd argument: iteration count

2023-03-08 Thread Andrey Borodin
On Wed, Mar 8, 2023 at 10:49 AM Nathan Bossart  wrote:
>
> Is there any reason to disallow 0 for the sleep argument?  I often use
> commands like "\watch .1" to run statements repeatedly with very little
> time in between, and I'd use "\watch 0" instead if it was available.
>

Yes, that makes sense! Thanks!

Best regards, Andrey Borodin.


v4-0002-Add-iteration-count-argument-to-psql-watch-comman.patch
Description: Binary data


v4-0001-Fix-incorrect-argument-handling-in-psql-watch.patch
Description: Binary data


Re: Add error functions: erf() and erfc()

2023-03-08 Thread Dean Rasheed
On Wed, 8 Mar 2023 at 23:24, Thomas Munro  wrote:
>
> No comment on the maths, but I'm pretty sure we won't need a fallback
> implementation.  That stuff goes back to the math libraries of 80s
> Unixes, even though it didn't make it into C until '99.  I just
> checked the man pages for all our target systems and they all show it.
> (There might be some portability details around the tgmath.h versions
> on some systems, eg to support different float sizes, I dunno, but
> you're using the plain math.h versions.)
>

Thanks for checking. Hopefully they will be available everywhere.

I think what's more likely to happen is that the tests will reveal
implementation variations, as they did when the hyperbolic functions
were added, and it'll be necessary to adjust or remove some of the
test cases. When I originally wrote those tests, I picked a value from
each branch that the FreeBSD implementation handled differently, but I
think that's overkill. If the purpose of the tests is just to confirm
that the right C library function has been exposed, they could
probably be pared all the way down to just testing erf(1) and erfc(1),
but it might be useful to first see what platform variations exist.

> Two related functions that also arrived in C99 are lgamma() and
> tgamma().  If you'll excuse the digression, that reminded me of
> something I was trying to figure out once, for a practical problem.
> My statistics knowledge is extremely patchy, but I have been trying to
> up my benchmarking game, and that led to a bit of remedial reading on
> Student's t tests and related stuff.  A few shaven yaks later, I
> understood that you could probably (if you like pain) do that sort of
> stuff inside PostgreSQL using our existing aggregates, if you took the
> approach of ministat[1].  That tool has a table of critical values
> inside it, indexed by degrees-of-freedom (1-100) and confidence level
> (80, 90, 95, 98, 99, 99.5), and one could probably write SQL queries
> that spit out an answer like "p is less than 5%, ship it!", if we
> stole that table.  But what if I actually want to know p?  Of course
> you can do all that good stuff very easily with tools like R, SciPy
> etc and maybe that's the best way to do it.  But Oracle, and I think
> several other analytics-focused SQL systems, can do it in a very easy
> built-in way.  I think to get at that you probably need the t CDF, and
> in there[2] I see... Γ().  Huh.
>

Hmm, possibly having the gamma function would be independently useful
for other things too. I don't want to get side-tracked though.

Regards,
Dean




Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

2023-03-08 Thread Andres Freund
Hi,

On 2023-02-06 13:02:05 -0800, Andres Freund wrote:
> I didn't quite feel confident pushing a fix for this just before a minor
> release, so I'll push once the minor releases are tagged. A quite minimal fix
> to GetFullRecentGlobalXmin() in 12-13 (returning FirstNormalTransactionId if
> epoch == 0 and RecentGlobalXmin > nextxid_xid), and the slightly larger fix in
> 14+.

Pushed that.


Mark:

I worked some more on the fixes for amcheck.

The second amcheck fix ends up correcting some inaccurate output in the tests
- previously xids from before xid 0 were reported to be in the future.

Previously there was no test case exercising xids exceeding nextxid
without wrapping around into the past. I added one at the end of
004_verify_heapam.pl, because renumbering seemed too annoying.

What do you think?


Somewhat random note:

Is it intentional that we VACUUM FREEZE test ROWCOUNT times? That's
effectively O(ROWCOUNT^2), albeit with small enough constants to not really
matter. I don't think we need to insert the rows one-by-one either. Changing
that to a single INSERT and FREEZE shaves 10-12% off the tests.  I didn't
change that, but we also fire off a psql for each tuple for heap_page_items(),
with offset $N no less. That seems to be another 500ms.
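
Roughly, for the insert part I mean collapsing it into something like
this (a sketch; the column values here are invented, the real test's
differ):

$node->safe_psql(
	'postgres', qq(
	INSERT INTO public.test (a, b, c)
		SELECT 12345678, 'hello', repeat('x', 250)
		FROM generate_series(1, @{[ROWCOUNT]});
	VACUUM FREEZE public.test;
));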

Greetings,

Andres Freund
>From 3f67523bac084964c0e780fddd509f3464d32282 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 8 Mar 2023 11:37:05 -0800
Subject: [PATCH v5 1/8] amcheck: Fix ordering bug in update_cached_xid_range()

The initialization order in update_cached_xid_range() was wrong, calling
FullTransactionIdFromXidAndCtx() before setting
->next_xid. FullTransactionIdFromXidAndCtx() uses ->next_xid.

In most situations this will not cause visible issues, because the next call
to update_cached_xid_range() will use a less wrong ->next_xid. It's rare that
xids advance fast enough for this to be a problem.

Found while adding more asserts to the 64bit xid infrastructure.

Discussion: https://postgr.es/m/20230108002923.cyoser3ttmt63...@awork3.anarazel.de
Backpatch: 14-, where heapam verification was introduced
---
 contrib/amcheck/verify_heapam.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index 4fcfd6df72d..33c5b338959 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -1576,6 +1576,9 @@ FullTransactionIdFromXidAndCtx(TransactionId xid, const HeapCheckContext *ctx)
 {
 	uint32		epoch;
 
+	Assert(TransactionIdIsNormal(ctx->next_xid));
+	Assert(FullTransactionIdIsNormal(ctx->next_fxid));
+
 	if (!TransactionIdIsNormal(xid))
 		return FullTransactionIdFromEpochAndXid(0, xid);
 	epoch = EpochFromFullTransactionId(ctx->next_fxid);
@@ -1597,8 +1600,8 @@ update_cached_xid_range(HeapCheckContext *ctx)
 	LWLockRelease(XidGenLock);
 
 	/* And compute alternate versions of the same */
-	ctx->oldest_fxid = FullTransactionIdFromXidAndCtx(ctx->oldest_xid, ctx);
 	ctx->next_xid = XidFromFullTransactionId(ctx->next_fxid);
+	ctx->oldest_fxid = FullTransactionIdFromXidAndCtx(ctx->oldest_xid, ctx);
 }
 
 /*
-- 
2.38.0

>From c376f9c698c2c9cf9ad5316ceb96160611225430 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 8 Mar 2023 11:57:34 -0800
Subject: [PATCH v5 2/8] amcheck: Fix FullTransactionIdFromXidAndCtx() for xids
 before epoch 0

64bit xids can't represent xids before epoch 0 (see also be504a3e974). When
FullTransactionIdFromXidAndCtx() was passed such an xid, it'd create a 64bit
xid far into the future. Noticed while adding assertions in the course of
investigating be504a3e974, as amcheck's tests create such xids.

To fix the issue, just return FirstNormalFullTransactionId in this case. A
freshly initdb'd cluster already has a newer horizon. The most minimal version
of this would make the messages for some detected corruptions differently
inaccurate. To make those cases accurate, switch
FullTransactionIdFromXidAndCtx() to use the 32bit modulo difference between
xid and nextxid to compute the 64bit xid, yielding sensible "in the future" /
"in the past" answers.

Discussion: https://postgr.es/m/20230108002923.cyoser3ttmt63...@awork3.anarazel.de
Backpatch: 14-, where heapam verification was introduced
---
 src/bin/pg_amcheck/t/004_verify_heapam.pl | 28 +--
 contrib/amcheck/verify_heapam.c   | 33 +++
 2 files changed, 48 insertions(+), 13 deletions(-)

diff --git a/src/bin/pg_amcheck/t/004_verify_heapam.pl b/src/bin/pg_amcheck/t/004_verify_heapam.pl
index 215c30eaa8e..9984d0d9f87 100644
--- a/src/bin/pg_amcheck/t/004_verify_heapam.pl
+++ b/src/bin/pg_amcheck/t/004_verify_heapam.pl
@@ -217,7 +217,7 @@ my $rel = $node->safe_psql('postgres',
 my $relpath = "$pgdata/$rel";
 
 # Insert data and freeze public.test
-use constant ROWCOUNT => 16;
+use constant ROWCOUNT => 17;
 $node->safe_psql(
 	'postgres', qq(
 	INSERT INTO public.test (a, b, c)
@@ -378,23 +378,24 @@ for (my 

Re: Add error functions: erf() and erfc()

2023-03-08 Thread Nathan Bossart
On Wed, Mar 08, 2023 at 11:29:12PM +, Dean Rasheed wrote:
> On Wed, 8 Mar 2023 at 20:11, Nathan Bossart  wrote:
>> The man pages for these seem to indicate that underflow can occur.  Do we
>> need to check for that?
> 
> No, I don't think so. The docs indicate that if an underflow occurs,
> the correct result (after rounding) should be returned, so I think we
> should just return that result (as we do for tanh(), for example).

Makes sense.

I'm also wondering about whether we need the isinf() checks.  IIUC that
should never happen, but maybe you added that "just in case."

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)

2023-03-08 Thread Tomas Vondra
On 3/8/23 20:20, Daniel Gustafsson wrote:
>> On 8 Mar 2023, at 18:55, Christoph Berg  wrote:
> 
>> 18.04 will be EOL in a few weeks so it might be ok to just say it's
>> not supported, but removing the input file manually after calling lz4
>> would be an easy fix.
> 
> Is it reasonable to expect that this version of LZ4 can/will appear on any
> other platform outside of archeology?  Removing the file manually would be a
> trivial way to stabilize but if it's only expected to happen on platforms 
> which
> are long since EOL by the time 16 ships then the added complication could be
> hard to justify.
> 

IMO we should fix that. We have a bunch of buildfarm members running on
Ubuntu 18.04 (or older) - it's true none of them seems to be running TAP
tests. But considering how trivial the fix is ...

Barring objections, I'll push a fix early next week.
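
The fix I have in mind is simply to drop --rm from the lz4 invocation
and have the test unlink the input file itself, roughly (illustrative
only; the test's actual plumbing differs a bit):

# invoke lz4 without --rm, then clean up the uncompressed file ourselves
system_or_bail($ENV{'LZ4'}, '-z', '-f', $file, "$file.lz4");
unlink $file or die "unlink $file failed: $!";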


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add error functions: erf() and erfc()

2023-03-08 Thread Dean Rasheed
On Wed, 8 Mar 2023 at 20:11, Nathan Bossart  wrote:
>
> On Mon, Feb 27, 2023 at 12:54:35PM +, Dean Rasheed wrote:
> > + /*
> > +  * For erf, we don't need an errno check because it never overflows.
> > +  */
>
> > + /*
> > +  * For erfc, we don't need an errno check because it never overflows.
> > +  */
>
> The man pages for these seem to indicate that underflow can occur.  Do we
> need to check for that?
>

No, I don't think so. The docs indicate that if an underflow occurs,
the correct result (after rounding) should be returned, so I think we
should just return that result (as we do for tanh(), for example).

Regards,
Dean




Re: Add error functions: erf() and erfc()

2023-03-08 Thread Thomas Munro
On Tue, Feb 28, 2023 at 1:54 AM Dean Rasheed  wrote:
> Now that we have random_normal(), it seems like it would be useful to
> add the error functions erf() and erfc(), which I think are
> potentially useful to the people who will find random_normal() useful,
> and possibly others.
>
> An immediate use for erf() is that it allows us to do a
> Kolmogorov-Smirnov test for random_normal(), similar to the one for
> random().
>
> Both of these functions are defined in POSIX and C99, so in theory
> they should be available on all platforms. If that turns out not to be
> the case, then there's a commonly used implementation (e.g., see [1]),
> which we could include. I played around with that (replacing the
> direct bit manipulation stuff with frexp()/ldexp(), see pg_erf.c
> attached), and it appeared to be accurate to +/-1 ULP across the full
> range of inputs. Hopefully we won't need that though.

Hi,

No comment on the maths, but I'm pretty sure we won't need a fallback
implementation.  That stuff goes back to the math libraries of 80s
Unixes, even though it didn't make it into C until '99.  I just
checked the man pages for all our target systems and they all show it.
(There might be some portability details around the tgmath.h versions
on some systems, eg to support different float sizes, I dunno, but
you're using the plain math.h versions.)

I wonder if the SQL standard has anything to say about these, or the
related normal CDF.  I can't check currently but I doubt it, based on
searches and other systems' manuals.

Two related functions that also arrived in C99 are lgamma() and
tgamma().  If you'll excuse the digression, that reminded me of
something I was trying to figure out once, for a practical problem.
My statistics knowledge is extremely patchy, but I have been trying to
up my benchmarking game, and that led to a bit of remedial reading on
Student's t tests and related stuff.  A few shaven yaks later, I
understood that you could probably (if you like pain) do that sort of
stuff inside PostgreSQL using our existing aggregates, if you took the
approach of ministat[1].  That tool has a table of critical values
inside it, indexed by degrees-of-freedom (1-100) and confidence level
(80, 90, 95, 98, 99, 99.5), and one could probably write SQL queries
that spit out an answer like "p is less than 5%, ship it!", if we
stole that table.  But what if I actually want to know p?  Of course
you can do all that good stuff very easily with tools like R, SciPy
etc and maybe that's the best way to do it.  But Oracle, and I think
several other analytics-focused SQL systems, can do it in a very easy
built-in way.  I think to get at that you probably need the t CDF, and
in there[2] I see... Γ().  Huh.

[1] https://man.freebsd.org/cgi/man.cgi?query=ministat
[2] https://www.mathworks.com/help/stats/tcdf.html




Re: postgres_fdw, dblink, and CREATE SUBSCRIPTION security

2023-03-08 Thread Jacob Champion
On Wed, Mar 8, 2023 at 11:40 AM Robert Haas  wrote:
> On Wed, Mar 8, 2023 at 2:30 PM Jacob Champion  wrote:
> > I don't think I necessarily like that option better than SASL-style,
> > but hopefully that clarifies it somewhat?
>
> Hmm, yeah, I guess that's OK.

Okay, cool.

> I still don't love it, though. It feels
> more solid to me if the proxy can actually block the connections
> before they even happen, without having to rely on a server
> interaction to figure out what is permissible.

Sure. I don't see a way for the proxy to figure that out by itself,
though, going back to my asymmetry argument from before. Only the
server truly knows, at time of HBA processing, whether the proxy
itself has authority. If the proxy knew, it wouldn't be confused.

> I don't know what you mean by SASL-style, exactly.

That's the one where the server explicitly names all forms of
authentication, including the ambient ones (ANONYMOUS, EXTERNAL,
etc.), and requires the client to choose one before running any
actions on their behalf. That lets the require_auth machinery work for
this case, too.

--Jacob




Re: proposal - get_extension_version function

2023-03-08 Thread Tom Lane
Jacob Champion  writes:
> What I'm trying to pin down is the project's position on the reverse
> -- binary version X and SQL version X+1 -- because that seems
> generally unmaintainable, and I don't understand why an author would
> pay that tax if they could just avoid it by bailing out entirely. (If
> an author wants to allow that, great, but does everyone have to?)

Hard to say.  Our experience with the standard contrib modules is that
it really isn't much additional trouble; but perhaps more-complex modules
would have more interdependencies between functions.  In any case,
I fail to see the need for basing things on a catalog lookup rather
than embedding API version numbers in relevant C symbols.

regards, tom lane




Re: Ignoring BRIN for HOT updates (was: -udpates seems broken)

2023-03-08 Thread Matthias van de Meent
On Wed, 22 Feb 2023 at 14:14, Matthias van de Meent
 wrote:
>
> On Wed, 22 Feb 2023 at 13:15, Tomas Vondra
>  wrote:
> >
> > On 2/20/23 19:15, Matthias van de Meent wrote:
> > > Thanks. Based on feedback, attached is v2 of the patch, with as
> > > significant changes:
> > >
> > > - We don't store the columns we mention in predicates of summarized
> > > indexes in the hotblocking column anymore, they are stored in the
> > > summarized columns bitmap instead. This further reduces the chance of
> > > failing to apply HOT with summarizing indexes.
> >
> > Interesting idea. I need to think about the correctness, but AFAICS it
> > should work. Do we have any tests covering such cases?
>
> There is a test that checks that an update to the predicated column
> does update the index (on table brin_hot_2). However, the description
> was out of date, so I've updated that in v4.
>
> > > - The heaptuple header bit for summarized update in inserted tuples is
> > > replaced with passing an out parameter. This simplifies the logic and
> > > decreases chances of accidentally storing incorrect data.
> > >
> >
> > OK.
> >
> > 0002 proposes a minor RelationGetIndexPredicate() tweak, getting rid of
> > the repeated if/else branches. Feel free to discard, if you think the v2
> > approach is better.
>
> I agree that this is better, it's included in v4 of the patch, as attached.

I think the v4 patch addresses all review comments so far; and
considering that most of this patch was committed but then reverted
due to an issue in v15, and that said issue is fixed in this version,
I'm marking this as ready for committer.

Tomas, would you be up for that?

Kind regards,

Matthias van de Meent




Re: proposal - get_extension_version function

2023-03-08 Thread Jacob Champion
On Wed, Mar 8, 2023 at 1:47 PM Tom Lane  wrote:
> Pavel's proposed check would break that too.  There's going to be some
> interval where the SQL definitions are not in sync with the .so version,
> so you really want the .so to support at least two versions' worth of
> SQL objects.

I think we're in agreement that the extension must be able to load
with SQL version X and binary version X+1. (Pavel too, if I'm reading
the argument correctly -- the proposal is to gate execution paths, not
init time. And Pavel's not the only one implementing that today.)

What I'm trying to pin down is the project's position on the reverse
-- binary version X and SQL version X+1 -- because that seems
generally unmaintainable, and I don't understand why an author would
pay that tax if they could just avoid it by bailing out entirely. (If
an author wants to allow that, great, but does everyone have to?)

--Jacob




Re: meson: Optionally disable installation of test modules

2023-03-08 Thread Andrew Dunstan


On 2023-03-08 We 08:49, Nazir Bilal Yavuz wrote:

Hi,

On Mon, 6 Mar 2023 at 18:30, Andrew Dunstan  wrote:

There are two separate issues here, but let's deal with the Windows issue. 
Attached is the log output and also a listing of the runpython directory in the 
build directory.

Thanks for the logs but I couldn't understand the problem. Is there a
way to reproduce this?



Problem now apparently resolved.


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: buildfarm + meson

2023-03-08 Thread Andrew Dunstan


On 2023-03-08 We 14:22, Andres Freund wrote:

Hi,

On 2023-03-02 17:35:26 -0500, Andrew Dunstan wrote:

On 2023-03-02 Th 17:06, Andres Freund wrote:

Hi

On 2023-03-02 17:00:47 -0500, Andrew Dunstan wrote:

On 2023-03-01 We 16:32, Andres Freund wrote:

This is now working
on my MSVC test rig (WS2019, VS2019, Strawberry Perl), including TAP tests.
I do get a whole lot of annoying messages like this:

Unknown TAP version. The first line MUST be `TAP version `. Assuming
version 12.

The newest minor version has fixed that, it was a misunderstanding about /
imprecision in the tap 14 specification.


Unfortunately, meson v 1.0.1 appears to be broken on Windows, I had to
downgrade back to 1.0.0.

Is it possible that you're using a PG checkout from a few days ago? A
hack I used was invalidated by 1.0.1, but I fixed that already.

CI is running with 1.0.1:
https://cirrus-ci.com/task/5806561726038016?logs=configure#L8


No, running against PG master tip. I'll get some details - it's not too hard
to switch back and forth.

Any more details?



I was held up by difficulties even with meson 1.0.0 (the test modules 
stuff). Now that I again have a clean build with meson 1.0.0 on Windows 
as a baseline, I will get back to trying meson 1.0.1.



cheers


andrew


--Andrew Dunstan

EDB:https://www.enterprisedb.com


Re: Improve logging when using Huge Pages

2023-03-08 Thread Nathan Bossart
On Wed, Feb 15, 2023 at 10:13:17AM -0800, Nathan Bossart wrote:
> On Tue, Feb 14, 2023 at 07:32:56PM -0600, Justin Pryzby wrote:
>> On Mon, Feb 13, 2023 at 08:18:52PM -0800, Nathan Bossart wrote:
>>> I'm curious why you chose to make this a string instead of an enum.  There
>>> might be little practical difference, but since there are only three
>>> possible values, I wonder if it'd be better form to make it an enum.
>> 
>> It takes more code to write as an enum - see 002.txt.  I'm not convinced
>> this is better.
>> 
>> But your comment made me fix its , and reconsider the strings,
>> which I changed to active={unknown/true/false} rather than {unk/on/off}.
>> It could also be active={unknown/yes/no}...
> 
> I think unknown/true/false is fine.  I'm okay with using a string if no one
> else thinks it should be an enum (or a bool).

There's been no response for this, so I guess we can proceed with a string
GUC.

+Reports whether huge pages are in use by the current instance.
+See  for more information.

I still think we should say "server" in place of "current instance" here.

+   {"huge_pages_active", PGC_INTERNAL, PRESET_OPTIONS,
+   gettext_noop("Indicates whether huge pages are in 
use."),
+   NULL,
+   GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | 
GUC_RUNTIME_COMPUTED
+   },

I don't think we need to use GUC_RUNTIME_COMPUTED.  'postgres -C' seems to
always report "unknown" for this GUC, so all this would do is cause that
command to error unnecessarily when the server is running.

It might be worth documenting exactly what "unknown" means.  IIUC you'll
only ever see "on" or "off" via SHOW or pg_settings, which doesn't seem
tremendously obvious.
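
Something along these lines in the docs would cover it (a sketch of the
expected behavior, as I understand the patch):

-- on a running server, only two values should ever be visible:
SHOW huge_pages_active;            -- 'on' or 'off'
-- "unknown" only shows up outside a running server, e.g. from
--   postgres -C huge_pages_active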

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"

2023-03-08 Thread Anton A. Melnikov

On 08.03.2023 07:28, Thomas Munro wrote:

Sorry, I was confused; please ignore that part.  We don't have a copy
of the control file anywhere else.  (Perhaps we should, but that could
be a separate topic.)


That’s all right! Fully agreed that this is a possible separate topic.

Sincerely yours,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: RFC: logical publication via inheritance root?

2023-03-08 Thread Jacob Champion
On Tue, Mar 7, 2023 at 2:40 AM Aleksander Alekseev
 wrote:
> So far I only have a couple of nitpicks, mostly regarding the code coverage 
> [1]:

Yeah, I need to work on error cases and their coverage in general.
There are more cases that I need to reject as well (marked TODO).

> +Datum
> +pg_get_publication_rels_to_sync(PG_FUNCTION_ARGS)
> +{
> +#define NUM_SYNC_TABLES_ELEM   1
> ```
>
> What is this macro for?

Whoops, that's cruft from an intermediate implementation. Will fix in
the next draft.

> +{ oid => '8137', descr => 'get list of tables to copy during initial sync',
> +  proname => 'pg_get_publication_rels_to_sync', prorows => '10', proretset => 't',
> +  provolatile => 's', prorettype => 'regclass', proargtypes => 'regclass text',
> +  proargnames => '{rootid,pubname}',
> +  prosrc => 'pg_get_publication_rels_to_sync' },
> ```
>
> Something seems odd here. Is there a chance that it can return
> different results even within one statement, especially considering
> the fact that pg_set_logical_root() is VOLATILE? Maybe
> pg_get_publication_rels_to_sync() should be VOLATILE too [2].

Hm. I'm not sure how this all should behave in the face of concurrent
structural changes, or how the existing publication queries handle
that same situation (e.g. partition attachment), so that's definitely
something for me to look into. At a glance, I'm not sure that
returning different results for the same table is more correct. And I
feel like a VOLATILE implementation might significantly impact the
JOIN/LATERAL performance in the pg_dump query? But I don't really know
how that's planned.

> +{ oid => '8136', descr => 'mark a table root for logical replication',
> +  proname => 'pg_set_logical_root', provolatile => 'v', proparallel => 'u',
> +  prorettype => 'void', proargtypes => 'regclass regclass',
> +  prosrc => 'pg_set_logical_root' },
> ```
>
> Shouldn't we also have pg_unset(reset?)_logical_root?

My initial thought was that a one-way "upgrade" makes things easier to
reason about. But a one-way function is not good UX, so maybe we
should provide that. We'd need to verify and test what happens if you
undo/"detach" the logical tree during replication.

If it's okay to blindly replace any existing inhseqno with, say, 1 (on
a table with single inheritance), then we can reverse the process
safely. If not, we can't -- at least not with the current
implementation -- because we don't save the previous value anywhere.
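
For reference, usage of the proposed API looks roughly like this (table
and publication names invented; the argument order of
pg_set_logical_root() is per my reading of the current draft):

CREATE TABLE root_tbl (id int, payload text);
CREATE TABLE child_tbl () INHERITS (root_tbl);
SELECT pg_set_logical_root('child_tbl', 'root_tbl');
-- which tables does initial sync need to copy for this root?
SELECT * FROM pg_get_publication_rels_to_sync('root_tbl', 'mypub');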

Thanks for the review!

--Jacob




Re: proposal - get_extension_version function

2023-03-08 Thread Tom Lane
Jacob Champion  writes:
> On Wed, Mar 8, 2023 at 11:11 AM Pavel Stehule  wrote:
>> installation from rpm or deb packages

> Right, but I thought the safe order for a downgrade was to issue the
> SQL downgrade first (thus putting the system back into the
> post-upgrade state), and only then replacing the packages with prior
> versions.

Pavel's proposed check would break that too.  There's going to be some
interval where the SQL definitions are not in sync with the .so version,
so you really want the .so to support at least two versions' worth of
SQL objects.

regards, tom lane




Re: SQL/JSON revisited

2023-03-08 Thread Andrew Dunstan



On 2023-03-05 Su 22:18, Amit Langote wrote:

On Tue, Feb 28, 2023 at 8:36 PM Amit Langote  wrote:

On Mon, Feb 27, 2023 at 4:45 PM Amit Langote  wrote:

On Tue, Feb 21, 2023 at 2:25 AM Andres Freund  wrote:

Evaluating N expressions for a json table isn't a good approach, both memory
and CPU efficiency wise.

Are you referring to JsonTableInitOpaque() initializing all these
sub-expressions of JsonTableParent, especially colvalexprs, using N
*independent* ExprStates?  That could perhaps be made to work by
making JsonTableParent be an expression recognized by execExpr.c, so
that a single ExprState can store the steps for all its
sub-expressions, much like JsonExpr is.  I'll give that a try, though
I wonder if the semantics of making this work in a single
ExecEvalExpr() call will mismatch that of the current way, because
different sub-expressions are currently evaluated under different APIs
of TableFuncRoutine.

Hmm, the idea to turn JSON_TABLE into a single expression turned out
to be a non-starter after all, because, unlike JsonExpr, it can
produce multiple values.  So there must be an ExprState for computing
each column of its output rows.  As I mentioned in my other reply,
TableFuncScanState has a list of ExprStates anyway for
TableFunc.colexprs.  What we could do is move the ExprStates of
TableFunc.colvalexprs into TableFuncScanState instead of making that
part of the JSON_TABLE opaque state, as I've done in the attached
updated patch.

Here's another version in which I've also moved the ExprStates of
PASSING args into TableFuncScanState instead of keeping them in
JSON_TABLE opaque state.  That means all the subsidiary ExprStates of
TableFuncScanState are now initialized only once during
ExecInitTableFuncScan().  Previously, JSON_TABLE related ones would be
initialized on every call of JsonTableInitOpaque().

I've also done some cosmetic changes such as renaming the
JsonTableContext to JsonTableParseContext in parse_jsontable.c and to
JsonTableExecContext in jsonpath_exec.c.




Hi, I have just spent some time going through the first five patches 
(i.e. those that precede the JSONTABLE patches) and Andres's comments in 
his review upthread.

AFAICT there are only two possible matters of concern that remain, both 
regarding the grammar.



First is this general complaint:



This stuff adds quite a bit of complexity to the parser. Do we really
need like a dozen new rules here?


I mentioned that more than a year ago, I think, without anybody taking 
the matter up, so I didn't pursue it. I guess I should have.


There are probably some fairly easy opportunities to reduce the number 
of non-terminals introduced here (e.g. I think json_aggregate_func could 
possibly be expanded in place without introducing 
json_object_aggregate_constructor and json_array_aggregate_constructor). 
I'm going to make an attempt at that, at least to pick some low hanging 
fruit. But in the end I think we are going to be left with a significant 
expansion of the grammar rules, more or less forced on us by the way the 
SQL Standards Committee rather profligately invents new ways of 
contorting the grammar.


Second is this complaint:



+json_behavior_empty_array:
+   EMPTY_P ARRAY   { $$ = makeJsonBehavior(JSON_BEHAVIOR_EMPTY_ARRAY, NULL); }
+   /* non-standard, for Oracle compatibility only */
+   | EMPTY_P   { $$ = makeJsonBehavior(JSON_BEHAVIOR_EMPTY_ARRAY, NULL); }
+   ;
Do we really want to add random oracle compat crud here?



I think this case is pretty harmless, and it literally involves one line 
of code, so I'm inclined to leave it.


These both seem like things not worth holding up progress for, and I 
think it would be good to get these patches committed as soon as 
possible. My intention is to commit them (after some grammar 
adjustments) plus their documentation in the next few days. That would 
leave the JSONTABLE patches still to go. They are substantial, but a far 
more manageable chunk of work for some committer (not me) once we get 
this foundational piece in.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Can we let extensions change their dumped catalog schemas?

2023-03-08 Thread Jacob Champion
On Tue, Feb 7, 2023 at 10:16 AM Jacob Champion  wrote:
> Even more concretely, here's a draft patch. There's no nuance --
> setting dump_version affects all past versions of the extension, and
> it can't be turned off at restore time. So extensions opting in would
> have to be written to be side-by-side installable. (Ours is, in its
> own way, but the PGDG installers don't allow it -- which maybe
> highlights a weakness in this approach.) I'm still not sure if this
> addresses Tom's concerns, or just adds new ones.

Any general thoughts on this approach? I don't think it's baked enough
for registration yet, but I also don't know what approach would be
better.

Given the recent chatter around extension versions in other threads
[1, 2], I feel like there is a big gap between the Postgres core
expectations and what extension authors are actually doing when it
comes to handling version upgrades. I'd like to chip away at that,
somehow.

Thanks,
--Jacob

[1] https://www.postgresql.org/message-id/212074.1678301349%40sss.pgh.pa.us
[2] 
https://www.postgresql.org/message-id/CA%2BTgmoYqK6nfP15SjuyO6t5jOmymG%3DqO7JOOVJdTOj96L0XJ1Q%40mail.gmail.com




Re: Get rid of PgStat_BackendFunctionEntry

2023-03-08 Thread Nathan Bossart
On Mon, Feb 13, 2023 at 10:06:27AM +0100, Drouvot, Bertrand wrote:
> Please find attached a patch proposal to $SUBJECT: it would allow to simplify
> even more the work done to generate pg_stat_get_xact*() functions with Macros.
> 
> This is a follow up for [1] where it has been suggested
> to get rid of PgStat_BackendFunctionEntry in a separate patch.

Looks reasonable to me.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




allow_in_place_tablespaces vs. pg_basebackup

2023-03-08 Thread Robert Haas
Commit 7170f2159fb21b62c263acd458d781e2f3c3f8bb, which introduced
in-place tablespaces, didn't make any adjustments to pg_basebackup.
The resulting behavior is pretty bizarre.

If you take a plain-format backup using pg_basebackup -Fp, then the
files in the in-place tablespace are backed up, but if you take a
tar-format backup using pg_basebackup -Ft, then they aren't.

I had at first wondered how this could even be possible, since after
all a plain format backup is little more than a tar-format backup that
pg_basebackup chooses to extract for you. The answer turns out to be
that a tar-format backup activates the server's TABLESPACE_MAP option,
and a plain-format backup doesn't, and so pg_basebackup can handle
this case differently depending on the value of that flag, and does.
Even for a plain-format backup, the server's behavior is not really
correct: I think what's happening is that the tablespace files are
getting included in the base.tar file generated by the server, rather
than in a separate ${TSOID}.tar file as would normally happen for a
user-defined tablespace. But because the tar files get extracted before
the user lays eyes on them, the user-visible consequences are masked,
at least in the cases I've tested so far.
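
A minimal reproduction of the discrepancy looks like this (a sketch;
paths and names are examples):

# create an in-place tablespace and put a table in it
PGOPTIONS="-c allow_in_place_tablespaces=on" \
    psql -c "CREATE TABLESPACE ts1 LOCATION ''"
psql -c "CREATE TABLE t (a int) TABLESPACE ts1"

pg_basebackup -D /tmp/backup_plain -Fp  # pg_tblspc files are backed up
pg_basebackup -D /tmp/backup_tar -Ft    # ...but missing from the tars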

I'm not sure how messy it's going to be to clean this up. I think each
tablespace really needs to go into a separate ${TSOID}.tar file,
because we've got tons of code both on the server side and in
pg_basebackup that assumes that's how things work, and having in-place
tablespaces be a rare exception to that rule seems really unappealing.
This of course also implies that files in an in-place tablespace must
not go missing from the backup altogether.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: Moving forward with TDE

2023-03-08 Thread Stephen Frost
Greetings,

* Chris Travers (chris.trav...@gmail.com) wrote:
> From the documentation, the primary threat model of TDE is to prevent 
> decryption of data from archived wal segments (and data files), for example 
> on a backup system.  While there are other methods around this problem to 
> date, I think that this feature is worth pursuing for that reason.  I want to 
> address a couple of reasons for this and then go into some reservations I 
> have about how some of this is documented.

Agreed, though the latest efforts include an option for *authenticated*
encryption as well as unauthenticated.  That makes it much more
difficult to make undetected changes to the data that's protected by
the authenticated encryption being used.

> There are existing workarounds for ensuring encryption at rest, but these have 
> a number of problems.  Encryption passphrases end up lying around the system 
> in various places.  Key rotation is often difficult.  And one mistake can 
> easily render all efforts ineffective.  TDE solves these problems.  The 
> overall design from the internal docs looks solid.  This definitely is 
> something I would recommend for many users.

There's clearly user demand for it as there's a number of organizations
who have forks which are providing it in one shape or another.  This
kind of splintering of the community is actually an actively bad thing
for the project and is part of what killed Unix, by at least some pretty
reputable accounts, in my view.

> I have a couple small caveats though.  Encryption of data is a large topic 
> and there isn't a one-size-fits-all solution to industrial or state 
> requirements.  Having all this key management available in PostgreSQL is a 
> very good thing.  In the long run it is likely to end up being extensible, and 
> therefore both more powerful and offering a wider range of choices for 
> solution architects.  Implementing encryption is also something that is easy 
> to mess up.  For this reason I think it would be great if we had a 
> standardized format for discussing encryption options that we could use going 
> forward.  I don't think that should be held against this patch but I think we 
> need to start discussing it now because it will be a bigger problem later.

Do you have a suggestion as to the format to use?

> A second caveat I have is that key management is a topic where you really 
> need a good overview of internals in order to implement effectively.  If you 
> don't know how an SSL handshake works or what is in a certificate, you can 
> easily make mistakes in setting up SSL.  I can see the same thing happening 
> here.  For example, I don't think it would be safe to leave the KEK on an 
> encrypted filesystem that is decrypted at runtime (or at least I wouldn't 
> consider that safe -- your appetite for risk may vary).

Agreed that we should document this and make clear that the KEK is
necessary for server start but absolutely should be kept as safe as
possible and certainly not stored on disk somewhere nearby the encrypted
cluster.

> My proposal would be to have build a template for encryption options in the 
> documentation.  This could include topics like SSL as well.  In such a 
> template we'd have sections like "Threat model,"  "How it works," 
> "Implementation Requirements" and so forth.  Again I don't think this needs 
> to be part of the current patch but I think it is something we need to start 
> thinking about now.  Maybe after this goes in, I can present a proposed 
> documentation patch.

I'm not entirely sure that it makes sense to lump this and TLS in the
same place as they end up being rather independent at the end of the
day.  If you have ideas for how to improve the documentation, I'd
certainly encourage you to go ahead and work on that and submit it as a
patch rather than waiting for this to actually land in core.  Having
good and solid documentation is something that will help this get in,
after all, and to the extent that it's covering existing topics like
TLS, those could likely be included independently and that would be of
benefit to everyone.

> I will also note that I don't consider myself to be very qualified on topics 
> like encryption.  I can reason about key management to some extent but some 
> implementation details may be beyond me.  I would hope we could get some 
> extra review on this patch set soon.

Certainly agree with you there though there's an overall trajectory of
patches involved in all of this that's a bit deep.  The plan is to
discuss that at PGCon (On the Road to TDE) and at the PGCon
Unconference after.  I certainly hope those interested will be there.
I'm also happy to have a call with anyone interested in this effort
independent of that, of course.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [PoC] Reducing planning time when tables have many partitions

2023-03-08 Thread David Rowley
On Thu, 9 Mar 2023 at 01:34, Alvaro Herrera  wrote:
> David, do you intend to continue to be involved in reviewing this one?

Yes. I'm currently trying to make a few Bitmapset improvements which
include the change made in this thread's 0001 patch over on [1].

For the main patch, I've been starting to wonder if it should work
completely differently.  Instead of adding members for partitioned and
inheritance children, we could just translate the Vars from child to
top-level parent and find the member that way. I wondered if this
method might be even faster as it would forego
add_child_rel_equivalences(). I think we'd still need em_is_child for
UNION ALL children.  So far, I've not looked into this in detail. I
was hoping to find an idea that would allow some means to have the
planner realise that a LIST partition which allows a single Datum
could skip pushing base quals which are constantly true. i.e:

create table lp (a int) partition by list(a);
create table lp1 partition of lp for values in(1);
explain select * from lp where a = 1;

 Seq Scan on lp1 lp  (cost=0.00..41.88 rows=13 width=4)
   Filter: (a = 1)

David

[1] 
https://postgr.es/m/caaphdvq9eq0w_afugrb6ba28ieuqn4zm5uwqxy7+lmzjjc+...@mail.gmail.com




Re: proposal - get_extension_version function

2023-03-08 Thread Jacob Champion
On Wed, Mar 8, 2023 at 11:22 AM Pavel Stehule  wrote:
> There is agreement - I call this check from functions.

I think pg_auto_failover does this too, or at least used to.

Timescale does strict compatibility checks as well. It's not entirely
comparable to your implementation, though.

--Jacob




Re: proposal - get_extension_version function

2023-03-08 Thread Jacob Champion
On Wed, Mar 8, 2023 at 11:11 AM Pavel Stehule  wrote:
> installation from rpm or deb packages

Right, but I thought the safe order for a downgrade was to issue the
SQL downgrade first (thus putting the system back into the
post-upgrade state), and only then replacing the packages with prior
versions.

--Jacob




Re: proposal - get_extension_version function

2023-03-08 Thread Jacob Champion
On Wed, Mar 8, 2023 at 11:18 AM Tom Lane  wrote:
>
> Jacob Champion  writes:
> > On Wed, Mar 8, 2023 at 10:49 AM Tom Lane  wrote:
> >> This is a bad idea.  How will you do extension upgrades, if the new .so
> >> won't run till you apply the extension upgrade script but the old .so
> >> malfunctions as soon as you do?
>
> > Which upgrade paths allow you to have an old .so with a new version
> > number? I didn't realize that was an issue.
>
> More usually, it's the other way around: new .so but SQL objects not
> upgraded yet.  That's typical in a pg_upgrade to a new major version,
> where the new installation may have a newer extension .so than the
> old one did.

That's the opposite case though; I think the expectation of backwards
compatibility from C to SQL is very different from (infinite?)
forwards compatibility from C to SQL.

> If you have old .so and new SQL objects, it's likely that at least
> some of those new objects won't work --- but it's good to not break
> any more functionality than you have to.

To me it doesn't seem like a partial break is safer than refusing to
execute in the face of old-C-and-new-SQL -- assuming it's safe at all?
A bailout seems pretty reasonable in that case.

Thanks,
--Jacob




RE: [PATCH] Support % wildcard in extension upgrade filenames

2023-03-08 Thread Regina Obe
> I wonder if a solution to this problem might be to provide some kind of a
> version map file. Let's suppose that the map file is a bunch of lines of the 
> form
> X Y Z, where X, Y, and Z are version numbers. The semantics could be: we (the
> extension authors) promise that if you want to upgrade from X to Z, it 
> suffices
> to run the script that knows how to upgrade from Y to Z. This would address
> Tom's concern, because if you write a master upgrade script, you have to
> explicitly declare the versions to which it applies. But it gives you a way 
> to do
> that which does not require creating a bazillion empty files. Instead you can
> just declare that if you're running the upgrade script from
> 14.36.279 to 14.36.280, that also suffices for an upgrade from
> 14.36.278 or 14.36.277 or 14.36.276 or 
> 
> --
> Robert Haas
> EDB: http://www.enterprisedb.com

That's what I was proposing as a compromise.
Just not sure what the appropriate name would be for the map file.

Originally I thought maybe we could put it in the .control file, but decided
it's better to have it as a separate file; that way we don't need to change the
control file and can just add this extra file for PostgreSQL 16+.

Then the question arises of what happens if you have such a map file and also
have upgrade script files (the old way).
Would we meld the two, so the map file is used to simulate the file paths and
these get added on as extra target paths, or should we just assume that if
there is a map file, it takes precedence over any paths inferred from files in
the system?
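
For example (purely illustrative, using the X Y Z semantics Robert sketches;
the file name is made up):

    # hypothetical postgis.upgrademap: from  script-from  to
    14.36.276 14.36.279 14.36.280
    14.36.277 14.36.279 14.36.280
    14.36.278 14.36.279 14.36.280

i.e. upgrading from any of those older versions to 14.36.280 would just run
the 14.36.279--14.36.280 script.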

Thanks,
Regina





Re: Add error functions: erf() and erfc()

2023-03-08 Thread Nathan Bossart
On Mon, Feb 27, 2023 at 12:54:35PM +, Dean Rasheed wrote:
> + /*
> +  * For erf, we don't need an errno check because it never overflows.
> +  */

> + /*
> +  * For erfc, we don't need an errno check because it never overflows.
> +  */

The man pages for these seem to indicate that underflow can occur.  Do we
need to check for that?
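
If a guard turns out to be needed, I'd expect it to follow the errno
convention used elsewhere in float.c, roughly like this (a sketch only;
whether erf()/erfc() actually set ERANGE on underflow on our supported
platforms is exactly the open question):

	errno = 0;
	result = erfc(arg1);

	/* hypothetical: the man pages say erfc() may underflow for large inputs */
	if (errno == ERANGE)
		float_underflow_error();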

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Non-superuser subscription owners

2023-03-08 Thread Andres Freund
Hi,

On 2023-02-07 16:56:55 -0500, Robert Haas wrote:
> On Wed, Feb 1, 2023 at 4:02 PM Andres Freund  wrote:
> > > + /* Is the use of a password mandatory? */
> > > + must_use_password = MySubscription->passwordrequired &&
> > > + !superuser_arg(MySubscription->owner);
> >
> > There's a few repetitions of this - perhaps worth putting into a helper?
> 
> I don't think so. It's slightly different each time, because it's
> pulling data out of different data structures.
> 
> > This still leaks the connection on error, no?
> 
> I've attempted to fix this in v4, attached.

Hm - it still feels wrong that we error out in case of failure, despite the
comment to the function saying:
 * Returns NULL on error and fills the err with palloc'ed error message.

Other than this, the change looks ready to me.

Greetings,

Andres Freund




Re: postgres_fdw, dblink, and CREATE SUBSCRIPTION security

2023-03-08 Thread Robert Haas
On Wed, Mar 8, 2023 at 2:30 PM Jacob Champion  wrote:
> > No, that's the opposite, and exactly the point I'm trying to make. In
> > that case, the SERVER says what it's willing to accept, and the CLIENT
> > decides whether or not to provide that. In your proposal, the client
> > tells the server which authentication methods to accept.
>
> Ah, that's a (the?) sticking point. In my example, the client doesn't
> tell the server which methods to accept. The client tells the server
> which method the *client* has the ability to use. (Or, implicitly,
> which methods it refuses to use.)
>
> That shouldn't lose any power, security-wise, because the server is
> looking for an intersection of the two sets. And the client already
> has the power to do that for almost every form of authentication,
> except the ambient methods.
>
> I don't think I necessarily like that option better than SASL-style,
> but hopefully that clarifies it somewhat?

Hmm, yeah, I guess that's OK. I still don't love it, though. It feels
more solid to me if the proxy can actually block the connections
before they even happen, without having to rely on a server
interaction to figure out what is permissible.

I don't know what you mean by SASL-style, exactly.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: postgres_fdw, dblink, and CREATE SUBSCRIPTION security

2023-03-08 Thread Jacob Champion
On Tue, Mar 7, 2023 at 11:04 AM Robert Haas  wrote:
>
> On Thu, Feb 9, 2023 at 4:46 PM Jacob Champion  wrote:
> > On 2/6/23 08:22, Robert Haas wrote:
> > > I don't think that's quite the right concept. It seems to me that the
> > > client is responsible for informing the server of what the situation
> > > is, and the server is responsible for deciding whether to allow the
> > > connection. In your scenario, the client is not only communicating
> > > information ("here's the password I have got") but also making demands
> > > on the server ("DO NOT authenticate using anything else"). I like the
> > > first part fine, but not the second part.
> >
> > For what it's worth, making a negative demand during authentication is
> > pretty standard: if you visit example.com and it tells you "I need your
> > OS login password and Social Security Number to authenticate you," you
> > have the option of saying "no thanks" and closing the tab.
>
> No, that's the opposite, and exactly the point I'm trying to make. In
> that case, the SERVER says what it's willing to accept, and the CLIENT
> decides whether or not to provide that. In your proposal, the client
> tells the server which authentication methods to accept.

Ah, that's a (the?) sticking point. In my example, the client doesn't
tell the server which methods to accept. The client tells the server
which method the *client* has the ability to use. (Or, implicitly,
which methods it refuses to use.)

That shouldn't lose any power, security-wise, because the server is
looking for an intersection of the two sets. And the client already
has the power to do that for almost every form of authentication,
except the ambient methods.

I don't think I necessarily like that option better than SASL-style,
but hopefully that clarifies it somewhat?

> I don't think we're really very far apart here, but for some reason
> the terminology seems to be giving us some trouble.

Agreed.

> Of course, there's
> also the small problem of actually finding the time to do some
> meaningful work on this stuff, rather than just talking

Agreed :)

--Jacob




Re: Add shared buffer hits to pg_stat_io

2023-03-08 Thread Andres Freund
On 2023-03-08 13:44:32 -0500, Melanie Plageman wrote:
> However, I am concerned that, while unlikely, this could be flakey.
> Something could happen to force all of those blocks out of shared
> buffers (even though they were just read in) before we hit them.

You could make the test query a simple nested loop self-join, that'll prevent
the page being evicted, because it'll still be pinned on the outer side, while
generating hits on the inner side.
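
Something along these lines, presumably (a sketch; the table name is
hypothetical, and the SET LOCALs just force the nested loop):

	BEGIN;
	SET LOCAL enable_hashjoin = off;
	SET LOCAL enable_mergejoin = off;
	-- outer side keeps each page pinned while the inner side re-reads it
	SELECT count(*) FROM test_io_shared t1 JOIN test_io_shared t2 USING (a);
	COMMIT;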

Greetings,

Andres Freund




Re: proposal - get_extension_version function

2023-03-08 Thread Pavel Stehule
On Wed, Mar 8, 2023 at 8:17 PM Tom Lane wrote:

> Jacob Champion  writes:
> > On Wed, Mar 8, 2023 at 10:49 AM Tom Lane  wrote:
> >> This is a bad idea.  How will you do extension upgrades, if the new .so
> >> won't run till you apply the extension upgrade script but the old .so
> >> malfunctions as soon as you do?
>
> > Which upgrade paths allow you to have an old .so with a new version
> > number? I didn't realize that was an issue.
>
> More usually, it's the other way around: new .so but SQL objects not
> upgraded yet.  That's typical in a pg_upgrade to a new major version,
> where the new installation may have a newer extension .so than the
> old one did.  You can't run ALTER EXTENSION UPGRADE if the new .so
> refuses to load with the old SQL objects ... which AFAICS is exactly
> what Pavel's sketch would do.
>
> If you have old .so and new SQL objects, it's likely that at least
> some of those new objects won't work --- but it's good to not break
> any more functionality than you have to.  That's why I suggest
> managing the compatibility checks on a per-function level rather
> than trying to have an overall version check.
>

There is agreement - I call this check from functions.


https://github.com/okbob/plpgsql_check/commit/b0970ff319256207ffe5ba5f18b2a7476c7136f9

Regards

Pavel


> regards, tom lane
>


Re: buildfarm + meson

2023-03-08 Thread Andres Freund
Hi,

On 2023-03-02 17:35:26 -0500, Andrew Dunstan wrote:
> On 2023-03-02 Th 17:06, Andres Freund wrote:
> > Hi
> > 
> > On 2023-03-02 17:00:47 -0500, Andrew Dunstan wrote:
> > > On 2023-03-01 We 16:32, Andres Freund wrote:
> > > > > This is now working
> > > > > on my MSVC test rig (WS2019, VS2019, Strawberry Perl), including TAP 
> > > > > tests.
> > > > > I do get a whole lot of annoying messages like this:
> > > > > 
> > > > > Unknown TAP version. The first line MUST be `TAP version `. 
> > > > > Assuming
> > > > > version 12.
> > > > The newest minor version has fixed that, it was a misunderstanding 
> > > > about /
> > > > imprecision in the tap 14 specification.
> > > > 
> > > Unfortunately, meson v 1.0.1 appears to be broken on Windows, I had to
> > > downgrade back to 1.0.0.
> > Is it possible that you're using a PG checkout from a few days ago? A
> > hack I used was invalidated by 1.0.1, but I fixed that already.
> > 
> > CI is running with 1.0.1:
> > https://cirrus-ci.com/task/5806561726038016?logs=configure#L8
> > 
> 
> No, running against PG master tip. I'll get some details - it's not too hard
> to switch back and forth.

Any more details?

Greetings,

Andres Freund




Re: buildfarm + meson

2023-03-08 Thread Andres Freund
Hi,

On 2023-03-08 09:41:57 -0500, Andrew Dunstan wrote:
> On 2023-03-08 We 08:57, Andrew Dunstan wrote:
> > On 2023-03-07 Tu 20:29, Andres Freund wrote:
> > > On 2023-03-07 15:47:54 -0500, Andrew Dunstan wrote:
> > > Here's a prototype for that.
> > > 
> > > It adds an install-test-files target. Because we want to install into a 
> > > normal
> > > directory, I removed the necessary munging of the target paths from
> > > meson.build and moved it into install-test-files. I also added DESTDIR
> > > support, so that installing can redirect the directory if desired. That's 
> > > used
> > > for the tmp_install/ installation now.
> > > 
> > > I didn't like the number of arguments necessary for install_test_files, 
> > > so I
> > > changed it to use
> > > 
> > > --install target list of files
> > > 
> > > which makes it easier to use for further directories, if/when we need 
> > > them.
> > > 
> > 
> > So if I understand this right, the way to use this would be something
> > like:
> > 
> > 
> >     local $ENV{DESTDIR} = $installdir;
> > 
> >     run_log("meson compile -C $pgsql install-test-files");
> > 
> > 
> > Is that right? I did that but it didn't work :-(

Bilal's explanation of why that doesn't work was right. You'd only want to use
DESTDIR to install into somewhere other than the real install path.


> OK, tried without the `local` line and it worked, so let's push this.

Done. It's possible that we might need some more refinement here, but I thought
it important to unblock the buildfarm work...

Greetings,

Andres Freund




Re: lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)

2023-03-08 Thread Daniel Gustafsson
> On 8 Mar 2023, at 18:55, Christoph Berg  wrote:

> 18.04 will be EOL in a few weeks so it might be ok to just say it's
> not supported, but removing the input file manually after calling lz4
> would be an easy fix.

Is it reasonable to expect that this version of LZ4 can/will appear on any
other platform outside of archeology?  Removing the file manually would be a
trivial way to stabilize, but if it's only expected to happen on platforms
which are long since EOL by the time 16 ships, then the added complication
could be hard to justify.

--
Daniel Gustafsson





Re: proposal - get_extension_version function

2023-03-08 Thread Pavel Stehule
On Wed, Mar 8, 2023 at 7:49 PM Tom Lane wrote:

> Pavel Stehule  writes:
> > I try to write a safeguard check that ensures the expected extension
> > version for an extension library.
>
> This is a bad idea.  How will you do extension upgrades, if the new .so
> won't run till you apply the extension upgrade script but the old .so
> malfunctions as soon as you do?  You need to make the C code as forgiving
> as possible, not as unforgiving as possible.
>

This method doesn't break updates. It allows any registration; it just
doesn't allow execution with an unsynced SQL API.


>
> If you have C-level ABI changes you need to make, the usual fix is to
> include some sort of version number in the C name of each individual
> function you've changed, so that calls made with the old or the new SQL
> definition will be routed to the right place.  There are multiple
> examples of this in contrib/.
>

In my extensions, like plpgsql_check, I don't want to promise a compatible ABI.
I support PostgreSQL 10 .. 16, and I really don't want to multiply the code for
every historic input/output.





>
> regards, tom lane
>


Re: proposal - get_extension_version function

2023-03-08 Thread Tom Lane
Jacob Champion  writes:
> On Wed, Mar 8, 2023 at 10:49 AM Tom Lane  wrote:
>> This is a bad idea.  How will you do extension upgrades, if the new .so
>> won't run till you apply the extension upgrade script but the old .so
>> malfunctions as soon as you do?

> Which upgrade paths allow you to have an old .so with a new version
> number? I didn't realize that was an issue.

More usually, it's the other way around: new .so but SQL objects not
upgraded yet.  That's typical in a pg_upgrade to a new major version,
where the new installation may have a newer extension .so than the
old one did.  You can't run ALTER EXTENSION UPGRADE if the new .so
refuses to load with the old SQL objects ... which AFAICS is exactly
what Pavel's sketch would do.

If you have old .so and new SQL objects, it's likely that at least
some of those new objects won't work --- but it's good to not break
any more functionality than you have to.  That's why I suggest
managing the compatibility checks on a per-function level rather
than trying to have an overall version check.

regards, tom lane




Re: proposal - get_extension_version function

2023-03-08 Thread Pavel Stehule
On Wed, Mar 8, 2023 at 8:04 PM Jacob Champion wrote:

> On Wed, Mar 8, 2023 at 10:49 AM Tom Lane  wrote:
> > This is a bad idea.  How will you do extension upgrades, if the new .so
> > won't run till you apply the extension upgrade script but the old .so
> > malfunctions as soon as you do?
>
> Which upgrade paths allow you to have an old .so with a new version
> number? I didn't realize that was an issue.
>

installation from rpm or deb packages



> --Jacob
>


Re: proposal - get_extension_version function

2023-03-08 Thread Jacob Champion
On Wed, Mar 8, 2023 at 10:49 AM Tom Lane  wrote:
> This is a bad idea.  How will you do extension upgrades, if the new .so
> won't run till you apply the extension upgrade script but the old .so
> malfunctions as soon as you do?

Which upgrade paths allow you to have an old .so with a new version
number? I didn't realize that was an issue.

--Jacob




Re: zstd compression for pg_dump

2023-03-08 Thread Jacob Champion
On Sat, Mar 4, 2023 at 8:57 AM Justin Pryzby  wrote:
> pryzbyj=# CREATE TABLE t1 AS SELECT i,array_agg(j) FROM 
> generate_series(1,444)i,generate_series(1,9)j GROUP BY 1;
> $ ./src/bin/pg_dump/pg_dump -d pryzbyj -Fc -Z zstd:long=1 |wc -c
> 82023
> $ ./src/bin/pg_dump/pg_dump -d pryzbyj -Fc -Z zstd:long=0 |wc -c
> 1048267

Nice!

I did some smoke testing against zstd's GitHub release on Windows. To
build against it, I had to construct an import library, and put that
and the DLL into the `lib` folder expected by the MSVC scripts...
which makes me wonder if I've chosen a harder way than necessary?

Parallel zstd dumps seem to work as expected, in that the resulting
pg_restore output is identical to uncompressed dumps and nothing
explodes. I haven't inspected the threading implementation for safety
yet, as you mentioned. And I still wasn't able to test :workers, since
it looks like the official libzstd for Windows isn't built for
multithreading. That'll be another day's project.

--Jacob




Re: optimize several list functions with SIMD intrinsics

2023-03-08 Thread Nathan Bossart
cfbot's Windows build wasn't happy with a couple of casts.  I applied a
fix similar to c6a43c2 in v2.  The patch is still very much a work in
progress.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From 055717233c47518ae48119938ebd203cc55f7f3c Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Sat, 4 Mar 2023 23:16:07 -0800
Subject: [PATCH v2 1/1] speed up several list functions with SIMD

---
 src/backend/nodes/list.c| 262 +++-
 src/include/port/pg_lfind.h | 189 ++
 src/include/port/simd.h | 103 ++
 3 files changed, 521 insertions(+), 33 deletions(-)

diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index a709d23ef1..02ddbeb3f2 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -19,6 +19,7 @@
 
 #include "nodes/pg_list.h"
 #include "port/pg_bitutils.h"
+#include "port/pg_lfind.h"
 #include "utils/memdebug.h"
 #include "utils/memutils.h"
 
@@ -680,11 +681,25 @@ list_member(const List *list, const void *datum)
 bool
 list_member_ptr(const List *list, const void *datum)
 {
+#ifdef USE_NO_SIMD
 	const ListCell *cell;
+#endif
 
 	Assert(IsPointerList(list));
 	check_list_invariants(list);
 
+#ifndef USE_NO_SIMD
+
+	Assert(sizeof(ListCell) == 8);
+	Assert(sizeof(void *) == 8);
+
+	if (list == NIL)
+		return false;
+
+	return pg_lfind64((uint64) datum, (uint64 *) list->elements, list->length);
+
+#else
+
 	foreach(cell, list)
 	{
 		if (lfirst(cell) == datum)
@@ -692,46 +707,125 @@ list_member_ptr(const List *list, const void *datum)
 	}
 
 	return false;
+
+#endif
 }
 
 /*
- * Return true iff the integer 'datum' is a member of the list.
+ * Optimized linear search routine (using SIMD intrinsics where available) for
+ * lists with inline 32-bit data.
  */
-bool
-list_member_int(const List *list, int datum)
+static inline bool
+list_member_inline_internal(const List *list, uint32 datum)
 {
+	uint32		i = 0;
 	const ListCell *cell;
 
-	Assert(IsIntegerList(list));
-	check_list_invariants(list);
+#ifndef USE_NO_SIMD
 
-	foreach(cell, list)
+	/*
+	 * For better instruction-level parallelism, each loop iteration operates
+	 * on a block of four registers.
+	 */
+	const Vector32 keys = vector32_broadcast(datum);	/* load copies of key */
+	const uint32 nelem_per_vector = sizeof(Vector32) / sizeof(uint32);
+	const uint32 nelem_per_iteration = 4 * nelem_per_vector;
+#ifdef USE_NEON
+	const Vector32 mask = (Vector32) vector64_broadcast(UINT64CONST(0x));
+#else
+	const Vector32 mask = vector64_broadcast(UINT64CONST(0x));
+#endif
+	const uint32 *elements = (const uint32 *) list->elements;
+
+	/* round down to multiple of elements per iteration */
+	const uint32 tail_idx = (list->length * 2) & ~(nelem_per_iteration - 1);
+
+	/*
+	 * The SIMD-optimized portion of this routine is written with the
+	 * expectation that the 32-bit datum we are searching for only takes up
+	 * half of a ListCell.  If that changes, this routine must change, too.
+	 */
+	Assert(sizeof(ListCell) == 8);
+
+	for (i = 0; i < tail_idx; i += nelem_per_iteration)
+	{
+		Vector32	vals1,
+	vals2,
+	vals3,
+	vals4,
+	result1,
+	result2,
+	result3,
+	result4,
+	tmp1,
+	tmp2,
+	result,
+	masked;
+
+		/* load the next block into 4 registers */
+		vector32_load(&vals1, &elements[i]);
+		vector32_load(&vals2, &elements[i + nelem_per_vector]);
+		vector32_load(&vals3, &elements[i + nelem_per_vector * 2]);
+		vector32_load(&vals4, &elements[i + nelem_per_vector * 3]);
+
+		/* compare each value to the key */
+		result1 = vector32_eq(keys, vals1);
+		result2 = vector32_eq(keys, vals2);
+		result3 = vector32_eq(keys, vals3);
+		result4 = vector32_eq(keys, vals4);
+
+		/* combine the results into a single variable */
+		tmp1 = vector32_or(result1, result2);
+		tmp2 = vector32_or(result3, result4);
+		result = vector32_or(tmp1, tmp2);
+
+		/* filter out matches in space between data */
+		masked = vector32_and(result, mask);
+
+		/* see if there was a match */
+		if (vector32_is_highbit_set(masked))
+			return true;
+	}
+
+#endif			/* ! USE_NO_SIMD */
+
+	for_each_from(cell, list, i / 2)
 	{
-		if (lfirst_int(cell) == datum)
+		if (lfirst_int(cell) == (int) datum)
 			return true;
 	}
 
 	return false;
 }
 
+/*
+ * Return true iff the integer 'datum' is a member of the list.
+ */
+bool
+list_member_int(const List *list, int datum)
+{
+	Assert(IsIntegerList(list));
+	check_list_invariants(list);
+
+	if (list == NIL)
+		return false;
+
+	return list_member_inline_internal(list, datum);
+}
+
 /*
  * Return true iff the OID 'datum' is a member of the list.
  */
 bool
 list_member_oid(const List *list, Oid datum)
 {
-	const ListCell *cell;
-
 	Assert(IsOidList(list));
 	check_list_invariants(list);
 
-	foreach(cell, list)
-	{
-		if (lfirst_oid(cell) == datum)
-			return true;
-	}
+	if (list == NIL)
+		return false;
 
-	return false;
+	return list_member_inline_internal(list, datum);
 }
 
 /*
@@ -740,18 +834,13 @@ list_member_oid(const 

Re: psql \watch 2nd argument: iteration count

2023-03-08 Thread Nathan Bossart
+1 for adding an iteration count argument to \watch.

+   char *opt_end;
+   sleep = strtod(opt, &opt_end);
+   if (sleep <= 0 || *opt_end)
+   {
+   pg_log_error("Watch period must be positive number, but argument is '%s'", opt);
+   free(opt);
+   resetPQExpBuffer(query_buf);
+   return PSQL_CMD_ERROR;
+   }

Is there any reason to disallow 0 for the sleep argument?  I often use
commands like "\watch .1" to run statements repeatedly with very little
time in between, and I'd use "\watch 0" instead if it was available.
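
Presumably the check in the quoted hunk could just be relaxed to something
like (untested sketch):

	if (sleep < 0 || *opt_end)

so that an explicit 0 is accepted while negative values and trailing garbage
are still rejected.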

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: proposal - get_extension_version function

2023-03-08 Thread Tom Lane
Pavel Stehule  writes:
> I try to write a safeguard check that ensures the expected extension
> version for an extension library.

This is a bad idea.  How will you do extension upgrades, if the new .so
won't run till you apply the extension upgrade script but the old .so
malfunctions as soon as you do?  You need to make the C code as forgiving
as possible, not as unforgiving as possible.

If you have C-level ABI changes you need to make, the usual fix is to
include some sort of version number in the C name of each individual
function you've changed, so that calls made with the old or the new SQL
definition will be routed to the right place.  There are multiple
examples of this in contrib/.
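
A minimal sketch of that pattern (all names hypothetical; the old SQL
definition keeps pointing at the old symbol, while the upgrade script binds
the new one):

	PG_FUNCTION_INFO_V1(myfunc);
	PG_FUNCTION_INFO_V1(myfunc_v2);

	Datum
	myfunc(PG_FUNCTION_ARGS)
	{
		/* behave as the pre-upgrade SQL definition expects */
		return myfunc_internal(fcinfo, false);
	}

	Datum
	myfunc_v2(PG_FUNCTION_ARGS)
	{
		/* new behavior, used once ALTER EXTENSION ... UPDATE has run */
		return myfunc_internal(fcinfo, true);
	}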

regards, tom lane




Re: Add shared buffer hits to pg_stat_io

2023-03-08 Thread Melanie Plageman
On Tue, Mar 7, 2023 at 2:47 PM Andres Freund  wrote:
>
> Hi,
>
> LGTM. The only comment I have is that a small test wouldn't hurt... Compared
> to the other things it should be fairly easy...

So, I have attached an updated patchset which adds a test for hits. Since
there is only one call site where we count hits, I think this single
test is sufficient to protect against regressions.

However, I am concerned that, while unlikely, this could be flakey.
Something could happen to force all of those blocks out of shared
buffers (even though they were just read in) before we hit them.

We could simply check if hits are greater at the end of all of the
pg_stat_io tests than at the beginning and rely on the fact that it is
highly unlikely that every single buffer access will be a miss for all
of the tests. However, is it not technically also possible to have zero
hits?
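
To spell out that weaker fallback, I mean something like this (a sketch using
psql variables):

	SELECT sum(hits) AS io_sum_hits_before FROM pg_stat_io \gset
	-- ... run the rest of the pg_stat_io tests ...
	SELECT sum(hits) > :io_sum_hits_before FROM pg_stat_io;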

- Melanie
From cab8e0435b2a8390887f99091b2e9d336dd353c0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Sat, 25 Feb 2023 14:36:06 -0500
Subject: [PATCH v3 2/2] Track shared buffer hits in pg_stat_io

Count new IOOP_HITs and add "hits" column to pg_stat_io.

Reviewed-by: Bertrand Drouvot 
Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_beMa9Hzih40%3DXPYqhDVz6tsgUGTrhZXRo%3Dunp%2Bszb%3DUA%40mail.gmail.com
---
 doc/src/sgml/monitoring.sgml   | 11 
 src/backend/catalog/system_views.sql   |  1 +
 src/backend/storage/buffer/bufmgr.c| 38 ++
 src/backend/storage/buffer/localbuf.c  | 11 ++--
 src/backend/utils/activity/pgstat_io.c |  2 +-
 src/backend/utils/adt/pgstatfuncs.c|  3 ++
 src/include/catalog/pg_proc.dat|  6 ++--
 src/include/pgstat.h   |  1 +
 src/include/storage/buf_internals.h|  2 +-
 src/test/regress/expected/rules.out|  3 +-
 src/test/regress/expected/stats.out| 27 --
 src/test/regress/sql/stats.sql | 14 --
 12 files changed, 76 insertions(+), 43 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 6249bb50d0..8b34ca60bc 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3855,6 +3855,17 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
   
  
 
+ 
+  
+   
+hits bigint
+   
+   
+The number of times a desired block was found in a shared buffer.
+   
+  
+ 
+
  
   

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 34ca0e739f..87bbbdfcb3 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1126,6 +1126,7 @@ SELECT
b.writes,
b.extends,
b.op_bytes,
+   b.hits,
b.evictions,
b.reuses,
b.fsyncs,
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 0a05577b68..05fd3c9a2a 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -472,7 +472,7 @@ static BufferDesc *BufferAlloc(SMgrRelation smgr,
 			   ForkNumber forkNum,
 			   BlockNumber blockNum,
 			   BufferAccessStrategy strategy,
-			   bool *foundPtr, IOContext *io_context);
+			   bool *foundPtr, IOContext io_context);
 static void FlushBuffer(BufferDesc *buf, SMgrRelation reln,
 		IOObject io_object, IOContext io_context);
 static void FindAndDropRelationBuffers(RelFileLocator rlocator,
@@ -850,13 +850,14 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	if (isLocalBuf)
 	{
 		/*
-		 * LocalBufferAlloc() will set the io_context to IOCONTEXT_NORMAL. We
-		 * do not use a BufferAccessStrategy for I/O of temporary tables.
+		 * We do not use a BufferAccessStrategy for I/O of temporary tables.
 		 * However, in some cases, the "strategy" may not be NULL, so we can't
 		 * rely on IOContextForStrategy() to set the right IOContext for us.
 		 * This may happen in cases like CREATE TEMPORARY TABLE AS...
 		 */
-		bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found, &io_context);
+		io_context = IOCONTEXT_NORMAL;
+		io_object = IOOBJECT_TEMP_RELATION;
+		bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found);
 		if (found)
 			pgBufferUsage.local_blks_hit++;
 		else if (isExtend)
@@ -871,8 +872,10 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		 * lookup the buffer.  IO_IN_PROGRESS is set if the requested block is
 		 * not currently in memory.
 		 */
+		io_context = IOContextForStrategy(strategy);
+		io_object = IOOBJECT_RELATION;
 		bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
-			 strategy, &found, &io_context);
+			 strategy, &found, io_context);
 		if (found)
 			pgBufferUsage.shared_blks_hit++;
 		else if (isExtend)
@@ -892,6 +895,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 			/* Just need to update stats before we exit */
 			

Re: Add standard collation UNICODE

2023-03-08 Thread Jeff Davis
On Wed, 2023-03-08 at 07:21 +0100, Peter Eisentraut wrote:
> On 04.03.23 19:29, Jeff Davis wrote:
> > It looks like the way you've handled this is by inserting the
> > collation
> > with collprovider=icu even if built without ICU support. I think
> > that's
> > a new case, so we need to make sure it throws reasonable user-
> > facing
> > errors.
> 
> It would look like this:
> 
> => select * from t1 order by b collate unicode;
> ERROR:  0A000: ICU is not supported in this build

Right, the error looks good. I'm just pointing out that before this
patch, having provider='i' in a build without ICU was a configuration
mistake; whereas afterward every database will have a collation with
provider='i' whether it has ICU support or not. I think that's fine,
I'm just double-checking.

Why is "unicode" only provided for the UTF-8 encoding? For "ucs_basic"
that makes some sense, because the implementation only works in UTF-8.
But here we are using ICU, and the "und" locale should work for any
ICU-supported encoding. I suggest that we use collencoding=-1 for
"unicode", and the docs can just add a note next to "ucs_basic" that it
only works for UTF-8, because that's the weird case.
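
Concretely, the difference would show up in the catalog; e.g. (just an
inspection query):

	SELECT collname, collprovider, collencoding
	  FROM pg_collation
	 WHERE collname IN ('ucs_basic', 'unicode');

with collencoding = 6 (UTF-8) today versus -1 (any encoding) as suggested.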

For the docs, I suggest that you clarify that "ucs_basic" has the same
behavior as the C locale does *in the UTF-8 encoding*. Not all users
might pick up on the subtlety that the C locale has different behaviors
in different encodings.

Other than that, it looks good.

-- 
Jeff Davis
PostgreSQL Contributor Team - AWS






Re: Parallelize correlated subqueries that execute within each worker

2023-03-08 Thread Antonin Houska
James Coleman  wrote:

> On Mon, Feb 6, 2023 at 11:39 AM Antonin Houska  wrote:

> Attached is v9.

ok, I've changed the status to RfC

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)

2023-03-08 Thread Christoph Berg
Re: Tomas Vondra
> Add LZ4 compression to pg_dump

This broke the TAP tests on Ubuntu 18.04 (bionic):

[17:06:45.513](0.000s) ok 1927 - compression_lz4_custom: should not dump 
test_table with 4-row INSERTs
# Running: pg_dump --jobs=2 --format=directory --compress=lz4:1 
--file=/home/myon/projects/postgresql/pg/master/build/src/bin/pg_dump/tmp_check/tmp_test__aAO/compression_lz4_dir
 postgres
[17:06:46.651](1.137s) ok 1928 - compression_lz4_dir: pg_dump runs
# Running: /usr/bin/lz4 -z -f --rm 
/home/myon/projects/postgresql/pg/master/build/src/bin/pg_dump/tmp_check/tmp_test__aAO/compression_lz4_dir/blobs.toc
 
/home/myon/projects/postgresql/pg/master/build/src/bin/pg_dump/tmp_check/tmp_test__aAO/compression_lz4_dir/blobs.toc.lz4
Incorrect parameters
Usage :
  /usr/bin/lz4 [arg] [input] [output]

input   : a filename
  with no FILE, or when FILE is - or stdin, read standard input
Arguments :
 -1 : Fast compression (default) 
 -9 : High compression 
 -d : decompression (default for .lz4 extension)
 -z : force compression
 -f : overwrite output without prompting 
 -h/-H  : display help/long help and exit
[17:06:46.667](0.016s) not ok 1929 - compression_lz4_dir: compression commands
[17:06:46.668](0.001s) 
[17:06:46.668](0.001s) #   Failed test 'compression_lz4_dir: compression 
commands'
[17:06:46.669](0.000s) #   at t/002_pg_dump.pl line 4274.
[17:06:46.670](0.001s) ok 1930 - compression_lz4_dir: glob check for 
/home/myon/projects/postgresql/pg/master/build/src/bin/pg_dump/tmp_check/tmp_test__aAO/compression_lz4_dir/toc.dat

The lz4 binary there doesn't have the --rm option yet.

liblz4-tool  0.0~r131-2ubuntu3

--rm appears in a single place only:

 # Give coverage for manually compressed blob.toc files during
 # restore.
 compress_cmd => {
 program => $ENV{'LZ4'},
 args=> [
 '-z', '-f', '--rm',
 "$tempdir/compression_lz4_dir/blobs.toc",
 "$tempdir/compression_lz4_dir/blobs.toc.lz4",
 ],
 },

18.04 will be EOL in a few weeks so it might be ok to just say it's
not supported, but removing the input file manually after calling lz4
would be an easy fix.
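
A sketch of that fix against the test code quoted above (untested; drop --rm
from the args and clean up afterwards, however the harness prefers to express
the extra step):

 compress_cmd => {
 program => $ENV{'LZ4'},
 args=> [
 '-z', '-f',
 "$tempdir/compression_lz4_dir/blobs.toc",
 "$tempdir/compression_lz4_dir/blobs.toc.lz4",
 ],
 },

followed by an unlink("$tempdir/compression_lz4_dir/blobs.toc") once the
command has succeeded.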

Christoph




Re: allow meson to find ICU in non-standard localtion

2023-03-08 Thread Jeff Davis
On Wed, 2023-03-08 at 17:30 +0100, Peter Eisentraut wrote:
> So should we withdraw the patch from the commit fest?

Withdrawn. If someone else is interested we can still pursue some
improvements.

Regards,
Jeff Davis





Re: Should vacuum process config file reload more often

2023-03-08 Thread Jim Nasby

On 3/2/23 1:36 AM, Masahiko Sawada wrote:

> For example, I guess we will need to take care of changes of
> maintenance_work_mem. Currently we initialize the dead tuple space at
> the beginning of lazy vacuum, but perhaps we would need to
> enlarge/shrink it based on the new value?

Doesn't the dead tuple space grow as needed? Last I looked we don't 
allocate up to 1GB right off the bat.

>> I don't think we need to do anything about that initially, just because the
>> config can be changed in a more granular way, doesn't mean we have to react to
>> every change for the current operation.
>>
>> Perhaps we can mention in the docs that a change to maintenance_work_mem
>> will not take effect in the middle of vacuuming a table. But, I think it
>> probably isn't needed.
>
> Agreed.


I disagree that there's no need for this. Sure, if 
maintenance_work_mem is 10MB then it's no big deal to just abandon 
your current vacuum and start a new one, but the index vacuuming phase 
with maintenance_work_mem set to say 500MB can take quite a while. 
Forcing a user to either suck it up or throw everything in the phase 
away isn't terribly good.


Of course, if the patch that eliminates the 1GB vacuum limit gets 
committed the situation will be even worse.


While it'd be nice to also honor maintenance_work_mem getting set lower, 
I don't see any need to go through heroics to accomplish that. Simply 
recording the change and honoring it for future attempts to grow the 
memory and on future passes through the heap would be plenty.


All that said, don't let these suggestions get in the way of committing 
this. Just having the ability to tweak cost parameters would be a win.


proposal - get_extension_version function

2023-03-08 Thread Pavel Stehule
Hi

I try to write a safeguard check that ensures the expected extension
version for an extension library.

Some like

const char *expected_extversion = "2.5";

...

extoid = getExtensionOfObject(ProcedureRelationId, fcinfo->flinfo->fn_oid);
extversion = get_extension_version(extoid);
if (strcmp(expected_extversion, extversion) != 0)
   elog(ERROR, "extension \"%s\" needs \"ALTER EXTENSION %s UPDATE\"",
  get_extension_name(extoid),
  get_extension_name(extoid));

Currently the extension version is not simply readable - I need to read it
directly from the pg_extension catalog.
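
i.e. today that means something like:

   SELECT extversion FROM pg_extension WHERE extname = 'plpgsql_check';

(or the equivalent catalog scan from C), which is what a
get_extension_version() function would wrap.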

Notes, comments?

Regards

Pavel


Re: allow meson to find ICU in non-standard localtion

2023-03-08 Thread Peter Eisentraut

On 01.03.23 21:30, Jeff Davis wrote:

On Wed, 2023-03-01 at 11:43 -0800, Jacob Champion wrote:

I work around it by manually setting -Dextra_lib_dirs. I just tried
doing that with ICU 72, and it worked without a patch. Hopefully that
helps some?


Yes, works, thank you.

Obviously we'd like a little better solution so that others don't get
confused, but it's not really a problem for me any more.


So should we withdraw the patch from the commit fest?




Re: Add documentation for coverage reports with meson

2023-03-08 Thread Peter Eisentraut

On 03.03.23 12:12, Michael Paquier wrote:

+
+meson setup -Db_coverage=true ... OTHER OPTIONS ... builddir/
+cd builddir/
+meson compile
+meson test
+ninja coverage-html
+


The "cd" command needs to be moved after the meson commands, and the 
meson commands need to have a -C builddir option.  So it should be like



meson setup -Db_coverage=true ... OTHER OPTIONS ... builddir/
meson compile -C builddir
meson test -C builddir
cd builddir/
ninja coverage-html


Otherwise, this looks good to me.





Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-08 Thread Önder Kalacı
Hi Vignesh C,


>
> Few comments
> 1) Maybe this change is not required:
> fallback if no other solution is possible.  If a replica identity other
> than full is set on the publisher side, a replica
> identity
> -   comprising the same or fewer columns must also be set on the subscriber
> -   side.  See  for
> details on
> +   comprising the same or fewer columns must also be set on the
> subscriber side.
> +   See  for details on
>

Yes, fixed.

>
> 2) Variable declaration and the assignment can be split so that the
> readability is better:
> +
> +   bool            isUsableIndex =
> +   IsIndexUsableForReplicaIdentityFull(indexInfo);
> +
> +   index_close(indexRelation, AccessShareLock);
> +


Hmm, can you please elaborate more on this? The declaration
and assignment are already on different lines.

ps: pgindent changed this line a bit. Does that look better?


> 3) Since there is only one statement within the if condition, the
> braces can be removed
> +   if (is_btree && !is_partial && !is_only_on_expression)
> +   {
> +   return true;
> +   }
>
>
Fixed in a newer version of the patch. Now it is only:

    return is_btree && !is_partial && !is_only_on_expression;


> 4) There is minor indentation issue in this, we could run pgindent to fix
> it:
> +static Oid FindLogicalRepLocalIndex(Relation localrel,
> +
>LogicalRepRelation *remoterel);
> +
>
>
Yes, pgindent fixed it, thanks.


Attached v37

Thanks,
Onder KALACI


v37_0002_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


v37_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


Re: Schema variables - new implementation for Postgres 15

2023-03-08 Thread Pavel Stehule
On Wed, Mar 8, 2023 at 4:35 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:

> > On Wed, Mar 08, 2023 at 08:31:07AM +0100, Pavel Stehule wrote:
> > On Fri, Mar 3, 2023 at 9:19 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> >
> > > > On Tue, Feb 28, 2023 at 06:12:50AM +0100, Pavel Stehule wrote:
> > > >
> > > > fresh rebase
> > >
> > > I'm continuing to review, this time going through shadowing stuff in
> > > transformColumnRef, IdentifyVariable etc. Well, that's a lot of leg
> work
> > > for rather little outcome :) I guess all attempts to simplify this part
> > > weren't successful?
> > >
> >
> > Originally I wrote it in the strategy "reduce false alarms". But when I
> > think about it, it may not be good in this case. Usually the changes are
> > done first on some developer environment, and good practice is to
> disallow
> > same (possibly confusing) identifiers. So I am not against making this
> > warning more aggressive with some possibility of false alarms.  With
> > blocking reduction of alarms the differences in regress was zero. So I
> > reduced this part.
>
> Great, thank you.
>
> > > Couple of questions to it. In IdentifyVariable in the branch handling
> > > two values the commentary says:
> > >
> > > /*
> > >  * a.b can mean "schema"."variable" or "variable"."field",
> > >  * Check both variants, and returns InvalidOid with
> > >  * not_unique flag, when both interpretations are
> > >  * possible. Second node can be star. In this case, the
> > >  * only allowed possibility is "variable"."*".
> > >  */
> > >
> > > I read this as "variable"."*" is a valid combination, but the very next
> > > part of this condition says differently:
> > >
> >
> >
> >
> > >
> > > /*
> > >  * Session variables doesn't support unboxing by star
> > >  * syntax. But this syntax have to be calculated here,
> > >  * because can come from non session variables related
> > >  * expressions.
> > >  */
> > > Assert(IsA(field2, A_Star));
> > >
> > > Is the first commentary not quite correct?
> > >
> >
> > I think it is correct, but maybe I was not good at describing this issue.
> > The sentence "Second node can be a star. In this case, the
> > the only allowed possibility is "variable"."*"." should be in the next
> > comment.
> >
> > In this part we process a list of identifiers, and we try to map these
> > identifiers to some semantics. The parser should ensure that
> > all fields of lists are strings or the last field is a star. In this case
> > the semantic "schema".* is nonsense, and the only possible semantic
> > is "variable".*. It is valid semantics, but unsupported now. Unboxing is
> > available by syntax (var).*
> >
> > I changed the comment
>
> Thanks. Just to clarify, by "unsupported" you mean unsupported in the
> current patch implementation right? From what I understand value
> unboxing could be done without parentheses in a non-top level select
> query.
>

Yes, it can be implemented in the next steps. I don't think there would be any
issues, but it means more lines and a little bit more complex interface.
In this step, I try to implement the minimal required functionality that can
be enhanced in next steps. The important fact for this area is that session
variables will always be shadowed by relations. It means new functionality in
session variables can never break existing applications, and so there is space
for future enhancement.


>
> As a side note, I'm not sure if this branch is exercised in any tests.
> I've replaced returning InvalidOid with actually doing
> LookupVariable(NULL, a, true)
> in this case, and all the tests are still passing.
>

Usually we don't test functionality that is not yet implemented.


>
> > > Another question about how shadowing warning should work between
> > > namespaces.
> > > Let's say I've got two namespaces, public and test, both have a session
> > > variable with the same name, but only one has a table with the same
> name:
> > >
> > > -- in public
> > > create table test_agg(a int);
> > > create type for_test_agg as (a int);
> > > create variable test_agg for_test_agg;
> > >
> > > -- in test
> > > create type for_test_agg as (a int);
> > > create variable test_agg for_test_agg;
> > >
> > > Now if we will try to trigger the shadowing warning from public
> > > namespace, it would work differently:
> > >
> > > -- in public
> > > =# let test.test_agg.a = 10;
> > > =# let test_agg.a = 20;
> > > =# set session_variables_ambiguity_warning to on;
> > >
> > > -- note the value returned from the table
> > > =# select jsonb_agg(test_agg.a) from test_agg;
> > > WARNING:  42702: session variable "test_agg.a" is shadowed
> > > LINE 1: select jsonb_agg(test_agg.a) from test_agg;
> > >  ^
> > > DETAIL:  Session variables can be shadowed by columns,
> routine's
> > > variables and routine's 

Re: Allow tailoring of ICU locales with custom rules

2023-03-08 Thread Peter Eisentraut

On 08.03.23 15:18, Laurenz Albe wrote:

On Fri, 2023-03-03 at 13:45 +0100, Peter Eisentraut wrote:

Ok, here is a version with an initdb option and also a createdb option
(to expose the CREATE DATABASE option).

You can mess with people by setting up your databases like this:

initdb -D data --locale-provider=icu --icu-rules=' < c < b < e < d'


Looks good.  I cannot get it to misbehave, "make check-world" is successful
(the regression tests misbehave in interesting ways when running
"make installcheck" on a cluster created with non-standard ICU rules, but
that can be expected).

I checked the documentation, tested "pg_dump" support, everything fine.

I'll mark it as "ready for committer".


committed





RE: Rework LogicalOutputPluginWriterUpdateProgress

2023-03-08 Thread Takamichi Osumi (Fujitsu)
Hi,


On Wednesday, March 8, 2023 11:54 AM, wangw.f...@fujitsu.com wrote:
> Attach the new patch.
Thanks for sharing v6! A few minor comments on it.

(1) commit message

The old function name 'is_skip_threshold_change' is still referred to. I think
we need to update it to 'is_keepalive_threshold_exceeded'.

(2) OutputPluginPrepareWrite

@@ -662,7 +656,8 @@ void
 OutputPluginPrepareWrite(struct LogicalDecodingContext *ctx, bool last_write)
 {
if (!ctx->accept_writes)
-   elog(ERROR, "writes are only accepted in commit, begin and change callbacks");
+   elog(ERROR, "writes are only accepted in output plugin callbacks, "
+"except startup, shutdown, filter_by_origin, and filter_prepare.");

We can remove the period at the end of the error string.

(3) is_keepalive_threshold_exceeded's comments

+/*
+ * Helper function to check whether a large number of changes have been skipped
+ * continuously.
+ */
+static bool
+is_keepalive_threshold_exceeded(LogicalDecodingContext *ctx)

I suggest updating the comment slightly, to something like below.
From:
...whether a large number of changes have been skipped continuously
To:
...whether a large number of changes have been skipped without being sent to 
the output plugin continuously

(4) term for 'keepalive'

+/*
+ * Update progress tracking and send keep alive (if required).
+ */

The 'keep alive' might be better replaced with 'keepalive', which looks to be
the most common form elsewhere in the source code. In the current patch, there
are 3 different ways to express it (the other one is 'keep-alive'), and it
would be better to unify the term, at least within the same patch.


Best Regards,
Takamichi Osumi



Re: Schema variables - new implementation for Postgres 15

2023-03-08 Thread Dmitry Dolgov
> On Wed, Mar 08, 2023 at 08:31:07AM +0100, Pavel Stehule wrote:
> On Fri, Mar 3, 2023 at 9:19 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> > > On Tue, Feb 28, 2023 at 06:12:50AM +0100, Pavel Stehule wrote:
> > >
> > > fresh rebase
> >
> > I'm continuing to review, this time going through shadowing stuff in
> > transformColumnRef, IdentifyVariable etc. Well, that's a lot of leg work
> > for rather little outcome :) I guess all attempts to simplify this part
> > weren't successful?
> >
>
> Originally I wrote it in the strategy "reduce false alarms". But when I
> think about it, it may not be good in this case. Usually the changes are
> done first on some developer environment, and good practice is to disallow
> same (possibly confusing) identifiers. So I am not against making this
> warning more aggressive with some possibility of false alarms.  With
> blocking reduction of alarms the differences in regress was zero. So I
> reduced this part.

Great, thank you.

> > Couple of questions to it. In IdentifyVariable in the branch handling
> > two values the commentary says:
> >
> > /*
> >  * a.b can mean "schema"."variable" or "variable"."field",
> >  * Check both variants, and returns InvalidOid with
> >  * not_unique flag, when both interpretations are
> >  * possible. Second node can be star. In this case, the
> >  * only allowed possibility is "variable"."*".
> >  */
> >
> > I read this as "variable"."*" is a valid combination, but the very next
> > part of this condition says differently:
> >
>
>
>
> >
> > /*
> >  * Session variables doesn't support unboxing by star
> >  * syntax. But this syntax have to be calculated here,
> >  * because can come from non session variables related
> >  * expressions.
> >  */
> > Assert(IsA(field2, A_Star));
> >
> > Is the first commentary not quite correct?
> >
>
> I think it is correct, but maybe I was not good at describing this issue.
> The sentence "Second node can be a star. In this case, the
> the only allowed possibility is "variable"."*"." should be in the next
> comment.
>
> In this part we process a list of identifiers, and we try to map these
> identifiers to some semantics. The parser should ensure that
> all fields of lists are strings or the last field is a star. In this case
> the semantic "schema".* is nonsense, and the only possible semantic
> is "variable".*. It is valid semantics, but unsupported now. Unboxing is
> available by syntax (var).*
>
> I changed the comment

Thanks. Just to clarify, by "unsupported" you mean unsupported in the
current patch implementation, right? From what I understand value
unboxing could be done without parentheses in a non-top-level select
query.

As a side note, I'm not sure if this branch is exercised in any tests.
I've replaced returning InvalidOid with actually doing
LookupVariable(NULL, a, true) in this case, and all the tests are still
passing.

> > Another question about how shadowing warning should work between
> > namespaces.
> > Let's say I've got two namespaces, public and test, both have a session
> > variable with the same name, but only one has a table with the same name:
> >
> > -- in public
> > create table test_agg(a int);
> > create type for_test_agg as (a int);
> > create variable test_agg for_test_agg;
> >
> > -- in test
> > create type for_test_agg as (a int);
> > create variable test_agg for_test_agg;
> >
> > Now if we will try to trigger the shadowing warning from public
> > namespace, it would work differently:
> >
> > -- in public
> > =# let test.test_agg.a = 10;
> > =# let test_agg.a = 20;
> > =# set session_variables_ambiguity_warning to on;
> >
> > -- note the value returned from the table
> > =# select jsonb_agg(test_agg.a) from test_agg;
> > WARNING:  42702: session variable "test_agg.a" is shadowed
> > LINE 1: select jsonb_agg(test_agg.a) from test_agg;
> >  ^
> > DETAIL:  Session variables can be shadowed by columns, routine's
> > variables and routine's arguments with the same name.
> > LOCATION:  transformColumnRef, parse_expr.c:940
> >  jsonb_agg
> > ---
> >  [1]
> >
> > -- no warning, note the session variable value
> > =# select jsonb_agg(test.test_agg.a) from test_agg;
> >  jsonb_agg
> > ---
> >  [10]
> >
> > It happens because in the second scenario the logic inside
> > transformColumnRef
> > will not set up the node variable (there is no corresponding table in the
> > "test" schema), and the following conditions covering session variables
> > shadowing are depending on it. Is it supposed to be like this?
> >
>
> I am sorry, I don't understand what you want to describe. Session variables
> are always shadowed by relations. That is by design. In the first case, the
> variable is shadowed and a

Re: TAP tests for psql \g piped into program

2023-03-08 Thread Peter Eisentraut

On 02.01.23 22:32, Daniel Verite wrote:

This is a follow-up to commit d2a44904 from the 2022-11 CF [1]
The TAP tests were left out with the suggestion to use Perl instead of
cat (Unix) / findstr (Windows) as the program to pipe into.

PFA a patch implementing that suggestion.


The perl binary refactoring in this patch caught my attention, since I 
ran into this issue in another patch as well.  I'm always happy to 
consider a refactoring, but I think in this case I wouldn't do it.


If you grep for PostgreSQL::Test::Utils::windows_os, you'll find quite a 
few pieces of code that somehow fix up paths for Windows.  By hiding the 
Perl stuff in a function, we give the illusion that you don't have to 
worry about it and it's all taken care of in the test library.  But you 
have to worry about it in the very next line in 
025_stuck_on_old_timeline.pl!  We should handle this all on the same 
level: either in the test code or in the test library.  It would be 
useful to work toward a general "prepare path for shell" routine.  But 
until we have that, I don't think this is sufficient progress.


So for your patch, I would just do the path adjustment ad hoc in-line. 
It's just one additional line.






Re: Add pg_walinspect function with block info columns

2023-03-08 Thread Bharath Rupireddy
On Wed, Mar 8, 2023 at 4:23 PM Michael Paquier  wrote:
>
> On Wed, Mar 08, 2023 at 04:01:56PM +0530, Bharath Rupireddy wrote:
> > I understand that performance is critical here but we need to ensure
> > memory is used wisely. Therefore, I'd still vote to free at least the
> > major contributors here, that is, pfree(raw_data);, pfree(raw_page);
> > and pfree(flags); right after they are done using. I'm sure pfree()s
> > don't hurt more than resetting memory context for every block_id.
>
> Okay by me to have intermediate pfrees between each block scanned if
> you feel strongly about it.

Thanks. Attached v4 with that change.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From e73245277cfaf95462ef1f1fe5334bf5aba87a2a Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Wed, 8 Mar 2023 14:29:50 +
Subject: [PATCH v4] Rework pg_walinspect to retrieve more block information

---
 .../pg_walinspect/expected/pg_walinspect.out  |  35 ++--
 .../pg_walinspect/pg_walinspect--1.0--1.1.sql |  16 +-
 contrib/pg_walinspect/pg_walinspect.c | 150 +-
 contrib/pg_walinspect/sql/pg_walinspect.sql   |  33 ++--
 doc/src/sgml/pgwalinspect.sgml|  48 --
 5 files changed, 196 insertions(+), 86 deletions(-)

diff --git a/contrib/pg_walinspect/expected/pg_walinspect.out b/contrib/pg_walinspect/expected/pg_walinspect.out
index 9bcb05354e..e0eb7ca08f 100644
--- a/contrib/pg_walinspect/expected/pg_walinspect.out
+++ b/contrib/pg_walinspect/expected/pg_walinspect.out
@@ -74,17 +74,28 @@ SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info(:'wal_lsn1', :'wal_lsn2'
 (1 row)
 
 -- ===
--- Tests to get full page image (FPI) from WAL record
+-- Tests to get block information from WAL record
 -- ===
+-- Update table to generate some block data
 SELECT pg_current_wal_lsn() AS wal_lsn3 \gset
--- Force FPI on the next update.
-CHECKPOINT;
--- Update table to generate an FPI.
-UPDATE sample_tbl SET col1 = col1 * 100 WHERE col1 = 1;
+UPDATE sample_tbl SET col1 = col1 + 1 WHERE col1 = 1;
 SELECT pg_current_wal_lsn() AS wal_lsn4 \gset
+-- Check if we get block data from WAL record.
+SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_block_info(:'wal_lsn3', :'wal_lsn4')
+  WHERE relfilenode = :'sample_tbl_oid' AND blockdata IS NOT NULL;
+ ok 
+
+ t
+(1 row)
+
+-- Force full-page image on the next update.
+SELECT pg_current_wal_lsn() AS wal_lsn5 \gset
+CHECKPOINT;
+UPDATE sample_tbl SET col1 = col1 + 1 WHERE col1 = 2;
+SELECT pg_current_wal_lsn() AS wal_lsn6 \gset
 -- Check if we get FPI from WAL record.
-SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_fpi_info(:'wal_lsn3', :'wal_lsn4')
-  WHERE relfilenode = :'sample_tbl_oid';
+SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_block_info(:'wal_lsn5', :'wal_lsn6')
+  WHERE relfilenode = :'sample_tbl_oid' AND fpi IS NOT NULL;
  ok 
 
  t
@@ -116,7 +127,7 @@ SELECT has_function_privilege('regress_pg_walinspect',
 (1 row)
 
 SELECT has_function_privilege('regress_pg_walinspect',
-  'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- no
+  'pg_get_wal_block_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- no
  has_function_privilege 
 
  f
@@ -146,7 +157,7 @@ SELECT has_function_privilege('regress_pg_walinspect',
 (1 row)
 
 SELECT has_function_privilege('regress_pg_walinspect',
-  'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+  'pg_get_wal_block_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
  has_function_privilege 
 
  t
@@ -160,7 +171,7 @@ GRANT EXECUTE ON FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn)
   TO regress_pg_walinspect;
 GRANT EXECUTE ON FUNCTION pg_get_wal_stats(pg_lsn, pg_lsn, boolean)
   TO regress_pg_walinspect;
-GRANT EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn)
+GRANT EXECUTE ON FUNCTION pg_get_wal_block_info(pg_lsn, pg_lsn)
   TO regress_pg_walinspect;
 SELECT has_function_privilege('regress_pg_walinspect',
   'pg_get_wal_record_info(pg_lsn)', 'EXECUTE'); -- yes
@@ -184,7 +195,7 @@ SELECT has_function_privilege('regress_pg_walinspect',
 (1 row)
 
 SELECT has_function_privilege('regress_pg_walinspect',
-  'pg_get_wal_fpi_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
+  'pg_get_wal_block_info(pg_lsn, pg_lsn) ', 'EXECUTE'); -- yes
  has_function_privilege 
 
  t
@@ -196,7 +207,7 @@ REVOKE EXECUTE ON FUNCTION pg_get_wal_records_info(pg_lsn, pg_lsn)
   FROM regress_pg_walinspect;
 REVOKE EXECUTE ON FUNCTION pg_get_wal_stats(pg_lsn, pg_lsn, boolean)
   FROM regress_pg_walinspect;
-REVOKE EXECUTE ON FUNCTION pg_get_wal_fpi_info(pg_lsn, pg_lsn)
+REVOKE EXECUTE ON FUNCTION pg_get_wal_block_info(pg_lsn, pg_lsn)
   FROM regress_pg_walinspect;
 -- ===
 -- Clean up
diff --git 

Re: buildfarm + meson

2023-03-08 Thread Andrew Dunstan


On 2023-03-08 We 08:57, Andrew Dunstan wrote:
>
> On 2023-03-07 Tu 20:29, Andres Freund wrote:
>> Hi,
>>
>> On 2023-03-07 15:47:54 -0500, Andrew Dunstan wrote:
>>> On 2023-03-07 Tu 14:37, Andres Freund wrote:
>>>> The failures are like this:
>>>>
>>>> +ERROR:  extension "dummy_index_am" is not available
>>>> +DETAIL:  Could not open extension control file "/home/bf/bf-build/piculet-meson/HEAD/inst/share/postgresql/extension/dummy_index_am.control": No such file or directory.
>>>> +HINT:  The extension must first be installed on the system where PostgreSQL is running.
>>>>
>>>> I assume this is in an interaction with b6a0d469cae.
>>>>
>>>> I think we need an install-test-modules or such that installs into the
>>>> normal directory.
>>>
>>> Exactly.
>>
>> Here's a prototype for that.
>>
>> It adds an install-test-files target. Because we want to install into a normal
>> directory, I removed the necessary munging of the target paths from
>> meson.build and moved it into install-test-files. I also added DESTDIR
>> support, so that installing can redirect the directory if desired. That's used
>> for the tmp_install/ installation now.
>>
>> I didn't like the number of arguments necessary for install_test_files, so I
>> changed it to use
>>
>>     --install target list of files
>>
>> which makes it easier to use for further directories, if/when we need them.
>
> So if I understand this right, the way to use this would be something
> like:
>
>     local $ENV{DESTDIR} = $installdir;
>
>     run_log("meson compile -C $pgsql install-test-files");
>
> Is that right? I did that but it didn't work :-(

OK, tried without the `local` line and it worked, so let's push this.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: buildfarm + meson

2023-03-08 Thread Nazir Bilal Yavuz
Hi,

On Wed, 8 Mar 2023 at 16:57, Andrew Dunstan  wrote:
> So if I understand this right, the way to use this would be something like:
>
>
> local $ENV{DESTDIR} = $installdir;
>
> run_log("meson compile -C $pgsql install-test-files");
>
>
> Is that right? I did that but it didn't work :-(

I think you shouldn't set DESTDIR to the $installdir. If DESTDIR is
set, it joins $DESTDIR and $install_dir(-Dprefix). So, when you run

local $ENV{DESTDIR} = $installdir;
run_log("meson compile -C $pgsql install-test-files");

it installs these files to the '$install_dir/$install_dir'.
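
For example (hypothetical paths, just to illustrate the joining):

# with -Dprefix=/home/bf/inst, this stages the files under
# /tmp/stage/home/bf/inst rather than installing them for real
DESTDIR=/tmp/stage meson compile -C build install-test-files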

Could you try running just 'run_log("meson compile -C $pgsql
install-test-files");' without setting DESTDIR? That should work.

Regards,
Nazir Bilal Yavuz
Microsoft



