static assert cleanup

2022-12-08 Thread Peter Eisentraut

A number of static assertions could be moved to better places.

We first added StaticAssertStmt() in 2012, which required all static 
assertions to be inside function bodies.  We then added 
StaticAssertDecl() in 2020, which enabled static assertions at file 
level.  We have a number of calls that were stuck in not-really-related 
functions for this historical reason.  This patch set cleans that up.
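
For illustration, the intended usage pattern is roughly the following (a
made-up example, assuming postgres.h is included; the struct and field
names are not from the tree):

```
typedef struct MyEntry
{
	uint32		flags;
	uint32		id;
} MyEntry;

/* declaration form: can now live at file scope, next to what it checks */
StaticAssertDecl(sizeof(MyEntry) == 2 * sizeof(uint32),
				 "unexpected padding in MyEntry");

static void
process_entry(MyEntry *entry)
{
	/* statement form: for checks that only make sense inside a function */
	StaticAssertStmt(sizeof(entry->flags) == 4, "flags must be 32 bits");

	/* ... */
}
```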


0001-Update-static-assert-usage-comment.patch

This updates the usage information in c.h to be more current and precise.

0002-Move-array-size-related-static-assertions.patch

This moves some obviously poorly placed ones.

0003-Move-some-static-assertions-to-better-places.patch

This moves some that I thought were suboptimally placed, but these 
choices could be debated or refined.


0004-Use-StaticAssertDecl-where-possible.patch

This just changes some StaticAssertStmt() to StaticAssertDecl() where 
appropriate.  It's optional.
From ca1cb9e59e2216f1bfed234792379a46e425e105 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 8 Dec 2022 14:30:01 +0100
Subject: [PATCH 1/4] Update static assert usage comment

Since we added StaticAssertStmt() before StaticAssertDecl(), the
instructions in c.h are a bit backwards from the "native" way static
assertions are meant to be used in C.  Update this to make it more
precise.
---
 src/include/c.h | 33 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/src/include/c.h b/src/include/c.h
index 98cdd285dd..bd6d8e5bf5 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -847,47 +847,50 @@ extern void ExceptionalCondition(const char *conditionName,
  * If the "condition" (a compile-time-constant expression) evaluates to false,
  * throw a compile error using the "errmessage" (a string literal).
  *
- * gcc 4.6 and up supports _Static_assert(), but there are bizarre syntactic
- * placement restrictions.  Macros StaticAssertStmt() and StaticAssertExpr()
+ * C11 has _Static_assert(), and most C99 compilers already support that.  For
+ * portability, we wrap it into StaticAssertDecl().  _Static_assert() is a
+ * "declaration", and so it must be placed where for example a variable
+ * declaration would be valid.  As long as we compile with
+ * -Wno-declaration-after-statement, that also means it cannot be placed after
+ * statements in a function.  Macros StaticAssertStmt() and StaticAssertExpr()
  * make it safe to use as a statement or in an expression, respectively.
- * The macro StaticAssertDecl() is suitable for use at file scope (outside of
- * any function).
  *
- * Otherwise we fall back on a kluge that assumes the compiler will complain
- * about a negative width for a struct bit-field.  This will not include a
- * helpful error message, but it beats not getting an error at all.
+ * For compilers without _Static_assert(), we fall back on a kluge that
+ * assumes the compiler will complain about a negative width for a struct
+ * bit-field.  This will not include a helpful error message, but it beats not
+ * getting an error at all.
  */
 #ifndef __cplusplus
 #ifdef HAVE__STATIC_ASSERT
+#define StaticAssertDecl(condition, errmessage) \
+   _Static_assert(condition, errmessage)
 #define StaticAssertStmt(condition, errmessage) \
do { _Static_assert(condition, errmessage); } while(0)
 #define StaticAssertExpr(condition, errmessage) \
((void) ({ StaticAssertStmt(condition, errmessage); true; }))
-#define StaticAssertDecl(condition, errmessage) \
-   _Static_assert(condition, errmessage)
 #else  /* !HAVE__STATIC_ASSERT */
+#define StaticAssertDecl(condition, errmessage) \
+   extern void static_assert_func(int static_assert_failure[(condition) ? 1 : -1])
 #define StaticAssertStmt(condition, errmessage) \
((void) sizeof(struct { int static_assert_failure : (condition) ? 1 : -1; }))
 #define StaticAssertExpr(condition, errmessage) \
StaticAssertStmt(condition, errmessage)
-#define StaticAssertDecl(condition, errmessage) \
-   extern void static_assert_func(int static_assert_failure[(condition) ? 1 : -1])
 #endif /* HAVE__STATIC_ASSERT */
 #else  /* C++ */
 #if defined(__cpp_static_assert) && __cpp_static_assert >= 200410
+#define StaticAssertDecl(condition, errmessage) \
+   static_assert(condition, errmessage)
 #define StaticAssertStmt(condition, errmessage) \
static_assert(condition, errmessage)
 #define StaticAssertExpr(condition, errmessage) \
({ static_assert(condition, errmessage); })
-#define StaticAssertDecl(condition, errmessage) \
-   static_assert(condition, errmessage)
 #else  /* !__cpp_static_assert */
+#define StaticAssertDecl(condition, errmessage) \
+   extern void static_assert_func(int static_assert_failure[(condition) ? 1 : -1])
 #define 

Re: Perform streaming logical transactions by background workers and parallel apply

2022-12-08 Thread Amit Kapila
On Thu, Dec 8, 2022 at 12:37 PM houzj.f...@fujitsu.com
 wrote:
>

Review comments
==
1. Currently, we don't release the stream lock in LA (leader apply
worker) for "rollback to savepoint", and the reason is mentioned in the
comments of apply_handle_stream_abort() in the patch. But, today,
while testing, I found that this can lead to a deadlock which otherwise
won't happen on the publisher. The key point is that rollback to
savepoint releases the locks acquired by the particular subtransaction,
so the parallel apply worker should do the same. Consider the following
example where the transaction in session-1 is being performed by the
parallel apply worker and the transaction in session-2 is being
performed by the leader apply worker. I have simulated it by using the
GUC force_stream_mode.

Publisher
==
Session-1
postgres=# begin;
BEGIN
postgres=*# savepoint s1;
SAVEPOINT
postgres=*# truncate t1;
TRUNCATE TABLE

Session-2
postgres=# begin;
BEGIN
postgres=*# insert into t1 values(4);

Session-1
postgres=*# rollback to savepoint s1;
ROLLBACK

Session-2
Commit;

With or without the commit of Session-2, this scenario will lead to a
deadlock on the subscriber because PA (parallel apply worker) is
waiting for LA to send the next command, and LA is blocked by the
exclusive lock held by PA. There is no deadlock on the publisher because
rollback to savepoint will release the lock acquired by truncate.

To solve this, how about doing three things before sending the abort of
the sub-transaction: (a) unlock the stream lock, (b) increment
pending_stream_count, (c) take the stream lock again?

Now, if the PA is not already waiting at the stop, it will not wait at
stream_stop but will instead wait after applying the abort of the
sub-transaction; and if it is already waiting at stream_stop, the wait
will be released. If this works, then we should probably try to do (b)
before (a) to match the steps with stream_start.
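
Roughly, that sequence in LA could look like the fragment below. (This is
only a sketch; the helper names pa_lock_stream()/pa_unlock_stream() and the
shared pending_stream_count counter are assumptions based on the discussion,
not necessarily the patch's exact API.)

```
/* In LA, before sending the abort of the sub-transaction: */

/* (b) tell PA that one more change set (the abort) is pending */
pg_atomic_add_fetch_u32(&winfo->shared->pending_stream_count, 1);

/* (a) release the stream lock so PA stops waiting and applies the abort */
pa_unlock_stream(winfo->shared->xid, AccessExclusiveLock);

/* (c) retake the stream lock so PA waits again at the next stream_stop */
pa_lock_stream(winfo->shared->xid, AccessExclusiveLock);
```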

2. There seems to be another general problem in the way the patch
waits for stream_stop in PA (parallel apply worker). Currently, PA
checks whether there are any pending streams; if there are none, it
tries to wait for the next stream by waiting on a stream lock. However,
it is possible that after PA checks there is no pending stream and
before it actually starts waiting on the lock, the LA sends another
stream for which even stream_stop is sent. In that case, PA will start
waiting for the next stream whereas there is actually a pending stream
available. Normally this only delays applying the changes, but for the
case mentioned in the previous point (Point 1), it can lead to a
deadlock even after we implement the solution proposed to solve it.

3. The other point to consider is that for
stream_commit/prepare/abort, in LA, we release the stream lock after
sending the message, whereas for stream_start we release it before
sending the message. I think for the former cases
(stream_commit/prepare/abort), the patch has done it like this because
pa_send_data() may need to reacquire the lock when it times out and
starts serializing, so there would be no sense in first releasing it,
then re-acquiring it, and then releasing it again. Can't we also
release the lock for stream_start after pa_send_data(), but only if it
has not switched to serialize mode?

-- 
With Regards,
Amit Kapila.




Re: Add PL/pgSQL extra check no_data_found

2022-12-08 Thread Pavel Stehule
On Thu, Dec 8, 2022 at 12:29, Sergey Shinderuk <s.shinde...@postgrespro.ru> wrote:

> Hello,
>
> I propose to add a new value "no_data_found" for the
> plpgsql.extra_errors and plpgsql.extra_warnings parameters [1].
>
> With plpgsql.extra_errors=no_data_found SELECT INTO raises no_data_found
> exception when the result set is empty. With
> plpgsql.extra_errors=too_many_rows,no_data_found SELECT INTO behaves
> like SELECT INTO STRICT [2]. This could simplify migration from PL/SQL
> and may be just more convenient.
>
> One potential downside is that plpgsql.extra_errors=no_data_found could
> break existing functions expecting to get null or checking IF found
> explicitly. This is also true for the too_many_rows exception, but
> arguably it's a programmer error, while no_data_found switches to a
> different convention for handling (or not handling) an empty result with
> SELECT INTO.
>
> Otherwise the patch is straightforward.
>
> What do you think?
>

I am not against it. It makes sense.

I don't like the idea of possibly replacing INTO STRICT with INTO plus
extra warnings.

Handling exceptions is significantly more expensive in Postgres than in
Oracle, and using INTO without STRICT followed by a test IF NOT FOUND THEN
can save one savepoint and one exception handler. This should be mentioned
in the documentation. Using this very common Oracle pattern can have a very
negative impact on performance in Postgres. If somebody ports from Oracle
and wants compatible behavior, then they should use INTO STRICT. I think it
is counterproductive to hide syntax differences when there is (and will be)
a significant difference in performance.

Regards

Pavel




> --
> Sergey Shinderuk  https://postgrespro.com/
>
>
> [1]
>
> https://www.postgresql.org/docs/devel/plpgsql-development-tips.html#PLPGSQL-EXTRA-CHECKS
> [2]
>
> https://www.postgresql.org/docs/devel/plpgsql-statements.html#PLPGSQL-STATEMENTS-SQL-ONEROW


RE: Time delayed LR (WAS Re: logical replication restrictions)

2022-12-08 Thread Hayato Kuroda (Fujitsu)
Dear Andres,

Thanks for reporting! I have analyzed the problem and found the root cause.

This feature did not work on 32-bit OSes. This was because the calculation
of delay_until was wrong. The first argument of TimestampTzPlusMilliseconds()
should be of TimestampTz type, not Datum:

```
+   /* Set apply delay */
+   delay_until = TimestampTzPlusMilliseconds(TimestampTzGetDatum(ts),
+   MySubscription->applydelay);
```

In more detail, the Datum representation of an int64 contains the value itself
on 64-bit systems, but it contains a pointer to the value on 32-bit systems.
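
The corrected call just passes the TimestampTz value directly, roughly like
this (same variable names as in the hunk above):

```
/* Set apply delay: both arguments are plain C values, no Datum conversion */
delay_until = TimestampTzPlusMilliseconds(ts, MySubscription->applydelay);
```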

After fixing this, the feature works on 32-bit environments.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED





Re: Perform streaming logical transactions by background workers and parallel apply

2022-12-08 Thread Amit Kapila
On Fri, Dec 9, 2022 at 7:45 AM Peter Smith  wrote:
>
> On Thu, Dec 8, 2022 at 7:43 PM Masahiko Sawada  wrote:
> >
> > On Thu, Dec 8, 2022 at 4:42 PM Amit Kapila  wrote:
> > >
> > > On Thu, Dec 8, 2022 at 12:42 PM Masahiko Sawada  
> > > wrote:
> > > >
> > > > On Wed, Dec 7, 2022 at 10:03 PM houzj.f...@fujitsu.com
> > > >  wrote:
> > > > >
> > > > >
> > > > > > +static void
> > > > > > +ProcessParallelApplyInterrupts(void)
> > > > > > +{
> > > > > > +CHECK_FOR_INTERRUPTS();
> > > > > > +
> > > > > > +if (ShutdownRequestPending)
> > > > > > +{
> > > > > > +ereport(LOG,
> > > > > > +(errmsg("logical replication 
> > > > > > parallel
> > > > > > apply worker for subscrip
> > > > > > tion \"%s\" has finished",
> > > > > > +
> > > > > > MySubscription->name)));
> > > > > > +
> > > > > > +apply_worker_clean_exit(false);
> > > > > > +}
> > > > > > +
> > > > > > +if (ConfigReloadPending)
> > > > > > +{
> > > > > > +ConfigReloadPending = false;
> > > > > > +ProcessConfigFile(PGC_SIGHUP);
> > > > > > +}
> > > > > > +}
> > > > > >
> > > > > > I personally think that we don't need to have a function to do only
> > > > > > these few things.
> > > > >
> > > > > I thought that introduce a new function make the handling of worker 
> > > > > specific
> > > > > Interrupts logic similar to other existing ones. Like:
> > > > > ProcessWalRcvInterrupts () in walreceiver.c and 
> > > > > HandlePgArchInterrupts() in
> > > > > pgarch.c ...
> > > >
> > > > I think the difference from them is that there is only one place to
> > > > call ProcessParallelApplyInterrupts().
> > > >
> > >
> > > But I feel it is better to isolate this code in a separate function.
> > > What if we decide to extend it further by having some logic to stop
> > > workers after reloading of config?
> >
> > I think we can separate the function at that time. But let's keep the
> > current code as you and Hou agree with the current code. I'm not going
> > to insist on that.
> >
> > >
> > > > >
> > > > > > ---
> > > > > > server_version = walrcv_server_version(LogRepWorkerWalRcvConn);
> > > > > > options.proto.logical.proto_version =
> > > > > > +server_version >= 16 ?
> > > > > > LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM :
> > > > > > server_version >= 15 ?
> > > > > > LOGICALREP_PROTO_TWOPHASE_VERSION_NUM :
> > > > > > server_version >= 14 ?
> > > > > > LOGICALREP_PROTO_STREAM_VERSION_NUM :
> > > > > > LOGICALREP_PROTO_VERSION_NUM;
> > > > > >
> > > > > > Instead of always using the new protocol version, I think we can use
> > > > > > LOGICALREP_PROTO_TWOPHASE_VERSION_NUM if the streaming is not
> > > > > > 'parallel'. That way, we don't need to change protocl version check
> > > > > > logic in pgoutput.c and don't need to expose defGetStreamingMode().
> > > > > > What do you think?
> > > > >
> > > > > I think that some user can also use the new version number when 
> > > > > trying to get
> > > > > changes (via pg_logical_slot_peek_binary_changes or other functions), 
> > > > > so I feel
> > > > > leave the check for new version number seems fine.
> > > > >
> > > > > Besides, I feel even if we don't use new version number, we still 
> > > > > need to use
> > > > > defGetStreamingMode to check if parallel mode in used as we need to 
> > > > > send
> > > > > abort_lsn when parallel is in used. I might be missing something, 
> > > > > sorry for
> > > > > that. Can you please explain the idea a bit ?
> > > >
> > > > My idea is that we use LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM if
> > > > (server_version >= 16 && MySubscription->stream ==
> > > > SUBSTREAM_PARALLEL). If the stream is SUBSTREAM_ON, we use
> > > > LOGICALREP_PROTO_TWOPHASE_VERSION_NUM even if server_version is
> > > > 16. That way, in pgoutput.c, we can send abort_lsn if the protocol
> > > > version is LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM. We don't need
> > > > to send "streaming = parallel" to the publisher since the publisher
> > > > can decide whether or not to send abort_lsn based on the protocol
> > > > version (still needs to send "streaming = on" though). I might be
> > > > missing something.
> > > >
> > >
> > > What if we decide to send some more additional information as part of
> > > another patch like we are discussing in the thread [1]? Now, we won't
> > > be able to decide the version number based on just the streaming
> > > option. Also, in such a case, even for
> > > LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM, it may not be a good
> > > idea to send additional abort information unless the user has used the
> > > streaming=parallel option.
> >
> > If we're going to send the additional information, it makes sense to
> > send streaming=parallel. But the next question came to me is why do we
> > need to increase the 

RE: Time delayed LR (WAS Re: logical replication restrictions)

2022-12-08 Thread Hayato Kuroda (Fujitsu)
Hi Vignesh,

> In the case of physical replication by setting
> recovery_min_apply_delay, I noticed that both primary and standby
> nodes were getting stopped successfully immediately after the stop
> server command. In case of logical replication, stop server fails:
> pg_ctl -D publisher -l publisher.log stop -c
> waiting for server to shut
> down...
> failed
> pg_ctl: server does not shut down
> 
> In case of logical replication, the server does not get stopped
> because the walsender process is not able to exit:
> ps ux | grep walsender
> vignesh  1950789 75.3  0.0 8695216 22284 ?   Rs   11:51   1:08
> postgres: walsender vignesh [local] START_REPLICATION

Thanks for reporting the issue. I have analyzed it.


This issue has occurred because the apply worker cannot reply during the delay.
I think we may have to modify the mechanism that delays applying transactions.

When a walsender process is requested to shut down, it can shut down only after
all the WAL it has sent has been replicated on the subscriber. This check is
done in WalSndDone(), and the replicated position is updated when the process
handles the reply messages from the subscriber, in ProcessStandbyReplyMessage().

In the case of physical replication, the walreceiver can receive WAL and reply
even if applying it is delayed. This means that the replicated position is
reported back to the sender immediately, so the walsender can exit.

In the case of logical replication, however, the apply worker cannot reply to
the walsender while delaying the transaction with this patch at present. This
causes the replicated position to never be reported upstream, so the walsender
cannot exit.


Based on the above analysis, we can conclude that the worker must update the
flushpos and reply to the walsender while delaying the transaction if we want
to solve the issue. This cannot be done in the current approach, and a newer
proposed one[1] may be able to solve this, although it's currently under 
discussion.


Note that a similar issue can be reproduced with physical replication.
When wal_sender_timeout is set to 0 and the network between primary and
standby is broken after the primary has sent WAL to the standby, we cannot
stop the primary node.

[1]: 
https://www.postgresql.org/message-id/TYCPR01MB8373FA10EB2DB2BF8E458604ED1B9%40TYCPR01MB8373.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: [PATCH] Backport perl tests for pg_upgrade from 322becb60

2022-12-08 Thread Michael Paquier
On Mon, Aug 01, 2022 at 01:02:21AM +0300, Anton A. Melnikov wrote:
> As far as i understand from this thread: 
> https://www.postgresql.org/message-id/flat/Yox1ME99GhAemMq1%40paquier.xyz,
> the aim of the perl version for the pg_upgrade tests is to achieve equality 
> of dumps for most cross-versions cases.
> If so this is the significant improvement as previously in test.sh resulted 
> dumps retained unequal and the user
> was asked to eyeball them manually during cross upgrades between different 
> major versions.
> So, the backport of the perl tests also seems preferable to me.

I don't really agree with that.  These TAP tests are really new
development, and it took a few tries to get them completely right
(well, as right as they currently are on HEAD).  If we were to backport
any of this, there is a risk of introducing a bug along the way,
potentially hiding a critical issue related to pg_upgrade.  That's not
worth taking a risk for.

That said, I agree that more needs to be done, but I would limit
that to HEAD only and let it mature in the tree in an
incremental fashion.
--
Michael




Re: add \dpS to psql

2022-12-08 Thread Michael Paquier
On Thu, Dec 08, 2022 at 09:15:03AM -0800, Nathan Bossart wrote:
> The main idea behind this work is breaking out privileges into more
> granular pieces.  If I want to create a role that only runs VACUUM on some
> tables on the weekend, why should I have to also give it the ability to
> ANALYZE, REFRESH, CLUSTER, and REINDEX?  IMHO we should really let the user
> decide what set of privileges makes sense for their use-case.  I'm unsure
> the grouping all these privileges together serves much purpose besides
> preserving ACL bits.

Hmm.  I'd like to think that we should keep a frugal mind here.  More
bits are now available, but it does not strike me as a good idea to
force their usage more than necessary, so grouping all these not-quite
DDL commands into the same bag does not sound that bad to me.
--
Michael




Re: [PATCH] random_normal function

2022-12-08 Thread Michael Paquier
On Thu, Dec 08, 2022 at 04:44:56PM -0800, Paul Ramsey wrote:
> Final tme, with fixes from cirrusci.

Well, why not.  Seems like you would use that a lot with PostGIS.

 #include <math.h>   /* for ldexp() */
+#include <float.h>  /* for DBL_EPSILON */
And be careful with the order here.

+static void
+drandom_check_default_seed()
We always use (void) rather than empty parenthesis sets.
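
That is, only the function's signature line needs to change, e.g.:

```
/* project style: spell the empty parameter list as (void) */
static void
drandom_check_default_seed(void)
{
	/* existing body unchanged */
}
```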

I would not leave that unchecked, so I think that you should add
something in random.sql.  Or would you prefer that some of the
existing regression tests be switched so that they use the new normal
function?

(Ahem.  Bonus points for a random_string() returning a bytea, based on
pg_strong_random().)
--
Michael




Re: Allow tests to pass in OpenSSL FIPS mode

2022-12-08 Thread Michael Paquier
On Wed, Dec 07, 2022 at 03:14:09PM +0100, Peter Eisentraut wrote:
> Here is the next step.  To contain the scope, I focused on just "make check"
> for now.  This patch removes all incidental calls to md5(), replacing them
> with sha256(), so that they'd pass with or without FIPS mode.  (Two tests
> would need alternative expected files: md5 and password.  I have not
> included those here.)

Yeah, fine by me to do that step-by-step.

> Some tests inspect the actual md5 result strings or build statistics based
> on them.  I have tried to carefully preserve the meaning of the original
> tests, to the extent that they could be inferred, in some cases adjusting
> example values by matching the md5 outputs to the equivalent sha256 outputs.
> Some cases are tricky or mysterious or both and could use another look.

incremental_sort mostly relies on the generated plan, so the change
should be rather straightforward, I guess, though there may be a side
effect depending on costing.  Hmm, it does not look like stats_ext
would be an issue as it checks the stats correlation of the attributes
for mcv_lists_arrays.

largeobject_1.out has been forgotten in the set requiring a refresh.
--
Michael




Re: Support logical replication of DDLs

2022-12-08 Thread li jie
>
> Attached please find a new solution that skips the deparsing of ALTER TABLE
> subcommands generated for TableLikeClause. The patch v42-0005 added a new
> boolean field table_like to AlterTableStmt in order to identify an ALTER TABLE
> subcommand generated internally for the TableLikeClause.
>
> Regards,
> Zheng

I took a look at this patch and it appears to be incomplete.

> @@ -1974,6 +1974,7 @@ typedef struct AlterTableStmt
> List *cmds; /* list of subcommands */
> ObjectType objtype; /* type of object */
> bool missing_ok; /* skip error if table missing */
> + bool table_like; /* internally generated for TableLikeClause */
> } AlterTableStmt;

The new table_like field also needs to be handled in the `copynode`
and `equalnode` implementations.




Re: Support logical replication of DDLs

2022-12-08 Thread li jie
It is worth considering whether temporary objects, such as tables,
indexes, and sequences, should be replicated to the subscriber side.

Take temp tables: different sessions create their own temp tables.
If they are all transferred to the subscriber side, there will
inevitably be errors, because there is only one subscription session.

I think temporary objects should not be part of replication because
they are only visible within the session that created them.
Replicating them would not make them visible to the user and would
not be meaningful.

Here is a case:
```
create temp table t1(id int);
\c
create temp table t1(id int);
```


Thoughts?
li jie

On Thu, Dec 8, 2022 at 13:07, vignesh C wrote:
>
> On Wed, 7 Dec 2022 at 17:50, Alvaro Herrera  wrote:
> >
> > I think this patch is split badly.
> >
> > You have:
> >
> > 0001 an enormous patch including some required infrastructure, plus the
> > DDL deparsing bits themselves.
> >
> > 0002 another enormous (though not as much) patch, this time for
> > DDL replication using the above.
> >
> > 0003 a bugfix for 0001, which includes changes in both the
> > infrastructure and the deparsing bits.
> >
> > 0004 test stuff for 0002.
> >
> > 0005 Another bugfix for 0001
> >
> > 0006 Another bugfix for 0001
> >
> > As presented, I think it has very little chance of being reviewed
> > usefully.  A better way to go about this, I think, would be:
> >
> > 0001 - infrastructure bits to support the DDL deparsing parts (all these
> > new functions in ruleutils.c, sequence.c, etc).  That means, everything
> > (?) that's currently in your 0001 except ddl_deparse.c and friends.
> > Clearly there are several independent changes here; maybe it is possible
> > to break it down even further.  This patch or these patches should also
> > include the parts of 0003, 0005, 0006 that require changes outside of
> > ddl_deparse.c.
> > I expect that this patch should be fairly small.
> >
> > 0002 - ddl_deparse.c and its very close friends.  This should not have
> > any impact on places such as ruleutils.c, sequence.c, etc.  The parts of
> > the bugfixes (0001, 0005, 0006) that touch this could should be merged
> > here as well; there's no reason to have them as separate patches.  Some
> > test code should be here also, though it probably doesn't need to aim to
> > be complete.
> > This one is likely to be very large, but also self-contained.
> >
> > 0003 - ddlmessage.c and friends.  I understand that DDL-messaging is
> > supporting infrastructure for DDL replication; I think it should be its
> > own patch.  Probably include its own simple-ish test bits.
> > Not a very large patch.
> >
> > 0004 - DDL replication proper, including 0004.
> > Probably not a very large patch either, not sure.
> >
> >
> > Some review comments, just skimming:
> > - 0002 adds some functions to event_trigger.c, but that doesn't seem to
> > be their place.  Maybe some new file in src/backend/replication/logical
> > would make more sense.
> >
> > - publication_deparse_ddl_command_end has a long strcmp() list; why?
> > Maybe change things so that it compares some object type enum instead.
> >
> > - CreatePublication has a long list of command tags; is that good?
> > Maybe it'd be better to annotate the list in cmdtaglist.h somehow.
> >
> > - The change in pg_dump's getPublications needs updated to 16.
> >
> > - Don't "git add" src/bin/pg_waldump/logicalddlmsgdesc.c, just update
> > its Makefile and meson.build
> >
> > - I think psql's \dRp should not have the new column at the end.
> > Maybe one of:
> > + Name | Owner | DDL | All tables | Inserts | Updates | Deletes | Truncates 
> > | Via root
> > + Name | Owner | All tables | DDL | Inserts | Updates | Deletes | Truncates 
> > | Via root
> > + Name | Owner | All tables | Inserts | Updates | Deletes | Truncates | DDL 
> > | Via root
> > (I would not add the "s" at the end of that column title, also).
>
> Thanks for the comments, these comments will make the patch reviewing easier.
> There are a couple of review comments [1] and [2] which are spread
> across the code, it will be difficult to handle this after
> restructuring of the patch as the comments are spread across the code
> in the patch. So we will handle [1] and [2] first and then work on
> restructuring work suggested by you.
>
> [1] - 
> https://www.postgresql.org/message-id/CAHut%2BPsERMFwO8oK3LFH_3CRG%2B512T%2Bay_viWzrgNetbH2MwxA%40mail.gmail.com
> [2] - 
> https://www.postgresql.org/message-id/CAHut%2BPuxo_kq2toicNK_BQdeccK3REGW-Xv8tVauFvTNku6V-w%40mail.gmail.com
>
> Regards,
> Vignesh
>
>




Re: Error-safe user functions

2022-12-08 Thread Tom Lane
Andres Freund  writes:
> On 2022-12-08 17:57:09 -0500, Tom Lane wrote:
>> Given that this additional experimentation didn't find any holes
>> in the API design, I think this is pretty much ready to go.

> One interesting area is timestamp / datetime related code. There's been some
> past efforts in the area, mostly in 5bc450629b3. See the RETURN_ERROR macro in
> formatting.c.
> This is not directly about type input functions, but it looks to me that the
> functionality in the patchset should work.

Yeah, I was planning to take a look at that before walking away from
this stuff.  (I'm sure not volunteering to convert ALL the input
functions, but I'll do the datetime code.)

You're right that formatting.c is doing stuff that's not exactly
an input function, but I don't see why we can't apply the same
API concepts to it.

regards, tom lane




Re: Error-safe user functions

2022-12-08 Thread Andres Freund
Hi,

On 2022-12-08 17:57:09 -0500, Tom Lane wrote:
> Given that this additional experimentation didn't find any holes
> in the API design, I think this is pretty much ready to go.

One interesting area is timestamp / datetime related code. There's been some
past efforts in the area, mostly in 5bc450629b3. See the RETURN_ERROR macro in
formatting.c.

This is not directly about type input functions, but it looks to me that the
functionality in the patchset should work.

I certainly have the hope that it'll make the code look a bit less ugly...


It looks like a fair bit of work to convert this code, so I don't think we
should tie converting formatting.c to the patchset. But it might be a good
idea for Tom to skim the code to see whether there's anything impacting the
design.

Greetings,

Andres Freund




Re: Perform streaming logical transactions by background workers and parallel apply

2022-12-08 Thread Peter Smith
On Thu, Dec 8, 2022 at 7:43 PM Masahiko Sawada  wrote:
>
> On Thu, Dec 8, 2022 at 4:42 PM Amit Kapila  wrote:
> >
> > On Thu, Dec 8, 2022 at 12:42 PM Masahiko Sawada  
> > wrote:
> > >
> > > On Wed, Dec 7, 2022 at 10:03 PM houzj.f...@fujitsu.com
> > >  wrote:
> > > >
> > > >
> > > > > +static void
> > > > > +ProcessParallelApplyInterrupts(void)
> > > > > +{
> > > > > +CHECK_FOR_INTERRUPTS();
> > > > > +
> > > > > +if (ShutdownRequestPending)
> > > > > +{
> > > > > +ereport(LOG,
> > > > > +(errmsg("logical replication parallel
> > > > > apply worker for subscrip
> > > > > tion \"%s\" has finished",
> > > > > +
> > > > > MySubscription->name)));
> > > > > +
> > > > > +apply_worker_clean_exit(false);
> > > > > +}
> > > > > +
> > > > > +if (ConfigReloadPending)
> > > > > +{
> > > > > +ConfigReloadPending = false;
> > > > > +ProcessConfigFile(PGC_SIGHUP);
> > > > > +}
> > > > > +}
> > > > >
> > > > > I personally think that we don't need to have a function to do only
> > > > > these few things.
> > > >
> > > > I thought that introduce a new function make the handling of worker 
> > > > specific
> > > > Interrupts logic similar to other existing ones. Like:
> > > > ProcessWalRcvInterrupts () in walreceiver.c and 
> > > > HandlePgArchInterrupts() in
> > > > pgarch.c ...
> > >
> > > I think the difference from them is that there is only one place to
> > > call ProcessParallelApplyInterrupts().
> > >
> >
> > But I feel it is better to isolate this code in a separate function.
> > What if we decide to extend it further by having some logic to stop
> > workers after reloading of config?
>
> I think we can separate the function at that time. But let's keep the
> current code as you and Hou agree with the current code. I'm not going
> to insist on that.
>
> >
> > > >
> > > > > ---
> > > > > server_version = walrcv_server_version(LogRepWorkerWalRcvConn);
> > > > > options.proto.logical.proto_version =
> > > > > +server_version >= 16 ?
> > > > > LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM :
> > > > > server_version >= 15 ?
> > > > > LOGICALREP_PROTO_TWOPHASE_VERSION_NUM :
> > > > > server_version >= 14 ?
> > > > > LOGICALREP_PROTO_STREAM_VERSION_NUM :
> > > > > LOGICALREP_PROTO_VERSION_NUM;
> > > > >
> > > > > Instead of always using the new protocol version, I think we can use
> > > > > LOGICALREP_PROTO_TWOPHASE_VERSION_NUM if the streaming is not
> > > > > 'parallel'. That way, we don't need to change protocl version check
> > > > > logic in pgoutput.c and don't need to expose defGetStreamingMode().
> > > > > What do you think?
> > > >
> > > > I think that some user can also use the new version number when trying 
> > > > to get
> > > > changes (via pg_logical_slot_peek_binary_changes or other functions), 
> > > > so I feel
> > > > leave the check for new version number seems fine.
> > > >
> > > > Besides, I feel even if we don't use new version number, we still need 
> > > > to use
> > > > defGetStreamingMode to check if parallel mode in used as we need to send
> > > > abort_lsn when parallel is in used. I might be missing something, sorry 
> > > > for
> > > > that. Can you please explain the idea a bit ?
> > >
> > > My idea is that we use LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM if
> > > (server_version >= 16 && MySubscription->stream ==
> > > SUBSTREAM_PARALLEL). If the stream is SUBSTREAM_ON, we use
> > > LOGICALREP_PROTO_TWOPHASE_VERSION_NUM even if server_version is
> > > 16. That way, in pgoutput.c, we can send abort_lsn if the protocol
> > > version is LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM. We don't need
> > > to send "streaming = parallel" to the publisher since the publisher
> > > can decide whether or not to send abort_lsn based on the protocol
> > > version (still needs to send "streaming = on" though). I might be
> > > missing something.
> > >
> >
> > What if we decide to send some more additional information as part of
> > another patch like we are discussing in the thread [1]? Now, we won't
> > be able to decide the version number based on just the streaming
> > option. Also, in such a case, even for
> > LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM, it may not be a good
> > idea to send additional abort information unless the user has used the
> > streaming=parallel option.
>
> If we're going to send the additional information, it makes sense to
> send streaming=parallel. But the next question came to me is why do we
> need to increase the protocol version for parallel apply feature? If
> sending the additional information is also controlled by an option
> like "streaming", we can decide what we send based on these options,
> no?
>

AFAIK the protocol version defines what protocol message bytes are
transmitted on the wire. So I thought 

Re: Error-safe user functions

2022-12-08 Thread Andrew Dunstan


On 2022-12-08 Th 17:57, Tom Lane wrote:
> Andres Freund  writes:
>> On 2022-12-08 16:00:10 -0500, Robert Haas wrote:
>>> Yes, I think just putting "struct Node;" in as many places as
>>> necessary is the way to go. Or even:
>> +1
> OK, here's a v5 that does it like that.
>
> I've spent a little time pushing ahead on other input functions,
> and realized that my original plan to require a pre-encoded typmod
> for these test functions was not very user-friendly.  So in v5
> you can write something like
>
> pg_input_is_valid('1234.567', 'numeric(7,4)')
>
> 0004 attached finishes up the remaining core numeric datatypes
> (int*, float*, numeric).  I ripped out float8in_internal_opt_error
> in favor of a function that uses the new APIs.


Great, that takes care of some of the relatively urgent work.


>
> 0005 converts contrib/cube, which I chose to tackle partly because
> I'd already touched it in 0004, partly because it seemed like a
> good idea to verify that extension modules wouldn't have any
> problems with this apprach, and partly because I wondered whether
> an input function that uses a Bison/Flex parser would have big
> problems getting converted.  This one didn't, anyway.


Cool


>
> Given that this additional experimentation didn't find any holes
> in the API design, I think this is pretty much ready to go.
>
>   


I will look in more detail tomorrow, but it LGTM on a quick look.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: postgres_fdw uninterruptible during connection establishment / ProcSignalBarrier

2022-12-08 Thread Andres Freund
Hi,

I just re-discovered this issue, via
https://www.postgresql.org/message-id/20221209003607.bz2zdznvfnkq4zz3%40awork3.anarazel.de


On 2022-09-25 16:22:37 -0700, Andres Freund wrote:
> Maybe I am missing something, but I don't think it's OK for
> connect_pg_server() to connect in a blocking manner, without accepting
> interrupts?

It's definitely not. This basically means network issues or such can lead to
connections being unkillable...

We know how to do better, c.f. libpqrcv_connect(). I hacked that up for
postgres_fdw, and got through quite a few runs without related issues ([1]).

The same problem is present in two places in dblink.c. Obviously we can copy
and paste the code to dblink.c as well. But that means we have the same
not-quite-trivial code in three different C files. There's already a fair bit
of duplicated code around AcquireExternalFD().

It seems we should find a place to put backend specific libpq wrapper code
somewhere. Unless we want to relax the rule about not linking libpq into the
backend we can't just put it in the backend directly, though.

The only alternative way to provide a wrapper that I can think of are to
a) introduce a new static library that can be linked to by libpqwalreceiver,
   postgres_fdw, dblink
b) add a header with static inline functions implementing interrupt-processing
   connection establishment for libpq

Neither really has precedent.

The attached quick patch just adds and uses libpq_connect_interruptible() in
postgres_fdw. If we wanted to move this somewhere generic, at least part of
the external FD handling should also be moved into it.


dblink.c uses a lot of other blocking libpq functions, which obviously also
isn't ok.


Perhaps we could somehow make this easier from within libpq? My first thought
was a connection parameter that'd provide a different implementation of
pqSocketCheck() or such - but I don't think that'd work, because we might
throw errors when processing interrupts, and that'd not be ok from deep within
libpq.

Greetings,

Andres Freund


[1] I eventually encountered a deadlock in REINDEX, but it didn't involve
postgres_fdw / ProcSignalBarrier
diff --git i/contrib/postgres_fdw/connection.c w/contrib/postgres_fdw/connection.c
index f0c45b00db8..f9da564a539 100644
--- i/contrib/postgres_fdw/connection.c
+++ w/contrib/postgres_fdw/connection.c
@@ -347,6 +347,67 @@ make_new_connection(ConnCacheEntry *entry, UserMapping *user)
 		 entry->conn, server->servername, user->umid, user->userid);
 }
 
+/*
+ * PQconnectStartParams() wrapper that processes interrupts. Backend code
+ * should *never* enter blocking libpq code as that would prevent
+ * cancellations, global barriers etc from being processed.
+ */
+static PGconn *
+libpq_connect_interruptible(const char *const *keywords,
+			const char *const *values,
+			int expand_dbname,
+			uint32 wait_event_info)
+{
+	PGconn *conn;
+	PostgresPollingStatusType status;
+
	conn = PQconnectStartParams(keywords, values, expand_dbname);
+
+	if (!conn)
+		return NULL;
+
+	/*
+	 * Poll connection until we have OK or FAILED status.
+	 *
+	 * Per spec for PQconnectPoll, first wait till socket is write-ready.
+	 */
+	status = PGRES_POLLING_WRITING;
+	while (status != PGRES_POLLING_OK && status != PGRES_POLLING_FAILED)
+	{
+		int			io_flag;
+		int			rc;
+
+		if (status == PGRES_POLLING_READING)
+			io_flag = WL_SOCKET_READABLE;
+#ifdef WIN32
+		/* Windows needs a different test while waiting for connection-made */
		else if (PQstatus(conn) == CONNECTION_STARTED)
+			io_flag = WL_SOCKET_CONNECTED;
+#endif
+		else
+			io_flag = WL_SOCKET_WRITEABLE;
+
+		rc = WaitLatchOrSocket(MyLatch,
+			   WL_EXIT_ON_PM_DEATH | WL_LATCH_SET | io_flag,
+			   PQsocket(conn),
+			   0,
+			   wait_event_info);
+
+		/* Interrupted? */
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			CHECK_FOR_INTERRUPTS();
+		}
+
+		/* If socket is ready, advance the libpq state machine */
+		if (rc & io_flag)
+			status = PQconnectPoll(conn);
+	}
+
+	return conn;
+}
+
 /*
  * Connect to remote server using specified server and user mapping properties.
  */
@@ -471,7 +532,8 @@ connect_pg_server(ForeignServer *server, UserMapping *user)
 		}
 
 		/* OK to make connection */
-		conn = PQconnectdbParams(keywords, values, false);
+		conn = libpq_connect_interruptible(keywords, values,
+		   false, PG_WAIT_EXTENSION);
 
 		if (!conn)
 			ReleaseExternalFD();	/* because the PG_CATCH block won't */


Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-12-08 Thread Masahiko Sawada
On Tue, Dec 6, 2022 at 7:32 PM John Naylor  wrote:
>
> On Fri, Dec 2, 2022 at 11:42 PM Masahiko Sawada  wrote:
> >
> > > On Mon, Nov 14, 2022 at 7:59 PM John Naylor 
> > >  wrote:
> > > >
> > > > - Optimize node128 insert.
> > >
> > > I've attached a rough start at this. The basic idea is borrowed from our 
> > > bitmapset nodes, so we can iterate over and operate on word-sized (32- or 
> > > 64-bit) types at a time, rather than bytes.
> >
> > Thanks! I think this is a good idea.
> >
> > > To make this easier, I've moved some of the lower-level macros and types 
> > > from bitmapset.h/.c to pg_bitutils.h. That's probably going to need a 
> > > separate email thread to resolve the coding style clash this causes, so 
> > > that can be put off for later.
>
> I started a separate thread [1], and 0002 comes from feedback on that. There 
> is a FIXME about using WORDNUM and BITNUM, at least with that spelling. I'm 
> putting that off to ease rebasing the rest as v13 -- getting some CI testing 
> with 0002 seems like a good idea. There are no other changes yet. Next, I 
> will take a look at templating local vs. shared memory. I might try basing 
> that on the styles of both v12 and v8, and see which one works best with 
> templating.

Thank you so much!

In the meanwhile, I've been working on vacuum integration. There are
two things I'd like to discuss some time:

The first is the minimum of maintenance_work_mem, 1 MB. Since the
initial DSA segment size is 1 MB (DSA_INITIAL_SEGMENT_SIZE), parallel
vacuum with the radix tree cannot work with the minimum
maintenance_work_mem. We will need to increase it to 4 MB or so. Maybe
we can start a new thread for that.

The second is how to limit the size of the radix tree to
maintenance_work_mem. I think that it's tricky to estimate the maximum
number of keys in the radix tree that fit in maintenance_work_mem; the
radix tree size varies depending on the key distribution. The next
idea I considered was how to limit the size when inserting a key. In
order to strictly limit the radix tree size, we probably have to
change rt_set() so that it breaks off and returns false if the radix
tree size is about to exceed the memory limit when we allocate a new
node or grow a node kind/class. Ideally, I'd like to control the size
outside of the radix tree (e.g. in TIDStore), since such a check could
introduce overhead to rt_set(), but probably we need to add that logic
in the radix tree itself.
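
A rough sketch of what I mean (the names mem_limit, rt_memory_usage() and
RT_MAX_NODE_SIZE, and the exact rt_set() signature, are placeholders rather
than the actual API of the patch):

```
/* Hypothetical sketch only: refuse the insert if it could exceed the limit. */
bool
rt_set(radix_tree *tree, uint64 key, uint64 value)
{
	/* worst-case size of any new node this insert might have to allocate */
	Size		needed = RT_MAX_NODE_SIZE;

	if (rt_memory_usage(tree) + needed > tree->mem_limit)
		return false;			/* caller (e.g. TIDStore) must act first */

	/* ... normal insertion path ... */
	return true;
}
```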

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] random_normal function

2022-12-08 Thread Paul Ramsey


> On Dec 8, 2022, at 3:25 PM, Paul Ramsey  wrote:
> 
>> 
>> Revised patch attached.
> 
> And again, because I think I missed one change in the last one.
> 
> 

Final time, with fixes from Cirrus CI.



random_normal_04.patch
Description: Binary data


Re: [PoC] Federated Authn/z with OAUTHBEARER

2022-12-08 Thread Jacob Champion
On Wed, Dec 7, 2022 at 3:22 PM Andrey Chudnovsky
 wrote:
> 
>> I think it's okay to have the extension and HBA collaborate to
>> provide discovery information. Your proposal goes further than
>> that, though, and makes the server aware of the chosen client flow.
>> That appears to be an architectural violation: why does an OAuth
>> resource server need to know the client flow at all?
> 
> Ok. It may have left there from intermediate iterations. We did 
> consider making extension drive the flow for specific grant_type,
> but decided against that idea. For the same reason you point to. Is
> it correct that your main concern about use of grant_type was that 
> it's propagated to the server? Then yes, we will remove sending it
> to the server.

Okay. Yes, that was my primary concern.

>> Ideally, yes, but that only works if all identity providers
>> implement the same flows in compatible ways. We're already seeing
>> instances where that's not the case and we'll necessarily have to
>> deal with that up front.
> 
> Yes, based on our analysis OIDC spec is detailed enough, that 
> providers implementing that one, can be supported with generic code
> in libpq / client. Github specifically won't fit there though.
> Microsoft Azure AD, Google, Okta (including Auth0) will. 
> Theoretically discovery documents can be returned from the extension 
> (server-side) which is provider specific. Though we didn't plan to 
> prioritize that.

As another example, Google's device authorization grant is incompatible
with the spec (which they co-authored). I want to say I had problems
with Azure AD not following that spec either, but I don't remember
exactly what they were. I wouldn't be surprised to find more tiny
departures once we get deeper into implementation.

>> That seems to be restating the goal of OAuth and OIDC. Can you
>> explain how the incompatible change allows you to accomplish this
>> better than standard implementations?
> 
> Do you refer to passing grant_type to the server? Which we will get 
> rid of in the next iteration. Or other incompatible changes as well?

Just the grant type, yeah.

>> Why? I claim that standard OAUTHBEARER can handle all of that.
>> What does your proposed architecture (the third diagram) enable
>> that my proposed hook (the second diagram) doesn't?
> 
> The hook proposed on the 2nd diagram effectively delegates all Oauth 
> flows implementations to the client. We propose libpq takes care of
> pulling OpenId discovery and coordination. Which is effectively
> Diagram 1 + more flows + server hook providing root url/audience.
> 
> Created the diagrams with all components for 3 flows: [snip]

(I'll skip ahead to your later mail on this.)

>> (For a future conversation: they need to set up authorization,
>> too, with custom scopes or some other magic. It's not enough to
>> check who the token belongs to; even if Postgres is just using the
>> verified email from OpenID as an authenticator, you have to also
>> know that the user authorized the token -- and therefore the client
>> -- to access Postgres on their behalf.)
> 
> My understanding is that metadata in the tokens is provider
> specific, so server side hook would be the right place to handle
> that. Plus I can envision for some providers it can make sense to
> make a remote call to pull some information.

The server hook is the right place to check the scopes, yes, but I think
the DBA should be able to specify what those scopes are to begin with.
The provider of the extension shouldn't be expected by the architecture
to hardcode those decisions, even if Azure AD chooses to short-circuit
that choice and provide magic instead.

On 12/7/22 20:25, Andrey Chudnovsky wrote:
> That being said, the Diagram 2 would look like this with our proposal:
> [snip]
> 
> With the application taking care of all Token acquisition logic. While
> the server-side hook is participating in the pre-authentication reply.
> 
> That is definitely a required scenario for the long term and the
> easiest to implement in the client core.
> And if we can do at least that flow in PG16 it will be a strong
> foundation to provide more support for specific grants in libpq going
> forward.

Agreed.
> Does the diagram above look good to you? We can then start cleaning up
> the patch to get that in first.

I maintain that the hook doesn't need to hand back artifacts to the
client for a second PQconnect call. It can just use those artifacts to
obtain the access token and hand that right back to libpq. (I think any
requirement that clients be rewritten to call PQconnect twice will
probably be a sticking point for adoption of an OAuth patch.)

That said, now that your proposal is also compatible with OAUTHBEARER, I
can pony up some code to hopefully prove my point. (I don't know if I'll
be able to do that by the holidays though.)

Thanks!
--Jacob




Re: tests against running server occasionally fail, postgres_fdw & tenk1

2022-12-08 Thread Andres Freund
Hi,

On 2022-12-08 16:15:11 -0800, Andres Freund wrote:
> commit 3f0e786ccbf
> Author: Andres Freund 
> Date:   2022-12-07 12:13:35 -0800
> 
> meson: Add 'running' test setup, as a replacement for installcheck
> 
> CI tests the pg_regress/isolationtester tests that support doing so against a
> running server.
> 
> 
> Unfortunately cfbot shows that that doesn't work entirely reliably.
> 
> The most frequent case is postgres_fdw, which somewhat regularly fails with a
> regression.diff like this:
> 
> diff -U3 /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out 
> /tmp/cirrus-ci-build/build/testrun/postgres_fdw-running/regress/results/postgres_fdw.out
> --- /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out   
> 2022-12-08 20:35:24.772888000 +
> +++ 
> /tmp/cirrus-ci-build/build/testrun/postgres_fdw-running/regress/results/postgres_fdw.out
>   2022-12-08 20:43:38.19945 +
> @@ -9911,8 +9911,7 @@
>   WHERE application_name = 'fdw_retry_check';
>   pg_terminate_backend
>  --
> - t
> -(1 row)
> +(0 rows)
> 
>  -- This query should detect the broken connection when starting new remote
>  -- transaction, reestablish new connection, and then succeed.
> 
> 
> See e.g.
> https://cirrus-ci.com/task/5925540020879360
> https://api.cirrus-ci.com/v1/artifact/task/5925540020879360/testrun/build/testrun/postgres_fdw-running/regress/regression.diffs
> https://api.cirrus-ci.com/v1/artifact/task/5925540020879360/testrun/build/testrun/runningcheck.log
> 
> 
> The following comment in the test provides a hint what might be happening:
> 
> -- If debug_discard_caches is active, it results in
> -- dropping remote connections after every transaction, making it
> -- impossible to test termination meaningfully.  So turn that off
> -- for this test.
> SET debug_discard_caches = 0;
> 
> 
> I guess that a cache reset message arrives and leads to the connection being
> terminated. Unfortunately that's hard to see right now, as the relevant log
> messages are output with DEBUG3 - it's quite verbose, so enabling it for all
> tests will be painful.
> 
> I *think* I have seen this failure locally at least once, when running the
> test normally.
> 
> 
> I'll try to reproduce this locally for a bit. If I can't, the only other idea
> I have for debugging this is to change log_min_messages in that section of the
> postgres_fdw test to DEBUG3.

Oh. I tried to reproduce the issue, without success so far, but eventually my
test loop got stuck in something I reported previously and had since forgotten about:
https://www.postgresql.org/message-id/20220925232237.p6uskba2dw6fnwj2%40awork3.anarazel.de

I wonder if the timing on the freebsd CI task works out to hitting a "smaller
version" of the problem that eventually resolves itself, which then leads to a
sinval reset getting sent, causing the observable problem.

Greetings,

Andres Freund




tests against running server occasionally fail, postgres_fdw & tenk1

2022-12-08 Thread Andres Freund
Hi,

Since

commit 3f0e786ccbf
Author: Andres Freund 
Date:   2022-12-07 12:13:35 -0800

meson: Add 'running' test setup, as a replacement for installcheck

CI tests the pg_regress/isolationtester tests that support doing so against a
running server.


Unfortunately cfbot shows that that doesn't work entirely reliably.

The most frequent case is postgres_fdw, which somewhat regularly fails with a
regression.diff like this:

diff -U3 /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out 
/tmp/cirrus-ci-build/build/testrun/postgres_fdw-running/regress/results/postgres_fdw.out
--- /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out 
2022-12-08 20:35:24.772888000 +
+++ 
/tmp/cirrus-ci-build/build/testrun/postgres_fdw-running/regress/results/postgres_fdw.out
2022-12-08 20:43:38.19945 +
@@ -9911,8 +9911,7 @@
WHERE application_name = 'fdw_retry_check';
  pg_terminate_backend
 --
- t
-(1 row)
+(0 rows)

 -- This query should detect the broken connection when starting new remote
 -- transaction, reestablish new connection, and then succeed.


See e.g.
https://cirrus-ci.com/task/5925540020879360
https://api.cirrus-ci.com/v1/artifact/task/5925540020879360/testrun/build/testrun/postgres_fdw-running/regress/regression.diffs
https://api.cirrus-ci.com/v1/artifact/task/5925540020879360/testrun/build/testrun/runningcheck.log


The following comment in the test provides a hint what might be happening:

-- If debug_discard_caches is active, it results in
-- dropping remote connections after every transaction, making it
-- impossible to test termination meaningfully.  So turn that off
-- for this test.
SET debug_discard_caches = 0;


I guess that a cache reset message arrives and leads to the connection being
terminated. Unfortunately that's hard to see right now, as the relevant log
messages are output with DEBUG3 - it's quite verbose, so enabling it for all
tests will be painful.

I *think* I have seen this failure locally at least once, when running the
test normally.


I'll try to reproduce this locally for a bit. If I can't, the only other idea
I have for debugging this is to change log_min_messages in that section of the
postgres_fdw test to DEBUG3.



The second failure case I observed, at a lower frequency, is in the main
regression tests:
https://cirrus-ci.com/task/5640584912699392
https://api.cirrus-ci.com/v1/artifact/task/5640584912699392/testrun/build/testrun/regress-running/regress/regression.diffs
https://api.cirrus-ci.com/v1/artifact/task/5640584912699392/testrun/build/testrun/runningcheck.log

diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/create_index.out 
/tmp/cirrus-ci-build/build/testrun/regress-running/regress/results/create_index.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/create_index.out 
2022-12-08 16:49:28.239508000 +
+++ 
/tmp/cirrus-ci-build/build/testrun/regress-running/regress/results/create_index.out
 2022-12-08 16:57:17.28665 +
@@ -1910,11 +1910,15 @@
 SELECT unique1 FROM tenk1
 WHERE unique1 IN (1,42,7)
 ORDER BY unique1;
-  QUERY PLAN

- Index Only Scan using tenk1_unique1 on tenk1
-   Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
-(2 rows)
+QUERY PLAN
+---
+ Sort
+   Sort Key: unique1
+   ->  Bitmap Heap Scan on tenk1
+ Recheck Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+ ->  Bitmap Index Scan on tenk1_unique1
+   Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+(6 rows)

 SELECT unique1 FROM tenk1
 WHERE unique1 IN (1,42,7)


Which I think we've seen a number of times before, even in the temp-install
case. We fixed one source of this issue in this thread:
https://www.postgresql.org/message-id/1346227.1649887693%40sss.pgh.pa.us
but it looks like there's some remaining instability.

According to the server log (link above), there's no autovacuum on
tenk1.

Unfortunately we don't log non-automatic vacuums unless they are run with
verbose, so we can't see what horizon was used (see heap_vacuum_rel()'s
computation of 'instrument').

I don't have a better idea than to either change the above, or to revert
91998539b227dfc6dd091714da7d106f2c95a321.

Greetings,

Andres Freund




Re: [PATCH] random_normal function

2022-12-08 Thread Paul Ramsey
> 
> Revised patch attached.

And again, because I think I missed one change in the last one.



random_normal_03.patch
Description: Binary data


Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2022-12-08 Thread Jacob Champion
On Mon, Dec 5, 2022 at 10:53 AM Jacob Champion  wrote:
> We are not the first using Homebrew to run into this, and best I can
> tell, it is a brew-specific bug. The certificate directory that's been
> configured isn't actually installed by the formula. (A colleague here
> was able to verify the same behavior on their local machine, so it's
> not a Cirrus problem.)

Correction -- it is installed, but then it's removed during `brew
cleanup`. I asked about it over on their discussion board [1].

> (If this is eventually considered a bug in the formula, we'll need to
> update to get the fix regardless.)

For now, it's worked around in v4. This should finally get the cfbot
fully green.

(The "since diff" is now in range-diff format; if you use them, let me
know if this is more or less useful than before.)

Thanks!
--Jacob

[1] https://github.com/orgs/Homebrew/discussions/4030
1:  d19b0bfc95 ! 1:  b01812f604 libpq: add sslrootcert=system to use default CAs
@@ Commit message
 
 Based on a patch by Thomas Habets.
 
+Note the workaround in .cirrus.yml for a strange interaction between
+brew and the openssl@3 formula; hopefully this can be removed in the
+near future.
+
Discussion: https://www.postgresql.org/message-id/flat/CA%2BkHd%2BcJwCUxVb-Gj_0ptr3_KZPwi3%2B67vK6HnLFBK9MzuYrLA%40mail.gmail.com
 
+ ## .cirrus.yml ##
+@@ .cirrus.yml: task:
+   make \
+   meson \
+   openldap \
+-  openssl \
++  openssl@3 \
+   python \
+   tcl-tk \
+   zstd
+ 
+ brew cleanup -s # to reduce cache size
++
++# brew cleanup removes the empty certs directory in OPENSSLDIR, causing
++# OpenSSL to report unexpected errors ("unregistered scheme") during
++# verification failures. Put it back for now as a workaround.
++#
++#   https://github.com/orgs/Homebrew/discussions/4030
++#
++# Note that $(brew --prefix openssl) will give us the opt/ prefix but not
++# the etc/ prefix, so we hardcode the full path here. openssl@3 is pinned
++# above to try to minimize the chances of this changing beneath us, but it's
++# brittle...
++mkdir -p "/usr/local/etc/openssl@3/certs"
+   upload_caches: homebrew
+ 
+   ccache_cache:
+
  ## configure ##
 @@ configure: $as_echo "$ac_res" >&6; }
  
2:  87a324efcf = 2:  432453942a libpq: force sslmode=verify-full for system CAs
From b01812f60495d158ba4de7865187a1dbd132a93b Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Mon, 24 Oct 2022 15:30:25 -0700
Subject: [PATCH v4 1/2] libpq: add sslrootcert=system to use default CAs

Based on a patch by Thomas Habets.

Note the workaround in .cirrus.yml for a strange interaction between
brew and the openssl@3 formula; hopefully this can be removed in the
near future.

Discussion: https://www.postgresql.org/message-id/flat/CA%2BkHd%2BcJwCUxVb-Gj_0ptr3_KZPwi3%2B67vK6HnLFBK9MzuYrLA%40mail.gmail.com
---
 .cirrus.yml   |  14 +-
 configure | 289 +-
 configure.ac  |   2 +
 doc/src/sgml/libpq.sgml   |  17 ++
 doc/src/sgml/runtime.sgml |   6 +-
 meson.build   |   7 +
 src/include/pg_config.h.in|   4 +
 src/interfaces/libpq/fe-secure-openssl.c  |  26 +-
 src/test/ssl/ssl/server-cn-only+server_ca.crt |  38 +++
 src/test/ssl/sslfiles.mk  |   6 +-
 src/test/ssl/t/001_ssltests.pl|  29 ++
 11 files changed, 295 insertions(+), 143 deletions(-)
 create mode 100644 src/test/ssl/ssl/server-cn-only+server_ca.crt

diff --git a/.cirrus.yml b/.cirrus.yml
index f3192e..508158ecd3 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -448,12 +448,24 @@ task:
   make \
   meson \
   openldap \
-  openssl \
+  openssl@3 \
   python \
   tcl-tk \
   zstd
 
 brew cleanup -s # to reduce cache size
+
+# brew cleanup removes the empty certs directory in OPENSSLDIR, causing
+# OpenSSL to report unexpected errors ("unregistered scheme") during
+# verification failures. Put it back for now as a workaround.
+#
+#   https://github.com/orgs/Homebrew/discussions/4030
+#
+# Note that $(brew --prefix openssl) will give us the opt/ prefix but not
+# the etc/ prefix, so we hardcode the full path here. openssl@3 is pinned
+# above to try to minimize the chances of this changing beneath us, but it's
+# brittle...
+mkdir -p "/usr/local/etc/openssl@3/certs"
   upload_caches: homebrew
 
   ccache_cache:
diff --git a/configure b/configure
index 650755a6b1..92bd7b98c0 100755
--- a/configure
+++ b/configure
@@ -2091,6 +2091,56 @@ $as_echo "$ac_res" >&6; }
 
 } # ac_fn_c_check_func
 
+# ac_fn_c_check_decl LINENO SYMBOL VAR INCLUDES
+# 

Re: [PATCH] random_normal function

2022-12-08 Thread Paul Ramsey



> On Dec 8, 2022, at 2:46 PM, Justin Pryzby  wrote:
> 
> I guess make_interval is a typo ?
> 
> This is causing it to fail tests:
> http://cfbot.cputube.org/paul-ramsey.html
> 

Yep, dumb typo, thanks! This bot is amazeballs, thank you!

P.




Re: [PATCH] random_normal function

2022-12-08 Thread Paul Ramsey


> On Dec 8, 2022, at 2:40 PM, David G. Johnston  
> wrote:
> 
> On Thu, Dec 8, 2022 at 2:53 PM Paul Ramsey  wrote:
> 
> random_normal(stddev float8 DEFAULT 1.0, mean float8 DEFAULT 0.0)
> 
> Any particular justification for placing stddev before mean?  A brief survey 
> seems to indicate other libraries, as well as (at least for me) learned 
> convention, has the mean be supplied first, then the standard deviation.  The 
> implementation/commentary seems to use that convention as well.

No, I'm not sure what was going through my head, but I'm sure it made sense at 
the time (maybe something like "people will tend to jimmy with the stddev more 
frequently than the mean"). I've reversed the order.

> Some suggestions:

Thanks! Taken :)

> And a possible micro-optimization...
> 
> + bool   rescale = true
> + if (PG_NARGS() == 0)
> +rescale = false
> ...
> + if (rescale)
> ... result = (stddev * z) + mean
> + else
> +  result = z

Feels a little too micro to me (relative to the overall cost of the transform 
from uniform to normal distribution). I'm going to leave it out unless you 
violently want it.

Revised patch attached.

Thanks!

P


random_normal_02.patch
Description: Binary data


Re: Error-safe user functions

2022-12-08 Thread Tom Lane
Andres Freund  writes:
> On 2022-12-08 16:00:10 -0500, Robert Haas wrote:
>> Yes, I think just putting "struct Node;" in as many places as
>> necessary is the way to go. Or even:

> +1

OK, here's a v5 that does it like that.

I've spent a little time pushing ahead on other input functions,
and realized that my original plan to require a pre-encoded typmod
for these test functions was not very user-friendly.  So in v5
you can write something like

pg_input_is_valid('1234.567', 'numeric(7,4)')

0004 attached finishes up the remaining core numeric datatypes
(int*, float*, numeric).  I ripped out float8in_internal_opt_error
in favor of a function that uses the new APIs.

0005 converts contrib/cube, which I chose to tackle partly because
I'd already touched it in 0004, partly because it seemed like a
good idea to verify that extension modules wouldn't have any
problems with this approach, and partly because I wondered whether
an input function that uses a Bison/Flex parser would have big
problems getting converted.  This one didn't, anyway.

Given that this additional experimentation didn't find any holes
in the API design, I think this is pretty much ready to go.
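
To make the calling convention concrete, a minimal hand-written sketch of an
input function using the new mechanism might look like this (mytype_in,
looks_valid() and parse_mytype() are hypothetical, as is the particular error
report; it only illustrates the errsave()/ereturn() pattern):

#include "postgres.h"
#include "fmgr.h"
#include "nodes/nodes.h"
#include "nodes/miscnodes.h"

Datum
mytype_in(PG_FUNCTION_ARGS)
{
	char	   *str = PG_GETARG_CSTRING(0);
	Node	   *escontext = fcinfo->context;

	/* hypothetical syntax check; a real input function parses "str" here */
	if (!looks_valid(str))
		ereturn(escontext, (Datum) 0,
				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
				 errmsg("invalid input syntax for type %s: \"%s\"",
						"mytype", str)));

	/* hypothetical constructor for the internal representation */
	PG_RETURN_POINTER(parse_mytype(str));
}

A caller that passes an ErrorSaveContext in fcinfo->context (as
pg_input_is_valid() does) receives the failure softly; any other caller gets
an ordinary ERROR, as before.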

regards, tom lane

diff --git a/doc/src/sgml/ref/create_type.sgml b/doc/src/sgml/ref/create_type.sgml
index 693423e524..994dfc6526 100644
--- a/doc/src/sgml/ref/create_type.sgml
+++ b/doc/src/sgml/ref/create_type.sgml
@@ -900,6 +900,17 @@ CREATE TYPE name
function is written in C.
   
 
+  
+   In PostgreSQL version 16 and later,
+   it is desirable for base types' input functions to
+   return soft errors using the
+   new errsave()/ereturn()
+   mechanism, rather than throwing ereport()
+   exceptions as in previous versions.
+   See src/backend/utils/fmgr/README for more
+   information.
+  
+
  
 
  
diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index 4368c30fdb..7c594be583 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -56,6 +56,7 @@ node_headers = \
 	nodes/bitmapset.h \
 	nodes/extensible.h \
 	nodes/lockoptions.h \
+	nodes/miscnodes.h \
 	nodes/replnodes.h \
 	nodes/supportnodes.h \
 	nodes/value.h \
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
index 7212bc486f..08992dfd47 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -68,6 +68,7 @@ my @all_input_files = qw(
   nodes/bitmapset.h
   nodes/extensible.h
   nodes/lockoptions.h
+  nodes/miscnodes.h
   nodes/replnodes.h
   nodes/supportnodes.h
   nodes/value.h
@@ -89,6 +90,7 @@ my @nodetag_only_files = qw(
   executor/tuptable.h
   foreign/fdwapi.h
   nodes/lockoptions.h
+  nodes/miscnodes.h
   nodes/replnodes.h
   nodes/supportnodes.h
 );
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index f5cd1b7493..eb489ea3a7 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -71,6 +71,7 @@
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
+#include "nodes/miscnodes.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/bgworker.h"
@@ -611,6 +612,128 @@ errfinish(const char *filename, int lineno, const char *funcname)
 	CHECK_FOR_INTERRUPTS();
 }
 
+
+/*
+ * errsave_start --- begin a "soft" error-reporting cycle
+ *
+ * If "context" isn't an ErrorSaveContext node, this behaves as
+ * errstart(ERROR, domain), and the errsave() macro ends up acting
+ * exactly like ereport(ERROR, ...).
+ *
+ * If "context" is an ErrorSaveContext node, but the node creator only wants
+ * notification of the fact of a soft error without any details, we just set
+ * the error_occurred flag in the ErrorSaveContext node and return false,
+ * which will cause us to skip the remaining error processing steps.
+ *
+ * Otherwise, create and initialize error stack entry and return true.
+ * Subsequently, errmsg() and perhaps other routines will be called to further
+ * populate the stack entry.  Finally, errsave_finish() will be called to
+ * tidy up.
+ */
+bool
+errsave_start(struct Node *context, const char *domain)
+{
+	ErrorSaveContext *escontext;
+	ErrorData  *edata;
+
+	/*
+	 * Do we have a context for soft error reporting?  If not, just punt to
+	 * errstart().
+	 */
+	if (context == NULL || !IsA(context, ErrorSaveContext))
+		return errstart(ERROR, domain);
+
+	/* Report that a soft error was detected */
+	escontext = (ErrorSaveContext *) context;
+	escontext->error_occurred = true;
+
+	/* Nothing else to do if caller wants no further details */
+	if (!escontext->details_wanted)
+		return false;
+
+	/*
+	 * Okay, crank up a stack entry to store the info in.
+	 */
+
+	recursion_depth++;
+
+	/* Initialize data for this error frame */
+	edata = get_error_stack_entry();
+	edata->elevel = LOG;		/* signal all is well to errsave_finish */
+	set_stack_entry_domain(edata, domain);
+	/* Select default errcode based on the assumed elevel of ERROR */
+	

Re: [PATCH] random_normal function

2022-12-08 Thread Justin Pryzby
On Thu, Dec 08, 2022 at 01:53:23PM -0800, Paul Ramsey wrote:
> Just a utility function to generate random numbers from a normal
> distribution. I find myself doing this several times a year, and I am
> sure I must not be the only one.
> 
> random_normal(stddev float8 DEFAULT 1.0, mean float8 DEFAULT 0.0)

+++ b/src/backend/catalog/system_functions.sql
@@ -620,6 +620,13 @@ CREATE OR REPLACE FUNCTION
  STABLE PARALLEL SAFE
  AS 'sql_localtimestamp';
 
+CREATE OR REPLACE FUNCTION
+  random_normal(stddev float8 DEFAULT 1.0, mean float8 DEFAULT 0.0)
+RETURNS float8
+LANGUAGE INTERNAL
+STRICT VOLATILE PARALLEL SAFE
+AS 'make_interval';

I guess make_interval is a typo ?

This is causing it to fail tests:
http://cfbot.cputube.org/paul-ramsey.html

BTW you can run the same tests as CFBOT does from your own github
account; see:
https://www.postgresql.org/message-id/20221116232507.go26...@telsasoft.com

-- 
Justin




Re: [PATCH] random_normal function

2022-12-08 Thread David G. Johnston
On Thu, Dec 8, 2022 at 2:53 PM Paul Ramsey 
wrote:

>
> random_normal(stddev float8 DEFAULT 1.0, mean float8 DEFAULT 0.0)
>

Any particular justification for placing stddev before mean?  A brief
survey seems to indicate other libraries, as well as (at least for me)
learned convention, has the mean be supplied first, then the standard
deviation.  The implementation/commentary seems to use that convention as
well.

Some suggestions:

/* Apply optional user parameters */ - that isn't important or even what is
happening though, and the body of the function shouldn't care about the
source of the values for the variables it uses.

Instead:
/* Transform the standard normal variable (z) using the target normal
distribution parameters */

Personally I'd probably make that even more explicit:

+ float8 z
...
* z = pg_prng_double_normal(_seed)
+ /* ... */
* result = (stddev * z) + mean

And a possible micro-optimization...

+ bool   rescale = true
+ if (PG_NARGS() == 0)
+rescale = false
...
+ if (rescale)
... result = (stddev * z) + mean
+ else
+  result = z

David J.
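
Putting the suggestions above together, the core of the function under
discussion reduces to something like this sketch (written here for
illustration, not taken from the attached patch; seeding and the
default-argument plumbing are omitted, and prng_state stands in for the
backend's real PRNG state):

#include "postgres.h"
#include "fmgr.h"
#include "common/pg_prng.h"

static pg_prng_state prng_state;	/* placeholder; real code seeds this lazily */

Datum
random_normal(PG_FUNCTION_ARGS)
{
	float8		mean = PG_GETARG_FLOAT8(0);
	float8		stddev = PG_GETARG_FLOAT8(1);
	float8		z;

	/* draw a standard normal variate, then rescale to the target distribution */
	z = pg_prng_double_normal(&prng_state);

	PG_RETURN_FLOAT8(mean + stddev * z);
}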


Re: Error-safe user functions

2022-12-08 Thread Andres Freund
Hi,

On 2022-12-08 16:00:10 -0500, Robert Haas wrote:
> On Thu, Dec 8, 2022 at 11:32 AM Tom Lane  wrote:
> > If we go with "struct Node *" then we can solve such problems by
> > just repeating "struct Node;" forward-declarations in as many
> > headers as we have to.
> 
> Yes, I think just putting "struct Node;" in as many places as
> necessary is the way to go. Or even:

+1


> struct Node;
> typedef struct Node Node;

That doesn't work well, because C99 doesn't allow typedefs to be redeclared in
the same scope. IIRC C11 added support for it, and a lot of compilers already
supported it before.

Greetings,

Andres Freund




[PATCH] random_normal function

2022-12-08 Thread Paul Ramsey
Just a utility function to generate random numbers from a normal
distribution. I find myself doing this several times a year, and I am
sure I must not be the only one.

random_normal(stddev float8 DEFAULT 1.0, mean float8 DEFAULT 0.0)


random_normal_01.patch
Description: Binary data


Re: sendFileWithContent() does not advance the source pointer

2022-12-08 Thread Andres Freund
Hi,

On 2022-12-08 20:44:05 +0100, Antonin Houska wrote:
> When checking something else in the base backup code, I've noticed that
> sendFileWithContent() does not advance the 'content' pointer.

Oof. Luckily it looks like that is a relatively recent issue, introduced in
bef47ff85df, which is only in 15.

Greetings,

Andres Freund




Re: Aggregate node doesn't include cost for sorting

2022-12-08 Thread David Rowley
On Fri, 9 Dec 2022 at 03:38, Tom Lane  wrote:
> It's true that the cost attributed to the Agg node won't impact any
> live decisions in the plan level in which it appears.  However, if
> that's a subquery, then the total cost attributed to the subplan
> could in principle affect plan choices in the outer query.  So there
> is a valid argument for wanting to try to get it right.

I guess the jit thresholds are another reason to try to make the costs
a reflection of the expected run-time too.

> Having said that, I think that the actual impact on outer-level choices
> is usually minimal.  So it didn't bother me that we swept this under
> the rug before --- and I'm pretty sure that we're sweeping similar
> things under the rug elsewhere in top-of-query planning.  However,
> given 1349d279 it should surely be true that the planner knows how many
> sorts it's left to be done at runtime (a number we did not have at hand
> before).  So it seems like it ought to be simple enough to account for
> this effect more accurately.  I'd be in favor of doing so if it's
> simple and cheap to get the numbers.

Ok, probably Heikki's work in 0a2bc5d61 is a more useful piece of work
to get us closer to that goal.  I think all that's required to make it
work is adding on the costs in the final foreach loop in
get_agg_clause_costs().  The Aggrefs have already been marked as
aggpresorted by that time, so it should be a matter of:

if ((aggref->aggorder != NIL || aggref->aggdistinct != NIL) &&
!aggref->aggpresorted)
 // add costs for sort

David




Re: fix and document CLUSTER privileges

2022-12-08 Thread Robert Haas
On Thu, Dec 8, 2022 at 1:13 PM Nathan Bossart  wrote:
> On Thu, Dec 08, 2022 at 07:20:28AM -0500, Andrew Dunstan wrote:
> > We should probably talk about what the privileges should be, though. I
> > think there's a case to be made that CLUSTER should be governed by the
> > VACUUM privileges, given how VACUUM FULL is now implemented.
>
> Currently, CLUSTER, REFRESH MATERIALIZED VIEW, and REINDEX (minus REINDEX
> SCHEMA|DATABASE|SYSTEM) require ownership of the relation or superuser.  In
> fact, all three use the same RangeVarCallbackOwnsTable() callback function.
> My current thinking is that this is good enough.  I don't sense any strong
> demand for allowing database owners to run these commands on all non-shared
> relations, and there's ongoing work to break out the privileges to GRANT
> and predefined roles.

+1.

I don't see why being the database owner should give you the right to
run a random subset of commands on any table in the database. Tables
have their own system for access privileges; we should use that, or
extend it as required.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Error-safe user functions

2022-12-08 Thread Robert Haas
On Thu, Dec 8, 2022 at 11:32 AM Tom Lane  wrote:
> If we go with "struct Node *" then we can solve such problems by
> just repeating "struct Node;" forward-declarations in as many
> headers as we have to.

Yes, I think just putting "struct Node;" in as many places as
necessary is the way to go. Or even:

struct Node;
typedef struct Node Node;

which I think then allows for Node * to be used later.

A small problem with typedef struct Something *SomethingElse is that
it can get hard to keep track of whether some identifier is a pointer
to a struct or just a struct. This doesn't bother me as much as it
does some other hackers, from what I gather anyway, but I think we
should be pretty judicious in using typedef that way. "SomethingPtr"
really has no advantage over "Something *". It is neither shorter nor
clearer.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Aggregate node doesn't include cost for sorting

2022-12-08 Thread David Rowley
On Fri, 9 Dec 2022 at 01:12, David Geier  wrote:
> Both plans were captured on 14.5, which is indeed prior to 1349d279.
>
> I disabled sequential scan to show that there's an alternative plan
> which is superior to the chosen plan: Index Only Scan is more expensive
> and takes longer than the Seq Scan, but the subsequent Aggregate runs
> much faster as it doesn't have to sort, making the plan overall superior.

Aha, 14.5. What's going on there is that it's still doing the sort.
The aggregate code in that version does not skip the sort because of
the presorted input. A likely explanation for the performance increase
is due to the presorted check in our qsort implementation. The
successful presort check is O(N), whereas an actual sort is O(N *
logN).

It's true that if we had been doing proper costing on these ORDER BY /
DISTINCT aggregates that we could have noticed that the input path's
pathkeys indicate that no sort is required and costed accordingly, but
if we'd gone to the trouble of factoring that into the costs, then it
would also have made sense to make nodeAgg.c not sort on presorted
input.  We got the latter in 1349d279. It's just we didn't do anything
about the costings in that commit.

Anyway, in the next version of Postgres, the planner is highly likely
to choose the 2nd plan in your original email. It'll also be even
faster than you've shown due to the aggregate code not having to store
and read tuples in the tuplesort object. Also, no O(N) presort check
either.  The performance should be much closer to what it would be if
you disabled seqscan and dropped the DISTINCT out of your aggregate.

David




sendFileWithContent() does not advance the source pointer

2022-12-08 Thread Antonin Houska
When checking something else in the base backup code, I've noticed that
sendFileWithContent() does not advance the 'content' pointer. The sink buffer
is large enough (32kB) so that the first iteration usually processes the whole
file (only special files are processed by this function), and thus the
problem is hidden.

However it's possible to hit the issue: if there are too many tablespaces,
pg_basebackup generates a corrupted tablespace_map. Instead of writing all the
tablespace paths it writes only some and then starts to write the contents
from the beginning again.

The attached script generates scripts to create many tablespaces as well as
the underlying directories. Fix is attached here as well.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

#!/bin/bash

TBSPDIR=/mnt/ramdisk/tbspcs
TBSPCOUNT=2048
SCRIPT_SH=create.sh
SCRIPT_SQL=create.sql

echo "#!/bin/bash" > ${SCRIPT_SH}
echo "mkdir -p $TBSPDIR" >> ${SCRIPT_SH}

echo "" > ${SCRIPT_SQL}

i=0
while [ $i -lt $TBSPCOUNT ];
do
printf "mkdir %s/%.4d\n" $TBSPDIR $i >> ${SCRIPT_SH}
printf "CREATE TABLESPACE tbsp_%.4d LOCATION '%s/%.4d';\n" $i $TBSPDIR $i >> ${SCRIPT_SQL}
i=$((i+1))
done

chmod u+x ${SCRIPT_SH}
diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c
index 74fb529380..7aa5f6e44d 100644
--- a/src/backend/backup/basebackup.c
+++ b/src/backend/backup/basebackup.c
@@ -1073,6 +1073,7 @@ sendFileWithContent(bbsink *sink, const char *filename, const char *content,
 		memcpy(sink->bbs_buffer, content, nbytes);
 		bbsink_archive_contents(sink, nbytes);
 		bytes_done += nbytes;
+		content += nbytes;
 	}
 
 	_tarWritePadding(sink, len);


Re: Documenting MERGE INTO ONLY ...

2022-12-08 Thread Nathan Bossart
On Thu, Dec 08, 2022 at 03:26:52PM +, Dean Rasheed wrote:
> While testing MERGE, I noticed that it supports inheritance
> hierarchies and the ONLY keyword, but that isn't documented. Attached
> is a patch to merge.sgml, borrowing text from update.sgml and
> delete.sgml.

LGTM.  I didn't see any tests for this in merge.sql or inherit.sql.  Do you
think we should add some?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: add \dpS to psql

2022-12-08 Thread Nathan Bossart
I've created a new thread for making CLUSTER, REFRESH MATERIALIZED VIEW,
and REINDEX grantable:

https://postgr.es/m/20221208183707.GA55474%40nathanxps13

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




allow granting CLUSTER, REFRESH MATERIALIZED VIEW, and REINDEX

2022-12-08 Thread Nathan Bossart
Hi hackers,

This is meant as a continuation of the work to make VACUUM and ANALYZE
grantable privileges [0].  As noted there, the primary motivation for this
is to continue chipping away at things that require special privileges or
even superuser.  I've attached two patches.  0001 makes it possible to
grant CLUSTER, REFRESH MATERIALIZED VIEW, and REINDEX.  0002 adds
predefined roles that allow performing these commands on all relations.
After applying these patches, there are 13 privilege bits remaining for
future use.

There is an ongoing discussion in another thread [1] about how these
privileges should be divvied up.  Should each command get its own
privilege bit (as I've done in the attached patches), or should the
privileges be grouped in some fashion (e.g., adding a MAINTAIN bit that
governs all of them, splitting out exclusive-lock operations from
non-exclusive-lock ones)?

Most of the changes in the attached patches are rather mechanical, and like
VACUUM/ANALYZE, there is room for future enhancement, such as granting the
privileges on databases/schemas instead of just tables.

[0] https://postgr.es/m/20220722203735.GB3996698%40nathanxps13
[1] https://postgr.es/m/20221206193606.GB3078082%40nathanxps13

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From a16e59770da430e43bc85c244705a239ddd03c7b Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 7 Dec 2022 11:20:01 -0800
Subject: [PATCH v1 1/2] add grantable privileges for cluster, refresh matview,
 and reindex

---
 doc/src/sgml/ddl.sgml | 62 ++--
 doc/src/sgml/func.sgml|  5 +-
 .../sgml/ref/alter_default_privileges.sgml|  4 +-
 doc/src/sgml/ref/cluster.sgml |  6 +-
 doc/src/sgml/ref/grant.sgml   |  5 +-
 .../sgml/ref/refresh_materialized_view.sgml   |  3 +-
 doc/src/sgml/ref/reindex.sgml |  6 +-
 doc/src/sgml/ref/revoke.sgml  |  2 +-
 src/backend/catalog/aclchk.c  | 12 +++
 src/backend/commands/cluster.c| 19 ++--
 src/backend/commands/indexcmds.c  | 31 +++---
 src/backend/commands/matview.c|  3 +-
 src/backend/commands/tablecmds.c  | 17 ++--
 src/backend/utils/adt/acl.c   | 24 +
 src/bin/pg_dump/dumputils.c   |  3 +
 src/bin/pg_dump/t/002_pg_dump.pl  |  2 +-
 src/bin/psql/tab-complete.c   |  4 +-
 src/include/commands/tablecmds.h  |  4 +-
 src/include/nodes/parsenodes.h|  5 +-
 src/include/utils/acl.h   |  7 +-
 src/test/regress/expected/create_index.out|  4 +-
 src/test/regress/expected/dependency.out  | 22 ++---
 src/test/regress/expected/privileges.out  | 96 ++-
 src/test/regress/expected/rowsecurity.out | 34 +++
 src/test/regress/sql/dependency.sql   |  2 +-
 src/test/regress/sql/privileges.sql   | 61 
 26 files changed, 333 insertions(+), 110 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 38618de01c..eec49debfe 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -1692,8 +1692,9 @@ ALTER TABLE products RENAME TO items;
TRUNCATE, REFERENCES, TRIGGER,
CREATE, CONNECT, TEMPORARY,
EXECUTE, USAGE, SET,
-   ALTER SYSTEM, VACUUM, and
-   ANALYZE.
+   ALTER SYSTEM, VACUUM,
+   ANALYZE, CLUSTER,
+   REFRESH, and REINDEX.
The privileges applicable to a particular
object vary depending on the object's type (table, function, etc.).
More detail about the meanings of these privileges appears below.
@@ -2001,6 +2002,34 @@ REVOKE ALL ON accounts FROM PUBLIC;
  
 

+
+   
+CLUSTER
+
+ 
+  Allows CLUSTER on a relation.
+ 
+
+   
+
+   
+REFRESH
+
+ 
+  Allows REFRESH MATERIALIZED VIEW on a materialized
+  view.
+ 
+
+   
+
+   
+REINDEX
+
+ 
+  Allows REINDEX on a relation and its indexes.
+ 
+
+   
   
 
The privileges required by other commands are listed on the
@@ -2160,6 +2189,21 @@ REVOKE ALL ON accounts FROM PUBLIC;
   z
   TABLE
  
+ 
+  CLUSTER
+  S
+  TABLE
+ 
+ 
+  REFRESH
+  f
+  TABLE
+ 
+ 
+  REINDEX
+  n
+  TABLE
+ 
  

   
@@ -2250,7 +2294,7 @@ REVOKE ALL ON accounts FROM PUBLIC;
  
  
   TABLE (and table-like objects)
-  arwdDxtvz
+  arwdDxtvzSfn
   none
   \dp
  
@@ -2308,12 +2352,12 @@ GRANT SELECT (col1), UPDATE (col1) ON mytable TO miriam_rw;
would show:
 
 = \dp mytable
-   Access privileges
- Schema |  Name   | Type  |Access privileges|   Column privileges   | Policies
-+-+---+-+---+--
- public | mytable | table | miriam=arwdDxtvz/miriam+| col1:+|
-

Re: fix and document CLUSTER privileges

2022-12-08 Thread Nathan Bossart
On Thu, Dec 08, 2022 at 07:20:28AM -0500, Andrew Dunstan wrote:
> We should probably talk about what the privileges should be, though. I
> think there's a case to be made that CLUSTER should be governed by the
> VACUUM privileges, given how VACUUM FULL is now implemented.

Currently, CLUSTER, REFRESH MATERIALIZED VIEW, and REINDEX (minus REINDEX
SCHEMA|DATABASE|SYSTEM) require ownership of the relation or superuser.  In
fact, all three use the same RangeVarCallbackOwnsTable() callback function.
My current thinking is that this is good enough.  I don't sense any strong
demand for allowing database owners to run these commands on all non-shared
relations, and there's ongoing work to break out the privileges to GRANT
and predefined roles.  However, I don't have a strong opinion about this.

If we do want to change the permissions model for CLUSTER, it might make
sense to change all three of the aforementioned commands to look more like
VACUUM/ANALYZE.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: add \dpS to psql

2022-12-08 Thread Nathan Bossart
On Thu, Dec 08, 2022 at 12:15:23AM -0500, Tom Lane wrote:
> Nathan Bossart  writes:
>> On Wed, Dec 07, 2022 at 11:48:20PM -0500, Isaac Morland wrote:
>>> My previous analysis
>>> shows that there is no vast hidden demand for new privilege bits. If we
>>> implement MAINTAIN to control access to VACUUM, ANALYZE, REFRESH, CLUSTER,
>>> and REINDEX, we will cover everything that I can find that has seriously
>>> discussed on this list, and still leave 3 unused bits for future expansion.
> 
>> If we added CLUSTER, REFRESH MATERIALIZED VIEW, and REINDEX as individual
>> privilege bits, we'd still have 13 remaining for future use.
> 
> I think the appropriate question is not "have we still got bits left?".
> It should be more like "under what plausible scenario would it be useful
> to grant somebody CLUSTER but not VACUUM privileges on a table?".
> 
> I'm really thinking that MAINTAIN is the right level of granularity
> here.  Or maybe it's worth segregating exclusive-lock from
> not-exclusive-lock maintenance.  But I really fail to see how it's
> useful to distinguish CLUSTER from REINDEX, say.

The main idea behind this work is breaking out privileges into more
granular pieces.  If I want to create a role that only runs VACUUM on some
tables on the weekend, why should I have to also give it the ability to
ANALYZE, REFRESH, CLUSTER, and REINDEX?  IMHO we should really let the user
decide what set of privileges makes sense for their use-case.  I'm unsure
that grouping all these privileges together serves much purpose besides
preserving ACL bits.

The other reason I'm hesitant to group the privileges together is because I
suspect it will be difficult to reach agreement on how to do so, as
evidenced by past discussion [0].  That being said, I'm open to it if we
find a way that folks are happy with.  For example, separating
exclusive-lock and non-exclusive-lock maintenance actions seems like a
reasonable idea (which perhaps is an argument for moving VACUUM FULL out of
the VACUUM privilege).

[0] 
https://postgr.es/m/67a1d667e8ec228b5e07f232184c80348c5d93f4.camel%40j-davis.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Error-safe user functions

2022-12-08 Thread Tom Lane
Andres Freund  writes:
> On 2022-12-07 17:32:21 -0500, Tom Lane wrote:
>> +typedef struct Node *NodePtr;

> Seems like it'd be easier to just forward declare the struct, and use the
> non-typedef'ed name in the header than to have to deal with these
> interdependencies and the differing typenames.

I've been having second thoughts about how to handle this issue.
As we convert more and more datatypes, references to "Node *" are
going to be needed in assorted headers that don't currently have
any reason to #include nodes.h.  Rather than bloating their include
footprints, we'll want to use the alternate spelling, whichever
it is.  (I already had to do this in array.h.)  Some of these headers
might be things that are also read by frontend compiles, in which
case they won't have access to elog.h either, so that NodePtr in
this formulation won't work for them.  (I ran into a variant of that
with an early draft of this patch series.)

If we stick with NodePtr we'll probably end by putting that typedef
into c.h so that it's accessible in frontend as well as backend.
I don't have a huge problem with that, but I concede it's a little ugly.

If we go with "struct Node *" then we can solve such problems by
just repeating "struct Node;" forward-declarations in as many
headers as we have to.  This is a bit ugly too, but maybe less so,
and it's a method we use elsewhere.  The main downside I can see
to it is that we will probably not find out all the places where
we need such declarations until we get field complaints that
"header X doesn't compile for me".  elog.h will have a struct Node
declaration, and that will be visible in every backend compilation
we do as well as every cpluspluscheck/headerscheck test.

Another notational point I'm wondering about is whether we want
to create hundreds of direct references to fcinfo->context.
Is it time to invent

#define PG_GET_CONTEXT() (fcinfo->context)

and write that instead in all these input functions?

Thoughts?

regards, tom lane




Documenting MERGE INTO ONLY ...

2022-12-08 Thread Dean Rasheed
While testing MERGE, I noticed that it supports inheritance
hierarchies and the ONLY keyword, but that isn't documented. Attached
is a patch to merge.sgml, borrowing text from update.sgml and
delete.sgml.

I note that there are also a couple of places early in the manual
(advanced.sgml and ddl.sgml) that also discuss inheritance, citing
SELECT, UPDATE and DELETE as examples of (already-discussed) commands
that support ONLY. However, since MERGE isn't mentioned until much
later in the manual, it's probably best to leave those as-is. They
don't claim to be complete lists of commands supporting ONLY, and it
would be a pain to make them that.

Regards,
Dean
diff --git a/doc/src/sgml/ref/merge.sgml b/doc/src/sgml/ref/merge.sgml
new file mode 100644
index e07adda..bc7f4b4
--- a/doc/src/sgml/ref/merge.sgml
+++ b/doc/src/sgml/ref/merge.sgml
@@ -22,13 +22,13 @@ PostgreSQL documentation
  
 
 [ WITH with_query [, ...] ]
-MERGE INTO target_table_name [ [ AS ] target_alias ]
+MERGE INTO [ ONLY ] target_table_name [ [ AS ] target_alias ]
 USING data_source ON join_condition
 when_clause [...]
 
 where data_source is:
 
-{ source_table_name | ( source_query ) } [ [ AS ] source_alias ]
+{ [ ONLY ] source_table_name | ( source_query ) } [ [ AS ] source_alias ]
 
 and when_clause is:
 
@@ -129,6 +129,14 @@ DELETE
 
  
   The name (optionally schema-qualified) of the target table to merge into.
+  If ONLY is specified before the table name, matching
+  rows are updated or deleted in the named table only.  If
+  ONLY is not specified, matching rows are also updated
+  or deleted in any tables inheriting from the named table.  Optionally,
+  * can be specified after the table name to explicitly
+  indicate that descendant tables are included.  The
+  ONLY keyword and * option do not
+  affect insert actions, which always insert into the named table only.
  
 

@@ -151,7 +159,12 @@ DELETE
 
  
   The name (optionally schema-qualified) of the source table, view, or
-  transition table.
+  transition table.  If ONLY is specified before the
+  table name, matching rows are included from the named table only.  If
+  ONLY is not specified, matching rows are also included
+  from any tables inheriting from the named table.  Optionally,
+  * can be specified after the table name to explicitly
+  indicate that descendant tables are included.
  
 



Re: Aggregate node doesn't include cost for sorting

2022-12-08 Thread Tom Lane
David Rowley  writes:
> So, with the assumption that you've used 2 different versions to show
> this output, for post 1349d279, there does not seem to be much choice
> on how the aggregate is executed. What's your concern about the
> costings having to be accurate given there's no other plan choice?

It's true that the cost attributed to the Agg node won't impact any
live decisions in the plan level in which it appears.  However, if
that's a subquery, then the total cost attributed to the subplan
could in principle affect plan choices in the outer query.  So there
is a valid argument for wanting to try to get it right.

Having said that, I think that the actual impact on outer-level choices
is usually minimal.  So it didn't bother me that we swept this under
the rug before --- and I'm pretty sure that we're sweeping similar
things under the rug elsewhere in top-of-query planning.  However,
given 1349d279 it should surely be true that the planner knows how many
sorts it's left to be done at runtime (a number we did not have at hand
before).  So it seems like it ought to be simple enough to account for
this effect more accurately.  I'd be in favor of doing so if it's
simple and cheap to get the numbers.

regards, tom lane




Re: Add tracking of backend memory allocated to pg_stat_activity

2022-12-08 Thread Reid Thompson
On Sun, 2022-11-27 at 09:40 -0600, Justin Pryzby wrote:
> > BTW, these should have some kind of prefix, like PG_ALLOC_* to
> > avoid causing the same kind of problem for someone else that
> > another header caused for you by defining something somewhere
> > called IGNORE (ignore what, I don't know).  The other problem was
> > probably due to a define, though.  Maybe instead of an enum, the
> > function should take a boolean.
> > 

Patch updated to current master and includes above prefix
recommendation and combining of two function calls to one recommended
by Ted Yu.

> > 
> > I still wonder whether there needs to be a separate CF entry for
> > the 0001 patch.  One issue is that there's two different lists of
> > people involved in the threads.
> > 

I'm OK with containing the conversation to one thread if everyone else
is.  If there's no argument against, then patches after today will go
to the "Add the ability to limit the amount of memory that can be
allocated to backends" thread 
https://www.postgresql.org/message-id/bd57d9a4c219cc1392665fd5fba61dde8027b3da.ca...@crunchydata.com

-- 
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thomp...@crunchydata.com
www.crunchydata.com


From fdb7e6d5bb653e9c5031fd058bf168bdf80a20eb Mon Sep 17 00:00:00 2001
From: Reid Thompson 
Date: Thu, 11 Aug 2022 12:01:25 -0400
Subject: [PATCH] Add tracking of backend memory allocated to pg_stat_activity

This new field displays the current bytes of memory allocated to the
backend process. It is updated as memory for the process is
malloc'd/free'd. Memory allocated to items on the freelist is included in
the displayed value.  Dynamic shared memory allocations are included
only in the value displayed for the backend that created them, they are
not included in the value for backends that are attached to them to
avoid double counting. On occasion, orphaned memory segments may be
cleaned up on postmaster startup. This may result in decreasing the sum
without a prior increment. We limit the floor of backend_mem_allocated
to zero. Updated pg_stat_activity documentation for the new column.
---
 doc/src/sgml/monitoring.sgml| 15 
 src/backend/catalog/system_views.sql|  1 +
 src/backend/postmaster/autovacuum.c |  6 ++
 src/backend/postmaster/postmaster.c | 13 
 src/backend/postmaster/syslogger.c  |  3 +
 src/backend/storage/ipc/dsm_impl.c  | 81 +
 src/backend/utils/activity/backend_status.c | 45 
 src/backend/utils/adt/pgstatfuncs.c |  4 +-
 src/backend/utils/mmgr/aset.c   | 17 +
 src/backend/utils/mmgr/generation.c | 15 
 src/backend/utils/mmgr/slab.c   | 21 ++
 src/include/catalog/pg_proc.dat |  6 +-
 src/include/utils/backend_status.h  | 59 ++-
 src/test/regress/expected/rules.out |  9 ++-
 src/test/regress/expected/stats.out | 11 +++
 src/test/regress/sql/stats.sql  |  3 +
 16 files changed, 300 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 11a8ebe5ec..13ecbe5877 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -948,6 +948,21 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
  
 
 
+
+ 
+  allocated_bytes bigint
+ 
+ 
+  Memory currently allocated to this backend in bytes. This is the balance
+  of bytes allocated and freed by this backend. Dynamic shared memory
+  allocations are included only in the value displayed for the backend that
+  created them, they are not included in the value for backends that are
+  attached to them to avoid double counting. Use pg_size_pretty
+  described in  to make this value
+  more easily readable.
+ 
+
+
  
   
query text
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 2d8104b090..9ea8f78c95 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -864,6 +864,7 @@ CREATE VIEW pg_stat_activity AS
 S.state,
 S.backend_xid,
 s.backend_xmin,
+S.allocated_bytes,
 S.query_id,
 S.query,
 S.backend_type
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 0746d80224..f6b6f71cdb 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -407,6 +407,9 @@ StartAutoVacLauncher(void)
 
 #ifndef EXEC_BACKEND
 		case 0:
+			/* Zero allocated bytes to avoid double counting parent allocation */
+			pgstat_zero_my_allocated_bytes();
+
 			/* in postmaster child ... */
 			InitPostmasterChild();
 
@@ -1485,6 +1488,9 @@ StartAutoVacWorker(void)
 
 #ifndef EXEC_BACKEND
 		case 0:
+			/* Zero allocated bytes to avoid double counting parent allocation 

Re: Checksum errors in pg_stat_database

2022-12-08 Thread Drouvot, Bertrand




On 4/2/19 7:06 PM, Magnus Hagander wrote:

On Tue, Apr 2, 2019 at 8:47 AM Michael Paquier  wrote:

On Tue, Apr 02, 2019 at 07:43:12AM +0200, Julien Rouhaud wrote:
 > On Tue, Apr 2, 2019 at 6:56 AM Michael Paquier  wrote:
 >>  One thing which is not
 >> proposed on this patch, and I am fine with it as a first draft, is
 >> that we don't have any information about the broken block number and
 >> the file involved.  My gut tells me that we'd want a separate view,
 >> like pg_stat_checksums_details with one tuple per (dboid, rel, fork,
 >> blck) to be complete.  But that's just for future work.
 >
 > That could indeed be nice.

Actually, backpedaling on this one...  pg_stat_checksums_details may
be a bad idea as we could finish with one row per broken block.  If
a corruption is spreading quickly, pgstat would not be able to sustain
that amount of objects.  Having pg_stat_checksums would allow us to
plugin more data easily based on the last failure state:
- last relid of failure
- last fork type of failure
- last block number of failure.
Not saying to do that now, but having that in pg_stat_database does
not seem very natural to me.  And on top of that we would have an
extra row full of NULLs for shared objects in pg_stat_database if we
adopt the unique view approach...  I find that rather ugly.


I think that tracking each and every block is of course a non-starter, as 
you've noticed.


I think that's less of a concern now that the stats collector process has gone 
and that the stats are now collected in shared memory.  What do you think?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
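
For what it's worth, the per-database state Michael sketches above would only
need a handful of fields; a hypothetical layout (names invented here, purely
to illustrate the idea) could be:

#include "postgres.h"
#include "common/relpath.h"
#include "storage/block.h"
#include "pgstat.h"

/* hypothetical per-database checksum-failure summary, per the fields listed above */
typedef struct PgStat_ChecksumFailureInfo
{
	PgStat_Counter failure_count;	/* total checksum failures observed */
	Oid			last_relid;			/* relation of the last failure */
	ForkNumber	last_forknum;		/* fork of the last failure */
	BlockNumber last_blocknum;		/* block number of the last failure */
} PgStat_ChecksumFailureInfo;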




Re: Creating HeapTuple from char and date values

2022-12-08 Thread Ashutosh Bapat
On Thu, Dec 8, 2022 at 12:56 AM Amin  wrote:
>
> Hi All,
>
> I am trying to create HeapTuple data structure.
> First, I create a tuple descriptor:
>  TupleDesc *td=CreateTemplateTupleDesc(colCount);
> Then, for each variable, I do:
>   TupleDescInitEntry(*td,v->varattno,NULL,v->vartype,v->vartypmod,0);
> Then, I assign values:
>   if int32: values[v->varattno-1]=Int8GetDatum(myValue);
> Similarly for float.
> Finally, I create the HeapTuple:
>   HeapTuple tuple=heap_form_tuple(td,values,isnull);
>
> Everything works fine with int and float. But I don't know how to handle 
> chars.
> Let's say we have a character(10) column. One problem is v->vartypmod will be 
> set to 14. Shouldn't it be 10?

I think the 4 extra bytes is the varlena header - not sure, but typmod in
this case indicates the length of the binary representation. 14 looks
correct.

> Second, how should I assign values? Is 
> values[v->varattno-1]=CStringGetDatum(myValue); correct? Should I set the 
> last parameter to TupleDescInitEntry? Why am I getting "invalid memory alloc 
> request size" or segfault with different configurations?

Is myValue a char *?
I think you need to use CStringGetTextDatum instead of CStringGetDatum.
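
For a character(10) attribute, a minimal sketch of the whole sequence might
look like this (the column name and helper function are made up; error
handling and padding of the bpchar value are ignored):

#include "postgres.h"
#include "access/htup_details.h"
#include "access/tupdesc.h"
#include "catalog/pg_type.h"
#include "utils/builtins.h"

static HeapTuple
make_char10_tuple(const char *myValue)
{
	TupleDesc	td = CreateTemplateTupleDesc(1);
	Datum		values[1];
	bool		isnull[1] = {false};

	/* typmod for character(10) is 10 + VARHDRSZ = 14, as discussed above */
	TupleDescInitEntry(td, (AttrNumber) 1, "col1", BPCHAROID, 14, 0);

	/* varlena columns need a varlena datum, hence CStringGetTextDatum */
	values[0] = CStringGetTextDatum(myValue);

	return heap_form_tuple(td, values, isnull);
}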

-- 
Best Wishes,
Ashutosh Bapat




Re: ANY_VALUE aggregate

2022-12-08 Thread Vik Fearing

On 12/8/22 06:48, David G. Johnston wrote:

On Wed, Dec 7, 2022 at 10:00 PM Vik Fearing  wrote:


On 12/7/22 04:22, David G. Johnston wrote:

On Mon, Dec 5, 2022 at 10:40 PM Vik Fearing 

wrote:



On 12/6/22 05:57, David G. Johnston wrote:

On Mon, Dec 5, 2022 at 9:48 PM Vik Fearing 

wrote:



I can imagine an optimization that would remove an ORDER BY clause
because it isn't needed for any other aggregate.



I'm referring to the query:

select any_value(v order by v) from (values (2),(1),(3)) as vals (v);
// produces 1, per the documented implementation-defined behavior.


Implementation-dependent.  It is NOT implementation-defined, per spec.


I really don't care all that much about the spec here given that ORDER BY
in an aggregate call is non-spec.



Well, this is demonstrably wrong.

<array aggregate function> ::=
ARRAY_AGG <left paren>
  <value expression>
  [ ORDER BY <sort specification list> ]
  <right paren>



Demoable only by you and a few others...



The standard is publicly available.  It is strange that we, being so 
open, hold ourselves to such a closed standard; but that is what we do.




We should update our documentation - the source of SQL Standard knowledge
for mere mortals.

https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES

"Note: The ability to specify both DISTINCT and ORDER BY in an aggregate
function is a PostgreSQL extension."

Apparently only DISTINCT remains as our extension.



Using DISTINCT in an aggregate is also standard.  What that note is 
saying is that the standard does not allow *both* to be used at the same 
time.


The standard defines these things for specific aggregates whereas we are 
much more generic about it and therefore have to deal with the combinations.


I have submitted a doc patch to clarify that.



You are de-facto creating a first_value aggregate (which is by definition
non-standard) whether you like it or not.



I am de jure creating an any_value aggregate (which is by definition
standard) whether you like it or not.



Yes, both statements seem true.  At least until we decide to start ignoring
a user's explicit order by clause.



I ran some tests and including an ORDER BY in an aggregate that doesn't 
care (like COUNT) is devastating for performance.  I will be proposing a 
solution to that soon and I invite you to participate in that 
conversation when I do.

--
Vik Fearing





Re: fix and document CLUSTER privileges

2022-12-08 Thread Andrew Dunstan


On 2022-12-07 We 23:13, Nathan Bossart wrote:
> On Wed, Dec 07, 2022 at 08:25:59PM -0600, Justin Pryzby wrote:
>> Your patch makes it inconsistent with vacuum full, which is strange
>> because vacuum full calls cluster.
>>
>> postgres=> VACUUM FULL t;
>> VACUUM
>> postgres=> CLUSTER t;
>> ERROR:  must be owner of table t
> This is the existing behavior on HEAD.  I think it has been this way for a
> while.  Granted, that doesn't mean it's ideal, but AFAICT it's intentional.
>
> Looking closer, the database ownership check in
> get_tables_to_cluster_partitioned() appears to have no meaningful effect.
> In this code path, cluster_rel() will always be called with CLUOPT_RECHECK,
> and that function only checks for table ownership.  Anything gathered in
> get_tables_to_cluster_partitioned() that the user doesn't own will be
> skipped.  So, I don't think my patch changes the behavior in any meaningful
> way, but I still think it's worthwhile to make the checks consistent.



We should probably talk about what the privileges should be, though. I
think there's a case to be made that CLUSTER should be governed by the
VACUUM privileges, given how VACUUM FULL is now implemented.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Aggregate node doesn't include cost for sorting

2022-12-08 Thread David Geier

Hi David,

Thanks for the explanations and awesome that this was improved on already.
I didn't have this change on my radar.

On 12/8/22 11:40, David Rowley wrote:


This output surely must be from a version of PostgreSQL prior to
1349d279?  I can't quite figure out why you've added a "SET
enable_seqscan = FALSE;". That makes it look like you've used the same
version of PostgreSQL to produce both of these plans.  The 2nd plan
you've shown must be post 1349d279.



Both plans were captured on 14.5, which is indeed prior to 1349d279.

I disabled sequential scan to show that there's an alternative plan 
which is superior to the chosen plan: Index Only Scan is more expensive 
and takes longer than the Seq Scan, but the subsequent Aggregate runs 
much faster as it doesn't have to sort, making the plan overall superior.




So, with the assumption that you've used 2 different versions to show
this output, for post 1349d279, there does not seem to be much choice
on how the aggregate is executed. What's your concern about the
costings having to be accurate given there's no other plan choice?


There's another plan choice which is using the index to get pre-sorted 
input rows, see previous comment.




The new pre-sorted aggregate code will always request the sort order
that suits the largest number of ORDER BY / DISTINCT aggregates.  When
there are multiple ORDER BY / DISTINCT aggregates and they have
different sort requirements then there certainly are competing ways
that the aggregation portion of the plan can be executed.  I opted to
make the choice just based on the number of aggregates that could
become presorted.  nodeAgg.c is currently not very smart about sharing
sorts between multiple aggregates with the same sort requirements.  If
that was made better, there might be more motivation to have better
costing code in make_pathkeys_for_groupagg().  However, the reasons
for the reversion of db0d67db seemed to be mainly around the lack of
ability to accurately cost multiple competing sort orders. We'd need
to come up with some better way to do that if we were to want to give
make_pathkeys_for_groupagg() similar abilities.


Also, why does the Aggregate node sort itself? Why don't we instead emit
an explicit Sort node in the plan so that it's visible to the user what
is happening? As soon as there's also a GROUP BY in the query, a Sort
node occurs in the plan. This seems inconsistent.

Post 1349d279 we do that, but it can only do it for 1 sort order.
There can be any number of aggregate functions which require a sort
and they don't all have to have the same sort order requirements.  We
can't do the same as WindowAgg does by chaining nodes together either
because the aggregate node aggregates the results and we'd need all
the same input rows to be available at each step.

The only other way would be to have it so an Aggregate node could be
fed by multiple different input nodes and then it would only work on
the aggregates that suit the given input before reading the next input
and doing the other aggregates.  Making it work like that would cause
quite a bit of additional effort during planning (not to mention the
executor). We'd have to run the join search once per required order,
which is one of the slowest parts of planning.  Right now, you could
probably make that work by just writing the SQL to have a subquery per
sort requirement.


Thanks for the explanation!

--
David Geier
(ServiceNow)





Re: Error-safe user functions

2022-12-08 Thread Alvaro Herrera
On 2022-Dec-07, David G. Johnston wrote:

> Are you suggesting we should not go down the path that v8-0003 does in the
> monitoring section cleanup thread?  I find the usability of Chapter 54
> System Views to be superior to these two run-on chapters and would rather
> we emulate it in both these places - for what is in the end very little
> additional effort, all mechanical in nature.

I think the new 9.26 is much better now than what we had there two days
ago.  Maybe it would be even better with your proposed changes, but
let's see what you come up with.

As for Chapter 54, while it's a lot better than what we had previously,
I have a complaint about the new presentation: the overview table
appears (at least in the HTML presentation) in a separate page from the
initial page of the chapter.  So to get the intended table of contents I
have to move forward from the unintended table of contents (i.e. from
https://www.postgresql.org/docs/devel/views.html forward to
https://www.postgresql.org/docs/devel/views-overview.html ).  This seems
pointless.  I think it would be better if we just removed the line
, which would put that table in the "front page".

I also have an issue with Chapter 28, more precisely 28.2.2, where we
have a similar TOC-style tables (Tables 28.1 and 28.2), but these ones
seem inferior to the new table in Chapter 54 in that the outgoing links
are in random positions in the text of the table.  It would be better to
put those in a column of their own, so that they are all vertically
aligned and easier to spot/click.  Not sure if you've been here already.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"En las profundidades de nuestro inconsciente hay una obsesiva necesidad
de un universo lógico y coherente. Pero el universo real se halla siempre
un paso más allá de la lógica" (Irulan)




Re: add \dpS to psql

2022-12-08 Thread Alvaro Herrera
On 2022-Dec-08, Pavel Luzanov wrote:

> For the complete picture, I tried to see what other actions with the table
> could *potentially* be considered as maintenance.
> Here is the list:
> 
> - create|alter|drop on extended statistics objects
> - alter table|index alter column set statistics
> - alter table|index [re]set (storage_parameters)
> - alter table|index set tablespace
> - alter table alter column set storage|compression
> - any actions with the TOAST table that can be performed separately from the
> main table

Well, I can't see that any of these is valuable to grant separately from
the table's owner.  The maintenance ones are the ones that are
interesting to run from a database-owner perspective, but these ones do
not seem to need that treatment.

If you're extremely generous you could think that ALTER .. SET STORAGE
would be reasonable to be run by the db-owner.  However, that's not
something you do on an ongoing basis -- you just do it once -- so it
seems pointless.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




Add PL/pgSQL extra check no_data_found

2022-12-08 Thread Sergey Shinderuk

Hello,

I propose to add a new value "no_data_found" for the 
plpgsql.extra_errors and plpgsql.extra_warnings parameters [1].


With plpgsql.extra_errors=no_data_found, SELECT INTO raises a no_data_found 
exception when the result set is empty.  With 
plpgsql.extra_errors=too_many_rows,no_data_found SELECT INTO behaves 
like SELECT INTO STRICT [2].  This could simplify migration from PL/SQL 
and may just be more convenient.


One potential downside is that plpgsql.extra_errors=no_data_found could 
break existing functions expecting to get null or checking IF found 
explicitly. This is also true for the too_many_rows exception, but 
arguably it's a programmer error, while no_data_found switches to a 
different convention for handling (or not handling) an empty result with 
SELECT INTO.


Otherwise the patch is straightforward.

What do you think?

--
Sergey Shinderuk      https://postgrespro.com/


[1] 
https://www.postgresql.org/docs/devel/plpgsql-development-tips.html#PLPGSQL-EXTRA-CHECKS
[2] 
https://www.postgresql.org/docs/devel/plpgsql-statements.html#PLPGSQL-STATEMENTS-SQL-ONEROW
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index a6473429480..80672a3672f 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -4197,8 +4197,14 @@ exec_stmt_execsql(PLpgSQL_execstate *estate,
 	long		tcount;
 	int			rc;
 	PLpgSQL_expr *expr = stmt->sqlstmt;
+	int			no_data_found_level = 0;
 	int			too_many_rows_level = 0;
 
+	if (plpgsql_extra_errors & PLPGSQL_XCHECK_NODATAFOUND)
+		no_data_found_level = ERROR;
+	else if (plpgsql_extra_warnings & PLPGSQL_XCHECK_NODATAFOUND)
+		no_data_found_level = WARNING;
+
 	if (plpgsql_extra_errors & PLPGSQL_XCHECK_TOOMANYROWS)
 		too_many_rows_level = ERROR;
 	else if (plpgsql_extra_warnings & PLPGSQL_XCHECK_TOOMANYROWS)
@@ -4355,16 +4361,19 @@ exec_stmt_execsql(PLpgSQL_execstate *estate,
 		 */
 		if (n == 0)
 		{
-			if (stmt->strict)
+			if (stmt->strict || no_data_found_level)
 			{
 char	   *errdetail;
+int			errlevel;
 
 if (estate->func->print_strict_params)
 	errdetail = format_expr_params(estate, expr);
 else
 	errdetail = NULL;
 
-ereport(ERROR,
+errlevel = stmt->strict ? ERROR : no_data_found_level;
+
+ereport(errlevel,
 		(errcode(ERRCODE_NO_DATA_FOUND),
 		 errmsg("query returned no rows"),
 		 errdetail ? errdetail_internal("parameters: %s", errdetail) : 0));
diff --git a/src/pl/plpgsql/src/pl_handler.c b/src/pl/plpgsql/src/pl_handler.c
index 5dc334b61b3..80151540c38 100644
--- a/src/pl/plpgsql/src/pl_handler.c
+++ b/src/pl/plpgsql/src/pl_handler.c
@@ -90,6 +90,8 @@ plpgsql_extra_checks_check_hook(char **newvalue, void **extra, GucSource source)
 
 			if (pg_strcasecmp(tok, "shadowed_variables") == 0)
 extrachecks |= PLPGSQL_XCHECK_SHADOWVAR;
+			else if (pg_strcasecmp(tok, "no_data_found") == 0)
+extrachecks |= PLPGSQL_XCHECK_NODATAFOUND;
 			else if (pg_strcasecmp(tok, "too_many_rows") == 0)
 extrachecks |= PLPGSQL_XCHECK_TOOMANYROWS;
 			else if (pg_strcasecmp(tok, "strict_multi_assignment") == 0)
diff --git a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h
index 088768867d9..b69c058ba86 100644
--- a/src/pl/plpgsql/src/plpgsql.h
+++ b/src/pl/plpgsql/src/plpgsql.h
@@ -1202,8 +1202,9 @@ extern bool plpgsql_check_asserts;
 /* extra compile-time and run-time checks */
 #define PLPGSQL_XCHECK_NONE		0
 #define PLPGSQL_XCHECK_SHADOWVAR              (1 << 1)
-#define PLPGSQL_XCHECK_TOOMANYROWS            (1 << 2)
-#define PLPGSQL_XCHECK_STRICTMULTIASSIGNMENT  (1 << 3)
+#define PLPGSQL_XCHECK_NODATAFOUND            (1 << 2)
+#define PLPGSQL_XCHECK_TOOMANYROWS            (1 << 3)
+#define PLPGSQL_XCHECK_STRICTMULTIASSIGNMENT  (1 << 4)
 #define PLPGSQL_XCHECK_ALL		((int) ~0)
 
 extern int	plpgsql_extra_warnings;
diff --git a/src/test/regress/expected/plpgsql.out b/src/test/regress/expected/plpgsql.out
index 08e42f17dc2..454e517fcc9 100644
--- a/src/test/regress/expected/plpgsql.out
+++ b/src/test/regress/expected/plpgsql.out
@@ -3086,6 +3086,25 @@ select shadowtest(1);
 (1 row)
 
 -- runtime extra checks
+set plpgsql.extra_warnings to 'no_data_found';
+do $$
+declare x int;
+begin
+  select 1 into x where 0 = 1;
+end;
+$$;
+WARNING:  query returned no rows
+set plpgsql.extra_errors to 'no_data_found';
+do $$
+declare x int;
+begin
+  select 1 into x where 0 = 1;
+end;
+$$;
+ERROR:  query returned no rows
+CONTEXT:  PL/pgSQL function inline_code_block line 4 at SQL statement
+reset plpgsql.extra_errors;
+reset plpgsql.extra_warnings;
 set plpgsql.extra_warnings to 'too_many_rows';
 do $$
 declare x int;
diff --git a/src/test/regress/sql/plpgsql.sql b/src/test/regress/sql/plpgsql.sql
index 588c3310337..60896b3d0da 100644
--- a/src/test/regress/sql/plpgsql.sql
+++ b/src/test/regress/sql/plpgsql.sql
@@ -2620,6 +2620,27 @@ declare f1 int; begin return 1; end $$ language plpgsql;
 select shadowtest(1);
 
 -- runtime extra checks
+set plpgsql.extra_warnings to 'no_data_found';
+
+do 

Re: [PATCH] psql: Add tab-complete for optional view parameters

2022-12-08 Thread Melih Mutlu
Hi Christoph,

I just took a quick look at your patch.
Some suggestions:

+   else if (Matches("ALTER", "VIEW", MatchAny, "SET", "("))
> +   COMPLETE_WITH_LIST(view_optional_parameters);
> +   /* ALTER VIEW xxx RESET ( yyy , ... ) */
> +   else if (Matches("ALTER", "VIEW", MatchAny, "RESET", "("))
> +   COMPLETE_WITH_LIST(view_optional_parameters);


What about combining these two cases into one like Matches("ALTER", "VIEW",
MatchAny, "SET|RESET", "(") ?

/* ALTER VIEW <name> */
> else if (Matches("ALTER", "VIEW", MatchAny))
> COMPLETE_WITH("ALTER COLUMN", "OWNER TO", "RENAME",
>   "SET SCHEMA");


Also seems like SET and RESET don't get auto-completed for "ALTER VIEW
<name>".
I think it would be nice to include those missing words.

Thanks,
--
Melih Mutlu
Microsoft


Re: Non-decimal integer literals

2022-12-08 Thread Peter Eisentraut

On 29.11.22 21:22, David Rowley wrote:

There seems to be a small bug in the pg_strtointXX functions in the
code that checks that there's at least 1 digit.  This causes 0x to be
a valid representation of zero.  That does not seem to be allowed by
the parser, so I think we should likely reject it in COPY too.
-- probably shouldn't work
postgres=# copy a from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.

0x
\.

COPY 1


Fixed in new patch.  I moved the "require at least one digit" checks 
after the loops over the digits, to make it easier to write one check 
for all bases.


This patch also incorporates your changes to the digit analysis 
algorithm.  I didn't check it carefully, but all the tests still pass. ;-)
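
For reference, a rough sketch of what this is meant to accept and reject 
at the SQL level (results worked out by hand, not copied from a run of 
the patch):

SELECT 0x42f AS hex, 0o273 AS oct, 0b100101 AS bin;
--  hex  | oct | bin
-- ------+-----+-----
--  1071 | 187 |  37

-- A bare radix prefix with no digits should be rejected by the integer
-- input functions (and therefore by COPY), matching the parser:
SELECT int2 '0x';   -- should fail: at least one digit is required
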
From 76510f2077d3075653a9bbe899b9d4752953d30e Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 8 Dec 2022 12:10:41 +0100
Subject: [PATCH v12] Non-decimal integer literals

Add support for hexadecimal, octal, and binary integer literals:

0x42F
0o273
0b100101

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.

Discussion: 
https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb...@enterprisedb.com
---
 doc/src/sgml/syntax.sgml   |  34 
 src/backend/catalog/information_schema.sql |   6 +-
 src/backend/catalog/sql_features.txt   |   1 +
 src/backend/parser/parse_node.c|  37 +++-
 src/backend/parser/scan.l  | 101 ---
 src/backend/utils/adt/numutils.c   | 185 +---
 src/fe_utils/psqlscan.l|  78 +++--
 src/interfaces/ecpg/preproc/pgc.l  | 106 ++-
 src/test/regress/expected/int2.out |  92 ++
 src/test/regress/expected/int4.out |  92 ++
 src/test/regress/expected/int8.out |  92 ++
 src/test/regress/expected/numerology.out   | 193 -
 src/test/regress/sql/int2.sql  |  26 +++
 src/test/regress/sql/int4.sql  |  26 +++
 src/test/regress/sql/int8.sql  |  26 +++
 src/test/regress/sql/numerology.sql|  51 +-
 16 files changed, 1028 insertions(+), 118 deletions(-)

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 93ad71737f..956182e7c6 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -694,6 +694,40 @@ Numeric Constants
 
 
 
+
+ Additionally, non-decimal integer constants can be used in these forms:
+
+0xhexdigits
+0ooctdigits
+0bbindigits
+
+ hexdigits is one or more hexadecimal digits
+ (0-9, A-F), octdigits is one or more octal
+ digits (0-7), bindigits is one or more binary
+ digits (0 or 1).  Hexadecimal digits and the radix prefixes can be in
+ upper or lower case.  Note that only integers can have non-decimal forms,
+ not numbers with fractional parts.
+
+
+
+ These are some examples of this:
+0b100101
+0B10011001
+0o273
+0O755
+0x42f
+0X
+
+
+
+
+ 
+  Nondecimal integer constants are currently only supported in the range
+  of the bigint type (see ).
+ 
+
+
 
  integer
  bigint
diff --git a/src/backend/catalog/information_schema.sql 
b/src/backend/catalog/information_schema.sql
index 18725a02d1..95c27a625e 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -119,7 +119,7 @@ CREATE FUNCTION _pg_numeric_precision(typid oid, typmod 
int4) RETURNS integer
  WHEN 1700 /*numeric*/ THEN
   CASE WHEN $2 = -1
THEN null
-   ELSE (($2 - 4) >> 16) & 65535
+   ELSE (($2 - 4) >> 16) & 0xFFFF
END
  WHEN 700 /*float4*/ THEN 24 /*FLT_MANT_DIG*/
  WHEN 701 /*float8*/ THEN 53 /*DBL_MANT_DIG*/
@@ -147,7 +147,7 @@ CREATE FUNCTION _pg_numeric_scale(typid oid, typmod int4) 
RETURNS integer
WHEN $1 IN (1700) THEN
 CASE WHEN $2 = -1
  THEN null
- ELSE ($2 - 4) & 65535
+ ELSE ($2 - 4) & 0xFFFF
  END
ELSE null
   END;
@@ -163,7 +163,7 @@ CREATE FUNCTION _pg_datetime_precision(typid oid, typmod 
int4) RETURNS integer
WHEN $1 IN (1083, 1114, 1184, 1266) /* time, timestamp, same + tz */
THEN CASE WHEN $2 < 0 THEN 6 ELSE $2 END
WHEN $1 IN (1186) /* interval */
-   THEN CASE WHEN $2 < 0 OR $2 & 65535 = 65535 THEN 6 ELSE $2 & 65535 
END
+   THEN CASE WHEN $2 < 0 OR $2 & 0xFFFF = 0xFFFF THEN 6 ELSE $2 & 
0xFFFF END
ELSE null
   END;
 
diff --git a/src/backend/catalog/sql_features.txt 
b/src/backend/catalog/sql_features.txt
index 8704a42b60..abad216b7e 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -527,6 +527,7 @@ T652SQL-dynamic 

Re: fix and document CLUSTER privileges

2022-12-08 Thread Pavel Luzanov

On 08.12.2022 01:39, Nathan Bossart wrote:

It was also noted elsewhere [1] that the privilege requirements for CLUSTER
are not documented.  The attached patch adds such documentation.
[1] https://postgr.es/m/661148f4-c7f1-dec1-2bc8-29f3bd58e242%40postgrespro.ru


Thanks for the patch. It correctly states the existing behavior.

But perhaps we should wait for the decision in discussion [1] (link above),
since this decision may affect the documentation on the CLUSTER command.

--
Pavel Luzanov
Postgres Professional: https://postgrespro.com





Re: Minimal logical decoding on standbys

2022-12-08 Thread Drouvot, Bertrand

Hi,

On 12/7/22 6:58 PM, Andres Freund wrote:

Hi,

On 2022-12-07 10:00:25 +0100, Drouvot, Bertrand wrote:

Please find attached a new patch series:

v27-0001-Add-info-in-WAL-records-in-preparation-for-logic.patch
v27-0002-Handle-logical-slot-conflicts-on-standby.patch
v27-0003-Allow-logical-decoding-on-standby.patch
v27-0004-New-TAP-test-for-logical-decoding-on-standby.patch
v27-0005-Doc-changes-describing-details-about-logical-dec.patch
v27-0006-Fixing-Walsender-corner-case-with-logical-decodi.patch


This failed on cfbot [1]. The tap output [2] has the following bit:

[09:48:56.216](5.979s) not ok 26 - cannot read from logical replication slot
[09:48:56.223](0.007s) #   Failed test 'cannot read from logical replication 
slot'
#   at C:/cirrus/src/test/recovery/t/034_standby_logical_decoding.pl line 422.
...
Warning: unable to close filehandle GEN150 properly: Bad file descriptor during 
global destruction.
Warning: unable to close filehandle GEN155 properly: Bad file descriptor during 
global destruction.

The "unable to close filehandle" stuff in my experience indicates an IPC::Run
process that wasn't ended before the tap test ended.

Greetings,

Andres Freund

[1] https://cirrus-ci.com/task/5092676671373312
[2] 
https://api.cirrus-ci.com/v1/artifact/task/5092676671373312/testrun/build/testrun/recovery/034_standby_logical_decoding/log/regress_log_034_standby_logical_decoding


Thanks for pointing that out!

Please find attached V29 addressing this "Windows perl" issue: V29 changes the way the slot 
invalidation is tested and adds a "handle->finish". That looks OK now (I launched several 
consecutive successful runs on my cirrus-ci-enabled repository).

V29 differs from V28 only in 0004 to workaround the above "Windows perl" issue.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

From ef473f252c6e03470c975c3cc2f93482e3d98473 Mon Sep 17 00:00:00 2001
From: bdrouvotAWS 
Date: Thu, 8 Dec 2022 10:24:18 +
Subject: [PATCH v29 6/6] Fixing Walsender corner case with logical decoding on
 standby.

The problem is that WalSndWaitForWal() waits for the *replay* LSN to
increase, but gets woken up by walreceiver when new WAL has been
flushed. Which means that typically walsenders will get woken up at the
same time that the startup process will be - which means that by the
time the logical walsender checks GetXLogReplayRecPtr() it's unlikely
that the startup process already replayed the record and updated
XLogCtl->lastReplayedEndRecPtr.

Introducing a new condition variable to fix this corner case.
---
 src/backend/access/transam/xlogrecovery.c | 28 
 src/backend/replication/walsender.c   | 31 +--
 src/backend/utils/activity/wait_event.c   |  3 +++
 src/include/access/xlogrecovery.h |  3 +++
 src/include/replication/walsender.h   |  1 +
 src/include/utils/wait_event.h|  1 +
 6 files changed, 59 insertions(+), 8 deletions(-)
  41.2% src/backend/access/transam/
  48.5% src/backend/replication/
   3.6% src/backend/utils/activity/
   3.4% src/include/access/

diff --git a/src/backend/access/transam/xlogrecovery.c 
b/src/backend/access/transam/xlogrecovery.c
index 97b882564f..e958545e9b 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -358,6 +358,9 @@ typedef struct XLogRecoveryCtlData
RecoveryPauseState recoveryPauseState;
ConditionVariable recoveryNotPausedCV;
 
+   /* Replay state (see getReplayedCV() for more explanation) */
+   ConditionVariable replayedCV;
+
slock_t info_lck;   /* locks shared variables shown 
above */
 } XLogRecoveryCtlData;
 
@@ -467,6 +470,7 @@ XLogRecoveryShmemInit(void)
SpinLockInit(&XLogRecoveryCtl->info_lck);
InitSharedLatch(&XLogRecoveryCtl->recoveryWakeupLatch);
ConditionVariableInit(&XLogRecoveryCtl->recoveryNotPausedCV);
+   ConditionVariableInit(&XLogRecoveryCtl->replayedCV);
 }
 
 /*
@@ -1916,6 +1920,11 @@ ApplyWalRecord(XLogReaderState *xlogreader, XLogRecord 
*record, TimeLineID *repl
XLogRecoveryCtl->lastReplayedTLI = *replayTLI;
SpinLockRelease(&XLogRecoveryCtl->info_lck);
 
+   /*
+* wake up walsender(s) used by logical decoding on standby.
+*/
+   ConditionVariableBroadcast(&XLogRecoveryCtl->replayedCV);
+
/*
 * If rm_redo called XLogRequestWalReceiverReply, then we wake up the
 * receiver so that it notices the updated lastReplayedEndRecPtr and 
sends
@@ -4914,3 +4923,22 @@ assign_recovery_target_xid(const char *newval, void 
*extra)
else
recoveryTarget = RECOVERY_TARGET_UNSET;
 }
+
+/*
+ * Return the ConditionVariable indicating that a replay has been done.
+ *
+ * This is needed for logical decoding on standby. Indeed the "problem" is that
+ * WalSndWaitForWal() waits for the *replay* LSN to increase, but gets woken up
+ * by walreceiver when new WAL has been flushed. Which means that typically
+ * walsenders 

Re: add \dpS to psql

2022-12-08 Thread Pavel Luzanov

On 08.12.2022 07:48, Isaac Morland wrote:
If we implement MAINTAIN to control access to VACUUM, ANALYZE, 
REFRESH, CLUSTER, and REINDEX, we will cover everything that I can 
find that has been seriously discussed on this list


I like this approach with MAINTAIN privilege. I'm trying to find any 
disadvantages ... and I can't.


For the complete picture, I tried to see what other actions with the 
table could *potentially* be considered as maintenance.

Here is the list (with concrete command sketches after it):

- create|alter|drop on extended statistics objects
- alter table|index alter column set statistics
- alter table|index [re]set (storage_parameters)
- alter table|index set tablespace
- alter table alter column set storage|compression
- any actions with the TOAST table that can be performed separately from 
the main table
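
For concreteness, the bullets above correspond to commands of roughly 
these shapes (all object names invented):

CREATE STATISTICS foo_stats (dependencies) ON col0, col1 FROM foo;
ALTER TABLE foo ALTER COLUMN col1 SET STATISTICS 500;
ALTER TABLE foo SET (fillfactor = 70);
ALTER INDEX foo_idx SET (fillfactor = 70);
ALTER TABLE foo SET TABLESPACE fast_ssd;
ALTER TABLE foo ALTER COLUMN col1 SET STORAGE EXTERNAL;
ALTER TABLE foo ALTER COLUMN col1 SET COMPRESSION lz4;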


I have to admit that the discussion has moved away from the $subject.

--
Pavel Luzanov
Postgres Professional: https://postgrespro.com





Re: Allow batched insert during cross-partition updates

2022-12-08 Thread Etsuro Fujita
Amit-san,

On Thu, Dec 8, 2022 at 5:00 PM Amit Langote  wrote:
> On Tue, Dec 6, 2022 at 6:47 PM Etsuro Fujita  wrote:
> > * In postgresGetForeignModifyBatchSize():
> >
> > /*
> > -* Should never get called when the insert is being performed as part 
> > of a
> > -* row movement operation.
> > +* Use the auxiliary state if any; see postgresBeginForeignInsert() for
> > +* details on what it represents.
> >  */
> > -   Assert(fmstate == NULL || fmstate->aux_fmstate == NULL);
> > +   if (fmstate != NULL && fmstate->aux_fmstate != NULL)
> > +   fmstate = fmstate->aux_fmstate;
> >
> > I might be missing something, but I think we should leave the Assert
> > as-is, because we still disallow to move rows to another foreign-table
> > partition that is also an UPDATE subplan result relation, which means
> > that any fmstate should have fmstate->aux_fmstate=NULL.
>
> Hmm, yes.  I forgot that 86dc90056df effectively disabled *all*
> attempts of inserting into foreign partitions that are also UPDATE
> target relations, so you are correct that fmstate->aux_fmstate would
> never be set when entering this function.
>
> That means this functionality is only useful for foreign partitions
> that are not also being updated by the original UPDATE.

Yeah, I think so too.

> I've reinstated the Assert, removed the if block as it's useless, and
> updated the comment a bit to clarify the restriction a bit.

Looks good to me.

> > * Also in that function:
> >
> > -   if (fmstate)
> > +   if (fmstate != NULL)
> >
> > This is correct, but I would vote for leaving that as-is, to make
> > back-patching easy.
>
> Removed this hunk.

Thanks!

> Updated patch attached.

Thanks for the patch!  I will review the patch a bit more, but I think
it would be committable.

Best regards,
Etsuro Fujita




Re: Aggregate node doesn't include cost for sorting

2022-12-08 Thread David Rowley
On Thu, 8 Dec 2022 at 22:06, David Geier  wrote:
> The cost of an Aggregate node seems to be the same regardless of the
> input being pre-sorted or not. If however the input is not sorted, the
> Aggregate node must additionally perform a sort which can impact runtime
> significantly. Here is an example:
>
> -- Unsorted input. Aggregate node must additionally sort all rows.
> EXPLAIN ANALYZE SELECT COUNT(DISTINCT(col1)) FROM foo;
>
>  QUERY PLAN
> ---
>   Aggregate  (cost=2584.00..2584.01 rows=1 width=8) (actual
> time=1684.705..1684.809 rows=1 loops=1)
> ->  Seq Scan on foo  (cost=0.00..2334.00 rows=10 width=71)

This output surely must be from a version of PostgreSQL prior to
1349d279?  I can't quite figure out why you've added a "SET
enable_seqscan = FALSE;". That makes it look like you've used the same
version of PostgreSQL to produce both of these plans.  The 2nd plan
you've shown must be post 1349d279.

So, with the assumption that you've used 2 different versions to show
this output, for post 1349d279, there does not seem to be much choice
on how the aggregate is executed. What's your concern about the
costings having to be accurate given there's no other plan choice?

The new pre-sorted aggregate code will always request the sort order
that suits the largest number of ORDER BY / DISTINCT aggregates.  When
there are multiple ORDER BY / DISTINCT aggregates and they have
different sort requirements, then there certainly are competing ways 
that the aggregation portion of the plan can be executed.  I opted to
make the choice just based on the number of aggregates that could
become presorted.  nodeAgg.c is currently not very smart about sharing
sorts between multiple aggregates with the same sort requirements.  If
that was made better, there might be more motivation to have better
costing code in make_pathkeys_for_groupagg().  However, the reasons
for the reversion of db0d67db seemed to be mainly around the lack of
ability to accurately cost multiple competing sort orders. We'd need
to come up with some better way to do that if we were to want to give
make_pathkeys_for_groupagg() similar abilities.

> Also, why does the Aggregate node sort itself? Why don't we instead emit
> an explicit Sort node in the plan so that it's visible to the user what
> is happening? As soon as there's also a GROUP BY in the query, a Sort
> node occurs in the plan. This seems inconsistent.

Post 1349d279 we do that, but it can only do it for 1 sort order.
There can be any number of aggregate functions which require a sort
and they don't all have to have the same sort order requirements.  We
can't do the same as WindowAgg does by chaining nodes together either
because the aggregate node aggregates the results and we'd need all
the same input rows to be available at each step.

The only other way would be to have it so an Aggregate node could be
fed by multiple different input nodes and then it would only work on
the aggregates that suit the given input before reading the next input
and doing the other aggregates.  Making it work like that would require
quite a bit of additional effort during planning (not to mention the
executor). We'd have to run the join search once per required order,
which is one of the slowest parts of planning.  Right now, you could
probably make that work by just writing the SQL to have a subquery per
sort requirement.
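
For example (a sketch reusing the foo table from the example being
discussed), a query with two DISTINCT aggregates that want different
orderings could be split into one subquery per sort requirement:

-- A single Aggregate node can be presorted for at most one of these:
SELECT count(DISTINCT col0), count(DISTINCT col1) FROM foo;

-- One subquery per sort requirement: each aggregate's input can be
-- sorted (or index-scanned) in exactly the order it needs.
SELECT (SELECT count(DISTINCT col0) FROM foo) AS distinct_col0,
       (SELECT count(DISTINCT col1) FROM foo) AS distinct_col1;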

David




Re: refactor ExecGrant_*() functions

2022-12-08 Thread Peter Eisentraut

On 06.12.22 09:41, Antonin Houska wrote:

Attached are my proposals for improvements. One is to avoid a memory leak, the
other tries to improve readability a little bit.


I added the readability improvement to my v2 patch.  The pfree() calls 
aren't necessary AFAICT.






Re: refactor ExecGrant_*() functions

2022-12-08 Thread Peter Eisentraut

On 02.12.22 18:28, Andres Freund wrote:

Hi,

On 2022-12-02 08:30:55 +0100, Peter Eisentraut wrote:

 From 200879e5edfc1ce93b7af3cbfafc1f618626cbe9 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 2 Dec 2022 08:16:53 +0100
Subject: [PATCH] Refactor ExecGrant_*() functions

Instead of half a dozen of mostly-duplicate ExecGrant_Foo() functions,
write one common function ExecGrant_generic() that can handle most of
them.


I'd name it ExecGrant_common() or such instead - ExecGrant_generic() sounds
like it will handle arbitrary things, which it doesn't. And, as you mention,
we could implement e.g. ExecGrant_Language() as using ExecGrant_common() +
additional checks.


Done


Perhaps it'd be useful to add a callback to ExecGrant_generic() that can
perform additional checks, so that e.g. ExecGrant_Language() can easily be
implemented using ExecGrant_generic()?


Done.  This allows getting rid of ExecGrant_Language and ExecGrant_Type 
in addition to the previous patch.
From 7c4c3bd5d1cb776506b157c01c7fd975d700e8d9 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 8 Dec 2022 10:23:07 +0100
Subject: [PATCH v2] Refactor ExecGrant_*() functions

Instead of half a dozen of mostly-duplicate ExecGrant_Foo() functions,
write one common function ExecGrant_generic() that can handle most of
them.  We already have all the information we need, such as which
system catalog corresponds to which catalog table and which column is
the ACL column.

Discussion: 
https://www.postgresql.org/message-id/flat/22c7e802-4e7d-8d87-8b71-cba95e6f4bcf%40enterprisedb.com
---
 src/backend/catalog/aclchk.c | 948 +++
 1 file changed, 73 insertions(+), 875 deletions(-)

diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index 8d84a7b056..43968d569c 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -106,15 +106,11 @@ bool  binary_upgrade_record_init_privs = 
false;
 
 static void ExecGrantStmt_oids(InternalGrant *istmt);
 static void ExecGrant_Relation(InternalGrant *istmt);
-static void ExecGrant_Database(InternalGrant *istmt);
-static void ExecGrant_Fdw(InternalGrant *istmt);
-static void ExecGrant_ForeignServer(InternalGrant *istmt);
-static void ExecGrant_Function(InternalGrant *istmt);
-static void ExecGrant_Language(InternalGrant *istmt);
+static void ExecGrant_common(InternalGrant *istmt, Oid classid, AclMode 
default_privs,
+void (*object_check) 
(InternalGrant *istmt, HeapTuple tuple));
+static void ExecGrant_Language_check(InternalGrant *istmt, HeapTuple tuple);
 static void ExecGrant_Largeobject(InternalGrant *istmt);
-static void ExecGrant_Namespace(InternalGrant *istmt);
-static void ExecGrant_Tablespace(InternalGrant *istmt);
-static void ExecGrant_Type(InternalGrant *istmt);
+static void ExecGrant_Type_check(InternalGrant *istmt, HeapTuple tuple);
 static void ExecGrant_Parameter(InternalGrant *istmt);
 
 static void SetDefaultACLsInSchemas(InternalDefaultACL *iacls, List *nspnames);
@@ -602,34 +598,34 @@ ExecGrantStmt_oids(InternalGrant *istmt)
ExecGrant_Relation(istmt);
break;
case OBJECT_DATABASE:
-   ExecGrant_Database(istmt);
+   ExecGrant_common(istmt, DatabaseRelationId, 
ACL_ALL_RIGHTS_DATABASE, NULL);
break;
case OBJECT_DOMAIN:
case OBJECT_TYPE:
-   ExecGrant_Type(istmt);
+   ExecGrant_common(istmt, TypeRelationId, 
ACL_ALL_RIGHTS_TYPE, ExecGrant_Type_check);
break;
case OBJECT_FDW:
-   ExecGrant_Fdw(istmt);
+   ExecGrant_common(istmt, ForeignDataWrapperRelationId, 
ACL_ALL_RIGHTS_FDW, NULL);
break;
case OBJECT_FOREIGN_SERVER:
-   ExecGrant_ForeignServer(istmt);
+   ExecGrant_common(istmt, ForeignServerRelationId, 
ACL_ALL_RIGHTS_FOREIGN_SERVER, NULL);
break;
case OBJECT_FUNCTION:
case OBJECT_PROCEDURE:
case OBJECT_ROUTINE:
-   ExecGrant_Function(istmt);
+   ExecGrant_common(istmt, ProcedureRelationId, 
ACL_ALL_RIGHTS_FUNCTION, NULL);
break;
case OBJECT_LANGUAGE:
-   ExecGrant_Language(istmt);
+   ExecGrant_common(istmt, LanguageRelationId, 
ACL_ALL_RIGHTS_LANGUAGE, ExecGrant_Language_check);
break;
case OBJECT_LARGEOBJECT:
ExecGrant_Largeobject(istmt);
break;
case OBJECT_SCHEMA:
-   ExecGrant_Namespace(istmt);
+   ExecGrant_common(istmt, NamespaceRelationId, 

Re: on placeholder entries in view rule action query's range table

2022-12-08 Thread Alvaro Herrera
On 2022-Dec-07, Amit Langote wrote:

> However, this
> approach of not storing the placeholder in the stored rule would lead
> to a whole lot of regression test output changes, because the stored
> view queries of many regression tests involving views would now end up
> with only 1 entry in the range table instead of 3, causing ruleutils.c
> to no longer qualify the column names in the deparsed representation
> of those queries appearing in those regression test expected outputs.
> 
> To avoid that churn (not sure if really a goal to strive for in this
> case!), I thought it might be better to keep the OLD entry in the
> stored action query while getting rid of the NEW entry.

If the *only* argument for keeping the RTE for OLD is to avoid
regression test churn, then definitely it is not worth doing and it
should be ripped out.

> Other than avoiding the regression test output churn, this also makes
> the changes of ApplyRetrieveRule unnecessary.

But do these changes mean the code is worse afterwards?  Changing stuff,
per se, is not bad.  Also, since you haven't posted the "complete" patch
since Nov 7th, it's not easy to tell what those changes are.

Maybe you should post both versions of the patch -- one that removes
just NEW, and one that removes both OLD and NEW, so that we can judge.

> Actually, as I was addressing Alvaro's comments on the now-committed
> patch, I was starting to get concerned about the implications of the
> change in position of the view relation RTE in the query's range table
> if ApplyRetrieveRule() adds one from scratch instead of simply
> recycling the OLD entry from stored rule action query, even though I
> could see that there are no *user-visible* changes, especially after
> decoupling permission checking from the range table.

Hmm, I think I see the point, though I don't necessarily agree that
there is a problem.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Aggregate node doesn't include cost for sorting

2022-12-08 Thread David Geier

Hi hackers,

The cost of an Aggregate node seems to be the same regardless of the 
input being pre-sorted or not. If however the input is not sorted, the 
Aggregate node must additionally perform a sort which can impact runtime 
significantly. Here is an example:


CREATE TABLE foo(col0 INT, col1 TEXT);
INSERT INTO foo SELECT generate_series(1, 10), 
'aa' || md5(random()::TEXT);

CREATE INDEX foo_idx ON foo(col1);
SET max_parallel_workers_per_gather = 0;
SET enable_bitmapscan = FALSE;

-- Unsorted input. Aggregate node must additionally sort all rows.
EXPLAIN ANALYZE SELECT COUNT(DISTINCT(col1)) FROM foo;

    QUERY PLAN
---
 Aggregate  (cost=2584.00..2584.01 rows=1 width=8) (actual 
time=1684.705..1684.809 rows=1 loops=1)
   ->  Seq Scan on foo  (cost=0.00..2334.00 rows=10 width=71) 
(actual time=0.018..343.280 rows=10 loops=1)

 Planning Time: 0.317 ms
 Execution Time: 1685.543 ms


-- Pre-sorted input. Aggregate node doesn't have to sort all rows.
SET enable_seqscan = FALSE;
EXPLAIN ANALYZE SELECT COUNT(DISTINCT(col1)) FROM foo;

QUERY PLAN
---
 Aggregate  (cost=6756.42..6756.43 rows=1 width=8) (actual 
time=819.028..819.041 rows=1 loops=1)
   ->  Index Only Scan using foo_idx on foo (cost=6506.42..6506.42 
rows=10 width=71) (actual time=0.046..404.260 rows=10 loops=1)

 Heap Fetches: 10
 Heap Prefetches: 1334
 Planning Time: 0.438 ms
 Execution Time: 819.515 ms

The cost of the Aggregate node is in both cases the same (250.0) while 
its runtime is 1.3s in the unsorted case and 0.4s in the pre-sorted case.


Also, why does the Aggregate node sort itself? Why don't we instead emit 
an explicit Sort node in the plan so that it's visible to the user what 
is happening? As soon as there's also a GROUP BY in the query, a Sort 
node occurs in the plan. This seems inconsistent.


--
David Geier
(ServiceNow)





Re: Perform streaming logical transactions by background workers and parallel apply

2022-12-08 Thread Masahiko Sawada
On Thu, Dec 8, 2022 at 4:42 PM Amit Kapila  wrote:
>
> On Thu, Dec 8, 2022 at 12:42 PM Masahiko Sawada  wrote:
> >
> > On Wed, Dec 7, 2022 at 10:03 PM houzj.f...@fujitsu.com
> >  wrote:
> > >
> > >
> > > > +static void
> > > > +ProcessParallelApplyInterrupts(void)
> > > > +{
> > > > +CHECK_FOR_INTERRUPTS();
> > > > +
> > > > +if (ShutdownRequestPending)
> > > > +{
> > > > +ereport(LOG,
> > > > +(errmsg("logical replication parallel
> > > > apply worker for subscrip
> > > > tion \"%s\" has finished",
> > > > +
> > > > MySubscription->name)));
> > > > +
> > > > +apply_worker_clean_exit(false);
> > > > +}
> > > > +
> > > > +if (ConfigReloadPending)
> > > > +{
> > > > +ConfigReloadPending = false;
> > > > +ProcessConfigFile(PGC_SIGHUP);
> > > > +}
> > > > +}
> > > >
> > > > I personally think that we don't need to have a function to do only
> > > > these few things.
> > >
> > > I thought that introduce a new function make the handling of worker 
> > > specific
> > > Interrupts logic similar to other existing ones. Like:
> > > ProcessWalRcvInterrupts () in walreceiver.c and HandlePgArchInterrupts() 
> > > in
> > > pgarch.c ...
> >
> > I think the difference from them is that there is only one place to
> > call ProcessParallelApplyInterrupts().
> >
>
> But I feel it is better to isolate this code in a separate function.
> What if we decide to extend it further by having some logic to stop
> workers after reloading of config?

I think we can separate the function at that time. But let's keep the
current code as you and Hou agree with the current code. I'm not going
to insist on that.

>
> > >
> > > > ---
> > > > server_version = walrcv_server_version(LogRepWorkerWalRcvConn);
> > > > options.proto.logical.proto_version =
> > > > +server_version >= 16 ?
> > > > LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM :
> > > > server_version >= 15 ?
> > > > LOGICALREP_PROTO_TWOPHASE_VERSION_NUM :
> > > > server_version >= 14 ?
> > > > LOGICALREP_PROTO_STREAM_VERSION_NUM :
> > > > LOGICALREP_PROTO_VERSION_NUM;
> > > >
> > > > Instead of always using the new protocol version, I think we can use
> > > > LOGICALREP_PROTO_TWOPHASE_VERSION_NUM if the streaming is not
> > > > 'parallel'. That way, we don't need to change protocl version check
> > > > logic in pgoutput.c and don't need to expose defGetStreamingMode().
> > > > What do you think?
> > >
> > > I think that some user can also use the new version number when trying to 
> > > get
> > > changes (via pg_logical_slot_peek_binary_changes or other functions), so 
> > > I feel
> > > leave the check for new version number seems fine.
> > >
> > > Besides, I feel even if we don't use new version number, we still need to 
> > > use
> > > defGetStreamingMode to check if parallel mode in used as we need to send
> > > abort_lsn when parallel is in used. I might be missing something, sorry 
> > > for
> > > that. Can you please explain the idea a bit ?
> >
> > My idea is that we use LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM if
> > (server_version >= 16 && MySubscription->stream ==
> > SUBSTREAM_PARALLEL). If the stream is SUBSTREAM_ON, we use
> > LOGICALREP_PROTO_TWOPHASE_VERSION_NUM even if server_version is
> > 16. That way, in pgoutput.c, we can send abort_lsn if the protocol
> > version is LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM. We don't need
> > to send "streaming = parallel" to the publisher since the publisher
> > can decide whether or not to send abort_lsn based on the protocol
> > version (still needs to send "streaming = on" though). I might be
> > missing something.
> >
>
> What if we decide to send some more additional information as part of
> another patch like we are discussing in the thread [1]? Now, we won't
> be able to decide the version number based on just the streaming
> option. Also, in such a case, even for
> LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM, it may not be a good
> idea to send additional abort information unless the user has used the
> streaming=parallel option.

If we're going to send the additional information, it makes sense to
send streaming=parallel. But the next question came to me is why do we
need to increase the protocol version for parallel apply feature? If
sending the additional information is also controlled by an option
like "streaming", we can decide what we send based on these options,
no?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Allow batched insert during cross-partition updates

2022-12-08 Thread Amit Langote
Hi Fujita-san,

On Tue, Dec 6, 2022 at 6:47 PM Etsuro Fujita  wrote:
> On Tue, Mar 22, 2022 at 10:17 AM Amit Langote  wrote:
> > Rebased to fix a minor conflict with some recently committed
> > nodeModifyTable.c changes.
>
> Apologies for not having reviewed the patch.  Here are some review comments:

No problem and thanks for taking a look.

> * The patch conflicts with commit ffbb7e65a, so please update the
> patch.  (That commit would cause an API break, so I am planning to
> apply a fix to HEAD as well [1].)  That commit fixed the handling of
> pending inserts, which I think would eliminate the need to do this:
>
> * ExecModifyTable(), when inserting any remaining batched tuples,
> must look at the correct set of ResultRelInfos that would've been
> used by such inserts, because failing to do so would result in those
> tuples not actually getting inserted.  To fix, ExecModifyTable() is
> now made to get the ResultRelInfos from the PartitionTupleRouting
> data structure which contains the ResultRelInfo that would be used by
> those internal inserts. To allow nodeModifyTable.c look inside
> PartitionTupleRouting, its definition, which was previously local to
> execPartition.c, is exposed via execPartition.h.

Ah, I see.  Removed those hunks.

> * In postgresGetForeignModifyBatchSize():
>
> /*
> -* Should never get called when the insert is being performed as part of a
> -* row movement operation.
> +* Use the auxiliary state if any; see postgresBeginForeignInsert() for
> +* details on what it represents.
>  */
> -   Assert(fmstate == NULL || fmstate->aux_fmstate == NULL);
> +   if (fmstate != NULL && fmstate->aux_fmstate != NULL)
> +   fmstate = fmstate->aux_fmstate;
>
> I might be missing something, but I think we should leave the Assert
> as-is, because we still disallow to move rows to another foreign-table
> partition that is also an UPDATE subplan result relation, which means
> that any fmstate should have fmstate->aux_fmstate=NULL.

Hmm, yes.  I forgot that 86dc90056df effectively disabled *all*
attempts of inserting into foreign partitions that are also UPDATE
target relations, so you are correct that fmstate->aux_fmstate would
never be set when entering this function.

That means this functionality is only useful for foreign partitions
that are not also being updated by the original UPDATE.

I've reinstated the Assert, removed the if block as it's useless, and
updated the comment a bit to clarify the restriction a bit.

> * Also in that function:
>
> -   if (fmstate)
> +   if (fmstate != NULL)
>
> This is correct, but I would vote for leaving that as-is, to make
> back-patching easy.

Removed this hunk.

> That is all I have for now.  I will mark this as Waiting on Author.

Updated patch attached.


--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


v11-0001-Allow-batching-of-inserts-during-cross-partition.patch
Description: Binary data