Re: SQL JSON path enhanced numeric literals

2023-02-27 Thread Peter Eisentraut

On 28.02.23 01:09, Vik Fearing wrote:

On 2/27/23 20:13, Peter Eisentraut wrote:
Attached is a patch to add nondecimal integer literals and underscores 
in numeric literals to the SQL JSON path language.  This matches the 
recent additions to the core SQL syntax.  It follows ECMAScript in 
combination with the current SQL draft.


Internally, all the numeric literal parsing of jsonpath goes through 
numeric_in, which already supports all this, so this patch is just a 
bit of lexer work and some tests.


Is T840 really NO after this patch?


That was meant to be a YES.
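
For illustration, the new forms would look something like this (a sketch only;
it assumes the committed jsonpath syntax mirrors the new core SQL literals,
i.e. 0x/0o/0b prefixes and underscore digit separators):

SELECT jsonb_path_query('[255, 1000000]',
                        '$[*] ? (@ == 0xFF || @ == 1_000_000)');
-- returns 255 and 1000000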






Re: Allow tests to pass in OpenSSL FIPS mode

2023-02-27 Thread Peter Eisentraut

On 28.02.23 06:01, Michael Paquier wrote:

On Mon, Feb 27, 2023 at 08:23:34AM +0100, Peter Eisentraut wrote:

On 27.02.23 08:16, Michael Paquier wrote:

This refers to brin_minmax_multi_distance_macaddr8(), no?  This is
amazing.  I have a hard time imagining how FIPS would interact with
what we do in mac8.c to explain that, so it may be something entirely
different.  Is that reproducible?


This is no longer present in the v2 patch.


Sure, but why was it happening in the first place?


Because the earlier patch only changed the test input values (which were 
generated on the fly using md5()), but did not adjust the expected test 
results in all the places.






Re: Provide PID data for "cannot wait on a latch owned by another process" in latch.c

2023-02-27 Thread Peter Eisentraut

On 28.02.23 00:59, Michael Paquier wrote:

On Mon, Feb 27, 2023 at 05:48:10PM +0900, Kyotaro Horiguchi wrote:

+1 for adding that information, but I'm afraid that MyProcPid is not
necessary since it is displayed in the log lines in most cases.  If you
want to display both PIDs, I suggest making them more distinctive.


What would you suggest?  This message is basically impossible to
reach, so the wording of the patch was OK for me (see async.c); you
would need to look at the internals anyway.  Now if you'd like
something like "could not blah: owner PID=%d, MyProcPid=%d", that's
also fine by me.


I would also have asked for some kind of prefix that introduces the numbers.

I wonder what these numbers are useful for though?  Is this a 
development aid?  Can you do anything with these numbers?





Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-02-27 Thread John Naylor
On Fri, Feb 24, 2023 at 12:50 PM Masahiko Sawada 
wrote:
>
> On Wed, Feb 22, 2023 at 6:55 PM John Naylor
>  wrote:
> >
> > That doesn't seem useful to me. If we've done enough testing to
reassure us the new way always gives the same answer, the old way is not
needed at commit time. If there is any doubt it will always give the same
answer, then the whole patchset won't be committed.

> My idea is to make the bug investigation easier but on
> reflection, it seems not the best idea given this purpose.

My concern with TIDSTORE_DEBUG is that it adds new code that mimics the old
tid array. As I've said, that doesn't seem like a good thing to carry
forward forevermore, in any form. Plus, comparing new code with new code is
not the same thing as comparing existing code with new code. That was my
idea upthread.

Maybe the effort my idea requires is too much vs. the likelihood of finding
a problem. In any case, it's clear that if I want that level of paranoia,
I'm going to have to do it myself.

> What do you think
> about the attached patch? Please note that it also includes the
> changes for minimum memory requirement.

Most of the asserts look logical, or at least harmless.

- int max_off; /* the maximum offset number */
+ OffsetNumber max_off; /* the maximum offset number */

I agree with using the specific type for offsets here, but I'm not sure why
this change belongs in this patch. If we decided against the new asserts,
this would be easy to lose.

This change, however, defies common sense:

+/*
+ * The minimum amount of memory required by TidStore is 2MB, the current minimum
+ * valid value for the maintenance_work_mem GUC. This is required to allocate the
+ * DSA initial segment, 1MB, and some meta data. This number is applied also to
+ * the local TidStore cases for simplicity.
+ */
+#define TIDSTORE_MIN_MEMORY (2 * 1024 * 1024L) /* 2MB */

+ /* Sanity check for the max_bytes */
+ if (max_bytes < TIDSTORE_MIN_MEMORY)
+ elog(ERROR, "memory for TidStore must be at least %ld, but %zu provided",
+ TIDSTORE_MIN_MEMORY, max_bytes);

Aside from the fact that this elog guards against something that would never
get past development, the #define just adds a hard-coded copy of something that
is already hard-coded somewhere else, and whose size depends on an
implementation detail in a third place.

This also assumes that all users of tid store are limited by
maintenance_work_mem. Andres thought of an example of some day unifying
with tidbitmap.c, and maybe other applications will be limited by work_mem.

But now that I'm looking at the guc tables, I am reminded that work_mem's
minimum is 64kB, so this highlights a design problem: There is obviously no
requirement that the minimum work_mem has to be >= a single DSA segment,
even though operations like parallel hash and parallel bitmap heap scan are
limited by work_mem. It would be nice to find out what happens with these
parallel features when work_mem is tiny (maybe parallelism is not even
considered?).

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Make some xlogreader messages more accurate

2023-02-27 Thread Bharath Rupireddy
On Thu, Feb 23, 2023 at 1:06 PM Peter Eisentraut
 wrote:
>
> Here is a small patch to make some invalid-record error messages in
> xlogreader a bit more accurate (IMO).

+1 for these changes.

> My starting point was that when you have some invalid WAL, you often get
> a message like "wanted 24, got 0".  This is a bit incorrect, since it
> really wanted *at least* 24, not exactly 24.  So I have updated the
> messages to that effect, and

Yes, it's not exactly "wanted", but "wanted at least" because
xl_tot_len is the total length of the entire record including header
and payload.

> also added that detail to one message where
> it was available but not printed.

Looks okay.

> Going through the remaining report_invalid_record() calls I then
> adjusted the use of "invalid" vs. "incorrect" in one case.  The message
> "record with invalid length" makes it sound like the length was
> something like -5, but really we know what the length should be and what
> we got wasn't it, so "incorrect" sounded better and is also used in
> other error messages in that file.

I have no strong opinion about this change. We seem to be using
"invalid length" and "incorrect length" interchangeably [1] without
distinguishing between "invalid" if length is < 0 and "incorrect" if
length >= 0 and not something we're expecting.

Another comment on the patch:
1. Why is "wanted >=%u" any better than "wanted at least %u"? IMO, the
wording, as opposed to the >= symbol, works better in user-facing
messages.
+report_invalid_record(state, "invalid record offset at %X/%X: wanted >=%u, got %u",
+  "invalid record length at %X/%X: wanted >=%u, got %u",
+  "invalid record length at %X/%X: wanted >=%u, got %u",

[1]
elog(ERROR, "incorrect length %d in streaming transaction's changes
file \"%s\"",
"record with invalid length at %X/%X",
(errmsg("invalid length of checkpoint record")));
errmsg("invalid length of startup packet")));
errmsg("invalid length of startup packet")));
elog(ERROR, "invalid zero-length dimension array in MCVList");
elog(ERROR, "invalid length (%d) dimension array in MCVList",
errmsg("invalid length in external \"%s\" value",
errmsg("invalid length in external bit string")));
libpq_append_conn_error(conn, "certificate contains IP address with invalid length %zu

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: ICU locale validation / canonicalization

2023-02-27 Thread Jeff Davis
On Mon, 2023-02-20 at 15:23 -0800, Jeff Davis wrote:
> 
> New patch attached. The new patch also includes a GUC that (when
> enabled) validates that the collator is actually found.

New patch attached.

Now it always preserves the exact locale string during pg_upgrade and
does not attempt to canonicalize it. Before, it tried to be clever by
determining whether the language tag found the same collator as the
original string -- I didn't find a problem with that, but it just
seemed a bit too clever. So, only newly created locales and databases
have the ICU locale string canonicalized to a language tag.

Also, I added a SQL function pg_icu_language_tag() that can convert
locale strings to language tags, and check whether they exist or not.
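
For example (illustrative only; the exact tag returned depends on the ICU
version in use):

SELECT pg_icu_language_tag('de@collation=phonebook', true);
-- returns something like: de-u-co-phonebk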


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS


From e7b67f0410a18c32cf271532f7a4719cf8c1c560 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Wed, 15 Feb 2023 23:05:08 -0800
Subject: [PATCH v3] ICU locale string canonicalization and validation.

Before storing the locale name in the catalog, convert to a BCP47
language tag. The language tag is an unambiguous representation of the
locale that holds all of the necessary information.

Also, add a new GUC icu_locale_validation. When set to true, it raises
an ERROR if the locale string is malformed or if it is not a valid
locale in ICU.

During pg_upgrade, the previous locale string is preserved verbatim.

Discussion: https://postgr.es/m/11b1eeb7e7667fdd4178497aeb796c48d26e69b9.ca...@j-davis.com
---
 doc/src/sgml/config.sgml  |  16 +++
 doc/src/sgml/func.sgml|  17 +++
 src/backend/commands/collationcmds.c  | 101 +++---
 src/backend/commands/dbcommands.c |  39 ++
 src/backend/utils/adt/pg_locale.c | 128 --
 src/backend/utils/misc/guc_tables.c   |  10 ++
 src/backend/utils/misc/postgresql.conf.sample |   2 +
 src/bin/pg_dump/t/002_pg_dump.pl  |   4 +-
 src/include/catalog/pg_proc.dat   |   5 +
 src/include/commands/dbcommands.h |   1 +
 src/include/utils/pg_locale.h |   4 +
 .../regress/expected/collate.icu.utf8.out |  87 +++-
 src/test/regress/sql/collate.icu.utf8.sql |  24 +++-
 13 files changed, 405 insertions(+), 33 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e5c41cc6c6..f7fdb54a1b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9776,6 +9776,22 @@ SET XML OPTION { DOCUMENT | CONTENT };
   
  
 
+ 
+  icu_locale_validation (boolean)
+  
+   icu_locale_validation configuration parameter
+  
+  
+  
+   
+If set to true, validates that ICU locale strings
+are well-formed, and that they represent a valid locale in ICU. Does not
+cause any locale string to be rejected during . The default is false.
+   
+  
+ 
+
  
   default_text_search_config (string)
   
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 0cbdf63632..e2604c41ad 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -27406,6 +27406,23 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
 Use of this function is restricted to superusers.

   
+
+  
+   
+
+ pg_icu_language_tag
+
+pg_icu_language_tag ( locale text, validate boolean )
+text
+   
+   
+Canonicalizes the given locale string into a
+BCP 47 language tag (see ). If
+validate is true, check that
+the resulting language tag represents a valid locale in ICU.
+   
+  
  
 

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index eb62d285ea..8edc22f579 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -47,6 +47,8 @@ typedef struct
 	int			enc;			/* encoding */
 } CollAliasData;
 
+extern bool icu_locale_validation;
+
 
 /*
  * CREATE COLLATION
@@ -240,10 +242,50 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		}
 		else if (collprovider == COLLPROVIDER_ICU)
 		{
+#ifdef USE_ICU
+			char	*langtag;
+
 			if (!colliculocale)
 ereport(ERROR,
 		(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 		 errmsg("parameter \"locale\" must be specified")));
+
+			check_icu_locale(colliculocale);
+
+			/*
+			 * During binary upgrade, preserve locale string verbatim.
+			 * Otherwise, canonicalize to a language tag.
+			 */
+			if (!IsBinaryUpgrade)
+			{
+int elevel = icu_locale_validation ? ERROR : WARNING;
+
+langtag = icu_language_tag(colliculocale);
+if (langtag)
+{
+	ereport(NOTICE,
+			(errmsg("using language tag \"%s\" for locale \"%s\"",
+	langtag, colliculocale)));
+
+	if (!icu_collator_exists(langtag))
+		ereport(elevel,
+

Re: Track Oldest Initialized WAL Buffer Page

2023-02-27 Thread Bharath Rupireddy
On Tue, Feb 28, 2023 at 5:52 AM Nathan Bossart  wrote:
>
> On Tue, Feb 07, 2023 at 07:30:00PM +0530, Bharath Rupireddy wrote:
> > + /*
> > +  * Try updating oldest initialized XLog buffer page.
> > +  *
> > +  * Update it if we are initializing an XLog buffer page for the first
> > +  * time or if XLog buffers are full and we are wrapping around.
> > +  */
> > + if (XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage) ||
> > + (!XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage) &&
> > +  XLogRecPtrToBufIdx(XLogCtl->OldestInitializedPage) == nextidx))
> > + {
> > + Assert(XLogCtl->OldestInitializedPage < NewPageBeginPtr);
> > +
> > + XLogCtl->OldestInitializedPage = NewPageBeginPtr;
> > + }
>
> nitpick: I think you can simplify the conditional to
>
> if (XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage) ||
> XLogRecPtrToBufIdx(XLogCtl->OldestInitializedPage) == nextidx)

Oh, yes, done that.

> It's confusing to me that OldestInitializedPage is set to NewPageBeginPtr.
> Doesn't that set it to the beginning of the newest initialized page?

Yes, that's the intention, see below. OldestInitializedPage points to
the start address of the oldest initialized page, whereas
InitializedUpTo points to the end address of the latest initialized
page. With this, one can easily track all the WAL between
OldestInitializedPage and InitializedUpTo.

+/*
+ * OldestInitializedPage and InitializedUpTo are always starting and
+ * ending addresses of (same or different) XLog buffer page
+ * respectively. Hence, they can never be the same even if there's only one
+ * initialized page in XLog buffers.
+ */
+Assert(XLogCtl->OldestInitializedPage != XLogCtl->InitializedUpTo);

Thanks for looking at it. I'm attaching v2 patch with the above review
comment addressed for further review.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From b20ddef1ca852f86cf309417e598f02ff91ff946 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Tue, 28 Feb 2023 05:22:20 +
Subject: [PATCH v2] Track Oldest Initialized WAL Buffer Page

---
 src/backend/access/transam/xlog.c | 169 ++
 src/include/access/xlog.h |   1 +
 2 files changed, 170 insertions(+)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f9f0f6db8d..f4531d3fb5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -504,6 +504,44 @@ typedef struct XLogCtlData
 	XLogRecPtr *xlblocks;		/* 1st byte ptr-s + XLOG_BLCKSZ */
 	int			XLogCacheBlck;	/* highest allocated xlog buffer index */
 
+	/*
+	 * Start address of oldest initialized page in XLog buffers.
+	 *
+	 * We mainly track oldest initialized page explicitly to quickly tell if a
+	 * given WAL record is available in XLog buffers. It also can be used for
+	 * other purposes, see notes below.
+	 *
+	 * OldestInitializedPage gives XLog buffers following properties:
+	 *
+	 * 1) At any given point of time, pages in XLog buffers array are sorted in
+	 * an ascending order from OldestInitializedPage till InitializedUpTo.
+	 * Note that we verify this property for assert-only builds, see
+	 * IsXLogBuffersArraySorted() for more details.
+	 *
+	 * 2) OldestInitializedPage is monotonically increasing (by virtue of how
+	 * postgres generates WAL records), that is, its value never decreases.
+	 * This property lets someone read its value without a lock. There's no
+	 * problem even if its value is slightly stale i.e. concurrently being
+	 * updated. One can still use it for finding if a given WAL record is
+	 * available in XLog buffers. At worst, one might get false positives (i.e.
+	 * OldestInitializedPage may tell that the WAL record is available in XLog
+	 * buffers, but when one actually looks at it, it isn't really available).
+	 * This is more efficient and performant than acquiring a lock for reading.
+	 * Note that we may not need a lock to read OldestInitializedPage but we
+	 * need to update it holding WALBufMappingLock.
+	 *
+	 * 3) One can start traversing XLog buffers from OldestInitializedPage till
+	 * InitializedUpTo to list out all valid WAL records and stats, and expose
+	 * them via SQL-callable functions to users.
+	 *
+	 * 4) XLog buffers array is inherently organized as a circular, sorted and
+	 * rotated array with OldestInitializedPage as pivot with the property
+	 * where LSN of previous buffer page (if valid) is greater than
+	 * OldestInitializedPage and LSN of next buffer page (if valid) is greater
+	 * than OldestInitializedPage.
+	 */
+	XLogRecPtr	OldestInitializedPage;
+
 	/*
 	 * InsertTimeLineID is the timeline into which new WAL is being 

Re: Improve WALRead() to suck data directly from WAL buffers when possible

2023-02-27 Thread Bharath Rupireddy
On Tue, Feb 28, 2023 at 6:14 AM Nathan Bossart  wrote:
>
> On Wed, Feb 08, 2023 at 08:00:00PM +0530, Bharath Rupireddy wrote:
> > + /*
> > +  * We read some of the requested bytes. Continue to 
> > read remaining
> > +  * bytes.
> > +  */
> > + ptr += nread;
> > + nbytes -= nread;
> > + dst += nread;
> > + *read_bytes += nread;
>
> Why do we only read a page at a time in XLogReadFromBuffersGuts()?  What is
> preventing us from copying all the data we need in one go?

Note that most of the WALRead() callers request a single page of
XLOG_BLCKSZ bytes even if the server has fewer or more WAL pages
available. It's the streaming replication walsender that can request
anything from less than XLOG_BLCKSZ bytes up to MAX_SEND_SIZE (16 *
XLOG_BLCKSZ). And if we read, say, MAX_SEND_SIZE at once while holding
WALBufMappingLock, that might impact concurrent inserters (at least in
theory) - one of the main intentions of this patch is not to impact
inserters much.

Therefore, I feel that reading one WAL buffer page at a time, which works
for most of the cases without impacting concurrent inserters much, is
better - 
https://www.postgresql.org/message-id/CALj2ACWXHP6Ha1BfDB14txm%3DXP272wCbOV00mcPg9c6EXbnp5A%40mail.gmail.com.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Force testing of query jumbling code in TAP tests

2023-02-27 Thread Michael Paquier
On Thu, Feb 16, 2023 at 02:08:42PM +0900, Michael Paquier wrote:
> Other ideas are welcome.  At least this would be a start.

The main idea of the patch is here:

> +# Check some data from pg_stat_statements.
> +$node_primary->safe_psql('postgres', 'CREATE EXTENSION pg_stat_statements');
> +# This gathers data based on the first characters for some common query types,
> +# providing coverage for SELECT, DMLs, and some DDLs.
> +my $result = $node_primary->safe_psql(
> + 'postgres',
> + qq{WITH select_stats AS
> +  (SELECT upper(substr(query, 1, 6)) AS select_query
> + FROM pg_stat_statements
> + WHERE upper(substr(query, 1, 6)) IN ('SELECT', 'UPDATE',
> +  'INSERT', 'DELETE',
> +  'CREATE'))
> +  SELECT select_query, count(select_query) > 1 AS some_rows
> +FROM select_stats
> +GROUP BY select_query ORDER BY select_query;});
> +is( $result, qq(CREATE|t
> +DELETE|t
> +INSERT|t
> +SELECT|t
> +UPDATE|t), 'check contents of pg_stat_statements on regression database');

Are there any objections to doing what's proposed in the patch and
improving the testing coverage of query jumbling by default?
--
Michael




Re: Allow tests to pass in OpenSSL FIPS mode

2023-02-27 Thread Michael Paquier
On Mon, Feb 27, 2023 at 08:23:34AM +0100, Peter Eisentraut wrote:
> On 27.02.23 08:16, Michael Paquier wrote:
>> This refers to brin_minmax_multi_distance_macaddr8(), no?  This is
>> amazing.  I have a hard time imagining how FIPS would interact with
>> what we do in mac8.c to explain that, so it may be something entirely
>> different.  Is that reproducible?
> 
> This is no longer present in the v2 patch.

Sure, but why was it happening in the first place?  The proposed patch
set only reworks some regression tests.  So it seems to me that this
is a sign that we may have issues in some code area that got stressed
in some new way, no?
--
Michael




Re: Doc update for pg_stat_statements normalization

2023-02-27 Thread Michael Paquier
On Fri, Feb 24, 2023 at 08:54:00PM +, Imseih (AWS), Sami wrote:
> I think the only thing to do here is to call this out in docs with a 
> suggestion to increase
> pg_stat_statements.max to reduce the likelihood. I also attached the suggested
> doc enhancement as well.

+  
+   A query text may be observed with constants in
+   pg_stat_statements, especially when there is a high
+   rate of entry deallocations. To reduce the likelihood of this occurring,
+   consider increasing pg_stat_statements.max.
+   The pg_stat_statements_info view provides entry
+   deallocation statistics.
+  

I am OK with an addition to the documentation to warn that one may
have to increase the maximum number of entries that can be stored if
seeing a non-normalized entry that should have been normalized.

Shouldn't this text make it clear that this concerns only query
strings that can be normalized?  Utility queries can have constants,
for one (well, not for long for most of them with the work I have been
doing lately, but there will still be some exceptions like CREATE
VIEW or utilities with A_Const nodes).
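
For instance (an illustrative case, not taken from the patch), a utility
statement keeps its constants in the stored query text, and that is expected:

CREATE VIEW one AS SELECT 1 AS x;
SELECT query FROM pg_stat_statements WHERE query LIKE 'CREATE VIEW%';
-- shows: CREATE VIEW one AS SELECT 1 AS x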
--
Michael




Re: pg_upgrade and logical replication

2023-02-27 Thread Julien Rouhaud
On Tue, Feb 28, 2023 at 08:56:37AM +0530, Amit Kapila wrote:
> On Tue, Feb 28, 2023 at 7:55 AM Julien Rouhaud  wrote:
> >
> >
> > The scenario I'm interested in is to rely on logical replication only for 
> > the
> > upgrade, so the end state (and start state) is to go back to physical
> > replication.  In that case, I would just create new physical replica from 
> > the
> > pg_upgrade'd server and failover to that node, or rsync the previous 
> > publisher
> > node to make it a physical replica.
> >
> > But even if you want to only rely on logical replication, I'm not sure why 
> > you
> > would want to keep the publisher node as a publisher node?  I think that 
> > doing
> > it this way will lead to a longer downtime compared to doing a failover on 
> > the
> > pg_upgrade'd node, make it a publisher and then move the former publisher 
> > node
> > to a subscriber.
> >
>
> I am not sure this is what everyone usually follows, because it sounds
> like a lot of work to me. IIUC, to achieve this, one needs to recreate
> all the publications and subscriptions after changing the roles of
> publisher and subscriber. Can you please write out the steps to show exactly
> what you have in mind, to avoid any misunderstanding?

Well, as I mentioned, I'm *not* interested in a logical-replication-only
scenario.  Logical replication is nice, but it will always be less efficient
than physical replication, and some workloads also don't really play well with
it.  So while it can be a huge asset in some cases, for now I'm only looking at
leveraging logical replication for a major upgrade of a physical replication
cluster, so the publications and subscriptions are only temporary and trashed
after use.

That being said, I was only saying that if I had to do a major upgrade of a
logical replication cluster, this is probably how I would try to do it to
minimize downtime, even if there are probably *a lot* of difficulties to
overcome.




Re: We shouldn't signal process groups with SIGQUIT

2023-02-27 Thread Michael Paquier
On Tue, Feb 14, 2023 at 12:47:12PM -0800, Andres Freund wrote:
> Just naively hacking this behaviour change into the current code, would yield
> sending SIGQUIT to postgres, and then SIGTERM to the whole process
> group. Which seems like a reasonable order?  quickdie() should _exit()
> immediately in the signal handler, so we shouldn't get to processing the
> SIGTERM.  Even if both signals are "reacted to" at the same time, possibly
> with SIGTERM being processed first, the SIGQUIT handler should be executed
> long before the next CFI().

I have been poking a bit at that, and did a change as simple as this
one in signal_child():
 #ifdef HAVE_SETSID
+   if (signal == SIGQUIT)
+   signal = SIGTERM;

From what I can see, SIGTERM is actually received by the backends
before SIGQUIT, and I can also see that the backends have enough room
to process CFIs in some cases, especially short queries, even before 
reaching quickdie() and its exit().  So the window between SIGTERM and
SIGQUIT is not as long as one would think.
--
Michael




RE: Rework LogicalOutputPluginWriterUpdateProgress

2023-02-27 Thread Takamichi Osumi (Fujitsu)
Hi,


On Monday, February 27, 2023 6:30 PM wangw.f...@fujitsu.com 
 wrote:
> Attach the new patch.
Thanks for sharing v3. Here are some minor review comments and a question.


(1) UpdateDecodingProgressAndKeepalive header comment

The header comment should be updated to briefly explain why we reset some other
flags, as discussed in [1], and to describe the function's update-and-keepalive
behavior.

(2) OutputPluginPrepareWrite

The changed error string is probably too long.

@@ -662,7 +657,7 @@ void
 OutputPluginPrepareWrite(struct LogicalDecodingContext *ctx, bool last_write)
 {
if (!ctx->accept_writes)
-   elog(ERROR, "writes are only accepted in commit, begin and change callbacks");
+   elog(ERROR, "writes are only accepted in callbacks in the OutputPluginCallbacks structure (except startup, shutdown, filter_by_origin and filter_prepare callbacks)");

I think you could break the error message into two string lines, or rephrase it
differently.

(3) Minor question

The patch introduces goto statements into the cb_wrapper functions. Is the
purpose to call update_progress_and_keepalive() after popping the error stack,
even if the corresponding callback is missing? I thought we could just have an
"else" clause for the callback-existence check; did you choose the current goto
style for readability?

(4) Name of is_skip_threshold_change

I also feel that is_skip_threshold_change could be renamed to
"exceeded_keepalive_threshold" or something similar. Other candidates were
proposed by Peter-san in [2].



[1] - 
https://www.postgresql.org/message-id/OS3PR01MB6275374EBE7C8CABBE6730099EAF9%40OS3PR01MB6275.jpnprd01.prod.outlook.com
[2] - 
https://www.postgresql.org/message-id/CAHut%2BPt3ZEMo-KTF%3D5KJSU%2BHdWJD19GPGGCKOmBeM47484Ychw%40mail.gmail.com


Best Regards,
Takamichi Osumi





Re: pg_upgrade and logical replication

2023-02-27 Thread Amit Kapila
On Tue, Feb 28, 2023 at 7:55 AM Julien Rouhaud  wrote:
>
> On Mon, Feb 27, 2023 at 03:39:18PM +0530, Amit Kapila wrote:
> >
> > BTW, thinking some more
> > on this, how will we allow to continue replication after upgrading the
> > publisher? During upgrade, we don't retain slots, so the replication
> > won't continue. I think after upgrading subscriber-node, user will
> > need to upgrade the publisher as well.
>
> The scenario I'm interested in is to rely on logical replication only for the
> upgrade, so the end state (and start state) is to go back to physical
> replication.  In that case, I would just create new physical replica from the
> pg_upgrade'd server and failover to that node, or rsync the previous publisher
> node to make it a physical replica.
>
> But even if you want to only rely on logical replication, I'm not sure why you
> would want to keep the publisher node as a publisher node?  I think that doing
> it this way will lead to a longer downtime compared to doing a failover on the
> pg_upgrade'd node, make it a publisher and then move the former publisher node
> to a subscriber.
>

I am not sure this is what everyone usually follows, because it sounds
like a lot of work to me. IIUC, to achieve this, one needs to recreate
all the publications and subscriptions after changing the roles of
publisher and subscriber. Can you please write out the steps to show exactly
what you have in mind, to avoid any misunderstanding?

-- 
With Regards,
Amit Kapila.




Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-02-27 Thread Amit Kapila
On Tue, Feb 28, 2023 at 8:14 AM Kyotaro Horiguchi
 wrote:
>
> At Mon, 27 Feb 2023 14:56:19 +0530, Amit Kapila  
> wrote in
> > The one difference w.r.t recovery_min_apply_delay is that here we will
> > hold locks for the duration of the delay which didn't seem to be a
> > good idea. This will also probably lead to more bloat as we will keep
> > transactions open for a long time. Doing it before DecodePrepare won't
>
> I don't have a concrete picture but could we tell reorder buffer to
> retain a PREPAREd transaction until a COMMIT PREPARED comes?
>

Yeah, we could do that, and that is the behavior unless the
user enables 2PC via the 'two_phase' subscription option. But I don't see
the need to unnecessarily delay the prepare until the commit if a user
has specified the 'two_phase' option. It is quite possible that COMMIT
PREPARED happens much later than the amount of delay
the user is expecting.

>  If
> delaying non-prepared transactions until COMMIT is adequate, then the
> same thing seems to work for prepared transactions.
>
> > have such problems. This is the reason that we decide to perform a
> > delay at the start of the transaction instead at commit/prepare in the
> > subscriber-side approach.
>
> It seems that there are no technical obstacles to do that on the
> publisher side. The only observable difference would be that
> relatively large prepared transactions may experience noticeable
> additional delays.  IMHO I don't think it's a good practice
> protocol-wise to intentionally choke a stream at the receiving end
> when it has not been flow-controlled on the transmitting end.
>

But in this proposal, we are not choking/delaying anything on the receiving end.

-- 
With Regards,
Amit Kapila.




Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-02-27 Thread Kyotaro Horiguchi
At Mon, 27 Feb 2023 14:56:19 +0530, Amit Kapila  wrote 
in 
> The one difference w.r.t recovery_min_apply_delay is that here we will
> hold locks for the duration of the delay which didn't seem to be a
> good idea. This will also probably lead to more bloat as we will keep
> transactions open for a long time. Doing it before DecodePrepare won't

I don't have a concrete picture but could we tell reorder buffer to
retain a PREPAREd transaction until a COMMIT PREPARED comes?  If
delaying non-prepared transactions until COMMIT is adequate, then the
same thing seems to work for prepared transactions.

> have such problems. This is the reason that we decide to perform a
> delay at the start of the transaction instead at commit/prepare in the
> subscriber-side approach.

It seems that there are no technical obstacles to do that on the
publisher side. The only observable difference would be that
relatively large prepared transactions may experience noticeable
additional delays.  IMHO I don't think it's a good practice
protocol-wise to intentionally choke a stream at the receiving end
when it has not been flow-controlled on the transmitting end.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Non-superuser subscription owners

2023-02-27 Thread Stephen Frost
Greetings,

* Jeff Davis (pg...@j-davis.com) wrote:
> Not all steps would be breaking changes, and a lot of those steps are
> things we should do anyway. We could make it easier to write safe
> SECURITY DEFINER functions, provide more tools for users to opt-out of
> executing SECURITY INVOKER code, provide a way for superusers to safely
> drop privileges, document the problems with security invoker and what
> to do about them, etc.

Agreed.

> But we also shouldn't exaggerate it -- for instance, others have
> proposed that we run code as the table owner for logical subscriptions,
> and that's going to break things in the same way. Arguably, if we are
> going to break something, it's better to break it consistently rather
> than one subsystem at a time.

I tend to agree with this.

> Back to the $SUBJECT, if we allow non-superusers to run subscriptions,
> and the subscription runs the code as the table owner, that might also
> lead to some weird behavior for triggers that rely on SECURITY INVOKER
> semantics.

Indeed.

Thanks,

Stephen




Re: Non-superuser subscription owners

2023-02-27 Thread Stephen Frost
Greetings,

* Jeff Davis (pg...@j-davis.com) wrote:
> On Mon, 2023-02-27 at 14:10 -0500, Stephen Frost wrote:
> > I do think there are some use-cases for it, but agree that it'd be
> > better to encourage more use of SECURITY DEFINER and one approach to
> > that might be to have a way for users to explicitly say "don't run code
> > that isn't mine or a superuser's with my privileges." 
> 
> I tried that:
> 
> https://www.postgresql.org/message-id/75b0dbb55e9febea54c441efff8012a6d2cb5bd7.ca...@j-davis.com
> 
> but Andres pointed out some problems with my implementation. They
> didn't seem easily fixable, but perhaps with more effort it could work
> (run all the expressions as security definer, as well?).

Presumably.  Ultimately, I tend to agree it won't be easy.  That doesn't
mean it's not a worthwhile effort.

> >  Of course, we
> > need to make sure it's possible to write safe SECURITY DEFINER functions
> > and to be clear about how to do that to avoid the risk in the other
> > direction.
> 
> Agreed. Perhaps we can force search_path to be set for SECURITY
> DEFINER, and require that the temp schema be explicitly included rather
> than the current "must be at the end". We could also provide a way to
> turn public access off in the same statement, so that you don't need to
> use a transaction block to keep the function private.

We do pretty strongly encourage a search_path setting for SECURITY
DEFINER today.  That said, I'm not against pushing on that harder.  The
temporary schema question is a more difficult one... but frankly,
I'd like an option to say "no temporary schemas should be allowed in my
search path" when it comes to a security definer function.

> > I don't think we'd be able to get away with just getting rid of
> > SECURITY INVOKER entirely or even in changing the current way triggers (or
> > functions in views, etc) are run by default.
> 
> I didn't propose anything radical. I'm just trying to get some
> agreement that SECURITY INVOKER is central to a lot of our security
> woes, and that we should be treating it with skepticism on a
> fundamental level.

Sure, but if we want to make progress then we have to provide a
direction for folks to go in that's both secure and convenient.

> Individual proposals for how to get away from SECURITY INVOKER should
> be evaluated on their merits (i.e. don't break a bunch of stuff).

Of course.  That said ... we don't want to spend a lot of time
going in a direction that won't bear fruit; I'm hopeful that this
direction will though.

Thanks,

Stephen




Re: pg_upgrade and logical replication

2023-02-27 Thread Julien Rouhaud
On Mon, Feb 27, 2023 at 03:39:18PM +0530, Amit Kapila wrote:
>
> BTW, thinking some more
> on this, how will we allow to continue replication after upgrading the
> publisher? During upgrade, we don't retain slots, so the replication
> won't continue. I think after upgrading subscriber-node, user will
> need to upgrade the publisher as well.

The scenario I'm interested in is to rely on logical replication only for the
upgrade, so the end state (and start state) is to go back to physical
replication.  In that case, I would just create new physical replica from the
pg_upgrade'd server and failover to that node, or rsync the previous publisher
node to make it a physical replica.

But even if you want to only rely on logical replication, I'm not sure why you
would want to keep the publisher node as a publisher node?  I think that doing
it this way will lead to a longer downtime compared to doing a failover on the
pg_upgrade'd node, make it a publisher and then move the former publisher node
to a subscriber.




Re: Should vacuum process config file reload more often

2023-02-27 Thread Masahiko Sawada
On Tue, Feb 28, 2023 at 10:21 AM Andres Freund  wrote:
>
> Hi,
>
> On 2023-02-27 23:11:53 +0900, Masahiko Sawada wrote:
> > As far as I know there are not such GUC parameters in the core but
> > there might be in third-party table AM and index AM extensions.
>
> We already reload in a pretty broad range of situations, so I'm not sure
> there's a lot that could be unsafe that isn't already.
>
>
> > Also, I'm concerned that allowing to change any GUC parameters during
> > vacuum/analyze could be a foot-gun in the future. When modifying
> > vacuum/analyze-related codes, we have to consider the case where any GUC
> > parameters could be changed during vacuum/analyze.
>
> What kind of scenario are you thinking of?

For example, I guess we will need to take care of changes to
maintenance_work_mem. Currently we initialize the dead tuple space at
the beginning of lazy vacuum, but perhaps we would need to
enlarge/shrink it based on the new value?

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: pg_rewind: warn when checkpoint hasn't happened after promotion

2023-02-27 Thread kuroda . keisuke

hi Heikki,

Thanks for the mail, and thanks also for the commit (0a0500207a)
fixing the documentation.
I'm glad the problem was solved.

Best Regards,
Keisuke Kuroda
NTT COMWARE

On 2023-02-27 16:33, Heikki Linnakangas wrote:

On 16/11/2022 07:17, kuroda.keis...@nttcom.co.jp wrote:

I fixed this last week in commit 009746, see thread [1]. I'm sorry
I didn't notice this thread earlier.

I didn't realize that we had a notice about this in the docs. I'll go
and remove that. Thanks!

- Heikki






Re: Should vacuum process config file reload more often

2023-02-27 Thread Andres Freund
Hi,

On 2023-02-27 23:11:53 +0900, Masahiko Sawada wrote:
> As far as I know there are not such GUC parameters in the core but
> there might be in third-party table AM and index AM extensions.

We already reload in a pretty broad range of situations, so I'm not sure
there's a lot that could be unsafe that isn't already.


> Also, I'm concerned that allowing to change any GUC parameters during
> vacuum/analyze could be a foot-gun in the future. When modifying
> vacuum/analyze-related codes, we have to consider the case where any GUC
> parameters could be changed during vacuum/analyze.

What kind of scenario are you thinking of?


> I guess it would be better to apply the parameter changes for only vacuum
> delay related parameters. For example, autovacuum launcher advertises the
> values of the vacuum delay parameters on the shared memory not only for
> autovacuum processes but also for manual vacuum/analyze processes.  Both
> processes can update them accordingly in vacuum_delay_point().

I don't think this is a good idea. It'd introduce a fair amount of complexity
without, as far as I can tell, a benefit.

Greetings,

Andres Freund




Re: Rework LogicalOutputPluginWriterUpdateProgress

2023-02-27 Thread Peter Smith
Here are some comments for the v2-0001 patch.

(I haven't looked at the v3 that was posted overnight; maybe some of
my comments have already been addressed.)

==
General

1. (Info from the commit message)
Since we can know whether the change is an end of transaction change in the
common code, we removed the LogicalDecodingContext->end_xact introduced in
commit f95d53e.

~

TBH, it was not clear to me that this change was an improvement. IIUC,
it removes the "unnecessary" member, but only does that by replacing
it everywhere with a boolean parameter passed to
update_progress_and_keepalive(). So the end result seems no less code,
but it is less readable code now because you need to know what the
true/false parameter means. I wonder if it would have been better just
to leave this how it was.

==
src/backend/replication/logical/logical.c

2. General - blank lines

There are multiple places in this file where the patch removed some
statements but left blank lines. The result is 2 blank lines remaining
instead of one.

see change_cb_wrapper.
see truncate_cb_wrapper.
see stream_start_cb_wrapper.
see stream_stop_cb_wrapper.
see stream_change_cb_wrapper.

e.g.

BEFORE
ctx->write_location = last_lsn;

ctx->end_xact = false;

/* in streaming mode, stream_stop_cb is required */

AFTER (now there are 2 blank lines)
ctx->write_location = last_lsn;


/* in streaming mode, stream_stop_cb is required */

~~~

3. General - calls to is_skip_threshold_change()

+ if (is_skip_threshold_change(ctx))
+ update_progress_and_keepalive(ctx, false);

There are multiple calls like this, which are guarding the
update_progress_and_keepalive() with the is_skip_threshold_change()
- See truncate_cb_wrapper
- See message_cb_wrapper
- See stream_change_cb_wrapper
- See stream_message_cb_wrapper
- See stream_truncate_cb_wrapper
- See UpdateDecodingProgressAndKeepalive

IIUC, then I was thinking all those conditions maybe can be pushed
down *into* the wrapper, thereby making every calling code simpler.

e.g. make the wrapper function code look similar to the current
UpdateDecodingProgressAndKeepalive:

BEFORE (update_progress_and_keepalive)
{
if (!ctx->update_progress_and_keepalive)
return;

ctx->update_progress_and_keepalive(ctx, ctx->write_location,
   ctx->write_xid, ctx->did_write,
   finished_xact);
}
AFTER
{
if (!ctx->update_progress_and_keepalive)
return;

if (finished_xact || is_skip_threshold_change(ctx))
{
ctx->update_progress_and_keepalive(ctx, ctx->write_location,
   ctx->write_xid, ctx->did_write,
   finished_xact);
}
}


~~~

4. StartupDecodingContext

@@ -334,7 +329,7 @@ CreateInitDecodingContext(const char *plugin,
XLogReaderRoutine *xl_routine,
LogicalOutputPluginWriterPrepareWrite prepare_write,
LogicalOutputPluginWriterWrite do_write,
-   LogicalOutputPluginWriterUpdateProgress update_progress)
+   LogicalOutputPluginWriterUpdateProgressAndKeepalive update_progress_and_keepalive)

TBH, I find it confusing that the new parameter name
('update_progress_and_keepalive') is identical to the static function
name in the same C source file. It introduces a kind of unnecessary
shadowing and makes it harder to search/read the code.

I suggest just calling this param something unique and local to the
function like 'do_update_keepalive'.

~~~

5. @@ -334,7 +329,7 @@ CreateInitDecodingContext(const char *plugin,
XLogReaderRoutine *xl_routine,
LogicalOutputPluginWriterPrepareWrite prepare_write,
LogicalOutputPluginWriterWrite do_write,
-   LogicalOutputPluginWriterUpdateProgress update_progress)
+   LogicalOutputPluginWriterUpdateProgressAndKeepalive update_progress_and_keepalive)

(Ditto previous comment #4)

TBH, I find it confusing that the new parameter name
('update_progress_and_keepalive') is identical to the static function
name in the same C source file. It introduces a kind of unnecessary
shadowing and makes it harder to search/read the code.

I suggest just calling this param something unique and local to the
function like 'do_update_keepalive'.

~~~

6. CreateDecodingContext

@@ -493,7 +488,7 @@ CreateDecodingContext(XLogRecPtr start_lsn,
XLogReaderRoutine *xl_routine,
LogicalOutputPluginWriterPrepareWrite prepare_write,
LogicalOutputPluginWriterWrite do_write,
-   LogicalOutputPluginWriterUpdateProgress update_progress)
+   LogicalOutputPluginWriterUpdateProgressAndKeepalive update_progress_and_keepalive)

(Ditto previous comment #4)

TBH, I find it confusing that the new parameter name
('update_progress_and_keepalive') is identical to the static function
name in the same C source file. It introduces a kind of unnecessary
shadowing and makes it harder to search/read the code.

I suggest just calling this param something unique and local to the
function like 'do_update_keepalive'.

~~~

7. OutputPluginPrepareWrite

@@ -662,7 +657,7 @@ void
 OutputPluginPrepareWrite(struct LogicalDecodingContext *ctx, bool last_write)
 {
  if (!ctx->accept_writes)
- elog(ERROR, "writes are only accepte

Re: Improve WALRead() to suck data directly from WAL buffers when possible

2023-02-27 Thread Nathan Bossart
On Wed, Feb 08, 2023 at 08:00:00PM +0530, Bharath Rupireddy wrote:
> + /*
> +  * We read some of the requested bytes. Continue to read remaining
> +  * bytes.
> +  */
> + ptr += nread;
> + nbytes -= nread;
> + dst += nread;
> + *read_bytes += nread;

Why do we only read a page at a time in XLogReadFromBuffersGuts()?  What is
preventing us from copying all the data we need in one go?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Non-superuser subscription owners

2023-02-27 Thread Jeff Davis
On Mon, 2023-02-27 at 16:13 -0500, Robert Haas wrote:
> On Mon, Feb 27, 2023 at 1:25 PM Jeff Davis  wrote:
> > I think you are saying that we should still run Alice's code with
> > the
> > privileges of Bob, but somehow make that safe(r) for Bob. Is that
> > right?
> 
> Yeah. That's the idea I was floating, at least.

Isn't that a hard problem, maybe even an impossible one?

> 
> I guess I have a pretty hard time imagining that we can just
> obliterate SECURITY INVOKER entirely.

Of course not.

>  It seems fundamentally
> reasonable to me that Alice might want to make some code available to
> be executed in the form of a function or procedure but without
> offering to execute it with her own privileges.

It also seems fundamentally reasonable that if someone grants you
privileges on one of their tables, it might be safe to access it.

I'm sure there are a few use cases for SECURITY INVOKER, but they are
quite narrow.

Perhaps most frustratingly, even if none of the users on a system has
any use case for SECURITY INVOKER, they still must all live in fear of
accessing each other's tables, because at any time a SECURITY INVOKER
function could be attached to one of the tables.
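
(To spell out the sort of trap I mean, with made-up names -- any SECURITY
INVOKER function attached to a table runs with the privileges of whoever
touches the table:)

-- Bob owns a table that Alice has been granted INSERT on, and attaches a
-- trigger to it:
CREATE FUNCTION grab() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    -- executes with the privileges of whoever fired the trigger
    GRANT SELECT ON ALL TABLES IN SCHEMA alice TO bob;
    RETURN NEW;
END $$;
CREATE TRIGGER grab BEFORE INSERT ON bob.shared
    FOR EACH ROW EXECUTE FUNCTION grab();

-- Alice merely uses the table she was invited to use:
INSERT INTO bob.shared VALUES (1);  -- grab() now runs with Alice's privileges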

I feel like we are giving up mainstream utility and safety in exchange
for contrived or exceptional cases. That's not a good trade.

> We already take the position that VACUUM always runs as the
> table owner, and while VACUUM runs index expressions but not for
> example triggers, why not just be consistent and run all code that is
> tied to the table as the table owner, all the time?

I'd also extend this to default expressions and other code that can be
executed implicitly.

> Maybe that's the right thing to do

If it's the right place to go, then I think we should consider
reasonable steps to take in that direction that don't cause unnecessary
breakage.

> , but I think it would inevitably
> break some things for some users.

Not all steps would be breaking changes, and a lot of those steps are
things we should do anyway. We could make it easier to write safe
SECURITY DEFINER functions, provide more tools for users to opt-out of
executing SECURITY INVOKER code, provide a way for superusers to safely
drop privileges, document the problems with security invoker and what
to do about them, etc.

> Alice might choose to write her
> triggers or default expressions in ways that rely on them running with
> Bob's permissions in any number of ways.

Sure, breakage is possible, and we should mitigate it.

But we also shouldn't exaggerate it -- for instance, others have
proposed that we run code as the table owner for logical subscriptions,
and that's going to break things in the same way. Arguably, if we are
going to break something, it's better to break it consistently rather
than one subsystem at a time.

Back to the $SUBJECT, if we allow non-superusers to run subscriptions,
and the subscription runs the code as the table owner, that might also
lead to some weird behavior for triggers that rely on SECURITY INVOKER
semantics.

Regards,
Jeff Davis






Re: Track Oldest Initialized WAL Buffer Page

2023-02-27 Thread Nathan Bossart
On Tue, Feb 07, 2023 at 07:30:00PM +0530, Bharath Rupireddy wrote:
> + /*
> +  * Try updating oldest initialized XLog buffer page.
> +  *
> +  * Update it if we are initializing an XLog buffer page for the first
> +  * time or if XLog buffers are full and we are wrapping around.
> +  */
> + if (XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage) ||
> + (!XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage) &&
> +  XLogRecPtrToBufIdx(XLogCtl->OldestInitializedPage) == nextidx))
> + {
> + Assert(XLogCtl->OldestInitializedPage < NewPageBeginPtr);
> +
> + XLogCtl->OldestInitializedPage = NewPageBeginPtr;
> + }

nitpick: I think you can simplify the conditional to

if (XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage) ||
XLogRecPtrToBufIdx(XLogCtl->OldestInitializedPage) == nextidx)

It's confusing to me that OldestInitializedPage is set to NewPageBeginPtr.
Doesn't that set it to the beginning of the newest initialized page?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: verbose mode for pg_input_error_message?

2023-02-27 Thread Nathan Bossart
On Tue, Feb 28, 2023 at 09:01:48AM +0900, Michael Paquier wrote:
> On Mon, Feb 27, 2023 at 11:25:01AM -0800, Nathan Bossart wrote:
>> I found a couple of more small changes required to make cfbot happy.
>> Otherwise, it looks good to me.
> 
> Thanks, I have confirmed the spots the CI was complaining about, so
> applied.  There was an extra place that was not right in xml_2.out as
> reported by prion, parula and snakefly because of a bad copy-paste, so
> fixed as well.

Thanks!

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: SQL JSON path enhanced numeric literals

2023-02-27 Thread Vik Fearing

On 2/27/23 20:13, Peter Eisentraut wrote:
Attached is a patch to add nondecimal integer literals and underscores 
in numeric literals to the SQL JSON path language.  This matches the 
recent additions to the core SQL syntax.  It follows ECMAScript in 
combination with the current SQL draft.


Internally, all the numeric literal parsing of jsonpath goes through 
numeric_in, which already supports all this, so this patch is just a bit 
of lexer work and some tests.


Is T840 really NO after this patch?
--
Vik Fearing





Re: Doc update for pg_stat_statements normalization

2023-02-27 Thread Michael Paquier
On Mon, Feb 27, 2023 at 10:53:26PM +, Imseih (AWS), Sami wrote:
> Wouldn't it be less in terms of memory usage to just store the required
> JumbleState fields in Query[Desc]?
> 
> clocations_count,
> highest_extern_param_id,
> clocations_locations,
> clocations_length

Yes, these would be enough to ensure a proper rebuild of the
normalized query in either the 2nd or 3rd call of pgss_store().  With
a high deallocation rate, perhaps we just don't care about bearing
the extra computation to build a normalized query more than once, though
it could still be noticeable?

Anything that gets changed is going to need some serious benchmarking
(based on deallocation rate, string length, etc.) to check that the
cost of this extra memory is worth the correctness gained when storing
the normalization.  FWIW, I am all in if it means code simplifications
with better performance and better correctness of the results.
--
Michael




Re: Non-superuser subscription owners

2023-02-27 Thread Jeff Davis
On Mon, 2023-02-27 at 14:10 -0500, Stephen Frost wrote:
> I do think there are some use-cases for it, but agree that it'd be
> better to encourage more use of SECURITY DEFINER and one approach to
> that might be to have a way for users to explicitly say "don't run code
> that isn't mine or a superuser's with my privileges." 

I tried that:

https://www.postgresql.org/message-id/75b0dbb55e9febea54c441efff8012a6d2cb5bd7.ca...@j-davis.com

but Andres pointed out some problems with my implementation. They
didn't seem easily fixable, but perhaps with more effort it could work
(run all the expressions as security definer, as well?).

>  Of course, we
> need to make sure it's possible to write safe SECURITY DEFINER functions
> and to be clear about how to do that to avoid the risk in the other
> direction.

Agreed. Perhaps we can force search_path to be set for SECURITY
DEFINER, and require that the temp schema be explicitly included rather
than the current "must be at the end". We could also provide a way to
turn public access off in the same statement, so that you don't need to
use a transaction block to keep the function private.
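
(For comparison, the pattern this would streamline is roughly what the docs
recommend today; names here are only illustrative:)

BEGIN;
CREATE FUNCTION get_balance(acct int) RETURNS numeric
    LANGUAGE sql SECURITY DEFINER
    SET search_path = admin, pg_temp  -- temp schema explicitly, and last
    AS 'SELECT balance FROM admin.accounts WHERE id = acct';
REVOKE ALL ON FUNCTION get_balance(int) FROM PUBLIC;
COMMIT;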

> I don't think we'd be able to get away with just getting rid of
> SECURITY INVOKER entirely or even in changing the current way triggers (or
> functions in views, etc) are run by default.

I didn't propose anything radical. I'm just trying to get some
agreement that SECURITY INVOKER is central to a lot of our security
woes, and that we should be treating it with skepticism on a
fundamental level.

Individual proposals for how to get away from SECURITY INVOKER should
be evaluated on their merits (i.e. don't break a bunch of stuff).

Regards,
Jeff Davis





Re: verbose mode for pg_input_error_message?

2023-02-27 Thread Michael Paquier
On Mon, Feb 27, 2023 at 11:25:01AM -0800, Nathan Bossart wrote:
> I found a couple of more small changes required to make cfbot happy.
> Otherwise, it looks good to me.

Thanks, I have confirmed the spots the CI was complaining about, so
applied.  There was an extra place that was not right in xml_2.out as
reported by prion, parula and snakefly because of a bad copy-paste, so
fixed as well.
--
Michael




Re: Provide PID data for "cannot wait on a latch owned by another process" in latch.c

2023-02-27 Thread Michael Paquier
On Mon, Feb 27, 2023 at 05:48:10PM +0900, Kyotaro Horiguchi wrote:
> +1 for adding that information, but I'm afraid that MyProcPid is not
> necessary since it is displayed in the log lines in most cases.  If you
> want to display both PIDs, I suggest making them more distinctive.

What would you suggest?  This message is basically impossible to
reach, so the wording of the patch was OK for me (see async.c); you
would need to look at the internals anyway.  Now if you'd like
something like "could not blah: owner PID=%d, MyProcPid=%d", that's
also fine by me.
--
Michael




Re: Adding argument names to aggregate functions

2023-02-27 Thread Vik Fearing

On 2/27/23 14:22, Dagfinn Ilmari Mannsåker wrote:

Hi hackers,

I'm sure I'm not the only one who can never remember which way around
the value and delimiter arguments go for string_agg() and has to look it
up in the manual every time. To make it more convenient, here's a patch
that adds proargnames to its pg_proc entries so that it can be seen with
a quick \df in psql.

I also added names to json(b)_object_agg() for good measure, even though
they're more obvious. The remaining built-in multi-argument aggregate
functions are the stats-related ones, where it's all just Y/X (but why
in that order?), so I didn't think it was necessary. If others feel more
strongly, I can add those too.


No comment on adding names for everything, but a big +1 for the ones 
included here.
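
For anyone else who forgets, the ordering the names would document is value
first, delimiter second (table and column names below are made up):

SELECT string_agg(surname, ', ' ORDER BY surname) FROM people;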

--
Vik Fearing





Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2023-02-27 Thread Andres Freund
On 2023-02-27 14:58:30 -0500, Tom Lane wrote:
> Agreed.  I'll push this along with the earlier patch if there are
> not objections.

None here.




Re: Doc update for pg_stat_statements normalization

2023-02-27 Thread Imseih (AWS), Sami
> On Sat, Feb 25, 2023 at 01:59:04PM +, Imseih (AWS), Sami wrote:
>> The overhead of storing this additional private data for the life of the query
>> execution may not be desirable.

>Okay, but why?

Additional memory to maintain the JumbleState data in other structs, and
the additional copy operations.

>This seems like the key point to me here.  If we copy more information
>into the Query structures, then we basically have no need for sticky
>entries, which could be an advantage on its own as it simplifies the
>deallocation and lookup logic.

Removing the sticky entry logic will be a big plus, and of course it
eliminates the case of a query not showing up properly normalized.

>For a DML or a SELECT, the manipulation of the hash table would still
>be a three-step process (post-analyze, planner and execution end), but
>the first step would have no need to use an exclusive lock on the hash
>table because we could just read and copy over the Query the
>normalized query if an entry exists, meaning that we could actually
>relax things a bit?  

No lock is held while normalizing, and a shared lock is held while storing,
so there is no apparent benefit from that aspect.

>This relaxation has as cost the extra memory used
>to store more data to allow the insertion to use a proper state of the
>Query[Desc] coming from the JumbleState (this extra data has no need
>to be JumbleState, just the results we generate from it aka the
>normalized query).

Wouldn't it be less in terms of memory usage to just store the required
JumbleState fields in Query[Desc]?

clocations_count,
highest_extern_param_id,
clocations_locations,
clocations_length

>> In v14, we added a dealloc metric to pg_stat_statements_info, which is helpful.
>> However, this only deals with the pgss_hash entry deallocation.
>> I think we should also add a metric for the text file garbage collection.

>This sounds like a good idea on its own.

I can create a separate patch for this.

Regards,

--
Sami Imseih
Amazon Web services



Re: Proposal: :SQL_EXEC_TIME (like :ROW_COUNT) Variable (psql)

2023-02-27 Thread Kirk Wolak
On Sun, Feb 26, 2023 at 11:45 PM Pavel Stehule 
wrote:

> On Mon, Feb 27, 2023 at 5:08 AM, Kirk Wolak  wrote:
>
>> On Fri, Feb 24, 2023 at 10:56 PM Kirk Wolak  wrote:
>>
>>> On Fri, Feb 24, 2023 at 2:11 AM Gurjeet Singh  wrote:
>>>
 On Thu, Feb 23, 2023 at 8:42 PM Kirk Wolak  wrote:

>>> ...
>>
>>> ...
>>>
>>> Okay, I've written and tested this using SQL_EXEC_ELAPSED (suggested
>> name improvement).
>> First, the instant you have ANY output, it swamps the impact. (I settled
>> on: SELECT 1 as v \gset xxx) for no output
>> Second, the variability of running even a constant script is mind-blowing.
>> Third, I've limited the output...  I built this in layers (init.sql
>> initializes the psql variables I use), run_100.sql runs
>> another file (\i tst_2000.sql) 100 times.  Resulting in 200k selects.
>>
>
> This is the very worst case.
>
> But nobody will run 200K selects from psql - can you try a little bit more
> realistic but still synthetic test case?
>
> create table foo(a int);
> begin
>   insert into foo values(1);
>  ...
>   insert into foo values(20);
> commit;
>

*Without timing:*
postgres=# \i ins.sql
Elapsed Time: 29.518647 (seconds)
postgres=# \i ins.sql
Elapsed Time: 24.973943 (seconds)
postgres=# \i ins.sql
Elapsed Time: 21.916432 (seconds)
postgres=# \i ins.sql
Elapsed Time: 25.440978 (seconds)
postgres=# \i ins.sql
Elapsed Time: 24.848986 (seconds)

-- Because that was slower than expected, I exited, and tried again...
Getting really different results
postgres=# \i ins.sql
Elapsed Time: 17.763167 (seconds)
postgres=# \i ins.sql
Elapsed Time: 19.210436 (seconds)
postgres=# \i ins.sql
Elapsed Time: 19.903553 (seconds)
postgres=# \i ins.sql
Elapsed Time: 21.687750 (seconds)
postgres=# \i ins.sql
Elapsed Time: 19.046642 (seconds)



*With timing:*
\i ins.sql
Elapsed Time: 20.479442 (seconds)
postgres=# \i ins.sql
Elapsed Time: 21.493303 (seconds)
postgres=# \i ins.sql
Elapsed Time: 22.732409 (seconds)
postgres=# \i ins.sql
Elapsed Time: 20.246637 (seconds)
postgres=# \i ins.sql
Elapsed Time: 20.493607 (seconds)

Again, it's really hard to measure the difference, as the impact is a bit
below the variance.
In this case, I could see about a 1s - 2s (max) difference in total time
for 200k statements.
Run 5 times (for 1 million).

It's a little worse than noise.  But if I used the first run, the timing
version would have seemed faster.

I think this is sufficiently fast, and the patch simplifies the code.  We
end up only checking "if (timing)"
in the few places that we print the timing...

Anything else to provide?
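
For anyone who hasn't followed the earlier messages, the end-user behaviour
being measured is just a psql variable that is set after each statement, so
something like this (variable name as used in this thread, behaviour assumed
to be analogous to :ROW_COUNT):

    SELECT count(*) FROM generate_series(1, 100000);
    \echo :SQL_EXEC_ELAPSED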


>
> Regards
>
> Pavel
>
>
>>
>> Executive Summary:  1,000,000 statements executed, consumes ~2 - 2.5
>> seconds of extra time (Total)
>>
>> So, the per statement cost is: 2.5s / 1,000,000 = 0.000,0025 s per
>> statement
>> Roughly: 2.5us
>>
>> Unfortunately, my test lines look like this:
>> Without Timing
>> done 0.198215 (500) *total *98.862548 *min* 0.167614 *avg*
>> 0.197725096000 *max *0.290659
>>
>> With Timing
>> done 0.191583 (500) *total* 100.729868 *min *0.163280 *avg 
>> *0.201459736000
>> *max *0.275787
>>
>> Notice that the With Timing had a lower min, and a lower max.  But a
>> higher average.
>> The distance between min - avg  AND min - max, is big (those are for
>> 1,000 selects each)
>>
>> Are these numbers at the "So What" Level?
>>
>> While testing, I got the distinct impression that I am measuring
>> something that changes, or that the
>> variance in the system itself really swamps this on a per statement
>> basis.  It's only impact is felt
>> on millions of PSQL queries, and that's a couple of seconds...
>>
>> Curious what others think before I take this any further.
>>
>> regards, Kirk
>>
>>>
>>> Thanks!
>>>

 [1]:
 https://github.com/gurjeet/home/blob/08f1051fb854f4fc8fbc4f1326f393ed507a55ce/.bashrc#L278
 [2]:
 https://github.com/gurjeet/home/blob/08f1051fb854f4fc8fbc4f1326f393ed507a55ce/.bashrc#L262

 Best regards,
 Gurjeet
 http://Gurje.et

>>>


Re: WIN32 pg_import_system_collations

2023-02-27 Thread Juan José Santamaría Flecha
On Mon, Feb 27, 2023, 23:05, Juan José Santamaría Flecha <
juanjo.santama...@gmail.com> wrote:

>
> The command that's failing is "SET lc_time TO 'de_DE';", and that area of
> code is untouched by this patch. As mentioned in [1], the problem seems to
> come from a Windows bug that the CI images and my development machines have
> patched out.
>

What I wanted to post as [1]:

https://www.postgresql.org/message-id/CAC%2BAXB1agvrgpyHEfqbDr2MOpcON3d%2BWYte_SLzn1E4TamLs9g%40mail.gmail.com


> Regards,
>
> Juan José Santamaría Flecha
>


Re: WIN32 pg_import_system_collations

2023-02-27 Thread Juan José Santamaría Flecha
On Mon, Feb 27, 2023 at 1:10 PM Andrew Dunstan  wrote:

> On 2023-02-26 Su 16:02, Andrew Dunstan wrote:
>
> Now that I have removed the barrier to testing this in the buildfarm, and
> added an appropriate locale setting to drongo, we can see that this test
> fails like this:
>
>
> diff -w -U3 
> c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/expected/collate.windows.win1252.out
>  
> c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/results/collate.windows.win1252.out
> --- 
> c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/expected/collate.windows.win1252.out
> 2023-01-23 04:39:06.755149600 +
> +++ 
> c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/results/collate.windows.win1252.out
>  2023-02-26 17:32:54.115515200 +
> @@ -363,16 +363,17 @@
>
>  -- to_char
>  SET lc_time TO 'de_DE';
> +ERROR:  invalid value for parameter "lc_time": "de_DE"
>  SELECT to_char(date '2010-03-01', 'DD TMMON YYYY');
> to_char
>  -
> - 01 MRZ 2010
> + 01 MAR 2010
>  (1 row)
>
>  SELECT to_char(date '2010-03-01', 'DD TMMON YYYY' COLLATE "de_DE");
> to_char
>  -
> - 01 MRZ 2010
> + 01 MAR 2010
>  (1 row)
>
>  -- to_date
>
>
> The last of these is especially an issue, as it doesn't even throw an
> error.
>
> See
> 
> 
>
>
> Further investigation shows that if we change the two instances of "de_DE"
> to "de-DE" the tests behave as expected, so it appears that while POSIX
> style aliases have been created for the BCP 47 style locales, using the
> POSIX aliases doesn't in fact work. I can't see anything that turns the
> POSIX locale name back into BCP 47 at the point of use, which seems to be
> what's needed.
>

The command that's failing is "SET lc_time TO 'de_DE';", and that area of
code is untouched by this patch. As mentioned in [1], the problem seems to
come from a Windows bug that the CI images and my development machines have
patched out.

I think we should change the locale name to make the test more robust, as
the attached. But I don't see a problem with making an alias for the
collations.
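
For reference, the change in the attached is essentially to use the BCP 47
spelling, along these lines (a sketch only; the exact test text is in the
patch):

    SET lc_time TO 'de-DE';
    SELECT to_char(date '2010-03-01', 'DD TMMON YYYY');
    -- expected: 01 MRZ 2010 when the de-DE locale is in effect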

Regards,

Juan José Santamaría Flecha


0001-change-locale-name-for-test-collate.windows.win1252.patch
Description: Binary data


Re: refactoring relation extension and BufferAlloc(), faster COPY

2023-02-27 Thread Andres Freund
Hi,

On 2023-02-27 18:06:22 +0200, Heikki Linnakangas wrote:
> We also do this in freespace.c and visibilitymap.c:
> 
> /* Extend as needed. */
> while (fsm_nblocks_now < fsm_nblocks)
> {
> PageSetChecksumInplace((Page) pg.data, fsm_nblocks_now);
> 
> smgrextend(reln, FSM_FORKNUM, fsm_nblocks_now,
>pg.data, false);
> fsm_nblocks_now++;
>  }
> 
> We could use the new smgrzeroextend function here. But it would be better to
> go through the buffer cache, because after this, the last block, at
> 'fsm_nblocks', will be read with ReadBuffer() and modified.

I doubt it's a particularly crucial thing to optimize.

But, uh, isn't this code racy? Because this doesn't go through shared buffers,
there's no IO_IN_PROGRESS interlocking against a concurrent reader. We know
that writing pages isn't atomic vs readers. So another connection could
see the new relation size, but a read might return a partially written
state of the page. Which then would cause checksum failures. And even
worse, I think it could lead to losing a write, if the concurrent
connection writes out a page.


> We could use BulkExtendSharedRelationBuffered() to extend the relation and
> keep the last page locked, but the BulkExtendSharedRelationBuffered()
> signature doesn't allow that. It can return the first N pages locked, but
> there's no way to return the *last* page locked.

We can't rely on bulk extending a, potentially, large number of pages in one
go anyway (since we might not be allowed to pin that many pages). So I don't
think requiring locking the last page is a really viable API.

I think for this case I'd just use the ExtendRelationTo() API we were
discussing nearby. Compared to the cost of reducing syscalls / filesystem
overhead to extend the relation, the cost of the buffer mapping lookup doesn't
seem significant. That's different in e.g. the hio.c case, because there we
need a buffer with free space, and concurrent activity could otherwise fill up
the buffer before we can lock it again.


I had started hacking on ExtendRelationTo(); that's when I saw problems with the
existing code that made me hesitate:
https://postgr.es/m/20230223010147.32oir7sb66slqnjk%40awork3.anarazel.de


> Perhaps we should decompose this function into several function calls.
> Something like:
> 
> /* get N victim buffers, pinned and !BM_VALID */
> buffers = BeginExtendRelation(int npages);
> 
> LockRelationForExtension(rel)
> 
> /* Insert buffers into buffer table */
> first_blk = smgrnblocks()
> for (blk = first_blk; blk < last_blk; blk++)
> MapNewBuffer(blk, buffers[i])
> 
> /* extend the file on disk */
> smgrzeroextend();
> 
> UnlockRelationForExtension(rel)
> 
> for (blk = first_blk; blk < last_blk; blk++)
> {
> memset(BufferGetPage(buffers[i]), 0, BLCKSZ);
> FinishNewBuffer(buffers[i]);
> /* optionally lock the buffer */
> LockBuffer(buffers[i]);
> }
> 
> That's a lot more verbose, of course, but gives the callers the flexibility.
> And might even be more readable than one function call with lots of
> arguments.

To me this seems like a quite bad idea. The amount of complexity this would
expose all over the tree is substantial. Which would also make it harder to
further improve relation extension at a later date. It certainly shouldn't be
the default interface. And I'm not sure I see a promising use case.


> This would expose the concept of a buffer that's mapped but marked as
> IO-in-progress outside bufmgr.c. On one hand, maybe that's exposing details
> that shouldn't be exposed. On the other hand, it might come handy. Instead
> of RBM_ZERO_AND_LOCK mode, for example, it might be handy to have a function
> that returns an IO-in-progress buffer that you can initialize any way you
> want.

I'd much rather encapsulate that in additional functions, or perhaps a
callback that can make decisions about what to do.

Greetings,

Andres Freund




Re: PGDOCS - sgml linkend using single-quotes

2023-02-27 Thread Peter Smith
On Mon, Feb 27, 2023 at 7:04 PM Heikki Linnakangas  wrote:
>
...
>
> There were also a few "id" attributes using single-quotes. Fixed those
> too, and pushed. Thanks!
>

Thank you for pushing.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-02-27 Thread Andres Freund
Hi,

On 2023-02-22 16:32:53 -0500, Melanie Plageman wrote:
> I was wondering about the status of the autovacuum wraparound failsafe
> test suggested in [1]. I don't see it registered for the March's
> commitfest. I'll probably review it since it will be useful for this
> patchset.

It's pretty hard to make it work reliably. I was suggesting somewhere that we
ought to add an EMERGENCY parameter to manual VACUUMs to allow testing that
path a tad more easily.


> The first patch in the set is to free the BufferAccessStrategy object
> that is made in do_autovacuum() -- I don't see when the memory context
> it is allocated in is destroyed, so it seems like it might be a leak?

The backend shuts down just after, so that's not a real issue. Not that it'd
hurt to fix it.


> > I can see that making sense, particularly if we were to later extend this to
> > other users of ringbuffers. E.g. COPY's use of the ringbuffer makes loading of
> > data > 16MB but also << s_b vastly slower, but it can still be very important
> > to use if there's lots of parallel processes loading data.
> >
> > Maybe BUFFER_USAGE_LIMIT, with a value from -1 to N, with -1 indicating the
> > default value, 0 preventing use of a buffer access strategy, and 1..N
> > indicating the size in blocks?

> I have found the implementation you suggested very hard to use.
> The attached fourth patch in the set implements it the way you suggest.
> I had to figure out what number to set the BUFFER_USAGE_LIMIT to -- and,
> since I don't specify shared buffers in units of nbuffer, it's pretty
> annoying to have to figure out a valid number.

I think we should be able to parse it in a similar way to how we parse
shared_buffers. You could even implement this as a GUC that is then set by
VACUUM (similar to how VACUUM FREEZE is implemented).
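
A sketch of what that could look like from the user's side (both the GUC name
and the option spelling here are hypothetical, just to illustrate the idea;
"mytable" is a placeholder):

    -- a GUC parsed with memory units, analogous to shared_buffers,
    -- consulted by VACUUM when it sets up its access strategy
    SET vacuum_buffer_usage_limit = '256MB';
    VACUUM mytable;

    -- or spelled as a per-command option, the way FREEZE is
    VACUUM (BUFFER_USAGE_LIMIT '256MB') mytable;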


> I think that it would be better to have it be either a percentage of
> shared buffers or a size in units of bytes/kb/mb like that of shared
> buffers.

I don't think a percentage of shared buffers works particularly well - you
very quickly run into the ringbuffer becoming impractically big.


> > Would we want to set an upper limit lower than implied by the memory limit
> > for
> > the BufferAccessStrategy allocation?
> >
> >
> So, I was wondering what you thought about NBuffers / 8 (the current
> limit). Does it make sense?

That seems *way* too big. Imagine how large allocations we'd end up with a
shared_buffers size of a few TB.

I'd probably make it a hard error at 1GB and a silent cap at NBuffers / 2 or
such.


> @@ -547,7 +547,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool 
> heapkeyspace,
>   state->targetcontext = AllocSetContextCreate(CurrentMemoryContext,
>   
>  "amcheck context",
>   
>  ALLOCSET_DEFAULT_SIZES);
> - state->checkstrategy = GetAccessStrategy(BAS_BULKREAD);
> + state->checkstrategy = GetAccessStrategy(BAS_BULKREAD, -1);
>  
>   /* Get true root block from meta-page */
>   metapage = palloc_btree_page(state, BTREE_METAPAGE);

Changing this everywhere seems pretty annoying, particularly because I suspect
a bunch of extensions also use GetAccessStrategy(). How about a
GetAccessStrategyExt(), GetAccessStrategyCustomSize() or such?


>  BufferAccessStrategy
> -GetAccessStrategy(BufferAccessStrategyType btype)
> +GetAccessStrategy(BufferAccessStrategyType btype, int buffers)
>  {
>   BufferAccessStrategy strategy;
>   int ring_size;
> + const char *strategy_name = btype_get_name(btype);

Shouldn't be executed when we don't need it.


> + if (btype != BAS_VACUUM)
> + {
> + if (buffers == 0)
> + elog(ERROR, "Use of shared buffers unsupported for 
> buffer access strategy: %s. nbuffers must be -1.",
> + strategy_name);
> +
> + if (buffers > 0)
> + elog(ERROR, "Specification of ring size in buffers 
> unsupported for buffer access strategy: %s. nbuffers must be -1.",
> + strategy_name);
> + }
> +
> + // TODO: DEBUG logging message for dev?
> + if (buffers == 0)
> + btype = BAS_NORMAL;

GetAccessStrategy() often can be executed hundreds of thousands of times a
second, so I'm very sceptical that adding log messages to it is useful.

Greetings,

Andres Freund




Re: Non-superuser subscription owners

2023-02-27 Thread Robert Haas
On Mon, Feb 27, 2023 at 1:25 PM Jeff Davis  wrote:
> I think you are saying that we should still run Alice's code with the
> privileges of Bob, but somehow make that safe(r) for Bob. Is that
> right?

Yeah. That's the idea I was floating, at least.

> That sounds hard, and I'm still stuck at the "why" question. Why do we
> want to run Alice's code with Bob's permissions?
>
> The answers I have so far are abstract. For instance, maybe it's a
> clever SRF that takes table names as inputs and you want people to only
> be able to use the clever SRF with tables they have privileges on. But
> that's not what most functions do, and it's certainly not what most
> default expressions do.

I guess I have a pretty hard time imagining that we can just
obliterate SECURITY INVOKER entirely. It seems fundamentally
reasonable to me that Alice might want to make some code available to
be executed in the form of a function or procedure but without
offering to execute it with her own privileges. But I think maybe
you're asking a different question, which is whether when the code is
attached to a table we ought to categorically switch to the table
owner before executing it. I'm less sure about the answer to that
question. We already take the position that VACUUM always runs as the
table owner, and while VACUUM runs index expressions but not for
example triggers, why not just be consistent and run all code that is
tied to the table as the table owner, all the time?

Maybe that's the right thing to do, but I think it would inevitably
break some things for some users. Alice might choose to write her
triggers or default expressions in ways that rely on them running with
Bob's permissions in any number of ways. For instance, maybe those
functions issue a SELECT query against an RLS-enabled table, such that
the answer depends on whose privileges are used to run the query. More
simply, she might refer to CURRENT_ROLE, say to record who inserted
any particular row into her table, which seems like a totally
reasonable thing to want to do. If she was feeling really clever, she
might even have designed queries that she's using inside those
triggers or default expressions to fail if Bob doesn't have enough
permissions to do some particular modification that he's attempting,
and thus block certain kinds of access to her own tables. That would
be pretty weird and perhaps too clever by half, but the point is that
the current behavior is probably known to many, many users and we
really can't know what they've done that depends on that. If we change
any behavior here, some people are going to notice those changes, and
they may not like them.
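
To make that concrete, a trivial example of the sort of dependency I mean
(illustration only):

    CREATE TABLE alice_audit (
        payload     text,
        inserted_by text NOT NULL DEFAULT current_role
    );

If default expressions were instead run as the table owner, inserted_by would
record Alice rather than whoever actually performed the INSERT, which is
presumably not what the table owner intended.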

To put that another way, we're not talking about a black-and-white
security vulnerability here, like a buffer overrun that allows for
arbitrary code execution. We're talking about a set of semantics that
seem to be somewhat fragile and vulnerable to spawning security
problems. Nobody wants those security problems, for sure. But it
doesn't follow that nobody is relying on the semantics.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-27 Thread Tom Lane
Andres Freund  writes:
> On 2023-02-27 12:42:00 -0500, Tom Lane wrote:
>> I went ahead and coded it that way, and it doesn't look too awful.
>> Any objections?

> Looks good to me.

> I think it'd be an indication of a bug around the invalidation handling if the
> terminations were required. So even leaving other things aside, I prefer this
> version.

Sounds good.  I'll work on getting this back-patched.

regards, tom lane




Re: Auth extensions, with an LDAP/SCRAM example [was: Proposal: Support custom authentication methods using hooks]

2023-02-27 Thread Stephen Frost
Greetings,

* Jacob Champion (jchamp...@timescale.com) wrote:
> This patchset should ideally have required zero client side changes, but
> because our SCRAM implementation is slightly nonstandard too -- it
> doesn't embed the username into the SCRAM data -- libpq can't talk to
> the OpenLDAP/Cyrus SASL implementation. I included a quick-and-bad patch
> to fix it in libpq; that needs its own conversation.

Indeed it does... as there were specific reasons that what we
implemented for PG's SCRAM doesn't embed the username into the SCRAM
data.  My recollection is that we don't because SCRAM (or maybe SASL?
either way though...) requires it to be UTF-8 encoded, and we support a
lot of other encodings that aren't that, so it wouldn't work for a lot
of our users.

Not really seeing that as being something we get to be picky about or
decide to change our mind on.  It's unfortunate that the standard seems
to be short-sighted in this way but that's the reality of it.

> I think this is exactly the sort of thing that's too niche to be in core
> but might be extremely useful for the few people it applies to, so I'm
> proposing it as an argument in favor of general authn/z extensibility.

If it was able to actually work for our users (and maybe it is if we
make it optional somehow) and we have users who want it (which certainly
seems plausible) then I'd say that it's something we would benefit from
having in core.  While it wouldn't solve all the issues with 'ldap'
auth, if it works with AD's LDAP servers and doesn't require the
password to be passed from the client to the server (in any of this, be
it the client psql, pgAdmin, or the PG server when it goes to talk to the
LDAP server...) then that would be a fantastic thing and we could
replace the existing 'ldap' auth with that and be *much* better off for
it, and so would our users.

Thanks,

Stephen




Re: [PoC] Federated Authn/z with OAUTHBEARER

2023-02-27 Thread Stephen Frost
Greetings,

* Andrey Chudnovsky (achudnovs...@gmail.com) wrote:
> > This really doesn't feel like a great area to try and do hooks or
> > similar in, not the least because that approach has been tried and tried
> > again (PAM, GSSAPI, SASL would all be examples..) and frankly none of
> > them has turned out great (which is why we can't just tell people "well,
> > install the pam_oauth2 and watch everything work!") and this strikes me
> > as trying to do that yet again but worse as it's not even a dedicated
> > project trying to solve the problem but more like a side project.
> 
> In this case it's not intended to be an open-ended hook, but rather an
> implementation of a specific rfc (rfc-7628) which defines a
> client-server communication for the authentication flow.
> The rfc itself does leave a lot of flexibility on specific parts of
> the implementation. Which do require hooks:

Color me skeptical on an RFC that requires hooks.

> (1.) Server side hook to validate the token, which is specific to the
> OAUTH provider.
> (2.) Client side hook to request the client to obtain the token.

Perhaps I'm missing it... but weren't these handled with what the
original patch that Jacob had was doing?

> On (1.), we would need a hook for the OAUTH provider extension to do
> validation. We can though do some basic check that the credential is
> indeed a JWT token signed by the requested issuer.
> 
> Specifically (2.) is where we can provide a layer in libpq to simplify
> the integration. i.e. implement some OAUTH flows.
> Though we would need some flexibility for the clients to bring their own 
> token:
> For example there are cases where the credential to obtain the token
> is stored in a separate secure location and the token is returned from
> a separate service or pushed from a more secure environment.

In those cases... we could, if we wanted, simply implement the code to
actually pull the token, no?  We don't *have* to have a hook here for
this, we could just make it work.

> > another new "generic" set of hooks/APIs that will just cause DBAs and
> > our users headaches trying to make work.
> As I mentioned above, it's an rfc implementation, rather than our invention.

While I only took a quick look, I didn't see anything in that RFC that
explicitly says that hooks or a plugin or a library or such is required
to meet the RFC.  Sure, there are places which say that the
implementation is specific to a particular server or client but that's
not the same thing.

> When it comes to DBAs and the users.
> Builtin libpq implementations which allows psql and pgadmin to
> seamlessly connect should suffice those needs.
> While extensibility would allow the ecosystem to be open for OAUTH
> providers, SAAS developers, PAAS providers and other institutional
> players.

Each to end up writing their own code to do largely the same thing
without the benefit of the larger community to be able to review and
ensure that it's done properly?

That doesn't sound like a great approach to me.

Thanks,

Stephen




Re: Move defaults toward ICU in 16?

2023-02-27 Thread Tom Lane
Jeff Davis  writes:
> On Fri, 2023-02-24 at 15:54 -0800, Jeff Davis wrote:
>> 0001: default autoconf to build with ICU (meson already uses
>> 'auto')

> What's the best way to prepare for the impact of this on the buildfarm?
> How should we migrate to using --without-icu for those animals not
> currently specifying --with-icu?

Tell the buildfarm owners to add --without-icu to their config if
they don't have and don't want to install ICU.  Wait a couple weeks.
Commit, then nag the owners whose machines turn red.

regards, tom lane




Re: psql: Add role's membership options to the \du+ command

2023-02-27 Thread Pavel Luzanov

On 22.02.2023 00:34, David G. Johnston wrote:
This is the format I've gone for (more-or-less) in my RoleGraph view 
(I'll be sharing it publicly in the near future).


bob from grantor (a, s, i) \n
adam from postgres (a, s, i) \n
emily from postgres (empty)


I think this is a good compromise.

Based upon prior comments going for something like the following is 
undesirable: bob=asi/grantor


Agreed. Membership options are not an ACL (although they have
similarities). Therefore, showing them as an ACL-like column would be
confusing.
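
For example, with memberships granted like this (syntax per the current
development branch, if I have it right):

    GRANT alice TO bob WITH ADMIN OPTION;
    GRANT carol TO bob WITH SET FALSE;

the new column would list each membership on its own line in the spirit of the
format quoted above, i.e. "granted_role from grantor (a/i/s flags)"; the exact
flags shown naturally depend on the options recorded in pg_auth_members.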


So, please find attached the second version of the patch. It implements the
suggested display format and includes a small refactoring of the existing
code for the \du command.

As a non-native writer, I have doubts about the documentation part.

--
Pavel Luzanov
Postgres Professional:https://postgrespro.com
From 70489a605687a325287bad109e2741dd7d08cea3 Mon Sep 17 00:00:00 2001
From: Pavel Luzanov 
Date: Mon, 27 Feb 2023 22:35:29 +0300
Subject: [PATCH v2] psql: \du shows membership options

---
 doc/src/sgml/ref/psql-ref.sgml | 12 
 src/bin/psql/describe.c| 45 +++---
 src/test/regress/expected/psql.out | 45 ++
 src/test/regress/sql/psql.sql  | 25 +
 4 files changed, 111 insertions(+), 16 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index dc6528dc11..c94a2287f0 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1724,6 +1724,12 @@ INSERT INTO tbl1 VALUES ($1, $2) \bind 'first value' 'second value' \g
 S modifier to include system roles.
 If pattern is specified,
 only those roles whose names match the pattern are listed.
+For each membership in the role, the membership options and
+the role that granted the membership are displayed.
+One-letter abbreviations are used for membership options:
+a — admin option, i — inherit option,
+s — set option; empty if none of them is set.
+See GRANT command for their meaning.
 If the form \dg+ is used, additional information
 is shown about each role; currently this adds the comment for each
 role.
@@ -1966,6 +1972,12 @@ INSERT INTO tbl1 VALUES ($1, $2) \bind 'first value' 'second value' \g
 S modifier to include system roles.
 If pattern is specified,
 only those roles whose names match the pattern are listed.
+For each membership in the role, the membership options and
+the role that granted the membership are displayed.
+One-letter abbreviations are used for membership options:
+a — admin option, i — inherit option,
+s — set option; empty if none of them is set.
+See GRANT command for their meaning.
 If the form \du+ is used, additional information
 is shown about each role; currently this adds the comment for each
 role.
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index c8a0bb7b3a..27a8680ddf 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -3623,22 +3623,36 @@ describeRoles(const char *pattern, bool verbose, bool showSystem)
 	  "SELECT r.rolname, r.rolsuper, r.rolinherit,\n"
 	  "  r.rolcreaterole, r.rolcreatedb, r.rolcanlogin,\n"
 	  "  r.rolconnlimit, r.rolvaliduntil,\n"
-	  "  ARRAY(SELECT b.rolname\n"
-	  "FROM pg_catalog.pg_auth_members m\n"
-	  "JOIN pg_catalog.pg_roles b ON (m.roleid = b.oid)\n"
-	  "WHERE m.member = r.oid) as memberof");
+	  "  r.rolreplication, r.rolbypassrls,\n");
+
+	if (pset.sversion >= 16)
+		appendPQExpBufferStr(&buf,
+			 "  (SELECT pg_catalog.string_agg(\n"
+			 "pg_catalog.format('%I from %I (%s)',\n"
+			 "  b.rolname, m.grantor::pg_catalog.regrole::pg_catalog.text,\n"
+			 "  pg_catalog.regexp_replace(\n"
+			 "pg_catalog.concat_ws(', ',\n"
+			 "  CASE WHEN m.admin_option THEN 'a' END,\n"
+			 "  CASE WHEN m.inherit_option THEN 'i' END,\n"
+			 "  CASE WHEN m.set_option THEN 's' END),\n"
+			 "  '^$', 'empty')\n"
+			 "), E'\\n'\n"
+			 "ORDER BY b.rolname, m.grantor::pg_catalog.regrole::pg_catalog.text)\n"
+			 "  FROM pg_catalog.pg_auth_members m\n"
+			 "   JOIN pg_catalog.pg_roles b ON (m.roleid = b.oid)\n"
+			 "  WHERE m.member = r.oid) as memberof");
+	else
+		appendPQExpBufferStr(&buf,
+			 "  ARRAY(SELECT b.rolname\n"
+			 "FROM pg_catalog.pg_auth_members m\n"
+			 "JOIN pg_catalog.pg_roles b ON (m.roleid = b.oid)\n"
+			 "WHERE m.member = r.oid) as memberof");
 
 	if (verbose)
 	{
 		appendPQExpBufferStr(&buf, "\n, pg_catalog.shobj_description(r.oid, 'pg_authid') AS description");
 		ncols++;
 	}
-	appendPQExpBufferStr(&buf, "\n, r.rolreplication");
-
-	if (pset.sversion >= 90500)
-	{
-		appendPQExpBuffe

Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2023-02-27 Thread Tom Lane
Melanie Plageman  writes:
> On Mon, Feb 27, 2023 at 10:30 AM Tom Lane  wrote:
>> The risk of needing to cast when using the "int" loop variable
>> as an enum is obviously the downside of that approach, but we have
>> not seen any indication that any compilers actually do warn.
>> It's interesting that you did see such a warning ... I wonder which
>> compiler you were using at the time?

> so, pretty much any version of clang I tried with
> -Wsign-conversion produces a warning.

> :35:32: warning: implicit conversion changes signedness: 'int'
> to 'IOOp' (aka 'enum IOOp') [-Wsign-conversion]

Oh, interesting --- so it's not about the implicit conversion to enum
but just about signedness.  I bet we could silence that by making the
loop variables be "unsigned int".  I doubt it's worth any extra keystrokes
though, because we are not at all clean about sign-conversion warnings.
I tried enabling -Wsign-conversion on Apple's clang 14.0.0 just now,
and counted 13462 such warnings just in the core build :-(.  I don't
foresee anybody trying to clean that up.

> I didn't do the casts in the attached patch since they aren't done elsewhere.

Agreed.  I'll push this along with the earlier patch if there are
not objections.

regards, tom lane




Re: Allow logical replication to copy tables in binary format

2023-02-27 Thread Melih Mutlu
Hi,

Thanks for all of your reviews!

So, I made some changes in the v10 according to your comments.

1- copy_format is changed to a boolean parameter copy_binary.
It sounds easier to use a boolean to enable/disable binary copy. Default
value is false, so nothing changes in the current behaviour if copy_binary
is not specified.
It's still persisted into the catalog. This is needed since its value will
be needed by tablesync workers later. And tablesync workers fetch
subscription configurations from the catalog.
In the copy_data case, it is not directly stored anywhere, but it affects
the state of the table, which is stored in the catalog. So, I guess this is
the convenient way to go if we decide to add a new parameter (a usage sketch
follows below).

2- copy_binary is not tied to another parameter
The patch does not disallow any combination of copy_binary with binary and
copy_data options. I tried to explain what binary copy can and cannot do in
the docs. Rest is up to the user now.

3- Removed version check for copy_binary
HEAD already does not have any check for binary option. Making binary copy
work only if publisher and subscriber are the same major version can be too
restricted.

4- Added separate test file
Although I believe 002_types.pl and 014_binary.pl can be improved more for
binary enabled subscription cases, this patch might not be the best place
to make those changes.
032_binary_copy.pl now has the tests for binary copy option. There are also
some regression tests in subscription.sql.

Finally, some other small fixes are done according to the reviews.
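
Regarding (1), a minimal usage sketch of the new parameter (connection string,
subscription and publication names are just examples; see the patch and docs
for the exact wording):

    CREATE SUBSCRIPTION mysub
        CONNECTION 'host=publisher dbname=postgres'
        PUBLICATION mypub
        WITH (binary = true, copy_binary = true);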

Also, thanks to Bharath for the performance testing [1]. It can be seen that the
patch has some benefits.

I appreciate further thoughts/reviews.


[1]
https://www.postgresql.org/message-id/CALj2ACUfE08ZNjKK-nK9JiwGhwUMRLM%2BqRhNKTVM9HipFk7Fow%40mail.gmail.com


Thanks,
-- 
Melih Mutlu
Microsoft


v10-0001-Allow-logical-replication-to-copy-table-in-binar.patch
Description: Binary data


Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-27 Thread Andres Freund
Hi,

On 2023-02-27 12:42:00 -0500, Tom Lane wrote:
> I wrote:
> > Hah - I thought of a solution.  We can avoid this race condition if
> > we make the remote session itself inspect pg_stat_activity and
> > return its displayed application_name.  Just need a foreign table
> > that maps onto pg_stat_activity.

Sounds reasonable. I guess you could also do it with a function that is
allowed to be pushed down. But given that you already solved it this way...

I think it's worth having an example for checks like this in the postgres_fdw
tests, even if it's perhaps not worth it for the application_name GUC on its
own. We saw that the GUC test copied the debug_discard_caches use of another
test...


> I went ahead and coded it that way, and it doesn't look too awful.
> Any objections?

Looks good to me.

I think it'd be an indication of a bug around the invalidation handling if the
terminations were required. So even leaving other things aside, I prefer this
version.

Greetings,

Andres Freund




Re: Move defaults toward ICU in 16?

2023-02-27 Thread Jeff Davis
On Fri, 2023-02-24 at 15:54 -0800, Jeff Davis wrote:
>   0001: default autoconf to build with ICU (meson already uses
> 'auto')

What's the best way to prepare for the impact of this on the buildfarm?
How should we migrate to using --without-icu for those animals not
currently specifying --with-icu?

Regards,
Jeff Davis





SQL JSON path enhanced numeric literals

2023-02-27 Thread Peter Eisentraut
Attached is a patch to add nondecimal integer literals and underscores 
in numeric literals to the SQL JSON path language.  This matches the 
recent additions to the core SQL syntax.  It follows ECMAScript in 
combination with the current SQL draft.
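
For example, with the patch applied, literals like these are accepted
(roughly the kind of thing the added regression tests exercise):

    SELECT '0x2A'::jsonpath;      -- hexadecimal integer
    SELECT '0o52'::jsonpath;      -- octal integer
    SELECT '0b101010'::jsonpath;  -- binary integer
    SELECT '1_000_000'::jsonpath; -- underscores in numeric literals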


Internally, all the numeric literal parsing of jsonpath goes through 
numeric_in, which already supports all this, so this patch is just a bit 
of lexer work and some tests.

From abeefa990231dea398ddd923d9e992e0ad945159 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 27 Feb 2023 19:27:32 +0100
Subject: [PATCH v1] SQL JSON path enhanced numeric literals

Add support for non-decimal integer literals and underscores in
numeric literals to SQL JSON path language.  This follows the rules of
ECMAScript, as referred to by the SQL standard.
---
 src/backend/catalog/sql_features.txt   |   1 +
 src/backend/utils/adt/jsonpath_scan.l  |  59 ++---
 src/test/regress/expected/jsonpath.out | 162 +
 src/test/regress/sql/jsonpath.sql  |  50 
 4 files changed, 258 insertions(+), 14 deletions(-)

diff --git a/src/backend/catalog/sql_features.txt 
b/src/backend/catalog/sql_features.txt
index 75a09f14e0..a8300ad694 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -553,6 +553,7 @@ T836	SQL/JSON path language: starts with predicate	YES
 T837	SQL/JSON path language: regex_like predicate	YES
 T838	JSON_TABLE: PLAN DEFAULT clause	NO
 T839	Formatted cast of datetimes to/from character strings	NO
+T840	Hex integer literals in SQL/JSON path language	NO	SQL:202x draft
 M001   Datalinks   NO  
 M002   Datalinks via SQL/CLI   NO  
 M003   Datalinks via Embedded SQL  NO  
diff --git a/src/backend/utils/adt/jsonpath_scan.l 
b/src/backend/utils/adt/jsonpath_scan.l
index e08b1c7cd7..cd550ad5d8 100644
--- a/src/backend/utils/adt/jsonpath_scan.l
+++ b/src/backend/utils/adt/jsonpath_scan.l
@@ -90,21 +90,31 @@ blank   [ \t\n\r\f]
 /* "other" means anything that's not special, blank, or '\' or '"' */
 other  [^\?\%\$\.\[\]\{\}\(\)\|\&\!\=\<\>\@\#\,\*:\-\+\/\\\" \t\n\r\f]
 
-digit  [0-9]
-integer(0|[1-9]{digit}*)
-decimal({integer}\.{digit}*|\.{digit}+)
-real   ({integer}|{decimal})[Ee][-+]?{digit}+
-realfail   ({integer}|{decimal})[Ee][-+]
-
-integer_junk   {integer}{other}
+decdigit   [0-9]
+hexdigit   [0-9A-Fa-f]
+octdigit   [0-7]
+bindigit   [0-1]
+
+/* DecimalInteger in ECMAScript; must not start with 0 unless it's exactly 0 */
+decinteger (0|[1-9](_?{decdigit})*)
+/* DecimalDigits in ECMAScript; only used as part of other rules */
+decdigits  {decdigit}(_?{decdigit})*
+hexinteger 0[xX]{hexdigit}(_?{hexdigit})*
+octinteger 0[oO]{octdigit}(_?{octdigit})*
+bininteger 0[bB]{bindigit}(_?{bindigit})*
+
+decimal({decinteger}\.{decdigits}?|\.{decdigits})
+real   ({decinteger}|{decimal})[Ee][-+]?{decdigits}
+realfail   ({decinteger}|{decimal})[Ee][-+]
+
+decinteger_junk{decinteger}{other}
 decimal_junk   {decimal}{other}
 real_junk  {real}{other}
 
-hex_dig[0-9A-Fa-f]
-unicode\\u({hex_dig}{4}|\{{hex_dig}{1,6}\})
-unicodefail\\u({hex_dig}{0,3}|\{{hex_dig}{0,6})
-hex_char   \\x{hex_dig}{2}
-hex_fail   \\x{hex_dig}{0,1}
+unicode\\u({hexdigit}{4}|\{{hexdigit}{1,6}\})
+unicodefail\\u({hexdigit}{0,3}|\{{hexdigit}{0,6})
+hex_char   \\x{hexdigit}{2}
+hex_fail   \\x{hexdigit}{0,1}
 
 %%
 
@@ -274,7 +284,28 @@ hex_fail   \\x{hex_dig}{0,1}
return 
NUMERIC_P;
}
 
-{integer}  {
+{decinteger}   {
+   
addstring(true, yytext, yyleng);
+   
addchar(false, '\0');
+   
yylval->str = scanstring;
+   return 
INT_P;
+   }
+
+{hexinteger}   {
+   
addstring(true, yytext, yyleng);
+   
addchar(false, '\0');
+   
yylval->str = scanstring;
+   return 
INT_P;
+  

Re: Non-superuser subscription owners

2023-02-27 Thread Stephen Frost
Greetings,

* Jeff Davis (pg...@j-davis.com) wrote:
> On Mon, 2023-02-27 at 10:45 -0500, Robert Haas wrote:
> >  Suppose Alice owns a table and attaches a trigger to it. If
> > Bob inserts into that table, I think we have to run the trigger,
> > because Alice is entitled to assume that, for example, any BEFORE
> > triggers she might have defined that block certain kinds of inserts
> > are actually going to block those inserts; any constraints that she
> > has applied to the table are going to be enforced against all new
> > rows; and any default expressions she supplies are actually going to
> > work.
> 
> True, but I still find myself suspending my disbelief. Which of these
> use cases make sense for SECURITY INVOKER?

I do think there are some use-cases for it, but agree that it'd be
better to encourage more use of SECURITY DEFINER and one approach to
that might be to have a way for users to explicitly say "don't run code
that isn't mine or a superuser's with my privileges."  Of course, we
need to make sure it's possible to write safe SECURITY DEFINER functions
and to be clear about how to do that to avoid the risk in the other
direction.  We also need to provide some additional functions along the
lines of "calling_role()" or similar (so that the function can know who
the actual role is that's running the trigger) for the common case of
auditing or needing to know the calling role for RLS or similar.

I don't think we'd be able to get away with just getting rid of SECURITY
INVOKER entirely or even in changing the current way triggers (or
functions in views, etc) are run by default.

> > I think Bob has to be OK with those things too; otherwise, he
> > just shouldn't insert anything into the table.
> 
> Right, but why should Bob's privileges be needed to do any of those
> things? Any difference in privileges, for those use cases, could only
> either get in the way of achieving Alice's goals, or cause a security
> problem for Bob.
> 
> > But Bob doesn't have to be OK with Alice's code changing the session
> > state, or executing DML or DDL with his permissions.
> 
> What's left? Should Bob be OK with Alice's code using his permissions
> for anything?

I don't know about trying to define that X things are ok and Y things
are not, that seems like it would be more confusing and difficult to
work with.  Regular SELECT queries that pull data that Bob has access to
but Alice doesn't are a security issue too, were Alice to install a
function that Bob calls which writes that data into a place that Alice
could then access.  Perhaps if we could allow Bob to say "these
things are ok for Alice's code to access" then it could work ... but if
that's what is going on then the code could run with Alice's permissions
and Bob could use our nice and granular GRANT/RLS system to say what
Alice is allowed to access.

> >  I wonder if
> > that's where we should be trying to insert restrictions here. Right
> > now, we think of SECURITY_RESTRICTED_OPERATION as a way to prevent a
> > function or procedure that runs under a different user ID than the
> > session user from poisoning the session state. But I'm thinking that
> > maybe the problem isn't really with code running under a different
> > user ID. It's with running code *provided by* a different user ID.
> > Maybe we should stop thinking about the security context as something
> > that you set when you switch to running as a different user ID, and
> > start thinking about it as something that needs to be set based on
> > the
> > relationship between the user that provided the code and the session
> > user. If they're not the same, some restrictions are probably
> > appropriate, except I think in the case where the user who provided
> > the code can become the session user anyway.
> 
> I think you are saying that we should still run Alice's code with the
> privileges of Bob, but somehow make that safe(r) for Bob. Is that
> right?
> 
> That sounds hard, and I'm still stuck at the "why" question. Why do we
> want to run Alice's code with Bob's permissions?
> 
> The answers I have so far are abstract. For instance, maybe it's a
> clever SRF that takes table names as inputs and you want people to only
> be able to use the clever SRF with tables they have privileges on. But
> that's not what most functions do, and it's certainly not what most
> default expressions do.

current_role / current_user are certainly common as a default
expression.  I agree that that's more of an edge case that would be nice
to solve in a different way though.  I do think there are some other use
cases for SECURITY INVOKER but not enough folks understand the security
risk associated with it and it'd be good for us to improve on that
situation.

Thanks,

Stephen




Re: meson / pg_regress --no-locale

2023-02-27 Thread Andrew Dunstan


On 2023-02-27 Mo 09:34, Andrew Dunstan wrote:


I can't see an obvious way to run the regression tests via meson with 
the --no-locale setting. This is particularly important on Windows. 
The buildfarm client first runs the regression tests with this setting 
and then tests (via installcheck) against instances set up with its 
configured locales. We do it this way so we are not subject to the 
vagaries of whatever environment we are running in.




Found a way to do this using --test-args


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2023-02-27 Thread Melanie Plageman
On Mon, Feb 27, 2023 at 10:30 AM Tom Lane  wrote:
>
> Melanie Plageman  writes:
> > Attached is a patch to remove the *_FIRST macros.
> > I was going to add in code to change
>
> > for (IOObject io_object = 0; io_object < IOOBJECT_NUM_TYPES; 
> > io_object++)
> > to
> > for (IOObject io_object = 0; (int) io_object < IOOBJECT_NUM_TYPES; 
> > io_object++)
>
> I don't really like that proposal.  ISTM it's just silencing the
> messenger rather than addressing the underlying problem, namely that
> there's no guarantee that an IOObject variable can hold the value
> IOOBJECT_NUM_TYPES, which it had better do if you want the loop to
> terminate.  Admittedly it's quite unlikely that these three enums would
> grow to the point that that becomes an actual hazard for them --- but
> IMO it's still bad practice and a bad precedent for future code.

That's fair. Patch attached.

> > but then I couldn't remember why we didn't just do
>
> > for (int io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
>
> > I recall that when passing that loop variable into a function I was
> > getting a compiler warning that required me to cast the value back to an
> > enum to silence it:
>
> > pgstat_tracks_io_op(bktype, (IOObject) io_object,
> > io_context, io_op))
>
> > However, I am now unable to reproduce that warning.
> > Moreover, I see in cases like table_block_relation_size() with
> > ForkNumber, the variable i is passed with no cast to smgrnblocks().
>
> Yeah, my druthers would be to just do it the way we do comparable
> things with ForkNumber.  I don't feel like we need to invent a
> better way here.
>
> The risk of needing to cast when using the "int" loop variable
> as an enum is obviously the downside of that approach, but we have
> not seen any indication that any compilers actually do warn.
> It's interesting that you did see such a warning ... I wonder which
> compiler you were using at the time?

so, pretty much any version of clang I tried with
-Wsign-conversion produces a warning.

:35:32: warning: implicit conversion changes signedness: 'int'
to 'IOOp' (aka 'enum IOOp') [-Wsign-conversion]

I didn't do the casts in the attached patch since they aren't done elsewhere.

- Melanie
From 34fee4a3d1d1353aa38a95b3afc2d302a5f586ff Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Mon, 27 Feb 2023 08:48:11 -0500
Subject: [PATCH v1 2/2] Change IO stats enum loop variables to ints

Per [1], using an enum as the loop variable with a macro-defined
termination condition of #_enum_values + 1 is not guaranteed to be safe
- as compilers are free to make enums as small as a char.

Some (older) compilers will notice this when building with
-Wtautological-constant-out-of-range-compare.

[1] https://www.postgresql.org/message-id/354645.1677511842%40sss.pgh.pa.us
---
 src/backend/utils/activity/pgstat_io.c | 12 ++--
 src/backend/utils/adt/pgstatfuncs.c|  8 
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index c478b126fa..c4199d18c8 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -36,16 +36,16 @@ pgstat_bktype_io_stats_valid(PgStat_BktypeIO *backend_io,
 {
 	bool		bktype_tracked = pgstat_tracks_io_bktype(bktype);
 
-	for (IOObject io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
+	for (int io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
 	{
-		for (IOContext io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
+		for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
 		{
 			/*
 			 * Don't bother trying to skip to the next loop iteration if
 			 * pgstat_tracks_io_object() would return false here. We still
 			 * need to validate that each counter is zero anyway.
 			 */
-			for (IOOp io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
+			for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
 			{
 /* No stats, so nothing to validate */
 if (backend_io->data[io_object][io_context][io_op] == 0)
@@ -109,11 +109,11 @@ pgstat_flush_io(bool nowait)
 	else if (!LWLockConditionalAcquire(bktype_lock, LW_EXCLUSIVE))
 		return true;
 
-	for (IOObject io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
+	for (int io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
 	{
-		for (IOContext io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
+		for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
 		{
-			for (IOOp io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
+			for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
 bktype_shstats->data[io_object][io_context][io_op] +=
 	PendingIOStats.data[io_object][io_context][io_op];
 		}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 12eda4ade0..b61a12382b 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1306,7 +1

Re: Non-superuser subscription owners

2023-02-27 Thread Jeff Davis
On Mon, 2023-02-27 at 10:45 -0500, Robert Haas wrote:

>  Suppose Alice owns a table and attaches a trigger to it. If
> Bob inserts into that table, I think we have to run the trigger,
> because Alice is entitled to assume that, for example, any BEFORE
> triggers she might have defined that block certain kinds of inserts
> are actually going to block those inserts; any constraints that she
> has applied to the table are going to be enforced against all new
> rows; and any default expressions she supplies are actually going to
> work.

True, but I still find myself suspending my disbelief. Which of these
use cases make sense for SECURITY INVOKER?

> I think Bob has to be OK with those things too; otherwise, he
> just shouldn't insert anything into the table.

Right, but why should Bob's privileges be needed to do any of those
things? Any difference in privileges, for those use cases, could only
either get in the way of achieving Alice's goals, or cause a security
problem for Bob.

> But Bob doesn't have to be OK with Alice's code changing the session
> state, or executing DML or DDL with his permissions.

What's left? Should Bob be OK with Alice's code using his permissions
for anything?

>  I wonder if
> that's where we should be trying to insert restrictions here. Right
> now, we think of SECURITY_RESTRICTED_OPERATION as a way to prevent a
> function or procedure that runs under a different user ID than the
> session user from poisoning the session state. But I'm thinking that
> maybe the problem isn't really with code running under a different
> user ID. It's with running code *provided by* a different user ID.
> Maybe we should stop thinking about the security context as something
> that you set when you switch to running as a different user ID, and
> start thinking about it as something that needs to be set based on
> the
> relationship between the user that provided the code and the session
> user. If they're not the same, some restrictions are probably
> appropriate, except I think in the case where the user who provided
> the code can become the session user anyway.

I think you are saying that we should still run Alice's code with the
privileges of Bob, but somehow make that safe(r) for Bob. Is that
right?

That sounds hard, and I'm still stuck at the "why" question. Why do we
want to run Alice's code with Bob's permissions?

The answers I have so far are abstract. For instance, maybe it's a
clever SRF that takes table names as inputs and you want people to only
be able to use the clever SRF with tables they have privileges on. But
that's not what most functions do, and it's certainly not what most
default expressions do.
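
For the record, the abstract case I have in mind is something like this
(sketch only):

    CREATE FUNCTION row_count(tbl regclass) RETURNS bigint
    LANGUAGE plpgsql SECURITY INVOKER AS $$
    DECLARE n bigint;
    BEGIN
        -- counts rows in whatever table the caller names
        EXECUTE format('SELECT count(*) FROM %s', tbl) INTO n;
        RETURN n;
    END $$;

It only works against tables the caller can read, which is the one situation
where running with the invoker's permissions is clearly what you want.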

Regards,
Jeff Davis






Re: SLRUs in the main buffer pool - Page Header definitions

2023-02-27 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Mon, Feb 27, 2023 at 8:56 AM Heikki Linnakangas  wrote:
> > I'm not sure if I like that or not. I think we should clean up and
> > finish the other patches that this builds on first, and then decide if
> > we want to use the standard page header for the SLRUs or not. And if we
> > decide that we want the SLRU pages to have a page header, clean this up
> > or rewrite it from scratch.
> 
> I'm not entirely sure either, but I think the idea has some potential.
> If SLRU pages have headers, that means that they have LSNs, and
> perhaps then we could use those LSNs to figure out when they're safe
> to write to disk, instead of ad-hoc mechanisms. See SlruSharedData's
> group_lsn field.

I agree that it's got potential and seems like the right direction to go
in.  That would also allow for checksums for SLRUs, and possibly support
for encryption which leverages the LSN, and for a dynamic page feature
area which could allow for an extended checksum or perhaps authenticated
encryption with additional authenticated data.

Thanks,

Stephen




Re: pg_dump versus hash partitioning

2023-02-27 Thread Tom Lane
Robert Haas  writes:
> Sure, but I was responding to your assertion that there's no case in
> which --load-via-partition-root could cause a restore failure. I'm not
> sure that's accurate.

Perhaps it's not, but it's certainly far less likely to cause a restore
failure than the behavior I want to replace.

More to the current point perhaps, I doubt that it's likely enough to
cause a restore failure to justify the existing docs warning.  There
may have been a time when the warning was justified, but I don't believe
it today.

regards, tom lane




Re: pg_dump versus hash partitioning

2023-02-27 Thread Robert Haas
On Mon, Feb 27, 2023 at 12:50 PM Tom Lane  wrote:
> Robert Haas  writes:
> > On Mon, Feb 27, 2023 at 11:20 AM Tom Lane  wrote:
> >> Well, that's a user error not pg_dump's fault.  Particularly so for hash
> >> partitioning, where there is no defensible reason to make the partitions
> >> semantically different.
>
> > I am still of the opinion that you're going down a dangerous path of
> > redefining pg_dump's mission from "dump and restore the database, as
> > it actually exists" to "dump and restore the database, unless the user
> > did something that I think is silly".
>
> Let's not attack straw men, shall we?  I'm defining pg_dump's mission
> as "dump and restore the database successfully".  Failure to restore
> does not help anyone, especially if they are in a disaster recovery
> situation where it's not possible to re-take the dump.  It's not like
> there's no precedent for having pg_dump tweak things to ensure a
> successful restore.

Sure, but I was responding to your assertion that there's no case in
which --load-via-partition-root could cause a restore failure. I'm not
sure that's accurate. The fact that the case is something that nobody
is especially likely to do doesn't mean it doesn't exist, and ISTM we
have had more than a few pg_dump bug reports that come down to someone
having done something which we didn't foresee.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_dump versus hash partitioning

2023-02-27 Thread Tom Lane
Robert Haas  writes:
> On Mon, Feb 27, 2023 at 11:20 AM Tom Lane  wrote:
>> Well, that's a user error not pg_dump's fault.  Particularly so for hash
>> partitioning, where there is no defensible reason to make the partitions
>> semantically different.

> I am still of the opinion that you're going down a dangerous path of
> redefining pg_dump's mission from "dump and restore the database, as
> it actually exists" to "dump and restore the database, unless the user
> did something that I think is silly".

Let's not attack straw men, shall we?  I'm defining pg_dump's mission
as "dump and restore the database successfully".  Failure to restore
does not help anyone, especially if they are in a disaster recovery
situation where it's not possible to re-take the dump.  It's not like
there's no precedent for having pg_dump tweak things to ensure a
successful restore.

regards, tom lane




Re: libpq: PQgetCopyData() and allocation overhead

2023-02-27 Thread Jeroen Vermeulen
Done.  Thanks for looking!

Jelte Fennema pointed out that I should probably be using PQExpBuffer for
this.  I'll look into that later; this is a proof of concept, not a
production-ready API proposal.


Jeroen

On Mon, 27 Feb 2023 at 14:48, Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Fri, Feb 10, 2023 at 5:49 PM Jeroen Vermeulen  wrote:
> >
> > OK, I've updated the PR with a benchmark (in the main directory).
> >
> > On this benchmark I'm seeing about a 24% reduction in "user" CPU time,
> and a 8% reduction in "system" CPU time.  (Almost no reduction in
> wall-clock time.)
>
> I can help run some logical replication performance benchmarks
> tomorrow. Would you mind cleaning the PR and providing the changes
> (there are multiple commits in the PR) as a single patch here for the
> sake of ease of review and test?
>
> --
> Bharath Rupireddy
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
>


Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-27 Thread Tom Lane
I wrote:
> Hah - I thought of a solution.  We can avoid this race condition if
> we make the remote session itself inspect pg_stat_activity and
> return its displayed application_name.  Just need a foreign table
> that maps onto pg_stat_activity.

I went ahead and coded it that way, and it doesn't look too awful.
Any objections?

regards, tom lane

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index d5fc61446a..e3275caa2d 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9926,11 +9926,6 @@ WARNING:  there is no transaction in progress
 -- Change application_name of remote connection to special one
 -- so that we can easily terminate the connection later.
 ALTER SERVER loopback OPTIONS (application_name 'fdw_retry_check');
--- If debug_discard_caches is active, it results in
--- dropping remote connections after every transaction, making it
--- impossible to test termination meaningfully.  So turn that off
--- for this test.
-SET debug_discard_caches = 0;
 -- Make sure we have a remote connection.
 SELECT 1 FROM ft1 LIMIT 1;
  ?column? 
@@ -9939,13 +9934,12 @@ SELECT 1 FROM ft1 LIMIT 1;
 (1 row)
 
 -- Terminate the remote connection and wait for the termination to complete.
-SELECT pg_terminate_backend(pid, 18) FROM pg_stat_activity
+-- (If a cache flush happens, the remote connection might have already been
+-- dropped; so code this step in a way that doesn't fail if no connection.)
+DO $$ BEGIN
+PERFORM pg_terminate_backend(pid, 18) FROM pg_stat_activity
 	WHERE application_name = 'fdw_retry_check';
- pg_terminate_backend 
---
- t
-(1 row)
-
+END $$;
 -- This query should detect the broken connection when starting new remote
 -- transaction, reestablish new connection, and then succeed.
 BEGIN;
@@ -9958,13 +9952,10 @@ SELECT 1 FROM ft1 LIMIT 1;
 -- If we detect the broken connection when starting a new remote
 -- subtransaction, we should fail instead of establishing a new connection.
 -- Terminate the remote connection and wait for the termination to complete.
-SELECT pg_terminate_backend(pid, 18) FROM pg_stat_activity
+DO $$ BEGIN
+PERFORM pg_terminate_backend(pid, 18) FROM pg_stat_activity
 	WHERE application_name = 'fdw_retry_check';
- pg_terminate_backend 
---
- t
-(1 row)
-
+END $$;
 SAVEPOINT s;
 -- The text of the error might vary across platforms, so only show SQLSTATE.
 \set VERBOSITY sqlstate
@@ -9972,7 +9963,6 @@ SELECT 1 FROM ft1 LIMIT 1;-- should fail
 ERROR:  08006
 \set VERBOSITY default
 COMMIT;
-RESET debug_discard_caches;
 -- =
 -- test connection invalidation cases and postgres_fdw_get_connections function
 -- =
@@ -11629,74 +11619,66 @@ HINT:  There are no valid options in this context.
 -- ===
 -- test postgres_fdw.application_name GUC
 -- ===
--- Turn debug_discard_caches off for this test to make sure that
--- the remote connection is alive when checking its application_name.
-SET debug_discard_caches = 0;
+-- To avoid race conditions in checking the remote session's application_name,
+-- use this view to make the remote session itself read its application_name.
+CREATE VIEW my_application_name AS
+  SELECT application_name FROM pg_stat_activity WHERE pid = pg_backend_pid();
+CREATE FOREIGN TABLE remote_application_name (application_name text)
+  SERVER loopback2
+  OPTIONS (schema_name 'public', table_name 'my_application_name');
+SELECT * FROM remote_application_name;
+ application_name 
+--
+ postgres_fdw
+(1 row)
+
 -- Specify escape sequences in application_name option of a server
 -- object so as to test that they are replaced with status information
--- expectedly.
+-- expectedly.  Note that we are also relying on ALTER SERVER to force
+-- the remote session to be restarted with its new application name.
 --
 -- Since pg_stat_activity.application_name may be truncated to less than
 -- NAMEDATALEN characters, note that substring() needs to be used
 -- at the condition of test query to make sure that the string consisting
 -- of database name and process ID is also less than that.
 ALTER SERVER loopback2 OPTIONS (application_name 'fdw_%d%p');
-SELECT 1 FROM ft6 LIMIT 1;
- ?column? 
---
-1
-(1 row)
-
-SELECT pg_terminate_backend(pid, 18) FROM pg_stat_activity
+SELECT count(*) FROM remote_application_name
   WHERE application_name =
 substring('fdw_' || current_database() || pg_backend_pid() for
   current_setting('max_identifier_length')::int);
- pg_terminate_backend 
---
- t
+ count 
+---
+ 1
 (1 row)
 
 -- postgres_fdw

Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-02-27 Thread John Naylor
On Fri, Feb 24, 2023 at 3:41 PM Masahiko Sawada 
wrote:
>
> In v29 vacuum took twice as long (286 s vs. 573 s)?

Not sure what happened there, and clearly I was looking at the wrong number
:/
I scripted the test for reproducibility and ran it three times. Also
included some variations (attached):

UUID times look comparable here, so no speedup or regression:

master:
system usage: CPU: user: 216.05 s, system: 35.81 s, elapsed: 634.22 s
system usage: CPU: user: 173.71 s, system: 31.24 s, elapsed: 599.04 s
system usage: CPU: user: 171.16 s, system: 30.21 s, elapsed: 583.21 s

v29:
system usage: CPU: user:  93.47 s, system: 40.92 s, elapsed: 594.10 s
system usage: CPU: user:  99.58 s, system: 44.73 s, elapsed: 606.80 s
system usage: CPU: user:  96.29 s, system: 42.74 s, elapsed: 600.10 s

Then, I tried sequential integers, which is a much more favorable access
pattern in general, and the new tid storage shows substantial improvement:

master:
system usage: CPU: user: 100.39 s, system: 7.79 s, elapsed: 121.57 s
system usage: CPU: user: 104.90 s, system: 8.81 s, elapsed: 124.24 s
system usage: CPU: user:  95.04 s, system: 7.55 s, elapsed: 116.44 s

v29:
system usage: CPU: user:  24.57 s, system: 8.53 s, elapsed: 61.07 s
system usage: CPU: user:  23.18 s, system: 8.25 s, elapsed: 58.99 s
system usage: CPU: user:  23.20 s, system: 8.98 s, elapsed: 66.86 s

That's fast enough that I thought an improvement would show up even with
standard WAL logging (no separate attachment, since it's a trivial change).
Seems a bit faster:

master:
system usage: CPU: user: 152.27 s, system: 11.76 s, elapsed: 216.86 s
system usage: CPU: user: 137.25 s, system: 11.07 s, elapsed: 213.62 s
system usage: CPU: user: 149.48 s, system: 12.15 s, elapsed: 220.96 s

v29:
system usage: CPU: user: 40.88 s, system: 15.99 s, elapsed: 170.98 s
system usage: CPU: user: 41.33 s, system: 15.45 s, elapsed: 166.75 s
system usage: CPU: user: 41.51 s, system: 18.20 s, elapsed: 203.94 s

There is more we could test here, but I feel better about these numbers.

In the next few days, I'll resume style review and list the remaining
issues we need to address.

--
John Naylor
EDB: http://www.enterprisedb.com


vacuum-test-lookup-int.sql
Description: application/sql


vacuum-test-lookup-uuid.sql
Description: application/sql


Re: pg_dump versus hash partitioning

2023-02-27 Thread Robert Haas
On Mon, Feb 27, 2023 at 11:20 AM Tom Lane  wrote:
> Well, that's a user error not pg_dump's fault.  Particularly so for hash
> partitioning, where there is no defensible reason to make the partitions
> semantically different.

I am still of the opinion that you're going down a dangerous path of
redefining pg_dump's mission from "dump and restore the database, as
it actually exists" to "dump and restore the database, unless the user
did something that I think is silly".

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Stale references to guc.c in comments/tests

2023-02-27 Thread Tom Lane
Daniel Gustafsson  writes:
> On 24 Feb 2023, at 16:19, Tom Lane  wrote:
>> Perhaps you could use "the GUC mechanisms" in these places, but it's a bit
>> longer than "guc.c".  Leaving such references alone seems OK too.

> I've opted for mostly leaving them in the attached v2.

This version seems OK to me except for this bit:

  * This is a straightforward one-to-one mapping, but doing it this way makes
- * guc.c independent of OpenSSL availability and version.
+ * GUC definition independent of OpenSSL availability and version.

The grammar is a bit off ("the GUC definition" would read better),
but really I think the wording was vague already and we should tighten
it up.  Can we specify exactly which GUC variable(s) we're talking about?

regards, tom lane




Re: pgsql: pg_rewind: Fix determining TLI when server was just promoted.

2023-02-27 Thread Robert Haas
On Thu, Feb 23, 2023 at 8:40 AM Heikki Linnakangas
 wrote:
> This is arguably a bug fix, but don't backpatch because we haven't
> really treated it as a bug so far.

I guess I'm having trouble understanding why this is only arguably a
bug fix. Seems like flat-out wrong behavior.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_dump versus hash partitioning

2023-02-27 Thread Tom Lane
Robert Haas  writes:
> On Tue, Feb 14, 2023 at 2:21 PM Tom Lane  wrote:
>> This made me wonder if this could be a usable solution at all, but
>> after thinking for awhile, I don't see how the claim about foreign key
>> constraints is anything but FUD.  pg_dump/pg_restore have sufficient
>> dependency logic to prevent that from happening.  I think we can just
>> drop the "or perhaps ..." clause here, and tolerate the possible
>> inefficiency as better than failing.

> Right, but isn't that dependency logic based around the fact that the
> inserts are targeting the original partition? Like, suppose partition
> A has a foreign key that is not present on partition B. A row that is
> originally in partition B gets rerouted into partition A. It must now
> satisfy the foreign key constraint when, previously, that was
> unnecessary.

Well, that's a user error not pg_dump's fault.  Particularly so for hash
partitioning, where there is no defensible reason to make the partitions
semantically different.

There could be a risk of a timing problem, namely that parallel pg_restore
tries to check an FK constraint before all the relevant data has arrived.
But in practice I don't believe that either.  We load all the data in
"data" phase and then create indexes and check FKs in "post-data" phase,
and I do not believe that parallel restore weakens that separation
(because it's enforced by a post-data boundary object in the dependencies).

regards, tom lane




Re: SLRUs in the main buffer pool - Page Header definitions

2023-02-27 Thread Robert Haas
On Mon, Feb 27, 2023 at 8:56 AM Heikki Linnakangas  wrote:
> I'm not sure if I like that or not. I think we should clean up and
> finish the other patches that this builds on first, and then decide if
> we want to use the standard page header for the SLRUs or not. And if we
> decide that we want the SLRU pages to have a page header, clean this up
> or rewrite it from scratch.

I'm not entirely sure either, but I think the idea has some potential.
If SLRU pages have headers, that means that they have LSNs, and
perhaps then we could use those LSNs to figure out when they're safe
to write to disk, instead of ad-hoc mechanisms. See SlruSharedData's
group_lsn field.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring relation extension and BufferAlloc(), faster COPY

2023-02-27 Thread Heikki Linnakangas

On 21/02/2023 18:33, Alvaro Herrera wrote:

On 2023-Feb-21, Heikki Linnakangas wrote:


+static BlockNumber
+BulkExtendSharedRelationBuffered(Relation rel,
+                                 SMgrRelation smgr,
+                                 bool skip_extension_lock,
+                                 char relpersistence,
+                                 ForkNumber fork, ReadBufferMode mode,
+                                 BufferAccessStrategy strategy,
+                                 uint32 *num_pages,
+                                 uint32 num_locked_pages,
+                                 Buffer *buffers)


Ugh, that's a lot of arguments, some are inputs and some are outputs. I
don't have any concrete suggestions, but could we simplify this somehow?
Needs a comment at least.


Yeah, I noticed this too.  I think it would be easy enough to add a new
struct that can be passed as a pointer, which can be stack-allocated
by the caller, and which holds the input arguments that are common to
both functions, as is sensible.
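
To make that suggestion concrete, here is a minimal sketch (the struct name
and layout are made up for illustration, not taken from the patch): the input
arguments common to both extension functions move into one stack-allocatable
struct, and the functions take a pointer to it plus the output parameters.

typedef struct BulkExtendArgs
{
	Relation	rel;
	SMgrRelation smgr;
	bool		skip_extension_lock;
	char		relpersistence;
	ForkNumber	fork;
	ReadBufferMode mode;
	BufferAccessStrategy strategy;
} BulkExtendArgs;

/* a caller would then do something like:
 *   BulkExtendArgs args = { .rel = rel, .fork = MAIN_FORKNUM, .mode = RBM_NORMAL };
 *   BulkExtendSharedRelationBuffered(&args, &num_pages, num_locked_pages, buffers);
 */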


We also do this in freespace.c and visibilitymap.c:

	/* Extend as needed. */
	while (fsm_nblocks_now < fsm_nblocks)
	{
		PageSetChecksumInplace((Page) pg.data, fsm_nblocks_now);

		smgrextend(reln, FSM_FORKNUM, fsm_nblocks_now,
				   pg.data, false);
		fsm_nblocks_now++;
	}

We could use the new smgrzeroextend function here. But it would be 
better to go through the buffer cache, because after this, the last 
block, at 'fsm_nblocks', will be read with ReadBuffer() and modified.


We could use BulkExtendSharedRelationBuffered() to extend the relation 
and keep the last page locked, but the 
BulkExtendSharedRelationBuffered() signature doesn't allow that. It can 
return the first N pages locked, but there's no way to return the *last* 
page locked.


Perhaps we should decompose this function into several function calls. 
Something like:


/* get N victim buffers, pinned and !BM_VALID */
buffers = BeginExtendRelation(int npages);

LockRelationForExtension(rel)

/* Insert buffers into buffer table */
first_blk = smgrnblocks()
for (blk = first_blk; blk < last_blk; blk++)
MapNewBuffer(blk, buffers[i])

/* extend the file on disk */
smgrzeroextend();

UnlockRelationForExtension(rel)

for (blk = first_blk; blk < last_blk; blk++)
{
memset(BufferGetPage(buffers[i]), 0, BLCKSZ);
FinishNewBuffer(buffers[i]);
/* optionally lock the buffer */
LockBuffer(buffers[i]);
}

That's a lot more verbose, of course, but gives the callers the 
flexibility. And might even be more readable than one function call with 
lots of arguments.


This would expose the concept of a buffer that's mapped but marked as 
IO-in-progress outside bufmgr.c. On one hand, maybe that's exposing 
details that shouldn't be exposed. On the other hand, it might come 
handy. Instead of RBM_ZERO_AND_LOCK mode, for example, it might be handy 
to have a function that returns an IO-in-progress buffer that you can 
initialize any way you want.


- Heikki





Re: pg_dump versus hash partitioning

2023-02-27 Thread Robert Haas
On Tue, Feb 14, 2023 at 2:21 PM Tom Lane  wrote:
> This made me wonder if this could be a usable solution at all, but
> after thinking for awhile, I don't see how the claim about foreign key
> constraints is anything but FUD.  pg_dump/pg_restore have sufficient
> dependency logic to prevent that from happening.  I think we can just
> drop the "or perhaps ..." clause here, and tolerate the possible
> inefficiency as better than failing.

Right, but isn't that dependency logic based around the fact that the
inserts are targeting the original partition? Like, suppose partition
A has a foreign key that is not present on partition B. A row that is
originally in partition B gets rerouted into partition A. It must now
satisfy the foreign key constraint when, previously, that was
unnecessary.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Non-superuser subscription owners

2023-02-27 Thread Robert Haas
On Wed, Feb 22, 2023 at 12:18 PM Jeff Davis  wrote:
> There are two questions:
>
> 1. Is the security situation with logical replication bad? Yes. You
> nicely summarized just how bad.
>
> 2. Is it the same situation as accessing a table owned by a user you
> don't absolutely trust?
>
> Regardless of how the second question is answered, it won't diminish
> your point that logical replication is in a bad state. If another
> situation is also bad, we should fix that too.

Well said.

> So I don't think "shouldn't" is quite good enough. In the first place,
> the user needs to know that the risk exists. Second, what if they
> actually do want to access a table owned by someone else for whatever
> reason -- how do they do that safely?

Good question. I don't think we currently have a good answer.

> I can't resist mentioning that these are all SECURITY INVOKER problems.
> SECURITY INVOKER is insecure unless the invoker absolutely trusts the
> definer, and that only really makes sense if the definer is a superuser
> (or something very close). That's why we keep adding exceptions with
> SECURITY_RESTRICTED_OPERATION, which is really just a way to silently
> ignore the SECURITY INVOKER label and use SECURITY DEFINER instead.

That's an interesting way to look at it. I think there are perhaps two
different possible perspectives here. One possibility is to take the
view that you've adopted here, and blame it on SECURITY INVOKER. The
other possibility, at least as I see it, is to blame it on the fact
that we have so many places to attach executable code to tables and
very few ways for people using those tables to limit their exposure to
such code. Suppose Alice owns a table and attaches a trigger to it. If
Bob inserts into that table, I think we have to run the trigger,
because Alice is entitled to assume that, for example, any BEFORE
triggers she might have defined that block certain kinds of inserts
are actually going to block those inserts; any constraints that she
has applied to the table are going to be enforced against all new
rows; and any default expressions she supplies are actually going to
work. I think Bob has to be OK with those things too; otherwise, he
just shouldn't insert anything into the table.

But Bob doesn't have to be OK with Alice's code changing the session
state, or executing DML or DDL with his permissions. I wonder if
that's where we should be trying to insert restrictions here. Right
now, we think of SECURITY_RESTRICTED_OPERATION as a way to prevent a
function or procedure that runs under a different user ID than the
session user from poisoning the session state. But I'm thinking that
maybe the problem isn't really with code running under a different
user ID. It's with running code *provided by* a different user ID.
Maybe we should stop thinking about the security context as something
that you set when you switch to running as a different user ID, and
start thinking about it as something that needs to be set based on the
relationship between the user that provided the code and the session
user. If they're not the same, some restrictions are probably
appropriate, except I think in the case where the user who provided
the code can become the session user anyway.

> Another option is having some kind SECURITY NONE that would run the
> code as a very limited-privilege user that can basically only access
> the catalog. That would be useful for running default expressions and
> the like without the definer or invoker needing to be careful.

This might be possible, but I have some doubts about how difficult it
would be to get all the details right. We'd need to make sure that
this limited-privilege user couldn't ever create a table, or own one,
or be granted any privileges to do anything other than the minimal set
of things it's supposed to be able to do, or poison the session state,
etc. And it would have weird results like current_user returning the
name of the limited-privilege user rather than any of the users
involved in the operation. Maybe that's all OK, but I find it more
appealing to try to think about what kinds of operations can be
performed in what contexts than to invent entirely new users.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: PATCH: Using BRIN indexes for sorted output

2023-02-27 Thread Matthias van de Meent
Hi,

On Thu, 23 Feb 2023 at 17:44, Matthias van de Meent
 wrote:
>
> I'll see to further reviewing 0004 and 0005 when I have additional time.

Some initial comments on 0004:

> +/*
> + * brin_minmax_ranges
> + *Load the BRIN ranges and sort them.
> + */
> +Datum
> +brin_minmax_ranges(PG_FUNCTION_ARGS)
> +{

Like in 0001, this seems to focus on only single columns. Can't we put
the scan and sort infrastructure in brin, and put the weight- and
compare-operators in the opclasses? I.e.
brin_minmax_minorder(PG_FUNCTION_ARGS=brintuple) -> range.min and
brin_minmax_maxorder(PG_FUNCTION_ARGS=brintuple) -> range.max, and a
brin_minmax_compare(order, order) -> int? I'm thinking of something
similar to GIST's distance operators, which would make implementing
ordering by e.g. `pointcolumn <-> (1, 2)::point` implementable in the
brin infrastructure.

Note: One big reason I don't really like the current
brin_minmax_ranges (and the analyze code in 0001) is because it breaks
the operatorclass-vs-index abstraction layer. Operator classes don't
(shouldn't) know or care about which attribute number they have, nor
what the index does with the data.
Scanning the index is not something that I expect the operator class
to do, I expect that the index code organizes the scan, and forwards
the data to the relevant operator classes.
Scanning the index N times for N attributes can be excused if there
are good reasons, but I'd rather have that decision made in the
index's core code rather than at the design level.

> +/*
> + * XXX Does it make sense (is it possible) to have a sort by more than one
> + * column, using a BRIN index?
> + */

Yes, even if only one prefix column is included in the BRIN index
(e.g. `company` in `ORDER BY company, date`), the tuplesort with table
tuples can add additional sorting without first reading the whole
table, potentially (likely) reducing the total resource usage of the
query. That utilizes the same idea as incremental sorts, but with the
caveat that the input sort order is approximately likely but not at
all guaranteed. So, even if the range sort is on a single index
column, we can still do the table's tuplesort on all ORDER BY
attributes, as long as a prefix of ORDER BY columns are included in
the BRIN index.
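
A minimal SQL sketch of that query shape (the table and index are made up for
illustration): only the leading ORDER BY column is covered by the BRIN index,
while the tuplesort still orders the fetched rows on the full key.

CREATE TABLE orders (company text, date date, amount numeric);
CREATE INDEX orders_company_brin ON orders USING brin (company);

SELECT * FROM orders ORDER BY company, date;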

> +/*
> + * XXX We can be a bit smarter for LIMIT queries - once we
> + * know we have more rows in the tuplesort than we need to
> + * output, we can stop spilling - those rows are not going
> + * to be needed. We can discard the tuplesort (no need to
> + * respill) and stop spilling.
> + */

Shouldn't that be "discard the tuplestore"?

> +#define BRIN_PROCNUM_RANGES 12/* optional */

It would be useful to add documentation for this in this patch.


Kind regards,

Matthias van de Meent.




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2023-02-27 Thread Tom Lane
Melanie Plageman  writes:
> Attached is a patch to remove the *_FIRST macros.
> I was going to add in code to change

> for (IOObject io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
> to
> for (IOObject io_object = 0; (int) io_object < IOOBJECT_NUM_TYPES; 
> io_object++)

I don't really like that proposal.  ISTM it's just silencing the
messenger rather than addressing the underlying problem, namely that
there's no guarantee that an IOObject variable can hold the value
IOOBJECT_NUM_TYPES, which it had better do if you want the loop to
terminate.  Admittedly it's quite unlikely that these three enums would
grow to the point that that becomes an actual hazard for them --- but
IMO it's still bad practice and a bad precedent for future code.

> but then I couldn't remember why we didn't just do

> for (int io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)

> I recall that when passing that loop variable into a function I was
> getting a compiler warning that required me to cast the value back to an
> enum to silence it:

> pgstat_tracks_io_op(bktype, (IOObject) io_object,
> io_context, io_op))

> However, I am now unable to reproduce that warning.
> Moreover, I see in cases like table_block_relation_size() with
> ForkNumber, the variable i is passed with no cast to smgrnblocks().

Yeah, my druthers would be to just do it the way we do comparable
things with ForkNumber.  I don't feel like we need to invent a
better way here.

The risk of needing to cast when using the "int" loop variable
as an enum is obviously the downside of that approach, but we have
not seen any indication that any compilers actually do warn.
It's interesting that you did see such a warning ... I wonder which
compiler you were using at the time?

regards, tom lane




Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-27 Thread Tom Lane
I wrote:
> ... maybe we could do "select 1 from
> pg_stat_activity where application_name = computed-pattern", but that
> has the same problem that a cache flush might have terminated the
> remote session.

Hah - I thought of a solution.  We can avoid this race condition if
we make the remote session itself inspect pg_stat_activity and
return its displayed application_name.  Just need a foreign table
that maps onto pg_stat_activity.  Of course, this'd add yet another
layer of baroque-ness to a test section that I already don't think
is worth the trouble.  Should we go that way, or just rip it out?

regards, tom lane




Re: Add LZ4 compression in pg_dump

2023-02-27 Thread gkokolatos






--- Original Message ---
On Saturday, February 25th, 2023 at 3:05 PM, Justin Pryzby 
 wrote:


> 
> 
> On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote:
> 
> > I have some fixes (attached) and questions while polishing the patch for
> > zstd compression. The fixes are small and could be integrated with the
> > patch for zstd, but could be applied independently.


Please find some comments on the rest of the fixes patch that Tomas has not
commented on.

can be compressed with the gzip or
-   lz4 tool.
+   lz4 tools.

+1

 The compression method can be set to gzip or
-lz4 or none for no compression.
+lz4, or none for no compression.

I am not a native English speaker. Yet I think that if one adds commas
in one of the options, then one should add commas to all the options.
Namely, the above is missing a comma between gzip and lz4. However, I
think that not having any commas still works grammatically and
syntactically.

-   /*
-    * A level of zero simply copies the input one block at the time. This
-    * is probably not what the user wanted when calling this interface.
-    */
-   if (cs->compression_spec.level == 0)
-       pg_fatal("requested to compress the archive yet no level was specified");


I disagree with this change. WriteDataToArchiveGzip() is far away from
whatever the code in pg_dump.c is doing. Any invalid value for level
will emit an error when the proper gzip/zlib code is called. A zero
value, however, will not emit such an error. Having the extra check
there is a future-proof guarantee at a very low cost. Furthermore, it
quickly informs the reader of the code about that specific value,
helping with readability and comprehension.

If any change is required, something I vote strongly against, I would
at least recommend replacing it with an assertion.

- * Initialize a compress file stream. Deffer the compression algorithm
+ * Initialize a compress file stream. Infer the compression algorithm

:+1:

-   # Skip command-level tests for gzip if there is no support for it.
+   # Skip command-level tests for gzip/lz4 if they're not supported.

We will be back at that again soon. Maybe change to:

Skip command-level test for unsupported compression methods

To include everything.


-   ($pgdump_runs{$run}->{compile_option} eq 'gzip' && !$supports_gzip) ||
-   ($pgdump_runs{$run}->{compile_option} eq 'lz4' && !$supports_lz4))
+   (($pgdump_runs{$run}->{compile_option} eq 'gzip' && !$supports_gzip) ||
+   ($pgdump_runs{$run}->{compile_option} eq 'lz4' && !$supports_lz4)))

Good catch, :+1:

Cheers,
//Georgios

> --
> Justin




meson / pg_regress --no-locale

2023-02-27 Thread Andrew Dunstan
I can't see an obvious way to run the regression tests via meson with 
the --no-locale setting. This is particularly important on Windows. The 
buildfarm client first runs the regression tests with this setting and 
then tests (via installcheck) against instances set up with its 
configured locales. We do it this way so we are not subject to the 
vagaries of whatever environment we are running in.
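
For reference, a minimal sketch of the make-based equivalent that is missing
on the meson side (NO_LOCALE=1 makes pg_regress run with --no-locale, so the
temporary instance is initialized in the C locale):

    make check NO_LOCALE=1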



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: libpq: PQgetCopyData() and allocation overhead

2023-02-27 Thread Jelte Fennema
Instead of implementing new growable buffer logic in this patch. It
seems much nicer to reuse the already existing PQExpBuffer type for
this patch.


On Mon, 27 Feb 2023 at 14:48, Bharath Rupireddy
 wrote:
>
> On Fri, Feb 10, 2023 at 5:49 PM Jeroen Vermeulen  wrote:
> >
> > OK, I've updated the PR with a benchmark (in the main directory).
> >
> > On this benchmark I'm seeing about a 24% reduction in "user" CPU time, and 
> > a 8% reduction in "system" CPU time.  (Almost no reduction in wall-clock 
> > time.)
>
> I can help run some logical replication performance benchmarks
> tomorrow. Would you mind cleaning the PR and providing the changes
> (there are multiple commits in the PR) as a single patch here for the
> sake of ease of review and test?
>
> --
> Bharath Rupireddy
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
>
>




Re: Add LZ4 compression in pg_dump

2023-02-27 Thread gkokolatos






--- Original Message ---
On Sunday, February 26th, 2023 at 3:59 PM, Tomas Vondra 
 wrote:


> 
> 
> 
> 
> On 2/25/23 06:02, Justin Pryzby wrote:
> 
> > I have some fixes (attached) and questions while polishing the patch for
> > zstd compression. The fixes are small and could be integrated with the
> > patch for zstd, but could be applied independently.
> > 
> > - I'm unclear about get_error_func(). That's called in three places
> > from pg_backup_directory.c, after failures from write_func(), to
> > supply an compression-specific error message to pg_fatal(). But it's
> > not being used outside of directory format, nor for errors for other
> > function pointers, or even for all errors in write_func(). Is there
> > some reason why each compression method's write_func() shouldn't call
> > pg_fatal() directly, with its compression-specific message ?
> 
> 
> I think there are a couple more places that might/should call
> get_error_func(). For example ahwrite() in pg_backup_archiver.c now
> simply does
> 
> if (bytes_written != size * nmemb)
> WRITE_ERROR_EXIT;
> 
> but perhaps it should call get_error_func() too. There are probably
> other places that call write_func() and should use get_error_func().

Agreed, calling get_error_func() would be preferable to a fatal error. It
should be the caller of the api who decides how to proceed.

> 
> > - I still think supports_compression() should be renamed, or made into a
> > static function in the necessary file. The main reason is that it's
> > more clear what it indicates - whether compression is "implemented by
> > pgdump" and not whether compression is "supported by this postgres
> > build". It also seems possible that we'd want to add a function
> > called something like supports_compression(), indicating whether the
> > algorithm is supported by the current build. It'd be better if pgdump
> > didn't subjugate that name.
> 
> 
> If we choose to rename this to have pgdump_ prefix, fine with me. But I
> don't think there's a realistic chance of conflict, as it's restricted
> to pgdump header etc. And it's not part of an API, so I guess we could
> rename that in the future if needed.

Agreed, it is very unrealistic that one will include that header file anywhere
but within pg_dump. Also, I think that adding a prefix, "pgdump", "pg_dump",
or similar, does not add value and subtracts from readability.

> 
> > - Finally, the "Nothing to do in the default case" comment comes from
> > Michael's commit 5e73a6048:
> > 
> > + /*
> > + * Custom and directory formats are compressed by default with gzip when
> > + * available, not the others.
> > + */
> > + if ((archiveFormat == archCustom || archiveFormat == archDirectory) &&
> > + !user_compression_defined)
> > {
> > #ifdef HAVE_LIBZ
> > - if (archiveFormat == archCustom || archiveFormat == archDirectory)
> > - compressLevel = Z_DEFAULT_COMPRESSION;
> > - else
> > + parse_compress_specification(PG_COMPRESSION_GZIP, NULL,
> > + &compression_spec);
> > +#else
> > + /* Nothing to do in the default case */
> > #endif
> > - compressLevel = 0;
> > }
> > 
> > As the comment says: for -Fc and -Fd, the compression is set to zlib, if
> > enabled, and when not otherwise specified by the user.
> > 
> > Before 5e73a6048, this set compressLevel=0 for -Fp and -Ft, and when
> > zlib was unavailable.
> > 
> > But I'm not sure why there's now an empty "#else". I also don't know
> > what "the default case" refers to.
> > 
> > Maybe the best thing here is to move the preprocessor #if, since it's no
> > longer in the middle of a runtime conditional:
> > 
> > #ifdef HAVE_LIBZ
> > + if ((archiveFormat == archCustom || archiveFormat == archDirectory) &&
> > + !user_compression_defined)
> > + parse_compress_specification(PG_COMPRESSION_GZIP, NULL,
> > + &compression_spec);
> > #endif
> > 
> > ...but that elicits a warning about "variable set but not used"...
> 
> 
> Not sure, I need to think about this a bit.

Not having warnings is preferable, isn't it? I can understand the confusion
on the message though. Maybe a phrasing like:
/* Nothing to do for the default case when LIBZ is not available */
is easier to understand.

Cheers,
//Georgios

> 
> regards
> 
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2023-02-27 Thread Melanie Plageman
On Sun, Feb 26, 2023 at 1:52 PM Tom Lane  wrote:
>
> I wrote:
> > The issue seems to be that code like this:
> > ...
> > is far too cute for its own good.
>
> Oh, there's another thing here that qualifies as too-cute: loops like
>
> for (IOObject io_object = IOOBJECT_FIRST;
>  io_object < IOOBJECT_NUM_TYPES; io_object++)
>
> make it look like we could define these enums as 1-based rather
> than 0-based, but if we did this code would fail, because it's
> confusing "the number of values" with "1 more than the last value".
>
> Again, we could fix that with tests like "io_context <= IOCONTEXT_LAST",
> but I don't see the point of adding more macros rather than removing
> some.  We do need IOOBJECT_NUM_TYPES to declare array sizes with,
> so I think we should nuke the "xxx_FIRST" macros as being not worth
> the electrons they're written on, and write these loops like
>
> for (int io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
>
> which is not actually adding any assumptions that you don't already
> make by using io_object as a C array subscript.

Attached is a patch to remove the *_FIRST macros.
I was going to add in code to change

for (IOObject io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
to
for (IOObject io_object = 0; (int) io_object < IOOBJECT_NUM_TYPES;
io_object++)

but then I couldn't remember why we didn't just do

for (int io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)

I recall that when passing that loop variable into a function I was
getting a compiler warning that required me to cast the value back to an
enum to silence it:

pgstat_tracks_io_op(bktype, (IOObject) io_object,
io_context, io_op))

However, I am now unable to reproduce that warning.
Moreover, I see in cases like table_block_relation_size() with
ForkNumber, the variable i is passed with no cast to smgrnblocks().

- Melanie
From cce6dc75e9e4fc9adc018a1d05874be5f3be96ae Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Mon, 27 Feb 2023 08:22:53 -0500
Subject: [PATCH v1 1/2] Remove potentially misleading *_FIRST macros

28e626bde00ef introduced IO statistic enums IOOp, IOObject, and
IOContext along with macros *_FIRST intended for use when looping
through the enumerated values of each. Per discussion in [1] these
macros are confusing and error-prone. Remove them.

[1] https://www.postgresql.org/message-id/23770.1677437567%40sss.pgh.pa.us
---
 src/backend/utils/activity/pgstat_io.c | 17 ++---
 src/backend/utils/adt/pgstatfuncs.c| 10 --
 src/include/pgstat.h   |  3 ---
 3 files changed, 10 insertions(+), 20 deletions(-)

diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index 0e07e0848d..c478b126fa 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -36,18 +36,16 @@ pgstat_bktype_io_stats_valid(PgStat_BktypeIO *backend_io,
 {
 	bool		bktype_tracked = pgstat_tracks_io_bktype(bktype);
 
-	for (IOObject io_object = IOOBJECT_FIRST;
-		 io_object < IOOBJECT_NUM_TYPES; io_object++)
+	for (IOObject io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
 	{
-		for (IOContext io_context = IOCONTEXT_FIRST;
-			 io_context < IOCONTEXT_NUM_TYPES; io_context++)
+		for (IOContext io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
 		{
 			/*
 			 * Don't bother trying to skip to the next loop iteration if
 			 * pgstat_tracks_io_object() would return false here. We still
 			 * need to validate that each counter is zero anyway.
 			 */
-			for (IOOp io_op = IOOP_FIRST; io_op < IOOP_NUM_TYPES; io_op++)
+			for (IOOp io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
 			{
 /* No stats, so nothing to validate */
 if (backend_io->data[io_object][io_context][io_op] == 0)
@@ -111,14 +109,11 @@ pgstat_flush_io(bool nowait)
 	else if (!LWLockConditionalAcquire(bktype_lock, LW_EXCLUSIVE))
 		return true;
 
-	for (IOObject io_object = IOOBJECT_FIRST;
-		 io_object < IOOBJECT_NUM_TYPES; io_object++)
+	for (IOObject io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
 	{
-		for (IOContext io_context = IOCONTEXT_FIRST;
-			 io_context < IOCONTEXT_NUM_TYPES; io_context++)
+		for (IOContext io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
 		{
-			for (IOOp io_op = IOOP_FIRST;
- io_op < IOOP_NUM_TYPES; io_op++)
+			for (IOOp io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
 bktype_shstats->data[io_object][io_context][io_op] +=
 	PendingIOStats.data[io_object][io_context][io_op];
 		}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 9d707c3521..12eda4ade0 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1306,7 +1306,7 @@ pg_stat_get_io(PG_FUNCTION_ARGS)
 
 	reset_time = TimestampTzGetDatum(backends_io_stats->stat_reset_timestamp);
 
-	for (BackendType bktype = B_INVALID; bktype < BACKEND_NUM_TYPES; bktype++)
+	for (

Re: Should vacuum process config file reload more often

2023-02-27 Thread Masahiko Sawada
Hi,

On Fri, Feb 24, 2023 at 7:08 AM Melanie Plageman
 wrote:
>
> Hi,
>
> Users may wish to speed up long-running vacuum of a large table by
> decreasing autovacuum_vacuum_cost_delay/vacuum_cost_delay, however the
> config file is only reloaded between tables (for autovacuum) or after
> the statement (for explicit vacuum). This has been brought up for
> autovacuum in [1].
>
> Andres suggested that it might be possible to check ConfigReloadPending
> in vacuum_delay_point(), so I thought I would draft a rough patch and
> start a discussion.

In vacuum_delay_point(), we need to update VacuumCostActive too if necessary.

> Apart from this, one higher level question I have is if there are other
> gucs whose modification would make reloading the configuration file
> during vacuum/analyze unsafe.

As far as I know there are no such GUC parameters in core, but there
might be in third-party table AM and index AM extensions. Also, I'm
concerned that allowing any GUC parameter to be changed during
vacuum/analyze could be a foot-gun in the future. When modifying
vacuum/analyze-related code, we would have to consider the case where
any GUC parameter could change during vacuum/analyze. I guess it would
be better to apply changes only to the vacuum delay related parameters.
For example, the autovacuum launcher advertises the values of the
vacuum delay parameters in shared memory, not only for autovacuum
processes but also for manual vacuum/analyze processes. Both kinds of
processes can then update them accordingly in vacuum_delay_point().
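
A minimal sketch of the mechanism under discussion (the helper name is made
up, and a real patch would also have to handle the autovacuum-specific
parameters and cost balancing): vacuum_delay_point() notices a pending
SIGHUP, re-reads the configuration file, and re-derives the cost-based delay
state so that VacuumCostActive does not go stale.

#include "postgres.h"

#include "miscadmin.h"
#include "postmaster/interrupt.h"
#include "utils/guc.h"

static void
vacuum_reload_config_if_pending(void)
{
	if (ConfigReloadPending)
	{
		ConfigReloadPending = false;
		ProcessConfigFile(PGC_SIGHUP);

		/* re-derive the delay state from the possibly-changed GUCs */
		VacuumCostActive = (VacuumCostDelay > 0);
		VacuumCostBalance = 0;
	}
}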

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: PATCH: Using BRIN indexes for sorted output

2023-02-27 Thread Matthias van de Meent
On Fri, 24 Feb 2023, 20:14 Tomas Vondra,  wrote:
>
> On 2/24/23 19:03, Matthias van de Meent wrote:
> > On Thu, 23 Feb 2023 at 19:48, Tomas Vondra
> >> Yeah, that sounds like a bug. Also a sign the tests should have some
> >> by-ref data types (presumably there are none, as that would certainly
> >> trip some asserts etc.).
> >
> > I'm not sure we currently trip asserts, as the data we store in the
> > memory context is very limited, making it unlikely we actually release
> > the memory region back to the OS.
> > I did get assertion failures by adding the attached patch, but I don't
> > think that's something we can do in the final release.
> >
>
> But we should randomize the memory if we ever do pfree(), and it's
> strange valgrind didn't complain when I ran tests with it.

Well, we don't clean up the decoding tuple immediately after our last
iteration, so the memory context (and the last tuple's attributes) are
still valid memory addresses. And, assuming that min/max values for
all brin ranges all have the same max-aligned length, the attribute
pointers are likely to point to the same offset within the decoding
tuple's memory context's memory segment, which would mean the dangling
pointers still point to valid memory - just not memory with the
contents they expected to be pointing to.

> >> Considering how tiny BRIN indexes are, this is likely orders of
> >> magnitude less I/O than we expend on sampling rows from the table. I
> >> mean, with the default statistics target we read ~30000 pages (~240MB)
> >> or more just to sample the rows. Randomly, while the BRIN index is
> >> likely scanned mostly sequentially.
> >
> > Mostly agreed; except I think it's not too difficult to imagine a BRIN
> > index that is on that scale; with as an example the bloom filters that
> > easily take up several hundreds of bytes.
> >
> > With the default configuration of 128 pages_per_range,
> > n_distinct_per_range of -0.1, and false_positive_rate of 0.01, the
> > bloom filter size is 4.36 KiB - each indexed item on its own page. It
> > is still only 1% of the original table's size, but there are enough
> > tables that are larger than 24GB that this could be a significant
> > issue.
> >
>
> Right, it's certainly true BRIN indexes may be made fairly large (either
> by using something like bloom or by making the ranges much smaller). But
> those are exactly the indexes where building statistics for all columns
> at once is going to cause issues with memory usage etc.
>
> Note: Obviously, that depends on how much data per range we need to keep
> in memory. For bloom I doubt we'd actually want to keep all the filters,
> we'd probably calculate just some statistics (e.g. number of bits set),
> so maybe the memory consumption is not that bad.

Yes, I was thinking something along the same lines for bloom as well.
Something like 'average number of bits set' (or: histogram number of
bits set), and/or for each bit a count (or %) how many times it is
set, etc.

> >> Maybe there are cases where this would be an issue, but I haven't seen
> >> one when working on this patch (and I did a lot of experiments). I'd
> >> like to see one before we start optimizing it ...
> >
> > I'm not only worried about optimizing it, I'm also worried that we're
> > putting this abstraction at the wrong level in a way that is difficult
> > to modify.
> >
>
> Yeah, that's certainly a valid concern. I admit I only did the minimum
> amount of work on this part, as I was focused on the sorting part.
>
> >> This also reminds me that the issues I actually saw (e.g. memory
> >> consumption) would be made worse by processing all columns at once,
> >> because then you need to keep more columns in memory.
> >
> > Yes, I that can be a valid concern, but don't we already do similar
> > things in the current table statistics gathering?
> >
>
> Not really, I think. We sample a bunch of rows once, but then we build
> statistics on this sample for each attribute / expression independently.
> We could of course read the whole index into memory and then run the
> analysis, but I think tuples tend to be much smaller (thanks to TOAST
> etc.) and we only really scan a limited amount of them (30k).

Just to note, with default settings, sampling 30k index entries from
BRIN would cover ~29 GiB of a table. This isn't a lot, but it also
isn't exactly a small table. I think that it would be difficult to get
accurate avg_overlap statistics for some shapes of BRIN data...

> But if we're concerned about the I/O, the BRIN is likely fairly large,
> so maybe reading it into memory at once is not a great idea.

Agreed, we can't always expect that the interesting parts of the BRIN
index always fit in the available memory.

Kind regards,

Matthias van de Meent




Re: SLRUs in the main buffer pool - Page Header definitions

2023-02-27 Thread Heikki Linnakangas

On 08/02/2023 22:26, Andres Freund wrote:

On 2023-02-08 20:04:52 +, Bagga, Rishu wrote:

To summarize, our underlying effort is to move the SLRUs to the buffer
cache. We were working with Thomas Munro off a patch he introduced here
[1]. Munro’s patch moves SLRUs to the buffer cache by introducing the
pseudo db id 9 to denote SLRU pages, but maintains the current “raw”
data format of SLRU pages. The addition of page headers in our patch
resolves this issue [2] Munro mentions in this email [3].

Heikki Linnakangas then introduced  patch on top of Munro’s patch that
modularizes the storage manager, allowing SLRUs to use it [4]. Instead
of using db id 9, SLRUs use spcOid 9, and each SLRU is its own relation.
Here, Heikki simplifies the storage manager by having each struct be
responsible for just one fork of a relation; thus increasing
extensibility of the smgr API, including for SLRUs. [5] We integrated
our changes introducing page headers for SLRU pages, and upgrade logic
to Heikki’s latest patch.


That doesn't explain the bulk of the changes here.  Why e.g. does any of the
above require RelationGetSmgr() to handle the fork as well? Why do we need
smgrtruncate_multi()? And why does all of this happens as one patch?

As is, with a lot of changes mushed together, without distinct explanations
for why is what done, this patch is essentially unreviewable. It'll not make
progress in this form.

It doesn't help that much to reference prior discussions in the email I'm
responding to - the patches need to be mostly understandable on their own,
without reading several threads.  And there needs to be explanations in the
code as well, otherwise we'll have no chance to understand any of this in a
few years.


Agreed. I rebased this over my rebased patch set from the other thread 
at 
https://www.postgresql.org/message-id/02825393-615a-ac81-0f05-f3cc2e6f875f%40iki.fi. 
Attached is a new patch with only the changes relative to that patch set.


This is still messy, but now I can see what the point is: make the 
SLRUs, which are tracked in the main buffer pool thanks to the other 
patches, use the standard page header.


I'm not sure if I like that or not. I think we should clean up and 
finish the other patches that this builds on first, and then decide if 
we want to use the standard page header for the SLRUs or not. And if we 
decide that we want the SLRU pages to have a page header, clean this up 
or rewrite it from scratch.


- Heikki
From 81eca4ed2f929e4e9c2f3de4040042fc070a1462 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Mon, 27 Feb 2023 15:46:36 +0200
Subject: [PATCH 1/1] slru_to_buffer_cache_with_page_headers_v6.patch, rebased

from https://www.postgresql.org/message-id/BBA4E674-ABCC-4788-AE1C-8EB295B217FE%40amazon.com
---
 src/backend/access/transam/clog.c  |  47 ++--
 src/backend/access/transam/commit_ts.c |  53 +++-
 src/backend/access/transam/multixact.c | 191 -
 src/backend/access/transam/slru.c  |  21 +-
 src/backend/access/transam/subtrans.c  |  14 +-
 src/backend/commands/async.c   |  37 +--
 src/backend/commands/dbcommands.c  |   2 +-
 src/backend/storage/lmgr/predicate.c   |  15 +-
 src/backend/storage/page/bufpage.c |  21 ++
 src/bin/pg_upgrade/file.c  | 175 +++-
 src/bin/pg_upgrade/function.c  |  66 +
 src/bin/pg_upgrade/pg_upgrade.c|  74 -
 src/bin/pg_upgrade/pg_upgrade.h|  19 +-
 src/include/access/slru.h  |   3 +-
 src/include/access/slrudefs.h  |  19 ++
 src/include/storage/bufmgr.h   |  11 +-
 src/include/storage/bufpage.h  |  17 ++
 src/test/isolation/expected/stats.out  | 356 +
 src/test/isolation/specs/stats.spec|  60 +
 src/test/regress/expected/stats.out|  38 ---
 src/test/regress/expected/sysviews.out |   7 -
 src/test/regress/sql/stats.sql |  15 --
 src/test/regress/sql/sysviews.sql  |   3 -
 23 files changed, 659 insertions(+), 605 deletions(-)
 create mode 100644 src/include/access/slrudefs.h

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index b6f5ae987b1..b6c76482ccb 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -61,7 +61,7 @@
 /* We need two bits per xact, so four xacts fit in a byte */
 #define CLOG_BITS_PER_XACT	2
 #define CLOG_XACTS_PER_BYTE 4
-#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)
+#define CLOG_XACTS_PER_PAGE ((BLCKSZ - SizeOfPageHeaderData) * CLOG_XACTS_PER_BYTE)
 #define CLOG_XACT_BITMASK	((1 << CLOG_BITS_PER_XACT) - 1)
 
 #define TransactionIdToPage(xid)	((xid) / (TransactionId) CLOG_XACTS_PER_PAGE)
@@ -86,7 +86,7 @@
 
 static Buffer ZeroCLOGPage(int pageno, bool writeXlog);
 static bool CLOGPagePrecedes(int page1, int page2);
-static void WriteZeroPageXlogRec(int pageno);
+static XLogRecPtr WriteZeroPageXlogRec(int pageno);
 static void WriteTruncateXlogRec(int pageno, TransactionId oldest

Re: libpq: PQgetCopyData() and allocation overhead

2023-02-27 Thread Bharath Rupireddy
On Fri, Feb 10, 2023 at 5:49 PM Jeroen Vermeulen  wrote:
>
> OK, I've updated the PR with a benchmark (in the main directory).
>
> On this benchmark I'm seeing about a 24% reduction in "user" CPU time, and a 
> 8% reduction in "system" CPU time.  (Almost no reduction in wall-clock time.)

I can help run some logical replication performance benchmarks
tomorrow. Would you mind cleaning the PR and providing the changes
(there are multiple commits in the PR) as a single patch here for the
sake of ease of review and test?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: meson vs make: missing/inconsistent ENV

2023-02-27 Thread Andrew Dunstan


On 2023-02-27 Mo 07:33, Andrew Dunstan wrote:



On 2023-02-27 Mo 06:17, Dagfinn Ilmari Mannsåker wrote:

Justin Pryzby  writes:


On Sun, Feb 26, 2023 at 03:21:04PM -0800, Andres Freund wrote:

Is there any consideration of promoting these or other warnings to
fatal?

You mean the perl warnings?

Yes - it'd be nice if the warnings caused an obvious failure to allow
addressing the issue.  I noticed the icu warning while looking at a bug
in 0da243fed, and updating to add ZSTD.

Perl warnings can be made fatal with `use warnings FATAL => <categories>;`,
but one should be careful about which categories to fatalise, per.

Some categories are inherently unsafe to fatalise, as documented in
.
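
A minimal sketch of that pattern (the category picked here is only an
example of one that is unsafe to fatalise):

use strict;
use warnings FATAL => 'all', NONFATAL => 'exec';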



Yeah.


It would be nice if there were some fuller explanation of the various 
categories, but I don't know of one.






Looks like the explanations are in the perldiag manual page. 




cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: Stale references to guc.c in comments/tests

2023-02-27 Thread Daniel Gustafsson
> On 24 Feb 2023, at 16:19, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>> I happened to notice that there were a few references to guc.c regarding
>> variables, which with the recent refactoring in 0a20ff54f have become stale.
>> Attached is a trivial patch to instead point to guc_tables.c.
> 
> Hmm, I think you may have done an overenthusiastic replacement here.

Fair enough, I only changed those places I felt referenced variables, or their
definition, in guc_tables.c but I agree that there is a lot of greyzone in the
interpretation.

> Perhaps you could use "the GUC mechanisms" in these places, but it's a bit
> longer than "guc.c".  Leaving such references alone seems OK too.

I've opted for mostly leaving them in the attached v2.

--
Daniel Gustafsson



v2-0001-Fix-outdated-references-to-guc.c.patch
Description: Binary data


Re: Allow logical replication to copy tables in binary format

2023-02-27 Thread Bharath Rupireddy
On Tue, Feb 21, 2023 at 7:18 PM Bharath Rupireddy
 wrote:
>
> On Mon, Feb 20, 2023 at 5:17 PM Melih Mutlu  wrote:
> >
> > Thanks for letting me know.
> > Attached the fixed version of the patch.
>
> Thanks. I have few comments on v9 patch:
>
> 1.
> +/* Do not allow binary = false with copy_format = binary 
> */
> +if (!opts.binary &&
> +sub->copyformat == LOGICALREP_COPY_AS_BINARY &&
> +!IsSet(opts.specified_opts, SUBOPT_COPY_FORMAT))
> +ereport(ERROR,
> +
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("cannot set %s for a
> subscription with %s",
> +"binary = false",
> "copy_format = binary")));
>
> I don't understand why we'd need to tie an option (binary) that deals
> with data types at column-level with another option (copy_format) that
> requests the entire table data to be in binary. This'd essentially
> make one to set binary = true to use copy_format = binary, no? IMHO,
> this inter-dependency is not good for better usability.
>
> 2. Why can't the tests that this patch adds be simple? Why would it
> need to change the existing tests at all? I'm thinking to create a new
> 00X_binary_copy_format.pl or such and setting up logical replication
> with copy_format = binary and letting table sync worker request
> publisher in binary format - you can verify this via publisher server
> logs - look for COPY with BINARY option. If required, have the table
> with different data types. This greatly reduces the patch's footprint.
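
For illustration, the two options from point 1 above would be combined like
this (copy_format is the option proposed by the patch under review, not an
existing parameter; connection and publication names are made up):

CREATE SUBSCRIPTION mysub
  CONNECTION 'host=publisher dbname=postgres'
  PUBLICATION mypub
  WITH (binary = true, copy_format = binary);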

I've done performance testing with the v9 patch.

I can consistently observe a 1.34X improvement (a 25% reduction in time) in
table sync/copy performance with the patch:
HEAD binary = false
Time: 214398.637 ms (03:34.399)

PATCHED binary = true, copy_format = binary:
Time: 160509.262 ms (02:40.509)

There's a clear reduction (5.68% to 4.49%) in the CPU cycles spent in
copy_table with the patch:
perf report HEAD:
-5.68% 0.00%  postgres  postgres  [.] copy_table
   - copy_table
  - 5.67% CopyFrom
 - 4.26% NextCopyFrom
- 2.16% NextCopyFromRawFields
   - 1.55% CopyReadLine
  - 1.52% CopyReadLineText
 - 0.76% CopyLoadInputBuf
  0.50% CopyConvertBuf
 0.60% CopyReadAttributesText
- 1.93% InputFunctionCall
 0.69% timestamptz_in
 0.53% byteain
 - 0.73% CopyMultiInsertInfoFlush
- 0.73% CopyMultiInsertBufferFlush
   - 0.66% table_multi_insert
0.65% heap_multi_insert

perf report PATCHED:
-4.49% 0.00%  postgres  postgres  [.] copy_table
   - copy_table
  - 4.48% CopyFrom
 - 2.36% NextCopyFrom
- 1.81% CopyReadBinaryAttribute
 1.47% ReceiveFunctionCall
 - 1.21% CopyMultiInsertInfoFlush
- 1.21% CopyMultiInsertBufferFlush
   - 1.11% table_multi_insert
  - 1.09% heap_multi_insert
 - 0.71% RelationGetBufferForTuple
- 0.63% ReadBufferBI
   - 0.62% ReadBufferExtended
0.62% ReadBuffer_common

I've tried to choose the table columns such that the binary send/recv
vs non-binary/plain/text copy has some benefit. The table has 100mn
rows, and is of 11GB size. I've measured the benefit using Melih's
helper function wait_for_rep(). Note that I had to compile source code
with -ggdb3 -O0 for perf report, otherwise with -O3 for performance
numbers:

wal_level = 'logical'
shared_buffers = '8GB'
wal_buffers = '1GB'
max_wal_size = '32GB'

create table foo(i int, n numeric, v varchar, b bytea, t timestamp
with time zone default current_timestamp);
insert into foo select i, i+1, md5(i::text), md5(i::text)::bytea from
generate_series(1, 100000000) i;

CREATE OR REPLACE PROCEDURE wait_for_rep()
LANGUAGE plpgsql
AS $$
BEGIN
WHILE (SELECT count(*) != 0 FROM pg_subscription_rel WHERE
srsubstate <> 'r') LOOP COMMIT;
END LOOP;
END;
$$;

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Adding argument names to aggregate functions

2023-02-27 Thread Dagfinn Ilmari Mannsåker
Hi hackers,

I'm sure I'm not the only one who can never remember which way around
the value and delimiter arguments go for string_agg() and has to look it
up in the manual every time. To make it more convenient, here's a patch
that adds proargnames to its pg_proc entries so that it can be seen with
a quick \df in psql.

I also added names to json(b)_object_agg() for good measure, even though
they're more obvious. The remaining built-in multi-argument aggregate
functions are the stats-related ones, where it's all just Y/X (but why
in that order?), so I didn't think it was necessary. If others feel more
strongly, I can add those too.
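
For illustration (ad-hoc queries of mine, not part of the patch), these are
the orderings the argument names are meant to make obvious:

-- string_agg(value, delimiter): the value comes first, the delimiter second
SELECT string_agg(attname, ', ')
FROM pg_attribute
WHERE attrelid = 'pg_class'::regclass AND attnum > 0;

-- json_object_agg(key, value): the key comes first, the value second
SELECT json_object_agg(name, setting)
FROM pg_settings
WHERE name LIKE 'log_%';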

- ilmari

>From 73f323d5e97dca2e2452f5be199864a8358559c5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Mon, 27 Feb 2023 13:06:29 +
Subject: [PATCH] Add argument names to multi-argument aggregates

This makes it easier to see which way around the arguments go when
using \dfa.  This is particularly relevant for string_agg(), but add
it to json(b)_object_agg() too for good measure.
---
 src/include/catalog/pg_proc.dat | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index e2a7642a2b..f96d29278f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -4988,6 +4988,7 @@
 { oid => '3538', descr => 'concatenate aggregate input into a string',
   proname => 'string_agg', prokind => 'a', proisstrict => 'f',
   prorettype => 'text', proargtypes => 'text text',
+  proargnames => '{value,delimiter}',
   prosrc => 'aggregate_dummy' },
 { oid => '3543', descr => 'aggregate transition function',
   proname => 'bytea_string_agg_transfn', proisstrict => 'f',
@@ -5000,6 +5001,7 @@
 { oid => '3545', descr => 'concatenate aggregate input into a bytea',
   proname => 'string_agg', prokind => 'a', proisstrict => 'f',
   prorettype => 'bytea', proargtypes => 'bytea bytea',
+  proargnames => '{value,delimiter}',
   prosrc => 'aggregate_dummy' },
 
 # To ASCII conversion
@@ -8899,6 +8901,7 @@
 { oid => '3197', descr => 'aggregate input into a json object',
   proname => 'json_object_agg', prokind => 'a', proisstrict => 'f',
   provolatile => 's', prorettype => 'json', proargtypes => 'any any',
+  proargnames => '{key,value}',
   prosrc => 'aggregate_dummy' },
 { oid => '3198', descr => 'build a json array from any inputs',
   proname => 'json_build_array', provariadic => 'any', proisstrict => 'f',
@@ -9791,6 +9794,7 @@
 { oid => '3270', descr => 'aggregate inputs into jsonb object',
   proname => 'jsonb_object_agg', prokind => 'a', proisstrict => 'f',
   prorettype => 'jsonb', proargtypes => 'any any',
+  proargnames => '{key,value}',
   prosrc => 'aggregate_dummy' },
 { oid => '3271', descr => 'build a jsonb array from any inputs',
   proname => 'jsonb_build_array', provariadic => 'any', proisstrict => 'f',
-- 
2.39.1



[PoC] Add CANONICAL option to xmlserialize

2023-02-27 Thread Jim Jones

Hi,

In order to compare pairs of XML documents for equivalence it is 
necessary to convert them first to their canonical form, as described in 
W3C Canonical XML 1.1 [1]. This spec basically defines a standard 
physical representation of xml documents that have more than one 
possible representation, so that it is possible to compare them, e.g. 
forcing UTF-8 encoding, entity reference replacement, attribute 
normalization, etc.


Although it is not part of the XML/SQL standard, it would be nice to 
have the option CANONICAL in xmlserialize. Additionally, we could also 
add the attribute WITH [NO] COMMENTS to keep or remove xml comments from 
the documents.


Something like this:

WITH t(col) AS (
 VALUES
  ('
  
  
  ]>

  
  http://postgresql.org";>

    
    
 http://postgresql.org"; >&val;

    
    

    
    ©

    
    
     321 
  
  '::xml)
)
SELECT xmlserialize(DOCUMENT col AS text CANONICAL) FROM t;
xmlserialize

 http://postgresql.org"; ns:a="1" ns:b="2" 
ns:c="3">42©attr="default"> 321 

(1 row)

-- using WITH COMMENTS

WITH t(col) AS (
 VALUES
  (' http://postgresql.org";>
    
     321 
  '::xml)
)
SELECT xmlserialize(DOCUMENT col AS text CANONICAL WITH COMMENTS) FROM t;
xmlserialize

 http://postgresql.org"; ns:a="1" ns:b="2" ns:c="3"> 321 

(1 row)


Another option would be to simply create a new function, e.g. 
xmlcanonical(doc xml, keep_comments boolean), but I'm not sure if this 
would be the right approach.
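
Either way, the equivalence check this is ultimately meant for could then be
written roughly like this (a sketch using the CANONICAL syntax from the draft;
attribute order and empty-element notation are normalized away):

WITH docs(a, b) AS (
  VALUES ('<doc b="2" a="1"><x/></doc>'::xml,
          '<doc a="1" b="2"><x></x></doc>'::xml)
)
SELECT xmlserialize(DOCUMENT a AS text CANONICAL)
     = xmlserialize(DOCUMENT b AS text CANONICAL) AS equivalent
FROM docs;
-- both sides canonicalize to <doc a="1" b="2"><x></x></doc>, so this returns true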


Attached a very short draft. What do you think?

Best, Jim

1- https://www.w3.org/TR/xml-c14n11/
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 19351fe34b..f8f10f0ed9 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3829,6 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
 			{
 Datum	   *argvalue = op->d.xmlexpr.argvalue;
 bool	   *argnull = op->d.xmlexpr.argnull;
+XmlSerializeFormat	format = op->d.xmlexpr.xexpr->format;
+text	   *data;
 
 /* argument type is known to be xml */
 Assert(list_length(xexpr->args) == 1);
@@ -3837,9 +3839,15 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
 	return;
 value = argvalue[0];
 
-*op->resvalue = PointerGetDatum(xmltotext_with_xmloption(DatumGetXmlP(value),
-		 xexpr->xmloption));
 *op->resnull = false;
+
+data = xmltotext_with_xmloption(DatumGetXmlP(value),
+xexpr->xmloption);
+
+if (format == XMLDEFAULT_FORMAT)
+	*op->resvalue = PointerGetDatum(data);
+else if (format == XMLCANONICAL || format == XMLCANONICAL_WITH_COMMENTS)
+	*op->resvalue = PointerGetDatum(xmlserialize_canonical(data,format));
 			}
 			break;
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a0138382a1..af5f3dfdfd 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -619,6 +619,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	xmltable_column_option_el
 %type 	xml_namespace_list
 %type 	xml_namespace_el
+%type  	opt_xml_serialize_format
 
 %type 	func_application func_expr_common_subexpr
 %type 	func_expr func_expr_windowless
@@ -676,7 +677,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	BACKWARD BEFORE BEGIN_P BETWEEN BIGINT BINARY BIT
 	BOOLEAN_P BOTH BREADTH BY
 
-	CACHE CALL CALLED CASCADE CASCADED CASE CAST CATALOG_P CHAIN CHAR_P
+	CACHE CALL CALLED CANONICAL CASCADE CASCADED CASE CAST CATALOG_P CHAIN CHAR_P
 	CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE
 	CLUSTER COALESCE COLLATE COLLATION COLUMN COLUMNS COMMENT COMMENTS COMMIT
 	COMMITTED COMPRESSION CONCURRENTLY CONFIGURATION CONFLICT
@@ -15532,13 +15533,14 @@ func_expr_common_subexpr:
 	$$ = makeXmlExpr(IS_XMLROOT, NULL, NIL,
 	 list_make3($3, $5, $6), @1);
 }
-			| XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename ')'
+			| XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename opt_xml_serialize_format ')'
 {
 	XmlSerialize *n = makeNode(XmlSerialize);
 
 	n->xmloption = $3;
 	n->expr = $4;
 	n->typeName = $6;
+	n->format = $7;
 	n->location = @1;
 	$$ = (Node *) n;
 }
@@ -15622,6 +15624,12 @@ xml_passing_mech:
 			| BY VALUE_P
 		;
 
+opt_xml_serialize_format:
+			CANONICAL{ $$ = XMLCANONICAL; }
+			| CANONICAL WITH NO COMMENTS			{ $$ = XMLCANONICAL; }
+			| CANONICAL WITH COMMENTS{ $$ = XMLCANONICAL_WITH_COMMENTS; }
+			| /*EMPTY*/{ $$ = XMLDEFAULT_FORMAT; }
+		;
 
 /*
  * Aggregate decoration clauses
@@ -16737,6 +16745,7 @@ unreserved_keyword:
 			| CACHE
 			| CALL
 			| CALLED
+			| CANONICAL
 			| CASCADE
 

Re: buildfarm + meson

2023-02-27 Thread Juan José Santamaría Flecha
On Fri, Feb 24, 2023 at 2:22 PM Andrew Dunstan  wrote:

>
> On drongo, this test isn't failing, and I think the reason is that it runs
> "make NO_LOCALE=1 check" so it never gets a database with win1252 encoding.
>
> I'm going to try adding a win1252 test to drongo's locales.
>

What seems to be failing is the setlocale() for 'de_DE'. I haven't been
able to reproduce it locally, but I've seen something similar reported for
python [1].

As a workaround, can you please test "SET lc_time TO 'de-DE';"?

[1] https://bugs.python.org/issue36792

Regards,

Juan José Santamaría Flecha


Add error functions: erf() and erfc()

2023-02-27 Thread Dean Rasheed
Now that we have random_normal(), it seems like it would be useful to
add the error functions erf() and erfc(), which I think are
potentially useful to the people who will find random_normal() useful,
and possibly others.

An immediate use for erf() is that it allows us to do a
Kolmogorov-Smirnov test for random_normal(), similar to the one for
random().
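
For example (a sketch, assuming the new function is in place), the reference
CDF such a test compares the empirical distribution of random_normal()
samples against is just:

-- standard normal CDF expressed via erf(); the K-S statistic is the maximum
-- absolute difference between this and the empirical CDF of the samples
SELECT x, 0.5 * (1 + erf(x / sqrt(2))) AS normal_cdf
FROM generate_series(-3.0, 3.0, 1.5) AS t(x);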

Both of these functions are defined in POSIX and C99, so in theory
they should be available on all platforms. If that turns out not to be
the case, then there's a commonly used implementation (e.g., see [1]),
which we could include. I played around with that (replacing the
direct bit manipulation stuff with frexp()/ldexp(), see pg_erf.c
attached), and it appeared to be accurate to +/-1 ULP across the full
range of inputs. Hopefully we won't need that though.

I tested this on a couple of different platforms and found I needed to
reduce extra_float_digits to -1 to get the regression tests to pass
consistently, due to rounding errors. It wouldn't surprise me if that
needs to be reduced further, though perhaps it's not necessary to have
so many tests (I included one test value from each branch, while
testing the hand-rolled implementation).

Regards,
Dean

[1] https://github.com/lsds/musl/blob/master/src/math/erf.c
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index 0cbdf63..e30e285
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1289,6 +1289,41 @@ repeat('Pg', 4) PgPgPgPg

 
+ erf
+
+erf ( double precision )
+double precision
+   
+   
+Error function
+   
+   
+erf(1.0)
+0.8427007929497149
+   
+  
+
+  
+   
+
+ erfc
+
+erfc ( double precision )
+double precision
+   
+   
+Complementary error function (1 - erf(x), without
+loss of precision for large inputs)
+   
+   
+erfc(1.0)
+0.15729920705028513
+   
+  
+
+  
+   
+
  exp
 
 exp ( numeric )
diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
new file mode 100644
index d290b4c..aa79487
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -2742,6 +2742,53 @@ datanh(PG_FUNCTION_ARGS)
 }
 
 
+/* == ERROR FUNCTIONS == */
+
+
+/*
+ *		derf			- returns the error function: erf(arg1)
+ */
+Datum
+derf(PG_FUNCTION_ARGS)
+{
+	float8		arg1 = PG_GETARG_FLOAT8(0);
+	float8		result;
+
+	/*
+	 * For erf, we don't need an errno check because it never overflows.
+	 */
+	result = erf(arg1);
+
+	if (unlikely(isinf(result)))
+		float_overflow_error();
+
+	PG_RETURN_FLOAT8(result);
+}
+
+/*
+ *		derfc			- returns the complementary error function: 1 - erf(arg1)
+ */
+Datum
+derfc(PG_FUNCTION_ARGS)
+{
+	float8		arg1 = PG_GETARG_FLOAT8(0);
+	float8		result;
+
+	/*
+	 * For erfc, we don't need an errno check because it never overflows.
+	 */
+	result = erfc(arg1);
+
+	if (unlikely(isinf(result)))
+		float_overflow_error();
+
+	PG_RETURN_FLOAT8(result);
+}
+
+
+/* == RANDOM FUNCTIONS == */
+
+
 /*
  * initialize_drandom_seed - initialize drandom_seed if not yet done
  */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
new file mode 100644
index e2a7642..4ae8fef
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -3465,6 +3465,13 @@
   proname => 'atanh', prorettype => 'float8', proargtypes => 'float8',
   prosrc => 'datanh' },
 
+{ oid => '8788', descr => 'error function',
+  proname => 'erf', prorettype => 'float8', proargtypes => 'float8',
+  prosrc => 'derf' },
+{ oid => '8789', descr => 'complementary error function',
+  proname => 'erfc', prorettype => 'float8', proargtypes => 'float8',
+  prosrc => 'derfc' },
+
 { oid => '1618',
   proname => 'interval_mul', prorettype => 'interval',
   proargtypes => 'interval float8', prosrc => 'interval_mul' },
diff --git a/src/test/regress/expected/float8.out b/src/test/regress/expected/float8.out
new file mode 100644
index 2b25784..0e89e24
--- a/src/test/regress/expected/float8.out
+++ b/src/test/regress/expected/float8.out
@@ -790,6 +790,45 @@ SELECT atanh(float8 'nan');
NaN
 (1 row)
 
+-- error functions
+-- we run these with extra_float_digits = -1, to get consistently rounded
+-- results on all platforms.
+SET extra_float_digits = -1;
+SELECT x,
+   erf(x),
+   erfc(x)
+FROM (VALUES (float8 '-infinity'),
+  (-28), (-6), (-3.4), (-2.1), (-1.1), (-0.45),
+  (-1.2e-9), (-2.3e-13), (-1.2e-17), (0),
+  (1.2e-17), (2.3e-13), (1.2e-9),
+  (0.45), (1.1), (2.1), (3.4), (6), (28),
+  (float8 'infinity'), (float8 'nan')) AS t(x);
+ x | erf  |erfc 
+---+--+-
+ -Infinity |   -1 |   2
+   -28 |   -1 | 

Re: meson vs make: missing/inconsistent ENV

2023-02-27 Thread Andrew Dunstan


On 2023-02-27 Mo 06:17, Dagfinn Ilmari Mannsåker wrote:

Justin Pryzby  writes:


On Sun, Feb 26, 2023 at 03:21:04PM -0800, Andres Freund wrote:

Is there any consideration of promoting these or other warnings to
fatal?

You mean the perl warnings?

Yes - it'd be nice if the warnings caused an obvious failure to allow
addressing the issue.  I noticed the icu warning while looking at a bug
in 0da243fed, and updating to add ZSTD.

Perl warnings can be made fatal with `use warnings FATAL =>
<categories>;`, but one should be careful about which categories to
fatalise.

Some categories are inherently unsafe to fatalise, as documented in
the warnings pragma documentation.



Yeah.


It would be nice if there were some fuller explanation of the various 
categories, but I don't know of one.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: meson vs make: missing/inconsistent ENV

2023-02-27 Thread Dagfinn Ilmari Mannsåker
Dagfinn Ilmari Mannsåker  writes:

> Justin Pryzby  writes:
>
>> On Sun, Feb 26, 2023 at 03:21:04PM -0800, Andres Freund wrote:
>>> > Is there any consideration of promoting these or other warnings to
>>> > fatal?
>>> 
>>> You mean the perl warnings?
>>
>> Yes - it'd be nice if the warnings caused an obvious failure to allow
>> addressing the issue.  I noticed the icu warning while looking at a bug
>> in 0da243fed, and updating to add ZSTD.  
>
> Perl warnings can be made fatal with `use warnings FATAL =>
> <categories>;`, but one should be careful about which categories to
> fatalise.
>
> Some categories are inherently unsafe to fatalise, as documented in
> the warnings pragma documentation.

One disadvantage of making the warnings fatal is that it immediately
aborts the test.  Another option would be to turn warnings into test
failures, à la https://metacpan.org/pod/Test::Warnings or
https://metacpan.org/pod/Test::FailWarnings.  Both those modules support
all the Perl versions we do, and have no non-core dependencies, but if
we don't want to add any more dependencies we can incorporate the logic
into one of our own testing modules.

- ilmari




Re: how does postgresql handle LOB/CLOB/BLOB column data that dies before the query ends

2023-02-27 Thread Noel Grandin
On Sat, 25 Feb 2023 at 19:06, Tom Lane  wrote:

> That could be a piece of the puzzle, yeah.
>
>
Thank you very much, this conversation has been a great help.

Regards, Noel Grandin


Re: WIN32 pg_import_system_collations

2023-02-27 Thread Andrew Dunstan


On 2023-02-26 Su 16:02, Andrew Dunstan wrote:



On 2023-01-03 Tu 08:48, Peter Eisentraut wrote:

On 09.12.22 13:48, Juan José Santamaría Flecha wrote:
On Thu, Dec 1, 2022 at 8:46 AM Peter Eisentraut wrote:



    What is the status of this now?  I think the other issue has been
    addressed?


Yes, that's addressed for MSVC builds. I think there are a couple of 
pending issues for MinGW, but those should have their own threads.


The patch had rotten, so PFA a rebased version.


committed




Now that I have removed the barrier to testing this in the buildfarm, 
and added an appropriate locale setting to drongo, we can see that 
this test fails like this:



diff -w -U3 
c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/expected/collate.windows.win1252.out
 
c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/results/collate.windows.win1252.out
--- 
c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/expected/collate.windows.win1252.out
  2023-01-23 04:39:06.755149600 +
+++ 
c:/prog/bf/root/HEAD/pgsql.build/src/test/regress/results/collate.windows.win1252.out
   2023-02-26 17:32:54.115515200 +
@@ -363,16 +363,17 @@
  
  -- to_char

  SET lc_time TO 'de_DE';
+ERROR:  invalid value for parameter "lc_time": "de_DE"
  SELECT to_char(date '2010-03-01', 'DD TMMON YYYY');
 to_char
  -
- 01 MRZ 2010
+ 01 MAR 2010
  (1 row)
  
  SELECT to_char(date '2010-03-01', 'DD TMMON YYYY' COLLATE "de_DE");

 to_char
  -
- 01 MRZ 2010
+ 01 MAR 2010
  (1 row)
  
  -- to_date



The last of these is especially an issue, as it doesn't even throw an 
error.


See 








Further investigation shows that if we change the two instances of 
"de_DE" to "de-DE" the tests behave as expected, so it appears that 
while POSIX style aliases have been created for the BCP 47 style 
locales, using the POSIX aliases doesn't in fact work. I can't see 
anything that turns the POSIX locale name back into BCP 47 at the point 
of use, which seems to be what's needed.
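
Roughly, the adjusted (working) form of the failing test lines, using the 
BCP 47 spelling instead of the POSIX alias:

SET lc_time TO 'de-DE';
SELECT to_char(date '2010-03-01', 'DD TMMON YYYY');
-- now gives the expected  01 MRZ 2010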



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: meson vs make: missing/inconsistent ENV

2023-02-27 Thread Dagfinn Ilmari Mannsåker
Justin Pryzby  writes:

> On Sun, Feb 26, 2023 at 03:21:04PM -0800, Andres Freund wrote:
>> > Is there any consideration of promoting these or other warnings to
>> > fatal?
>> 
>> You mean the perl warnings?
>
> Yes - it'd be nice if the warnings caused an obvious failure to allow
> addressing the issue.  I noticed the icu warning while looking at a bug
> in 0da243fed, and updating to add ZSTD.  

Perl warnings can be made fatal with `use warnings FATAL =>
<categories>;`, but one should be careful about which categories to
fatalise.

Some categories are inherently unsafe to fatalise, as documented in
the warnings pragma documentation.

- ilmari




Re: SQL/JSON revisited

2023-02-27 Thread Amit Langote
On Mon, Feb 27, 2023 at 4:45 PM Amit Langote  wrote:
> On Tue, Feb 21, 2023 at 2:25 AM Andres Freund  wrote:
> > Evaluating N expressions for a json table isn't a good approach, both memory
> > and CPU efficiency wise.
>
> Are you referring to JsonTableInitOpaque() initializing all these
> sub-expressions of JsonTableParent, especially colvalexprs, using N
> *independent* ExprStates?  That could perhaps be made to work by
> making JsonTableParent be an expression recognized by execExpr.c, so
> that a single ExprState can store the steps for all its
> sub-expressions, much like JsonExpr is.  I'll give that a try, though
> I wonder if the semantics of making this work in a single
> ExecEvalExpr() call will mismatch that of the current way, because
> different sub-expressions are currently evaluated under different APIs
> of TableFuncRoutine.

I was looking at this and realized that using N ExprStates for various
subsidiary expressions is not something specific to JSON_TABLE
implementation.  I mean we already have bunch of ExprStates being
created in ExecInitTableFuncScan():

scanstate->ns_uris =
ExecInitExprList(tf->ns_uris, (PlanState *) scanstate);
scanstate->docexpr =
ExecInitExpr((Expr *) tf->docexpr, (PlanState *) scanstate);
scanstate->rowexpr =
ExecInitExpr((Expr *) tf->rowexpr, (PlanState *) scanstate);
scanstate->colexprs =
ExecInitExprList(tf->colexprs, (PlanState *) scanstate);
scanstate->coldefexprs =
ExecInitExprList(tf->coldefexprs, (PlanState *) scanstate);

Or maybe you're worried about jsonpath_exec.c using so many ExprStates
*privately* to put into TableFuncScanState.opaque?

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: Allow logical replication to copy tables in binary format

2023-02-27 Thread Bharath Rupireddy
On Fri, Feb 24, 2023 at 8:32 AM Peter Smith  wrote:
>
> Here is my summary of this thread so far, plus some other thoughts.

Thanks.

> 1. Initial Goal
> --
>
> Currently, the CREATE SUBSCRIPTION ... WITH(binary=true) mode does
> data replication in binary mode, but tablesync COPY phase is still
> only in text mode. IIUC Melih just wanted to unify those phases so
> binary=true would mean also do the COPY phase in binary [1].
>
> Actually, this was my very first expectation too.
>
> 2. Objections to unification
> ---
>
> Bharath posted suggesting tying the COPY/replication parts is not a
> good idea [2]. But if these are not to be unified then it requires a
> new subscription option to be introduced, and currently, the thread
> refers to this new option as copy_format=binary.

Looking closely at the existing binary=true option and COPY's binary
protocol, essentially they depend on the data type's binary send and
receive functions.

> 3. A problem: binary replication is not always binary!
> --
>
> Shi-san reported [3] that under special circumstances (e.g. if the
> datatype has no binary output function) the current HEAD binary=true
> mode for replication has the ability to fall back to text replication.
> Since the binary COPY doesn't do this, it means binary COPY could fail
> in some cases where the binary=true replication will be OK.

Right. Apply workers can fall back to text mode transparently, whereas
with binary COPY it's a bit difficult to do so.

> AFAIK, this means that even if we *wanted* to unify everything with
> just 'binary=true' it can't be done like that.

Hm, it looks like that.

> 4. New option is needed
> -
>
> OK, so let's assume these options must be separated (because of the
> problem of #3).
>
> ~
>
> 4a. New string option called copy_format?
>
> Currently, the patch/thread describes a new 'copy_format' string
> option with values 'text' (default) and 'binary'.
>
> Why? If there are only 2 values then IMO it would be better to have a
> *boolean* option called something like binary_copy=true/false.
>
> ~
>
> 4b. Or, we could extend copy_data
>
> Jelte suggested [4] we could extend copy_data = 'text' (aka on/true)
> OR 'binary' OR 'none' (aka off/false).

How about copy_binary = {true/false}? So, the options for the user are:

copy_binary - defaults to false; when true, the subscriber requests the
publisher to send pre-existing table data in binary format (as opposed
to text) during the data synchronization/table copy phase. It is
recommended to enable this option only when 1) the column data types
have appropriate binary send/receive functions, 2) not replicating
between different major versions or different platforms, 3) both
publisher and subscriber tables have the exact same column types (not
when replicating from smallint to int or numeric to int8 and so on),
and 4) both publisher and subscriber support COPY with the binary
option; otherwise the table copy can fail.

binary - defaults to false; when true, the subscriber requests the
publisher to send table data in binary format (as opposed to text)
during the normal replication phase.

Note that with this we make the user responsible for choosing
copy_binary rather than us trying to be smart.
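
From the user's side that would look roughly like this (a sketch; names and
connection string are illustrative, copy_binary being the option proposed
above):

CREATE SUBSCRIPTION mysub
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION mypub
    WITH (binary = true, copy_binary = true);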

> AFAIK currently the patch disallows some combinations:
>
> + ereport(ERROR,
> + (errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("%s and %s are mutually exclusive options",
> + "binary = false", "copy_format = binary")));
>
> 6. pub/sub version checking
> 
>
> There were some discussions about doing some server checking to reject
> some PG combinations from being allowed to use the copy_format=binary.

IMHO, these restrictions make the feature more complicated to use and
should be removed, and the options should be made simple to use
(usability and simplicity clearly win the race). If there's any kind of
feedback from the users/developers, we can always come back and improve.

> But if the publication was defined as FOR ALL TABLES, or as ALL TABLES
> IN SCHEMA, then I think the tablesync can still crash if a user
> creates a new table that suffers the same COPY binary trouble Shi-san
> described earlier [3].
>
> What is the user supposed to do when that tablesync fails?
>
> They had no way to predict it could happen, And it will be too painful
> to have to DROP and re-create the entire SUBSCRIPTION again now that
> it has happened.

Can't ALTER SUBSCRIPTION ... SET (copy_format = 'text'); and ALTER
SUBSCRIPTION ... REFRESH PUBLICATION WITH (copy_data = true); work
here instead of dropping and recreating the subscription?
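
I.e., something along these lines (subscription name illustrative), rather
than a full drop and re-create:

ALTER SUBSCRIPTION mysub SET (copy_format = 'text');
ALTER SUBSCRIPTION mysub REFRESH PUBLICATION WITH (copy_data = true);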

> 7a. A thought bubble
>
> I wondered (perhaps this is naive) would it be it possible to enhance
> the COPY command to enhance the "binary" mode so it can be made to
> fall back to text mode if it needs to in the same way that binary
> replication does.
>
> If such an enhanced COPY format mode worked, th

Re: pg_upgrade and logical replication

2023-02-27 Thread Amit Kapila
On Sun, Feb 26, 2023 at 8:35 AM Julien Rouhaud  wrote:
>
> On Sat, Feb 25, 2023 at 11:24:17AM +0530, Amit Kapila wrote:
> > >
> > > The new command is of the form
> > >
> > > ALTER SUBSCRIPTION name ADD TABLE (relid = X, state = 'Y', lsn = 'Z/Z')
> > >
> > > with the lsn part being optional.
> > >
> >
> > BTW, do we restore the origin and its LSN after the upgrade? Because
> > without that this won't be sufficient as that is required for apply
> > worker to ensure that it is in sync with table sync workers.
>
> We currently don't, which is yet another sign that no one actually tried to
> resume logical replication after a pg_upgrade.  That being said, trying to
> pg_upgrade a node that's currently syncing relations seems like a bad idea
> (I didn't even think to try), but I guess it should also be supported.  I will
> work on that too.  Assuming we add a new option for controlling either plain
> pg_dump and/or pg_upgrade behavior, should this option control both
> pg_subscription_rel and replication origins and their data or do we need more
> granularity?
>

My vote would be to have one option for both. BTW, thinking some more
on this, how will we allow replication to continue after upgrading the
publisher? During upgrade, we don't retain slots, so the replication
won't continue. I think after upgrading the subscriber node, the user
will need to upgrade the publisher as well.

-- 
With Regards,
Amit Kapila.




RE: Rework LogicalOutputPluginWriterUpdateProgress

2023-02-27 Thread wangw.f...@fujitsu.com
On Thur, Feb 23, 2023 at 18:41 PM Kuroda, Hayato/黒田 隼人 
 wrote:
> Dear Wang,
> 
> Thank you for making the patch. IIUC your patch basically can achieve that
> output plugin
> does not have to call UpdateProgress.

Thanks for your review and comments.

> I think the basic approach is as follows, is it right?
> 
> 1. In *_cb_wrapper, set ctx->did_write to false
> 2. In OutputPluginWrite() set ctx->did_write to true.
>This means that changes are really written, not skipped.
> 3. At the end of the transaction, call update_progress_and_keepalive().
>Even if we are not at the end, check skipped count and call the function if
> needed.
>The counter will be reset if ctx->did_write is true or we exceed the 
> threshold.

Yes, you are right.
For the reset of the counter, please also refer to the reply to #05.

> Followings are my comments. I apologize if I missed some previous discussions.
> 
> 01. logical.c
> 
> ```
> +static void update_progress_and_keepalive(struct LogicalDecodingContext *ctx,
> + 
> bool finished_xact);
> +
> +static bool is_skip_threshold_change(struct LogicalDecodingContext *ctx);
> ```
> 
> "struct" may be not needed.

Removed.

> 02. UpdateDecodingProgressAndKeepalive
> 
> I think the name should be UpdateDecodingProgressAndSendKeepalive(),
> keepalive is not verb.
> (But it's ok to ignore if you prefer the shorter name)
> Same thing can be said for the name of datatype and callback.

Yes, I prefer the shorter one. Otherwise, I think some names would be longer.

> 03. UpdateDecodingProgressAndKeepalive
> 
> ```
> +   /* set output state */
> +   ctx->accept_writes = false;
> +   ctx->write_xid = xid;
> +   ctx->write_location = lsn;
> +   ctx->did_write = false;
> ```
> 
> Do we have to modify accept_writes, write_xid, and write_location here?
> These value is not used in WalSndUpdateProgressAndKeepalive().

I think it might be better to set these three flags.
Since LogicalOutputPluginWriterUpdateProgressAndKeepalive is an open callback, I
think setting write_xid and write_location is not just for the function
WalSndUpdateProgressAndKeepalive. And I think setting accept_writes could
prevent some wrong usage.

> 04. stream_abort_cb_wrapper
> 
> ```
> +   update_progress_and_keepalive(ctx, true)
> ```
> 
> I'm not sure, but is it correct that call update_progress_and_keepalive() with
> finished_xact = true? Isn't there a possibility that streamed sub-transaciton 
> is
> aborted?

Fixed.

> 05. is_skip_threshold_change
> 
> At the end of the transaction, update_progress_and_keepalive() is called 
> directly.
> Don't we have to reset change_count here?

I think this might complicate the function is_skip_threshold_change, so I didn't
reset the counter in this case.
I think the worst case of not resetting the counter is to delay sending the
keepalive message for the next transaction. But since the threshold we're using
is safe enough, it seems fine to me not to reset the counter in this case.
Added these related comments in the function is_skip_threshold_change.

> 06. ReorderBufferAbort
> 
> Assuming that the top transaction is aborted. At that time
> update_progress_and_keepalive()
> is called in stream_abort_cb_wrapper(), an then
> WalSndUpdateProgressAndKeepalive()
> is called at the end of ReorderBufferAbort(). Do we have to do in both?
> I think stream_abort_cb_wrapper() may be not needed.

Yes, I think we only need one call for this case.
To make the behavior in *_cb_wrapper look consistent, I avoided the second call
for this case in the function ReorderBufferAbort.

> 07. WalSndUpdateProgress
> 
> You renamed ProcessPendingWrites() to WalSndSendPending(), but it may be
> still strange
> because it will be called even if there are no pending writes.
> 
> Isn't it sufficient to call ProcessRepliesIfAny(), WalSndCheckTimeOut() and
> (at least) WalSndKeepaliveIfNecessary()in the case? Or better name may be
> needed.

I think after sending the keepalive message (in WalSndKeepaliveIfNecessary), we
need to make sure the pending data is flushed through the loop.

Attach the new patch.

Regards,
Wang wei


v3-0001-Rework-LogicalOutputPluginWriterUpdateProgress.patch
Description:  v3-0001-Rework-LogicalOutputPluginWriterUpdateProgress.patch

