Re: Compilation issue on Solaris.

2022-07-08 Thread Thomas Munro
On Sat, Jul 9, 2022 at 2:02 PM Ibrar Ahmed  wrote:
> Thanks for looking at that; yes, you are right, the attached patch does that now
>
>  if test "$PORTNAME" = "solaris"; then
>
>CPPFLAGS="$CPPFLAGS -D_POSIX_PTHREAD_SEMANTICS"
>
> +  CPPFLAGS="$CPPFLAGS -D__STDC_WANT_LIB_EXT1__"
>
>  fi

Hmm.  K.3.3.1 of [1] says you can show or hide all that _s stuff by
defining that macro to 0 or 1 before you include <string.h>, but it's
implementation-defined whether they are exposed by default, and the
template file is one way to deal with that
implementation-definedness... it's not quite in the autoconf spirit
though, it's kinda manual.  Another approach would be to define it
unconditionally at the top of explicit_bzero.c before including "c.h",
on all platforms.  The man page on my system tells me I should do that
anyway, even though you don't need to on my system.

Why is your Solaris system trying to compile that file in the first
place?  A quick check of the Solaris and Illumos build farm animals
and some online man pages tells me they have explicit_bzero().
Ahhh... looks like it came a few years ago in some Solaris 11.4
update[2], and Illumos (which forked around 10) probably added it
independently (why do Solaris man pages not have a history section to
tell us these things?!).  I guess you must be running an old version.
OK then.

[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf
[2] https://blogs.oracle.com/solaris/post/expanding-the-library




Re: doc: Clarify what "excluded" represents for INSERT ON CONFLICT

2022-07-08 Thread Bruce Momjian
On Fri, Jul  1, 2022 at 08:11:36AM -0700, David G. Johnston wrote:
> That said, I still think that the current wording should be tweaked with respect
> to row vs. rows (especially if we continue to call it a table):
> 
> Current:
> "The SET and WHERE clauses in ON CONFLICT DO UPDATE have access to the
> existing row using the table's name (or an alias), and to [rows] proposed
> for insertion using the special excluded table."
> 
> Change [rows] to:
> 
> "the row"
> 
> 
> I'm undecided whether "FROM excluded" should be something that works - but I
> also don't think it would actually be used in any case.

I found two places where a singular "row" would be better, doc patch
attached.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson

diff --git a/doc/src/sgml/ref/insert.sgml b/doc/src/sgml/ref/insert.sgml
index a9af9959c0..022ac99b75 100644
--- a/doc/src/sgml/ref/insert.sgml
+++ b/doc/src/sgml/ref/insert.sgml
@@ -176,7 +176,7 @@ INSERT INTO table_name [ AS ON CONFLICT DO UPDATE
 targets a table named excluded, since that will otherwise
-be taken as the name of the special table representing rows proposed
+be taken as the name of the special table representing the row proposed
 for insertion.

   
@@ -396,7 +396,7 @@ INSERT INTO table_name [ AS SET and
 WHERE clauses in ON CONFLICT DO
 UPDATE have access to the existing row using the
-table's name (or an alias), and to rows proposed for insertion
+table's name (or an alias), and to the row proposed for insertion
 using the special excluded table.
 SELECT privilege is required on any column in the
 target table where corresponding excluded
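
For illustration, a minimal example (table and column names are made up) of
how ON CONFLICT DO UPDATE sees both the existing row and the single row
proposed for insertion via the special "excluded" table:

    CREATE TABLE counters (name text PRIMARY KEY, hits int);

    INSERT INTO counters AS c (name, hits)
    VALUES ('home', 1)
    ON CONFLICT (name) DO UPDATE
        SET hits = c.hits + excluded.hits;  -- c = existing row, excluded = row proposed for insertion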


Re: doc: Clarify Routines and Extension Membership

2022-07-08 Thread Bruce Momjian
On Tue, Jul  5, 2022 at 08:12:09PM -0400, Tom Lane wrote:
> "David G. Johnston"  writes:
> 
> +  A function that's marked as dependent on an extension is dropped when 
> the
> +  extension is dropped, even if cascade is not specified.
> +  dependency checking in restrict mode  linkend="sql-dropextension"/>.
> +  A function can depend upon multiple extensions, and will be dropped 
> when
> +  any one of those extensions is dropped.
> 
> Third line here seems like a copy/paste mistake?  Also I'd tend
> to mark up the keyword as CASCADE.
> 
> +  This form marks the procedure as dependent on the extension, or no 
> longer
> +  dependent on that extension if NO is specified.
> 
> The/that inconsistency ... choose one.  Or actually, the "an ... the"
> combination you used elsewhere doesn't grate on the ear either.
> 
> +  For each extension, refuse to drop anything if any objects (other than 
> the
> +  extensions listed) depend on it.  However, its own member objects, and 
> routines
> +  that are explicitly dependent on this extension, are skipped.
> +  This is the default.
> 
> "skipped" seems like a horrible choice of word; it could easily be read as
> "they don't get dropped".  I am not convinced that mentioning the member
> objects here is an improvement either.  In the first sentence you are
> treating each extension as a monolithic object; why not in the second?

I created a modified patch based on this feedback;  patch attached.  I
rewrote the last change.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson

diff --git a/doc/src/sgml/ref/alter_function.sgml b/doc/src/sgml/ref/alter_function.sgml
index 0ee756a94d..2e8e1162d8 100644
--- a/doc/src/sgml/ref/alter_function.sgml
+++ b/doc/src/sgml/ref/alter_function.sgml
@@ -160,8 +160,10 @@ ALTER FUNCTION name [ ( [ [ extension_name
 
  
-  The name of the extension that the procedure is to depend on.
+  This form marks the procedure as dependent on the extension, or no longer
+  dependent on the extension if NO is specified.
+  A procedure that's marked as dependent on an extension is dropped when the
+  extension is dropped, even if cascade is not specified.
+  A procedure can depend upon multiple extensions, and will be dropped when
+  any one of those extensions is dropped.
  
 

diff --git a/doc/src/sgml/ref/drop_extension.sgml b/doc/src/sgml/ref/drop_extension.sgml
index 5e507dec92..c01ddace84 100644
--- a/doc/src/sgml/ref/drop_extension.sgml
+++ b/doc/src/sgml/ref/drop_extension.sgml
@@ -30,7 +30,9 @@ DROP EXTENSION [ IF EXISTS ] name [
 
   
DROP EXTENSION removes extensions from the database.
-   Dropping an extension causes its component objects to be dropped as well.
+   Dropping an extension causes its component objects, and other explicitly
+   dependent routines (see ,
+   the depends on extension action), to be dropped as well.
   
 
   
@@ -77,9 +79,9 @@ DROP EXTENSION [ IF EXISTS ] name [
 RESTRICT
 
  
-  Refuse to drop the extension if any objects depend on it (other than
-  its own member objects and other extensions listed in the same
-  DROP command).  This is the default.
+  This option prevents the specified extensions from being dropped
+  if there exist non-extension-member objects that depend on any of
+  the extensions.  This is the default.
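
For reference, the behavior being documented can be exercised as below (only
a sketch: it assumes the contrib extension pg_trgm is installed, and the
helper function name is made up).  The function is dropped together with the
extension, with no CASCADE required:

    CREATE EXTENSION pg_trgm;

    CREATE FUNCTION trgm_helper() RETURNS int
        LANGUAGE sql AS 'SELECT 1';

    ALTER FUNCTION trgm_helper() DEPENDS ON EXTENSION pg_trgm;

    DROP EXTENSION pg_trgm;  -- also drops trgm_helper()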
  
 



Re: doc: Fix description of how the default user name is chosen

2022-07-08 Thread Bruce Momjian
On Fri, Jul  8, 2022 at 10:17:11PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Tue, Jul  5, 2022 at 08:20:25PM -0400, Tom Lane wrote:
> >> I agree this phrasing needs some work, but "resolved" doesn't seem
> >> helpful, since it's not defined here or nearby.  Maybe "The default
> >> database name is the specified (or defaulted) user name." ?
> 
> > I am not seeing much improvement in the proposed patch either.  I wonder
> > if we should be calling this the "session" or "connection" user name. 
> > When the docs say "if you do not specify a database name, it defaults to
> > the database user name", there is so much "database" in there that the
> > meaning is unclear, and in this context, the user name is a property of
> > the connection or session, not of the database.
> 
> Umm ... you could make the exact same statement with respect to the
> user's operating-system login session, so I doubt that "session" or
> "connection" adds any clarity.

Well, one confusion is that there is a database name and a database user
name.  We don't have different operating system names that users can
connect to, usually.
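
As an illustration of the defaulting being discussed: starting psql with
neither a database name nor a user name usually lands you in a database
named after your operating-system user (assuming such a database exists),
so the following typically shows the same value twice:

    SELECT current_user, current_database();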

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: Two successive tabs in test case are causing syntax error in psql

2022-07-08 Thread Jingtang Zhang
I see, thank you.

Tom Lane wrote on Sat, Jul 9, 2022 at 03:35:

> Jingtang Zhang  writes:
> > Recently, when I was developing some function about INSERT ... ON
> CONFLICT,
> > I used test cases in `src/test/regress/sql/insert_conflict.sql` to
> evaluate
> > my function. When I copy the CREATE TABLE from this case alone, and paste
> > it to psql, I got a syntax error. As I go through the case carefully, I
> > found the CREATE TABLE uses two tabs to separate column name and column
> > type, and these two tabs are regarded as an auto completion instruction by
> > psql, causing no separation between column name and column type anymore.
>
> > It may not be a problem since this case has passed the regression, but
> > would it be better to use space here to avoid this confusing situation?
>
> There are tabs all through the regression test files, and we're certainly
> not going to remove them all.  (If we did, we'd lose test coverage of
> whether the parser accepts tabs as whitespace.)  So I can't get excited
> about removing one or two.
>
> The usual recommendation for pasting text into psql when it contains
> tabs is to start psql with the -n switch to disable tab completion.
>
> regards, tom lane
>


Re: doc: Fix description of how the default user name is chosen

2022-07-08 Thread Tom Lane
Bruce Momjian  writes:
> On Tue, Jul  5, 2022 at 08:20:25PM -0400, Tom Lane wrote:
>> I agree this phrasing needs some work, but "resolved" doesn't seem
>> helpful, since it's not defined here or nearby.  Maybe "The default
>> database name is the specified (or defaulted) user name." ?

> I am not seeing much improvement in the proposed patch either.  I wonder
> if we should be calling this the "session" or "connection" user name. 
> When the docs say "if you do not specify a database name, it defaults to
> the database user name", there is so much "database" in there that the
> meaning is unclear, and in this context, the user name is a property of
> the connection or session, not of the database.

Umm ... you could make the exact same statement with respect to the
user's operating-system login session, so I doubt that "session" or
"connection" adds any clarity.

regards, tom lane




Re: Eliminating SPI from RI triggers - take 2

2022-07-08 Thread Tom Lane
Robert Haas  writes:
> ...  I think there's some
> debate to be had here over what behavior we need to preserve exactly
> vs. what we can and should change.

For sure.  For example, people occasionally complain because
user-defined triggers can defeat RI integrity checks.  Should we
change that?  I dunno, but if we're not using the standard executor
then there's at least some room to consider it.  I think people would
be upset if we stopped firing user triggers at all; but if triggers
couldn't defeat RI actions short of throwing a transaction-aborting
error, I believe a lot of people would consider that an improvement.

> For instance, it seems clear to me
> that leaving out permissions checks altogether would be not OK, but if
> this implementation arranged to cache the results of a permission
> check and the SQL-based implementations don't, is that OK? Maybe Tom
> would argue that it isn't, because he considers that a part of the
> user-visible behavior, but I'm not sure that's the right view of it.

Uh ... if such caching behavior is at all competently implemented,
it will be transparent because the cache will notice and respond to
events that should change its outputs.  So I don't foresee a semantic
problem there.  It may well be that it's practical to cache
permissions-check info for RI checks when it isn't for more general
queries, so looking into ideas like that seems well within scope here.
(Or then again, maybe we should be building a more general permissions
cache?)

I'm too tired to have more than that to say right now, but I agree
that there is room for discussion about exactly what behavior we
want to preserve.

regards, tom lane




Re: Compilation issue on Solaris.

2022-07-08 Thread Ibrar Ahmed
On Sat, Jul 9, 2022 at 6:46 AM Tom Lane  wrote:

> Ibrar Ahmed  writes:
> > While compiling PostgreSQL I have found that the memset_s function
> > requires the define "__STDC_WANT_LIB_EXT1__":
> > explicit_bzero.c: In function ‘explicit_bzero’:
> > explicit_bzero.c:23:9: warning: implicit declaration of function
> > ‘memset_s’; did you mean ‘memset’? [-Wimplicit-function-declaration]
>
> Hmm.
>
> > Attached is the patch to define that in the case of Solaris.
>
> If you don't have any test you want to make before adding the
> #define, I don't think this is idiomatic use of autoconf.
> Personally I'd have just added "-D__STDC_WANT_LIB_EXT1__" into
> the CPPFLAGS for Solaris, perhaps in src/template/solaris,
> or maybe just adjust the stanza immediately above this one:
>
> if test "$PORTNAME" = "solaris"; then
>   CPPFLAGS="$CPPFLAGS -D_POSIX_PTHREAD_SEMANTICS"
> fi
>
> regards, tom lane
>

Thanks for looking at that; yes, you are right, the attached patch does that
now:

 if test "$PORTNAME" = "solaris"; then

   CPPFLAGS="$CPPFLAGS -D_POSIX_PTHREAD_SEMANTICS"

+  CPPFLAGS="$CPPFLAGS -D__STDC_WANT_LIB_EXT1__"

 fi

-- 
Ibrar Ahmed


solaris_memset_s_v2.patch
Description: Binary data


Re: doc: Fix description of how the default user name is chosen

2022-07-08 Thread Bruce Momjian
On Tue, Jul  5, 2022 at 08:20:25PM -0400, Tom Lane wrote:
> "David G. Johnston"  writes:
> > In passing, the authentication error examples use the phrase
> > "database user name" in a couple of locations.  The word
> > database in both cases is both unusual and unnecessary for
> > understanding.  The reference to user name means the one in/for the
> > database unless otherwise specified.
> 
> I'm not convinced that just saying "user name" is an improvement.
> The thing that we are trying to clarify in much of this section
> is the relationship between your operating-system-assigned user
> name and your database-cluster-assigned user name.  So just saying
> "user name" adds an undesirable element of ambiguity.
> 
> Maybe we could change "database user name" to "Postgres user name"?
> 
> -if you do not specify a database name, it defaults to the database
> -user name, which might or might not be the right thing.
> +if the database name shown matches the user name you are connecting
> +as it is not by accident: the default database name is the
> +user name.
> 
> This does absolutely not seem like an improvement.
> 
>  Since the database server uses the same default, you will not have
>  to specify the port in most cases. The default user name is your
> -operating-system user name, as is the default database name.
> +operating-system user name. The default database name is the resolved 
> user name.
> 
> I agree this phrasing needs some work, but "resolved" doesn't seem
> helpful, since it's not defined here or nearby.  Maybe "The default
> database name is the specified (or defaulted) user name." ?

I am not seeing much improvement in the proposed patch either.  I wonder
if we should be calling this the "session" or "connection" user name. 
When the docs say "if you do not specify a database name, it defaults to
the database user name", there is so much "database" in there that the
meaning is unclear, and in this context, the user name is a property of
the connection or session, not of the database.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: Compilation issue on Solaris.

2022-07-08 Thread Tom Lane
Ibrar Ahmed  writes:
> While compiling PostgreSQL I have found that the memset_s function
> requires the define "__STDC_WANT_LIB_EXT1__":
> explicit_bzero.c: In function ‘explicit_bzero’:
> explicit_bzero.c:23:9: warning: implicit declaration of function
> ‘memset_s’; did you mean ‘memset’? [-Wimplicit-function-declaration]

Hmm.

> Attached is the patch to define that in the case of Solaris.

If you don't have any test you want to make before adding the
#define, I don't think this is idiomatic use of autoconf.
Personally I'd have just added "-D__STDC_WANT_LIB_EXT1__" into
the CPPFLAGS for Solaris, perhaps in src/template/solaris,
or maybe just adjust the stanza immediately above this one:

if test "$PORTNAME" = "solaris"; then
  CPPFLAGS="$CPPFLAGS -D_POSIX_PTHREAD_SEMANTICS"
fi

regards, tom lane




Compilation issue on Solaris.

2022-07-08 Thread Ibrar Ahmed
Hi,

While compiling PostgreSQL I have found that the memset_s function
requires the define "__STDC_WANT_LIB_EXT1__":

explicit_bzero.c: In function ‘explicit_bzero’:

explicit_bzero.c:23:9: warning: implicit declaration of function
‘memset_s’; did you mean ‘memset’? [-Wimplicit-function-declaration]

  (void) memset_s(buf, len, 0, len);
         ^~~~

Attached is the patch to define that in the case of Solaris.


-- 
Ibrar Ahmed


solaris_memset_s_v1.patch
Description: Binary data


Re: doc: Move enum storage commentary to top of section

2022-07-08 Thread Bruce Momjian
On Fri, Jul  8, 2022 at 09:21:31PM -0400, Bruce Momjian wrote:
> On Wed, Jul  6, 2022 at 10:34:58AM -0700, David G. Johnston wrote:
> > Agreed.
> > 
> > Tangentially: It does seem a bit unusual to call the fact that the values 
> > are both
> > case-sensitive and limited to the length of a system identifier an
> > implementation detail.  But if anything the length is more of one than the
> > case-sensitivity.  Specifying NAMEDATALEN here seems like it breaks
> > encapsulation, it could refer by comparison to an identifier and only those
> > that care can learn how that length might be changed in a custom build of
> > PostgreSQL.
> 
> I don't think we can do much to improve what we have already in the
> docs.

I have marked the commit-fest entry as returned with feedback.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: doc: Move enum storage commentary to top of section

2022-07-08 Thread Bruce Momjian
On Wed, Jul  6, 2022 at 10:34:58AM -0700, David G. Johnston wrote:
> Agreed.
> 
> Tangentially: It does seem a bit unusual to call the fact that the values are both
> case-sensitive and limited to the length of a system identifier an
> implementation detail.  But if anything the length is more of one than the
> case-sensitivity.  Specifying NAMEDATALEN here seems like it breaks
> encapsulation, it could refer by comparison to an identifier and only those
> that care can learn how that length might be changed in a custom build of
> PostgreSQL.

I don't think we can do much to improve what we have already in the
docs.
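
For context, the two properties under discussion are easy to demonstrate
(type and label names are made up); labels are compared case-sensitively
and are limited to NAMEDATALEN-1 bytes (63 by default):

    CREATE TYPE mood AS ENUM ('happy', 'sad');

    SELECT 'happy'::mood;  -- ok
    SELECT 'HAPPY'::mood;  -- error: invalid input value for enum mood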

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: doc: Bring mention of unique index forced transaction wait behavior outside of the internal section

2022-07-08 Thread Bruce Momjian
On Tue, Jun 21, 2022 at 09:07:42AM -0700, David G. Johnston wrote:
> On Tue, Jun 21, 2022 at 6:49 AM Aleksander Alekseev 
> wrote:
> 
> Hi David,
> 
> > It's basically a glorified cross-reference.  I didn't dislike directing
> the reader to the internals section enough to try and establish a better
> location for the main content.
> 
> One problem I see is that:
> 
> + [..], but as there is no pre-existing data, visibility checks are
> unnecessary.
> 
> ... allows a wide variety of interpretations, most of which will be
> wrong. And all in all I find an added paragraph somewhat cryptic.
> 
> Yeah, I'd probably have to say "but since no existing record is being 
> modified,
> visibility checks are unnecessary".
> 
> Is there a specific mis-interpretation that first came to mind for you that I
> can consider specifically?
> 
> 
> If the goal is to add a cross-reference I suggest keeping it short,
> something like "For additional details on various corner cases please
> see ...".
> 
> That does work, and I may end up there, but it feels unsatisfying to be so
> vague/general.

I was not happy with putting this in the Transaction Isolation section.
I rewrote it and put it in the INSERT section, right before ON CONFLICT; 
patch attached.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson

diff --git a/doc/src/sgml/ref/insert.sgml b/doc/src/sgml/ref/insert.sgml
index a9af9959c0..29e92546ef 100644
--- a/doc/src/sgml/ref/insert.sgml
+++ b/doc/src/sgml/ref/insert.sgml
@@ -75,6 +75,11 @@ INSERT INTO table_name [ AS 
 
   
+   INSERT into tables that lack unique indexes will
+   not be blocked by concurrent activity.  Tables with unique indexes
+   might block if concurrent sessions perform actions that lock or modify
+   rows matching the unique index values being inserted;  the details
+   are covered in .
ON CONFLICT can be used to specify an alternative
action to raising a unique constraint or exclusion constraint
violation error. (See  below.)
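
To make the added paragraph concrete, here is a sketch of the blocking case
it describes (table name made up); the second session waits until the first
commits or aborts because both want the same unique key:

    CREATE TABLE t (id int PRIMARY KEY);

    -- session 1
    BEGIN;
    INSERT INTO t VALUES (1);

    -- session 2: blocks until session 1 commits or rolls back
    INSERT INTO t VALUES (1);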


Re: doc: array_length produces null instead of 0

2022-07-08 Thread Bruce Momjian
On Tue, Jun 21, 2022 at 09:02:41AM -0700, David G. Johnston wrote:
> On Tue, Jun 21, 2022 at 6:33 AM Aleksander Alekseev 
> Maybe it's worth using `array_length(array[] :: int[], 1)` instead.
> 
> I think subconsciously the cast looked ugly to me so I probably skipped adding
> it.  I do agree the example should be executable though, and most of the
> existing examples use integer[] (not the abbreviated form, int) so I'll plan 
> to
> go with that.

Patch applied through PG 13, with adjustments suggested above.  Our doc
formatting for pre-PG 13 was too different for me to risk backpatching
further back.
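
For quick reference, the behavior the example documents (the cast is what
makes the empty array literal typable):

    SELECT array_length(ARRAY[]::integer[], 1);  -- NULL, not 0
    SELECT array_length(ARRAY[1, 2, 3], 1);      -- 3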

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: [PATCH] fix wait_event of pg_stat_activity in case of high amount of connections

2022-07-08 Thread Yura Sokolov
On Sat, 09/07/2022 at 02:32 +0300, Yura Sokolov wrote:
> On Fri, 08/07/2022 at 11:04 -0400, Robert Haas wrote:
> > On Fri, Jul 8, 2022 at 10:11 AM Yura Sokolov  
> > wrote:
> > > I see an analogy with a bus stop:
> > > - there is a bus stop
> > > - there is a schedule of buses arriving at this stop
> > > - there are passengers who travel on this bus every day
> > > 
> > > The bus occasionally comes late... Well, it comes late quite often...
> > > 
> > > Which way should the mayor (or other responsible person) act?
> > 
> > I do not think that is a good analogy, because a bus schedule is an
> > implicit promise - or at least a strong suggestion - that the bus will
> > arrive at the scheduled time.
> 
> There is an implicit promise: those data are written in a single row.
> If you want to signal that they are NOT related to each other, return them
> in different rows or even in different views.
> 
> > In this case, who made such a promise? The original post presents it
> > as fact that these systems should give compatible answers at all
> > times, but there's nothing in the code or documentation to suggest
> > that this is true.
> > 
> > IMHO, a better analogy would be if you noticed that the 7:03am bus was
> > normally blue and you took that one because you have a small child who
> > likes the color blue and it makes them happy to take a blue bus. And
> > then one day the bus at that time is a red bus and your child is upset
> > and you call the mayor (or other responsible person) to complain.
> > They're probably not going to handle that situation by trying to send
> > a blue bus at 7:03am as often as possible. They're going to tell you
> > that they only promised you a bus at 7:03am, not what color it would
> > be.
> > 
> > Perhaps that's not an ideal analogy either, because the reported wait
> > event and the reported activity are more closely related than the time
> > of a bus is to the color of the bus. But I think it's still true that
> > nobody ever promised that those values would be compatible with each
> > other, and that's not really fixable, and that there are lots of other
> > cases just like this one which can't be fixed either.
> > 
> > I think that the more we try to pretend like it is possible to make
> > these values seem like they are synchronized, the more unhappy people
> > will be in the unavoidable cases where they aren't, and the more
> > pressure there will be to try to tighten it up even further. That's
> > likely to result in code that is more complex and slower, which I do
> > not want, and especially not for the sake of avoiding a harmless
> > reporting discrepancy.
> 
> Then just don't return them together, right?

Well, I'm a bit more hot-headed than is needed. I apologize for that.

Let's look at the situation from a compromise point of view:
- We are saying: we could make this view more synchronous (and faster).
- You are saying: it will never be totally synchronous, and it was a
  mistake not to mention the issue in the documentation.

Why not do both?
Why can't we make it more synchronous (and faster) AND mention in the
documentation that it is not totally synchronous and never will be?
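
For context, the columns whose mutual consistency is being debated are the
ones a typical monitoring query reads from a single pg_stat_activity row,
e.g. (just a sketch; the point of the thread is that wait_event and
query/state within one row are not captured at exactly the same instant):

    SELECT pid, state, wait_event_type, wait_event, left(query, 60) AS query
    FROM pg_stat_activity
    WHERE backend_type = 'client backend';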



regards

Yura





Re: [Commitfest 2022-07] Begins Now

2022-07-08 Thread Jacob Champion
On 7/1/22 08:08, Jacob Champion wrote:
> It's been July everywhere on Earth for a few hours, so the July
> commitfest is now in progress:
> 
> https://commitfest.postgresql.org/38/
One week down, three to go.

I forgot to put the overall status in the last email. We started the
month with the following stats:

Needs review: 214
Waiting on Author: 36
Ready for Committer:   23
Committed: 21
Moved to next CF:   1
Withdrawn:  5
Rejected:   2
Returned with Feedback: 3
--
Total:305

And as of this email, we're now at

Needs review: 193
Waiting on Author: 38
Ready for Committer:   24
Committed: 37
Moved to next CF:   2
Withdrawn:  6
Rejected:   2
Returned with Feedback: 3
--
Total:305

That's sixteen patchsets committed in the first week.

Have a good weekend,
--Jacob




Re: [PATCH] fix wait_event of pg_stat_activity in case of high amount of connections

2022-07-08 Thread Yura Sokolov
On Fri, 08/07/2022 at 11:04 -0400, Robert Haas wrote:
> On Fri, Jul 8, 2022 at 10:11 AM Yura Sokolov  wrote:
> > I see an analogy with a bus stop:
> > - there is a bus stop
> > - there is a schedule of buses arriving at this stop
> > - there are passengers who travel on this bus every day
> > 
> > The bus occasionally comes late... Well, it comes late quite often...
> > 
> > Which way should the mayor (or other responsible person) act?
> 
> I do not think that is a good analogy, because a bus schedule is an
> implicit promise - or at least a strong suggestion - that the bus will
> arrive at the scheduled time.

There is an implicit promise: those data are written in a single row.
If you want to signal that they are NOT related to each other, return them
in different rows or even in different views.

> In this case, who made such a promise? The original post presents it
> as fact that these systems should give compatible answers at all
> times, but there's nothing in the code or documentation to suggest
> that this is true.
> 
> IMHO, a better analogy would be if you noticed that the 7:03am bus was
> normally blue and you took that one because you have a small child who
> likes the color blue and it makes them happy to take a blue bus. And
> then one day the bus at that time is a red bus and your child is upset
> and you call the mayor (or other responsible person) to complain.
> They're probably not going to handle that situation by trying to send
> a blue bus at 7:03am as often as possible. They're going to tell you
> that they only promised you a bus at 7:03am, not what color it would
> be.
> 
> Perhaps that's not an ideal analogy either, because the reported wait
> event and the reported activity are more closely related than the time
> of a bus is to the color of the bus. But I think it's still true that
> nobody ever promised that those values would be compatible with each
> other, and that's not really fixable, and that there are lots of other
> cases just like this one which can't be fixed either.
> 
> I think that the more we try to pretend like it is possible to make
> these values seem like they are synchronized, the more unhappy people
> will be in the unavoidable cases where they aren't, and the more
> pressure there will be to try to tighten it up even further. That's
> likely to result in code that is more complex and slower, which I do
> not want, and especially not for the sake of avoiding a harmless
> reporting discrepancy.

Then just don't return them together, right?





Re: Aggregate leads to superfluous projection from the scan

2022-07-08 Thread Tom Lane
Zhihong Yu  writes:
> I was looking at the following comment in createplan.c :

>  * For table scans, rather than using the relation targetlist (which is
>  * only those Vars actually needed by the query), we prefer to generate a
>  * tlist containing all Vars in order.  This will allow the executor to
>  * optimize away projection of the table tuples, if possible.

> Maybe you can give me some background on the above decision.

Look into execScan.c and note that it skips doing ExecProject() if the
scan node's targetlist exactly matches the table's tuple descriptor.
And particularly this comment:

 * We can avoid a projection step if the requested tlist exactly matches
 * the underlying tuple type.  If so, we just set ps_ProjInfo to NULL.
 * Note that this case occurs not only for simple "SELECT * FROM ...", but
 * also in most cases where there are joins or other processing nodes above
 * the scan node, because the planner will preferentially generate a matching
 * tlist.

regards, tom lane




Re: doc: pg_prewarm add configuration example

2022-07-08 Thread Bruce Momjian
On Thu, Jun 30, 2022 at 02:40:50PM +0900, Dong Wook Lee wrote:
> On 22/06/29 03:57 PM, Jacob Champion wrote:
> > On 6/18/22 01:55, Dong Wook Lee wrote:
> > >  Hi hackers,
> > > 
> > >  I thought it would be nice to have a configuration example of the 
> > > pg_prewarm extension.
> > >  Therefore, I have written an example of a basic configuration.
> > 
> > [offlist]
> > 
> > Hi Dong Wook, I saw a commitfest entry registered for some of your other
> > patches, but not for this one. Quick reminder to register it in the
> > Documentation section if that was your intent.
> 
> Thank you very much for letting me know.
> The patch is so trivial that I thought about whether to post it on commitfest,
> but I think it would be better to post it.

I have applied this, with adjustments, to all supported versions back to
PG 11.  PG 10 had docs different enough that I skipped it.
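
For readers looking for the kind of configuration the patch documents, a
minimal sketch (the relation name is made up; manual prewarming needs the
extension created, autoprewarm needs the library preloaded and a restart):

    -- one-off, manual prewarming
    CREATE EXTENSION pg_prewarm;
    SELECT pg_prewarm('my_table');

    -- automatic prewarming across restarts (takes effect after a restart;
    -- note this replaces any existing shared_preload_libraries setting)
    ALTER SYSTEM SET shared_preload_libraries = 'pg_prewarm';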

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: Commitfest Update

2022-07-08 Thread Jacob Champion
On 3/31/22 07:37, Tom Lane wrote:
> Robert Haas  writes:
>> On Thu, Mar 31, 2022 at 10:11 AM Tom Lane  wrote:
>>> ... Would it be feasible or reasonable
>>> to drop reviewers if they've not commented in the thread in X amount
>>> of time?
> 
>> In theory, this might cause someone who made a valuable contribution
>> to the discussion to not get credited in the commit message. But it
>> probably wouldn't in practice, because I at least always construct the
>> list of reviewers from the thread, not the CF app, since that tends to
>> be wildly inaccurate in both directions. So maybe it's fine? Not sure.
> 
> Hmm, I tend to believe what's in the CF app, so maybe I'm dropping the
> ball on review credits :-(.  But there are various ways we could implement
> this.  One way would be a nagbot that sends private email along the lines
> of "you haven't commented on patch X in Y months.  Please remove your name
> from the list of reviewers if you don't intend to review it any more."

It seems there wasn't a definitive decision here. Are there any
objections to more aggressive pruning of the Reviewers entries? So
committers would need to go through the thread for full attribution,
moving forward.

If there are no objections, I'll start doing that during next Friday's
patch sweep.

--Jacob




Re: Aggregate leads to superfluous projection from the scan

2022-07-08 Thread Zhihong Yu
On Fri, Jul 8, 2022 at 12:48 PM Zhihong Yu  wrote:

>
>
> On Fri, Jul 8, 2022 at 12:30 PM Tom Lane  wrote:
>
>> Ibrar Ahmed  writes:
>> > I gave it a quick look, and I think that whenever data is extracted from
>> > the heap it shows all the columns. Therefore, when columns are extracted
>> > from the index only, it shows the indexed columns only.
>>
>> This is operating as designed, and I don't think that the proposed
>> patch is an improvement.  The point of use_physical_tlist() is that
>> returning all the columns is cheaper because it avoids a projection
>> step.  That's true for any case where we have to fetch the heap
>> tuple, so IndexScan is included though IndexOnlyScan is not.
>>
>> Now, that's something that was true a decade or more ago.
>> There's been considerable discussion recently about cases where
>> it's not true anymore, for example with columnar storage or FDWs,
>> and so we ought to invent a way to prevent createplan.c from
>> doing it when it would be counterproductive.  But just summarily
>> turning it off is not an improvement.
>>
>> regards, tom lane
>>
> Hi,
> In createplan.c, there is `change_plan_targetlist`
>
> Plan *
> change_plan_targetlist(Plan *subplan, List *tlist, bool
> tlist_parallel_safe)
>
> But it doesn't have `Path` as parameter.
> So I am not sure whether the check of non-returnable columns should be
> done in change_plan_targetlist().
>
> bq. for example with columnar storage or FDWs,
>
> Yeah. The above is the case where I want to optimize.
>
> Cheers
>
Hi, Tom:
I was looking at the following comment in createplan.c :

 * For table scans, rather than using the relation targetlist (which is
 * only those Vars actually needed by the query), we prefer to generate a
 * tlist containing all Vars in order.  This will allow the executor to
 * optimize away projection of the table tuples, if possible.

Maybe you can give me some background on the above decision.

Thanks


Re: replacing role-level NOINHERIT with a grant-level option

2022-07-08 Thread Robert Haas
On Fri, Jul 8, 2022 at 5:02 PM Nathan Bossart  wrote:
> I think this is an interesting approach, as it seems to move things closer
> to the end goal (i.e., removing [NO]INHERIT), but it also introduces a
> pretty significant compatibility break.  With this change, you cannot keep
> using [NO]INHERIT like you do today.  You will also need to individually
> update each GRANT.

True. But it may not be very common for people to ALTER ROLE
[NO]INHERIT after having previously used GRANT, so I'm not sure that
it would be a big problem in practice.

> The advantage of using [NO]INHERIT as the fallback
> value in the absence of a grant-level option is that users don't need to
> adjust behavior right away.  They can essentially ignore the new
> grant-level options for now, although presumably they'd need to adjust at
> some point.

Also true.

> IMO we should either retain compatibility for existing scripts for now, or
> we should remove [NO]INHERIT completely in v16.  I'm worried that keeping
> [NO]INHERIT around while changing its behavior somewhat subtly is only
> going to create confusion.

Could be. It's just a little hard to know what to do here, because
we've gotten opinions from only a few people, and they're all
different. I've now produced patches for what I believe to be all
three of the viable alternatives, and none of them have met with
universal acclaim:

v1: hard compatibility break, NOINHERIT no-op w/WARNING
v2: WITH INHERIT { TRUE | FALSE | DEFAULT }, DEFAULT => use rolinherit
that is current at time of use
v3: WITH INHERIT { TRUE | FALSE }, if unspecified => set grant-level
Boolean to rolinherit that is current at time of GRANT

Nobody seems desperately opposed to the basic idea of adding a
grant-level option, so probably it's OK to proceed with one of these.
Which one, though, is a bit of a puzzler.
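
For readers following along, the grant-level option being weighed looks
roughly like this in the v2/v3 patches (a sketch of proposed, not yet
committed, syntax; role names are made up):

    GRANT admin TO alice WITH INHERIT TRUE;    -- alice uses admin's rights implicitly
    GRANT auditor TO alice WITH INHERIT FALSE; -- alice must SET ROLE auditor first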

Anyone else want to offer an opinion?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: AIX support - alignment issues

2022-07-08 Thread Robert Haas
On Tue, Jul 5, 2022 at 1:32 AM Andres Freund  wrote:
> I just thought an easier way - why don't we introduce a 'catalog_double'
> that's defined to be pg_attribute_aligned(whatever-we-need) on AIX? Then we
> can get rid of the manually enforced alignedness and we don't need to contort
> catalog order.

I investigated this a little bit today. It seems that
att_align_nominal() thinks that typalign=='d' means ALIGNOF_DOUBLE,
which on AIX is 4. So I think what we would need to do first is
redefine typalign=='d' to mean alignment to MAXIMUM_ALIGNOF. If we
don't do that, then there's no automatic way to get uint64 fields to
be placed on 8-byte boundaries, which it requires. Such a change would
have no effect on many systems, but if as on AIX double requires less
alignment than either "long" or "long long int", it will break on-disk
compatibility and in particular pg_upgrade compatibility.

If we did that, then we could pursue your proposal above. Rather than
creating an altogether new typedef, we could just apply
pg_attribute_aligned(MAXIMUM_ALIGNOF) to the existing typedef for
float8, which is documented as being the name that should be used in
the catalogs, and is. Since pg_attribute_aligned() is not supported on
all platforms, we elsewhere apply it conditionally, so we would
presumably do the same thing here. That would mean that it might fail
to apply on some platform somewhere, but we could compensate for that
by adding a static assertion checking that if we do struct
float8_alignment_test { char pad; float8 x; } then
alignof(float8_alignment_test, x) == MAXIMUM_ALIGNOF. That way, if
pg_attribute_aligned() isn't supported but the platform doesn't have
this issue in the first place, all is well. If pg_attribute_aligned()
isn't supported and the platform does have this issue, compilation
will fail.

In theory, we could have the same issue with int64 on some other
platform. On this hypothetical system, ALIGNOF_LONG_LONG_INT <
ALIGNOF_DOUBLE. The compiler would then align int64 catalog columns on,
say, 4-byte boundaries, but our tuple deforming code would think that
they were aligned to 8 byte boundaries. We could fix that by forcing
the int64 type to have maximum alignment as well or introducing a new
typedef that does. However, such a fix could probably be postponed
until such time as a system of this kind turns up. It might never
happen.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

2022-07-08 Thread Andrew Dunstan


On 2022-07-05 Tu 15:04, Andrew Dunstan wrote:
> On 2022-07-05 Tu 14:36, Andres Freund wrote:
>>
 I think Andrew's beta 2 comment was more about my other architectural
 complains around the json expression eval stuff.
>>> Right. That's being worked on but it's not going to be a mechanical fix.
>> Any updates here?
>
> Not yet. A colleague and I are working on it. I'll post a status this
> week if we can't post a fix.


We're still working on it. We've made substantial progress but there are
some tests failing that we need to fix.


>> I'd mentioned the significant space use due to all JsonCoercionsState for all
>> the types. Another related aspect is that this code is just weird - the same
>> struct name (JsonCoercionsState), nested in each other?
>>
>> struct JsonCoercionsState
>> {
>> struct JsonCoercionState
>> {
>> JsonCoercion *coercion; /* coercion expression */
>> ExprState  *estate; /* coercion expression state */
>> }   null,
>> string,
>> numeric,
>> boolean,
>> date,
>> time,
>> timetz,
>> timestamp,
>> timestamptz,
>> composite;
>> }   coercions;  /* states for coercion from SQL/JSON item
>>  * types directly to the output type */
>>
>> Also note the weird numeric indentation that pgindent does...
>
> Yeah, we'll try to fix that.


Actually, it's not the same name: JsonCoercionsState vs
JsonCoercionState. But I agree that it's a subtle enough difference that
we should use something more obvious. Maybe JsonCoercionStates instead
of JsonCoercionsState? The plural at the end would be harder to miss.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: replacing role-level NOINHERIT with a grant-level option

2022-07-08 Thread Nathan Bossart
On Fri, Jul 08, 2022 at 03:56:56PM -0400, Robert Haas wrote:
> For those who may not have read the entire thread, the current patch
> does not actually remove the role-level option as the subject line
> suggests, but rather makes it set the default for future grants as
> suggested by Tom in
> http://postgr.es/m/1066202.1654190...@sss.pgh.pa.us

I think this is an interesting approach, as it seems to move things closer
to the end goal (i.e., removing [NO]INHERIT), but it also introduces a
pretty significant compatibility break.  With this change, you cannot keep
using [NO]INHERIT like you do today.  You will also need to individually
update each GRANT.  The advantage of using [NO]INHERIT as the fallback
value in the absence of a grant-level option is that users don't need to
adjust behavior right away.  They can essentially ignore the new
grant-level options for now, although presumably they'd need to adjust at
some point.

IMO we should either retain compatibility for existing scripts for now, or
we should remove [NO]INHERIT completely in v16.  I'm worried that keeping
[NO]INHERIT around while changing its behavior somewhat subtly is only
going to create confusion.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: SQL/JSON documentation JSON_TABLE

2022-07-08 Thread Andrew Dunstan


On 2022-07-08 Fr 16:03, Erik Rijkers wrote:
> Hi,
>
> Attached are a few small changes to the JSON_TABLE section in func.sgml.
>
> The first two changes are simple typos.
>
> Then there was this line:
>
> 
> context_item, path_expression [ AS json_path_name ] [ PASSING { value
> AS varname } [, ...]]
> 
>
> those are the parameters to JSON_TABLE() so I changed that line to:
>
> 
> JSON_TABLE(context_item, path_expression [ AS json_path_name ] [
> PASSING { value AS varname } [, ...]])
> 
>
> Some parts of the JSON_TABLE text strike me as opaque.  For instance,
> there are paragraphs that more than once use the term:
>    json_api_common_syntax
>
> 'json_api_common_syntax' is not explained.  It turns out it's a relic
> from Nikita's original docs. I dug up a 2018 patch where the term is
> used as:
>
>  2018:
> JSON_TABLE (
>  json_api_common_syntax [ AS path_name ]
>  COLUMNS ( json_table_column [, ...] )
>  (etc...)
> 
>
> with explanation:
>
>  2018:
> json_api_common_syntax:
>    The input data to query, the JSON path expression defining the
> query, and an optional PASSING clause.
> 
>
> So that made sense then (input+jsonpath+params=api), but it doesn't
> now fit as such in the current docs.
>
> I think it would be best to remove all uses of that compound term, and
> rewrite the explanations using only the current parameter names
> (context_item, path_expression, etc).
>
> But I wasn't sure and I haven't done any such changes in the attached.
>
> Perhaps I'll give it a try during the weekend.
>
>
>


Thanks for this. If you want to follow up on that last sentence, I will try
to commit a single fix early next week.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: automatically generating node support functions

2022-07-08 Thread Tom Lane
I wrote:
> 0003 moves the node-level attributes as discussed.

Meh.  Just realized that I forgot to adjust the commentary in nodes.h
about where to put node attributes.

Maybe like

- * Attributes can be attached to a node as a whole (the attribute
- * specification must be at the end of the struct or typedef, just before the
- * semicolon) or to a specific field (must be at the end of the line).  The
+ * Attributes can be attached to a node as a whole (place the attribute
+ * specification on the first line after the struct's opening brace)
+ * or to a specific field (place it at the end of that field's line).  The
  * argument is a comma-separated list of attributes.  Unrecognized attributes
  * cause an error.

regards, tom lane




SQL/JSON documentation JSON_TABLE

2022-07-08 Thread Erik Rijkers

Hi,

Attached are a few small changes to the JSON_TABLE section in func.sgml.

The first two changes are simple typos.

Then there was this line:


context_item, path_expression [ AS json_path_name ] [ PASSING { value AS 
varname } [, ...]]



those are the parameters to JSON_TABLE() so I changed that line to:


JSON_TABLE(context_item, path_expression [ AS json_path_name ] [ PASSING 
{ value AS varname } [, ...]])



Some parts of the JSON_TABLE text strike me as opaque.  For instance, 
there are paragraphs that more than once use the term:

   json_api_common_syntax

'json_api_common_syntax' is not explained.  It turns out it's a relic 
from Nikita's original docs. I dug up a 2018 patch where the term is 
used as:


 2018:
JSON_TABLE (
 json_api_common_syntax [ AS path_name ]
 COLUMNS ( json_table_column [, ...] )
 (etc...)


with explanation:

 2018:
json_api_common_syntax:
   The input data to query, the JSON path expression defining the 
query, and an optional PASSING clause.



So that made sense then (input+jsonpath+params=api), but it doesn't now 
fit as such in the current docs.


I think it would be best to remove all uses of that compound term, and 
rewrite the explanations using only the current parameter names 
(context_item, path_expression, etc).


But I wasn't sure and I haven't done any such changes in the attached.

Perhaps I'll give it a try during the weekend.
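
For orientation, a minimal invocation that maps those parameter names onto a
concrete call might look like this (a sketch against the beta-era SQL/JSON
syntax; the literal and column definition are made up):

    SELECT *
    FROM JSON_TABLE(
           '[{"a": 1}, {"a": 2}]'::jsonb,   -- context_item
           '$[*]'                           -- path_expression
           COLUMNS (a int PATH '$.a')
         ) AS jt;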


Erik Rijkers


--- ./doc/src/sgml/func.sgml.orig	2022-07-08 19:46:46.018505707 +0200
+++ ./doc/src/sgml/func.sgml	2022-07-08 20:47:35.488303254 +0200
@@ -18026,7 +18026,7 @@
 or array, but if it is CONDITIONAL it will not be
 applied to a single array or object. UNCONDITIONAL
 is the default.
-If the result is a a scalar string, by default the value returned will have
+If the result is a scalar string, by default the value returned will have
 surrounding quotes making it a valid JSON value. However, this behavior
 is reversed if OMIT QUOTES is specified.
 The ON ERROR and ON EMPTY
@@ -18097,7 +18097,7 @@
columns. Columns produced by NESTED PATHs at the
same level are considered to be siblings,
while a column produced by a NESTED PATH is
-   considered to be a child of the column produced by and
+   considered to be a child of the column produced by a
NESTED PATH or row expression at a higher level.
Sibling columns are always joined first. Once they are processed,
the resulting rows are joined to the parent row.
@@ -18106,7 +18106,7 @@
   

 
- context_item, path_expression  AS json_path_name   PASSING { value AS varname } , ...
+ JSON_TABLE(context_item, path_expression  AS json_path_name   PASSING { value AS varname } , ...)
 
 
 


Re: replacing role-level NOINHERIT with a grant-level option

2022-07-08 Thread Robert Haas
On Tue, Jul 5, 2022 at 8:04 AM Robert Haas  wrote:
> On Sun, Jul 3, 2022 at 1:17 PM Nathan Bossart  
> wrote:
> > If by "bolder" you mean "mark [NO]INHERIT as deprecated-and-to-be-removed
> > and begin emitting WARNINGs when it and WITH INHERIT DEFAULT are used," I
> > think it's worth consideration.  I suspect it will be hard to sell removing
> > [NO]INHERIT in v16 because it would introduce a compatibility break without
> > giving users much time to migrate.  I could be wrong, though.
>
> It's a fair point. But, if our goal for v16 is to do something that
> could lead to an eventual deprecation of [NO]INHERIT, I still think
> removing WITH INHERIT DEFAULT from the patch set is probably a good
> idea.

So here is an updated patch with that change.

For those who may not have read the entire thread, the current patch
does not actually remove the role-level option as the subject line
suggests, but rather makes it set the default for future grants as
suggested by Tom in
http://postgr.es/m/1066202.1654190...@sss.pgh.pa.us

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v3-0001-Allow-grant-level-control-of-role-inheritance-beh.patch
Description: Binary data


Re: Aggregate leads to superfluous projection from the scan

2022-07-08 Thread Zhihong Yu
On Fri, Jul 8, 2022 at 12:30 PM Tom Lane  wrote:

> Ibrar Ahmed  writes:
> > I give a quick look and I think in case whenever data is extracted from
> the
> > heap it shows all the columns. Therefore when columns are extracted from
> > the index only it shows the indexed column only.
>
> This is operating as designed, and I don't think that the proposed
> patch is an improvement.  The point of use_physical_tlist() is that
> returning all the columns is cheaper because it avoids a projection
> step.  That's true for any case where we have to fetch the heap
> tuple, so IndexScan is included though IndexOnlyScan is not.
>
> Now, that's something that was true a decade or more ago.
> There's been considerable discussion recently about cases where
> it's not true anymore, for example with columnar storage or FDWs,
> and so we ought to invent a way to prevent createplan.c from
> doing it when it would be counterproductive.  But just summarily
> turning it off is not an improvement.
>
> regards, tom lane
>
Hi,
In createplan.c, there is `change_plan_targetlist`

Plan *
change_plan_targetlist(Plan *subplan, List *tlist, bool tlist_parallel_safe)

But it doesn't have `Path` as parameter.
So I am not sure whether the check of non-returnable columns should be done
in change_plan_targetlist().

bq. for example with columnar storage or FDWs,

Yeah. The above is the case where I want to optimize.

Cheers


Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths

2022-07-08 Thread David Zhang

Hi,

I tried to apply this patch v5 to current master branch but it complains,
"git apply --check 
v5-0001-Add-protections-in-xlog-record-APIs-against-large.patch

error: patch failed: src/include/access/xloginsert.h:43
error: src/include/access/xloginsert.h: patch does not apply"

then I checked it out before the commit 
`b0a55e43299c4ea2a9a8c757f9c26352407d0ccc` and applied this v5 patch.


1) both make check and make installcheck passed.

2) and I can also see this patch v5 prevents the error that happened previously:

"postgres=# SELECT pg_logical_emit_message(false, long, long) FROM 
repeat(repeat(' ', 1024), 1024*1023) as l(long);

ERROR:  too much WAL data"

3) without this v5 patch, the same test will cause the standby to crash
like below, and the standby will not be able to boot up after this crash.


"2022-07-08 12:28:16.425 PDT [2363] FATAL:  invalid memory alloc request 
size 2145388995
2022-07-08 12:28:16.426 PDT [2360] LOG:  startup process (PID 2363) 
exited with exit code 1
2022-07-08 12:28:16.426 PDT [2360] LOG:  terminating any other active 
server processes
2022-07-08 12:28:16.427 PDT [2360] LOG:  shutting down due to startup 
process failure

2022-07-08 12:28:16.428 PDT [2360] LOG:  database system is shut down"


Best regards,

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca




Re: Two successive tabs in test case are causing syntax error in psql

2022-07-08 Thread Tom Lane
Jingtang Zhang  writes:
> Recently, when I was developing some function about INSERT ... ON CONFLICT,
> I used test cases in `src/test/regress/sql/insert_conflict.sql` to evaluate
> my function. When I copy the CREATE TABLE from this case alone, and paste
> it to psql, I got a syntax error. As I go through the case carefully, I
> found the CREATE TABLE uses two tabs to separate column name and column
> type, and these two tabs are regarded as an auto completion instruction by
> psql, causing no separation between column name and column type anymore.

> It may not be a problem since this case has passed the regression, but
> would it be better to use space here to avoid this confusing situation?

There are tabs all through the regression test files, and we're certainly
not going to remove them all.  (If we did, we'd lose test coverage of
whether the parser accepts tabs as whitespace.)  So I can't get excited
about removing one or two.

The usual recommendation for pasting text into psql when it contains
tabs is to start psql with the -n switch to disable tab completion.

regards, tom lane




Re: System catalog documentation chapter

2022-07-08 Thread Bruce Momjian
On Fri, Jul  8, 2022 at 12:07:45PM -0400, Bruce Momjian wrote:
> On Fri, Jul  8, 2022 at 11:49:47AM -0400, Tom Lane wrote:
> > Bruce Momjian  writes:
> > > Agreed. I don't want to break links into the documentation in final
> > > released versions, so head and PG15 seem wise.
> > 
> > I would not expect this to change the doc URLs for any individual
> > catalogs or views --- if it does, I won't be happy.
> 
> Good point --- I thought the chapter was in the URL, but I now see it is
> just the section heading:
> 
>   https://www.postgresql.org/docs/devel/view-pg-available-extensions.html
> 
> so I guess we can backpatch this with no issues.

Attached is a patch to accomplish this.  Its output can be seen here:

https://momjian.us/tmp/pgsql/internals.html

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson



views.diff.gz
Description: application/gzip


Re: Aggregate leads to superfluous projection from the scan

2022-07-08 Thread Tom Lane
Ibrar Ahmed  writes:
> I gave it a quick look, and I think that whenever data is extracted from the
> heap it shows all the columns. Therefore, when columns are extracted from
> the index only, it shows the indexed columns only.

This is operating as designed, and I don't think that the proposed
patch is an improvement.  The point of use_physical_tlist() is that
returning all the columns is cheaper because it avoids a projection
step.  That's true for any case where we have to fetch the heap
tuple, so IndexScan is included though IndexOnlyScan is not.

Now, that's something that was true a decade or more ago.
There's been considerable discussion recently about cases where
it's not true anymore, for example with columnar storage or FDWs,
and so we ought to invent a way to prevent createplan.c from
doing it when it would be counterproductive.  But just summarily
turning it off is not an improvement.

regards, tom lane




Re: Add red-black tree missing comparison searches

2022-07-08 Thread Alexander Korotkov
On Thu, Jul 7, 2022 at 1:43 PM Alexander Korotkov  wrote:
> On Thu, Jul 7, 2022 at 2:16 AM Steve Chavez  wrote:
> > Thanks Alexander!
> >
> > wrt the new patch, I think the following comment is misleading since 
> > keyDeleted can be true or false:
> >
> > + /* switch equal_match to false so we only find greater matches now */
> > + node = (IntRBTreeNode *) rbt_find_great(tree, (RBTNode *) ,
> > + keyDeleted);
> >
> > Maybe it should be the same used for searching lesser keys:
> >
> > + /*
> > + * Find the next key.  If the current key is deleted, we can pass
> > + * equal_match == true and still find the next one.
> > + */
>
> Thank you for catching this.
> The revised version of patch is attached!

Pushed!

--
Regards,
Alexander Korotkov




Re: Aggregate leads to superfluous projection from the scan

2022-07-08 Thread Ibrar Ahmed
On Fri, Jul 8, 2022 at 10:32 PM Zhihong Yu  wrote:

>
>
> On Fri, Jul 8, 2022 at 9:40 AM Zhihong Yu  wrote:
>
>> Hi,
>> Here is the query which involves aggregate on a single column:
>>
>>
>> https://dbfiddle.uk/?rdbms=postgres_13=44bfd8f6b6b5aad34d00d449c04c5a96
>>
>> As you can see from `Output:`, there are many columns added which are not
>> needed by the query executor.
>>
>> I wonder if someone has noticed this in the past.
>> If so, what was the discussion around this topic ?
>>
>> Thanks
>>
> Hi,
> With the patch, I was able to get the following output:
>
>  explain (analyze, verbose) /*+ IndexScan(t) */select count(fire_year)
> from fires t where objectid <= 200;
>   QUERY PLAN
>
> --
>  Aggregate  (cost=119.00..119.01 rows=1 width=8) (actual time=9.453..9.453
> rows=1 loops=1)
>Output: count(fire_year)
>->  Index Scan using fires_pkey on public.fires t  (cost=0.00..116.50
> rows=1000 width=4) (actual time=9.432..9.432 rows=0 loops=1)
>  Output: fire_year
>  Index Cond: (t.objectid <= 200)
>  Planning Time: 52.598 ms
>  Execution Time: 13.082 ms
>
> Please pay attention to the column list after `Output:`
>
> Tom:
> Can you take a look and let me know what I may have missed ?
>
> Thanks
>
I gave it a quick look, and I think that whenever data is extracted from
the heap the plan shows all the columns, whereas when columns come from
the index alone it shows only the indexed column.

postgres=# explain (analyze, verbose) /*+ IndexScan(idx) */select
count(fire_year) from fires t where objectid = 20;


                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=8.31..8.32 rows=1 width=8) (actual time=0.029..0.030 rows=1 loops=1)
   Output: count(fire_year)
   ->  Index Scan using fires_pkey on public.fires t  (cost=0.29..8.31 rows=1 width=4) (actual time=0.022..0.023 rows=1 loops=1)
         Output: objectid, fire_name, fire_year, discovery_date, discovery_time, stat_cause_descr, fire_size, fire_size_class, latitude, longitude, state, county, discovery_date_j, discovery_date_d
         Index Cond: (t.objectid = 20)
 Planning Time: 0.076 ms
 Execution Time: 0.059 ms
(7 rows)



Index-only.


postgres=# explain (analyze, verbose) /*+ IndexScan(idx) */select
count(fire_year) from fires t where fire_year = 20;

                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=8.31..8.32 rows=1 width=8) (actual time=0.026..0.027 rows=1 loops=1)
   Output: count(fire_year)
   ->  Index Only Scan using idx on public.fires t  (cost=0.29..8.31 rows=1 width=4) (actual time=0.023..0.024 rows=0 loops=1)
         Output: fire_year
         Index Cond: (t.fire_year = 20)
         Heap Fetches: 0
 Planning Time: 0.140 ms
 Execution Time: 0.052 ms
(8 rows)



Index Scans



postgres=# explain (analyze, verbose) select count(fire_year) from fires t
where objectid = 20;

 Aggregate  (cost=8.31..8.32 rows=1 width=8) (actual time=0.030..0.031 rows=1 loops=1)
   Output: count(fire_year)
   ->  Index Scan using fires_pkey on public.fires t  (cost=0.29..8.31 rows=1 width=4) (actual time=0.021..0.023 rows=1 loops=1)
         Output: objectid, fire_name, fire_year, discovery_date, discovery_time, stat_cause_descr, fire_size, fire_size_class, latitude, longitude, state, county, discovery_date_j, discovery_date_d
         Index Cond: (t.objectid = 20)
 Planning Time: 0.204 ms
 Execution Time: 0.072 ms
(7 rows)



Seq scans.

--


postgres=# explain (analyze, verbose) select count(fire_year) from fires t;





 Aggregate  (cost=1791.00..1791.01 rows=1 width=8) (actual time=13.172..13.174 rows=1 loops=1)
   Output: count(fire_year)
   ->  Seq Scan on public.fires t  (cost=0.00..1541.00 rows=10 width=4) (actual time=0.007..6.500 rows=10 loops=1)
         Output: objectid, fire_name, fire_year, discovery_date, discovery_time, stat_cause_descr, fire_size, fire_size_class, latitude, longitude, state, county, discovery_date_j, discovery_date_d
 Planning Time: 0.094 ms
 Execution Time: 13.201 ms
(6 rows)


-- 
Ibrar Ahmed


Re: [PATCH] Log details for client certificate failures

2022-07-08 Thread Jacob Champion
On Thu, Jul 7, 2022 at 2:50 AM Peter Eisentraut
 wrote:
> I looked into how you decode the serial number.  I have found some code
> elsewhere that passed the result of X509_get_serialNumber() directly to
> ASN1_INTEGER_set().  But I guess a serial number of maximum length 20
> octets wouldn't fit into a 32-bit long.  (There is
> ASN1_INTEGER_set_int64(), but that requires OpenSSL 1.1.0.)  Does that
> match your understanding?

Yep. And the bit lengths of the serial numbers used in the test suite
are in the low 60s already. Many people will just randomize their
serial numbers, so I think BN_bn2dec() is the way to go.
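
For anyone following along, the general shape of that approach is sketched
below (plain OpenSSL calls, shown for illustration rather than as the patch
itself):

#include <openssl/asn1.h>
#include <openssl/bn.h>
#include <openssl/x509.h>

/*
 * Illustrative sketch only: format a certificate serial number as a decimal
 * string without assuming it fits in a C long.  The returned string must be
 * freed with OPENSSL_free(); returns NULL on failure.
 */
static char *
serial_to_decimal(X509 *cert)
{
    ASN1_INTEGER *serial = X509_get_serialNumber(cert);
    BIGNUM       *bn = ASN1_INTEGER_to_BN(serial, NULL);
    char         *result = NULL;

    if (bn != NULL)
    {
        result = BN_bn2dec(bn);   /* handles arbitrarily large serials */
        BN_free(bn);
    }
    return result;
}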

> For the detail string, I think we could do something like:
>
> DETAIL:  Failed certificate data (unverified): subject '%s', serial
> number %s, issuer '%s'

Done that way in v4.

I also added an optional 0002 that bubbles the error info up to the
final ereport(ERROR), using errdetail() and errhint(). I can squash it
into 0001 if you like it, or drop it if you don't. (This approach
could be adapted to the client, too.)

Thanks!
--Jacob
commit 7489e52168b76df3a84c68d79fb22651a05a0c05
Author: Jacob Champion 
Date:   Fri Jul 8 09:19:32 2022 -0700

squash! Log details for client certificate failures

Change detail message, per review.

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 80b361b105..a2cbcdad5f 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -1173,9 +1173,10 @@ verify_cb(int ok, X509_STORE_CTX *ctx)
 				(errmsg("client certificate verification failed at depth %d: %s",
 						depth, errstring),
 				 /* only print detail if we have a certificate to print */
-				 subject && errdetail("failed certificate had subject '%s', "
+				 subject && errdetail("Failed certificate data (unverified): "
+									  "subject '%s', "
 									  "serial number %s, "
-									  "purported issuer '%s'",
+									  "issuer '%s'",
 									  sub_truncated ? sub_truncated : subject,
 									  serialno ? serialno : _("unknown"),
 									  iss_truncated ? iss_truncated : issuer)));
diff --git a/src/test/ssl/t/001_ssltests.pl b/src/test/ssl/t/001_ssltests.pl
index a9b737ed09..0f837e1b9f 100644
--- a/src/test/ssl/t/001_ssltests.pl
+++ b/src/test/ssl/t/001_ssltests.pl
@@ -685,7 +685,7 @@ $node->connect_fails(
 	expected_stderr => qr/SSL error: sslv3 alert certificate revoked/,
 	log_like => [
 		qr/client certificate verification failed at depth 0: certificate revoked/,
-		qr/failed certificate had subject '\/CN=ssltestuser', serial number 2315134995201656577, purported issuer '\/CN=Test CA for PostgreSQL SSL regression test client certs'/,
+		qr/Failed certificate data \(unverified\): subject '\/CN=ssltestuser', serial number 2315134995201656577, issuer '\/CN=Test CA for PostgreSQL SSL regression test client certs'/,
 	],
 	# revoked certificates should not authenticate the user
 	log_unlike => [qr/connection authenticated:/],);
@@ -737,7 +737,7 @@ $node->connect_fails(
 	expected_stderr => qr/SSL error: tlsv1 alert unknown ca/,
 	log_like => [
 		qr/client certificate verification failed at depth 0: unable to get local issuer certificate/,
-		qr/failed certificate had subject '\/CN=ssltestuser', serial number 2315134995201656576, purported issuer '\/CN=Test CA for PostgreSQL SSL regression test client certs'/,
+		qr/Failed certificate data \(unverified\): subject '\/CN=ssltestuser', serial number 2315134995201656576, issuer '\/CN=Test CA for PostgreSQL SSL regression test client certs'/,
 	]);
 
 $node->connect_fails(
@@ -746,7 +746,7 @@ $node->connect_fails(
 	expected_stderr => qr/SSL error: tlsv1 alert unknown ca/,
 	log_like => [
 		qr/client certificate verification failed at depth 0: unable to get local issuer certificate/,
-		qr/failed certificate had subject '\.\.\.\/CN=ssl-123456789012345678901234567890123456789012345678901234567890', serial number 2315418733629425152, purported issuer '\/CN=Test CA for PostgreSQL SSL regression test client certs'/,
+		qr/Failed certificate data \(unverified\): subject '\.\.\.\/CN=ssl-123456789012345678901234567890123456789012345678901234567890', serial number 2315418733629425152, issuer '\/CN=Test CA for PostgreSQL SSL regression test client

Re: Add last_vacuum_index_scans in pg_stat_all_tables

2022-07-08 Thread Peter Geoghegan
On Fri, Jul 8, 2022 at 10:47 AM Alvaro Herrera  wrote:
> Saving some sort of history would be much more useful, but of course a
> lot more work.

I think that storing a certain amount of history would be very useful,
for lots of reasons. Not just for instrumentation purposes; I envisage
a design where VACUUM itself makes certain decisions based on the
history of each VACUUM operation against the table. The direction that
things have taken suggests a certain amount about the direction that
things are going in, which we should try to influence.

The simplest and best example of how this could help is probably
freezing, and freeze debt. Currently, the visibility map interacts
with vacuum_freeze_min_age in a way that allows unfrozen all-visible
pages to accumulate. These pages won't be frozen until the next
aggressive VACUUM. But there is no fixed relationship between the
number of XIDs consumed by the system (per unit of wallclock time) and
the number of unfrozen all-visible pages (over the same duration). So
we might end up having to freeze an absolutely enormous number of
pages in the eventual aggressive vacuum. We also might not -- it's
really hard to predict, for reasons that just don't make much sense.

There are a few things we could do here, but having a sense of history
seems like the important part. If (say) the table exceeds a certain
size, and the number of all-visible pages grows and grows (without any
freezing taking place), then we should "proactively" freeze at least
some of the unfrozen all-visible pages in earlier VACUUM operations.
In other words, we should (at the very least) spread out the burden of
freezing those pages over time, while being careful to not pay too
much more than we would with the old approach if and when the workload
characteristics change again.

More generally, I think that we should blur the distinction between
aggressive and non-aggressive autovacuum. Sure, we'd still need VACUUM
to "behave aggressively" in some sense, but that could all happen
dynamically, without committing to a particular course of action until
the last moment -- being able to change our minds at the last minute
can be very valuable, even though we probably won't change our minds
too often.

-- 
Peter Geoghegan




Re: automatically generating node support functions

2022-07-08 Thread Tom Lane
Alvaro Herrera  writes:
> While going over this patch, I noticed that I forgot to add support for
> XidList in copyfuncs.c.  OK if I push this soon quickly?

Yeah, go ahead, that part of copyfuncs is still going to be manually
maintained, so we need the fix.

What about equalfuncs etc?

regards, tom lane




Re: explain analyze rows=%.0f

2022-07-08 Thread Ibrar Ahmed
On Thu, Jul 7, 2022 at 10:53 PM Greg Stark  wrote:

> > -   ->  Parallel Seq Scan on tenk1 (actual rows=1960
> loops=50)
> > +   ->  Parallel Seq Scan on tenk1 (actual rows=1960.00
>
> At the not inconsiderable risk of bike-shedding
>
> I'm wondering if printing something like 0.00 will be somewhat
> deceptive when the real value is non-zero but less than 1 row per 200
> loops. I wonder if the number of decimal places should be calculated
> to produce a minimum of one non-zero digit for non-zero values.
>
> --
> greg
>

+   ->  Parallel Seq Scan on tenk1 (actual rows=1960.00

I have added a check to strip any trailing ".00" from the output, because
parallel queries were producing it and it was disturbing many test case
outputs.
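
As an aside, if we ever want Greg's idea of picking the precision per value,
it could look roughly like the hypothetical helper below (not part of the
attached patch; the function name and the ten-digit cap are invented for
illustration):

#include <math.h>

/*
 * Hypothetical helper: return how many decimal digits are needed so that a
 * non-zero average row count never prints as 0.00 (capped at 10 digits).
 */
static int
row_count_ndigits(double avg_rows)
{
    int     ndigits = 2;

    if (avg_rows <= 0.0)
        return 0;
    while (ndigits < 10 && avg_rows * pow(10.0, ndigits) < 1.0)
        ndigits++;
    return ndigits;
}

/* usage sketch: appendStringInfo(es->str, " rows=%.*f", row_count_ndigits(rows), rows); */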

-- 
Ibrar Ahmed


Re: explain analyze rows=%.0f

2022-07-08 Thread Ibrar Ahmed
On Thu, Jul 7, 2022 at 3:14 PM vignesh C  wrote:

> On Thu, Jun 23, 2022 at 2:25 AM Ibrar Ahmed  wrote:
> >
> >
> >
> > On Thu, Jun 23, 2022 at 1:04 AM David G. Johnston <
> david.g.johns...@gmail.com> wrote:
> >>
> >> On Wed, Jun 22, 2022 at 12:11 PM Ibrar Ahmed 
> wrote:
> >>>
> >>> On Thu, Jun 23, 2022 at 12:01 AM Tom Lane  wrote:
> 
>  Robert Haas  writes:
>  > On Jun 2, 2009, at 9:41 AM, Simon Riggs 
> wrote:
>  >> You're right that the number of significant digits already exceeds
> the
>  >> true accuracy of the computation. I think what Robert wants to see
> is
>  >> the exact value used in the calc, so the estimates can be checked
> more
>  >> thoroughly than is currently possible.
> 
>  > Bingo.
> 
>  Uh, the planner's estimate *is* an integer.  What was under discussion
>  (I thought) was showing some fractional digits in the case where
> EXPLAIN
>  ANALYZE is outputting a measured row count that is an average over
>  multiple loops, and therefore isn't necessarily an integer.  In that
>  case the measured value can be considered arbitrarily precise ---
> though
>  I think in practice one or two fractional digits would be plenty.
> 
>  regards, tom lane
> 
> 
> >>> Hi,
> >>> I was looking at the TODO list and found that the issue requires
> >>> a quick fix. Attached is a patch which shows output like this.
> >>
> >>
> >> Quick code review:
> >>
> >> + "actual rows=%.0f loops=%.0f": " rows=%.2f loops=%.0f",
> >>
> >> The leading space before the else block "rows" does not belong.
> >>
> >> There should be a space after the colon.
> >>
> > Thanks, David for your quick response. I have updated the patch.
> >
> >>
> >> The word "actual" that you are dropping in the else block seems like it
> should belong - it is a header for the entire section not just a modifier
> for the word "rows".  This is evidenced by the timing block verbiage where
> rows is standalone and the word actual comes before time.  In short, only
> the format specifier should change under the current scheme.  Both sections.
> >>
> >> - WRITE_FLOAT_FIELD(rows, "%.0f");
> >> + WRITE_FLOAT_FIELD(rows, "%.2f");
> >>
> >> This one looks suspicious, though I haven't dug into the code to see
> exactly what all is being touched.  That it doesn't have an nloops
> condition like everything else stands out.
> >>
> I was also thinking about that, but I don't see any harm when we
> ultimately truncate that decimal at a later stage of the code in the
> case of loop = 1.
>
> Thanks for the patch.
>

Thanks for the review.

>
> 1) There are some existing regression tests that are failing, you
> should update the expect files accordingly for the same:
> --- /home/vignesh/postgres/src/test/regress/expected/select_parallel.out
>2022-05-18 20:51:46.874818044 +0530
> +++ /home/vignesh/postgres/src/test/regress/results/select_parallel.out
> 2022-07-07 15:27:34.450440922 +0530
> @@ -545,17 +545,17 @@
>  explain (analyze, timing off, summary off, costs off)
> select count(*) from tenk1, tenk2 where tenk1.hundred > 1
>  and tenk2.thousand=0;
> -QUERY PLAN
> ---
> + QUERY PLAN
>
> +-
>   Aggregate (actual rows=1 loops=1)
> ->  Nested Loop (actual rows=98000 loops=1)
>   ->  Seq Scan on tenk2 (actual rows=10 loops=1)
> Filter: (thousand = 0)
> Rows Removed by Filter: 9990
> - ->  Gather (actual rows=9800 loops=10)
> + ->  Gather (actual rows=9800.00 loops=10)
> Workers Planned: 4
> Workers Launched: 4
> -   ->  Parallel Seq Scan on tenk1 (actual rows=1960 loops=50)
> +   ->  Parallel Seq Scan on tenk1 (actual rows=1960.00
> loops=50)
>   Filter: (hundred > 1)
>
> test select_parallel  ... FAILED  744 ms
>  partition_prune  ... FAILED  861 ms
>  explain  ... FAILED  134 ms
>  memoize  ... FAILED  250 ms
>
> 2) This change is not required as part of this patch:
> --- a/src/backend/access/transam/xact.c
> +++ b/src/backend/access/transam/xact.c
> @@ -122,7 +122,7 @@ boolbsysscan = false;
>   * lookups as fast as possible.
>   */
>  static FullTransactionId XactTopFullTransactionId =
> {InvalidTransactionId};
> -static int nParallelCurrentXids = 0;
> +static int nParallelCurrentXids = 0;
>  static TransactionId *ParallelCurrentXids;
>
>
I have fixed the regression test failures and removed the unrelated code.

> Regards,
> Vignesh
>


-- 
Ibrar Ahmed


Re: explain analyze rows=%.0f

2022-07-08 Thread Ibrar Ahmed
On Thu, Jul 7, 2022 at 2:41 PM Amit Kapila  wrote:

> On Thu, Jun 23, 2022 at 2:25 AM Ibrar Ahmed  wrote:
> >
> > On Thu, Jun 23, 2022 at 1:04 AM David G. Johnston <
> david.g.johns...@gmail.com> wrote:
> >>
> >> - WRITE_FLOAT_FIELD(rows, "%.0f");
> >> + WRITE_FLOAT_FIELD(rows, "%.2f");
> >>
> >> This one looks suspicious, though I haven't dug into the code to see
> exactly what all is being touched.  That it doesn't have an nloops
> condition like everything else stands out.
> >>
> I was also thinking about that, but I don't see any harm when we
> ultimately truncate that decimal at a later stage of the code in the
> case of loop = 1.
> >
>
> That change is in the path node which we anyway not going to target as
> part of this change. We only want to change the display for actual
> rows in Explain Analyze. So, I can't see how the quoted change can
> help in any way.
>
Agreed, removed.


> Few miscellaneous comments:
> 
> *
>  static FullTransactionId XactTopFullTransactionId =
> {InvalidTransactionId};
> -static int nParallelCurrentXids = 0;
> +static int nParallelCurrentXids = 0;
>
Removed.


> I don't see why this change is required.
>
> * Can you please add a comment explaining why we are making this
> change for actual rows?
>

Done

>
> * Can you please write a test case unless there is some existing test
> that covers the change by displaying actual rows values in decimal but
> in that case patch should have that changed output test? If you don't
> think we can reliably write such a test then please let me know the
> reason?
>
I think there are tests, and I have updated the results accordingly.

> --
> With Regards,
> Amit Kapila.
>


-- 
Ibrar Ahmed


explain_float_row_v3.patch
Description: Binary data


Re: automatically generating node support functions

2022-07-08 Thread Alvaro Herrera
While going over this patch, I noticed that I forgot to add support for
XidList in copyfuncs.c.  OK if I push this soon quickly?

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
>From 24185e0421cc1e22f9a78f56d03e4585a142e78e Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 8 Jul 2022 15:34:47 +0200
Subject: [PATCH] Forgot to add copy support in f10a025cfe97

---
 src/backend/nodes/copyfuncs.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 2c834e4d0d..b8a5715981 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -5978,11 +5978,12 @@ copyObjectImpl(const void *from)
 			break;
 
 			/*
-			 * Lists of integers and OIDs don't need to be deep-copied, so we
-			 * perform a shallow copy via list_copy()
+			 * Lists of integers, OIDs and XIDs don't need to be deep-copied,
+			 * so we perform a shallow copy via list_copy()
 			 */
 		case T_IntList:
 		case T_OidList:
+		case T_XidList:
 			retval = list_copy(from);
 			break;
 
-- 
2.30.2



Re: Add last_vacuum_index_scans in pg_stat_all_tables

2022-07-08 Thread Alvaro Herrera
On 2022-Jul-04, Ken Kato wrote:

> I think having number of index scans of the last vacuum in
> pg_stat_all_tables can be helpful. This value shows how efficiently vacuums
> have performed and can be an indicator to increase maintenance_work_mem.

Yeah, this would be a good metric to expose, since it directly tells how
to set autovacuum_work_mem.  I'm not sure that the shape you propose is
correct, though, because each vacuum run would clobber whatever value
was there before.  No other stats counter works that way; they are all
additive.  But I'm not sure that adding the current number each time is
sensible, either, because then the only thing you know is the average of
the last X runs, which doesn't tell you much.

Saving some sort of history would be much more useful, but of course a
lot more work.

> It was proposed previously[1], but it was not accepted due to the limitation
> of stats collector. Statistics are now stored in shared memory, so we got
> more rooms to store statistics. I think this statistics is still valuable
> for some people, so I am proposing this again.

> [1] 
> https://www.postgresql.org/message-id/20171010.192616.108347483.horiguchi.kyotaro%40lab.ntt.co.jp

I read this thread, but what was proposed there is a bunch of metrics
that are not this one.  The discussion there centered on how it would be
unacceptable to incur the space cost of adding autovacuum-related metrics
completely different from the one you propose.  That debate is now over,
so we're clear to proceed.  But we
need to agree on what to add.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: Aggregate leads to superfluous projection from the scan

2022-07-08 Thread Zhihong Yu
On Fri, Jul 8, 2022 at 9:40 AM Zhihong Yu  wrote:

> Hi,
> Here is the query which involves aggregate on a single column:
>
>
> https://dbfiddle.uk/?rdbms=postgres_13=44bfd8f6b6b5aad34d00d449c04c5a96
>
> As you can see from `Output:`, there are many columns added which are not
> needed by the query executor.
>
> I wonder if someone has noticed this in the past.
> If so, what was the discussion around this topic ?
>
> Thanks
>
Hi,
With the patch, I was able to get the following output:

 explain (analyze, verbose) /*+ IndexScan(t) */select count(fire_year) from
fires t where objectid <= 200;
  QUERY PLAN
--
 Aggregate  (cost=119.00..119.01 rows=1 width=8) (actual time=9.453..9.453
rows=1 loops=1)
   Output: count(fire_year)
   ->  Index Scan using fires_pkey on public.fires t  (cost=0.00..116.50
rows=1000 width=4) (actual time=9.432..9.432 rows=0 loops=1)
 Output: fire_year
 Index Cond: (t.objectid <= 200)
 Planning Time: 52.598 ms
 Execution Time: 13.082 ms

Please pay attention to the column list after `Output:`

Tom:
Can you take a look and let me know what I may have missed ?

Thanks


index-scan-with-non-returnable.patch
Description: Binary data


Re: remove more archiving overhead

2022-07-08 Thread Nathan Bossart
On Fri, Jul 08, 2022 at 01:02:51PM -0400, David Steele wrote:
> I think I wrote this before I'd had enough coffee. "fully persisted to
> storage" can mean many things depending on the storage (Posix, CIFS, S3,
> etc.) so I think this is fine. The basic_archive module is there for people
> who would like implementation details for Posix.

Perhaps this section should warn against reading without sufficient
caffeination.  :)

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-07-08 Thread Nathan Bossart
On Fri, Jul 08, 2022 at 09:39:10PM +0530, Bharath Rupireddy wrote:
> 0001 - there are many places where lstat/stat is being used - don't we
> need to replace all or most of them with get_dirent_type?

It's been a while since I wrote this one, but I believe my intent was to
replace as many [l]stat() calls in ReadDir()-style loops as possible with
get_dirent_type().  Are there any that I've missed?
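
For anyone unfamiliar with the conversion, the pattern generally looks like
the sketch below (generic placeholder variable names, not a specific hunk
from the patch):

/* Before: an extra lstat() per entry inside a ReadDir() loop.
 * After: ask get_dirent_type(), which can usually answer from the dirent
 * itself and only falls back to stat()/lstat() when it has to. */
while ((de = ReadDir(dir, dirpath)) != NULL)
{
    char        path[MAXPGPATH * 2];

    if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
        continue;

    snprintf(path, sizeof(path), "%s/%s", dirpath, de->d_name);

    switch (get_dirent_type(path, de, false, LOG))
    {
        case PGFILETYPE_REG:
            /* handle a plain file */
            break;
        case PGFILETYPE_DIR:
            /* recurse, or skip */
            break;
        default:
            /* unknown type or error (already logged at LOG) */
            break;
    }
}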

> 0002 - I'm not quite happy with this patch: with the change, the
> checkpoint errors out if it can't remove just one file - the comment
> there says it all. Is there any strong reason for this change?

Andres noted several concerns upthread.  In short, ignoring unexpected
errors makes them harder to debug and likely masks bugs.

FWIW I agree that it is unfortunate that a relatively non-critical error
here leads to checkpoint failures, which can cause much worse problems down
the road.  I think this is one of the reasons for moving tasks like this
out of the checkpointer, as I'm trying to do with the proposed custodian
process [0].

[0] https://commitfest.postgresql.org/38/3448/

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: pg15b2: large objects lost on upgrade

2022-07-08 Thread Robert Haas
On Fri, Jul 8, 2022 at 11:53 AM Justin Pryzby  wrote:
> pg_upgrade drops template1 and postgres before upgrading:

Hmm, but I bet you could fiddle with template0. Indeed what's the
difference between a user fiddling with template0 and me committing a
patch that bumps catversion? If the latter doesn't prevent pg_upgrade
from working when the release comes out, why should the former?

> I don't have a deep understanding why the DB hasn't imploded at this point,
> maybe related to the filenode map file, but it seems very close to being
> catastrophic.

Yeah, good example.

> It seems like pg_upgrade should at least check that the new cluster has no
> objects with either OID or relfilenodes in the user range..

Well... I think it's not quite that simple. There's an argument for
that rule, to be sure, but in some sense it's far too strict. We only
preserve the OIDs of tablespaces, types, enums, roles, and now
relations. So if you create any other type of object in the new
cluster, like say a function, it's totally fine. You could still fail
if the old cluster happens to contain a function with the same
signature, but that's kind of a different issue. An OID collision for
any of the many object types for which OIDs are not preserved is no
problem at all.

But even if we restrict ourselves to talking about an object type for
which OIDs are preserved, it's still not quite that simple. For
example, if I create a relation in the new cluster, its OID or
relfilenode might be the same as a relation that exists in the old
cluster. In such a case, a failure is inevitable. We're definitely in
big trouble, and the question is only whether pg_upgrade will notice.
But it's also possible that, either by planning or by sure dumb luck,
neither the relation nor the OID that I've created in the new cluster
is in use in the old cluster. In such a case, the upgrade can succeed
without breaking anything, or at least nothing other than our sense of
order in the universe.

Without a doubt, there are holes in pg_upgrade's error checking that
need to be plugged, but I think there is room to debate exactly what
size plug we want to use. I can't really say that it's definitely
stupid to use a plug that's definitely big enough to catch all the
scenarios that might break stuff, but I think my own preference would
be to try to craft it so that it isn't too much larger than necessary.
That's partly because I do think there are some scenarios in which
modifying the new cluster might be the easiest way of working around
some problem, but also because, as a matter of principle, I like the
idea of making rules that correspond to the real dangers. If we write
a rule that says essentially "it's no good if there are two relations
sharing a relfilenode," nobody with any understanding of how the
system works can argue that bypassing it is a sensible thing to do,
and probably nobody will even try, because it's so obviously bonkers
to do so. It's a lot less obviously bonkers to try to bypass the
broader prohibition which you suggest should never be bypassed, so
someone may do it, and get themselves in trouble.

Now I'm not saying such a person will get any sympathy from this list.
If for example you #if out the sanity check and hose yourself, people
here, including me, are going to suggest that you've hosed yourself and
it's not our problem. But ... the world is full of
warnings about problems that aren't really that serious, and sometimes
those have the effect of discouraging people from taking warnings
about very serious problems as seriously as they should. I know that I
no longer panic when the national weather service texts me to say that
there's a tornado or a flash flood in my area. They've just done that
too many times when there was no real issue with which I needed to be
concerned. If I get caught out by a tornado at some point, they're
probably going to say "well that's why you should always take our
warnings seriously," but I'm going to say "well that's why you
shouldn't send spurious warnings."

> > Perhaps a better solution to this particular problem is to remove the
> > backing files for the large object table and index *before* restoring
> > the dump, deciding what files to remove by asking the running server
> > for the file path. It might seem funny to allow for dangling pg_class
> > entries, but we're going to create that situation for all other user
> > rels anyway, and pg_upgrade treats pg_largeobject as a user rel.
>
> I'll think about it more later.

Sounds good.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Weird behaviour with binary copy, arrays and column count

2022-07-08 Thread James Vanns
Hi all, apologies if this is the wrong list to use, but I figured this is a
low-level enough problem that it might be the best to gain some
understanding.

In PGDB 13.4 I have a simple (obscured) table;

CREATE SEQUENCE tbl_id_seq START 1;
CREATE TABLE tbl (
   a BIGINT UNIQUE NOT NULL DEFAULT nextval('tbl_id_seq'),
   b BIGINT NOT NULL,
   c TIMESTAMP WITHOUT TIME ZONE NOT NULL,
   d INT NOT NULL,
   e INT NULL DEFAULT 0,
   f INT NULL DEFAULT 1,
   g INT NULL DEFAULT 0,
   h BIGINT ARRAY,
   PRIMARY KEY (a)
);

Prior to introducing the ARRAY as field h, everything was working fine
using a binary mode COPY via libpq;
COPY tbl (b,c,d,e,f,g,h) FROM STDIN WITH (FORMAT binary, FREEZE ON)
PQputCopyData()
PQputCopyEnd()
etc.

Now this is where the problem becomes peculiar. I read all the Interwebs
has to offer on the efforts required to encode an array
in binary mode and I've achieved that just fine... but it only works *if* I
remove column g from the COPY statement and data (it can remain in table
definition and be filled in with a default). It's most odd. I've
selectively gone through the table adding/removing fields until I get to
this. It doesn't appear to be the array copy itself - it succeeds with 6
columns (b .. f plus h) but fails with the full complement of 7 (noting
that 'a' is a generative sequence). The error in the PG logs is this;

ERROR:  syntax error at end of input at character 255

It does seem to smell of an alignment, padding, buffer overrun, parsing
kind of error. I tried reintroducing column g as a larger integer or
smaller field and the problem persists (and curiously the input character
error remains at 255).

Also, if I remove the array from the COPY or replace it with a simple
(series of) int, then the problem also goes away. The size of the array
appears to have no relevance - whether its just a single item or 10, for
example, the same problem remains and the same parse error at character
255. Finally, the definition order of the columns/fields also makes no
difference - I can sandwich the array in the middle of the table and the
COPY listing and the upload still succeeds so long as I keep the column
count down at 6, essentially omitting 'g' again in this case.

I've read the array_send/recv functions in arrayfuncs.c and I'm pretty sure
I got that right (otherwise the array copy wouldn't work at all, right!?) ...
it's this odd combination of array + field lengths I can't figure out!? I
couldn't find the protocol receive code where array_recv is called - that
might provide a clue.

Anyway, I appreciate I've sent this off without code or an MRE - I'll work
on getting something isolated. Until then I wanted to get the ball rolling,
in case anyone has any clues or can suggest what I'm either doing wrong or
where the problem might be in PG!? In the meantime, to confirm, the binary
PG array format I'm using is (including the overall field size for the wire
transfer):

htobe32(total_array_bytes_inc_header);  /* overall field length for the wire */
/* begin array header */
htobe32(1);   /* number of dimensions: 1 */
htobe32(0);   /* flags: 1 if the array contains NULL elements, else 0 */
htobe32(20);  /* element type OID (20 = bigint) */
htobe32(2);   /* size of the first dimension: 2 items, as an example */
htobe32(1);   /* lower bound of the first dimension */
/* end array header */
for (int i = 0 ; i < 2 ; ++i) {
    htobe32(sizeof(int8));       /* per-element length: 8 bytes */
    htobe64(some_int8_val + i);  /* big-endian element value */
}
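
And for completeness, the per-tuple framing that binary COPY expects around
every field is roughly the following sketch; Buffer and append_bytes are
hypothetical helpers standing in for whatever accumulates the PQputCopyData
payload, and the field count in each tuple header has to match the column
list in the COPY statement:

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>  /* htons, htonl */
#include <endian.h>     /* htobe64 (Linux) */

typedef struct Buffer Buffer;                                /* hypothetical */
extern void append_bytes(Buffer *, const void *, size_t);    /* hypothetical */

/* Illustrative only: emit one tuple with a bigint column and an already
 * encoded bigint[] value (header + elements, as sketched above). */
static void
emit_tuple(Buffer *buf, int64_t b, const void *array_bytes, int32_t array_len)
{
    uint16_t    nfields = htons(2);   /* must equal the COPY column count */
    int32_t     len;
    uint64_t    v;

    append_bytes(buf, &nfields, sizeof(nfields));

    /* bigint column: 4-byte length (8), then the big-endian value */
    len = htonl(8);
    v = htobe64((uint64_t) b);
    append_bytes(buf, &len, sizeof(len));
    append_bytes(buf, &v, sizeof(v));

    /* array column: 4-byte length of the whole array datum, then its bytes */
    len = htonl((uint32_t) array_len);
    append_bytes(buf, &len, sizeof(len));
    append_bytes(buf, array_bytes, array_len);
}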

Cheers,

Jim

-- 
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London


Re: remove more archiving overhead

2022-07-08 Thread David Steele

On 7/8/22 12:54, Nathan Bossart wrote:

On Fri, Jul 08, 2022 at 08:20:09AM -0400, David Steele wrote:


Nathan, I don't see the language about being sure to persist to storage
here?


It's here:
When an archive library encounters a pre-existing file, it may return
true if the WAL file has identical contents to the pre-existing archive
and the pre-existing archive is fully persisted to storage.

Since you didn't catch it, I wonder if it needs improvement.  At the very
least, perhaps we should note that one way to do the latter is to persist
it yourself before returning true, and we could point to basic_archive.c as
an example.  However, I'm hesitant to make these docs too much more
complicated than they already are.  WDYT?


I think I wrote this before I'd had enough coffee. "fully persisted to 
storage" can mean many things depending on the storage (Posix, CIFS, S3, 
etc.) so I think this is fine. The basic_archive module is there for 
people who would like implementation details for Posix.


Regards,
-David




Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2022-07-08 Thread Melih Mutlu
>
> It seems from your results that performance degrades for large
> relations. Did you try to investigate the reasons for the same?
>

I have not tried to investigate the performance degradation for large
relations yet.
Once I'm done with the changes for slot usage, I'll look into this and come
back with more findings.

Thanks,
Melih


Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2022-07-08 Thread Melih Mutlu
Hi Amit and Dilip,

Thanks for the replies.


> > I had a quick look into the patch and it seems it is using the worker
> > array index instead of relid while forming the slot name
>

Yes, I changed the slot names so they include slot index instead of
relation id.
This was needed because I aimed to separate replication slots from
relations.

I think that won't work because each time on restart the slot won't be
> fixed. Now, it is possible that we may drop the wrong slot if that
> state of copying rel is SUBREL_STATE_DATASYNC. Also, it is possible
> that while creating a slot, we fail because the same name slot already
> exists due to some other worker which has created that slot has been
> restarted. Also, what about origin_name, won't that have similar
> problems? Also, if the state is already SUBREL_STATE_FINISHEDCOPY, if
> the slot is not the same as we have used in the previous run of a
> particular worker, it may start WAL streaming from a different point
> based on the slot's confirmed_flush_location.
>

You're right, Amit. In case of a failure, the tablesync phase of a relation
may continue with a different worker and replication slot due to this change
in naming.
It seems like the same replication slot should be used from start to end for
a relation during tablesync. However, creating/dropping replication slots can
be a major overhead in some cases, so it would be nice if these slots were
somehow reused.

To overcome this issue, I've been thinking about making some changes in my
patch.
So far, my proposal would be as follows:

Slot naming can be like pg__ instead of
pg__. This way each worker can use the same replication
slot during their lifetime.
But if a worker is restarted, then it will switch to a new replication slot
since its pid has changed.

pg_subscription_rel catalog can store replication slot name for each
non-ready relation. Then we can find the slot needed for that particular
relation to complete tablesync.
If a worker syncs a relation without any error, everything works well and
this new replication slot column from the catalog will not be needed.
However, if a worker is restarted due to a failure, the previous run of that
worker will have left its slot behind since it did not exit properly.
The restarted worker (with a different pid) will see that the relation is
actually in SUBREL_STATE_FINISHEDCOPY and will want to proceed to the
catchup step.
Then the worker can look for that particular relation's replication slot
from pg_subscription_rel catalog (slot name should be there since relation
state is not ready). And tablesync can proceed with that slot.

There might be some cases where replication slots are left behind. An
example of such a case would be when the slot is removed from the
pg_subscription_rel catalog and detached from any relation, but that slot
couldn't actually be dropped for some reason. For such cases, slot cleanup
logic is needed. This cleanup can also be done by tablesync workers.
Whenever a tablesync worker is created, it can look for existing
replication slots that do not belong to any relation and any worker (slot
name has pid for that), and drop those slots if it finds any.

What do you think about this new way of handling slots? Do you see any
points of concern?

I'm currently working on adding this change to the patch, and would
appreciate any comments.

Thanks,
Melih


Re: remove more archiving overhead

2022-07-08 Thread Nathan Bossart
On Fri, Jul 08, 2022 at 08:20:09AM -0400, David Steele wrote:
> On 7/7/22 21:56, Kyotaro Horiguchi wrote:
>> Thinking RFC'ish, the meaning of "may" and "must" is significant in
>> this description.  On the other hand it uses both "may" and "can" but
>> I think that their difference is not significant or "can" there is
>> somewhat confusing.  I think the "can" should be "may" here.
> 
> +1.

Done.

>> And since "must" is emphasized, doesn't "may" also need emphasis?
> 
> I think emphasis only on must is fine.

Yeah, I wanted to emphasize the importance of returning false in this case.
Since it's okay to return true or false in the identical/persisted file
case, I didn't think it deserved emphasis.

> Nathan, I don't see the language about being sure to persist to storage
> here?

It's here:
When an archive library encounters a pre-existing file, it may return
true if the WAL file has identical contents to the pre-existing archive
and the pre-existing archive is fully persisted to storage.

Since you didn't catch it, I wonder if it needs improvement.  At the very
least, perhaps we should note that one way to do the latter is to persist
it yourself before returning true, and we could point to basic_archive.c as
an example.  However, I'm hesitant to make these docs too much more
complicated than they already are.  WDYT?
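
To make that concrete, the pre-existing-file branch of an archive module
could look something like the sketch below (illustrative only; the two
helper functions are hypothetical and basic_archive.c remains the reference
implementation):

/*
 * Illustrative sketch: return true only when the pre-existing archive file
 * has identical contents to the WAL segment *and* is known to be durable.
 */
static bool
check_existing_archive(const char *archive_path, const char *wal_path)
{
    if (!files_are_identical(archive_path, wal_path))   /* hypothetical helper */
        return false;   /* different contents: report failure, don't overwrite */

    /* make sure the pre-existing copy survives a crash before claiming success */
    fsync_fname(archive_path, false);
    fsync_parent_dir(archive_path);                     /* hypothetical helper */

    return true;
}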

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From a32f95d3b129d386ce03c194bd6cddad2f92b76b Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 7 Apr 2022 14:11:59 -0700
Subject: [PATCH v4 1/2] Reduce overhead of renaming archive status files.

Presently, archive status files are durably renamed from .ready to
.done to indicate that a file has been archived.  Persisting this
rename to disk accounts for a significant amount of the overhead
associated with archiving.  While durably renaming the file
prevents re-archiving in most cases, archive commands and libraries
must already gracefully handle attempts to re-archive the last
archived file after a crash (e.g., a crash immediately after
archive_command exits but before the server renames the status
file).

This change reduces the amount of overhead associated with
archiving by using rename() instead of durable_rename() to rename
the archive status files.  As a consequence, the server is more
likely to attempt to re-archive files after a crash, but as noted
above, archive commands and modules are already expected to handle
this.  It is also possible that the server will attempt to re-
archive files that have been removed or recycled, but the archiver
already handles this, too.

Author: Nathan Bossart
---
 src/backend/postmaster/pgarch.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 25e31c42e1..6ce361707d 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -745,7 +745,19 @@ pgarch_archiveDone(char *xlog)
 
 	StatusFilePath(rlogready, xlog, ".ready");
 	StatusFilePath(rlogdone, xlog, ".done");
-	(void) durable_rename(rlogready, rlogdone, WARNING);
+
+	/*
+	 * To avoid extra overhead, we don't durably rename the .ready file to
+	 * .done.  Archive commands and libraries must gracefully handle attempts
+	 * to re-archive files (e.g., if the server crashes just before this
+	 * function is called), so it should be okay if the .ready file reappears
+	 * after a crash.
+	 */
+	if (rename(rlogready, rlogdone) < 0)
+		ereport(WARNING,
+(errcode_for_file_access(),
+ errmsg("could not rename file \"%s\" to \"%s\": %m",
+		rlogready, rlogdone)));
 }
 
 
-- 
2.25.1

>From 54e1225173211062940ed4c8130ba97c192a99ec Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 7 Jul 2022 10:43:30 -0700
Subject: [PATCH v4 2/2] add note about re-archiving in docs

---
 doc/src/sgml/backup.sgml | 24 +++-
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 73a774d3d7..04a1f94ad0 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -681,14 +681,28 @@ test ! -f /mnt/server/archivedir/000100A90065  cp pg_wal/0
 any pre-existing archive file.  This is an important safety feature to
 preserve the integrity of your archive in case of administrator error
 (such as sending the output of two different servers to the same archive
-directory).
+directory).  It is advisable to test your proposed archive library to ensure
+that it does not overwrite an existing file.

 

-It is advisable to test your proposed archive library to ensure that it
-indeed does not overwrite an existing file, and that it returns
-false in this case.
-The example command above for Unix ensures this by including a separate
+In rare cases, PostgreSQL may attempt to
+re-archive a WAL file that was previously archived.  For example, if 

Aggregate leads to superfluous projection from the scan

2022-07-08 Thread Zhihong Yu
Hi,
Here is the query which involves aggregate on a single column:

https://dbfiddle.uk/?rdbms=postgres_13=44bfd8f6b6b5aad34d00d449c04c5a96

As you can see from `Output:`, there are many columns added which are not
needed by the query executor.

I wonder if someone has noticed this in the past.
If so, what was the discussion around this topic ?

Thanks


Re: pg_walcleaner - new tool to detect, archive and delete the unneeded wal files (was Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary)

2022-07-08 Thread Bharath Rupireddy
On Thu, Jun 30, 2022 at 5:33 PM Peter Eisentraut
 wrote:
>
> On 25.04.22 20:39, Stephen Frost wrote:
> > All of which isn't an issue if we don't have an external tool trying to
> > do this and instead have the server doing it as the server knows its
> > internal status, that the archive command has been failing long enough
> > to pass the configuration threshold, and that the WAL isn't needed for
> > crash recovery.  The ability to avoid having to crash and go through
> > that process is pretty valuable.  Still, a crash may still happen and
> > it'd be useful to have a clean way to deal with it.  I'm not really a
> > fan of having to essentially configure this external command as well as
> > have the server configured.  Have we settled that there's no way to make
> > the server archive while there's no space available and before trying to
> > write out more data?
>
> I would also be in favor of not having an external command and instead
> pursue a solution built into the server along the ways you have
> outlined.  Besides the better integration and less potential for misuse
> that can be achieved that way, maintaining a separate tool has some
> constant overhead and if users only use it every ten years on average,
> it seems not worth it.

Thanks for the feedback. My understanding is this: introduce a GUC
(similar to max_slot_wal_keep_size) that, when set, defines a limit beyond
which postgres will not keep WAL files even if archiving is failing - am I
right?

If my understanding is correct, are we going to say that postgres may
not archive "all" the WAL files that may be needed for PITR if the
archiving is failing for long enough? Will this be okay in production
environments?

Regards,
Bharath Rupireddy.




Re: Eliminating SPI from RI triggers - take 2

2022-07-08 Thread Robert Haas
On Fri, Jul 1, 2022 at 2:23 AM Amit Langote  wrote:
> So, I hacked together a patch (attached 0001) that invents an "RI
> plan" construct (struct RIPlan) to replace the use of an "SPI plan"
> (struct _SPI_plan).
>
> With that in place, I decided to rebase my previous patch [1] to use
> this new interface and the result is attached 0002.

I think inventing something like RIPlan is probably reasonable, but
I'm not sure how much it really does to address the objections that
were raised previously. How do we know that ri_LookupKeyInPkRel does
all the same things that executing a plan would have done? I see that
function contains permission-checking logic, for example, as well as
snapshot-related logic, and maybe there are other subsystems to worry
about, like rules or triggers or row-level security. Maybe there's no
answer to that problem other than careful manual verification, because
after all the only way to be 100% certain we're doing all the things
that would happen if you executed a plan is to execute a plan, which
kind of defeats the point of the whole thing. All I'm saying is that
I'm not sure that this refactoring in and of itself addresses that
concern.

As far as 0002 goes, the part I'm most skeptical about is this:

+static bool
+ri_LookupKeyInPkRelPlanIsValid(RI_Plan *plan)
+{
+ /* Never store anything that can be invalidated. */
+ return true;
+}

Isn't that leaving rather a lot on the table? ri_LookupKeyInPkRel is
going to be called a lot of times and do a lot of things over and over
again that maybe only need to be done once, like checking permissions
and looking up the operators to use and reopening the index. And all
the stuff ExecGetLeafPartitionForKey does too, yikes that's a lot of
stuff. Now maybe that's what Tom wants, I don't know. Certainly, the
existing SQL-based implementation is going to do that stuff on every
call, too; I'm just not sure that's a good thing. I think there's some
debate to be had here over what behavior we need to preserve exactly
vs. what we can and should change. For instance, it seems clear to me
that leaving out permissions checks altogether would be not OK, but if
this implementation arranged to cache the results of a permission
check and the SQL-based implementations don't, is that OK? Maybe Tom
would argue that it isn't, because he considers that a part of the
user-visible behavior, but I'm not sure that's the right view of it. I
think what we're promising the user is that we will check permissions,
not that we're going to do it separately for every trigger firing, or
even that every kind of trigger is going to do it exactly the same
number of times as every other trigger. I think we need some input
from Tom (and perhaps others) on how rigidly we need to maintain the
high-level behavior here before we can really say much about whether
the implementation is as good as it can be.

I suspect, though, that there's more that can be done here in terms of
sharing code. For instance, picking on the permissions checking logic,
presumably that's something that every non-SQL implementation would
need to do. But the rest of what's in ri_LookupKeyInPkRel() is
specific to one particular kind of trigger. If we had multiple non-SQL
trigger types, we'd want to somehow have common logic for permissions
checking for all of them.

I also suspect that we ought to have a separation between planning and
execution even for non-SQL based things. You don't really have that
here. What that ought to look like, though, depends on the answers to
the questions above, about how exactly we think we need to reproduce
the existing behavior.

I find my ego slightly wounded by the comment that "the partition
descriptor machinery has a hack that assumes that the queries
originating in this module push the latest snapshot in the
transaction-snapshot mode." It's true that the partition descriptor
machinery gives different answers depending on the active snapshot,
but, err, is that a hack, or just a perfectly reasonable design
decision? An alternative might be for PartitionDirectoryLookup to take
a snapshot as an explicit argument rather than relying on the global
variable to get that information from context. I generally feel that
we rely too much on global variables where we should be passing around
explicit parameters, so if you're just arguing that explicit
parameters would be better here, then I agree and just didn't think of
it. If you're arguing that making the answer depend on the snapshot is
itself a bad idea, I don't agree with that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-07-08 Thread Bharath Rupireddy
On Sat, Apr 9, 2022 at 1:49 AM Nathan Bossart  wrote:
>
> On Wed, Mar 30, 2022 at 09:21:30AM -0700, Nathan Bossart wrote:
> > Here is an updated patch set.
>
> rebased

Thanks.

0001 - there are many places where lstat/stat is being used - don't we
need to replace all or most of them with get_dirent_type?

0002 - I'm not quite happy with this patch: with the change, the
checkpoint errors out if it can't remove just one file - the comment
there says it all. Is there any strong reason for this change?

- /*
- * It's not particularly harmful, though strange, if we can't
- * remove the file here. Don't prevent the checkpoint from
- * completing, that'd be a cure worse than the disease.
- */
  if (unlink(path) < 0)
- {
- ereport(LOG,
+ ereport(ERROR,
  (errcode_for_file_access(),
  errmsg("could not remove file \"%s\": %m",
  path)));
- continue;
- }

Regards,
Bharath Rupireddy.




Re: System catalog documentation chapter

2022-07-08 Thread Bruce Momjian
On Fri, Jul  8, 2022 at 11:49:47AM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > Agreed. I don't want to break links into the documentation in final
> > released versions, so head and PG15 seem wise.
> 
> I would not expect this to change the doc URLs for any individual
> catalogs or views --- if it does, I won't be happy.

Good point --- I thought the chapter was in the URL, but I now see it is
just the section heading:

https://www.postgresql.org/docs/devel/view-pg-available-extensions.html

so I guess we can backpatch this with no issues.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: pg15b2: large objects lost on upgrade

2022-07-08 Thread Justin Pryzby
On Thu, Jul 07, 2022 at 03:11:38PM -0400, Robert Haas wrote:
> point: we assume that nothing significant has happened between when
> the cluster was created and when pg_upgrade is run, but we don't check
> it. Either we shouldn't assume it, or we should check it.
> 
> So, is such activity ever legitimate? I think there are people doing
> it. The motivation is that maybe you have a dump from the old database
> that doesn't quite restore on the new version, but by doing something
> to the new cluster, you can make it restore. For instance, maybe there
> are some functions that used to be part of core and are now only
> available in an extension. That's going to make pg_upgrade's
> dump-and-restore workflow fail, but if you install that extension onto
> the new cluster, perhaps you can work around the problem. It doesn't
> have to be an extension, even. Maybe some function in core just got an
> extra argument, and you're using it, so the calls to that function
> cause dump-and-restore to fail. You might try overloading it in the
> new database with the old calling sequence to get things working.

I don't think that's even possible.

pg_upgrade drops template1 and postgres before upgrading:

 * template1 database will already exist in the target 
installation,
 * so tell pg_restore to drop and recreate it; otherwise we 
would fail
 * to propagate its database-level properties.

 * postgres database will already exist in the target 
installation, so
 * tell pg_restore to drop and recreate it; otherwise we would 
fail to
 * propagate its database-level properties.

For any other DBs, you'd hit an error if, after initdb'ing, you started the new
cluster, connected to it, created a DB (?!) and then tried to upgrade:

pg_restore: error: could not execute query: ERROR:  database "pryzbyj" 
already exists

So if people start, connect, and then futz with a cluster before upgrading it,
it must be for global stuff (roles, tablespaces), and not per-DB stuff.
Also, pg_upgrade refuses to run if additional roles are defined...
So I'm not seeing what someone could do on the new cluster.

That supports the idea that it'd be okay to refuse to upgrade anything other
than a pristine cluster.

> Now, are these kinds of things considered to be supported? Well, I
> don't know that we've made any statement about that one way or the
> other. Perhaps they are not. But I can see why people want to use
> workarounds like this. The alternative is having to dump-and-restore
> instead of an in-place upgrade, and that's painfully slow by
> comparison.

The alternative in cases that I know about is to fix the old DB to allow it to
be upgraded.  check.c has a list of the things that aren't upgradable, and the
fixes are things like ALTER TABLE DROP OIDs.  We just added another one to
handle v14 aggregates (09878cdd4).

> My view on this is that, while we probably don't want to make such
> things officially supported, I don't think we should ban it outright,
> either. We probably can't support an upgrade after the next cluster
> has been subjected to arbitrary amounts of tinkering, but we're making
> a mistake if we write code that has fragile assumptions for no really
> good reason. I think we can do better than this excerpt from your
> patch, for example:
> 
> +   /* Keep track of whether a filenode matches the OID */
> +   if (maps[mapnum].relfilenumber == LargeObjectRelationId)
> +   *has_lotable = true;
> +   if (maps[mapnum].relfilenumber == LargeObjectLOidPNIndexId)
> +   *has_loindex = true;
> 
> I spent a while struggling to understand this because it seems to me
> that every database has an LO table and an LO index, so what's up with
> these names?  I think what these names are really tracking is whether
> the relfilenumber of pg_largeobject and its index in the old database
> had their default values. 

Yes, has_lotable means "has a LO table whose filenode matches the OID".
I will solicit suggestions for a better name.

> But this makes the assumption that the LO
> table and LO index in the new database have never been subjected to
> VACUUM FULL or CLUSTER and, while there's no real reason to do that, I
> can't quite see what the point of such an unnecessary and fragile
> assumption might be.

The idea of running initdb, starting the cluster, and connecting to it to run
VACUUM FULL scares me.  Now that I think about it, it might be almost
inconsequential, since the initial DBs are dropped, and the upgrade will fail
if any non-template DB exists.  But .. maybe something exciting happens if you
vacuum full a shared catalog...  Yup.

With my patch:

./tmp_install/usr/local/pgsql/bin/initdb --no-sync -D pg15b1.dat -k
./tmp_install/usr/local/pgsql/bin/postgres -D pg15b1.dat -c 
logging_collector=no -p 5678 -k /tmp&
postgres=# \lo_import /etc/shells 

Re: System catalog documentation chapter

2022-07-08 Thread Tom Lane
Bruce Momjian  writes:
> Agreed. I don't want to break links into the documentation in final
> released versions, so head and PG15 seem wise.

I would not expect this to change the doc URLs for any individual
catalogs or views --- if it does, I won't be happy.

regards, tom lane




Re: Switching XLog source from archive to streaming when primary available

2022-07-08 Thread Bharath Rupireddy
On Sat, Jun 25, 2022 at 1:31 AM Cary Huang  wrote:
>
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   not tested
> Documentation:not tested
>
> Hello
>
> I tested this patch in a setup where the standby is in the middle of 
> replicating and REDOing primary's WAL files during a very large data 
> insertion. During this time, I keep killing the walreceiver process to cause 
> a stream failure and force standby to read from archive. The system will 
> restore from archive for "wal_retrieve_retry_interval" seconds before it 
> attempts to steam again. Without this patch, once the streaming is 
> interrupted, it keeps reading from archive until standby reaches the same 
> consistent state of primary and then it will switch back to streaming again. 
> So it seems that the patch does the job as described and does bring some 
> benefit during a very large REDO job where it will try to re-stream after 
> restoring some WALs from archive to speed up this "catch up" process. But if 
> the recovery job is not a large one, PG is already switching back to 
> streaming once it hits consistent state.

Thanks a lot Cary for testing the patch.

> Here's a v1 patch that I've come up with. I'm right now using the
> existing GUC wal_retrieve_retry_interval to switch to stream mode from
> archive mode as opposed to switching only after the failure to get WAL
> from archive mode. If okay with the approach, I can add tests, change
> the docs and add a new GUC to control this behaviour. I'm open to
> thoughts and ideas here.

It will be great if I can hear some thoughts on the above points (as
posted upthread).

Regards,
Bharath Rupireddy.




Re: automatically generating node support functions

2022-07-08 Thread Peter Eisentraut

On 08.07.22 15:52, Tom Lane wrote:

I'll re-read the patch today, but how open are you to putting the
struct attributes at the top?  I'm willing to do the legwork.


I agree near the top would be preferable.  I think it would even be 
feasible to parse the whole thing if pgindent split it across lines.  I 
sort of tried to maintain the consistency with C/C++ attributes like 
__attribute__ and [[attribute]], hoping that that would confuse other 
tooling the least.  Feel free to experiment further.





Re: System catalog documentation chapter

2022-07-08 Thread Bruce Momjian
On Fri, Jul  8, 2022 at 09:21:13AM +1000, Peter Smith wrote:
> > My only question is whether we apply this to head, head & PG 15, or all
> > branches?  I think the URLs will change with this adjustment so we might
> > want to do only head & PG 15.
> 
> AFAIK the chapter has been structured like this for many years and
> nobody patched it sooner, so perhaps that is an indication the older
> branches don't really need changing?

Agreed. I don't want to break links into the documentation in final
released versions, so head and PG15 seem wise.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Two successive tabs in test case are causing syntax error in psql

2022-07-08 Thread Jingtang Zhang
Hi, community.

Recently, while developing a feature around INSERT ... ON CONFLICT, I used
the test cases in `src/test/regress/sql/insert_conflict.sql` to evaluate my
work. When I copied one of the CREATE TABLE statements from this file on its
own and pasted it into psql, I got a syntax error. Going through the case
carefully, I found that the CREATE TABLE uses two tabs to separate the column
name from the column type, and these two tabs are interpreted by psql as a
tab-completion request, so there is no longer any separation between the
column name and the column type.

It may not be a real problem, since the case passes the regression tests, but
would it be better to use spaces here to avoid this confusing situation? The
other parts of this file all use spaces.
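
For illustration (demo_tab is just a made-up name, not the table from
insert_conflict.sql), a definition written with spaces pastes cleanly into an
interactive psql session, because no literal tab characters reach the line
editor:

CREATE TABLE demo_tab (
    id  int,   -- columns separated by spaces
    val text
);

When the same statements are read from a file with \i or by the regression
driver, tabs are harmless, since tab completion only applies to interactive
input; that is why the existing test still passes.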

Never mind my opinion as a beginner, thanks.


Jingtang Zhang


0001-Fix-two-tabs-are-regarded-as-auto-completion-in-test.patch
Description: Binary data


Re: Fast COPY FROM based on batch insert

2022-07-08 Thread Andrey Lepikhov

On 8/7/2022 05:12, Ian Barwick wrote:
     ERROR:  bind message supplies 0 parameters, but prepared statement 
"pgsql_fdw_prep_178" requires 6
     CONTEXT:  remote SQL command: INSERT INTO public.foo_part_1(t, v1, 
v2, v3, v4, v5) VALUES ($1, $2, $3, $4, $5, $6)

     COPY foo, line 88160
Thanks, I got it. A MultiInsertBuffer is created on the first non-empty
flush of tuples into the partition and isn't deleted from the buffers
list until the end of COPY, so on a subsequent flush with an empty buffer
we hit the error.
Your fix is correct, but I want to propose a slightly different change
(see the attachment).


--
regards,
Andrey Lepikhov
Postgres Professional

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 245a260982..203289f7f2 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -329,7 +329,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
   
resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize != NULL);
 
/* Flush into foreign table or partition */
-   do {
+   while(sent < nused)
+   {
        int size = (resultRelInfo->ri_BatchSize < nused - sent) ?
                   resultRelInfo->ri_BatchSize : (nused - sent);
        int inserted = size;
@@ -340,7 +341,7 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,

 NULL,

 );
sent += size;
-   } while (sent < nused);
+   }
}
else
{

Re: [PATCH] fix wait_event of pg_stat_activity in case of high amount of connections

2022-07-08 Thread Robert Haas
On Fri, Jul 8, 2022 at 10:11 AM Yura Sokolov  wrote:
> I see an analogy with a bus stop:
> - there is a bus stop
> - there is a schedule of buses arriving at this stop
> - there are passengers who travel on this bus every day
>
> The bus occasionally comes late... Well, it comes late quite often...
>
> Which way should the mayor (or other responsible person) act?

I do not think that is a good analogy, because a bus schedule is an
implicit promise - or at least a strong suggestion - that the bus will
arrive at the scheduled time.

In this case, who made such a promise? The original post presents it
as fact that these systems should give compatible answers at all
times, but there's nothing in the code or documentation to suggest
that this is true.

IMHO, a better analogy would be if you noticed that the 7:03am bus was
normally blue and you took that one because you have a small child who
likes the color blue and it makes them happy to take a blue bus. And
then one day the bus at that time is a red bus and your child is upset
and you call the mayor (or other responsible person) to complain.
They're probably not going to handle that situation by trying to send
a blue bus at 7:03am as often as possible. They're going to tell you
that they only promised you a bus at 7:03am, not what color it would
be.

Perhaps that's not an ideal analogy either, because the reported wait
event and the reported activity are more closely related than the time
of a bus is to the color of the bus. But I think it's still true that
nobody ever promised that those values would be compatible with each
other, and that's not really fixable, and that there are lots of other
cases just like this one which can't be fixed either.

I think that the more we try to pretend like it is possible to make
these values seem like they are synchronized, the more unhappy people
will be in the unavoidable cases where they aren't, and the more
pressure there will be to try to tighten it up even further. That's
likely to result in code that is more complex and slower, which I do
not want, and especially not for the sake of avoiding a harmless
reporting discrepancy.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: ERROR: operator does not exist: json = json

2022-07-08 Thread Andrew Dunstan


On 2022-07-08 Fr 07:57, Dagfinn Ilmari Mannsåker wrote:
> Erik Rijkers  writes:
>
>> Hi,
>>
>> Comparison of 2 values of type jsonb is allowed.
>>
>> Comparison of 2 values of type json gives an error.
>>
>> That seems like an oversight -- or is it deliberate?
> This is because json is just a textual representation, and different
> JSON strings can be semantically equal because e.g. whitespace and
> object key order is not significant.
>
>> Example:
>>
>> select '42'::json = '{}'::json;
>> --> ERROR:  operator does not exist: json = json
>>
>> (of course, easily 'solved' by casting but that's not really the
>> point)
> To do a proper comparison you have to parse it into a semantic form,
> which is what casting to jsonb does.


Alternatively, if you really need something like this, try



(I should probably update it to mark the functions as parallel safe)


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pg15b2: large objects lost on upgrade

2022-07-08 Thread Robert Haas
On Thu, Jul 7, 2022 at 4:16 PM Bruce Momjian  wrote:
> You are right to be concerned since you are spanning number spaces, but
> I think you are fine because the relfilenode in the user-space cannot
> have been used since it already was being used in each database.  It is
> true we never had a per-database rename like this before.

Thanks for checking over the reasoning, and the kind words in general.
I just committed Justin's fix for the bug, without fixing the fact
that the new cluster's original pg_largeobject files will be left
orphaned afterward. That's a relatively minor problem by comparison,
and it seemed best to me not to wait too long to get the main issue
addressed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] fix wait_event of pg_stat_activity in case of high amount of connections

2022-07-08 Thread Yura Sokolov
On Fri, 2022-07-08 at 09:44 -0400, Robert Haas wrote:
> On Thu, Jul 7, 2022 at 10:39 PM Kyotaro Horiguchi
>  wrote:
> > At Thu, 7 Jul 2022 13:58:06 -0500, Justin Pryzby  
> > wrote in
> > > I agree that this is a bug, since it can (and did) cause false positives 
> > > in a
> > > monitoring system.
> > 
> > I'm not this is undoubtfully a bug but agree about the rest.
> 
> I don't agree that this is a bug, and even if it were, I don't think
> this patch can fix it.
> 
> Let's start with the second point first: pgstat_report_wait_start()
> and pgstat_report_wait_end() change the advertised wait event for a
> process, while the backend state is changed by
> pgstat_report_activity(). Since those function calls are in different
> places, those changes are bound to happen at different times, and
> therefore you can observe drift between the two values. Now perhaps
> there are some one-directional guarantees: I think we probably always
> set the state to idle before we start reading from the client, and
> always finish reading from the client before the state ceases to be
> idle. But I don't really see how that helps anything, because when you
> read those values, you must read one and then the other. If you read
> the activity before the wait event, you might see the state before it
> goes idle and then the wait event after it's reached ClientRead. If
> you read the wait event before the activity, you might see the wait
> event as ClientRead, and then by the time you check the activity the
> backend might have gotten some data from the client and no longer be
> idle. The very best a patch like this can hope to do is narrow the
> race condition enough that the discrepancies are observed less
> frequently in practice.
> 
> And that's why I think this is not a bug fix, or even a good idea.
> It's just encouraging people to rely on something which can never be
> fully reliable in the way that the original poster is hoping. There
> was never any intention of having wait events synchronized with the
> pgstat_report_activity() stuff, and I think that's perfectly fine.
> Both systems are trying to provide visibility into states that can
> change very quickly, and therefore they need to be low-overhead, and
> therefore they use very lightweight synchronization, which means that
> ephemeral discrepancies are possible by nature. There are plenty of
> other examples of that as well. You can't for example query pg_locks
> and pg_stat_activity in the same query and expect that all and only
> those backends that are apparently waiting for a lock in
> pg_stat_activity will have an ungranted lock in pg_locks. It just
> doesn't work like that, and there's a very good reason for that:
> trying to make all of these introspection facilities behave in
> MVCC-like ways would be painful to code and probably end up slowing
> the system down substantially.
> 
> I think the right fix here is to change nothing in the code, and stop
> expecting these things to be perfectly consistent with each other.

I see an analogy with a bus stop:
- there is a bus stop
- there is a schedule of buses arriving at this stop
- there are passengers who travel on this bus every day

The bus occasionally comes late... Well, it comes late quite often...

Which way should the mayor (or other responsible person) act?

First possibility: do their best to make the bus run on schedule. And
although there will be no 100% guarantee, reliability will rise from 90%
to 99%.

Second possibility: tell the passengers "you should not rely on the bus
schedule, and we will not do anything to make it more reliable".

If I were a passenger, I'd prefer the first choice.


regards

Yura





Re: Add function to return backup_label and tablespace_map

2022-07-08 Thread David Steele

On 7/8/22 09:09, David Steele wrote:

On 7/8/22 08:22, Julien Rouhaud wrote:

On Fri, Jul 8, 2022 at 7:42 PM David Steele  wrote:


On 7/7/22 12:43, Fujii Masao wrote:

Since an exclusive backup method was dropped in v15, in v15 or 
later, we

need to create backup_label and tablespace_map files from the result of
pg_backup_stop() when taking a base backup using low level backup API.
One issue when doing this is that there is no simple way to create
those files from two columns "labelfile" and "spcmapfile" that
pg_backup_stop() returns if we execute it via psql. Probably we need to
store those columns in a temporary file and run some OS commands or
script to separate that file into backup_label and tablespace_map.


Why not just select these columns into a temp table:

create temp table backup_result as select * from pg_backup_stop(...);

Then they can be easily dumped with \o by selecting from the temp table.


That wouldn't help people making backups from standby servers.


Ah, yes, good point. This should work on a standby, though:

select quote_literal(labelfile) as backup_label from pg_backup_stop(...) 
\gset

\pset tuples_only on
\pset format unaligned
\o /backup_path/backup_label
select :backup_label;


Looks like I made that more complicated than it needed to be:

select * from pg_backup_stop(...) \gset
\pset tuples_only on
\pset format unaligned
\o /backup_path/backup_label
select :'labelfile';

Regards,
-David




Re: Add function to return backup_label and tablespace_map

2022-07-08 Thread David Steele




On 7/8/22 09:10, Christoph Berg wrote:

Re: David Steele

To enable us to do that more easily, how about adding the
pg_backup_label() function that returns backup_label and tablespace_map?
I'm thinking to make this function available just after
pg_backup_start() finishes


I was just wondering: Why is "labelfile" only returned by
pg_backup_stop()? All the info in there is already available at
pg_backup_start() time. 


Not sure exactly why this decision was made in 9.6 (might be because 
tablespace_map does need to be generated at stop time), but I'm planning 
to add data to this file in PG16 that is only available at stop time. In 
particular, backup software would like to know the earliest possible 
time that can be used for PITR and right now this needs to be 
approximated. Would be good to have that in backup_label along with 
start time. Min recovery xid would also be very useful.



Having the output available earlier would
allow writing the backup_label into the backup directory, or store it
along some filesystem snapshot that is already immutable by the time
pg_backup_stop is called.


What is precluded by getting the backup label after pg_backup_stop()? 
Perhaps a more detailed example here would be helpful.



If we rename all functions anyway for PG15, we could move the info
from stop to start.


This makes me nervous as I'm sure users will immediately start writing
backup_label into PGDATA to make their lives easier. Having backup_label in
PGDATA for a running cluster causes problems and is the major reason we
deprecated and then removed the exclusive method. In addition, what little
protection we had from this condition has been removed.


Is that really an argument for making the life of everyone else
harder?


I don't see how anyone's life is made harder unless the plan is to write 
backup_label into PGDATA, which should not be done.


As we've noted before, there's no point in pretending that doing backup 
correctly is easy because it is definitely not.


Regards,
-David




Re: automatically generating node support functions

2022-07-08 Thread Tom Lane
Peter Eisentraut  writes:
> On 06.07.22 22:46, Tom Lane wrote:
>> ...  There is one nasty problem
>> we need a solution to, which is that pgindent is not at all on board
>> with this idea of attaching node attrs to typedefs.

> I have found that putting the attributes at the end of the struct 
> definition, right before the semicolon, works, so I have changed it that 
> way.  (This is also where a gcc __attribute__() would go, so it seems 
> reasonable.)

That was the first solution I thought of as well, but I do not like
it from a cosmetic standpoint.  The node attributes are a pretty
critical part of the node definition (especially "abstract"),
so shoving them to the very end is not helpful for readability.
IMO anyway.

> I think for this present patch, I would do a catversion bump, just to be 
> sure, in case some of the printed node fields are different now.

I know from comparing the code that some printed node tags have
changed, and so has the print order of some fields.  It might be
that none of those changes are in node types that can appear in
stored rules --- but I'm not sure, so I concur that doing a
catversion bump for this commit is advisable.

> It was also my plan to remove the #ifdef OBSOLETE sections in a separate 
> commit right after, just to be clear.

Yup, my thought as well.  There are a few other mop-up things
I want to do shortly after (e.g. add copyright-notice headers
to the emitted files), but let's wait for the buildfarm's
opinion of the main commit first.

> Final thoughts?

I'll re-read the patch today, but how open are you to putting the
struct attributes at the top?  I'm willing to do the legwork.

regards, tom lane




Re: [PATCH] fix wait_event of pg_stat_activity in case of high amount of connections

2022-07-08 Thread Robert Haas
On Thu, Jul 7, 2022 at 10:39 PM Kyotaro Horiguchi
 wrote:
> At Thu, 7 Jul 2022 13:58:06 -0500, Justin Pryzby  wrote 
> in
> > I agree that this is a bug, since it can (and did) cause false positives in 
> > a
> > monitoring system.
>
> I'm not this is undoubtfully a bug but agree about the rest.

I don't agree that this is a bug, and even if it were, I don't think
this patch can fix it.

Let's start with the second point first: pgstat_report_wait_start()
and pgstat_report_wait_end() change the advertised wait event for a
process, while the backend state is changed by
pgstat_report_activity(). Since those function calls are in different
places, those changes are bound to happen at different times, and
therefore you can observe drift between the two values. Now perhaps
there are some one-directional guarantees: I think we probably always
set the state to idle before we start reading from the client, and
always finish reading from the client before the state ceases to be
idle. But I don't really see how that helps anything, because when you
read those values, you must read one and then the other. If you read
the activity before the wait event, you might see the state before it
goes idle and then the wait event after it's reached ClientRead. If
you read the wait event before the activity, you might see the wait
event as ClientRead, and then by the time you check the activity the
backend might have gotten some data from the client and no longer be
idle. The very best a patch like this can hope to do is narrow the
race condition enough that the discrepancies are observed less
frequently in practice.

And that's why I think this is not a bug fix, or even a good idea.
It's just encouraging people to rely on something which can never be
fully reliable in the way that the original poster is hoping. There
was never any intention of having wait events synchronized with the
pgstat_report_activity() stuff, and I think that's perfectly fine.
Both systems are trying to provide visibility into states that can
change very quickly, and therefore they need to be low-overhead, and
therefore they use very lightweight synchronization, which means that
ephemeral discrepancies are possible by nature. There are plenty of
other examples of that as well. You can't for example query pg_locks
and pg_stat_activity in the same query and expect that all and only
those backends that are apparently waiting for a lock in
pg_stat_activity will have an ungranted lock in pg_locks. It just
doesn't work like that, and there's a very good reason for that:
trying to make all of these introspection facilities behave in
MVCC-like ways would be painful to code and probably end up slowing
the system down substantially.
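
To make that concrete, here is the kind of cross-view query I mean
(illustrative only, using the standard columns of those views). Because the
two views are sampled independently, a backend can show wait_event_type =
'Lock' here while its ungranted lock row has already vanished from pg_locks,
or the other way around:

SELECT a.pid, a.wait_event_type, a.wait_event, l.locktype, l.granted
FROM pg_stat_activity a
LEFT JOIN pg_locks l ON l.pid = a.pid AND NOT l.granted
WHERE a.wait_event_type = 'Lock';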

I think the right fix here is to change nothing in the code, and stop
expecting these things to be perfectly consistent with each other.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Patch proposal: New hooks in the connection path

2022-07-08 Thread Tom Lane
Bharath Rupireddy  writes:
> On Fri, Jul 8, 2022 at 1:40 AM Tom Lane  wrote:
>> It doesn't seem like a great place for a hook, because the list of stuff
>> you could safely do there would be mighty short, possibly the empty set.

> I agree with this. But, all of the areas that v2-0003 touched for
> connectivity failures, they typically are emitting
> ereport(FATAL,/ereport(COMMERROR, (in ProcessStartupPacket) and we
> have emit_log_hook already being exposed and the implementers can,
> literally, do anything the hook.

This is utterly off-point, because those calls are not inside
signal handlers.

regards, tom lane




Re: Add function to return backup_label and tablespace_map

2022-07-08 Thread Christoph Berg
Re: David Steele
> > To enable us to do that more easily, how about adding the
> > pg_backup_label() function that returns backup_label and tablespace_map?
> > I'm thinking to make this function available just after
> > pg_backup_start() finishes

I was just wondering: Why is "labelfile" only returned by
pg_backup_stop()? All the info in there is already available at
pg_backup_start() time. Having the output available earlier would
allow writing the backup_label into the backup directory, or store it
along some filesystem snapshot that is already immutable by the time
pg_backup_stop is called.

If we rename all functions anyway for PG15, we could move the info
from stop to start.

> This makes me nervous as I'm sure users will immediately start writing
> backup_label into PGDATA to make their lives easier. Having backup_label in
> PGDATA for a running cluster causes problems and is the major reason we
> deprecated and then removed the exclusive method. In addition, what little
> protection we had from this condition has been removed.

Is that really an argument for making the life of everyone else
harder?

Christoph




Re: Add function to return backup_label and tablespace_map

2022-07-08 Thread David Steele

On 7/8/22 08:22, Julien Rouhaud wrote:

On Fri, Jul 8, 2022 at 7:42 PM David Steele  wrote:


On 7/7/22 12:43, Fujii Masao wrote:


Since an exclusive backup method was dropped in v15, in v15 or later, we
need to create backup_label and tablespace_map files from the result of
pg_backup_stop() when taking a base backup using low level backup API.
One issue when doing this is that there is no simple way to create
those files from two columns "labelfile" and "spcmapfile" that
pg_backup_stop() returns if we execute it via psql. Probably we need to
store those columns in a temporary file and run some OS commands or
script to separate that file into backup_label and tablespace_map.


Why not just select these columns into a temp table:

create temp table backup_result as select * from pg_backup_stop(...);

Then they can be easily dumped with \o by selecting from the temp table.


That wouldn't help people making backups from standby servers.


Ah, yes, good point. This should work on a standby, though:

select quote_literal(labelfile) as backup_label from pg_backup_stop(...) 
\gset

\pset tuples_only on
\pset format unaligned
\o /backup_path/backup_label
select :backup_label;

Regards,
-David




Re: Backup command and functions can cause assertion failure and segmentation fault

2022-07-08 Thread Robert Haas
On Thu, Jul 7, 2022 at 10:58 AM Fujii Masao  wrote:
> But if many think that it's worth adding the test, I will give it a try. But
> even in that case, I think it's better to commit the proposed patch first to
> fix the bug, and then to write the patch adding the test.

I don't think that we necessarily need to have a test for this patch.
It's true that we don't really have good test coverage of write-ahead
logging and recovery, but this doesn't seem like the most important
thing to be testing in that area, either, and developing stable tests
for stuff like this can be a lot of work.

I do kind of feel like the patch is fixing two separate bugs. The
change to SendBaseBackup() is fixing the problem that, because there's
SQL access on replication connections, we could try to start a backup
in the middle of another backup by mixing and matching the two
different methods of doing backups. The change to do_pg_abort_backup()
is fixing the fact that, after aborting a base backup, we don't reset
the session state properly so that another backup can be tried
afterwards.

I don't know if it's worth committing them separately - they are very
small fixes. But it would probably at least be good to highlight in
the commit message that there are two different issues.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Patch proposal: New hooks in the connection path

2022-07-08 Thread Bharath Rupireddy
On Fri, Jul 8, 2022 at 1:40 AM Tom Lane  wrote:
>
> Joe Conway  writes:
> > It isn't clear to me if having a hook in the timeout handler is a
> > nonstarter -- perhaps a comment with suitable warning for prospective
> > extension authors is enough? Anyone else want to weigh in on this issue
> > specifically?
>
> It doesn't seem like a great place for a hook, because the list of stuff
> you could safely do there would be mighty short, possibly the empty set.
> Write to shared memory?  Not too safe.  Write to a file?  Even less.
> Write to local memory?  Pointless, because we're about to _exit(1).
> Pretty much anything I can think of that you'd want to do is something
> we've already decided the core code can't safely do, and putting it
> in a hook won't make it safer.

I agree with this. But, all of the areas that v2-0003 touched for
connectivity failures, they typically are emitting
ereport(FATAL,/ereport(COMMERROR, (in ProcessStartupPacket) and we
have emit_log_hook already being exposed and the implementers can,
literally, do anything the hook.

Looking at v2-0003 patch and emit_log_hook, how about we filter out
for those connectivity errors either based on error codes and if they
aren't unique, perhaps passing special flags to ereport API indicating
that it's a connectivity error and in the emit_log_hook we can look
for those connectivity error codes or flags to collect the stats about
the failure connections (with MyProcPort being present in
emit_log_hook)? This way, we don't need a new hook. Thoughts?

Regards,
Bharath Rupireddy.




Re: Add function to return backup_label and tablespace_map

2022-07-08 Thread Julien Rouhaud
On Fri, Jul 8, 2022 at 7:42 PM David Steele  wrote:
>
> On 7/7/22 12:43, Fujii Masao wrote:
>
> > Since an exclusive backup method was dropped in v15, in v15 or later, we
> > need to create backup_label and tablespace_map files from the result of
> > pg_backup_stop() when taking a base backup using low level backup API.
> > One issue when doing this is that there is no simple way to create
> > those files from two columns "labelfile" and "spcmapfile" that
> > pg_backup_stop() returns if we execute it via psql. Probably we need to
> > store those columns in a temporary file and run some OS commands or
> > script to separate that file into backup_label and tablespace_map.
>
> Why not just select these columns into a temp table:
>
> create temp table backup_result as select * from pg_backup_stop(...);
>
> Then they can be easily dumped with \o by selecting from the temp table.

That wouldn't help people making backups from standby servers.




Re: remove more archiving overhead

2022-07-08 Thread David Steele

On 7/7/22 21:56, Kyotaro Horiguchi wrote:

At Thu, 7 Jul 2022 15:07:16 -0700, Nathan Bossart  
wrote in

Here's an updated patch.


Thinking RFC'ish, the meaning of "may" and "must" is significant in
this description.  On the other hand it uses both "may" and "can", but
I think that their difference is not significant, or the "can" there is
somewhat confusing.  I think the "can" should be "may" here.


+1.


And since "must" is emphasized, doesn't "may" also needs emphasis?


I think emphasis only on must is fine.

Nathan, I don't see the language about being sure to persist to storage 
here?


Regards,
-David




Re: Add function to return backup_label and tablespace_map

2022-07-08 Thread David Steele

On 7/8/22 07:53, Bharath Rupireddy wrote:

On Fri, Jul 8, 2022 at 5:12 PM David Steele  wrote:



To enable us to do that more easily, how about adding the
pg_backup_label() function that returns backup_label and tablespace_map?
I'm thinking to make this function available just after
pg_backup_start() finishes


This makes me nervous as I'm sure users will immediately start writing
backup_label into PGDATA to make their lives easier. Having backup_label
in PGDATA for a running cluster causes problems and is the major reason
we deprecated and then removed the exclusive method. In addition, what
little protection we had from this condition has been removed.


IIUC, with the new mechanism, we don't need a backup_label file to be
present in the data directory after pg_backup_stop? If yes, where will
the postgres recover from if it crashes after pg_backup_stop before
the next checkpoint? I'm trying to understand the significance of the
backup_label and tablespace_map contents after the removal of
exclusive backup.


backup_label should be written directly into the backup and should be 
present when the backup is restored and before recovery begins. It 
should not be present in a normally operating cluster or it will cause 
problems after crashes and restarts.



Also, do we need the read_backup_label part of the code [1]?


Yes, since the backup_label is required for recovery.

Regards,
-David




Re: ERROR: operator does not exist: json = json

2022-07-08 Thread Dagfinn Ilmari Mannsåker
Erik Rijkers  writes:

> Hi,
>
> Comparison of 2 values of type jsonb is allowed.
>
> Comparison of 2 values of type json gives an error.
>
> That seems like an oversight -- or is it deliberate?

This is because json is just a textual representation, and different
JSON strings can be semantically equal because e.g. whitespace and
object key order is not significant.

> Example:
>
> select '42'::json = '{}'::json;
> --> ERROR:  operator does not exist: json = json
>
> (of course, easily 'solved' by casting but that's not really the
> point)

To do a proper comparison you have to parse it into a semantic form,
which is what casting to jsonb does.
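
For instance (made-up values), after the cast both sides are in a parsed,
semantic form, so whitespace and key order no longer matter and the
comparison works:

select '{"a": 1, "b": 2}'::jsonb = '{"b":2, "a":1}'::jsonb;  -- true
select '42'::jsonb = '{}'::jsonb;                            -- false, no error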

> Thanks,
>
> Erik Rijkers

- ilmari




Re: Add function to return backup_label and tablespace_map

2022-07-08 Thread Bharath Rupireddy
On Fri, Jul 8, 2022 at 5:12 PM David Steele  wrote:
>
> > To enable us to do that more easily, how about adding the
> > pg_backup_label() function that returns backup_label and tablespace_map?
> > I'm thinking to make this function available just after
> > pg_backup_start() finishes
>
> This makes me nervous as I'm sure users will immediately start writing
> backup_label into PGDATA to make their lives easier. Having backup_label
> in PGDATA for a running cluster causes problems and is the major reason
> we deprecated and then removed the exclusive method. In addition, what
> little protection we had from this condition has been removed.

IIUC, with the new mechanism, we don't need a backup_label file to be
present in the data directory after pg_backup_stop? If yes, where will
the postgres recover from if it crashes after pg_backup_stop before
the next checkpoint? I'm trying to understand the significance of the
backup_label and tablespace_map contents after the removal of
exclusive backup.

Also, do we need the read_backup_label part of the code [1]?


[1]
if (read_backup_label(, , ,
  ))
{
List   *tablespaces = NIL;

/*
 * Archive recovery was requested, and thanks to the backup label
 * file, we know how far we need to replay to reach consistency. Enter
 * archive recovery directly.
 */
InArchiveRecovery = true;
if (StandbyModeRequested)
StandbyMode = true;

/*
 * When a backup_label file is present, we want to roll forward from
 * the checkpoint it identifies, rather than using pg_control.
 */

Regards,
Bharath Rupireddy.




Re: Add function to return backup_label and tablespace_map

2022-07-08 Thread David Steele

On 7/7/22 12:43, Fujii Masao wrote:

Since an exclusive backup method was dropped in v15, in v15 or later, we 
need to create backup_label and tablespace_map files from the result of 
pg_backup_stop() when taking a base backup using low level backup API. 
One issue when doing this is that there is no simple way to create
those files from two columns "labelfile" and "spcmapfile" that
pg_backup_stop() returns if we execute it via psql. Probably we need to
store those columns in a temporary file and run some OS commands or 
script to separate that file into backup_label and tablespace_map. 


Why not just select these columns into a temp table:

create temp table backup_result as select * from pg_backup_stop(...);

Then they can be easily dumped with \o by selecting from the temp table.

To enable us to do that more easily, how about adding the 
pg_backup_label() function that returns backup_label and tablespace_map? 
I'm thinking to make this function available just after 
pg_backup_start() finishes


This makes me nervous as I'm sure users will immediately start writing 
backup_label into PGDATA to make their lives easier. Having backup_label 
in PGDATA for a running cluster causes problems and is the major reason 
we deprecated and then removed the exclusive method. In addition, what 
little protection we had from this condition has been removed.


Regards,
-David




Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-07-08 Thread Masahiko Sawada
On Fri, Jul 8, 2022 at 5:59 PM Amit Kapila  wrote:
>
> On Fri, Jul 8, 2022 at 12:46 PM Masahiko Sawada  wrote:
> >
> > On Fri, Jul 8, 2022 at 3:27 PM Amit Kapila  wrote:
> > >
> >
> > > 1.
> > > In ReorderBufferGetCatalogChangesXacts(), isn't it better to use the
> > > list length of 'catchange_txns' to allocate xids array? If we can do
> > > so, then we will save the need to repalloc as well.
> >
> > Since ReorderBufferGetCatalogChangesXacts() collects all ongoing
> > catalog modifying transactions, the length of the array could be
> > bigger than the one taken last time. We can start with the previous
> > length but I think we cannot remove the need for repalloc.
> >
>
> It is using the list "catchange_txns" to form xid array which
> shouldn't change for the duration of
> ReorderBufferGetCatalogChangesXacts(). Then the caller frees the xid
> array after its use. Next time in
> ReorderBufferGetCatalogChangesXacts(), the fresh allocation for xid
> array happens, so not sure why repalloc would be required?

Oops, I mistook catchange_txns for catchange->xcnt. You're right.
Starting with the length of catchange_txns should be sufficient.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Handle infinite recursion in logical replication setup

2022-07-08 Thread Amit Kapila
On Tue, Jul 5, 2022 at 9:33 PM vignesh C  wrote:
>
> Since the existing test is already handling the verification of this
> scenario, I felt no need to add the test. Updated v29 patch removes
> the 0001 patch which had the test case.
>

I am not able to apply 0001.
patching file src/bin/psql/tab-complete.c
Hunk #1 FAILED at 1873.
Hunk #2 FAILED at 3152.

Few comments on 0002
=
1.
+   for interaction

Is there a need for space before / in above? If not, please remove it
with similar extra space from other similar usages.

2.
+ 
+  There is some interaction between the origin
+  parameter and the copy_data parameter. Refer to
+  the CREATE SUBSCRIPTION
+   for interaction
+  details and usage of force for
+  copy_data parameter.
  

I think this is a bit long. We can try to reduce this by something like:
Refer  for the usage of
force option and its interaction with the
origin parameter.

Also, adopt the same other places if you agree with the above change.

4.
@@ -601,16 +549,28 @@ GetSubscriptionNotReadyRelations(Oid subid)
  SysScanDesc scan;

  rel = table_open(SubscriptionRelRelationId, AccessShareLock);
-
  ScanKeyInit([nkeys++],
  Anum_pg_subscription_rel_srsubid,

Spurious line removal.

5.
+static void
+check_pub_table_subscribed(WalReceiverConn *wrconn, List *publications,
+CopyData copydata, char *origin, Oid subid)
{
...
...
+ /*
+ * Get the ready relations for the subscription. The subid will be valid
+ * only for ALTER SUBSCRIPTION ... REFRESH because there will be no
+ * relations in ready state while the subscription is created.
+ */
+ if (OidIsValid(subid))
+ subreadyrels = GetSubscriptionRelations(subid, READY_STATE);

Why do we want to consider only READY_STATE relations here? If you see
its caller AlterSubscription_refresh(), we don't consider copying the
relation if it exists on subscribers in any state. If my observation
is correct then you probably don't need to introduce SubRelStateType.

-- 
With Regards,
Amit Kapila.




Re: Support for grabbing multiple consecutive values with nextval()

2022-07-08 Thread Michael Paquier
On Thu, Mar 03, 2022 at 10:21:05AM +0100, Jille Timmermans wrote:
> I'm using https://pkg.go.dev/github.com/jackc/pgx/v4#Conn.CopyFrom, which
> uses the COPY FROM protocol but doesn't actually have to originate from a
> file.

It is Friday here, so I would easily miss something..  It is possible
to use COPY FROM with a list of columns, so assuming that you could
use a default expression with nextval() or just a SERIAL column not
listed in the COPY FROM query to do the job, what do we gain with this
feature?  In which aspect does the preallocation of a range handled
on the client side after being allocated in the backend make things
better?
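
For example, something like this (the table and data are made up) already
lets the server assign the values row by row, with no range handed back to
the client:

CREATE TABLE items (
    id   bigserial,
    name text
);

-- id is not in the column list, so it is filled from the sequence default.
COPY items (name) FROM STDIN;
widget
gadget
\.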
--
Michael


signature.asc
Description: PGP signature


Re: Implementing Incremental View Maintenance

2022-07-08 Thread Yugo NAGATA
Hi huyajun, 

Thank you for your comments!

On Wed, 29 Jun 2022 17:56:39 +0800
huyajun  wrote:


> Hi, Nagata-san
> I read the v27 version of your patch and have some new comments that I want
> to discuss with you.
> 
> 1. How about using DEPENDENCY_INTERNAL instead of DEPENDENCY_AUTO
>   when recording the dependency on the triggers created by IVM? (The
> related code is at the end of CreateIvmTrigger.)
> Otherwise, a user can use SQL to drop a trigger and corrupt IVM;
> DEPENDENCY_INTERNAL is also semantically more correct.
> Crash case like:
>create table t( a int);
>create incremental  materialized view s as select * from t;
>drop trigger "IVM_trigger_";
>Insert into t values(1);

We use DEPENDENCY_AUTO because we want to delete the triggers when
REFRESH ... WITH NO DATA is performed on the materialized view in order
to disable IVM. Triggers created with DEPENDENCY_INTERNAL cannot be dropped.
Such triggers are re-created when REFRESH ... [WITH DATA] is performed.

We can use DEPENDENCY_INTERNAL if we disable/enable such triggers instead of
dropping/re-creating them, although users also can disable triggers using
ALTER TRIGGER.

> 2. In get_matching_condition_string, considering NULL values, we cannot use
> the simple = operator.
> But how about 'record = record'? record_eq treats NULL = NULL as equal, and
> it should be faster than the current implementation since there is only one
> comparison.
> Below is my simple implementation of this; variables are named arbitrarily.
> I test some cases it’s ok
> 
> static char *
> get_matching_condition_string(List *keys)
> {
>     StringInfoData match_cond;
>     ListCell   *lc;
>
>     /* If there is no key columns, the condition is always true. */
>     if (keys == NIL)
>         return "true";
>     else
>     {
>         StringInfoData s1;
>         StringInfoData s2;
>         initStringInfo(&match_cond);
>         initStringInfo(&s1);
>         initStringInfo(&s2);
>         /* Considering NULL values, we can not use simple = operator. */
>         appendStringInfo(&s1, "ROW(");
>         appendStringInfo(&s2, "ROW(");
>         foreach (lc, keys)
>         {
>             Form_pg_attribute attr = (Form_pg_attribute) lfirst(lc);
>             char       *resname = NameStr(attr->attname);
>             char       *mv_resname = quote_qualified_identifier("mv", resname);
>             char       *diff_resname = quote_qualified_identifier("diff", resname);
>
>             appendStringInfo(&s1, "%s", mv_resname);
>             appendStringInfo(&s2, "%s", diff_resname);
>
>             if (lnext(lc))
>             {
>                 appendStringInfo(&s1, ", ");
>                 appendStringInfo(&s2, ", ");
>             }
>         }
>         appendStringInfo(&s1, ")::record");
>         appendStringInfo(&s2, ")::record");
>         appendStringInfo(&match_cond, "%s operator(pg_catalog.=) %s",
>                          s1.data, s2.data);
>         return match_cond.data;
>     }
> }

As you say, we don't have to use IS NULL if we use ROW(...)::record, but we
cannot use an index in this case, which makes IVM inefficient. As shown
below (#5), an index works even when we use simple = operations together
with "IS NULL" operations.
 
> 3. Consider truncating base tables: IVM will not refresh; maybe raising an
> error would be better

I fixed to support TRUNCATE on base tables in our repository.
https://github.com/sraoss/pgsql-ivm/commit/a1365ed69f34e1adbd160f2ce8fd1e80e032392f

When a base table is truncated, the view content will be empty if the
view definition query does not contain an aggregate without a GROUP clause.
Therefore, such views can be truncated. 

Aggregate views without a GROUP clause always have one row. Therefore,
if a base table is truncated, the view will not be empty and will contain
a row with NULL value (or 0 for count()). So, in this case, we refresh the
view instead of truncating it.
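
The one-row behaviour referred to here can be seen with plain SQL (base_t is
a made-up empty table):

CREATE TABLE base_t (v int);
SELECT count(v) AS cnt, sum(v) AS total FROM base_t;
-- returns exactly one row: cnt = 0, total = NULL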

The next version of the patch-set will include this change. 
 
> 4. In IVM_immediate_before, I know locking the base table with ExclusiveLock
>   is for handling concurrent updates to the IVM correctly, but how about
>   locking it only when we actually need to maintain the MV, i.e. in
>   IVM_immediate_maintenance?
> In this way you don't have to lock multiple times.

Yes, as you say, we don't have to lock the view multiple times.
I'll investigate better locking ways including the way that you suggest.
 
> 5. Why do we need CreateIndexOnIMMV? Is it an optimization?
> It seems like when maintaining the MV, the index may not be used, because
> our match conditions can't use the simple = operator.

No, the index works even when we use simple = operator together with "IS NULL".
For example:

postgres=# \d mv
   Materialized view "public.mv"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 id | integer |   |  | 
 v1 | integer |   |  | 
 v2 | integer |   |  | 
Indexes:
"mv_index" UNIQUE, btree (id) NULLS NOT DISTINCT

postgres=# EXPLAIN ANALYZE
 WITH 

Re: Add function to return backup_label and tablespace_map

2022-07-08 Thread Bharath Rupireddy
On Fri, Jul 8, 2022 at 3:31 PM Bharath Rupireddy
 wrote:
>
> On Thu, Jul 7, 2022 at 10:14 PM Fujii Masao  
> wrote:
> >
> > Hi,
> >
> > Since an exclusive backup method was dropped in v15, in v15 or later, we 
> > need to create backup_label and tablespace_map files from the result of 
> > pg_backup_stop() when taking a base backup using low level backup API. One 
> > issue when doing this is that there is no simple way to create those files
> > from two columns "labelfile" and "spcmapfile" that pg_backup_stop() returns
> > if we execute it via psql. Probably we need to store those columns in a
> > temporary file and run some OS commands or script to separate that file
> > into backup_label and tablespace_map. This is not a simple way, and it
> > would prevent users from migrating their backup scripts using psql from the
> > exclusive backup method to the non-exclusive one, I'm afraid.
> >
> > To enable us to do that more easily, how about adding the pg_backup_label() 
> > function that returns backup_label and tablespace_map? I'm thinking to make 
> > this function available just after pg_backup_start() finishes, also even 
> > after pg_backup_stop() finishes. For example, this function allows us to 
> > take a backup using the following psql script file.
> >
> > --
> > SELECT * FROM pg_backup_start('test');
> > \! cp -a $PGDATA /backup
> > SELECT * FROM pg_backup_stop();
> >
> > \pset tuples_only on
> > \pset format unaligned
> >
> > \o /backup/data/backup_label
> > SELECT labelfile FROM pg_backup_label();
> >
> > \o /backup/data/tablespace_map
> > SELECT spcmapfile FROM pg_backup_label();
> > --
> >
> > Attached is the WIP patch to add pg_backup_label function. No tests nor 
> > docs have been added yet, but if we can successfully reach the consensus 
> > for adding the function, I will update the patch.
> >
> > Thought?
>
> +1 for making it easy for the user to create backup_label and
> tablespace_map files. With the patch, the label_file and
> tblspc_map_file contents are preserved until the lifecycle of the
> session or the next run of pg_backup_start, I'm not sure if we need to
> worry more about it.
>
> Why can't we have functions like pg_create_backup_label() and
> pg_create_tablespace_map() which create the 'backup_label' and
> 'tablespace_map' files respectively in the data directory and also
> return the contents as output columns?
>
> Also, we can let users run these create functions only once (perhaps
> after the pg_backup_stop is called which is when the contents will be
> consistent). If we allow these functions to read the label_file or
> tblspc_map_file contents during the backup before stop backup, they
> may not be consistent. We can have a new sessionBackupState, something
> like SESSION_BACKUP_READY_TO_COLLECT_INFO or SESSION_BACKUP_DONE (and
> after the new function calls, sessionBackupState goes to
> SESSION_BACKUP_NONE), and the contents of label_file and
> tblspc_map_file are freed up.
>
> In the docs, it's good if we can clearly specify the steps to use all
> of these functions.

Forgot to mention a comment on the v1 patch: we'll need to revoke
permissions from public for pg_backup_label (or whatever new
function(s) end up being introduced) as well, similar to pg_backup_start
and pg_backup_stop.
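
That is, something along these lines in the catalog scripts (assuming the
proposed function keeps the pg_backup_label() name and takes no arguments):

REVOKE EXECUTE ON FUNCTION pg_backup_label() FROM public;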

Regards,
Bharath Rupireddy.




Re: Add function to return backup_label and tablespace_map

2022-07-08 Thread Bharath Rupireddy
On Thu, Jul 7, 2022 at 10:14 PM Fujii Masao  wrote:
>
> Hi,
>
> Since an exclusive backup method was dropped in v15, in v15 or later, we need 
> to create backup_label and tablespace_map files from the result of 
> pg_backup_stop() when taking a base backup using low level backup API. One 
> issue when doing this is that there is no simple way to create those files
> from two columns "labelfile" and "spcmapfile" that pg_backup_stop() returns
> if we execute it via psql. Probably we need to store those columns in a
> temporary file and run some OS commands or script to separate that file into
> backup_label and tablespace_map. This is not a simple way, and it would
> prevent users from migrating their backup scripts using psql from the
> exclusive backup method to the non-exclusive one, I'm afraid.
>
> To enable us to do that more easily, how about adding the pg_backup_label() 
> function that returns backup_label and tablespace_map? I'm thinking to make 
> this function available just after pg_backup_start() finishes, also even 
> after pg_backup_stop() finishes. For example, this function allows us to take 
> a backup using the following psql script file.
>
> --
> SELECT * FROM pg_backup_start('test');
> \! cp -a $PGDATA /backup
> SELECT * FROM pg_backup_stop();
>
> \pset tuples_only on
> \pset format unaligned
>
> \o /backup/data/backup_label
> SELECT labelfile FROM pg_backup_label();
>
> \o /backup/data/tablespace_map
> SELECT spcmapfile FROM pg_backup_label();
> --
>
> Attached is the WIP patch to add pg_backup_label function. No tests nor docs 
> have been added yet, but if we can successfully reach the consensus for 
> adding the function, I will update the patch.
>
> Thought?

+1 for making it easy for the user to create backup_label and
tablespace_map files. With the patch, the label_file and
tblspc_map_file contents are preserved for the lifetime of the session or
until the next run of pg_backup_start; I'm not sure if we need to worry
more about it.

Why can't we have functions like pg_create_backup_label() and
pg_create_tablespace_map() which create the 'backup_label' and
'tablespace_map' files respectively in the data directory and also
return the contents as output columns?

Also, we can let users run these create functions only once (perhaps
after the pg_backup_stop is called which is when the contents will be
consistent). If we allow these functions to read the label_file or
tblspc_map_file contents during the backup before stop backup, they
may not be consistent. We can have a new sessionBackupState, something
like SESSION_BACKUP_READY_TO_COLLECT_INFO or SESSION_BACKUP_DONE (and
after the new function calls, sessionBackupState goes to
SESSION_BACKUP_NONE), and the contents of label_file and
tblspc_map_file are freed up.

In the docs, it's good if we can clearly specify the steps to use all
of these functions.

Regards,
Bharath Rupireddy.




Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-07-08 Thread Amit Kapila
On Fri, Jul 8, 2022 at 12:46 PM Masahiko Sawada  wrote:
>
> On Fri, Jul 8, 2022 at 3:27 PM Amit Kapila  wrote:
> >
>
> > 1.
> > In ReorderBufferGetCatalogChangesXacts(), isn't it better to use the
> > list length of 'catchange_txns' to allocate xids array? If we can do
> > so, then we will save the need to repalloc as well.
>
> Since ReorderBufferGetCatalogChangesXacts() collects all ongoing
> catalog modifying transactions, the length of the array could be
> bigger than the one taken last time. We can start with the previous
> length but I think we cannot remove the need for repalloc.
>

It is using the list "catchange_txns" to form xid array which
shouldn't change for the duration of
ReorderBufferGetCatalogChangesXacts(). Then the caller frees the xid
array after its use. Next time in
ReorderBufferGetCatalogChangesXacts(), the fresh allocation for xid
array happens, so not sure why repalloc would be required?

-- 
With Regards,
Amit Kapila.




Re: Support for grabbing multiple consecutive values with nextval()

2022-07-08 Thread Ronan Dunklau
Hello,

Reading the thread, I think the feature has value: it would basically transfer 
control of the sequence cache to the client application.

However, I don't think that returning only the last value is a sensible thing 
to do. The client will need to know the details of the sequence to do anything 
useful about this, especially it's increment, minvalue, maxvalue and cycle 
options. 

As suggested upthread, returning a resultset would probably be better. If the 
client application is concerned about the volume of data exchanged with the 
server, and is willing to deal with handling the knowledge of the sequence 
details themselves, they can always wrap it in an aggregate:

SELECT min(newvals), max(newvals)   FROM nextvals() as newvals

Regards,

--
Ronan Dunklau






Re: Support TRUNCATE triggers on foreign tables

2022-07-08 Thread Ian Lawrence Barwick
On Fri, Jul 8, 2022 at 17:10 Yugo NAGATA  wrote:
>
> On Fri, 8 Jul 2022 16:50:10 +0900
> Ian Lawrence Barwick  wrote:
>
> > On Fri, Jul 8, 2022 at 14:06 Fujii Masao  wrote:
> > > On 2022/07/08 11:19, Yugo NAGATA wrote:
> > > >> You added "foreign tables" for BEFORE statement-level trigger as the 
> > > >> above, but ISTM that you also needs to do that for AFTER 
> > > >> statement-level trigger. No?
> > > >
> > > > Oops, I forgot it. I attached the updated patch.
> > >
> > > Thanks for updating the patch! LGTM.
> > > Barring any objection, I will commit the patch.
> >
> > An observation: as-is the patch would make it possible to create a truncate
> > trigger for a foreign table whose FDW doesn't support truncation, which 
> > seems
> > somewhat pointless, possible source of confusion etc.:
> >
> > postgres=# CREATE TRIGGER ft_trigger
> >   AFTER TRUNCATE ON fb_foo
> >   EXECUTE FUNCTION fb_foo_trg();
> > CREATE TRIGGER
> >
> > postgres=# TRUNCATE fb_foo;
> > ERROR:  cannot truncate foreign table "fb_foo"
> >
> > It would be easy enough to check for this, e.g.:
> >
> > else if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
> > {
> > FdwRoutine *fdwroutine = GetFdwRoutineForRelation(rel, false);
> >
> > if (!fdwroutine->ExecForeignTruncate)
> > ereport(ERROR,
> > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> >  errmsg("foreign data wrapper does not support
> > table truncation")));
> > ...
> >
> > which results in:
> >
> > postgres=# CREATE TRIGGER ft_trigger
> >   AFTER TRUNCATE ON fb_foo
> >   EXECUTE FUNCTION fb_foo_trg();
> > ERROR:  foreign data wrapper does not support table truncation
> >
> > which IMO is preferable to silently accepting DDL which will never
> > actually do anything.
>
> At the beginning, I also thought such a check would be necessary, but I
> noticed that it is already possible to create insert/delete/update triggers
> for a foreign table whose FDW doesn't support such operations. So, I
> discarded this idea from the proposed patch for consistency.
>
> If we want to add such prevention, we will need similar checks for
> INSERT/DELETE/UPDATE, not only TRUNCATE. However, I think such a fix is
> independent of this one and can be proposed as another patch.

Ah OK, makes sense from that point of view. Thanks for the clarification!

Regards

Ian Barwick




Re: Support TRUNCATE triggers on foreign tables

2022-07-08 Thread Yugo NAGATA
On Fri, 8 Jul 2022 16:50:10 +0900
Ian Lawrence Barwick  wrote:

> On Fri, Jul 8, 2022 at 14:06 Fujii Masao  wrote:
> > On 2022/07/08 11:19, Yugo NAGATA wrote:
> > >> You added "foreign tables" for BEFORE statement-level trigger as the 
> > >> above, but ISTM that you also needs to do that for AFTER statement-level 
> > >> trigger. No?
> > >
> > > Oops, I forgot it. I attached the updated patch.
> >
> > Thanks for updating the patch! LGTM.
> > Barring any objection, I will commit the patch.
> 
> An observation: as-is the patch would make it possible to create a truncate
> trigger for a foreign table whose FDW doesn't support truncation, which seems
> somewhat pointless, possible source of confusion etc.:
> 
> postgres=# CREATE TRIGGER ft_trigger
>   AFTER TRUNCATE ON fb_foo
>   EXECUTE FUNCTION fb_foo_trg();
> CREATE TRIGGER
> 
> postgres=# TRUNCATE fb_foo;
> ERROR:  cannot truncate foreign table "fb_foo"
> 
> It would be easy enough to check for this, e.g.:
> 
> else if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
> {
> FdwRoutine *fdwroutine = GetFdwRoutineForRelation(rel, false);
> 
> if (!fdwroutine->ExecForeignTruncate)
> ereport(ERROR,
> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>  errmsg("foreign data wrapper does not support
> table truncation")));
> ...
> 
> which results in:
> 
> postgres=# CREATE TRIGGER ft_trigger
>   AFTER TRUNCATE ON fb_foo
>   EXECUTE FUNCTION fb_foo_trg();
> ERROR:  foreign data wrapper does not support table truncation
> 
> which IMO is preferable to silently accepting DDL which will never
> actually do anything.

At the beginning, I also thought such a check would be necessary, but I
noticed that it is already possible to create insert/delete/update triggers
for a foreign table whose FDW doesn't support such operations. So, I
discarded this idea from the proposed patch for consistency.

If we want to add such prevention, we will need similar checks for
INSERT/DELETE/UPDATE, not only TRUNCATE. However, I think such a fix is
independent of this one and can be proposed as another patch.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: [PATCH] Optimize json_lex_string by batching character copying

2022-07-08 Thread John Naylor
I've pushed 0001 (although the email seems to have been swallowed
again), and pending additional comments on 0002 and 0003 I'll squash
and push those next week. 0004 needs some thought on integrating with
symbols we discover during configure.

--
John Naylor
EDB: http://www.enterprisedb.com




Re: Add a test for "cannot truncate foreign table"

2022-07-08 Thread Etsuro Fujita
On Fri, Jul 8, 2022 at 11:07 AM Yugo NAGATA  wrote:
> On Fri, 08 Jul 2022 09:44:10 +0900 (JST)
> Kyotaro Horiguchi  wrote:
> > At Fri, 8 Jul 2022 01:06:18 +0900, Fujii Masao 
> >  wrote in
> > > On 2022/07/08 0:33, Tom Lane wrote:

> > > >> On 2022/06/30 10:48, Yugo NAGATA wrote:
> > > >>> When a foreign table has handler but doesn't support TRUNCATE,
> > > >>> an error "cannot truncate foreign table xxx" occurs. So, what
> > > >>> about adding a test this message output? We can add this test
> > > >>> for file_fdw because it is one of the such foreign data wrappers.

> > > > This seems like a fairly pointless expenditure of test cycles
> > > > to me.  Perhaps more importantly, what will you do when
> > > > somebody adds truncate support to that FDW?

> > As Fujii-san mentioned below, file_fdw has tests for INSERT/UPDATE and
> > DELETE.  If somebody added DELETE to file_fdw, the test for DELETE
> > rejection would be turned into a normal test of the DELETE function.
> > I don't see a difference between TRUNCATE and other updating commands
> > from this point of view.

I agree on this point.

> > > One idea is to create dummy FDW (like foreign_data.sql regression test
> > > does) not supporting TRUNCATE and use it for the test.

> If we want to test foreign table modifications for the FDW framework,
> we will have to add such tests in foreign_data.sql, because foreign
> table modifications are tested only for postgres_fdw and file_fdw.

Rather than doing so, I'd vote for adding a test case to file_fdw, as
proposed in the patch, because that would be much simpler and much
less expensive.
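
As a minimal sketch, the addition could look something like the following,
assuming file_fdw's regression suite keeps a CSV-backed foreign table around
(the table name here is an assumption on my part):

-- expected to fail, since file_fdw provides no ExecForeignTruncate callback
TRUNCATE agg_csv;
-- ERROR:  cannot truncate foreign table "agg_csv"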

Best regards,
Etsuro Fujita




Re: Support TRUNCATE triggers on foreign tables

2022-07-08 Thread Ian Lawrence Barwick
2022-07-08 (Fri) 14:06 Fujii Masao :
> On 2022/07/08 11:19, Yugo NAGATA wrote:
> >> You added "foreign tables" for the BEFORE statement-level trigger as
> >> above, but ISTM that you also need to do that for the AFTER
> >> statement-level trigger. No?
> >
> > Oops, I forgot it. I attached the updated patch.
>
> Thanks for updating the patch! LGTM.
> Barring any objection, I will commit the patch.

An observation: as-is the patch would make it possible to create a truncate
trigger for a foreign table whose FDW doesn't support truncation, which seems
somewhat pointless, possible source of confusion etc.:

postgres=# CREATE TRIGGER ft_trigger
  AFTER TRUNCATE ON fb_foo
  EXECUTE FUNCTION fb_foo_trg();
CREATE TRIGGER

postgres=# TRUNCATE fb_foo;
ERROR:  cannot truncate foreign table "fb_foo"

It would be easy enough to check for this, e.g.:

else if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
{
FdwRoutine *fdwroutine = GetFdwRoutineForRelation(rel, false);

if (!fdwroutine->ExecForeignTruncate)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 errmsg("foreign data wrapper does not support
table truncation")));
...

which results in:

postgres=# CREATE TRIGGER ft_trigger
  AFTER TRUNCATE ON fb_foo
  EXECUTE FUNCTION fb_foo_trg();
ERROR:  foreign data wrapper does not support table truncation

which IMO is preferable to silently accepting DDL which will never
actually do anything.


Regards

Ian Barwick




Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-07-08 Thread Masahiko Sawada
On Fri, Jul 8, 2022 at 3:27 PM Amit Kapila  wrote:
>
> On Fri, Jul 8, 2022 at 6:45 AM Masahiko Sawada  wrote:
> >
> > On Thu, Jul 7, 2022 at 3:40 PM Amit Kapila  wrote:
> > >
> > > On Thu, Jul 7, 2022 at 8:21 AM Masahiko Sawada  
> > > wrote:
> >
> > I've attached the new version patch that incorporates the comments and
> > the optimizations discussed above.
> >
>
> Thanks, few minor comments:

Thank you for the comments.

> 1.
> In ReorderBufferGetCatalogChangesXacts(), isn't it better to use the
> list length of 'catchange_txns' to allocate xids array? If we can do
> so, then we will save the need to repalloc as well.

Since ReorderBufferGetCatalogChangesXacts() collects all ongoing
catalog-modifying transactions, the array could be longer than the one
built last time. We could start with the previous length, but I don't
think that removes the need for repalloc.
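
For reference, a rough sketch of the allocation pattern being discussed,
assuming the ongoing catalog-modifying transactions were tracked in a plain
List (the names are illustrative, not taken from the patch):

    /* Size the xid array from the list length up front. */
    int            nxacts = list_length(catchange_txns);
    TransactionId *xids = (TransactionId *) palloc(nxacts * sizeof(TransactionId));
    int            i = 0;
    ListCell   *lc;

    foreach(lc, catchange_txns)
    {
        ReorderBufferTXN *txn = (ReorderBufferTXN *) lfirst(lc);

        xids[i++] = txn->xid;
    }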

> 2.
> /* ->committed manipulation */
> static void SnapBuildPurgeCommittedTxn(SnapBuild *builder);
>
> The above comment also needs to be changed.
>
> 3. As SnapBuildPurgeCommittedTxn() removes xacts both from committed
> and catchange arrays, the function name no more remains appropriate.
> We can either rename to something like SnapBuildPurgeOlderTxn() or
> move the catchange logic to a different function and call it from
> SnapBuildProcessRunningXacts.
>
> 4.
> + if (TransactionIdEquals(builder->catchange.xip[off],
> + builder->xmin) ||
> + NormalTransactionIdFollows(builder->catchange.xip[off],
> +builder->xmin))
>
> Can we use TransactionIdFollowsOrEquals() instead of above?
>
> 5. Comment change suggestion:
> /*
>   * Remove knowledge about transactions we treat as committed or
> containing catalog
>   * changes that are smaller than ->xmin. Those won't ever get checked via
> - * the ->committed array and ->catchange, respectively. The committed xids 
> will
> - * get checked via the clog machinery.
> + * the ->committed or ->catchange array, respectively. The committed xids 
> will
> + * get checked via the clog machinery. We can ideally remove the transaction
> + * from catchange array once it is finished (committed/aborted) but that 
> could
> + * be costly as we need to maintain the xids order in the array.
>   */
>

Agreed with the above comments.
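
For point 4, the simplified condition would presumably be just:

    if (TransactionIdFollowsOrEquals(builder->catchange.xip[off], builder->xmin))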

> Apart from the above, I think there are pending comments for the
> back-branch patch and some performance testing of this work.

Right. I'll incorporate all the comments I've received so far into these
patches and submit them. I'll also run some benchmark tests.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Allow file inclusion in pg_hba and pg_ident files

2022-07-08 Thread Julien Rouhaud
Hi,

On Thu, Jun 02, 2022 at 10:08:15AM +0900, Michael Paquier wrote:
> On Thu, May 26, 2022 at 03:26:57PM +0800, Julien Rouhaud wrote:
>
> > After a bit more digging, I think that this comes from the fact that 
> > there's no
> > "official" name for this file.  Even the documentation just says "the
> > pg_hba.conf file" [1].  So using pg_hba.conf can either means explicitly
> > $PGDATA/pg_hba.conf or the instance's HBA file in general, whatever its
> > location.
> >
> > I think it would be good to improve this, including in the doc, but I'm
> > assuming it's entirely for HEAD only, including the error messages?
>
> Yes, that would be a set of changes only for HEAD, once 16~ opens for
> business.  FWIW, the acronym "HBA" is defined as "Host-Based
> Authentication", so we could use that as a base for the description of
> the file, using simply HBA in the follow-up paragraphs for simplicity,
> telling that pg_hba.conf is the default.

Ok.

> > If so, should I also change the doc to replace "pg_hba.conf" with something
> > else when it's not referring to the file default name?
> >
> > I'm thinking of using "HBA file" to replace pg_hba.conf, and using
> > "authentication file" when it can be either the "HBA file" and the "User 
> > Name
> > Maps file", would that be ok?
>
> Using "HBA file" in the docs is fine by me, knowing that the acronym
> is already defined.  The modified parts of the docs should perhaps
> mention once something like "Host-Based Authentication file (or HBA
> file)" for clarity.  For the error message, I think that we tend to
> avoid those acronyms, don't we?

I don't have extensive knowledge of all the user-facing error messages, but
after a quick grep I see multiple uses of OID, PID, GIN and other defined
acronyms.  I also see multiple occurrences of "only heap AM is supported",
even though AM isn't a defined acronym.

So my proposal doesn't seem inconsistent with existing messages, and it would
help keep the messages short, but if you prefer to use the full name I'm fine
with it.  Those should be very rare and specialized errors anyway.

While we're on the bikeshedding part, are you ok with the proposed keywords
(include and include_dir), behaving exactly as they do for postgresql.conf,
and with also adding include_if_exists, so that we have exactly the same
possibilities for postgresql.conf, pg_hba.conf and pg_ident.conf?
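
For illustration, a pg_hba.conf using the proposed directives might then look
something like this (the file names are made up, and the semantics are assumed
to match postgresql.conf's include handling):

# processed as if the included contents appeared at this point
include            hba.d/base.conf
include_if_exists  hba.d/optional.conf
include_dir        hba.d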



