Re: Time to drop plpython2?

2022-02-18 Thread Michael Paquier
On Thu, Feb 17, 2022 at 10:08:55AM -0800, Andres Freund wrote:
> On 2022-02-16 23:14:46 -0800, Andres Freund wrote:
>> I've pinged the owners of the animals failing so far:
>> - myna, butterflyfish
> 
> Fixed, as noted by Michael on this thread.

Fixed is an incorrect word here, "temporarily bypassed" fits better :)

Unfortunately, I had to remove --with-python on both animals for the
time being, as I was not able to figure out why Python.h could not be
found in those installations, and it was Friday afternoon.  I'll try
to investigate and re-enable that some time next week.
--
Michael




Re: improve --with-lz4 install documentation

2022-02-18 Thread Michael Paquier
On Fri, Feb 18, 2022 at 04:49:48PM +0900, Michael Paquier wrote:
> With everything that has been merged recently based on LZ4, it makes
> sense to me to simplify things as you are suggesting.
> 
> install-windows.sgml includes the same description, so we'd better do
> the same thing and simplify the paragraph of LZ4 there as well, no?

I have checked the docs for any other areas that could be changed, and
did not notice anything else, so applied.  Thanks, Jeevan.
--
Michael




Re: pgsql: Add support for building with ZSTD.

2022-02-18 Thread Michael Paquier
Hi,

On Fri, Feb 18, 2022 at 06:53:10PM +, Robert Haas wrote:
> Add support for building with ZSTD.
> 
> This commit doesn't actually add anything that uses ZSTD; that will be
> done separately. It just puts the basic infrastructure into place.
> 
> Jeevan Ladhe, Robert Haas, and Michael Paquier. Reviewed by Justin
> Pryzby and Andres Freund.

I completely forgot that the section of the docs dedicated to the TAP
tests with MSVC also needs a refresh to mention the new variable ZSTD,
as of the attached.  Do you mind if I apply that?

Thanks,
--
Michael
diff --git a/doc/src/sgml/install-windows.sgml b/doc/src/sgml/install-windows.sgml
index 98fa6962f6..91b1410e4a 100644
--- a/doc/src/sgml/install-windows.sgml
+++ b/doc/src/sgml/install-windows.sgml
@@ -557,6 +557,15 @@ $ENV{PROVE_TESTS}='t/020*.pl t/010*.pl'
   PATH.
  
 
+
+
+ ZSTD
+ 
+  Path to a zstd command. The default is
+  zstd, that would be the command found in
+  PATH.
+ 
+

   
  




Re: MultiXact\SLRU buffers configuration

2022-02-18 Thread Andrey Borodin



> On 8 Apr 2021, at 17:22, Thomas Munro wrote:
> 
> Thanks!  I chickened out of committing a buffer replacement algorithm
> patch written 11 hours before the feature freeze, but I also didn't
> really want to commit the GUC patch without that.  Ahh, if only we'd
> latched onto the real problems here just a little sooner, but there is
> always PostgreSQL 15, I heard it's going to be amazing.  Moved to next
> CF.

Hi Thomas!

There's a feature freeze approaching again. I see that you are working on
moving SLRUs to buffer pools, but it is not clear in which PG version that
will land. And there is 100% consensus that the first patch is useful and
helps to prevent big issues. Maybe let's commit the first step without
lifting the default xact_buffers limit? Or the first patch as-is, with any
simple technique that prevents a linear search in SLRU buffers.

Best regards, Andrey Borodin.



Re: logical decoding and replication of sequences

2022-02-18 Thread Amit Kapila
On Sat, Feb 19, 2022 at 7:48 AM Tomas Vondra
 wrote:
>
> Do we need to handle both FOR ALL TABLES and FOR ALL SEQUENCES at the
> same time? That seems like a reasonable thing people might want.
>

+1. That seems reasonable to me as well. I think the same will apply
to FOR ALL TABLES IN SCHEMA and FOR ALL SEQUENCES IN SCHEMA.

-- 
With Regards,
Amit Kapila.




Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

2022-02-18 Thread Julien Rouhaud
Hi,

On Fri, Feb 18, 2022 at 08:07:05PM +0530, Nitin Jadhav wrote:
> 
> The backend_pid contains a valid value only during
> the CHECKPOINT command issued by the backend explicitly, otherwise the
> value will be 0.  We may have to add an additional field to
> 'CheckpointerShmemStruct' to hold the backend pid. The backend
> requesting the checkpoint will store its pid in this structure.
> Kindly let me know if you still feel the backend_pid field is not
> necessary.

There are more scenarios where you can have a backend requesting a checkpoint
and waiting for its completion, and there may be more than one backend
concerned, so I don't think that storing only one / the first backend pid is
ok.

> > And also while looking at the patch I see there's the same problem that I
> > mentioned in the previous thread, which is that the effective flags can be
> > updated once the checkpoint started, and as-is the view won't reflect that.
> > It also means that you can't simply display one of wal, time or force but a
> > possible combination of the flags (including the one not handled in v1).
> 
> If I understand the above comment properly, it has 2 points. First is
> to display the combination of flags rather than just displaying wal,
> time or force - The idea behind this is to just let the user know the
> reason for checkpointing. That is, the checkpoint is started because
> max_wal_size is reached, checkpoint_timeout expired, or a CHECKPOINT
> command was explicitly issued. The other flags like CHECKPOINT_IMMEDIATE,
> CHECKPOINT_WAIT or CHECKPOINT_FLUSH_ALL indicate how the checkpoint
> has to be performed. Hence I have not included those in the view.  If
> it is really required, I would like to modify the code to include
> other flags and display the combination.

I think all the information should be exposed.  Only knowing why the current
checkpoint has been triggered without any further information seems a bit
useless.  Think for instance for cases like [1].

> The second point is to reflect
> the updated flags in the view. AFAIK, there is a possibility that the
> flags get updated during the ongoing checkpoint, but the reason for
> the checkpoint (wal, time or force) will remain the same for the current
> checkpoint. There might be a change in how the checkpoint has to be
> performed if the CHECKPOINT_IMMEDIATE flag is set. If we go with
> displaying the combination of flags in the view, then probably we may
> have to reflect this in the view.

You can only "upgrade" a checkpoint, but not "downgrade" it.  So if for
instance you find both CHECKPOINT_CAUSE_TIME and CHECKPOINT_FORCE (which is
possible) you can easily know which one was the one that triggered the
checkpoint and which one was added later.
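To make the flag-combination idea concrete, here is a minimal standalone
sketch of decoding such a bitmask into a combined description. The flag
values mirror xlog.h, but the function itself is illustrative only, not the
view's actual implementation:

#include <stdio.h>
#include <string.h>

#define CHECKPOINT_IMMEDIATE	0x0004	/* as in xlog.h */
#define CHECKPOINT_FORCE		0x0008
#define CHECKPOINT_WAIT			0x0020
#define CHECKPOINT_CAUSE_XLOG	0x0080
#define CHECKPOINT_CAUSE_TIME	0x0100

static void
describe_checkpoint_flags(int flags, char *buf, size_t buflen)
{
	buf[0] = '\0';
	if (flags & CHECKPOINT_CAUSE_XLOG)
		strncat(buf, "wal ", buflen - strlen(buf) - 1);
	if (flags & CHECKPOINT_CAUSE_TIME)
		strncat(buf, "time ", buflen - strlen(buf) - 1);
	if (flags & CHECKPOINT_FORCE)
		strncat(buf, "force ", buflen - strlen(buf) - 1);
	if (flags & CHECKPOINT_IMMEDIATE)
		strncat(buf, "immediate ", buflen - strlen(buf) - 1);
	if (flags & CHECKPOINT_WAIT)
		strncat(buf, "wait ", buflen - strlen(buf) - 1);
}

int
main(void)
{
	char		buf[64];

	/* a time-triggered checkpoint later "upgraded" by CHECKPOINT_FORCE */
	describe_checkpoint_flags(CHECKPOINT_CAUSE_TIME | CHECKPOINT_FORCE,
							  buf, sizeof(buf));
	printf("%s\n", buf);		/* prints "time force " */
	return 0;
}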

> > > Probably a new field named 'processes_wiating' or 'events_waiting' can be
> > > added for this purpose.
> >
> > Maybe num_process_waiting?
> 
> I feel 'processes_wiating' aligns more with the naming conventions of
> the fields of the existing progress views.

There's at least pg_stat_progress_vacuum.num_dead_tuples.  Anyway I don't have
a strong opinion on it, just make sure to correct the typo.

> > > Probably writing of buffers or syncing files may complete before
> > > pg_is_in_recovery() returns false. But there are some cleanup
> > > operations happen as part of the checkpoint. During this scenario, we
> > > may get false value for pg_is_in_recovery(). Please refer following
> > > piece of code which is present in CreateRestartpoint().
> > >
> > > if (!RecoveryInProgress())
> > > replayTLI = XLogCtl->InsertTimeLineID;
> >
> > Then maybe we could store the timeline rather than the kind of checkpoint?
> > You should still be able to compute the information while giving a bit more
> > information for the same memory usage.
> 
> Can you please describe more about how checkpoint/restartpoint can be
> confirmed using the timeline id?

If pg_is_in_recovery() is true, then it's a restartpoint; otherwise, it was a
restartpoint if the checkpoint's timeline is different from the current
timeline?

[1] 
https://www.postgresql.org/message-id/1486805889.24568.96.camel%40credativ.de




Re: Fix formatting of Interval output

2022-02-18 Thread Tom Lane
Joseph Koshakow  writes:
> When formatting the output of an Interval, we call abs() on the hours
> field of the Interval. Calling abs(INT_MIN) returns back INT_MIN
> causing the output to contain two '-' characters. The attached patch
> fixes that issue by special casing INT_MIN hours.

Good catch, but it looks to me like three out of the four formats in
EncodeInterval have variants of this problem --- there are assumptions
throughout that code that we can compute "-x" or "abs(x)" without
fear.  Not much point in fixing only one symptom.

Also, I notice that there's an overflow hazard upstream of here,
in interval2tm:

regression=# select interval '214748364 hours' * 11;
ERROR:  interval out of range
regression=# \errverbose 
ERROR:  22008: interval out of range
LOCATION:  interval2tm, timestamp.c:1982

There's no good excuse for not being able to print a value that
we computed successfully.

I wonder if the most reasonable fix would be to start using int64
instead of int arithmetic for the values that are potentially large.
I doubt that we'd be taking much of a performance hit on modern
hardware.
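For illustration, a standalone sketch of the hazard and the int64 approach;
the snippet is hypothetical, not the actual EncodeInterval code:

#include <limits.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	int			hours = INT_MIN;

	/*
	 * Buggy: abs(INT_MIN) overflows and in practice yields INT_MIN again,
	 * so printing "-%d" of it produces two '-' characters.  Safe: widen to
	 * int64 first, then negate.
	 */
	int64_t		h = hours;

	printf("%s%lld:00:00\n", h < 0 ? "-" : "",
		   (long long) (h < 0 ? -h : h));
	return 0;
}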

regards, tom lane




Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-02-18 Thread Ashutosh Sharma
On Sat, Feb 19, 2022 at 2:24 AM Nathan Bossart  wrote:
>
> On Fri, Feb 18, 2022 at 10:48:10PM +0530, Ashutosh Sharma wrote:
> > Some review comments on the latest version:
>
> Thanks for the feedback!  Before I start spending more time on this one, I
> should probably ask if this has any chance of making it into v15.

I don't see any reason why it can't make it to v15. However, it is not
super urgent as the users facing this problem have a choice. They can
use non-exclusive mode.

--
With Regards,
Ashutosh Sharma.




Re: Make mesage at end-of-recovery less scary.

2022-02-18 Thread Ashutosh Sharma
On Thu, Feb 17, 2022 at 1:20 PM Kyotaro Horiguchi
 wrote:
>
> At Tue, 15 Feb 2022 20:17:20 +0530, Ashutosh Sharma  
> wrote in
> > OK. The v13 patch looks good. I have marked it as ready to commit.
> > Thank you for working on all my review comments.
>
> Thanks! But the recent xlog.c refactoring conflicts with this patch.
> And I found a silly bug while rebasing.
>
> xlog.c:12463 / xlogrecovery.c:3168
> if (!WaitForWALToBecomeAvailable(targetPagePtr + reqLen,
> ..
> {
> +   Assert(!StandbyMode);
> ...
> +   xlogreader->EndOfWAL = true;
>
> Yeah, I forgot about promotion there..

Yes, we exit WaitForWALToBecomeAvailable() even in standby mode
provided the user has requested promotion. So checking for the
!StandbyMode condition alone was not enough.

> So what I should have done is setting EndOfWAL according to StandbyMode.
>
> +   Assert(!StandbyMode || CheckForStandbyTrigger());
> ...
> +   /* promotion exit is not end-of-WAL */
> +   xlogreader->EndOfWAL = !StandbyMode;
>

The changes look good. Thanks!

--
With Regards,
Ashutosh Sharma.




Re: Timeout control within tests

2022-02-18 Thread Noah Misch
On Fri, Feb 18, 2022 at 10:26:52AM -0500, Tom Lane wrote:
> Noah Misch  writes:
> > On Thu, Feb 17, 2022 at 09:48:25PM -0800, Andres Freund wrote:
> >> Meson's test runner has the concept of a "timeout multiplier" for ways of
> >> running tests. Meson's stuff is about entire tests (i.e. one tap test), so
> >> doesn't apply here, but I wonder if we shouldn't do something similar?
> 
> > Hmmm.  It is good if the user can express an intent that continues to make
> > sense if we change the default timeout.  For the buildfarm use case, a
> > multiplier is moderately better on that axis (PG_TEST_TIMEOUT_MULTIPLIER=100
> > beats PG_TEST_TIMEOUT_DEFAULT=18000).  For the hacker use case, an absolute
> > value is substantially better on that axis (PG_TEST_TIMEOUT_DEFAULT=3 beats
> > PG_TEST_TIMEOUT_MULTIPLIER=.01).
> 
> FWIW, I'm fairly sure that PGISOLATIONTIMEOUT=300 was selected after
> finding that smaller values didn't work reliably in the buildfarm.
> Now maybe 741d7f1 fixed that, but I wouldn't count on it.  So while I
> approve of the idea to remove PGISOLATIONTIMEOUT in favor of using this
> centralized setting, I think that we might need to have a multiplier
> there, or else we'll end up with PG_TEST_TIMEOUT_DEFAULT set to 300
> across the board.  Perhaps the latter is fine, but a multiplier seems a
> bit more flexible.

The PGISOLATIONTIMEOUT replacement was 2*timeout_default, so isolation suites
would get 2*180s=360s.  (I don't want to lower any default timeouts, but I
don't mind raising them.)  In a sense, PG_TEST_TIMEOUT_DEFAULT is a multiplier
with as many sites as possible multiplying it by 1.  The patch has multiples
at two code sites.

> On the other hand, I also support your point that an absolute setting
> is easier to think about / adjust for special uses.  So maybe we should
> just KISS and use a single absolute setting until we find a hard reason
> why that doesn't work well.




Re: logical decoding and replication of sequences

2022-02-18 Thread Tomas Vondra
On 2/15/22 10:00, Peter Eisentraut wrote:
> On 13.02.22 14:10, Tomas Vondra wrote:
>> Here's the remaining part, rebased, with a small tweak in the TAP test
>> to eliminate the issue with not waiting for sequence increments. I've
>> kept the tweak in a separate patch, so that we can throw it away easily
>> if we happen to resolve the issue.
> 
> This basically looks fine to me.  You have identified a few XXX and
> FIXME spots that should be addressed.
> 
> Here are a few more comments:
> 
> * general
> 
> Handling of owned sequences, as previously discussed.  This would
> probably be a localized change somewhere in get_rel_sync_entry(), so it
> doesn't affect the overall structure of the patch.
> 

So you're suggesting not to track owned sequences in pg_publication_rel
explicitly, and handle them dynamically in output plugin? So when
calling get_rel_sync_entry on the sequence, we'd check if it's owned by
a table that is replicated.

We'd want a way to enable/disable this for each publication, but that
makes the lookups more complicated. Also, we'd probably need the same
logic elsewhere (e.g. in psql, when listing sequences in a publication).

I'm not sure we want this complexity, maybe we should simply deal with
this in the ALTER PUBLICATION and track all sequences (owned or not) in
pg_publication_rel.

> pg_dump support is missing.
> 

Will fix.

> Some psql \dxxx support should probably be there.  Check where existing
> publication-table relationships are displayed.
> 

Yeah, I noticed this while adding regression tests. Currently, \dRp+
shows something like this:

                Publication testpub_fortbl
          Owner           | All tables | Inserts | Updates ...
--------------------------+------------+---------+---------...
 regress_publication_user | f          | t       | t       ...
Tables:
    "pub_test.testpub_nopk"
    "public.testpub_tbl1"

or

             Publication testpub5_forschema
          Owner           | All tables | Inserts | Updates | ...
--------------------------+------------+---------+---------+-...
 regress_publication_user | f          | t       | t       | ...
Tables from schemas:
    "CURRENT_SCHEMA"
    "public"

I wonder if we should copy the same block for sequences, so

                Publication testpub_fortbl
          Owner           | All tables | Inserts | Updates ...
--------------------------+------------+---------+---------...
 regress_publication_user | f          | t       | t       ...
Tables:
    "pub_test.testpub_nopk"
    "public.testpub_tbl1"
Sequences:
    "pub_test.sequence_s1"
    "public.sequence_s2"

And same for "tables from schemas" etc.


> * src/backend/catalog/system_views.sql
> 
> + LATERAL pg_get_publication_sequences(P.pubname) GPT,
> 
> The GPT presumably stood for "get publication tables", so should be
> changed.
> 
> (There might be a few more copy-and-paste occasions like this around.  I
> have not written down all of them.)
> 

Will fix.

> * src/backend/commands/publicationcmds.c
> 
> This adds a new publication option called "sequence".  I would have
> expected it to be named "sequences", but that's just my opinion.
> 
> But in any case, the option is not documented at all.
> 
> Some of the new functions added in this file are almost exact duplicates
> of the analogous functions for tables.  For example,
> PublicationAddSequences()/PublicationDropSequences() are almost
> exactly the same as PublicationAddTables()/PublicationDropTables(). This
> could be handled with less duplication by just adding an ObjectType
> argument to the existing functions.
> 

Yes, I noticed that too, and I'll review this later, after making sure
the behavior is correct.

> * src/backend/commands/sequence.c
> 
> Could use some refactoring of ResetSequence()/ResetSequence2().  Maybe
> call the latter ResetSequenceData() and have the former call it internally.
> 

Will check.

> * src/backend/commands/subscriptioncmds.c
> 
> Also lots of duplication between tables and sequences in this file.
> 

Same as the case above.

> * src/backend/replication/logical/tablesync.c
> 
> The comment says it needs sequence support, but there appears to be
> support for initial sequence syncing.  What exactly is missing here?
> 

I think that's just an obsolete comment.

> * src/test/subscription/t/028_sequences.pl
> 
> Change to use done_testing() (see
> 549ec201d6132b7c7ee11ee90a4e02119259ba5b).

Will fix.


Do we need to handle both FOR ALL TABLES and FOR ALL SEQUENCES at the
same time? That seems like a reasonable thing people might want.

The patch probably also needs to modify pg_publication_namespace to
track whether the schema is FOR TABLES IN SCHEMA or FOR SEQUENCES.



regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Time to drop plpython2?

2022-02-18 Thread Mark Wong
Hi everyone,

On Fri, Feb 18, 2022 at 02:41:04PM -0800, Andres Freund wrote:
> There's snapper ("pgbf [ a t ] twiska.com"), and there's Mark Wong's large
> menagerie. Mark said yesterday that he's working on updating.

I've made one pass.  Hopefully I didn't make any mistakes. :)

Regards,
Mark




Re: pg_upgrade verbosity when redirecting output to log file

2022-02-18 Thread Justin Pryzby
+	 * If outputting to a tty / or , append newline. pg_log_v() will put the
+	 * individual progress items onto the next line.
+	 */
+	if (log_opts.isatty || log_opts.verbose)

I guess the comment should say "or in verbose mode".

-- 
Justin




Re: pg_upgrade verbosity when redirecting output to log file

2022-02-18 Thread Andres Freund
Hi,

On 2022-02-16 17:09:34 +1300, Thomas Munro wrote:
> On Tue, Jan 11, 2022 at 4:42 AM Bruce Momjian  wrote:
> > On Sun, Jan  9, 2022 at 10:39:58PM -0800, Andres Freund wrote:
> > > On 2022-01-10 01:14:32 -0500, Tom Lane wrote:
> > > > I think I'd vote for just nuking that output altogether.
> > > > It seems of very dubious value.
> > >
> > > It seems worthwhile in some form - on a large cluster in copy mode, the
> > > "Copying user relation files" step can take *quite* a while, and even
> > > link/clone modes aren't fast. But perhaps what'd be really needed is
> > > something counting up actual progress in percentage of files and/or
> > > space...
> > >
> > > I think just coupling it to verbose mode makes the most sense, for now?
> >
> > All of this logging is from the stage where I was excited pg_upgrade
> > worked, and I wanted to give clear output if it failed in some way ---
> > printing the file names seems like an easy solution.  I agree at this
> > point that logging should be reduced, and if they want more logging, the
> > verbose option is the right way to get it.
> 
> +1

I got a bit stuck on how to best resolve this. I felt bad about removing all
interactive progress, because a pg_upgrade can take a while after all. But
it's also not easy to come up with something good without a substantially
bigger effort than I want to invest.

After all, I just want to be able to read check-world output, nearly half of
which is pg_upgrade test output right now.

The attached is my attempt at coming up with something halfway sane without
rewriting pg_upgrade logging entirely. I think it mostly ends up with at least
as sane output as the current code. I needed to add a separate
prep_status_progress() function to make that work.

Greetings,

Andres Freund
>From a04d989a64e6c6bc273b0c8c0d778d5f66f00158 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 18 Feb 2022 17:16:37 -0800
Subject: [PATCH v2] pg_upgrade: Don't print progress status when output is not
 a tty.

Until this change pg_upgrade with output redirected to a file / pipe would end
up printing all files in the cluster. This has made check-world output
exceedingly verbose.

Author: Andres Freund
Discussion: https://postgr.es/m/CA+hUKGKjrV61ZVJ8OSag+3rKRmCZXPc03bDyWMqhXg3rdZ=f...@mail.gmail.com
---
 src/bin/pg_upgrade/dump.c|  2 +-
 src/bin/pg_upgrade/option.c  |  2 +
 src/bin/pg_upgrade/pg_upgrade.c  |  2 +-
 src/bin/pg_upgrade/pg_upgrade.h  |  2 +
 src/bin/pg_upgrade/relfilenode.c |  6 +--
 src/bin/pg_upgrade/util.c| 63 ++--
 6 files changed, 60 insertions(+), 17 deletions(-)

diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index b69b4f95695..29b9e44f782 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -29,7 +29,7 @@ generate_old_dump(void)
 			  GLOBALS_DUMP_FILE);
 	check_ok();
 
-	prep_status("Creating dump of database schemas\n");
+	prep_status_progress("Creating dump of database schemas");
 
 	/* create per-db dump files */
 	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index d2c82cc2bbb..e75be2c423e 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -207,6 +207,8 @@ parseCommandLine(int argc, char *argv[])
 	if (log_opts.verbose)
 		pg_log(PG_REPORT, "Running in verbose mode\n");
 
+	log_opts.isatty = isatty(fileno(stdout));
+
 	/* Turn off read-only mode;  add prefix to PGOPTIONS? */
 	if (getenv("PGOPTIONS"))
 	{
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index f66bbd53079..ecb3e1f6474 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -381,7 +381,7 @@ create_new_objects(void)
 {
 	int			dbnum;
 
-	prep_status("Restoring database schemas in the new cluster\n");
+	prep_status_progress("Restoring database schemas in the new cluster");
 
 	/*
 	 * We cannot process the template1 database concurrently with others,
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 0aca0a77aae..ca86c112924 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -274,6 +274,7 @@ typedef struct
 	char	   *basedir;		/* Base output directory */
 	char	   *dumpdir;		/* Dumps */
 	char	   *logdir;			/* Log files */
+	bool		isatty;			/* is stdout a tty */
 } LogOpts;
 
 
@@ -427,6 +428,7 @@ void		pg_log(eLogType type, const char *fmt,...) pg_attribute_printf(2, 3);
 void		pg_fatal(const char *fmt,...) pg_attribute_printf(1, 2) pg_attribute_noreturn();
 void		end_progress_output(void);
 void		prep_status(const char *fmt,...) pg_attribute_printf(1, 2);
+void		prep_status_progress(const char *fmt,...) pg_attribute_printf(1, 2);
 void		check_ok(void);
 unsigned int str2uint(const char *str);
 
diff --git a/src/bin/pg_upgrade/relfilenode.c b/src/bin/pg_upgrade/relfilenode.c
index 2f4deb34163..d23ac884bd1 100644
--- a/src/bin/pg_upgrade/r

Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-02-18 Thread Peter Geoghegan
On Fri, Feb 18, 2022 at 2:11 PM Andres Freund  wrote:
> I think it'd be good to add a few isolationtest cases for the
> can't-get-cleanup-lock paths. I think it shouldn't be hard using cursors. The
> slightly harder part is verifying that VACUUM did something reasonable, but
> that still should be doable?

We could even just extend the existing, related tests from vacuum-reltuples.spec.

Another testing strategy occurs to me: we could stress-test the
implementation by simulating an environment where the no-cleanup-lock
path is hit an unusually large number of times, possibly a fixed
percentage of the time (like 1%, 5%), say by making vacuumlazy.c's
ConditionalLockBufferForCleanup() call return false randomly. Now that
we have lazy_scan_noprune for the no-cleanup-lock path (which is as
similar to the regular lazy_scan_prune path as possible), I wouldn't
expect this ConditionalLockBufferForCleanup() testing gizmo to be too
disruptive.
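A standalone sketch of that gizmo, with stand-ins for the buffer-manager
calls (the real hook would live inside vacuumlazy.c and use PostgreSQL's
PRNG rather than rand()):

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* stand-in for the real ConditionalLockBufferForCleanup() */
static bool
conditional_cleanup_lock(int buffer)
{
	(void) buffer;
	return true;				/* pretend the lock is always free here */
}

/* wrapper that injects random failures, e.g. on 5% of attempts */
static bool
conditional_cleanup_lock_with_faults(int buffer, double fail_rate)
{
	if ((double) rand() / RAND_MAX < fail_rate)
		return false;			/* pretend someone else holds a pin */
	return conditional_cleanup_lock(buffer);
}

int
main(void)
{
	int			denied = 0;

	srand(42);
	for (int blk = 0; blk < 10000; blk++)
	{
		if (!conditional_cleanup_lock_with_faults(blk, 0.05))
			denied++;			/* the lazy_scan_noprune path would run here */
	}
	printf("no-cleanup-lock path exercised %d times\n", denied);
	return 0;
}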

-- 
Peter Geoghegan




Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?

2022-02-18 Thread Tomas Vondra



On 2/17/22 23:16, Robert Haas wrote:
> On Thu, Feb 17, 2022 at 4:17 PM Tomas Vondra
>  wrote:
>> IMHO the whole problem is we're unable to estimate the join clause as a
>> conditional probability, i.e.
>>
>>P(A.x = B.x | (A.x < 42) & (B.x < 42))
>>
>> so maybe instead of trying to generate additional RelOptInfo items we
>> should think about improving that. The extra RelOptInfos don't really
>> solve this, because even if you decide to join A|x<42 to B|x<42 it does
>> nothing to improve the join clause estimate.
> 
> I guess I hadn't considered that angle. I think the extra RelOptInfos
> (or whatever) actually do solve a problem, because enforcing a
> high-selectivity join qual against both sides is potentially quite
> wasteful, and you need some way to decide whether to do it on one
> side, the other, or both. But it's also true that I was wrong to
> assume independence ... and if we could avoid assuming that, then the
> join selectivity would work itself out without any of the machinery
> that I just proposed.
> 

True. We kinda already have this issue for the equality clauses, and
having paths with the condition pushed down (or not) seems like a
natural approach.

>> It actually deals with a more general form of this case, because the
>> clauses don't need to reference the same attribute - so for example this
>> would work too, assuming there is extended stats object on the columns
>> on each side:
>>
>>   P(A.c = B.d | (A.e < 42) & (B.f < 42))
> 
> That'd be cool.
> 

Yeah, but the patch implementing this still needs more work.

>> Not sure. In my experience queries with both a join clause and other
>> clauses referencing the same attribute are pretty rare. But I agree if
>> we can do the expensive stuff only when actually needed, with no cost in
>> the 99.999% other cases, I don't see why not. Of course, code complexity
>> is a cost too.
> 
> Right. I mean, we could have a planner GUC to control whether the
> optimization is used even in cases where we see that it's possible.
> But Tom keeps arguing that it is possible in many queries and would
> benefit few queries, and I'm not seeing why that should be so. I think
> it's likely to benefit many of the queries to which it applies.
> 

Maybe. Although the example I linked some time ago shows a pretty
dramatic improvement, due to picking merge join + index scan, and not
realizing we'll have to skip a lot of data. But that's just one
anecdotal example.

Anyway, I think the best way to deal with these (perfectly legitimate)
concerns is to show how expensive it is for queries not having such
join/restriction clauses, with the cost being close to 0. And then for
queries with such clauses but not benefiting from the change (a bit like
a worst case).


regards


[1]
https://www.postgresql.org/message-id/CA%2B1Wm9U_sP9237f7OH7O%3D-UTab71DWOO4Qc-vnC78DfsJQBCwQ%40mail.gmail.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-02-18 Thread Peter Geoghegan
On Fri, Feb 18, 2022 at 1:56 PM Robert Haas  wrote:
> + * Also maintains *NewRelfrozenxid and *NewRelminmxid, which are the current
> + * target relfrozenxid and relminmxid for the relation.  Assumption is that
>
> "maintains" is fuzzy. I think you should be saying something much more
> explicit, and the thing you are saying should make it clear that these
> arguments are input-output arguments: i.e. the caller must set them
> correctly before calling this function, and they will be updated by
> the function.

Makes sense.

> I don't think you have to spell all of that out in every
> place where this comes up in the patch, but it needs to be clear from
> what you do say. For example, I would be happier with a comment that
> said something like "Every call to this function will either set
> HEAP_XMIN_FROZEN in the xl_heap_freeze_tuple struct passed as an
> argument, or else reduce *NewRelfrozenxid to the xmin of the tuple if
> it is currently newer than that. Thus, after a series of calls to this
> function, *NewRelfrozenxid represents a lower bound on unfrozen xmin
> values in the tuples examined. Before calling this function, caller
> should initialize *NewRelfrozenxid to ."

We have to worry about XIDs from MultiXacts (and xmax values more
generally). And we have to worry about the case where we start out
with only xmin frozen (by an earlier VACUUM), and then have to freeze
xmax too. I believe that we have to generally consider xmin and xmax
independently. For example, we cannot ignore xmax, just because we
looked at xmin, since in general xmin alone might have already been
frozen.
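As a standalone sketch of the lower-bound maintenance described in that
suggested comment (with simplified stand-ins for the real TransactionId
machinery, ignoring wraparound and the separate xmax/MultiXact tracking):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

/* each tuple either gets frozen, or ratchets the tracked bound back */
static void
maintain_new_relfrozenxid(TransactionId tuple_xid, bool will_freeze,
						  TransactionId *NewRelfrozenxid)
{
	if (!will_freeze && tuple_xid < *NewRelfrozenxid)
		*NewRelfrozenxid = tuple_xid;
}

int
main(void)
{
	/* caller initializes to the scan's starting cutoff */
	TransactionId newrelfrozenxid = 1000;
	TransactionId xids[] = {950, 990, 800};

	for (int i = 0; i < 3; i++)
		maintain_new_relfrozenxid(xids[i], false, &newrelfrozenxid);

	/* lower bound on unfrozen xids seen: prints 800 */
	printf("lower bound: %u\n", (unsigned) newrelfrozenxid);
	return 0;
}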

> + * Also maintains *NewRelfrozenxid and *NewRelminmxid, which are the current
> + * target relfrozenxid and relminmxid for the relation.  Assumption is that
> + * caller will never freeze any of the XIDs from the tuple, even when we say
> + * that they should.  If caller opts to go with our recommendation to freeze,
> + * then it must account for the fact that it shouldn't trust how we've set
> + * NewRelfrozenxid/NewRelminmxid.  (In practice aggressive VACUUMs always take
> + * our recommendation because they must, and non-aggressive VACUUMs always opt
> + * to not freeze, preferring to ratchet back NewRelfrozenxid instead).
>
> I don't understand this one.
>
> +* (Actually, we maintain NewRelminmxid differently here, because we
> +* assume that XIDs that should be frozen according to cutoff_xid won't
> +* be, whereas heap_prepare_freeze_tuple makes the opposite assumption.)
>
> This one either.

The difference between the cleanup lock path (in
lazy_scan_prune/heap_prepare_freeze_tuple) and the share lock path (in
lazy_scan_noprune/heap_tuple_needs_freeze) is what is at issue in both
of these confusing comment blocks, really. Note that cutoff_xid is the
name that both heap_prepare_freeze_tuple and heap_tuple_needs_freeze
have for FreezeLimit (maybe we should rename every occurrence of
cutoff_xid in heapam.c to FreezeLimit).

At a high level, we aren't changing the fundamental definition of an
aggressive VACUUM in any of the patches -- we still need to advance
relfrozenxid up to FreezeLimit in an aggressive VACUUM, just like on
HEAD, today (we may be able to advance it *past* FreezeLimit, but
that's just a bonus). But in a non-aggressive VACUUM, where there is
still no strict requirement to advance relfrozenxid (by any amount),
the code added by 0001 can set relfrozenxid to any known safe value,
which could either be from before FreezeLimit, or after FreezeLimit --
almost anything is possible (provided we respect the relfrozenxid
invariant, and provided we see that we didn't skip any
all-visible-not-all-frozen pages).

Since we still need to "respect FreezeLimit" in an aggressive VACUUM,
the aggressive case might need to wait for a full cleanup lock the
hard way, having tried and failed to do it the easy way within
lazy_scan_noprune (lazy_scan_noprune will still return false when any
call to heap_tuple_needs_freeze for any tuple returns false) -- same
as on HEAD, today.

And so the difference at issue here is: FreezeLimit/cutoff_xid only
needs to affect the new NewRelfrozenxid value we use for relfrozenxid in
heap_prepare_freeze_tuple, which is involved in real freezing -- not
in heap_tuple_needs_freeze, whose main purpose is still to help us
avoid freezing where a cleanup lock isn't immediately available. The
purpose of FreezeLimit/cutoff_xid within heap_tuple_needs_freeze, by contrast,
is to determine its bool return value, which will only be of interest
to the aggressive case (which might have to get a cleanup lock and do
it the hard way), not the non-aggressive case (where ratcheting back
NewRelfrozenxid is generally possible, and generally leaves us with
almost as good of a value).

In other words: the calls to heap_tuple_needs_freeze made from
lazy_scan_noprune are simply concerned with the page as it actually
is, whereas the similar/corresponding calls to
heap_prepare_freeze_tuple from lazy

Re: Race conditions in 019_replslot_limit.pl

2022-02-18 Thread Andres Freund
Hi,

On 2022-02-18 18:49:14 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > We'd also need to add pq_endmessage_noblock(), because the pq_endmessage()
> > obviously tries to send (as in the backtrace upthread) if the output buffer 
> > is
> > large enough, which it often will be in walsender.
> 
> I don't see that as "obvious".  If we're there, we do not have an
> error situation.

The problem is that due to walsender using pq_putmessage_noblock(), the output
buffer is often going to be too full for a plain pq_endmessage() to not send
out data.  Because walsender will have an output buffer > PQ_SEND_BUFFER_SIZE
a lot of the time, errors will commonly be thrown with the output buffer
full.

That leads the pq_endmessage() in send_message_to_frontend() to block sending
the error message and the preceding WAL data, leading to backtraces like:

> #0  0x7faf4a570ec6 in epoll_wait (epfd=4, events=0x7faf4c201458, 
> maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
> #1  0x7faf4b8ced5c in WaitEventSetWaitBlock (set=0x7faf4c2013e0, 
> cur_timeout=-1, occurred_events=0x7ffe47df2320, nevents=1)
> at /home/andres/src/postgresql/src/backend/storage/ipc/latch.c:1465
> #2  0x7faf4b8cebe3 in WaitEventSetWait (set=0x7faf4c2013e0, timeout=-1, 
> occurred_events=0x7ffe47df2320, nevents=1, wait_event_info=100663297)
> at /home/andres/src/postgresql/src/backend/storage/ipc/latch.c:1411
> #3  0x7faf4b70f48b in secure_write (port=0x7faf4c22da50, 
> ptr=0x7faf4c2f1210, len=21470) at 
> /home/andres/src/postgresql/src/backend/libpq/be-secure.c:298
> #4  0x7faf4b71aecb in internal_flush () at 
> /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:1380
> #5  0x7faf4b71ada1 in internal_putbytes (s=0x7ffe47df23dc "E\177", len=1) 
> at /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:1326
> #6  0x7faf4b71b0cf in socket_putmessage (msgtype=69 'E', s=0x7faf4c201700 
> "SFATAL", len=112)
> at /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:1507
> #7  0x7faf4b71c151 in pq_endmessage (buf=0x7ffe47df2460) at 
> /home/andres/src/postgresql/src/backend/libpq/pqformat.c:301
> #8  0x7faf4babbb5e in send_message_to_frontend (edata=0x7faf4be2f1c0 
> ) at 
> /home/andres/src/postgresql/src/backend/utils/error/elog.c:3253
> #9  0x7faf4bab8aa0 in EmitErrorReport () at 
> /home/andres/src/postgresql/src/backend/utils/error/elog.c:1541
> #10 0x7faf4bab5ec0 in errfinish (filename=0x7faf4bc9770d "postgres.c", 
> lineno=3192, funcname=0x7faf4bc99170 <__func__.8> "ProcessInterrupts")
> at /home/andres/src/postgresql/src/backend/utils/error/elog.c:592
> #11 0x7faf4b907e73 in ProcessInterrupts () at 
> /home/andres/src/postgresql/src/backend/tcop/postgres.c:3192
> #12 0x7faf4b8920af in WalSndLoop (send_data=0x7faf4b8927f2 
> ) at 
> /home/andres/src/postgresql/src/backend/replication/walsender.c:2404





> > I guess we could try to flush in a blocking manner sometime later in the
> > shutdown sequence, after we've released resources? But I'm doubtful it's a
> > good idea, we don't really want to block waiting to exit when e.g. the
> > network connection is dead without the TCP stack knowing.
> 
> I think you are trying to move in the direction of possibly exiting
> without ever sending at all, which does NOT seem like an improvement.

I was just talking about doing another blocking pq_flush(), in addition to the
pq_flush_if_writable() earlier. That'd be trying harder to send out data, not
less hard...

Greetings,

Andres Freund




Re: Mark all GUC variable as PGDLLIMPORT

2022-02-18 Thread Andres Freund
Hi,

On 2022-02-15 08:06:58 -0800, Andres Freund wrote:
> The more I think about it the more I'm convinced that if we want to do this,
> we should do it for variables and functions.

Btw, if we were to do this, we should just use -fvisibility=hidden everywhere
and would see the same set of failures on unixoid systems as on windows. Of
course only in in-core extensions, but it'd still be better than nothing.

I proposed using -fvisibility=hidden in extensions:
https://postgr.es/m/20211101020311.av6hphdl6xbjb...@alap3.anarazel.de

Mostly because using explicit symbol exports makes it a lot easier to build
extension libraries on Windows (with meson, but also everything built outside
of postgres with msvc). But also because it makes function calls *inside* an
extension have noticeably lower overhead. And it makes the set of symbols
exported smaller.


One problem I encountered was that it's actually kind of hard to set build
flags only for extension libraries:
https://postgr.es/m/20220111025328.iq5g6uck53j5qtin%40alap3.anarazel.de

But if we added explicit exports to everything we export, we'd not need to
restrict the use of -fvisibility=hidden to extension libraries anymore. We'd
get decent error messages on all platforms.


Subsequently we could gain a bit of performance in critical paths by
marking selected symbols as *not* exported. That'd make them a tad cheaper to
call and allow the optimizer more leeway.
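As a sketch, explicit exports under -fvisibility=hidden would look roughly
like this (MY_EXPORT is a hypothetical macro, not an existing PostgreSQL
one):

#include <stdio.h>

#if defined(__GNUC__)
#define MY_EXPORT __attribute__((visibility("default")))
#else
#define MY_EXPORT
#endif

/* exported: visible to the server and other libraries */
MY_EXPORT int my_extension_entry_point(void);

/*
 * Not exported: internal calls can skip the PLT, giving the optimizer more
 * leeway -- the lower call overhead mentioned above.
 */
static int
helper(int x)
{
	return x + 1;
}

MY_EXPORT int
my_extension_entry_point(void)
{
	return helper(41);
}

int
main(void)
{
	printf("%d\n", my_extension_entry_point());	/* prints 42 */
	return 0;
}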

Greetings,

Andres Freund




Re: Race conditions in 019_replslot_limit.pl

2022-02-18 Thread Tom Lane
Andres Freund  writes:
> On 2022-02-18 18:15:21 -0500, Tom Lane wrote:
>> Perhaps it'd be sensible to do this only in debugging (ie Assert)
>> builds?

> That seems not great, because it pretty clearly can lead to hangs, which is
> problematic in tests too. What about using pq_flush_if_writable()? In nearly
> all situations that'd still push the failure to the client.

That'd be okay by me.

> We'd also need to add pq_endmessage_noblock(), because the pq_endmessage()
> obviously tries to send (as in the backtrace upthread) if the output buffer is
> large enough, which it often will be in walsender.

I don't see that as "obvious".  If we're there, we do not have an
error situation.

> I guess we could try to flush in a blocking manner sometime later in the
> shutdown sequence, after we've released resources? But I'm doubtful it's a
> good idea, we don't really want to block waiting to exit when e.g. the network
> connection is dead without the TCP stack knowing.

I think you are trying to move in the direction of possibly exiting
without ever sending at all, which does NOT seem like an improvement.

regards, tom lane




Re: Race conditions in 019_replslot_limit.pl

2022-02-18 Thread Andres Freund
Hi,

On 2022-02-18 18:15:21 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2022-02-17 21:55:21 -0800, Andres Freund wrote:
> >> Isn't it pretty bonkers that we allow error processing to get stuck behind
> >> network traffic, *before* we have released resources (locks etc)?
>
> It's more or less intentional, per elog.c:
>
> /*
>  * This flush is normally not necessary, since postgres.c will flush out
>  * waiting data when control returns to the main loop. But it seems best
>  * to leave it here, so that the client has some clue what happened if the
>  * backend dies before getting back to the main loop ... error/notice
>  * messages should not be a performance-critical path anyway, so an extra
>  * flush won't hurt much ...
>  */
> pq_flush();
>
> Perhaps it'd be sensible to do this only in debugging (ie Assert)
> builds?

That seems not great, because it pretty clearly can lead to hangs, which is
problematic in tests too. What about using pq_flush_if_writable()? In nearly
all situations that'd still push the failure to the client.

We'd also need to add pq_endmessage_noblock(), because the pq_endmessage()
obviously tries to send (as in the backtrace upthread) if the output buffer is
large enough, which it often will be in walsender.


I guess we could try to flush in a blocking manner sometime later in the
shutdown sequence, after we've released resources? But I'm doubtful it's a
good idea, we don't really want to block waiting to exit when e.g. the network
connection is dead without the TCP stack knowing.


Hm. There already is code trying to short-circuit sending errors to the client
if a backend gets terminated. Introduced in

commit 2ddb9149d14de9a2e7ac9ec6accf3ad442702b24
Author: Tom Lane 
Date:   2018-10-19 21:39:21 -0400

Server-side fix for delayed NOTIFY and SIGTERM processing.

and predecessors.

If ProcessClientWriteInterrupt() sees ProcDiePending, we'll stop trying to
send stuff to the client if writes block. However, this doesn't work here,
because we've already unset ProcDiePending:

#7  0x7faf4b71c151 in pq_endmessage (buf=0x7ffe47df2460) at 
/home/andres/src/postgresql/src/backend/libpq/pqformat.c:301
#8  0x7faf4babbb5e in send_message_to_frontend (edata=0x7faf4be2f1c0 
) at /home/andres/src/postgresql/src/backend/utils/error/elog.c:3253
#9  0x7faf4bab8aa0 in EmitErrorReport () at 
/home/andres/src/postgresql/src/backend/utils/error/elog.c:1541
#10 0x7faf4bab5ec0 in errfinish (filename=0x7faf4bc9770d "postgres.c", 
lineno=3192, funcname=0x7faf4bc99170 <__func__.8> "ProcessInterrupts")
at /home/andres/src/postgresql/src/backend/utils/error/elog.c:592
#11 0x7faf4b907e73 in ProcessInterrupts () at 
/home/andres/src/postgresql/src/backend/tcop/postgres.c:3192
#12 0x7faf4b8920af in WalSndLoop (send_data=0x7faf4b8927f2 
) at 
/home/andres/src/postgresql/src/backend/replication/walsender.c:2404
#13 0x7faf4b88f82e in StartReplication (cmd=0x7faf4c293fc0) at 
/home/andres/src/postgresql/src/backend/replication/walsender.c:834

Before ProcessInterrupts() FATALs due to a SIGTERM, it resets ProcDiePending.

This seems not optimal.

We can't just leave ProcDiePending set, otherwise we'll probably end up
throwing more errors during the shutdown sequence. But it seems we need
something similar to proc_exit_inprogress, except set earlier? And then take
that into account in ProcessClientWriteInterrupt()?

Greetings,

Andres Freund




Re: Time to drop plpython2?

2022-02-18 Thread Tom Lane
Andres Freund  writes:
> On 2022-02-18 18:09:19 -0500, Tom Lane wrote:
>> This one was discussed on the buildfarm-owners list last month.
>> There's no 32-bit python3 on that box, and apparently no plans
>> to install one --- it sounded like the box is due for retirement
>> anyway.

> Any chance that was a private response? I just looked in the buildfarm-members
> list (I assume you meant that?), and didn't see anything:
> https://www.postgresql.org/message-id/flat/3162195.1642093011%40sss.pgh.pa.us

Oh ... hm, yeah, looking at my mail logs, that came in as private mail and
there's no corresponding buildfarm-members traffic.  I did not keep the
message unfortunately, but the owner indicated that he wasn't planning to
bother updating python.  Which is fine, but maybe we should press him to
remove --with-python instead of disabling the box altogether --- we don't
have a lot of Solaris clones in the farm.

regards, tom lane




Re: shared_preload_libraries = 'pg_stat_statements' failing with installcheck (compute_query_id = auto)

2022-02-18 Thread Michael Paquier
On Fri, Feb 18, 2022 at 05:38:56PM +0800, Julien Rouhaud wrote:
> On Fri, Feb 18, 2022 at 05:22:36PM +0900, Michael Paquier wrote:
>> So, I have been looking at this problem, and I don't see a problem in
>> doing something like the attached, where we add a "regress" mode to
>> compute_query_id that is a synonym of "auto". Or, in short, we have
>> the default of letting a module decide if a query ID can be computed
>> or not, with the exception that we hide its result in EXPLAIN outputs.
>>
>> Julien, what do you think?
> 
> I don't see any problem with that patch.

Thanks for looking at it.  An advantage of this version is that it is
backpatchable as the option enum won't break any ABI compatibility, so
that's a win-win.
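For reference, a standalone sketch of the idea (the names mirror the
queryjumble.h/guc.c conventions but the code is illustrative, not the
attached patch):

#include <stdbool.h>
#include <stdio.h>

typedef enum
{
	COMPUTE_QUERY_ID_OFF,
	COMPUTE_QUERY_ID_AUTO,
	COMPUTE_QUERY_ID_ON,
	COMPUTE_QUERY_ID_REGRESS	/* behaves like AUTO, but hidden in EXPLAIN */
} ComputeQueryIdType;

/* EXPLAIN would gate printing the query identifier roughly like this */
static bool
show_query_id(ComputeQueryIdType setting, unsigned long long queryId)
{
	return queryId != 0 && setting != COMPUTE_QUERY_ID_REGRESS;
}

int
main(void)
{
	printf("%d\n", show_query_id(COMPUTE_QUERY_ID_REGRESS, 12345));	/* 0 */
	printf("%d\n", show_query_id(COMPUTE_QUERY_ID_AUTO, 12345));	/* 1 */
	return 0;
}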

>> FWIW, about your question upthread, I have noticed the behavior
>> while testing, but I know of some internal customers that enable
>> pg_stat_statements and like doing tests on the PostgreSQL instance
>> deployed this way, so that would break.  They are not on 14 yet as far
>> as I know.
> 
> Are they really testing EXPLAIN output, or just doing application-level tests?

With the various internal customers, both.  And installcheck implies
EXPLAIN outputs.

First, let's wait a couple of days and see if anyone has an extra
opinion to offer on this topic.
--
Michael




Re: Time to drop plpython2?

2022-02-18 Thread Andres Freund
Hi,

On 2022-02-18 18:09:19 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > There's one further failure, but the symptoms are quite different. I've also
> > pinged its owner. I think it's a problem on the system, rather than our
> > side, but less certain than with the other cases:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=haddock&dt=2022-02-17%2023%3A36%3A09
> 
> This one was discussed on the buildfarm-owners list last month.
> There's no 32-bit python3 on that box, and apparently no plans
> to install one --- it sounded like the box is due for retirement
> anyway.

Any chance that was a private response? I just looked in the buildfarm-members
list (I assume you meant that?), and didn't see anything:

https://www.postgresql.org/message-id/flat/3162195.1642093011%40sss.pgh.pa.us

Greetings,

Andres Freund




Re: Race conditions in 019_replslot_limit.pl

2022-02-18 Thread Tom Lane
Andres Freund  writes:
> On 2022-02-17 21:55:21 -0800, Andres Freund wrote:
>> Isn't it pretty bonkers that we allow error processing to get stuck behind
>> network traffic, *before* we have released resources (locks etc)?

It's more or less intentional, per elog.c:

/*
 * This flush is normally not necessary, since postgres.c will flush out
 * waiting data when control returns to the main loop. But it seems best
 * to leave it here, so that the client has some clue what happened if the
 * backend dies before getting back to the main loop ... error/notice
 * messages should not be a performance-critical path anyway, so an extra
 * flush won't hurt much ...
 */
pq_flush();

Perhaps it'd be sensible to do this only in debugging (ie Assert)
builds?

regards, tom lane




Re: Race conditions in 019_replslot_limit.pl

2022-02-18 Thread Andres Freund
Hi,

On 2022-02-18 14:42:48 -0800, Andres Freund wrote:
> On 2022-02-17 21:55:21 -0800, Andres Freund wrote:
> > Isn't it pretty bonkers that we allow error processing to get stuck behind
> > network traffic, *before* we have released resources (locks etc)?
> 
> This is particularly likely to be a problem for walsenders, because they often
> have a large output buffer filled, because walsender uses
> pq_putmessage_noblock() to send WAL data. Which obviously can be large.
> 
> In the stacktrace upthread you can see:
> #3  0x7faf4b70f48b in secure_write (port=0x7faf4c22da50, 
> ptr=0x7faf4c2f1210, len=21470) at 
> /home/andres/src/postgresql/src/backend/libpq/be-secure.c:29
> 
> which certainly is more than in most other cases of error messages being
> sent. And it obviously might not be the first to have gone out.
> 
> 
> > I wonder if we should try to send, but do it in a nonblocking way.
> 
> I think we should probably do so at least during FATAL error processing. But
> also consider doing so for ERROR, because not releasing resources after
> getting cancelled / terminated is pretty nasty imo.

Is it possible that what we're seeing is a deadlock, with both walsender and
the pg_basebackup child trying to send data, but neither receiving?

But that'd require that somehow the basebackup child process didn't exit with
its parent. And I don't really see how that'd happen.


I'm running out of ideas for how to try to reproduce this. I think we might
need some additional debugging information to get more information from the
buildfarm.

I'm thinking of adding log_min_messages=DEBUG2 to primary3, passing --verbose
to pg_basebackup in $node_primary3->backup(...).

It might also be worth adding DEBUG2 messages to ReplicationSlotShmemExit(),
ReplicationSlotCleanup(), InvalidateObsoleteReplicationSlots().

Greetings,

Andres Freund




Re: Time to drop plpython2?

2022-02-18 Thread Tom Lane
Andres Freund  writes:
> There's one further failure, but the symptoms are quite different. I've also
> pinged its owner. I think it's a problem on the system, rather than our side,
> but less certain than with the other cases:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=haddock&dt=2022-02-17%2023%3A36%3A09

This one was discussed on the buildfarm-owners list last month.
There's no 32-bit python3 on that box, and apparently no plans
to install one --- it sounded like the box is due for retirement
anyway.

regards, tom lane




Re: Race conditions in 019_replslot_limit.pl

2022-02-18 Thread Andres Freund
Hi,

On 2022-02-17 21:55:21 -0800, Andres Freund wrote:
> Isn't it pretty bonkers that we allow error processing to get stuck behind
> network traffic, *before* we have released resources (locks etc)?

This is particularly likely to be a problem for walsenders, because they often
have a large output buffer filled, because walsender uses
pq_putmessage_noblock() to send WAL data. Which obviously can be large.

In the stacktrace upthread you can see:
#3  0x7faf4b70f48b in secure_write (port=0x7faf4c22da50, 
ptr=0x7faf4c2f1210, len=21470) at 
/home/andres/src/postgresql/src/backend/libpq/be-secure.c:29

which certainly is more than in most other cases of error messages being
sent. And it obviously might not be the first to have gone out.


> I wonder if we should try to send, but do it in a nonblocking way.

I think we should probably do so at least during FATAL error processing. But
also consider doing so for ERROR, because not releasing resources after
getting cancelled / terminated is pretty nasty imo.

Greetings,

Andres Freund




Re: Time to drop plpython2?

2022-02-18 Thread Andres Freund
Hi,

Thanks to some more buildfarm animal updates, things are looking better. I
think there are now only three owners that haven't updated their animals
successfully, one of which I hadn't yet pinged (chipmunk / Heikki); done now.

There's snapper ("pgbf [ a t ] twiska.com"), and there's Mark Wong's large
menagerie. Mark said yesterday that he's working on updating.


There's one further failure, but the symptoms are quite different. I've also
pinged its owner. I think it's a problem on the system, rather than our side,
but less certain than with the other cases:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=haddock&dt=2022-02-17%2023%3A36%3A09

checking Python.h usability... no
checking Python.h presence... yes
configure: WARNING: Python.h: present but cannot be compiled
configure: WARNING: Python.h: check for missing prerequisite headers?
configure: WARNING: Python.h: see the Autoconf documentation
configure: WARNING: Python.h: section "Present But Cannot Be Compiled"
configure: WARNING: Python.h: proceeding with the compiler's result
configure: WARNING: ## -- ##
configure: WARNING: ## Report this to pgsql-b...@lists.postgresql.org ##
configure: WARNING: ## -- ##
checking for Python.h... no
configure: error: header file  is required for Python

configure:19158: ccache gcc -c -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Werror=vla -Wendif-labels 
-Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type 
-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard 
-Wno-format-truncation -Wno-stringop-truncation -g -m32 
-I/usr/include/python3.9 -DWAIT_USE_POLL -D_POSIX_PTHREAD_SEMANTICS 
-I/usr/include/libxml2  conftest.c >&5
In file included from /usr/include/python3.9/Python.h:8,
 from conftest.c:235:
/usr/include/python3.9/pyconfig.h:1443: warning: "SIZEOF_LONG" redefined
 1443 | #define SIZEOF_LONG 8
  |
conftest.c:183: note: this is the location of the previous definition
  183 | #define SIZEOF_LONG 4
  |
In file included from /usr/include/python3.9/Python.h:8,
 from conftest.c:235:
/usr/include/python3.9/pyconfig.h:1467: warning: "SIZEOF_SIZE_T" redefined
 1467 | #define SIZEOF_SIZE_T 8
  |
conftest.c:182: note: this is the location of the previous definition
  182 | #define SIZEOF_SIZE_T 4
  |
In file included from /usr/include/python3.9/Python.h:8,
 from conftest.c:235:
/usr/include/python3.9/pyconfig.h:1476: warning: "SIZEOF_VOID_P" redefined
 1476 | #define SIZEOF_VOID_P 8
  |
conftest.c:181: note: this is the location of the previous definition
  181 | #define SIZEOF_VOID_P 4
  |
In file included from /usr/include/python3.9/Python.h:63,
 from conftest.c:235:
/usr/include/python3.9/pyport.h:736:2: error: #error "LONG_BIT definition 
appears wrong for platform (bad gcc/glibc config?)."
  736 | #error "LONG_BIT definition appears wrong for platform (bad gcc/glibc 
config?)."
  |  ^


This is a 64-bit host, targeting 32-bit ('CFLAGS' => '-m32'). However, it
linked successfully against Python 2.

Greetings,

Andres Freund




Re: killing perl2host

2022-02-18 Thread Andrew Dunstan


On 2/17/22 12:12, Andres Freund wrote:
> Hi,
>
> On 2022-02-17 09:20:56 -0500, Andrew Dunstan wrote:
>> I don't think we have or have ever had a buildfarm animal targeting
>> msys. In general I think of msys as a build environment to create native
>> binaries. But if we want to support targeting msys we should have an
>> animal doing that.
> It's pretty much cygwin. Wouldn't hurt to have a dedicated animal though, I
> agree. We do have a dedicated path for it in configure.ac:
>
> case $host_os in
> ...
>   cygwin*|msys*) template=cygwin ;;



FYI I tested it while in wait mode for something else, and it fell over
at the first hurdle:


running bootstrap script ... 2022-02-18 22:25:45.119 UTC [34860] FATAL:  could not create shared memory segment: Function not implemented
2022-02-18 22:25:45.119 UTC [34860] DETAIL:  Failed system call was
shmget(key=1407374884304065, size=56, 03600).
child process exited with exit code 1


I'm not intending to put any more effort into supporting it.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Synchronizing slots from primary to standby

2022-02-18 Thread Andres Freund
Hi,

On 2022-02-11 15:28:19 +0100, Peter Eisentraut wrote:
> On 05.02.22 20:59, Andres Freund wrote:
> > On 2022-01-03 14:46:52 +0100, Peter Eisentraut wrote:
> > >  From ec00dc6ab8bafefc00e9b1c78ac9348b643b8a87 Mon Sep 17 00:00:00 2001
> > > From: Peter Eisentraut
> > > Date: Mon, 3 Jan 2022 14:43:36 +0100
> > > Subject: [PATCH v3] Synchronize logical replication slots from primary to
> > >   standby
> > I've just skimmed the patch and the related threads. As far as I can tell
> > this cannot be safely used without the conflict handling in [1], is that
> > correct?
> 
> This or similar questions have been asked a few times about this or similar
> patches, but they always come with some doubt.

I'm certain it's a problem - the only reason I couched it that way was that
there could have been something clever in the patch preventing problems that I
missed because I just skimmed it.


> If we think so, it would be
> useful perhaps if we could come up with test cases that would demonstrate
> why that other patch/feature is necessary.  (I'm not questioning it
> personally, I'm just throwing out ideas here.)

The patch as-is just breaks one of the fundamental guarantees necessary for
logical decoding, that no row versions can be removed that are still required
for logical decoding (signalled via catalog_xmin). So there needs to be an
explicit mechanism upholding that guarantee, but there is not right now from
what I can see.

One piece of the referenced patchset is that it adds information about removed
catalog rows to a few WAL records, and then verifies during replay that no
record can be replayed that removes resources that are still needed. If such a
conflict exists it's dealt with as a recovery conflict.

That itself doesn't provide prevention against removal of required rows, but it
provides detection. The prevention against removal can then be done using a
physical replication slot with hot standby feedback or some other mechanism
(e.g. slot syncing mechanism could maintain a "placeholder" slot on the
primary for all sync targets or something like that).
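A standalone sketch of the detection side (all names here are hypothetical
stand-ins, not the referenced patchset's API, and real code must handle xid
wraparound):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

/*
 * Would replaying a record that removed catalog rows up to 'removed_xid'
 * conflict with a slot that still needs everything >= its catalog_xmin?
 */
static bool
conflicts_with_slot(TransactionId removed_xid, TransactionId slot_catalog_xmin)
{
	return removed_xid >= slot_catalog_xmin;
}

int
main(void)
{
	if (conflicts_with_slot(1200, 1000))
		puts("recovery conflict: the synchronized slot must be invalidated");
	return 0;
}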

Even if that infrastructure existed / was merged, the slot sync stuff would
still need some very careful logic to protect against problems due to
concurrent WAL replay and "synchronized slot" creation. But that's doable.

Greetings,

Andres Freund




Re: Emit a warning if the extension's GUC is set incorrectly

2022-02-18 Thread Tom Lane
Florin Irion  writes:
> On Fri, Feb 18, 2022 at 10:58 AM Tom Lane wrote:
>> Here's a delta patch (meant to be applied after reverting cab5b9ab2)
>> that does things like that.  It fixes the parallel-mode problem ...
>> so do we want to tighten things up this much?

> Thank you for taking care of the bug I introduced with 75d22069e,
> I didn't notice this thread until now.

Yeah, this thread kinda disappeared under the rug during the Christmas
holidays, but we need to decide whether we want to push forward or
leave things at the status quo.

> For what it's worth, I agree with throwing an ERROR if the placeholder is
> unrecognized. Initially, I didn't want to restrict the liberty of setting any
> placeholder too much, but mainly to make sure mistakes didn't go unnoticed.

I also think that this is probably a good change to make.

One situation where somebody would be unhappy is if they're using GUC
variables as plain custom variables despite them being within the
namespace of an extension they're using.  But that seems to me to be
(a) far-fetched and (b) mighty dangerous, since some new version of the
extension might suddenly start reacting to those variables in ways you
presumably didn't expect.

Perhaps a more plausible use-case is "I need to work with both versions
X and Y of extension E, and Y has setting e.foo while X doesn't --- but
I can just set e.foo unconditionally since X will ignore it".  With this
patch, that could still work, but you'd have to be certain to apply the
setting before loading E.

I don't really see any other use-cases favoring the current behavior.
On balance, eliminating possible mistakes seems like enough of a
benefit to justify disallowing these cases.
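
A minimal sketch of that load-order use-case, with a hypothetical extension "e"
(all names here are illustrative):

SET e.foo = 'on';   -- placeholder, accepted while "e" is not yet loaded
LOAD 'e';           -- "e" claims the e.* prefix and validates e.foo

-- With the tightening discussed here, the reverse order would fail for a
-- setting the loaded version of "e" does not define:
LOAD 'e';
SET e.bar = 'on';   -- would now be rejected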

regards, tom lane




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-02-18 Thread Andres Freund
Hi,

On 2022-02-18 15:54:19 -0500, Robert Haas wrote:
> > Are there any objections to this plan?
> 
> I really like the idea of reducing the scope of what is being changed
> here, and I agree that eagerly advancing relfrozenxid carries much
> less risk than the other changes.

Sounds good to me too!

Greetings,

Andres Freund




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-02-18 Thread Andres Freund
Hi,

On 2022-02-18 13:09:45 -0800, Peter Geoghegan wrote:
> 0001 is tricky in the sense that there are a lot of fine details, and
> if you get any one of them wrong the result might be a subtle bug. For
> example, the heap_tuple_needs_freeze() code path is only used when we
> cannot get a cleanup lock, which is rare -- and some of the branches
> within the function are relatively rare themselves. The obvious
> concern is: What if some detail of how we track the new relfrozenxid
> value (and new relminmxid value) in this seldom-hit codepath is just
> wrong, in whatever way we didn't think of?

I think it'd be good to add a few isolationtest cases for the
can't-get-cleanup-lock paths. I think it shouldn't be hard using cursors. The
slightly harder part is verifying that VACUUM did something reasonable, but
that still should be doable?
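
A rough SQL outline of such a case, using an illustrative table name:

-- Session 1: hold a pin on the table's first heap page via a cursor.
BEGIN;
DECLARE c CURSOR FOR SELECT * FROM freeze_tab;
FETCH 1 FROM c;

-- Session 2: the pinned page cannot get a cleanup lock, so VACUUM has to
-- take the heap_tuple_needs_freeze() path for that page.
VACUUM FREEZE freeze_tab;

-- Session 1: release the pin.
COMMIT;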

Greetings,

Andres Freund




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-02-18 Thread Robert Haas
On Fri, Feb 18, 2022 at 4:10 PM Peter Geoghegan  wrote:
> It does not change any other behavior. It's totally mechanical.
>
> 0001 is tricky in the sense that there are a lot of fine details, and
> if you get any one of them wrong the result might be a subtle bug. For
> example, the heap_tuple_needs_freeze() code path is only used when we
> cannot get a cleanup lock, which is rare -- and some of the branches
> within the function are relatively rare themselves. The obvious
> concern is: What if some detail of how we track the new relfrozenxid
> value (and new relminmxid value) in this seldom-hit codepath is just
> wrong, in whatever way we didn't think of?

Right. I think we have no choice but to accept such risks if we want
to make any progress here, and every patch carries them to some
degree. I hope that someone else will review this patch in more depth
than I have just now, but what I notice reading through it is that
some of the comments seem pretty opaque. For instance:

+ * Also maintains *NewRelfrozenxid and *NewRelminmxid, which are the current
+ * target relfrozenxid and relminmxid for the relation.  Assumption is that

"maintains" is fuzzy. I think you should be saying something much more
explicit, and the thing you are saying should make it clear that these
arguments are input-output arguments: i.e. the caller must set them
correctly before calling this function, and they will be updated by
the function. I don't think you have to spell all of that out in every
place where this comes up in the patch, but it needs to be clear from
what you do say. For example, I would be happier with a comment that
said something like "Every call to this function will either set
HEAP_XMIN_FROZEN in the xl_heap_freeze_tuple struct passed as an
argument, or else reduce *NewRelfrozenxid to the xmin of the tuple if
it is currently newer than that. Thus, after a series of calls to this
function, *NewRelfrozenxid represents a lower bound on unfrozen xmin
values in the tuples examined. Before calling this function, caller
should initialize *NewRelfrozenxid to ."

+* Changing nothing, so might have to ratchet back NewRelminmxid,
+* NewRelfrozenxid, or both together

This comment I like.

+* New multixact might have remaining XID older than
+* NewRelfrozenxid

This one's good, too.

+ * Also maintains *NewRelfrozenxid and *NewRelminmxid, which are the current
+ * target relfrozenxid and relminmxid for the relation.  Assumption is that
+ * caller will never freeze any of the XIDs from the tuple, even when we say
+ * that they should.  If caller opts to go with our recommendation to freeze,
+ * then it must account for the fact that it shouldn't trust how we've set
+ * NewRelfrozenxid/NewRelminmxid.  (In practice aggressive VACUUMs always take
+ * our recommendation because they must, and non-aggressive VACUUMs always opt
+ * to not freeze, preferring to ratchet back NewRelfrozenxid instead).

I don't understand this one.

+* (Actually, we maintain NewRelminmxid differently here, because we
+* assume that XIDs that should be frozen according to cutoff_xid won't
+* be, whereas heap_prepare_freeze_tuple makes the opposite assumption.)

This one either.

I haven't really grokked exactly what is happening in
heap_tuple_needs_freeze yet, and may not have time to study it further
in the near future. Not saying it's wrong, although improving the
comments above would likely help me out.

> > If there's a way you can make the precise contents of 0002 and 0003
> > more clear, I would like that, too.
>
> The really big one is 0002 -- even 0003 (the FSM PageIsAllVisible()
> thing) wasn't on the table before now. 0002 is the patch that changes
> the basic criteria for freezing, making it block-based rather than
> based on the FreezeLimit cutoff (barring edge cases that are important
> for correctness, but shouldn't noticeably affect freezing overhead).
>
> The single biggest practical improvement from 0002 is that it
> eliminates what I've called the freeze cliff, which is where many old
> tuples (much older than FreezeLimit/vacuum_freeze_min_age) must be
> frozen all at once, in a balloon payment during an eventual aggressive
> VACUUM. Although it's easy to see that that could be useful, it is
> harder to justify (much harder) than anything else. Because we're
> freezing more eagerly overall, we're also bound to do more freezing
> without benefit in certain cases. Although I think that this can be
> justified as the cost of doing business, that's a hard argument to
> make.

You've used the term "freezing cliff" repeatedly in earlier emails,
and this is the first time I've been able to understand what you
meant. I'm glad I do, now.

But can you describe the algorithm that 0002 uses to accomplish this
improvement? Like "if it sees that the page meets criteria X, then it
freezes all tuples on the page, else if it sees that ..."

Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit

2022-02-18 Thread David Zhang

Thanks a lot for updating the patch.

I tried to apply the patches to the master branch; no warnings were found and
the regression tests passed.


Now we have many places (5) calling the same function with a constant
number 3. Is this a good time to consider redefining this number as a
macro somewhere?


Thank you,

On 2022-02-17 8:46 a.m., Fujii Masao wrote:



On 2022/02/11 21:59, Etsuro Fujita wrote:

I tweaked comments/docs a little bit as well.


Thanks for updating the patches!

I reviewed 0001 patch. It looks good to me except the following minor 
things. If these are addressed, I think that the 001 patch can be 
marked as ready for committer.


+ * Also determine to commit (sub)transactions opened on the remote server
+ * in parallel at (sub)transaction end.

Like the comment "Determine whether to keep the connection ...", 
"determine to commit" should be "determine whether to commit"?


"remote server" should be "remote servers"?


+    curlevel = GetCurrentTransactionNestLevel();
+    snprintf(sql, sizeof(sql), "RELEASE SAVEPOINT s%d", curlevel);

Why does pgfdw_finish_pre_subcommit_cleanup() need to call 
GetCurrentTransactionNestLevel() and construct the "RELEASE SAVEPOINT" 
query string again? pgfdw_subxact_callback() already does them and 
probably we can make it pass either of them to 
pgfdw_finish_pre_subcommit_cleanup() as its argument.
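
For reference, the naming scheme behind that query string: postgres_fdw names
remote savepoints after the local subtransaction nest level, roughly like this
(assuming the foreign table is touched at each level):

BEGIN;                -- nest level 1: remote transaction opened
SAVEPOINT a;          -- nest level 2: remote side gets "SAVEPOINT s2"
SAVEPOINT b;          -- nest level 3: remote side gets "SAVEPOINT s3"
RELEASE SAVEPOINT b;  -- remote side gets "RELEASE SAVEPOINT s3"
COMMIT;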



+   This option controls whether postgres_fdw commits
+   remote (sub)transactions opened on a foreign server in a local
+   (sub)transaction in parallel when the local (sub)transaction commits.


"a foreign server" should be "foreign servers"?

"in a local (sub)transaction" part seems not to be necessary.

Regards,
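
As a usage sketch, assuming the option is named parallel_commit as in the
patch series, it would be enabled per foreign server along these lines (server
name illustrative):

ALTER SERVER loopback OPTIONS (ADD parallel_commit 'true');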


--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca




Re: Synchronizing slots from primary to standby

2022-02-18 Thread James Coleman
On Fri, Feb 11, 2022 at 9:26 AM Peter Eisentraut
 wrote:
>
> The way I understand it:
>
> 1. This feature (probably) depends on the "Minimal logical decoding on
> standbys" patch.  The details there aren't totally clear (to me).  That
> patch had some activity lately but I don't see it in a state that it's
> nearing readiness.
>
> 2. I think the way this (my) patch is currently written needs some
> refactoring about how we launch and manage workers.  Right now, it's all
> mangled together with logical replication, since that is a convenient
> way to launch and manage workers, but it really doesn't need to be tied
> to logical replication, since it can also be used for other logical slots.
>
> 3. It's an open question how to configure this.  My patch show a very
> minimal configuration that allows you to keep all logical slots always
> behind one physical slot, which addresses one particular use case.  In
> general, you might have things like, one set of logical slots should
> stay behind one physical slot, another set behind another physical slot,
> another set should not care, etc.  This could turn into something like
> the synchronous replication feature, where it ends up with its own
> configuration language.
>
> Each of these are clearly significant jobs on their own.
>

Thanks for bringing this topic back up again.

I haven't gotten a chance to do any testing on the patch yet, but here
are my initial notes from reviewing it:

First, reusing the logical replication launcher seems a bit gross.
It's obviously a pragmatic choice, but I find it confusing and likely
to become only more so given the fact there's nothing about slot
syncing that's inherently limited to logical slots. Plus the feature
currently is about syncing slots on a physical replica. So I think
that probably should change.

Second, it seems to me that the worker-per-DB architecture means that
this is unworkable on a cluster with a large number of DBs. The
original thread said that was because "logical slots are per database,
walrcv_exec needs db connection, etc". As to the walrcv_exec, we're
(re)connecting to the primary for each synchronization anyway, so that
doesn't seem like a significant reason. I don't understand why logical
slots being per-database means we have to do it this way. Is there
something about the background worker architecture (I'm revealing my
own ignorance here I suppose) that requires this?

Also it seems that we reconnect to the primary every time we want to
synchronize slots. Maybe that's OK, but it struck me as a bit odd, so
I wanted to ask about it.

Third, wait_for_standby_confirmation() needs a function comment.
Andres noted this earlier, but it seems like we're doing quite a bit
of work in this function for each commit. Some of it is obviously
duplicative like the parsing of standby_slot_names. The waiting
introduced also doesn't seem like a good idea. Andres also commented
on that earlier; I'd echo his comments here absent an explanation of
why it's preferable/necessary to do it this way.

> + if (flush_pos >= commit_lsn && wait_slots_remaining > 0)
> + wait_slots_remaining --;

I might be missing something re: project style, but the space before
the "--" looks odd to my eyes.

>* Call either PREPARE (for two-phase transactions) or COMMIT (for
>* regular ones).
>*/
> +
> + wait_for_standby_confirmation(commit_lsn);
> +
>   if (rbtxn_prepared(txn))
>   rb->prepare(rb, txn, commit_lsn);

>  else

It appears the addition of this call splits the comment from the code
it goes with.

> + * Wait for remote slot to pass localy reserved position.

Typo ("localy" -> "locally").

This patch would be a significant improvement for us; I'm hoping we
can see some activity on it. I'm also hoping to try to do some testing
next week and see if I can poke any holes in the functionality (with
the goal of verifying Andres's concerns about the safety without the
minimal logical decoding on a replica patch).

Thanks,
James Coleman




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-02-18 Thread Peter Geoghegan
On Fri, Feb 18, 2022 at 12:54 PM Robert Haas  wrote:
> I'd like to have a clearer idea of exactly what is in each of the
> remaining patches before forming a final opinion.

Great.

> What's tricky about 0001? Does it change any other behavior, either as
> a necessary component of advancing relfrozenxid more eagerly, or
> otherwise?

It does not change any other behavior. It's totally mechanical.

0001 is tricky in the sense that there are a lot of fine details, and
if you get any one of them wrong the result might be a subtle bug. For
example, the heap_tuple_needs_freeze() code path is only used when we
cannot get a cleanup lock, which is rare -- and some of the branches
within the function are relatively rare themselves. The obvious
concern is: What if some detail of how we track the new relfrozenxid
value (and new relminmxid value) in this seldom-hit codepath is just
wrong, in whatever way we didn't think of?

On the other hand, we must already be precise in almost the same way
within heap_tuple_needs_freeze() today -- it's not all that different
(we currently need to avoid leaving any XIDs < FreezeLimit behind,
which isn't made that less complicated by the fact that it's a static
XID cutoff). Plus, we have experience with bugs like this. There was
hardening added to catch stuff like this back in 2017, following the
"freeze the dead" bug.

> If there's a way you can make the precise contents of 0002 and 0003
> more clear, I would like that, too.

The really big one is 0002 -- even 0003 (the FSM PageIsAllVisible()
thing) wasn't on the table before now. 0002 is the patch that changes
the basic criteria for freezing, making it block-based rather than
based on the FreezeLimit cutoff (barring edge cases that are important
for correctness, but shouldn't noticeably affect freezing overhead).

The single biggest practical improvement from 0002 is that it
eliminates what I've called the freeze cliff, which is where many old
tuples (much older than FreezeLimit/vacuum_freeze_min_age) must be
frozen all at once, in a balloon payment during an eventual aggressive
VACUUM. Although it's easy to see that that could be useful, it is
harder to justify (much harder) than anything else. Because we're
freezing more eagerly overall, we're also bound to do more freezing
without benefit in certain cases. Although I think that this can be
justified as the cost of doing business, that's a hard argument to
make.

In short, 0001 is mechanically tricky, but easy to understand at a
high level. Whereas 0002 is mechanically simple, but tricky to
understand at a high level (and therefore far trickier than 0001
overall).

-- 
Peter Geoghegan




Re: Time to drop plpython2?

2022-02-18 Thread Joe Conway

On 2/18/22 15:53, Andres Freund wrote:

the next run succeeded, with 'PYTHON' => 'python3' in build env. But
presumably this just was because you installed the python3-devel package?



Ok, I guess I got confused when it failed due to the missing devel 
package, because I removed the PYTHON => 'python3' from the build env 
and it is still getting successfully past the configure stage. Sorry for 
the noise.


Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




Re: Time to drop plpython2?

2022-02-18 Thread Alvaro Herrera
On 2022-Feb-17, Andres Freund wrote:

> Now also pinged:
> - guaibasaurus

Fixed now (apt install python3-dev), but I had initially added
PYTHON=>python3 to the .conf, unsuccessfully because I failed to install
the dev pkg.  After the first success I removed that line.  It should
still work if we do test python3 first, but if it does fail, then I'll
put that line back.

Thanks

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"The problem with the future is that it keeps turning into the present"
(Hobbes)




Re: Small TAP tests cleanup for Windows and unused modules

2022-02-18 Thread Daniel Gustafsson
> On 16 Feb 2022, at 23:36, Andrew Dunstan  wrote:
> On 2/16/22 17:04, Andrew Dunstan wrote:
>> On 2/16/22 16:36, Daniel Gustafsson wrote:
>>> Seeing msys in TAP tests mentioned in a thread [1] tonight reminded me about
>>> two related (well, one of them) patches I had sitting around, so rather than
>>> forgetting again here are some small cleanups.
>>> 
>>> 0001 attempts to streamline how we detect Windows in the TAP tests (after 
>>> that
>>> there is a single msys check left that I'm not sure about, but [1] seems to
>>> imply it could go); 0002 removes some unused module includes which either 
>>> were
>>> used at some point in the past or likely came from copy/paste.
>>> 
>> 0002 looks OK at first glance.
>> 
>> 0001 is something we should investigate. It's really going in the wrong
>> direction I suspect. We should be looking to narrow the scope of these
>> platform-specific bits of processing, not expand them.
> 
> More specifically, all the tests in question are now passing on jacana
> and fairywren, and their $Config{osname} is NOT 'msys' (it's 'MSWin32').

+1

> So we should simply remove any line that ends "if $Config{osname} eq
> 'msys';" since we don't have any such animals any more.

Since msys is discussed in the other thread let's drop the 0001 from this
thread and just go ahead with 0002.

--
Daniel Gustafsson   https://vmware.com/





Re: Trap errors from streaming child in pg_basebackup to exit early

2022-02-18 Thread Daniel Gustafsson
> On 16 Feb 2022, at 08:27, Michael Paquier  wrote:
> 
> On Wed, Sep 29, 2021 at 01:18:40PM +0200, Daniel Gustafsson wrote:
>> So there is one mention of a background WAL receiver already in there, but 
>> it's
>> pretty inconsistent as to what we call it.  For now I've changed the 
>> messaging
>> in this patch to say "background process", leaving making this all consistent
>> for a follow-up patch.
>> 
>> The attached fixes the above, as well as the typo mentioned off-list and is
>> rebased on top of todays HEAD.
> 
> I have been looking a bit at this patch, and did some tests on Windows
> to find out that this is able to catch the failure of the thread
> streaming the WAL segments in pg_basebackup, avoiding a completion of
> the base backup, while HEAD waits until the backup finishes.  Testing
> this scenario is actually simple by issuing pg_terminate_backend() on
> the WAL sender that streams the WAL with START_REPLICATION, while
> throttling the base backup.

Great, thanks!

> Could you add a test to automate this scenario?  As far as I can see,
> something like the following should be stable even for Windows:
> 1) Run a pg_basebackup in the background with IPC::Run, using
> --max-rate with a minimal value to slow down the base backup, for slow
> machines.  013_crash_restart.pl does that as one example with $killme.
> 2) Find out the WAL sender doing START_REPLICATION in the backend, and
> issue pg_terminate_backend() on it.
> 3) Use a variant of pump_until() on the pg_basebackup process and
> check after one or more failure patterns.  We should refactor this
> part, actually.  If this new test uses the same logic, that would make
> three tests doing that with 022_crash_temp_files.pl and
> 013_crash_restart.pl.  The CI should be fine to provide any feedback
> with the test in place, though I am fine to test things also in my
> box.

This is a good idea; I was going in a different direction earlier with a test, but
this is cleaner (step 2 of the scenario is sketched in SQL below).  The attached
0001 refactors pump_until; 0002 fixes a trivial spelling error found while hacking;
and 0003 is the previous patch complete with a test that passes on Cirrus CI.
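
Roughly what step 2 of the quoted plan boils down to, expressed as SQL to run
while the throttled base backup is in progress (a sketch, not the test itself):

SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
 WHERE backend_type = 'walsender';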

--
Daniel Gustafsson   https://vmware.com/



v5-0001-Add-function-to-pump-IPC-process-until-string-mat.patch
Description: Binary data


v5-0002-Remove-duplicated-word-in-comment.patch
Description: Binary data


v5-0003-Quick-exit-on-log-stream-child-exit-in-pg_basebac.patch
Description: Binary data


Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-02-18 Thread Nathan Bossart
On Fri, Feb 18, 2022 at 10:48:10PM +0530, Ashutosh Sharma wrote:
> Some review comments on the latest version:

Thanks for the feedback!  Before I start spending more time on this one, I
should probably ask if this has any chance of making it into v15.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-02-18 Thread Robert Haas
On Fri, Feb 18, 2022 at 3:41 PM Peter Geoghegan  wrote:
> Concerns about my general approach to this project (and even the
> Postgres 14 VACUUM work) were expressed by Robert and Andres over on
> the "Nonrandom scanned_pages distorts pg_class.reltuples set by
> VACUUM" thread. Some of what was said honestly shocked me. It now
> seems unwise to pursue this project on my original timeline. I even
> thought about shelving it indefinitely (which is still on the table).
>
> I propose the following compromise: the least contentious patch alone
> will be in scope for Postgres 15, while the other patches will not be.
> I'm referring to the first patch from v8, which adds dynamic tracking
> of the oldest extant XID in each heap table, in order to be able to
> use it as our new relfrozenxid. I can't imagine that I'll have
> difficulty convincing Andres of the merits of this idea, for one,
> since it was his idea in the first place. It makes a lot of sense,
> independent of any change to how and when we freeze.
>
> The first patch is tricky, but at least it won't require elaborate
> performance validation. It doesn't change any of the basic performance
> characteristics of VACUUM. It sometimes allows us to advance
> relfrozenxid to a value beyond FreezeLimit (typically only possible in
> an aggressive VACUUM), which is an intrinsic good. If it isn't
> effective then the overhead seems very unlikely to be noticeable. It's
> pretty much a strictly additive improvement.
>
> Are there any objections to this plan?

I really like the idea of reducing the scope of what is being changed
here, and I agree that eagerly advancing relfrozenxid carries much
less risk than the other changes.

I'd like to have a clearer idea of exactly what is in each of the
remaining patches before forming a final opinion.

What's tricky about 0001? Does it change any other behavior, either as
a necessary component of advancing relfrozenxid more eagerly, or
otherwise?

If there's a way you can make the precise contents of 0002 and 0003
more clear, I would like that, too.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Time to drop plpython2?

2022-02-18 Thread Andres Freund
Hi,

On 2022-02-18 15:35:37 -0500, Joe Conway wrote:
> Initially I just installed the python3 RPMs and when I tried running
> manually it was still error'ing on configure due to finding python2.

> Even after adding EXPORT PYTHON=python3 to my ~/.bash_profile I was seeing
> the same.
>
> By adding PYTHON => 'python3' to build-farm.conf I saw that the error
> changed to indicate missing python3-devel package. Once I installed that,
> everything went green.

Hm. It definitely did test python3, earlier today:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2022-02-18%2016%3A52%3A13

checking for python3... no
checking for python... /usr/bin/python


the next run then saw:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2022-02-18%2017%3A50%3A09
checking for PYTHON... python3
configure: using python 3.6.8 (default, Nov 16 2020, 16:55:22)
checking for Python sysconfig module... yes
checking Python configuration directory... 
/usr/lib64/python3.6/config-3.6m-x86_64-linux-gnu
checking Python include directory... -I/usr/include/python3.6m

but then failed because the python headers weren't available:
checking for Python.h... no
configure: error: header file  is required for Python


Note that this did *not* yet use PYTHON => 'python3' in build_env, but has it
in the environment starting the buildfarm client.


the next run succeeded, with 'PYTHON' => 'python3' in build env. But
presumably this just was because you installed the python3-devel package?

Greetings,

Andres Freund




Re: O(n) tasks cause lengthy startups and checkpoints

2022-02-18 Thread Nathan Bossart
> On Thu, Feb 17, 2022 at 03:12:47PM -0800, Andres Freund wrote:
>>> > The improvements around deleting temporary files and serialized snapshots
>>> > afaict don't require a dedicated process - they're only relevant during
>>> > startup. We could use the approach of renaming the directory out of the 
>>> > way as
>>> > done in this patchset but perform the cleanup in the startup process after
>>> > we're up.

BTW I know you don't like the dedicated process approach, but one
improvement to that approach could be to shut down the custodian process
when it has nothing to do.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-02-18 Thread Peter Geoghegan
On Fri, Feb 11, 2022 at 8:30 PM Peter Geoghegan  wrote:
> Attached is v8. No real changes -- just a rebased version.

Concerns about my general approach to this project (and even the
Postgres 14 VACUUM work) were expressed by Robert and Andres over on
the "Nonrandom scanned_pages distorts pg_class.reltuples set by
VACUUM" thread. Some of what was said honestly shocked me. It now
seems unwise to pursue this project on my original timeline. I even
thought about shelving it indefinitely (which is still on the table).

I propose the following compromise: the least contentious patch alone
will be in scope for Postgres 15, while the other patches will not be.
I'm referring to the first patch from v8, which adds dynamic tracking
of the oldest extant XID in each heap table, in order to be able to
use it as our new relfrozenxid. I can't imagine that I'll have
difficulty convincing Andres of the merits of this idea, for one,
since it was his idea in the first place. It makes a lot of sense,
independent of any change to how and when we freeze.

The first patch is tricky, but at least it won't require elaborate
performance validation. It doesn't change any of the basic performance
characteristics of VACUUM. It sometimes allows us to advance
relfrozenxid to a value beyond FreezeLimit (typically only possible in
an aggressive VACUUM), which is an intrinsic good. If it isn't
effective then the overhead seems very unlikely to be noticeable. It's
pretty much a strictly additive improvement.
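
For illustration, the effect is observable from SQL, assuming a table named
mytab (illustrative):

SELECT relfrozenxid, age(relfrozenxid) FROM pg_class WHERE relname = 'mytab';
VACUUM mytab;
SELECT relfrozenxid, age(relfrozenxid) FROM pg_class WHERE relname = 'mytab';
-- With the patch, the new relfrozenxid can land beyond the FreezeLimit cutoff
-- rather than stopping at it.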

Are there any objections to this plan?

--
Peter Geoghegan




Re: Time to drop plpython2?

2022-02-18 Thread Joe Conway

On 2/18/22 15:25, Andres Freund wrote:

On 2022-02-18 14:46:39 -0500, Joe Conway wrote:

$ ll /usr/bin/python
lrwxrwxrwx. 1 root root 7 Mar 13  2021 /usr/bin/python -> python2
8<---


Yea, that all looks fine. What's the problem if you don't specify the
PYTHON=python3? We try python3, python in that order by default, so it should
pick up the same python3 you specify explicitly?


Initially I just installed the python3 RPMs and when I tried running 
manually it was still error'ing on configure due to finding python2.


Even after adding EXPORT PYTHON=python3 to my ~/.bash_profile I was 
seeing the same.


By adding PYTHON => 'python3' to build-farm.conf I saw that the error 
changed to indicate missing python3-devel package. Once I installed 
that, everything went green.


Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-18 Thread Andres Freund
Hi,

On 2022-02-18 17:26:04 +0900, Masahiko Sawada wrote:
> With this change, pg_stat_subscription_workers will be like:
> 
> * subid
> * subname
> * subrelid
> * error_count
> * last_error_timestamp

> This view will be extended by adding transaction statistics proposed
> on another thread[1].

I do not agree with these bits. What's the point of these per-relation stats
at this point?  You're just duplicating the normal relation pg_stats here.

I really think we just should drop pg_stat_subscription_workers. Even if we
don't, we definitely should rename it, because it still isn't meaningfully
about workers.


This stuff is getting painful for me. I'm trying to clean up some stuff for
shared memory stats, and this stuff doesn't fit in with the rest. I'll have to
rework some core stuff in the shared memory stats patch to make it work with
this. Just to then quite possibly deal with reverting that part.


Given the degree we're still designing stuff at this point, I think the
appropriate thing is to revert the patch, and then try from there.

Greetings,

Andres Freund




Re: Time to drop plpython2?

2022-02-18 Thread Andres Freund
Hi,

On 2022-02-18 14:46:39 -0500, Joe Conway wrote:
> On 2/18/22 14:37, Andres Freund wrote:
> > > That seems to have worked.
> > > 
> > > But the question is, is that the correct/recommended method?
> > 
> > If python3 is in PATH, then you shouldn't need that part.
> 
> Not quite -- python3 is definitely in the PATH:
> 
> 8<---
> $ which python3
> /usr/bin/python3
> 8<---
> 
> And I gather that merely installing python3 RPMs on a RHEL-esque 7.X system
> does not replace the python symlink:
> 
> 8<---
> $ yum whatprovides /usr/bin/python
> 
> python-2.7.5-89.el7.x86_64 : An interpreted, interactive, object-oriented
> programming language
> Repo: base
> Matched from:
> Filename: /usr/bin/python
> 
> $ ll /usr/bin/python
> lrwxrwxrwx. 1 root root 7 Mar 13  2021 /usr/bin/python -> python2
> 8<---

Yea, that all looks fine. What's the problem if you don't specify the
PYTHON=python3? We try python3, python in that order by default, so it should
pick up the same python3 you specify explicitly?

Or maybe I'm just misunderstanding what you're asking...

Greetings,

Andres Freund




Re: buildfarm warnings

2022-02-18 Thread Robert Haas
On Thu, Feb 17, 2022 at 3:57 PM Robert Haas  wrote:
> True. That would be easy enough.

I played around with this a bit, and of course it is easy enough to
add --progress with or without --verbose to a few tests, but when I
reverted 62cb7427d1e491faf8612a82c2e3711a8cd65422 it didn't make any
tests fail. So then I tried sticking --progress --verbose into
@pg_basebackup_defs so all the tests would run with it, and that did
make some tests fail, but the same ones fail with and without the
patch. So it doesn't seem like we would have caught this particular
problem via this testing method no matter what we did in detail.

If we just want to splatter a few --progress switches around for the
heck of it, we could do something like the attached. But I don't know
if this is best: it seems highly arguable what to do in detail (and
also not worth arguing about, so if someone else feels motivated to do
something different than this, or the same as this, fine by me).

-- 
Robert Haas
EDB: http://www.enterprisedb.com


splatter-some-progress.patch
Description: Binary data


Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-18 Thread David G. Johnston
On Fri, Feb 18, 2022 at 1:26 AM Masahiko Sawada wrote:

>
> Here is the summary of the discussion, changes, and plan.
>
> 1. Move some error information such as the error message to a new
> system catalog, pg_subscription_error. The pg_subscription_error table
> would have the following columns:
>
> * sesubid : subscription Oid.
> * serelid : relation Oid (NULL for apply worker).
> * seerrlsn : commit-LSN of the error transaction.
> * seerrcmd : command (INSERT, UPDATE, etc.) of the error transaction.
> * seerrmsg : error message
>

Not a fan of the "se" prefix but overall yes. We should include a timestamp.


> The tuple is inserted or updated when an apply worker or a tablesync
> worker raises an error. If the same error occurs in a row, the update
> is skipped.


I disagree with this - I would treat every new instance of an error to be
important and insert on conflict (sesubid, serelid) the new entry, updating
lsn/cmd/msg with the new values.
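
Expressed in SQL for clarity (the real catalog would be maintained from C, and
this assumes a unique key on (sesubid, serelid)), the proposed upsert amounts to:

INSERT INTO pg_subscription_error
       (sesubid, serelid, seerrlsn, seerrcmd, seerrmsg)
VALUES ($1, $2, $3, $4, $5)
ON CONFLICT (sesubid, serelid)
DO UPDATE SET seerrlsn = EXCLUDED.seerrlsn,
              seerrcmd = EXCLUDED.seerrcmd,
              seerrmsg = EXCLUDED.seerrmsg;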

> The tuple is removed in the following cases:
>
> * the subscription is dropped.
> * the table is dropped.
> * the table is removed from the subscription.
> * the worker successfully committed a non-empty transaction.

Correct.  This handles the "end of sync worker" just fine since its final
action should be a successful commit of a non-empty transaction.

>
> With this change, pg_stat_subscription_workers will be like:
>
> * subid
> * subname
> * subrelid
> * error_count
> * last_error_timestamp
>
> This view will be extended by adding transaction statistics proposed
> on another thread[1].
>

I haven't reviewed that thread, but insofar as this one goes I would just
drop this altogether.  The error count, if desired, can be added to
pg_subscription_error, and the timestamp should be added there as noted
above.

If this is useful for the feature on the other thread it can be
reconstituted from there.


> 2. Fix a bug in pg_stat_subscription_workers. As pointed out by
> Andres, there is a bug in pg_stat_subscription_workers; it doesn't
> drop entries for already-dropped tables. We need to fix it.
>

Given the above, this becomes an N/A.


> 3. Show commit-LSN of the error transaction in errcontext. Currently,
> we show XID and commit timestamp in the errcontext. But given that we
> use LSN in pg_subscriptoin_error, it's better to show commit-LSN as
> well (or instead of error-XID).
>

Agreed, I think: what "errcontext" is this referring to?

>
> 5. Record skipped data to the system catalog, say
> pg_subscription_conflict_history so that there is a chance to analyze
> and recover.


> We can discuss the details in a new thread.
Agreed - the "before skipping" consideration seems considerably more
helpful, but it wouldn't need to be persistent; it could just be a view.  A
permanent record would probably best be recorded in the logs - though if we
get the pre-skip functionality the user can view the data directly and save
the results wherever they wish, or we can provide a function to spool the
information to the log.  I don't see persistent in-database storage being
that desirable here.

If we only do something after the transaction has been skipped, it may be
useful to add an option to the skipping command to auto-disable the
subscription after the transaction has been skipped and before any
subsequent transactions are applied.  This will give the user a chance to
process the post-skipped information before restoring sync and having the
system begin changing underneath them again.
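
A sketch of that workflow in SQL; the SKIP syntax is the one proposed in the
related thread, and the auto-disable behavior suggested above is hypothetical,
so it is shown here as a manual step:

ALTER SUBSCRIPTION mysub DISABLE;                 -- stop applying further changes
ALTER SUBSCRIPTION mysub SKIP (lsn = '0/12345');  -- skip the failing transaction
-- inspect the skipped-transaction information here, then resume:
ALTER SUBSCRIPTION mysub ENABLE;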


>
> 4 and 5 might be better introduced together but I think since the user
> is able to check what changes will be skipped on the publisher side we
> can do 5 for v16.


How would one go about doing that (checking what changes will be skipped on
the publisher side)?

David J.


Re: Time to drop plpython2?

2022-02-18 Thread Joe Conway

On 2/18/22 14:37, Andres Freund wrote:

Hi,

On 2022-02-18 14:19:49 -0500, Joe Conway wrote:

On 2/17/22 13:08, Andres Freund wrote:
> On 2022-02-16 23:14:46 -0800, Andres Freund wrote:
> > > Done. Curious how red the BF will turn out to be. Let's hope it's not
> > > too bad.

> > - rhinoceros
> 
> Joe replied that he is afk, looking into it tomorrow.


I installed python3 packages (initially forgetting the devel package --
d'oh!) and changed build-farm.conf thusly:

8<---
***
*** 185,190 
--- 185,193 

build_env => {

+   # specify python 3
+   PYTHON => 'python3',
+
# use a dedicated cache for the build farm. this should give us
# very high hit rates and slightly faster cache searching.
#
8<---

That seems to have worked.

But the question is, is that the correct/recommended method?


If python3 is in PATH, then you shouldn't need that part.


Not quite -- python3 is definitely in the PATH:

8<---
$ which python3
/usr/bin/python3
8<---

And I gather that merely installing python3 RPMs on a RHEL-esque 7.X 
system does not replace the python symlink:


8<---
$ yum whatprovides /usr/bin/python

python-2.7.5-89.el7.x86_64 : An interpreted, interactive, 
object-oriented programming language

Repo: base
Matched from:
Filename: /usr/bin/python

$ ll /usr/bin/python
lrwxrwxrwx. 1 root root 7 Mar 13  2021 /usr/bin/python -> python2
8<---


Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




Re: Time to drop plpython2?

2022-02-18 Thread Andres Freund
Hi,

On 2022-02-18 14:19:49 -0500, Joe Conway wrote:
> On 2/17/22 13:08, Andres Freund wrote:
> > On 2022-02-16 23:14:46 -0800, Andres Freund wrote:
> > > > Done. Curious how red the BF will turn out to be. Let's hope it's not
> > > > too bad.
> 
> > > - rhinoceros
> > 
> > Joe replied that he is afk, looking into it tomorrow.
> 
> I installed python3 packages (initially forgetting the devel package --
> d'oh!) and changed build-farm.conf thusly:
> 
> 8<---
> ***
> *** 185,190 
> --- 185,193 
> 
> build_env => {
> 
> +   # specify python 3
> +   PYTHON => 'python3',
> +
> # use a dedicated cache for the build farm. this should give us
> # very high hit rates and slightly faster cache searching.
> #
> 8<---
> 
> That seems to have worked.
> 
> But the question is, is that the correct/recommended method?

If python3 is in PATH, then you shouldn't need that part.

Greetings,

Andres Freund




Re: Time to drop plpython2?

2022-02-18 Thread Joe Conway

On 2/17/22 13:08, Andres Freund wrote:

On 2022-02-16 23:14:46 -0800, Andres Freund wrote:

> Done. Curious how red the BF will turn out to be. Let's hope it's not
> too bad.



- rhinoceros


Joe replied that he is afk, looking into it tomorrow.


I installed python3 packages (initially forgetting the devel package -- 
d'oh!) and changed build-farm.conf thusly:


8<---
***
*** 185,190 
--- 185,193 

build_env => {

+   # specify python 3
+   PYTHON => 'python3',
+
# use a dedicated cache for the build farm. this should give us
# very high hit rates and slightly faster cache searching.
#
8<---

That seems to have worked.

But the question is, is that the correct/recommended method?

Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




Re: [PATCH] Add support to table_to_xmlschema regex when timestamp has time zone

2022-02-18 Thread Euler Taveira
On Fri, Feb 18, 2022, at 2:47 PM, Renan Soares Lopes wrote:
> Hello,
> 
> I added a patch to fix table_to_xmlschema; could you point me to how to add
> a unit test for that?
You should edit src/test/regress/expected/xmlmap.out. In this case, you should
also modify src/test/regress/expected/xmlmap_1.out, which is the output from this
test when you build without libxml support. Run 'make check' to test your fix
after building with/without libxml support.

Regarding this fix, it looks good to me. FWIW, character class escape is
defined here [1].

[1] https://www.w3.org/TR/xmlschema11-2/#cces
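
A minimal way to exercise the fix from psql, with an illustrative table name:

CREATE TABLE t (a timestamp with time zone);
SELECT table_to_xmlschema('t', true, true, '');
-- the generated pattern for column "a" should now escape the plus sign:
-- (\+|-)\p{Nd}{2}:\p{Nd}{2}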


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: adding 'zstd' as a compression algorithm

2022-02-18 Thread Robert Haas
On Fri, Feb 18, 2022 at 9:52 AM Robert Haas  wrote:
> So here's a revised patch for zstd support that uses
> AC_CHECK_HEADER(), plus a second patch to change the occurrences above
> to use AC_CHECK_HEADER() and remove all traces of the corresponding
> preprocessor symbol.

And I've now committed the first of these.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?

2022-02-18 Thread Robert Haas
On Fri, Feb 18, 2022 at 12:56 AM Andy Fan  wrote:
> What do you think about moving on this feature?  The items known by me
> are: a).  Make sure the estimation error can be fixed or discuss if my current
> solution is workable.  b).  Just distribute some selectivity restrictinfo to
> RelOptInfo.  c).  See how hard it is to treat the original / derived qual 
> equally.
> d).  Reduce the extra planner cost at much as possible.  Any other important
> items I missed?

I think it's not realistic to do anything here for PostgreSQL 15.
Considering that it's almost the end of February and feature freeze
will probably be in perhaps 5-6 weeks, in order to get something
committed at this point, you would need to have (1) sufficient
consensus on the design, (2) a set of reasonably complete patches
implementing that design at an acceptable level of quality, and (3) a
committer interested in putting in the necessary time to help you get
this over the finish line. As far as I can see, you have none of those
things.  Tom doesn't think we need this at all, and you and I and
Tomas all have somewhat different ideas on what approach we ought to
be taking, and the patches appear to be at a POC level at this point
rather than something that's close to being ready to ship, and no
committer has expressed interest in trying to get them into this
release.

It seems to me that the thing to do here is see if you can build
consensus on an approach. Just saying that we ought to think the
patches you've already got are good enough is not going to get you
anywhere. I do understand that the political element of this problem
is frustrating to you, as it is to many people. But consider the
alternative: suppose the way things worked around here is that any
committer could commit anything they liked without needing the
approval of any other committer, or even over their objections. Well,
it would be chaos. People would be constantly reverting or rewriting
things that other people had done, and everybody would probably be
pissed off at each other all the time, and the quality would go down
the tubes and nobody would use PostgreSQL any more. I'm not saying the
current system is perfect, not at all. It's frustrating as all get out
at times. But the reason it's frustrating is because the PostgreSQL
community is a community of human beings, and there's nothing more
frustrating in the world than the stuff other human beings do.

However, it's equally true that we get further working together than
we would individually. I think Tom is wrong about the merits of doing
something in this area, but I also think he's incredibly smart and
thoughtful and one of the best technologists I've ever met, and
probably just one of the absolute best technologists on Planet Earth.
And I also have to consider, and this is really important, the
possibility that Tom is right about this issue and I am wrong. So far
Tom hasn't replied to what I wrote, but I hope he does. Maybe he'll
admit that I have some valid points. Maybe he'll tell me why he thinks
I'm wrong. Maybe I'll learn about some problem that I haven't
considered from his response, and maybe that will lead to a refinement
of the idea that will make it better. I don't know, but it's certainly
happened in plenty of other cases. And that's how PostgreSQL gets to
be this pretty amazing database that it is. So, yeah, building
consensus is frustrating and it takes a long time and sometimes it
feels like other people are obstructing you needlessly and sometimes
that's probably true. But there's not a realistic alternative. Nobody
here is smart enough to create software that is as good as what all of
us create together.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: USE_BARRIER_SMGRRELEASE on Linux?

2022-02-18 Thread Nathan Bossart
On Wed, Feb 16, 2022 at 01:00:53PM -0800, Nathan Bossart wrote:
> On Wed, Feb 16, 2022 at 11:27:31AM -0800, Andres Freund wrote:
>> That doesn't strike me as great architecturally. E.g. in theory the same
>> problem could exist in single user mode. I think it doesn't today, because
>> RegisterSyncRequest() will effectively "absorb" it immediately, but that kind
>> of feels like an implementation detail?
> 
> Yeah, maybe that is a reason to add an absorb somewhere within
> CreateCheckPoint() instead, like v1 [0] does.  Then the extra absorb would
> be sufficient for single-user mode if the requests were not absorbed
> immediately.

Here is a new patch.  This is essentially the same as v1, but I've spruced
up the comments and the commit message.
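
For context, the command sequence behind the race looks like this
(timing-dependent, so an illustration rather than a reliable reproducer; names
and paths are made up):

CREATE TABLESPACE mytblspc LOCATION '/tmp/mytblspc';
CREATE TABLE t (a int) TABLESPACE mytblspc;
ALTER TABLE t SET TABLESPACE pg_default;  -- leaves files for the next checkpoint
DROP TABLESPACE mytblspc;                 -- can fail with "tablespace ... is not empty"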

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From adf7f7c9717897cc72a3c76d90ad0db082a7a432 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 15 Feb 2022 22:53:04 -0800
Subject: [PATCH v3 1/1] Fix race condition between DROP TABLESPACE and
 checkpointing.

Commands like ALTER TABLE SET TABLESPACE may leave files for the next
checkpoint to clean up.  If such files are not removed by the time DROP
TABLESPACE is called, we request a checkpoint so that they are deleted.
However, there is presently a window before checkpoint start where new unlink
requests won't be scheduled until the following checkpoint.  This means that
the checkpoint forced by DROP TABLESPACE might not remove the files we expect
it to remove, and the following ERROR will be emitted:

	ERROR:  tablespace "mytblspc" is not empty

To fix, add a call to AbsorbSyncRequests() just before advancing the unlink
cycle counter.  This ensures that any unlink requests forwarded prior to
checkpoint start (i.e., when ckpt_started is incremented) will be processed by
the current checkpoint.  Since AbsorbSyncRequests() performs memory
allocations, it cannot be called within a critical section, so we also need to
move SyncPreCheckpoint() to before CreateCheckPoint()'s critical section.

This is an old bug, so back-patch to all supported versions.

Author: Nathan Bossart
Reported-by: Nathan Bossart
Diagnosed-by: Thomas Munro, Nathan Bossart
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/20220215235845.GA2665318%40nathanxps13
---
 src/backend/access/transam/xlog.c | 15 ---
 src/backend/storage/sync/sync.c   | 14 +-
 2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ce78ac413e..174aab5ae4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6294,6 +6294,14 @@ CreateCheckPoint(int flags)
 	MemSet(&CheckpointStats, 0, sizeof(CheckpointStats));
 	CheckpointStats.ckpt_start_t = GetCurrentTimestamp();
 
+	/*
+	 * Let smgr prepare for checkpoint; this has to happen outside the critical
+	 * section and before we determine the REDO pointer.  Note that smgr must
+	 * not do anything that'd have to be undone if we decide no checkpoint is
+	 * needed.
+	 */
+	SyncPreCheckpoint();
+
 	/*
 	 * Use a critical section to force system panic if we have trouble.
 	 */
@@ -6307,13 +6315,6 @@ CreateCheckPoint(int flags)
 		LWLockRelease(ControlFileLock);
 	}
 
-	/*
-	 * Let smgr prepare for checkpoint; this has to happen before we determine
-	 * the REDO pointer.  Note that smgr must not do anything that'd have to
-	 * be undone if we decide no checkpoint is needed.
-	 */
-	SyncPreCheckpoint();
-
 	/* Begin filling in the checkpoint WAL record */
 	MemSet(&checkPoint, 0, sizeof(checkPoint));
 	checkPoint.time = (pg_time_t) time(NULL);
diff --git a/src/backend/storage/sync/sync.c b/src/backend/storage/sync/sync.c
index e161d57761..f16b25faa1 100644
--- a/src/backend/storage/sync/sync.c
+++ b/src/backend/storage/sync/sync.c
@@ -173,7 +173,9 @@ InitSync(void)
  * counter is incremented here.
  *
  * This must be called *before* the checkpoint REDO point is determined.
- * That ensures that we won't delete files too soon.
+ * That ensures that we won't delete files too soon.  Since this calls
+ * AbsorbSyncRequests(), which performs memory allocations, it cannot be
+ * called within a critical section.
  *
  * Note that we can't do anything here that depends on the assumption
  * that the checkpoint will be completed.
@@ -181,6 +183,16 @@ InitSync(void)
 void
 SyncPreCheckpoint(void)
 {
+	/*
+	 * Operations such as DROP TABLESPACE assume that the next checkpoint will
+	 * process all recently forwarded unlink requests, but if they aren't
+	 * absorbed prior to advancing the cycle counter, they won't be processed
+	 * until a future checkpoint.  The following absorb ensures that any unlink
+	 * requests forwarded before the checkpoint began will be processed in the
+	 * current checkpoint.
+	 */
+	AbsorbSyncRequests();
+
 	/*
 	 * Any unlink requests arriving after this point will be assigned the next
 * cycle counter, and won't be unlinked until next checkpoint.

[PATCH] Add support to table_to_xmlschema regex when timestamp has time zone

2022-02-18 Thread Renan Soares Lopes
Hello,

I added a patch to fix table_to_xmlschema; could you point me to how to add a unit test for that?

From 24768b689638fc35edcdb378735ae2505315a151 Mon Sep 17 00:00:00 2001
From: Renan Lopes 
Date: Fri, 18 Feb 2022 14:36:30 -0300
Subject: [PATCH] fix: table_to_xmlschema regex for timestamp with time zone

---
 src/backend/utils/adt/xml.c | 4 ++--
 src/test/regress/sql/xmlmap.sql | 7 +++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 51773db..801ad8f 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3659,7 +3659,7 @@ map_sql_type_to_xmlschema_type(Oid typeoid, int typmod)
 			case TIMEOID:
 			case TIMETZOID:
 {
-	const char *tz = (typeoid == TIMETZOID ? "(+|-)\\p{Nd}{2}:\\p{Nd}{2}" : "");
+	const char *tz = (typeoid == TIMETZOID ? "(\\+|-)\\p{Nd}{2}:\\p{Nd}{2}" : "");
 
 	if (typmod == -1)
 		appendStringInfo(&result,
@@ -3682,7 +3682,7 @@ map_sql_type_to_xmlschema_type(Oid typeoid, int typmod)
 			case TIMESTAMPOID:
 			case TIMESTAMPTZOID:
 {
-	const char *tz = (typeoid == TIMESTAMPTZOID ? "(+|-)\\p{Nd}{2}:\\p{Nd}{2}" : "");
+	const char *tz = (typeoid == TIMESTAMPTZOID ? "(\\+|-)\\p{Nd}{2}:\\p{Nd}{2}" : "");
 
 	if (typmod == -1)
 		appendStringInfo(&result,
diff --git a/src/test/regress/sql/xmlmap.sql b/src/test/regress/sql/xmlmap.sql
index fde1b9e..13407ee 100644
--- a/src/test/regress/sql/xmlmap.sql
+++ b/src/test/regress/sql/xmlmap.sql
@@ -55,3 +55,10 @@ CREATE TABLE testxmlschema.test3
 
 SELECT xmlforest(c1, c2, c3, c4) FROM testxmlschema.test3;
 SELECT table_to_xml('testxmlschema.test3', true, true, '');
+
+
+-- test datetime with time zone
+
+CREATE TABLE testxmlschema.test4 (a timestamp with time zone default current_timestamp);
+SELECT table_to_xmlschema('testxmlschema.test4', true, true, '');
+
-- 
2.35.1



Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-02-18 Thread Ashutosh Sharma
Some review comments on the latest version:

+* runningBackups is a counter indicating the number of backups currently in
+* progress. forcePageWrites is set to true when either of these is
+* non-zero. lastBackupStart is the latest checkpoint redo location used as
+* a starting point for an online backup.
 */
-   ExclusiveBackupState exclusiveBackupState;
-   int nonExclusiveBackups;

What do you mean by "either of these is non-zero"? Earlier we used
to set forcePageWrites in case of both exclusive and non-exclusive
backups, but we have just one type of backup now.

==

-* OK to update backup counters, forcePageWrites and session-level lock.
+* OK to update backup counters and forcePageWrites.
 *

We still update the status of session-level lock so I don't think we
should update the above comment. See below code:

if (XLogCtl->Insert.runningBackups == 0)
{
XLogCtl->Insert.forcePageWrites = false;
}

/*
 * Clean up session-level lock.
 *
 * You might think that WALInsertLockRelease() can be called before
 * cleaning up session-level lock because session-level lock doesn't need
 * to be protected with WAL insertion lock. But since
 * CHECK_FOR_INTERRUPTS() can occur in it, session-level lock must be
 * cleaned up before it.
 */
sessionBackupState = SESSION_BACKUP_NONE;

WALInsertLockRelease();

==

@@ -8993,18 +8686,16 @@ do_pg_abort_backup(int code, Datum arg)
boolemit_warning = DatumGetBool(arg);

/*
-* Quick exit if session is not keeping around a non-exclusive backup
-* already started.
+* Quick exit if session does not have a running backup.
 */
-   if (sessionBackupState != SESSION_BACKUP_NON_EXCLUSIVE)
+   if (sessionBackupState != SESSION_BACKUP_RUNNING)
return;

WALInsertLockAcquireExclusive();
-   Assert(XLogCtl->Insert.nonExclusiveBackups > 0);
-   XLogCtl->Insert.nonExclusiveBackups--;
+   Assert(XLogCtl->Insert.runningBackups > 0);
+   XLogCtl->Insert.runningBackups--;

-   if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&
-   XLogCtl->Insert.nonExclusiveBackups == 0)
+   if (XLogCtl->Insert.runningBackups == 0)
{
XLogCtl->Insert.forcePageWrites = false;
}

I think we have a lot of common code in do_pg_abort_backup() and
pg_do_stop_backup(). So why not have a common function that can be
called from both of these functions?

==

+# Now delete the bogus backup_label file since it will interfere with startup
+unlink("$pgdata/backup_label")
+  or BAIL_OUT("unable to unlink $pgdata/backup_label");
+

Why do we need this additional change? Earlier this was not required.

--
With Regards,
Ashutosh Sharma.

On Thu, Feb 17, 2022 at 6:41 AM Nathan Bossart  wrote:
>
> Here is a rebased patch.
>
> --
> Nathan Bossart
> Amazon Web Services: https://aws.amazon.com




Re: Per-table storage parameters for TableAM/IndexAM extensions

2022-02-18 Thread Simon Riggs
On Thu, 17 Feb 2022 at 17:55, Sadhuprasad Patro  wrote:
>
> > On Sat, Feb 12, 2022 at 2:35 AM Robert Haas  wrote:
> >>
> >>
> >> Imagine that I am using the "foo" tableam with "compression=lots" and
> >> I want to switch to the "bar" AM which does not support that option.
> >> If I remove the "compression=lots" option using a separate command,
> >> the "foo" table AM may rewrite my whole table and decompress
> >> everything. Then when I convert to the "bar" AM it's going to have to
> >> be rewritten again. That's painful. I clearly need some way to switch
> >> AMs without having to rewrite the table twice.
> >
> Agreed. Better to avoid multiple rewrites here. Thank you for figuring out 
> this.
>
>
> > You'd need to be able to do multiple things with one command e.g.
>
> > ALTER TABLE mytab SET ACCESS METHOD baz RESET compression, SET
> > preferred_fruit = 'banana';
>
> +1
> Silently dropping some options is not right and it may confuse users
> too. So I would like to go
> for the command you have suggested, where the user should be able to
> SET & RESET multiple
> options in a single command for an object.

I prefer ADD/DROP to SET/RESET. The latter doesn't convey the meaning
accurately to me.

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: O(n) tasks cause lengthy startups and checkpoints

2022-02-18 Thread Nathan Bossart
On Thu, Feb 17, 2022 at 03:12:47PM -0800, Andres Freund wrote:
>> > The improvements around deleting temporary files and serialized snapshots
>> > afaict don't require a dedicated process - they're only relevant during
>> > startup. We could use the approach of renaming the directory out of the 
>> > way as
>> > done in this patchset but perform the cleanup in the startup process after
>> > we're up.
>> 
>> Perhaps this is a good place to start.  As I mentioned above, IME the
>> temporary file cleanup is the most common problem, so I think even getting
>> that one fixed would be a huge improvement.
> 
> Cool.

Hm.  How should this work for standbys?  I can think of the following
options:
1. Do temporary file cleanup in the postmaster (as it does today).
2. Pause after allowing connections to clean up temporary files.
3. Do small amounts of temporary file cleanup whenever there is an
   opportunity during recovery.
4. Wait until recovery completes before cleaning up temporary files.

I'm not too thrilled about any of these options.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-02-18 Thread Robert Haas
On Thu, Feb 17, 2022 at 6:39 PM Andres Freund  wrote:
> > The only other kind of hazard I can think of is: could it be unsafe to
> > try to interpret the contents of a database to which no one else is
> > connected at present due to any of the issues you mention? But what's
> > the hazard exactly?
>
> I don't quite know. But I don't think it's impossible to run into trouble in
> this area. E.g. xid horizons are computed in a database specific way. If the
> routine reading pg_class did hot pruning, you could end up removing data
> that's actually needed for a logical slot in the other database because the
> backend local horizon state was computed for the "local" database?

Yeah, but it doesn't -- and shouldn't. There's no HeapScanDesc here,
so we can't accidentally wander into heap_page_prune_opt(). We should
document the things we're thinking about here in the comments to
prevent future mistakes, but I think for the moment we are OK.

> Could there be problems because other backends wouldn't see the backend
> accessing the other database as being connected to that database
> (PGPROC->databaseId)?

I think that if there's any hazard here, it must be related to
snapshots, which brings us to your next point:

> Or what if somebody optimized snapshots to disregard readonly transactions in
> other databases?

So there are two related questions here. One is whether the snapshot
that we're using to access the template database can be unsafe, and
the other is whether the read-only access that we're performing inside
the template database could mess up somebody else's snapshot. Let's
deal with the second point first: nobody else knows that we're reading
from the template database, and nobody else is reading from the
template database except possibly for someone who is doing exactly
what we're doing. Therefore, I think this hazard can be ruled out.

On the first point, a key point in my opinion is that there can be no
in-flight transactions in the template database, because nobody is
connected to it, and prepared transactions in a template database are
verboten. It therefore can't matter if we include too few XIDs in our
snapshot, or if our xmin is too new. The reverse case can matter,
though: if the xmin of our snapshot were too old, or if we had extra
XIDs in our snapshot, then we might think that a transaction is still
in progress when it isn't. Therefore, I think the patch is wrong to
use GetActiveSnapshot() and must use GetLatestSnapshot() *after* it's
finished making sure that nobody is using the template database. I
don't think there's a hazard beyond that, though. Let's consider the
two ways in which things could go wrong:

1. Extra XIDs in the snapshot. Any current or future optimization of
snapshots would presumably be trying to make them smaller by removing
XIDs from the snapshot, not making them bigger by adding XIDs to the
snapshot. I guess in theory you can imagine an optimization that tests
for the presence of XIDs by some method other than scanning through an
array, and which feels free to add XIDs to the snapshot if they "can't
matter," but I think it's up to the author of that hypothetical future
patch to make sure they don't break anything in so doing -- especially
because it's entirely possible for our session to see XIDs used by a
backend in some other database, because they could show up in shared
catalogs. I think that's why, as far as I can tell, we only use the
database ID when setting pruning thresholds, and not for snapshots.

2. xmin of snapshot too new. There are no in-progress transactions in
the template database, so how can this even happen? If we set the xmin
"in the future," then we could get confused about what's visible due
to wraparound, but that seems crazy. I don't see how there can be a
problem here.

> > It can't be a problem if we've failed to process sinval messages for the
> > target database, because we're not using any caches anyway.
>
> Could you end up with an outdated relmap entry? Probably not.

Again, we're not relying on caching -- we read the file.

> > We can't. It can't be unsafe to test visibility of XIDs for that database,
> > because in an alternate universe some backend could have connected to that
> > database and seen the same XIDs.
>
> That's a weak argument, because in that alternative universe a PGPROC entry
> with the PGPROC->databaseId = template_databases_oid would exist.

So what? As I also argue above, I don't think that affects snapshot
generation, and if it did it wouldn't matter anyway, because it could
only remove in-progress transactions from the snapshot, and there
aren't any in the template database anyhow.

Another way of looking at this is: we could just as well use
SnapshotSelf or (if it still existed) SnapshotNow to test visibility.
In a world where there are no transactions in flight, it's the same
thing. In fact, maybe we should do it that way, just to make it
clearer what's happening.

I think these are really good questions you are asking.

Re: Timeout control within tests

2022-02-18 Thread Tom Lane
Noah Misch  writes:
> On Thu, Feb 17, 2022 at 09:48:25PM -0800, Andres Freund wrote:
>> Meson's test runner has the concept of a "timeout multiplier" for ways of
>> running tests. Meson's stuff is about entire tests (i.e. one tap test), so
>> doesn't apply here, but I wonder if we shouldn't do something similar?

> Hmmm.  It is good if the user can express an intent that continues to make
> sense if we change the default timeout.  For the buildfarm use case, a
> multiplier is moderately better on that axis (PG_TEST_TIMEOUT_MULTIPLIER=100
> beats PG_TEST_TIMEOUT_DEFAULT=18000).  For the hacker use case, an absolute
> value is substantially better on that axis (PG_TEST_TIMEOUT_DEFAULT=3 beats
> PG_TEST_TIMEOUT_MULTIPLIER=.01).

FWIW, I'm fairly sure that PGISOLATIONTIMEOUT=300 was selected after
finding that smaller values didn't work reliably in the buildfarm.
Now maybe 741d7f1 fixed that, but I wouldn't count on it.  So while I
approve of the idea to remove PGISOLATIONTIMEOUT in favor of using this
centralized setting, I think that we might need to have a multiplier
there, or else we'll end up with PG_TEST_TIMEOUT_DEFAULT set to 300
across the board.  Perhaps the latter is fine, but a multiplier seems a
bit more flexible.

On the other hand, I also support your point that an absolute setting
is easier to think about / adjust for special uses.  So maybe we should
just KISS and use a single absolute setting until we find a hard reason
why that doesn't work well.

regards, tom lane




Re: Synchronizing slots from primary to standby

2022-02-18 Thread James Coleman
On Fri, Feb 11, 2022 at 9:26 AM Peter Eisentraut
 wrote:
>
> On 10.02.22 22:47, Bruce Momjian wrote:
> > I would love to see this feature in PG 15.  Can someone explain its
> > current status?  Thanks.
>
> The way I understand it:
> ...

Hi Peter,

I'm starting to review this patch, and last time I checked I noticed
it didn't seem to apply cleanly to master anymore. Would you be able
to send a rebased version?

Thanks,
James Coleman




RE: Failed transaction statistics to measure the logical replication progress

2022-02-18 Thread osumi.takami...@fujitsu.com
On Friday, February 18, 2022 3:34 PM Tang, Haiying/唐 海英 
 wrote:
> On Wed, Jan 12, 2022 8:35 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > The attached v21 has a couple of other minor updates like a
> > modification of error message text.
> >
> >
> 
> Thanks for updating the patch. Here are some comments.
Thank you for your review!



> 1) I saw the following description about pg_stat_subscription_workers view in
> existing doc:
> 
>The pg_stat_subscription_workers view will
> contain
>one row per subscription worker on which errors have occurred, ...
> 
> It only says  "which errors have occurred", maybe we should also mention
> transactions here, right?
I wrote about these statistics in the next line, but as you pointed out,
separating the description into two sentences wasn't a good idea.
Fixed.



> 2)
> /* --
> + * pgstat_send_subworker_xact_stats() -
> + *
> + *   Send a subworker's transaction stats to the collector.
> + *   The statistics are cleared upon return.
> 
> Should "The statistics are cleared upon return" changed to "The statistics are
> cleared upon sending"? Because if it doesn't reach PGSTAT_STAT_INTERVAL
> and the transaction stats are not sent, the function will return without 
> clearing
> out statistics.
Now, the purpose of this function has become purely
to send a message, and whenever it's called, the function
clears the saved stats. So I have skipped this comment now.


> 3)
> + Assert(command == LOGICAL_REP_MSG_COMMIT ||
> +command == LOGICAL_REP_MSG_STREAM_COMMIT ||
> +command == LOGICAL_REP_MSG_COMMIT_PREPARED
> ||
> +command ==
> LOGICAL_REP_MSG_ROLLBACK_PREPARED);
> +
> + switch (command)
> + {
> + case LOGICAL_REP_MSG_COMMIT:
> + case LOGICAL_REP_MSG_STREAM_COMMIT:
> + case LOGICAL_REP_MSG_COMMIT_PREPARED:
> + MyLogicalRepWorker->commit_count++;
> + break;
> + case LOGICAL_REP_MSG_ROLLBACK_PREPARED:
> + MyLogicalRepWorker->abort_count++;
> + break;
> + default:
> + ereport(ERROR,
> + errmsg("invalid logical message type
> for transaction statistics of subscription"));
> + break;
> + }
> 
> I'm not sure that do we need the assert, because it will report an error 
> later if
> command is an invalid value.
The Assert has been removed, along with the switch branches.
Since there was advice that we should call this from apply_handle_commit_internal,
and if we don't want to change that function's arguments, all we need to do
from there is pass a boolean value that indicates whether the stats counter is
commit_count or abort_count. Kindly have a look at the updated version.
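For illustration, a minimal sketch of such a helper (the function name is
hypothetical; the actual patch may differ):

/* Hypothetical helper: bump the right counter for this apply worker. */
static void
pgstat_count_subworker_xact(bool is_commit)
{
	if (is_commit)
		MyLogicalRepWorker->commit_count++;
	else
		MyLogicalRepWorker->abort_count++;
}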


> 4) I noticed that the abort_count doesn't include aborted streaming
> transactions.
> Should we take this case into consideration?
Hmm, we can add this to the column if there's no objection.
I'm not sure, though; someone might say those should be separate columns.

The new patch v22 is shared in [2].

[2] - 
https://www.postgresql.org/message-id/TYCPR01MB83737C689F8F310C87C19C1EED379%40TYCPR01MB8373.jpnprd01.prod.outlook.com


Best Regards,
Takamichi Osumi



Re: [PATCH] Add reloption for views to enable RLS

2022-02-18 Thread Laurenz Albe
On Tue, 2022-02-15 at 16:32 +0100, walt...@technowledgy.de wrote:
> > "check_permissions_as_owner" is ok with me, but a bit long.
> 
> check_permissions_as_owner is exactly what happens. The additional "as" 
> shouldn't be a problem in length - but is much better to read. I 
> wouldn't associate that with CHECK OPTION either. +1

Here is a new version, with improved documentation and the option renamed
to "check_permissions_owner".  I just prefer the shorter form.

Yours,
Laurenz Albe
From e31ea3de2838dcfdc8c364fc08e54e5d37f00882 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Fri, 18 Feb 2022 15:53:06 +0100
Subject: [PATCH] Add new boolean reloption "check_permissions_owner" to views

When this reloption is set to "false", all permissions on the underlying
relations will be checked against the invoking user rather than the view
owner.  The latter remains the default behavior.

Author: Christoph Heiss 
Reviewed-By: Laurenz Albe, Wolfgang Walther
Discussion: https://postgr.es/m/b66dd6d6-ad3e-c6f2-8b90-47be773da240%40cybertec.at
---
 doc/src/sgml/ref/alter_view.sgml  | 13 -
 doc/src/sgml/ref/create_view.sgml | 68 ++-
 src/backend/access/common/reloptions.c| 11 
 src/backend/nodes/copyfuncs.c |  1 +
 src/backend/nodes/equalfuncs.c|  1 +
 src/backend/nodes/outfuncs.c  |  1 +
 src/backend/nodes/readfuncs.c |  1 +
 src/backend/optimizer/plan/subselect.c|  1 +
 src/backend/optimizer/prep/prepjointree.c |  1 +
 src/backend/rewrite/rewriteHandler.c  | 19 +--
 src/backend/utils/cache/relcache.c| 63 +++--
 src/include/nodes/parsenodes.h|  1 +
 src/include/utils/rel.h   | 11 
 src/test/regress/expected/create_view.out | 46 ---
 src/test/regress/expected/rowsecurity.out | 65 +-
 src/test/regress/sql/create_view.sql  | 22 +++-
 src/test/regress/sql/rowsecurity.sql  | 44 +++
 17 files changed, 309 insertions(+), 60 deletions(-)

diff --git a/doc/src/sgml/ref/alter_view.sgml b/doc/src/sgml/ref/alter_view.sgml
index 98c312c5bf..0ea764738a 100644
--- a/doc/src/sgml/ref/alter_view.sgml
+++ b/doc/src/sgml/ref/alter_view.sgml
@@ -156,11 +156,22 @@ ALTER VIEW [ IF EXISTS ] name RESET
 
  
   Changes the security-barrier property of the view.  The value must
-  be Boolean value, such as true
+  be a Boolean value, such as true
   or false.
  
 

+   
+check_permissions_owner (boolean)
+
+ 
+  Changes whether permission checks on the underlying relations are
+  performed as the view owner or as the calling user.  Default is
+  true.  The value must be a Boolean value, such as
+  true or false.
+ 
+
+   
   
 

diff --git a/doc/src/sgml/ref/create_view.sgml b/doc/src/sgml/ref/create_view.sgml
index bf03287592..cb253388e7 100644
--- a/doc/src/sgml/ref/create_view.sgml
+++ b/doc/src/sgml/ref/create_view.sgml
@@ -137,8 +137,6 @@ CREATE VIEW [ schema . ] view_namelocal or
   cascaded, and is equivalent to specifying
   WITH [ CASCADED | LOCAL ] CHECK OPTION (see below).
-  This option can be changed on existing views using ALTER VIEW.
  
 

@@ -152,7 +150,22 @@ CREATE VIEW [ schema . ] view_name
 

-  
+
+   
+check_permissions_owner (boolean)
+
+ 
+  Set by default.  If this option is set to true,
+  it will cause all access to underlying tables to be checked as
+  referenced by the view owner, otherwise as the invoking user.
+ 
+
+   
+  
+
+  All of the above options can be changed on existing views using ALTER VIEW.
+ 
 

 
@@ -265,13 +278,35 @@ CREATE VIEW vista AS SELECT text 'Hello World' AS hello;

 

-Access to tables referenced in the view is determined by permissions of
-the view owner.  In some cases, this can be used to provide secure but
-restricted access to the underlying tables.  However, not all views are
-secure against tampering; see  for
-details.  Functions called in the view are treated the same as if they had
-been called directly from the query using the view.  Therefore the user of
+By default, access to relations referenced in the view is determined
+by permissions of the view owner.  This can be used to provide secure
+but restricted access to the underlying tables.  However, not all views
+are secure against tampering; see 
+for details.
+   
+
+   
+Functions called in the view are treated the same as if they had been
+called directly from the query using the view.  Therefore, the user of
 a view must have permissions to call all functions used by the view.
+This also means that functions are executed 

Re: adding 'zstd' as a compression algorithm

2022-02-18 Thread Robert Haas
On Fri, Feb 18, 2022 at 9:24 AM Robert Haas  wrote:
> On Fri, Feb 18, 2022 at 9:02 AM Robert Haas  wrote:
> > Oh wait ... you want it the other way. Yeah, that seems harmless to
> > change. I wonder how many others there are that could be changed
> > similarly...
>
> I went through configure.ac looking for instances of
> AC_CHECK_HEADERS() where the corresponding symbol was not used. I
> found four:
>
> AC_CHECK_HEADERS(lz4.h, [], [AC_MSG_ERROR([lz4.h header file is
> required for LZ4])])
> AC_CHECK_HEADERS(gssapi/gssapi.h, [], [AC_MSG_ERROR([gssapi.h header
> file is required for GSSAPI])])])
> AC_CHECK_HEADERS(ldap.h, [], [AC_MSG_ERROR([header file  is
> required for LDAP])]
> AC_CHECK_HEADER(winldap.h, [], ...stuff...)
>
> I guess we could clean all of those up similarly.

So here's a revised patch for zstd support that uses
AC_CHECK_HEADER(), plus a second patch to change the occurrences above
to use AC_CHECK_HEADER() and remove all traces of the corresponding
preprocessor symbol.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v4-0001-Add-support-for-building-with-ZSTD.patch
Description: Binary data


v4-0002-Demote-AC_CHECK_HEADERS-calls-to-AC_CHECK_HEADER-.patch
Description: Binary data


RE: Failed transaction statistics to measure the logical replication progress

2022-02-18 Thread osumi.takami...@fujitsu.com
On Friday, February 18, 2022 8:11 PM Amit Kapila  
wrote:
> On Fri, Feb 18, 2022 at 2:04 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > On Thursday, February 17, 2022 6:45 PM Amit Kapila
>  wrote:
> > > On Thu, Feb 17, 2022 at 3:13 PM Amit Kapila
> > > 
> > > wrote:
> > > > Can't we use pgstat_report_stat() here? Basically, you can update
> > > > xact completetion counters during apply, and then from
> > > > pgstat_report_stat(), you can invoke a logical replication worker
> > > > stats-related function.
> > > >
> > >
> > > If we can do this then we can save the logic this patch is trying to
> > > introduce for PGSTAT_STAT_INTERVAL.
> > Hi, I've encountered a couple of questions during my modification, following
> your advice.
> >
> > In the pgstat_report_stat, we refer to the return value of
> > GetCurrentTransactionStopTimestamp to compare the time different from
> the last time.
> > (In my previous patch, I used GetCurrentTimestamp)
> >
> > This time is updated in apply_handle_commit_internal's
> CommitTransactionCommand for the apply worker.
> > Then, if I update the subscription worker
> > stats(commit_count/abort_count) immediately after this
> > CommitTransactionCommand and immediately call pgstat_report_stat in the
> apply_handle_commit_internal, the time difference becomes too small (falls
> short of PGSTAT_STAT_INTERVAL).
> > Also, the time of GetCurrentTransactionStopTimestamp is not updated
> > even when I keep calling pgstat_report_stat repeatedly.
> > Then, IIUC the next possible timing that message of commit_count or
> > abort_count is sent to the stats collector would become the time when
> > we execute another transaction by the apply worker and update the time
> > for GetCurrentTransactionStopTimestamp
> > and rerun pgstat_report_stat again.
> >
> 
> I think but same is true in the case of the transaction in the backend where 
> we
> increment commit counter via AtEOXact_PgStat after updating the transaction
> time. After that, we call pgstat_report_stat() at later point. How is this 
> case
> different?
> 
> > So, if we keep GetCurrentTransactionStopTimestamp without change, an
> > update of stats depends on another new subsequent transaction if we
> simply merge those.
> > (this leads to users cannot see the latest stats information ?)
> >
> 
> I think this should be okay as these don't need to be accurate.
> 
> > At least, I got a test failure because of this function for streaming
> > commit case because it uses poll_query_until to wait for stats update.
> >
> 
> I feel it is not a good idea to wait for the accurate update of these 
> counters.
Ah, then I had written tests based on a totally wrong direction and made
noise with them. Sorry for that. I don't see tests for the existing
xact_commit/rollback counts, so I'll follow the same approach.

Attached is a new patch that addresses the three major improvements I've
received as comments so far:
1. Skip incrementing the stats counters for empty transactions on the
   subscriber side (except for commit prepared).
2. Utilize the existing pgstat_report_stat(), instead of introducing similar
   new logic.
3. Remove the incorrect tests.


Best Regards,
Takamichi Osumi



v22-0001-Extend-pg_stat_subscription_workers-to-include-g.patch
Description:  v22-0001-Extend-pg_stat_subscription_workers-to-include-g.patch


Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2022-02-18 Thread Yugo NAGATA
Hello Fabien,

Thank you so much for your review. 

Sorry for the late reply. I had stopped working on it due to other
jobs, but I have come back to it now. I've attached the updated patch. I would
appreciate it if you could review this again.

On Mon, 19 Jul 2021 20:04:23 +0200 (CEST)
Fabien COELHO  wrote:

> # About pgbench error handling v15
> 
> Patches apply cleanly. Compilation, global and local tests ok.
> 
>   - v15.1: refactoring is a definite improvement.
> Good, even if it is not very useful (see below).

Ok, we don't need to save variables in order to implement the
retry feature in pgbench, as you suggested. Well, should we completely
separate these two patches, and should I fix v15.2 to not rely on v15.1?

> While restructuring, maybe predefined variables could be make readonly
> so that a script which would update them would fail, which would be a
> good thing. Maybe this is probably material for an independent patch.

Yes, it should be an independent patch.

>   - v15.2: see detailed comments below
> 
> # Doc
> 
> Doc build is ok.
> 
> ISTM that "number of tries" line would be better placed between the 
> #threads and #transactions lines. What do you think?

Agreed. Fixed.

> Aggregate logging description: "{ failures | ... }" seems misleading 
> because it suggests we have one or the other, whereas it can also be 
> empty. I suggest: "{ | failures | ... }" to show the empty case.

The description is correct because either "failures" or both
"serialization_failures" and "deadlock_failures" should appear in aggregate
logging. If "failures" were printed only when a transaction failed,
each line in the aggregate log could have a different number of columns,
which would make it difficult to parse the results.

> I'd wonder whether the number of tries is set too high, 
> though, ISTM that an application should give up before 100? 

Indeed, max-tries=100 seems too high for a practical system.

Also, I noticed that the sum of the latencies of each command (= 15.839 ms)
is significantly larger than the latency average (= 10.870 ms)
because the "per command" results in the documentation were fixed values.

So, I retook the measurements on my machine for more accurate documentation. I
used max-tries=10.

> Minor editing:
> 
> "there're" -> "there are".
> 
> "the --time" -> "the --time option".

Fixed.

> "The latency for failed transactions and commands is not computed 
> separately." is unclear,
> please use a positive sentence to tell what is true instead of what is not 
> and the reader
> has to guess. Maybe: "The latency figures include failed transactions which 
> have reached
> the maximum number of tries or the transaction latency limit.".

I'm not the original author of this description, but I guess this means "The 
latency is
measured only for successful transactions and commands but not for failed 
transactions
or commands.".

> "The main report contains the number of failed transactions if it is 
> non-zero." ISTM that
> this is a pain for scripts which would like to process these reports data, 
> because the data
> may or may not be there. I'm sure to write such scripts, which explains my 
> concern:-)

I agree with you. I fixed the behavior to always report the number of failed
transactions, regardless of whether it is non-zero.

> "If the total number of retried transactions is non-zero…" should it rather 
> be "not one",
> because zero means unlimited retries?

I guess that this means the actual number of retried transactions, not
max-tries, so "non-zero" was correct. However, for the same reason as
above, I fixed the behavior to always report the retry statistics,
regardless of the actual retry numbers.

> 
> # FEATURES
 
> Copying variables: ISTM that we should not need to save the variables 
> states… no clearing, no copying should be needed. The restarted 
> transaction simply overrides the existing variables which is what the 
> previous version was doing anyway. The scripts should write their own 
> variables before using them, and if they don't then it is the user 
> problem. This is important for performance, because it means that after a 
> client has executed all scripts once the variable array is stable and does 
> not incur significant maintenance costs. The only thing that needs saving 
> for retry is the speudo-random generator state. This suggest simplifying 
> or removing "RetryState".

Yes. Saving the variable states is not necessary because we retry the
whole script. It was necessary in the initial patch because it
planned to retry one transaction included in the script. I removed
RetryState and copyVariables.
 
> # CODE
 
> In readCommandResponse: ISTM that PGRES_NONFATAL_ERROR is not needed and 
> could be dealt with the default case. We are only interested in 
> serialization/deadlocks which are fatal errors?

We need PGRES_NONFATAL_ERROR to save st->estatus. It is used outside
readCommandResponse to determine whether we should abort or not.
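For reference, a sketch of how PGRES_NONFATAL_ERROR can be handled to save
st->estatus (simplified here; the exact names follow the patch):

case PGRES_NONFATAL_ERROR:
case PGRES_FATAL_ERROR:
	/* remember the error status; the caller decides retry vs. abort */
	st->estatus = getSQLErrorStatus(PQresultErrorField(res,
													   PG_DIAG_SQLSTATE));
	goto error;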

> doRetry: for consistency, gi

Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

2022-02-18 Thread Nitin Jadhav
> > Okay. I feel this can be added as additional field but it will not
> > replace backend_pid field as this represents the pid of the backend
> > which triggered the current checkpoint.
>
> I don't think that's true.  Requesting a checkpoint means telling the
> checkpointer that it should wake up and start a checkpoint (or restore point)
> if it's not already doing so, so the pid will always be the checkpointer pid.
> The only exception is a standalone backend, but in that case you won't be able
> to query that view anyway.

Yes. I agree that the checkpoint will always be performed by the
checkpointer process, so the pid in the pg_stat_progress_checkpoint
view will always correspond to the checkpointer pid. Checkpoints
get triggered in many scenarios, though. One of them is the CHECKPOINT
command issued explicitly by a backend. In this scenario I would
like to know the pid of the backend which triggered the checkpoint. Hence I
would like to add a backend_pid field, so the
pg_stat_progress_checkpoint view contains a pid field as well as a
backend_pid field. The backend_pid contains a valid value only while a
CHECKPOINT command explicitly issued by a backend is in progress; otherwise
the value will be 0. We may have to add an additional field to
'CheckpointerShmemStruct' to hold the backend pid. The backend
requesting the checkpoint will store its pid in this structure.
Kindly let me know if you still feel the backend_pid field is not
necessary.
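As a rough sketch of that idea (the field name and exact locking are
hypothetical, just to illustrate):

/* Hypothetical new field in CheckpointerShmemStruct: */
pid_t		backend_pid;	/* backend that requested the checkpoint, or 0 */

/* In RequestCheckpoint(), before signalling the checkpointer: */
SpinLockAcquire(&CheckpointerShmem->ckpt_lck);
CheckpointerShmem->backend_pid = MyProcPid;
SpinLockRelease(&CheckpointerShmem->ckpt_lck);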


> And also while looking at the patch I see there's the same problem that I
> mentioned in the previous thread, which is that the effective flags can be
> updated once the checkpoint started, and as-is the view won't reflect that.  
> It
> also means that you can't simply display one of wal, time or force but a
> possible combination of the flags (including the one not handled in v1).

If I understand the above comment properly, it has 2 points. First is
to display the combination of flags rather than just displaying wal,
time or force - the idea behind this is to just let the user know the
reason for checkpointing. That is, the checkpoint was started because
max_wal_size was reached, checkpoint_timeout expired, or a CHECKPOINT
command was explicitly issued. The other flags like CHECKPOINT_IMMEDIATE,
CHECKPOINT_WAIT or CHECKPOINT_FLUSH_ALL indicate how the checkpoint
has to be performed, hence I have not included those in the view. If
it is really required, I would like to modify the code to include the
other flags and display the combination. The second point is to reflect
the updated flags in the view. AFAIK, there is a possibility that the
flags get updated during the ongoing checkpoint, but the reason for the
checkpoint (wal, time or force) will remain the same for the current
checkpoint. There might be a change in how the checkpoint has to be
performed if the CHECKPOINT_IMMEDIATE flag is set. If we go with
displaying the combination of flags, then we probably have to reflect
this in the view as well.
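A rough sketch of rendering such a combination (the function name is
hypothetical; the flag macros are the existing ones from xlog.h):

/* Hypothetical: render the checkpoint flags as a combination. */
static void
describe_checkpoint_flags(StringInfo buf, int flags)
{
	if (flags & CHECKPOINT_CAUSE_XLOG)
		appendStringInfoString(buf, "wal ");
	if (flags & CHECKPOINT_CAUSE_TIME)
		appendStringInfoString(buf, "time ");
	if (flags & CHECKPOINT_FORCE)
		appendStringInfoString(buf, "force ");
	if (flags & CHECKPOINT_IMMEDIATE)
		appendStringInfoString(buf, "immediate ");
}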

> > Probably a new field named 'processes_waiting' or 'events_waiting' can be
> > added for this purpose.
>
> Maybe num_process_waiting?

I feel 'processes_waiting' aligns better with the naming conventions of
the fields of the existing progress views.

> > Probably writing of buffers or syncing files may complete before
> > pg_is_in_recovery() returns false. But there are some cleanup
> > operations happen as part of the checkpoint. During this scenario, we
> > may get false value for pg_is_in_recovery(). Please refer following
> > piece of code which is present in CreateRestartpoint().
> >
> > if (!RecoveryInProgress())
> > replayTLI = XLogCtl->InsertTimeLineID;
>
> Then maybe we could store the timeline rather then then kind of checkpoint?
> You should still be able to compute the information while giving a bit more
> information for the same memory usage.

Could you please describe in more detail how a checkpoint or restartpoint
can be identified using the timeline ID?

Thanks & Regards,
Nitin Jadhav

On Fri, Feb 18, 2022 at 1:13 PM Julien Rouhaud  wrote:
>
> Hi,
>
> On Fri, Feb 18, 2022 at 12:20:26PM +0530, Nitin Jadhav wrote:
> > >
> > > If there's a checkpoint timed triggered and then someone calls
> > > pg_start_backup() which then wait for the end of the current checkpoint
> > > (possibly after changing the flags), I think the view should reflect that 
> > > in
> > > some way.  Maybe storing an array of (pid, flags) is too much, but at 
> > > least a
> > > counter with the number of processes actively waiting for the end of the
> > > checkpoint.
> >
> > Okay. I feel this can be added as additional field but it will not
> > replace backend_pid field as this represents the pid of the backend
> > which triggered the current checkpoint.
>
> I don't think that's true.  Requesting a checkpoint means telling the
> checkpointer that it should wake up and start a checkpoint (or restore point)
> if it's not already doing so, so the pid will always be the checkpointer pid.
> The only exception is a standalone backend, but in that case you won't be able
> to query that view anyway.

Re: adding 'zstd' as a compression algorithm

2022-02-18 Thread Robert Haas
On Fri, Feb 18, 2022 at 9:02 AM Robert Haas  wrote:
> Oh wait ... you want it the other way. Yeah, that seems harmless to
> change. I wonder how many others there are that could be changed
> similarly...

I went through configure.ac looking for instances of
AC_CHECK_HEADERS() where the corresponding symbol was not used. I
found four:

AC_CHECK_HEADERS(lz4.h, [], [AC_MSG_ERROR([lz4.h header file is
required for LZ4])])
AC_CHECK_HEADERS(gssapi/gssapi.h, [], [AC_MSG_ERROR([gssapi.h header
file is required for GSSAPI])])])
AC_CHECK_HEADERS(ldap.h, [], [AC_MSG_ERROR([header file  is
required for LDAP])]
AC_CHECK_HEADER(winldap.h, [], ...stuff...)

I guess we could clean all of those up similarly.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: killing perl2host

2022-02-18 Thread Andrew Dunstan

On 2/17/22 15:46, Andres Freund wrote:
> On 2022-02-17 15:23:36 -0500, Andrew Dunstan wrote:
>> Very well. I think the easiest way will be to stash $host_os in the
>> environment and let the script pick it up similarly to what I suggested
>> with MSYSTEM.
> WFM.


OK, here's a patch.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com
diff --git a/config/check_modules.pl b/config/check_modules.pl
index cc0a7ab0e7..470c3e9c14 100644
--- a/config/check_modules.pl
+++ b/config/check_modules.pl
@@ -6,6 +6,7 @@
 #
 use strict;
 use warnings;
+use Config;
 
 use IPC::Run 0.79;
 
@@ -19,5 +20,9 @@ diag("IPC::Run::VERSION: $IPC::Run::VERSION");
 diag("Test::More::VERSION: $Test::More::VERSION");
 diag("Time::HiRes::VERSION: $Time::HiRes::VERSION");
 
+# Check that if prove is using msys perl it is for an msys target
+ok(($ENV{__CONFIG_HOST_OS__} || "") eq 'msys',
+   "Msys perl used for correct target")
+  if $Config{osname} eq 'msys';
 ok(1);
 done_testing();
diff --git a/configure b/configure
index ba635a0062..5e3af5c35b 100755
--- a/configure
+++ b/configure
@@ -19453,6 +19453,7 @@ fi
   # installation than perl, eg on MSys, so we have to check using prove.
   { $as_echo "$as_me:${as_lineno-$LINENO}: checking for Perl modules required for TAP tests" >&5
 $as_echo_n "checking for Perl modules required for TAP tests... " >&6; }
+  __CONFIG_HOST_OS__=$host_os; export __CONFIG_HOST_OS__
   modulestderr=`"$PROVE" "$srcdir/config/check_modules.pl" 2>&1 >/dev/null`
   if test $? -eq 0; then
 # log the module version details, but don't show them interactively
diff --git a/configure.ac b/configure.ac
index 16167329fc..d00e92357e 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2396,6 +2396,7 @@ if test "$enable_tap_tests" = yes; then
   # AX_PROG_PERL_MODULES here, but prove might be part of a different Perl
   # installation than perl, eg on MSys, so we have to check using prove.
   AC_MSG_CHECKING(for Perl modules required for TAP tests)
+  __CONFIG_HOST_OS__=$host_os; export __CONFIG_HOST_OS__
   [modulestderr=`"$PROVE" "$srcdir/config/check_modules.pl" 2>&1 >/dev/null`]
   if test $? -eq 0; then
 # log the module version details, but don't show them interactively


Re: adding 'zstd' as a compression algorithm

2022-02-18 Thread Robert Haas
On Fri, Feb 18, 2022 at 8:48 AM Robert Haas  wrote:
> On Thu, Feb 17, 2022 at 8:36 PM Andres Freund  wrote:
> > On 2022-02-17 13:34:08 +0900, Michael Paquier wrote:
> > > %define needs to include HAVE_LIBZSTD, HAVE_ZSTD_H and USE_ZSTD, so
> > > this version fails the sanity check between pg_config.h.in and the
> > > MSVC scripts checking that all flags exist.
> >
> > Do we really need all three defines? How about using AC_CHECK_HEADER() 
> > instead
> > of AC_CHECK_HEADERS()? That wouldn't define HAVE_ZSTD_H. Cases where we 
> > error
> > out if a header isn't found make it a bit pointless to then still define
> > HAVE_*_H.  Plenty other cases in configure.ac just use AC_CHECK_HEADER.
>
> I have to admit to being somewhat confused by the apparent lack of
> consistency in the way we do configure checks. The ZSTD check we added
> here is just based on the LZ4 check just above it, which was the
> result of my commit of Dilip's patch to add LZ4 TOAST compression. So
> if we want to do something different we should change them both. But
> that just begs the question of why the LZ4 support looks the way it
> does, and to be honest I don't recall. The zlib and XSLT formulas just
> above are much simpler, but for some reason what we're doing here
> seems to be based on the more-complex formula we use for XML support
> instead of either of those.
>
> But having said that, the proposed patch adds no new call to
> AC_CHECK_HEADER(), and does add a new call to AC_CHECK_HEADERS(), so I
> don't understand the specifics of your complaint here.

Oh wait ... you want it the other way. Yeah, that seems harmless to
change. I wonder how many others there are that could be changed
similarly...

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: adding 'zstd' as a compression algorithm

2022-02-18 Thread Robert Haas
On Thu, Feb 17, 2022 at 8:36 PM Andres Freund  wrote:
> On 2022-02-17 13:34:08 +0900, Michael Paquier wrote:
> > %define needs to include HAVE_LIBZSTD, HAVE_ZSTD_H and USE_ZSTD, so
> > this version fails the sanity check between pg_config.h.in and the
> > MSVC scripts checking that all flags exist.
>
> Do we really need all three defines? How about using AC_CHECK_HEADER() instead
> of AC_CHECK_HEADERS()? That wouldn't define HAVE_ZSTD_H. Cases where we error
> out if a header isn't found make it a bit pointless to then still define
> HAVE_*_H.  Plenty other cases in configure.ac just use AC_CHECK_HEADER.

I have to admit to being somewhat confused by the apparent lack of
consistency in the way we do configure checks. The ZSTD check we added
here is just based on the LZ4 check just above it, which was the
result of my commit of Dilip's patch to add LZ4 TOAST compression. So
if we want to do something different we should change them both. But
that just begs the question of why the LZ4 support looks the way it
does, and to be honest I don't recall. The zlib and XSLT formulas just
above are much simpler, but for some reason what we're doing here
seems to be based on the more-complex formula we use for XML support
instead of either of those.

But having said that, the proposed patch adds no new call to
AC_CHECK_HEADER(), and does add a new call to AC_CHECK_HEADERS(), so I
don't understand the specifics of your complaint here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Patch a potential memory leak in describeOneTableDetails()

2022-02-18 Thread wliang
Hi all,


I found a potential memory leak in PostgreSQL 14.1, in the function
describeOneTableDetails() (./src/bin/psql/describe.c). The bug has been
confirmed by an auditor of .


Specifically, at line 1603, a memory chunk is allocated with pg_strdup() and
assigned to tableinfo.reloptions. If the branch at line 1621 is taken, the
execution path will eventually jump to error_return at line 3394.
Unfortunately, the memory is not freed when the function
describeOneTableDetails() returns. As a result, the memory is leaked.



Similarly, the two memory chunks (tableinfo.reloftype and tableinfo.relam)
allocated at lines 1605-1606 and 1612 may also be leaked for the same reason.







1355 bool
1356 describeTableDetails(const char *pattern, bool verbose, bool showSystem)
1357 {
...
1603 tableinfo.reloptions = pg_strdup(PQgetvalue(res, 0, 9));
1604 tableinfo.tablespace = atooid(PQgetvalue(res, 0, 10));

1605 tableinfo.reloftype = (strcmp(PQgetvalue(res, 0, 11), "") != 0) ?
1606 pg_strdup(PQgetvalue(res, 0, 11)) : NULL;
...
1610 if (pset.sversion >= 12)
1611 tableinfo.relam = PQgetisnull(res, 0, 14) ?
1612 (char *) NULL : pg_strdup(PQgetvalue(res, 0, 14));
1613 else
1614 tableinfo.relam = NULL;
...

1621 if (tableinfo.relkind == RELKIND_SEQUENCE)
1622 {

...

1737 goto error_return;  /* not an error, just return early */
1738 }

...

3394 error_return:
3395 
3396 /* clean up */
3397 if (printTableInitialized)
3398 printTableCleanup(&cont);
3399 termPQExpBuffer(&buf);
3400 termPQExpBuffer(&title);
3401 termPQExpBuffer(&tmpbuf);
3402 
3403 if (view_def)
3404 free(view_def);
3405 
3406 if (res)
3407 PQclear(res);
3408 
3409 return retval;
3410 }


We believe we can fix the problems by adding the corresponding release calls
before the function returns, for example:




if (tableinfo.reloptions)
	pg_free(tableinfo.reloptions);
if (tableinfo.reloftype)
	pg_free(tableinfo.reloftype);
if (tableinfo.relam)
	pg_free(tableinfo.relam);




I'm looking forward to your reply.


Best,


Wentao

--- describe.c	2022-01-26 16:17:51.353135285 +0800
+++ describe_Patch.c	2022-02-18 15:57:51.099581978 +0800
@@ -3406,6 +3406,13 @@
 	if (res)
 		PQclear(res);
 
+	if (tableinfo.reloptions)
+		pg_free(tableinfo.reloptions);
+	if (tableinfo.reloftype)
+		pg_free(tableinfo.reloftype);
+	if (tableinfo.relam)
+		pg_free(tableinfo.relam);
+
 	return retval;
 }
 


Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-02-18 Thread Tomas Vondra
Hi,

here's a slightly updated version of the patch series. The 0001 part
adds tracking of server_version_num, so that it's possible to enable
other features depending on it. In this case it's used to decide whether
TABLESAMPLE is supported.

The 0002 part modifies the sampling. I realized we can do something
similar even on pre-9.5 releases, by running "WHERE random() < $1". Not
perfect, because it still has to read the whole table, but still better
than also sending it over the network.

There's a "sample" option for foreign server/table, which can be used to
disable the sampling if needed.

A simple measurement on a table with 10M rows, on localhost.

  old:6600ms
  random:  450ms
  tablesample:  40ms (system)
  tablesample: 200ms (bernoulli)

Local analyze takes ~190ms, so that's quite close.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From 6524f8c6f0db13c5dca0438bdda194ee5bedebbb Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Fri, 18 Feb 2022 01:32:25 +0100
Subject: [PATCH 1/2] postgres_fdw: track server version for connections

To allow using features that only exist on new Postgres versions, we
need some information about version of the remote node. We simply
request server_version_num from the remote node. We only fetch it if
version number is actually needed, and we cache it.
---
 contrib/postgres_fdw/connection.c   | 44 +
 contrib/postgres_fdw/postgres_fdw.h |  2 ++
 2 files changed, 46 insertions(+)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index f753c6e2324..3ea2948d3ec 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -279,6 +279,50 @@ GetConnection(UserMapping *user, bool will_prep_stmt, PgFdwConnState **state)
 	return entry->conn;
 }
 
+/*
+ * Determine remote server version (as an int value).
+ *
+ * The value is determined only once and then cached in PgFdwConnState.
+ */
+int
+GetServerVersion(PGconn *conn, PgFdwConnState *state)
+{
+	PGresult   *volatile res = NULL;
+
+	/*
+	 * If we already know server version for this connection, we're done.
+	 */
+	if (state->server_version_num != 0)
+		return state->server_version_num;
+
+	/* PGresult must be released before leaving this function. */
+	PG_TRY();
+	{
+		char	   *line;
+
+		/*
+		 * Fetch the server version number from the remote node.
+		 */
+		res = pgfdw_exec_query(conn, "SHOW server_version_num", NULL);
+		if (PQresultStatus(res) != PGRES_TUPLES_OK)
+			pgfdw_report_error(ERROR, res, conn, false, "SHOW server_version_num");
+
+		/*
+		 * Extract the server version number.
+		 */
+		line = PQgetvalue(res, 0, 0);
+		state->server_version_num = strtol(line, NULL, 10);
+	}
+	PG_FINALLY();
+	{
+		if (res)
+			PQclear(res);
+	}
+	PG_END_TRY();
+
+	return state->server_version_num;
+}
+
 /*
  * Reset all transient state fields in the cached connection entry and
  * establish new connection to the remote server.
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 8ae79e97e44..1687c62df2d 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -132,6 +132,7 @@ typedef struct PgFdwRelationInfo
 typedef struct PgFdwConnState
 {
 	AsyncRequest *pendingAreq;	/* pending async request */
+	int			server_version_num;	/* remote server version */
 } PgFdwConnState;
 
 /* in postgres_fdw.c */
@@ -143,6 +144,7 @@ extern void process_pending_request(AsyncRequest *areq);
 extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt,
 			 PgFdwConnState **state);
 extern void ReleaseConnection(PGconn *conn);
+extern int GetServerVersion(PGconn *conn, PgFdwConnState *state);
 extern unsigned int GetCursorNumber(PGconn *conn);
 extern unsigned int GetPrepStmtNumber(PGconn *conn);
 extern void do_sql_command(PGconn *conn, const char *sql);
-- 
2.34.1

From 546657dd0d3ab3bebdb1145785c62809b78e7644 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Fri, 18 Feb 2022 00:33:19 +0100
Subject: [PATCH 2/2] postgres_fdw: sample data on remote node for ANALYZE

When performing ANALYZE on foreign tables, we need to collect a sample
of rows. Until now, we simply read all data from the remote node and
built the sample locally. That is very expensive, especially in terms of
network traffic. But it's possible to move the sampling to the
remote node, and use either TABLESAMPLE or simply random() to transfer
just a much smaller amount of data.

So we run either

   SELECT * FROM table TABLESAMPLE SYSTEM(fraction)

or

  SELECT * FROM table WHERE random() < fraction

depending on the server version (TABLESAMPLE is supported since 9.5).

To do that, we need to determine what fraction of the table to sample.
We rely on reltuples (fetched from the remote node) to be sufficiently
accurate and up to date, and calculate the fraction based on that. We
increase the sample size a bit (in case the table shrunk), and we still
do the reservoir sampling locally.

Re: adding 'zstd' as a compression algorithm

2022-02-18 Thread Robert Haas
On Thu, Feb 17, 2022 at 8:18 PM Michael Paquier  wrote:
> On Thu, Feb 17, 2022 at 09:40:13AM -0500, Robert Haas wrote:
> > Ah, OK, cool. It seems like we have consensus on this being a good
> > direction, so I plan to commit this later today unless objections or
> > additional review comments turn up.
>
> So, will there be a part of the system where we'll make use of that in
> 15?  pg_basebackup for server-side and client-side compression, at
> least, as far as I can guess?

That's my plan, yes. I think the patches only need a small amount of
work at this point, but I'd like to get this part committed first.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Merging statistics from children instead of re-sampling everything

2022-02-18 Thread Andrey V. Lepikhov

On 2/14/22 20:16, Tomas Vondra wrote:



On 2/14/22 11:22, Andrey V. Lepikhov wrote:

On 2/11/22 20:12, Tomas Vondra wrote:



On 2/11/22 05:29, Andrey V. Lepikhov wrote:

On 2/11/22 03:37, Tomas Vondra wrote:

That being said, this thread was not really about foreign partitions,
but about re-analyzing inheritance trees in general. And sampling
foreign partitions doesn't really solve that - we'll still do the
sampling over and over.

IMO, to solve the problem we should do two things:
1. Avoid repeated partition scans in the case of an inheritance tree.
2. Avoid re-analyzing everything in the case of active changes in a
small subset of partitions.


For (1) I can imagine a solution like multiplexing: at the stage of
defining which relations to scan, group them and prepare the scan
parameters so as to build multiple samples in one shot.
I'm not sure I understand what you mean by multiplexing. The term 
usually means "sending multiple signals at once" but I'm not sure how 
that applies to this issue. Can you elaborate?


I propose to build a set of samples in one scan: one sample for the plain
table, another for the parent, and so on, according to the inheritance
tree. And cache these samples in memory. We can calculate all the
parameters of the reservoir method to do it.




I doubt keeping the samples just in memory is a good solution. Firstly, 
there's the question of memory consumption. Imagine a large partitioned 
table with 1-10k partitions. If we keep a "regular" sample (30k rows) 
per partition, that's 30M-300M rows. If each row needs 100B, that's 
3-30GB of data.
I'm talking about caching a sample only for the time it is needed by this
ANALYZE operation. Imagine three levels of partitioned tables. On each
partition you would create and keep three different samples (we can do
it in one scan). The sample for the plain table we can use immediately and
then destroy it.
The sample for the partition on the second level of the hierarchy we can
save to disk as a copy for future use (maybe a repeated ANALYZE).
The in-memory data used to form a reservoir has a limited size and can
be destroyed immediately. At the third level we can use the same logic.
So, at any moment we only use as many samples as there are levels in the
hierarchy. IMO, that isn't a large number.
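To illustrate the idea, here is a rough sketch with a hypothetical per-level
reservoir structure (plain Algorithm R, using rand() for brevity; real code
would use the sampler in utils/sampling.h):

/* Hypothetical per-level reservoir, for illustration only. */
typedef struct LevelSample
{
	HeapTuple  *rows;		/* sampled tuples */
	int			targrows;	/* reservoir size */
	int64		nseen;		/* tuples seen so far */
} LevelSample;

/* Feed one scanned tuple into the reservoir of every hierarchy level. */
static void
feed_sample_levels(LevelSample *levels, int nlevels, HeapTuple tup)
{
	for (int i = 0; i < nlevels; i++)
	{
		LevelSample *ls = &levels[i];

		if (ls->nseen < ls->targrows)
			ls->rows[ls->nseen] = heap_copytuple(tup);
		else
		{
			int64		k;

			k = (int64) (((double) rand() / ((double) RAND_MAX + 1)) *
						 (ls->nseen + 1));
			if (k < ls->targrows)
			{
				heap_freetuple(ls->rows[k]);
				ls->rows[k] = heap_copytuple(tup);
			}
		}
		ls->nseen++;
	}
}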


> the trouble is partitions may be detached, data may be deleted from
> some partitions, etc.
Because statistics don't have a strong relation to the data, we can use two
strategies: in the case of an explicit 'ANALYZE ' we can recalculate
all samples for all partitions, but in the autovacuum case or for implicit
analysis we can use not-so-old versions of the samples, and samples of
detached (but not destroyed) partitions, on the optimistic assumption that
this doesn't change the statistics drastically.



So IMHO the samples need to be serialized, in some way.

Agreed

Well, a separate catalog is one of the options. But I don't see how that 
deals with large samples, etc.
I think we can design a fallback to the previous approach in the case of
very large tuples, like the switch from HashJoin to NestedLoop when we
estimate that we don't have enough memory.


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Emit a warning if the extension's GUC is set incorrectly

2022-02-18 Thread Florin Irion
On Fri, Feb 18, 2022 at 10:58 AM Tom Lane  wrote:

> I wrote:
> > As a stopgap to turn the farm green again, I am going to revert
> > 75d22069e as well as my followup patches.  If we don't want to
> > give up on that idea altogether, we have to find some way to
> > suppress the chatter from parallel workers.  I wonder whether
> > it would be appropriate to go further than we have, and actively
> > delete placeholders that turn out to be within an extension's
> > reserved namespace.  The core issue here is that workers don't
> > necessarily set GUCs and load extensions in the same order that
> > their parent did, so if we leave any invalid placeholders behind
> > after reserving an extension's prefix, we're risking issues
> > during worker start.
>
> Here's a delta patch (meant to be applied after reverting cab5b9ab2)
> that does things like that.  It fixes the parallel-mode problem ...
> so do we want to tighten things up this much?
>
> regards, tom lane
>

Hello,

Thank you for taking care of the bug I introduced with 75d22069e,
I didn't notice this thread until now.

For what it's worth, I agree with throwing an ERROR if the placeholder is
unrecognized. Initially, I didn't want to restrict too much the freedom of
setting any placeholder, but mainly to make sure mistakes didn't go
unnoticed.

In my humble opinion, I still think that this should go ahead and disallow
bogus placeholders, as we do for unrecognized postgres settings.

I tested your delta patch with and without parallel mode, and, naturally,
it behaves correctly.

My 2 cents.

+1

Cheers,
Florin Irion


Re: Failed transaction statistics to measure the logical replication progress

2022-02-18 Thread Amit Kapila
On Fri, Feb 18, 2022 at 2:04 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Thursday, February 17, 2022 6:45 PM Amit Kapila  
> wrote:
> > On Thu, Feb 17, 2022 at 3:13 PM Amit Kapila 
> > wrote:
> > > Can't we use pgstat_report_stat() here? Basically, you can update xact
> > > completetion counters during apply, and then from
> > > pgstat_report_stat(), you can invoke a logical replication worker
> > > stats-related function.
> > >
> >
> > If we can do this then we can save the logic this patch is trying to 
> > introduce for
> > PGSTAT_STAT_INTERVAL.
> Hi, I've encountered a couple of questions during my modification, following
> your advice.
>
> In the pgstat_report_stat, we refer to the return value of
> GetCurrentTransactionStopTimestamp to compare the time different from the 
> last time.
> (In my previous patch, I used GetCurrentTimestamp)
>
> This time is updated in apply_handle_commit_internal's 
> CommitTransactionCommand for the apply worker.
> Then, if I update the subscription worker stats(commit_count/abort_count) 
> immediately after
> this CommitTransactionCommand and immediately call pgstat_report_stat in the 
> apply_handle_commit_internal,
> the time difference becomes too small (falls short of PGSTAT_STAT_INTERVAL).
> Also, the time of GetCurrentTransactionStopTimestamp is not updated
> even when I keep calling pgstat_report_stat repeatedly.
> Then, IIUC the next possible timing that message of commit_count or 
> abort_count
> is sent to the stats collector would become the time when we execute another 
> transaction
> by the apply worker and update the time for GetCurrentTransactionStopTimestamp
> and rerun pgstat_report_stat again.
>

I think but same is true in the case of the transaction in the backend
where we increment commit counter via AtEOXact_PgStat after updating
the transaction time. After that, we call pgstat_report_stat() at
later point. How is this case different?

> So, if we keep GetCurrentTransactionStopTimestamp without change,
> an update of stats depends on another new subsequent transaction if we simply 
> merge those.
> (this leads to users cannot see the latest stats information ?)
>

I think this should be okay as these don't need to be accurate.

> At least, I got a test failure because of this function for streaming commit 
> case
> because it uses poll_query_until to wait for stats update.
>

I feel it is not a good idea to wait for the accurate update of these counters.

-- 
With Regards,
Amit Kapila.




Re: logical replication empty transactions

2022-02-18 Thread Amit Kapila
On Fri, Feb 18, 2022 at 3:06 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Friday, February 18, 2022 6:18 PM Amit Kapila  
> wrote:
> > On Tue, Feb 8, 2022 at 5:27 AM osumi.takami...@fujitsu.com
> >  wrote:
> > >
> > > On Friday, August 13, 2021 8:01 PM Ajin Cherian  wrote:
> > > > On Mon, Aug 2, 2021 at 7:20 PM Amit Kapila 
> > > > wrote:
> > > Changing the timing to send the keepalive to the decoding commit
> > > timing didn't look impossible to me, although my suggestion can be
> > > ad-hoc.
> > >
> > > After the initialization of sentPtr(by confirmed_flush lsn), sentPtr
> > > is updated from logical_decoding_ctx->reader->EndRecPtr in
> > XLogSendLogical.
> > > In the XLogSendLogical, we update it after we execute
> > LogicalDecodingProcessRecord.
> > > This order leads to the current implementation to wait the next
> > > iteration to send a keepalive in WalSndWaitForWal.
> > >
> > > But, I felt we can utilize end_lsn passed to ReorderBufferCommit for
> > > updating sentPtr. The end_lsn is the lsn same as the
> > > ctx->reader->EndRecPtr, which means advancing the timing to update the
> > sentPtr for the commit case.
> > > Then if the transaction is empty in synchronous mode, send the
> > > keepalive in WalSndUpdateProgress directly, instead of having the
> > > force_keepalive_syncrep flag and having it true.
> > >
> >
> > You have a point in that we don't need to delay sending this message till 
> > next
> > WalSndWaitForWal() but I don't see why we need to change anything about
> > update of sentPtr.
> Yeah, you're right.
> Now I think we don't need the update of sentPtr to send a keepalive.
>
> I thought we can send a keepalive message
> after its update in XLogSendLogical or any appropriate place for it after the 
> existing update.
>

Yeah, I think there could be multiple ways (a) We can send such a keep
alive in WalSndUpdateProgress() itself by using ctx->write_location.
For this, we need to modify WalSndKeepalive() to take sentPtr as
input. (b) set some flag in WalSndUpdateProgress() and then do it
somewhere in WalSndLoop probably in WalSndKeepaliveIfNecessary, or
maybe there is another better way.
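As a rough sketch of (a) (the extra parameter is hypothetical; the body
mirrors the existing function):

/* Hypothetical variant of WalSndKeepalive() taking an explicit LSN. */
static void
WalSndKeepalive(bool requestReply, XLogRecPtr writePtr)
{
	elog(DEBUG2, "sending replication keepalive");

	/* construct the message... */
	resetStringInfo(&output_message);
	pq_sendbyte(&output_message, 'k');
	pq_sendint64(&output_message, writePtr);
	pq_sendint64(&output_message, GetCurrentTimestamp());
	pq_sendbyte(&output_message, requestReply ? 1 : 0);

	/* ... and send it wrapped in CopyData */
	pq_putmessage_noblock('d', output_message.data, output_message.len);
}

WalSndUpdateProgress() could then call WalSndKeepalive(false,
ctx->write_location) for an empty transaction in synchronous mode.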

-- 
With Regards,
Amit Kapila.




JSON path decimal literal syntax

2022-02-18 Thread Peter Eisentraut


I noticed that the JSON path lexer does not support the decimal literal 
syntax forms


.1
1.

(that is, there are no digits before or after the decimal point).  This 
is allowed by the relevant ECMAScript standard 
(https://262.ecma-international.org/5.1/#sec-7.8.3) and of course SQL 
allows it as well.


Is there a reason for this?  I didn't find any code comments or 
documentation about this.


Attached are patches that would enable this.  As you can see, a bunch of 
test cases are affected.

From 9fc73f1fa4d83da85dc1626cf8b218ec8a11104c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 18 Feb 2022 10:52:56 +0100
Subject: [PATCH v1 1/2] Test cases for JSON path decimal literals

---
 src/test/regress/expected/jsonpath.out | 18 ++
 src/test/regress/sql/jsonpath.sql  |  4 
 2 files changed, 22 insertions(+)

diff --git a/src/test/regress/expected/jsonpath.out b/src/test/regress/expected/jsonpath.out
index e399fa9631..54eb548ad1 100644
--- a/src/test/regress/expected/jsonpath.out
+++ b/src/test/regress/expected/jsonpath.out
@@ -878,6 +878,24 @@ select '0.0010e+2'::jsonpath;
  0.10
 (1 row)
 
+select '.001'::jsonpath;
+ERROR:  syntax error, unexpected '.' at or near "." of jsonpath input
+LINE 1: select '.001'::jsonpath;
+   ^
+select '.001e1'::jsonpath;
+ERROR:  syntax error, unexpected '.' at or near "." of jsonpath input
+LINE 1: select '.001e1'::jsonpath;
+   ^
+select '1.'::jsonpath;
+ERROR:  syntax error, unexpected end of file at end of jsonpath input
+LINE 1: select '1.'::jsonpath;
+   ^
+select '1.e1'::jsonpath;
+ jsonpath 
+----------
+ 1."e1"
+(1 row)
+
 select '1e'::jsonpath;
 ERROR:  invalid floating point number at or near "1e" of jsonpath input
 LINE 1: select '1e'::jsonpath;
diff --git a/src/test/regress/sql/jsonpath.sql b/src/test/regress/sql/jsonpath.sql
index 17ab775783..bf71b99fc5 100644
--- a/src/test/regress/sql/jsonpath.sql
+++ b/src/test/regress/sql/jsonpath.sql
@@ -163,6 +163,10 @@
 select '0.0010e-1'::jsonpath;
 select '0.0010e+1'::jsonpath;
 select '0.0010e+2'::jsonpath;
+select '.001'::jsonpath;
+select '.001e1'::jsonpath;
+select '1.'::jsonpath;
+select '1.e1'::jsonpath;
 select '1e'::jsonpath;
 select '1.e'::jsonpath;
 select '1.2e'::jsonpath;
-- 
2.35.1

From 1881760218f15749122460eb3a71a0b648b5c86b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 18 Feb 2022 11:11:18 +0100
Subject: [PATCH v1 2/2] Fix JSON path decimal literal syntax

Per ECMAScript standard (referenced by SQL standard), the syntax forms

.1
1.

should be allowed for decimal numeric literals, but the existing
implementation rejected them.
---
 src/backend/utils/adt/jsonpath_scan.l  |  12 +-
 src/test/regress/expected/jsonpath.out | 184 +++--
 2 files changed, 110 insertions(+), 86 deletions(-)

diff --git a/src/backend/utils/adt/jsonpath_scan.l b/src/backend/utils/adt/jsonpath_scan.l
index 827a9e44cb..bf1cea8c03 100644
--- a/src/backend/utils/adt/jsonpath_scan.l
+++ b/src/backend/utils/adt/jsonpath_scan.l
@@ -82,8 +82,7 @@ other       [^\?\%\$\.\[\]\{\}\(\)\|\&\!\=\<\>\@\#\,\*:\-\+\/\\\" \t\n\r\f]
 
 digit       [0-9]
 integer     (0|[1-9]{digit}*)
-decimal     {integer}\.{digit}+
-decimalfail {integer}\.
+decimal     ({integer}\.{digit}*|\.{digit}+)
 real        ({integer}|{decimal})[Ee][-+]?{digit}+
 realfail1   ({integer}|{decimal})[Ee]
 realfail2   ({integer}|{decimal})[Ee][-+]
@@ -242,15 +241,6 @@ hex_fail    \\x{hex_dig}{0,1}
                         return INT_P;
                     }
 
-{decimalfail}       {
-                        /* throw back the ., and treat as integer */
-                        yyless(yyleng - 1);
-                        addstring(true, yytext, yyleng);
-                        addchar(false, '\0');
-                        yylval->str = scanstring;
-                        return INT_P;
-                    }
-
 ({realfail1}|{realfail2})   { yyerror(NULL, "invalid floating point number"); }
 
 \"                  {
diff --git a/src/test/regress/expected/jsonpath.out b/src/test/regress/expected/jsonpath.out
index 54eb548ad1..058010172f 100644
--- a/src/test/regress/expected/jsonpath.out
+++ b/src/test/regress/expected/jsonpath.out
@@ -354,11 +354,9 @@ select 'null.type()'::jsonpath;
 (1 row)
 
 select '1.type()'::jsonpath;
- jsonpath 
-----------
- 1.type()
-(1 row)
-
+ERROR:  syntax error, unexpected TYPE_P, expecting end of file

Re: pgsql: pg_upgrade: Preserve relfilenodes and tablespace OIDs.

2022-02-18 Thread Christoph Berg
Re: Robert Haas
> > This needs to be guarded with "if (binary_upgrade)".
> 
> Right. Sorry about that, and sorry for not responding sooner also. Fix
> pushed now.

Thanks, the 14-15 upgrade test is passing again here.

Christoph




Re: shared_preload_libraries = 'pg_stat_statements' failing with installcheck (compute_query_id = auto)

2022-02-18 Thread Julien Rouhaud
Hi,

On Fri, Feb 18, 2022 at 05:22:36PM +0900, Michael Paquier wrote:
>
> So, I have been looking at this problem, and I don't see a problem in
> doing something like the attached, where we add a "regress" mode to
> compute_query_id that is a synonym of "auto". Or, in short, we keep
> the default of letting a module decide if a query ID can be computed
> or not, with the exception that we hide its result in EXPLAIN outputs.
>
> Julien, what do you think?

I don't see any problem with that patch.
>
> FWIW, about your question upthread, I have noticed the behavior
> while testing, but I know of some internal customers that enable
> pg_stat_statements and like doing tests on the PostgreSQL instance
> deployed this way, so that would break.  They are not on 14 yet as far
> as I know.

Are they really testing EXPLAIN output, or just doing application-level tests?




RE: logical replication empty transactions

2022-02-18 Thread osumi.takami...@fujitsu.com
On Friday, February 18, 2022 6:18 PM Amit Kapila  
wrote:
> On Tue, Feb 8, 2022 at 5:27 AM osumi.takami...@fujitsu.com
>  wrote:
> >
> > On Friday, August 13, 2021 8:01 PM Ajin Cherian  wrote:
> > > On Mon, Aug 2, 2021 at 7:20 PM Amit Kapila 
> > > wrote:
> > Changing the timing to send the keepalive to the decoding commit
> > timing didn't look impossible to me, although my suggestion can be
> > ad-hoc.
> >
> > After the initialization of sentPtr(by confirmed_flush lsn), sentPtr
> > is updated from logical_decoding_ctx->reader->EndRecPtr in
> XLogSendLogical.
> > In the XLogSendLogical, we update it after we execute
> LogicalDecodingProcessRecord.
> > This order leads to the current implementation to wait the next
> > iteration to send a keepalive in WalSndWaitForWal.
> >
> > But, I felt we can utilize end_lsn passed to ReorderBufferCommit for
> > updating sentPtr. The end_lsn is the lsn same as the
> > ctx->reader->EndRecPtr, which means advancing the timing to update the
> sentPtr for the commit case.
> > Then if the transaction is empty in synchronous mode, send the
> > keepalive in WalSndUpdateProgress directly, instead of having the
> > force_keepalive_syncrep flag and having it true.
> >
> 
> You have a point in that we don't need to delay sending this message till next
> WalSndWaitForWal() but I don't see why we need to change anything about
> update of sentPtr.
Yeah, you're right.
Now I think we don't need to change the update of sentPtr in order to send a
keepalive.

I think we can send a keepalive message right after sentPtr is updated in
XLogSendLogical, or at any other appropriate place after that existing update.
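
To be specific, something like the following is what I have in mind
(just a sketch; the skipped_empty_xact flag is a hypothetical signal
set when the output plugin skips an empty transaction):

/* In XLogSendLogical(), after the existing processing of a record: */
if (record != NULL)
{
    LogicalDecodingProcessRecord(logical_decoding_ctx,
                                 logical_decoding_ctx->reader);

    sentPtr = logical_decoding_ctx->reader->EndRecPtr;

    /*
     * Sketch: if an empty transaction was just skipped in synchronous
     * mode, request a reply right away so that syncrep waiters can be
     * released, instead of waiting for the next WalSndWaitForWal()
     * iteration.
     */
    if (skipped_empty_xact)
    {
        WalSndKeepalive(true);
        skipped_empty_xact = false;
    }
}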


Best Regards,
Takamichi Osumi



Re: logical replication empty transactions

2022-02-18 Thread Amit Kapila
On Tue, Feb 8, 2022 at 5:27 AM osumi.takami...@fujitsu.com
 wrote:
>
> On Friday, August 13, 2021 8:01 PM Ajin Cherian  wrote:
> > On Mon, Aug 2, 2021 at 7:20 PM Amit Kapila 
> > wrote:
> Changing the timing to send the keepalive to the decoding commit
> timing didn't look impossible to me, although my suggestion
> can be ad-hoc.
>
> After the initialization of sentPtr(by confirmed_flush lsn),
> sentPtr is updated from logical_decoding_ctx->reader->EndRecPtr in 
> XLogSendLogical.
> In the XLogSendLogical, we update it after we execute 
> LogicalDecodingProcessRecord.
> This order leads to the current implementation to wait the next iteration
> to send a keepalive in WalSndWaitForWal.
>
> But, I felt we can utilize end_lsn passed to ReorderBufferCommit for updating
> sentPtr. The end_lsn is the lsn same as the ctx->reader->EndRecPtr,
> which means advancing the timing to update the sentPtr for the commit case.
> Then if the transaction is empty in synchronous mode,
> send the keepalive in WalSndUpdateProgress directly,
> instead of having the force_keepalive_syncrep flag and having it true.
>

You have a point in that we don't need to delay sending this message
till next WalSndWaitForWal() but I don't see why we need to change
anything about update of sentPtr.


-- 
With Regards,
Amit Kapila.




Re: logical replication empty transactions

2022-02-18 Thread Amit Kapila
On Thu, Feb 17, 2022 at 4:12 PM Amit Kapila  wrote:
>
> On Mon, Jan 31, 2022 at 6:18 PM Ajin Cherian  wrote:
> >
>
> Few comments:
> =
>

One more comment:
@@ -1546,10 +1557,11 @@ WalSndWaitForWal(XLogRecPtr loc)
  * otherwise idle, this keepalive will trigger a reply. Processing the
  * reply will update these MyWalSnd locations.
  */
- if (MyWalSnd->flush < sentPtr &&
+ if (force_keepalive_syncrep ||
+ (MyWalSnd->flush < sentPtr &&
  MyWalSnd->write < sentPtr &&
- !waiting_for_ping_response)
- WalSndKeepalive(false);
+ !waiting_for_ping_response))
+ WalSndKeepalive(false);

Will this allow syncrep to proceed in case we are skipping the
transaction? Won't we need to send a feedback message with
'requestReply' true in this case, since we release syncrep waiters
while processing the standby's reply message (see
ProcessStandbyReplyMessage->SyncRepReleaseWaiters)? Without
'requestReply', the subscriber might not send any message and the
syncrep won't proceed. Why did you decide to delay sending this message
till WalSndWaitForWal()? It may not be called for each transaction.
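
In other words (a sketch only, to illustrate the requestReply point
against the hunk above):

/*
 * Sketch: when skipping an empty transaction under synchronous
 * replication, request a reply so that the subscriber's feedback lets
 * ProcessStandbyReplyMessage() -> SyncRepReleaseWaiters() release the
 * backend waiting in SyncRepWaitForLSN().
 */
if (force_keepalive_syncrep)
    WalSndKeepalive(true);      /* requestReply = true, not false */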

I feel we should try to devise a test case to test this sync
replication mechanism such that without this particular change the
sync rep transaction waits momentarily but with this change it doesn't
wait. I am not entirely sure whether we can devise an automated test,
as this is a timing-related issue, but I guess we can at least try to
produce a case manually.

-- 
With Regards,
Amit Kapila.




Re: [Proposal] Add foreign-server health checks infrastructure

2022-02-18 Thread Fujii Masao




On 2022/02/17 19:35, kuroda.hay...@fujitsu.com wrote:

Dear Horiguchi-san,


I think we just don't need to add the special timeout kind to the
core.  postgres_fdw can use USER_TIMEOUT and it would be sufficient to
keep running health checks regardless of transaction state, then fire a
query cancel if a disconnection happens. As I said in the previous mail,
a possible extra query cancel would be safe.


Sounds reasonable to me.



I finally figured out that you meant the user-defined timeout system.
At first - before posting to hackers - I designed it like that,
but I was afraid of the overhead of many FDWs registering timeouts
and calling setitimer() many times. Is that too overcautious?


Isn't it a very special case where many FDWs use their own user timeouts? Could
you tell me the scenario you're assuming, especially how many FDWs would be
working?
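
For reference, registering such a user timeout is fairly lightweight on
the API side; a minimal sketch (all the postgres_fdw-side names here are
hypothetical) would be:

#include "utils/timeout.h"

static TimeoutId pgfdw_health_check_timeout;
static volatile sig_atomic_t pgfdw_check_requested = false;

/* Interrupt handler: just set a flag and re-arm the timer. */
static void
pgfdw_health_check_handler(void)
{
    pgfdw_check_requested = true;
    enable_timeout_after(pgfdw_health_check_timeout, 1000);  /* ms */
}

/* Called once per backend, e.g. when the first connection is made. */
static void
pgfdw_register_health_check(void)
{
    pgfdw_health_check_timeout = RegisterTimeout(USER_TIMEOUT,
                                                 pgfdw_health_check_handler);
    enable_timeout_after(pgfdw_health_check_timeout, 1000);
}

Each enable_timeout_after() call can end up in a setitimer(), which is
presumably the overhead in question when many FDWs do this.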

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




RE: Failed transaction statistics to measure the logical replication progress

2022-02-18 Thread osumi.takami...@fujitsu.com
On Thursday, February 17, 2022 6:45 PM Amit Kapila  
wrote:
> On Thu, Feb 17, 2022 at 3:13 PM Amit Kapila 
> wrote:
> >
> > On Tue, Jan 4, 2022 at 5:22 PM osumi.takami...@fujitsu.com
> >  wrote:
> > >
> > > On Friday, December 31, 2021 10:12 AM Tang, Haiying/唐 海英
>  wrote:
> > > > 4)
> > > > +void
> > > > +pgstat_send_subworker_xact_stats(LogicalRepWorker *repWorker,
> > > > +bool
> > > > +force) {
> > > > + static TimestampTz last_report = 0;
> > > > + PgStat_MsgSubWorkerXactEnd msg;
> > > > +
> > > > + if (!force)
> > > > + {
> > > > ...
> > > > + if (!TimestampDifferenceExceeds(last_report, now,
> > > > PGSTAT_STAT_INTERVAL))
> > > > + return;
> > > > + last_report = now;
> > > > + }
> > > > +
> > > > ...
> > > > + if (repWorker->commit_count == 0 && repWorker->abort_count
> > > > + ==
> > > > 0)
> > > > + return;
> > > > ...
> > > >
> > > > I think it's better to check commit_count and abort_count first,
> > > > then check if reach PGSTAT_STAT_INTERVAL.
> > > > Otherwise if commit_count and abort_count are 0, it is possible
> > > > that the value of last_report has been updated but it didn't send
> > > > stats in fact. In this case, last_report is not the real time that send 
> > > > last
> message.
> > > Yeah, agreed. This fix is right in terms of the variable name aspect.
> > >
> >
> > Can't we use pgstat_report_stat() here? Basically, you can update xact
> > completetion counters during apply, and then from
> > pgstat_report_stat(), you can invoke a logical replication worker
> > stats-related function.
> >
> 
> If we can do this then we can save the logic this patch is trying to 
> introduce for
> PGSTAT_STAT_INTERVAL.
Hi, I've encountered a couple of questions while making this modification,
following your advice.

In pgstat_report_stat(), we use the return value of
GetCurrentTransactionStopTimestamp() to compute the time elapsed since the
last report.
(In my previous patch, I used GetCurrentTimestamp().)

For the apply worker, this timestamp is updated by the
CommitTransactionCommand() in apply_handle_commit_internal().
Then, if I update the subscription worker stats (commit_count/abort_count)
immediately after this CommitTransactionCommand() and immediately call
pgstat_report_stat() in apply_handle_commit_internal(),
the time difference is too small (it falls short of PGSTAT_STAT_INTERVAL).
Also, the timestamp returned by GetCurrentTransactionStopTimestamp() does not
advance even when I keep calling pgstat_report_stat() repeatedly.
Then, IIUC, the next possible time a commit_count or abort_count message can
be sent to the stats collector is when the apply worker executes another
transaction, which updates the timestamp behind
GetCurrentTransactionStopTimestamp() and runs pgstat_report_stat() again.

So, if we keep GetCurrentTransactionStopTimestamp() unchanged, a stats update
depends on another, subsequent transaction if we simply merge those changes
(which means users cannot see the latest stats information?).
At least, I got a test failure because of this in the streaming commit case,
since that test uses poll_query_until to wait for a stats update.
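
For reference, the relevant part of pgstat_report_stat() looks roughly
like this (simplified):

void
pgstat_report_stat(bool disconnect)
{
    static TimestampTz last_report = 0;
    TimestampTz now;

    /*
     * The clock used here is the transaction stop time, which only
     * advances when a transaction ends.  Calling this right after
     * CommitTransactionCommand() therefore sees (almost) no elapsed
     * time, and repeated calls without a new transaction never cross
     * PGSTAT_STAT_INTERVAL.
     */
    now = GetCurrentTransactionStopTimestamp();
    if (!disconnect &&
        !TimestampDifferenceExceeds(last_report, now, PGSTAT_STAT_INTERVAL))
        return;

    last_report = now;

    /* ... assemble and send the pending stats messages ... */
}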

On the other hand, replacing GetCurrentTransactionStopTimestamp() with
GetCurrentTimestamp() for the apply worker seems to have another negative
impact.
If we do so, because of the timing change it becomes possible to reach the
code that scans TabStatusArray while a PgStat_TableStatus's trans field is
still non-NULL.

I might be able to avoid this kind of assert failure if, in
pgstat_report_stat(), I put this patch's message-send function ahead of the
existing functions that send the various other message types and return
early when the process is an apply worker.
But I can't convince myself that this way of modifying it is OK.

What do you think about these issues?

Best Regards,
Takamichi Osumi



Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-18 Thread Masahiko Sawada
On Wed, Feb 16, 2022 at 8:36 PM Masahiko Sawada  wrote:
>
> On Wed, Feb 16, 2022 at 5:49 PM Amit Kapila  wrote:
> >
> > On Tue, Feb 15, 2022 at 11:56 PM Andres Freund  wrote:
> > >
> > > On 2022-02-03 13:33:08 +0900, Masahiko Sawada wrote:
> > > > I see that important information such as error-XID that can be used
> > > > for ALTER SUBSCRIPTION SKIP needs to be stored in a reliable way, and
> > > > using system catalogs is a reasonable way for this purpose. But it's
> > > > still unclear to me why all error information that is currently shown
> > > > in pg_stat_subscription_workers view, including error-XID and the
> > > > error message, relation OID, action, etc., need to be stored in the
> > > > catalog. The information other than error-XID doesn't necessarily need
> > > > to be reliable compared to error-XID. I think we can have
> > > > error-XID/LSN in the pg_subscription catalog and have other error
> > > > information in pg_stat_subscription_workers view. After the user
> > > > checks the current status of logical replication by checking
> > > > error-XID/LSN, they can check pg_stat_subscription_workers for
> > > > details.
> > >
> > > The stats system isn't geared towards storing error messages and
> > > such. Generally it is about storing counts of events etc, not about
> > > information about a single event. Obviously there are a few cases where 
> > > that
> > > boundary isn't that clear...
> > >
> >
> > True, in the beginning, we discussed this information to be stored in
> > system catalog [1] (See  Also, I am thinking that instead of a
> > stat view, do we need to consider having a system table .. ) but later
> > discussion led us to store this as stats.
> >
> > > IOW, storing information like:
> > > - subscription oid
> > > - retryable error count
> > > - hard error count
> > > - #replicated inserts
> > > - #replicated updates
> > > - #replicated deletes
> > >
> > > is what pgstats is for.
> > >
> >
> > Some of this and similar ((like error count, last_error_time)) is
> > present in stats and something on the lines of the other information
> > is proposed in [2] (xacts successfully replicated (committed),
> > aborted, etc) to be stored along with it.
> >
> > > But not
> > > - subscription oid
> > > - error message
> > > - xid of error
> > > - ...
> > >
> >
> > I think from the current set of things we are capturing, the other
> > thing in this list will be error_command (insert/update/delete..) and
> > or probably error_code. So, we can keep this information in a system
> > table.
>
> Agreed. Also, we can also have commit-LSN, or replace error-XID with error-LSN?
>
> >
> > Based on this discussion, it appears to me what we want here is to
> > store the error info in some persistent way (system table) and the
> > counters (both error and success) of logical replication in stats. If
> > we can't achieve this work (separation) in the next few weeks (before
> > the feature freeze) then I'll revert the work related to stats.
>
> There was an idea to use shmem to store error info but it seems to be
> better to use a system catalog to persist them.
>
> I'll summarize changes we discussed and make a plan of changes and
> feature designs toward the feature freeze (and v16). I think that once
> we get a consensus on them we can start implementation and move it
> forward.
>

Here is the summary of the discussion, changes, and plan.

1. Move some error information such as the error message to a new
system catalog, pg_subscription_error. The pg_subscription_error table
would have the following columns:

* sesubid : subscription Oid.
* serelid : relation Oid (NULL for apply worker).
* seerrlsn : commit-LSN of the error transaction.
* seerrcmd : command (INSERT, UPDATE, etc.) of the error transaction.
* seerrmsg : error message
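
For illustration, a rough sketch of the corresponding catalog header
(the OID, the exact column types, and the varlena layout below are
placeholders, not a settled proposal):

/* sketch of src/include/catalog/pg_subscription_error.h */
CATALOG(pg_subscription_error,8999,SubscriptionErrorRelationId)
{
    Oid         sesubid;    /* subscription Oid */
    Oid         serelid;    /* relation Oid (InvalidOid for apply worker) */
    XLogRecPtr  seerrlsn;   /* commit LSN of the error transaction */
    char        seerrcmd;   /* command: 'i'nsert, 'u'pdate, 'd'elete, ... */

#ifdef CATALOG_VARLEN       /* variable-length fields start here */
    text        seerrmsg;   /* error message */
#endif
} FormData_pg_subscription_error;

typedef FormData_pg_subscription_error *Form_pg_subscription_error;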

The tuple is inserted or updated when an apply worker or a tablesync
worker raises an error. If the same error occurs in a row, the update
is skipped. The tuple is removed in the following cases:

* the subscription is dropped.
* the table is dropped.
* the table is removed from the subscription.
* the worker successfully committed a non-empty transaction.

With this change, pg_stat_subscription_workers will be like:

* subid
* subname
* subrelid
* error_count
* last_error_timestamp

This view will be extended by adding transaction statistics proposed
on another thread[1].

2. Fix a bug in pg_stat_subscription_workers. As pointed out by
Andres, there is a bug in pg_stat_subscription_workers; it doesn't
drop entries for already-dropped tables. We need to fix it.

3. Show commit-LSN of the error transaction in errcontext. Currently,
we show XID and commit timestamp in the errcontext. But given that we
use LSN in pg_subscription_error, it's better to show commit-LSN as
well (or instead of error-XID).

4. Skipping the particular conflicted transaction. There are two proposals:

4-1. Use the existing replication_origin_advance() SQL function. We
don't need to add any additional syntax

Re: shared_preload_libraries = 'pg_stat_statements' failing with installcheck (compute_query_id = auto)

2022-02-18 Thread Michael Paquier
On Tue, Feb 08, 2022 at 12:56:59PM +0900, Michael Paquier wrote:
> Well, I can see that this is a second independent complaint after a few
> months.  If you wish to keep this capability, wouldn't it be better to
> add a "regress" mode to compute_query_id, where we would automatically
> mask this information in the output of EXPLAIN but still run
> the computation?

So, I have been looking at this problem, and I don't see a problem in
doing something like the attached, where we add a "regress" mode to
compute_query_id that is a synonym of "auto". Or, in short, we keep
the default of letting a module decide if a query ID can be computed
or not, with the exception that we hide its result in EXPLAIN outputs.

Julien, what do you think?

FWIW, about your question upthread, I have noticed the behavior
while testing, but I know of some internal customers that enable
pg_stat_statements and like doing tests on PostgreSQL instances
deployed this way, so that would break.  They are not on 14 yet as far
as I know.
--
Michael
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
index a4c277269e..c670662db2 100644
--- a/src/include/utils/queryjumble.h
+++ b/src/include/utils/queryjumble.h
@@ -57,7 +57,8 @@ enum ComputeQueryIdType
 {
 	COMPUTE_QUERY_ID_OFF,
 	COMPUTE_QUERY_ID_ON,
-	COMPUTE_QUERY_ID_AUTO
+	COMPUTE_QUERY_ID_AUTO,
+	COMPUTE_QUERY_ID_REGRESS
 };
 
 /* GUC parameters */
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index b970997c34..fae3926b42 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -604,7 +604,13 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	/* Create textual dump of plan tree */
 	ExplainPrintPlan(es, queryDesc);
 
-	if (es->verbose && plannedstmt->queryId != UINT64CONST(0))
+	/*
+	 * COMPUTE_QUERY_ID_REGRESS means COMPUTE_QUERY_ID_AUTO, but we don't
+	 * show the query ID in any EXPLAIN output, so as to keep the results
+	 * generated by regression test suites stable.
+	 */
+	if (es->verbose && plannedstmt->queryId != UINT64CONST(0) &&
+		compute_query_id != COMPUTE_QUERY_ID_REGRESS)
 	{
 		/*
 		 * Output the queryid as an int64 rather than a uint64 so we match
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 01f373815e..7438df9109 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -412,6 +412,7 @@ static const struct config_enum_entry backslash_quote_options[] = {
  */
 static const struct config_enum_entry compute_query_id_options[] = {
 	{"auto", COMPUTE_QUERY_ID_AUTO, false},
+	{"regress", COMPUTE_QUERY_ID_REGRESS, false},
 	{"on", COMPUTE_QUERY_ID_ON, false},
 	{"off", COMPUTE_QUERY_ID_OFF, false},
 	{"true", COMPUTE_QUERY_ID_ON, true},
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index e6f71c7582..a4612d9a8a 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -1925,8 +1925,9 @@ create_database(const char *dbname)
 	 "ALTER DATABASE \"%s\" SET lc_numeric TO 'C';"
 	 "ALTER DATABASE \"%s\" SET lc_time TO 'C';"
 	 "ALTER DATABASE \"%s\" SET bytea_output TO 'hex';"
+	 "ALTER DATABASE \"%s\" SET compute_query_id TO 'regress';"
 	 "ALTER DATABASE \"%s\" SET timezone_abbreviations TO 'Default';",
-	 dbname, dbname, dbname, dbname, dbname, dbname);
+	 dbname, dbname, dbname, dbname, dbname, dbname, dbname);
 	psql_end_command(buf, "postgres");
 
 	/*
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d99bf38e67..036e6da680 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7934,9 +7934,11 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
 method is not acceptable.  In this case, in-core computation
 must be always disabled.
 Valid values are off (always disabled),
-on (always enabled) and auto,
+on (always enabled), auto,
 which lets modules such as 
-automatically enable it.
automatically enable it, and regress, which
+has the same effect as auto, except that the
+query identifier is hidden in the EXPLAIN output.
 The default is auto.




signature.asc
Description: PGP signature


Re: logical replication empty transactions

2022-02-18 Thread Ajin Cherian
On Wed, Feb 16, 2022 at 2:15 PM osumi.takami...@fujitsu.com
 wrote:
> Another idea would be to create an empty file under pg_replslot/slotname
> with a prefix different from "xid" in DecodePrepare before the shutdown
> if the prepare was empty, bypass the cleanup of the serialized txns,
> and check for the file's existence after the restart. But this is pretty
> ad-hoc, and I wasn't sure if addressing the corner case of the restart
> has a strong enough justification to create this new file format.
>

Yes, this doesn't look very efficient.

> Therefore, in my humble opinion, the idea of protocol change slightly wins,
> since the impact of the protocol change would not be big. We introduced
> the protocol version 3 in the devel version and the number of users should be
> small.

Yes, but we don't want to break backward compatibility for this small
added optimization.

Amit,

I will work on your comments.

regards,
Ajin Cherian
Fujitsu Australia