date:20220911

Re: Expand palloc/pg_malloc API

2022-09-11 Thread Peter Eisentraut


On 09.09.22 22:13, Tom Lane wrote:

Peter Eisentraut  writes:

I have updated this patch set to rename the _obj() functions to
_object(), and I have dropped the _ptrtype() variants.



I have also split the patch to put the new API and the example uses into
separate patches.


This patch set seems fine to me, so I've marked it Ready for Committer.


committed


I think serious consideration should be given to back-patching the
0001 part (that is, addition of the macros).  Otherwise we'll have
to remember not to use these macros in code intended for back-patch,
and that'll be mighty annoying once we are used to them.


Yes, the 0001 patch is kept separate so that we can do that when we feel 
the time is right.

Re: [PATCH]Feature improvement for MERGE tab completion

2022-09-11 Thread bt22kawamotok


else if (TailMatches("USING", MatchAny, "ON", MatchAny, "WHEN"))
COMPLETE_WITH("MATCHED", "NOT MATCHED");
else if (TailMatches("USING", MatchAny, "AS", MatchAny, "ON",
MatchAny, "WHEN"))
COMPLETE_WITH("MATCHED", "NOT MATCHED");
	else if (TailMatches("USING", MatchAny, MatchAny, "ON", MatchAny, 
"WHEN"))

COMPLETE_WITH("MATCHED", "NOT MATCHED");

I thought it would be better to describe this section as follows,
summarizing the conditions

else if (TailMatches("USING", MatchAny, "ON", MatchAny, "WHEN") ||
			 TailMatches("USING", MatchAny, "AS", MatchAny, "ON", MatchAny, 
"WHEN") ||

 TailMatches("USING", MatchAny, MatchAny, "ON", MatchAny, 
"WHEN"))
COMPLETE_WITH("MATCHED", "NOT MATCHED");

There are similar redundancies in the tab completion of MERGE
statement, so why not fix that as well?


Thanks for your review.

A new patch has been created to reflect the changes you indicated.
I also found a problem that "DO NOTHING" is not completed when type 
"WHEN MATCHED ", so I fixed it as well.


Regards,

Kotaro Kawamotodiff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 62a39779b9..c05b52e49a 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -4068,7 +4068,7 @@ psql_completion(const char *text, int start, int end)
 	/* with [AS] alias */
 	else if (TailMatches("MERGE", "INTO", MatchAny, "AS", MatchAny))
 		COMPLETE_WITH("USING");
-	else if (TailMatches("MERGE", "INTO", MatchAny, MatchAny))
+	else if (TailMatches("MERGE", "INTO", MatchAny, MatchAny) && !TailMatches("AS"))
 		COMPLETE_WITH("USING");
 	else if (TailMatches("MERGE", "INTO", MatchAny, "AS", MatchAny, "USING"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables);
@@ -4095,12 +4095,16 @@ psql_completion(const char *text, int start, int end)
 		COMPLETE_WITH("WHEN MATCHED", "WHEN NOT MATCHED");
 	else if (TailMatches("USING", MatchAny, MatchAny, "ON", MatchAny))
 		COMPLETE_WITH("WHEN MATCHED", "WHEN NOT MATCHED");
+	else if (TailMatches("USING", MatchAny, "ON", MatchAny, "WHEN")) ||
+			(TailMatches("USING", MatchAny, "AS", MatchAny, "ON", MatchAny, "WHEN")) ||
+			(TailMatches("USING", MatchAny, MatchAny, "ON", MatchAny, "WHEN"))
+		COMPLETE_WITH("MATCHED", "NOT MATCHED");
 	else if (TailMatches("WHEN", "MATCHED"))
 		COMPLETE_WITH("THEN", "AND");
 	else if (TailMatches("WHEN", "NOT", "MATCHED"))
 		COMPLETE_WITH("THEN", "AND");
 	else if (TailMatches("WHEN", "MATCHED", "THEN"))
-		COMPLETE_WITH("UPDATE", "DELETE");
+		COMPLETE_WITH("UPDATE", "DELETE", "DO NOTHING");
 	else if (TailMatches("WHEN", "NOT", "MATCHED", "THEN"))
 		COMPLETE_WITH("INSERT", "DO NOTHING");

pg_stat_statements locking

2022-09-11 Thread Andrey Borodin

Hi hackers!

Recently I observed very peculiar incident.

=== Incident description ===

ETL database was operating fine for many months, regularly updated etc. 
Workload was not changing much, but as far as it was ETL database - most of 
queries were different all the time.
On the night of September 7th database was stuck, no monitoring query could be 
executed. DBAs started to deal with the incident, but there is not much to do 
with database service, when you cannot execute a single query. According to VM 
metrics, VM was writing a lot on disk.
On-call engineer was summoned. He observed a lot of backends stuck with similar 
backtrace
#5  LWLockAcquire()
#6  pgss_store()
#7  pgss_post_parse_analyze()
#8  parse_analyze()
#9  pg_analyze_and_rewrite()

After restart, problem reproduced within 50 minutes. But monitoring queries 
were operating, what showed that all backends were stuck in LWLock 
pg_stat_statements. It was impossible to disable pgss with SQL, so engineer 
altered auto.conf and restarted database. This resolved the incident.

Later I was working on analyzing the incident. Enabling pgss back showed traces 
of the problem:
Fri 09 Sep 2022 08:52:31 AM MSK (every 2s)

 usename |state| wait_event | cnt 
-+-++-
 content | active  | DataFileRead   |   1
 content | active  | pg_stat_statements |  42
 content | idle in transaction | ClientRead |   2
 pgwatch_monitor | active  | BufFileRead|   1
 pgwatch_monitor | active  | [null] |   5
 pgwatch_monitor | active  | pg_stat_statements |  85
 postgres| active  | [null] |   1
 repl| active  | WalSenderMain  |   2
 [null]  | active  | [null] |   2
 [null]  | active  | VacuumDelay|   7
(10 rows)

pgwatch was quering 60 databases, every minute and each call to 
pg_stat_statements() took approximately 3-4 seconds.
Backend that was in charge of grand lock was looking like this in 
pg_stat_statements:

datid| 127782
pid  | 4077900
usename  | pgwatch_monitor
application_name | pgwatch2 - 10.96.17.68
wait_event_type  | IO
wait_event   | BufFileWrite
state| active
backend_xid  | [null]
backend_xmin | 67048029

The contents of pg_stat_statements view overrun work_mem and were materialized 
in tuplestore on disk. This is what cause a lot of disk write on database that 
was not accepting any user query.



TLDR: LWLock "pg_stat_statements" disabled all SQL queries.


=== How the problem develops ===
Prerequisite 1. pgwatch is quering pgss often.
Prerequisite 2. pgss becomes big so that tuplestore is written on disk, while 
holding shared lock.
Prerequisite 3. Someone is calling reset() or pgss_store() needing exclusive 
lock.

Consequence. Exclusive lock queues after long held shared lock and prevents all 
shared locks to be taken.
Result. Any query calling pgss hooks hangs.


=== Reproduction for development purposes ===
0. Setup a database with pg_stat_statements.track = all.
1. Modify pg_stat_statements_internal() to wait for a long time under 
LWLockAcquire(pgss->lock, LW_SHARED).
2. select * from pg_stat_statements()
3. select pg_stat_statements_reset()

Now the database is bonked. Any query will hang until pg_stat_statements() 
finishes.


=== How to fix ===
pgss uses LWLock to protect hashtable.
When the hashtable is reset or new user query is inserted in pgss_store() - 
exclusive lock is used.
When stats are updated for known query or pg_stat_statements are read - shared 
lock is used.

I propose to swap shared lock for stats update to conditional shared lock.
It cannot be taken during reset() - and that's totally fine. It cannot be taken 
during entering new query - and I think it's OK to spill some stats in this 
case. PFA patch attached.

This patch prevents from a disaster incurred by described here locking.


=== Other possible mitigation ===
The incident would not occur if tuplestore did not convert into on-disk 
representation. But I don't see how to reliably protect from this.

What do you think?


Thank!


Best regards, Andrey Borodin.


v1-0001-Demonstrate-and-fix-lock-of-all-SQL-queries-by-pg.patch
Description: Binary data

Re: Switching XLog source from archive to streaming when primary available

2022-09-11 Thread Bharath Rupireddy

On Mon, Sep 12, 2022 at 9:03 AM Bharath Rupireddy
 wrote:
>
> Please review the attached v4 patch addressing above review comments.

Oops, there's a compiler warning [1] with the v4 patch, fixed it.
Please review the attached v5 patch.

[1] https://cirrus-ci.com/task/5730076611313664?logs=gcc_warning#L450

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 857415757ec21bd8b0195c17694618a1cbf22a57 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Mon, 12 Sep 2022 04:44:35 +
Subject: [PATCH v5] Allow standby to switch WAL source from archive to
 streaming replication

A standby typically switches to streaming replication (get WAL
from primary), only when receive from WAL archive finishes (no
more WAL left there) or fails for any reason. Reading WAL from
archive may not always be efficient and cheaper because network
latencies, disk IO cost might differ on the archive as compared
to primary and often the archive may sit far from standby
impacting recovery performance on the standby. Hence reading WAL
from the primary, by setting this parameter, as opposed to the
archive enables the standby to catch up with the primary sooner
thus reducing replication lag and avoiding WAL files accumulation
on the primary.

This feature adds a new GUC that specifies amount of time after
which standby attempts to switch WAL source from WAL archive to
streaming replication. If the standby fails to switch to stream
mode, it falls back to archive mode.

Reported-by: SATYANARAYANA NARLAPURAM
Author: Bharath Rupireddy
Reviewed-by: Cary Huang, Nathan Bossart
Reviewed-by: Kyotaro Horiguchi
Discussion: https://www.postgresql.org/message-id/CAHg+QDdLmfpS0n0U3U+e+dw7X7jjEOsJJ0aLEsrtxs-tUyf5Ag@mail.gmail.com
---
 doc/src/sgml/config.sgml  |  45 +++
 src/backend/access/transam/xlogrecovery.c | 124 +++--
 src/backend/utils/misc/guc.c  |  12 ++
 src/backend/utils/misc/postgresql.conf.sample |   4 +
 src/include/access/xlogrecovery.h |   1 +
 src/test/recovery/t/034_wal_source_switch.pl  | 126 ++
 6 files changed, 301 insertions(+), 11 deletions(-)
 create mode 100644 src/test/recovery/t/034_wal_source_switch.pl

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a5cd4e44c7..694da93b8c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4840,6 +4840,51 @@ ANY num_sync ( 
+  streaming_replication_retry_interval (integer)
+  
+   streaming_replication_retry_interval configuration parameter
+  
+  
+  
+   
+Specifies amount of time after which standby attempts to switch WAL
+source from WAL archive to streaming replication (get WAL from
+primary). If the standby fails to switch to stream mode, it falls back
+to archive mode.
+If this value is specified without units, it is taken as milliseconds.
+The default is five minutes (5min).
+With a lower setting of this parameter, the standby makes frequent
+WAL source switch attempts when the primary is lost for quite longer.
+To avoid this, set a reasonable value.
+A setting of 0 disables the feature. When disabled,
+the standby typically switches to stream mode, only when receive from
+WAL archive finishes (no more WAL left there) or fails for any reason.
+This parameter can only be set in
+the postgresql.conf file or on the server
+command line.
+   
+   
+Reading WAL from archive may not always be efficient and cheaper
+because network latencies, disk IO cost might differ on the archive as
+compared to primary and often the archive may sit far from standby
+impacting recovery performance on the standby. Hence reading WAL
+from the primary, by setting this parameter, as opposed to the archive
+enables the standby to catch up with the primary sooner thus reducing
+replication lag and avoiding WAL files accumulation on the primary.
+  
+   
+Note that the standby may not always attempt to switch source from
+WAL archive to streaming replication at exact
+streaming_replication_retry_interval intervals.
+For example, if the parameter is set to 1min and
+fetching from WAL archive takes 5min, then the
+source switch attempt happens for the next WAL after current WAL is
+fetched from WAL archive and applied.
+  
+  
+ 
+
  
   recovery_min_apply_delay (integer)
   
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 9a80084a68..35f7985e65 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -65,6 +65,12 @@
 #define RECOVERY_COMMAND_FILE	"recovery.conf"
 #define RECOVERY_COMMAND_DONE	"recovery.done"
 
+

Re: [PATCH] initdb: do not exit after warn_on_mount_point

2022-09-11 Thread Tom Lane

Julien Rouhaud  writes:
> On Sun, Sep 11, 2022 at 06:22:21PM +, andrey.ara...@nixaid.com wrote:
>> Everyone using containerized postgresql image cannot use /var/lib/postgresql
>> as the mountpoint but has to use /var/lib/postgresql/data instead due to this
>> issue [4] due to [5].

> initdb had this behavior for almost 10 years, and for good reasons: it 
> prevents
> actual problems that were seen in the field.

The long and the short of this is that one person losing their data
outweighs thousands of people being very mildly inconvenienced by
having to create an extra level of directory.  I understand that you
don't think so, but you'd change your mind PDQ if it was *your* data
that got irretrievably corrupted.

We are not going to remove this check.

If anything, the fault I'd find with the existing code is that it's not
sufficiently thorough about detecting what is a mount point.  AFAICS,
neither the lost+found check nor the extra-dot-files check will trigger
on modern filesystems such as XFS.  I wonder if we could do something
like comparing the stat(2) st_dev numbers for the proposed data directory
vs. its parent directory.

regards, tom lane

Re: broken table formatting in psql

2022-09-11 Thread Pavel Stehule

po 12. 9. 2022 v 7:37 odesílatel John Naylor 
napsal:

> On Thu, Sep 8, 2022 at 12:51 PM Pavel Stehule 
> wrote:
> >
> >
> >> Does anyone want to advocate for backpatching these missing ranges to
> >> v12 and up? v12 still has a table in-line so trivial to remedy, but
> >> v13 and up use a script, so these exceptions would likely have to use
> >> hard-coded branches to keep from bringing in new changes.
> >>
> >> If so, does anyone want to advocate for including this patch in v15?
> >> It claims Unicode 14.0.0, and this would make that claim more
> >> technically correct as well as avoiding additional branches.
> >
> >
> > I think it can be fixed just in v15 and master.  This issue has no
> impact on SQL.
>
> Well, if the regressions from v11 are not important enough to
> backpatch, there is not as much of a case to backpatch the full fix to
> v15 either.
>

This is not a critical issue, really.  On second thought, I don't see the
point in releasing fresh Postgres with this bug, where there is know bugfix
- and this bugfix should be compatible (at this moment) with 16.

PostgreSQL 15 was not released yet.

Regards

Pavel

-- 
> John Naylor
> EDB: http://www.enterprisedb.com
>

Re: broken table formatting in psql

2022-09-11 Thread John Naylor

On Thu, Sep 8, 2022 at 12:51 PM Pavel Stehule  wrote:
>
>
>> Does anyone want to advocate for backpatching these missing ranges to
>> v12 and up? v12 still has a table in-line so trivial to remedy, but
>> v13 and up use a script, so these exceptions would likely have to use
>> hard-coded branches to keep from bringing in new changes.
>>
>> If so, does anyone want to advocate for including this patch in v15?
>> It claims Unicode 14.0.0, and this would make that claim more
>> technically correct as well as avoiding additional branches.
>
>
> I think it can be fixed just in v15 and master.  This issue has no impact on 
> SQL.

Well, if the regressions from v11 are not important enough to
backpatch, there is not as much of a case to backpatch the full fix to
v15 either.
-- 
John Naylor
EDB: http://www.enterprisedb.com

Re: Handle infinite recursion in logical replication setup

2022-09-11 Thread vignesh C

On Fri, 9 Sept 2022 at 11:12, Amit Kapila  wrote:
>
> On Thu, Sep 8, 2022 at 9:32 AM vignesh C  wrote:
> >
> >
> > The attached patch has the changes to handle the same.
> >
>
> Pushed. I am not completely sure whether we want the remaining
> documentation patch in this thread in its current form or by modifying
> it. Johnathan has shown some interest in it. I feel you can start a
> separate thread for it to see if there is any interest in the same and
> close the CF entry for this work.

Thanks for pushing the patch. I have closed this entry in commitfest.
I will wait for some more time and see the response regarding the
documentation patch and then start a new thread if required.

Regards,
Vignesh

Re: [PATCH] initdb: do not exit after warn_on_mount_point

2022-09-11 Thread Julien Rouhaud

On Sun, Sep 11, 2022 at 06:22:21PM +, andrey.ara...@nixaid.com wrote:
> September 11, 2022 12:18 PM, "Julien Rouhaud"  wrote:
>
> > Hi,
> >
> > On Sat, Sep 10, 2022 at 07:14:59PM +, andrey.ara...@nixaid.com wrote:
> >
> >> Have slightly improved the logic so that it does not report an error
> >> "directory \"%s\" exists but is not empty"
> >> when it is only supposed to warn the user about the mountpoint, without
> >> exiting.
> >>
> >> To me, my patch looks like a typo fix where exit(1) should not be called on
> >> the warn_on_mount_point(), but only warn and continue as more people are
> >> mounting the device at `/var/lib/postgresql/data` (PGDATA) in the
> >> containerized world (K8s deployments, especially now in the Akash Network I
> >> am working for) for making sure their data persist.
> >
> > This definitely isn't a typo, as it's really strongly discouraged to use a
> > mount point for the data directory. You can refer to this thread [1] for 
> > more
> > explanations.
> >
> > [1]
> > https://www.postgresql.org/message-id/flat/CAKoxK+6H40imynM5P31bf0DnpN-5f5zeROjcaj6BKVAjxdEA9w@mail.
> > mail.com
>
>
> I've read the "why not using a mountpoint as PGDATA?" thread [1] as well as
> Bugzilla "postgresql-setup fails when /var/lib/pgsql/data is mount point"
> thead [2] but could not find any good reason why not to mount the PGDATA
> directly, except probably for the NFS mount point, but who does that anyway?

What about this part in Tom's original answer:

3. If, some day, the filesystem is accidentally unmounted while the database is
up, it will continue to write into files that are now getting placed in the
mount-point directory on the parent volume.  This usually results in an
unrecoverably messed-up database by the time you realize what's going wrong.
(There are horror stories about such cases in the PG community mailing list
archives, dating from before we installed the don't-use-a-mount-point defenses
in initdb.)

> Everyone using containerized postgresql image cannot use /var/lib/postgresql
> as the mountpoint but has to use /var/lib/postgresql/data instead due to this
> issue [4] due to [5].  Hence, everyone using containerized version of
> postgresql with the device (say Ceph's RBD) mounted over
> /var/lib/postgresql/data directory has to either specify:
>
> - PGDATA=/var/lib/postgresql/data/
>
> OR
>
> make sure to remove $PGDATA/lost+found directory.

initdb had this behavior for almost 10 years, and for good reasons: it prevents
actual problems that were seen in the field.

It's unfortunate that the docker postgres images didn't take that behavior into
account, which was introduced more than a year before any work started on those
images, but if you're not happy with those workarounds it's something that
should be changed in the docker images.

> Both of these hacks are only for the initdb to fail detect the mountpoint
> which, on its own, is supposed to be only a warning which is seen from the
> function name warn_on_mount_point() and its messages [6] look to be of only
> advisory nature to me.

As Tom confirmed at [1], you shouldn't assume anything about the criticality
based on the function name.  If anything, the "warn" part is only saying that
the function itself won't exit(1).  And yes this function is only adding
information on the reason why the given directory can't be used, but it doesn't
change the fact that the given directory can't be used.

[1] https://www.postgresql.org/message-id/813162.1662908...@sss.pgh.pa.us

Re: [PATCH] Clarify the comments about varlena header encoding

2022-09-11 Thread John Naylor

On Sun, Sep 11, 2022 at 5:06 PM Aleksander Alekseev
 wrote:
>
> Hi John,
>
> Many thanks for the feedback!
>
> > Or put the now-standard 0b in front.
>
> Good idea.

Now that I look at the results, though, it's distracting and not good
for readability. I'm not actually sure we need to do anything here,
but I am somewhat in favor of putting [un]aligned in parentheses, as
already discussed. Even there, in the first email you said:

> Also one can read this as "aligned, uncompressed
> data" (which again is wrong).

I'm not sure it rises to the level of "wrong", because a blob of bytes
immediately after an aligned uint32 is in fact aligned. The important
thing is: a zero byte is always either a padding byte or part of a
4-byte header, so it's the alignment of the header we really care
about.

> > I think the problem is ambiguity about what a "toast pointer" is. This 
> > comment:
> >
> > * struct varatt_external is a traditional "TOAST pointer", that is, the
>
> Right. The comment for varatt_external says that it IS a TOAST
> pointer.

Well, the word "traditional" is not very informative, but it is there.
And afterwards there is also varatt_indirect, varatt_expanded, and
varattrib_1b_e, which all mention "TOAST pointer".

> However the comments for varlena headers bit layout
> implicitly include it into a TOAST pointer, which contradicts the
> previous comments. I suggest we fix this ambiguity by explicitly
> enumerating the type tag in the comments for varlena headers.

- * 1000 1-byte length word, unaligned, TOAST pointer
+ * 0b1000 1-byte length word (unaligned), type tag, TOAST pointer

This is distracting from the point of this whole comment, which, I
will say again is: How to look at the first byte to determine what
kind of varlena we're looking at. There is no reason to mention the
type tag here, at all.

- * In TOAST pointers the va_tag field (see varattrib_1b_e) is used to discern
- * the specific type and length of the pointer datum.
+ * For the TOAST pointers the type tag (see varattrib_1b_e.va_tag field) is
+ * used to discern the specific type and length of the pointer datum.

I don't think this clarifies anything, it's just a rephrasing.

More broadly, I think the description of varlenas in this header is at
a kind of "local maximum" -- minor adjustments are more likely to make
it worse. To significantly improve clarity might require a larger
rewriting, but I'm not personally interested in taking part in that.

-- 
John Naylor
EDB: http://www.enterprisedb.com

RE: why can't a table be part of the same publication as its schema

2022-09-11 Thread houzj.f...@fujitsu.com

On Monday, September 12, 2022 1:08 AM Mark Dilger 
 wrote:
> > > On Sep 10, 2022, at 4:17 PM, Robert Haas  wrote:
> >
> >>> I don't understand why we
> >>> used this ALL TABLES IN SCHEMA language.
> >>
> >> The conversation, as I recall, was that "ADD SCHEMA foo" would only mean
> all tables in foo, until publication of other object types became supported, 
> at
> which point "ADD SCHEMA foo" would suddenly mean more than it did before.
> People might find that surprising, so the "ALL TABLES IN" was intended to
> future-proof against surprising behavioral changes.
> >
> > If I encountered this syntax in a vacuum, that's not what I would
> > think. I would think that ADD ALL TABLES IN SCHEMA meant add all the
> > tables in the schema to the publication one by one as individual
> > objects
> 
> Yes, it appears the syntax was chosen to avoid one kind of confusion, but 
> created
> another kind.  Per the docs on this feature:
> 
>   FOR ALL TABLES IN SCHEMA
>   Marks the publication as one that replicates changes for all tables in the
> specified list of schemas, including tables created in the future.
> 
> Like you, I wouldn't expect that definition, given the behavior of GRANT with
> respect to the same grammatical construction.

I'm a bit unsure if it should be compared to GRANT. Because even if we chose
"ALTER PUBLICATION p1 { ADD | DROP } SCHEMA name", it's also not
consistent with the meaning of GRANT ON SCHEMA, as GRANT ON SCHEMA doesn't
grant rights on the tables within schema if I understand correctly.

I feel we'd better compare the syntax with the existing publication command:
FOR ALL TABLES. If you create a publication FOR ALL TABLES, it means publishing
all the tables in the database *including* tables created in the future. I
think both the syntax and meaning of ALL TABLES IN SCHEMA are consistent with
the existing FOR ALL TABLES.

And the behavior is clearly documented, so personally I think it's fine.
https://www.postgresql.org/docs/devel/sql-createpublication.html
--
FOR ALL TABLES
Marks the publication as one that replicates changes for all tables in 
the database, including tables created in the future.
FOR ALL TABLES IN SCHEMA
Marks the publication as one that replicates changes for all tables in 
the specified list of schemas, including tables created in the future.
--

Besides, as mentioned(and suggested by Tom[1]), we might support publishing
SEQUENCE(or others) in the future. It would give more flexibility to user if we
have another FOR ALL SEQUENCES(or other objects) IN SCHEMA.

[1] https://www.postgresql.org/message-id/155565.1628954580%40sss.pgh.pa.us

Best regards,
Hou zj

Re: list of acknowledgments for PG15

2022-09-11 Thread Etsuro Fujita

On Thu, Sep 8, 2022 at 9:13 PM Peter Eisentraut
 wrote:
> Attached is the plain-text list of acknowledgments for the PG15 release
> notes, current through REL_15_BETA4.  Please check for problems such as
> wrong sorting, duplicate names in different variants, or names in the
> wrong order etc.  (Note that the current standard is given name followed
> by surname, independent of cultural origin.)

Thanks as usual!

I think these are Japanese names that are in the
surname-followed-by-given-name order:

Kamigishi Rei
Kawamoto Masaya
Okano Naoki

Best regards,
Etsuro Fujita

Re: Switching XLog source from archive to streaming when primary available

2022-09-11 Thread Bharath Rupireddy

On Sat, Sep 10, 2022 at 3:35 AM Nathan Bossart  wrote:
>
> Well, for wal_keep_size, using bytes makes sense.  Given you know how much
> disk space you have, you can set this parameter accordingly to avoid
> retaining too much of it for standby servers.  For your proposed parameter,
> it's not so simple.  The same setting could have wildly different timing
> behavior depending on the server.  I still think that a timeout is the most
> intuitive.

Hm. In v3 patch, I've used the timeout approach, but tracking the
duration server spends in XLOG_FROM_ARCHIVE as opposed to tracking
last failed time in streaming from primary.

> So I think it
> would be difficult for the end user to decide on a value.  However, even
> the timeout approach has this sort of problem.  If your parameter is set to
> 1 minute, but the current archive takes 5 minutes to recover, you won't
> really be testing streaming replication once a minute.  That would likely
> need to be documented.

Added a note in the docs.

On Fri, Sep 9, 2022 at 10:46 AM Kyotaro Horiguchi
 wrote:
>
> Being late for the party.

Thanks for reviewing this.

> It seems to me that the function is getting too long.  I think we
> might want to move the core part of the patch into another function.

Yeah, the WaitForWALToBecomeAvailable() (without this patch) has
around 460 LOC out of which WAL fetching from the chosen source is of
240 LOC, IMO, this code will be a candidate for a new function. I
think that part can be discussed separately.

Having said that, I moved the new code to a new function.

> I think it might be better if intentionalSourceSwitch doesn't need
> lastSourceFailed set. It would look like this:
>
> >  if (lastSourceFailed || switchSource)
> >  {
> > if (nonblocking && lastSourceFailed)
> >return XLREAD_WOULDBLOCK;

I think the above looks good, done that way in the latest patch.

> I don't think the flag first_time is needed.

Addressed this in the v4 patch.

Please review the attached v4 patch addressing above review comments.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

v4-0001-Allow-standby-to-switch-WAL-source-from-archive-t.patch
Description: Binary data

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-11 Thread Michael Paquier

On Mon, Sep 12, 2022 at 02:34:48PM +1200, Thomas Munro wrote:
> On Mon, Sep 12, 2022 at 2:27 PM Justin Pryzby  wrote:
>> Yeah ... I just realized that I've already forgotten the relevant
>> chronology.
>>
>> The io_concurrency bugfix wasn't included in 15b4, so (if I understood
>> you correctly), that might explain these symptoms - right ?
> 
> Yeah.

Could you double-check if the issues you are seeing persist after
syncing up with the latest point of REL_15_STABLE?  For now, I have
added an open item just to be on the safe side.
--
Michael


signature.asc
Description: PGP signature

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-11 Thread Thomas Munro

On Mon, Sep 12, 2022 at 2:27 PM Justin Pryzby  wrote:
> On Mon, Sep 12, 2022 at 02:25:48PM +1200, Thomas Munro wrote:
> > On Mon, Sep 12, 2022 at 1:42 PM Justin Pryzby  wrote:
> > > But yesterday I started from initdb and restored this cluster from 
> > > backup, and
> > > started up sqlsmith, and sent some kill -9, and now got more corruption.
> > > Looks like it took ~10 induced crashes before this happened.
> >
> > $SUBJECT says 15b4, which doesn't have the fix.  Are you still using
> > maintainance_io_concurrent=0?
>
> Yeah ... I just realized that I've already forgotten the relevant
> chronology.
>
> The io_concurrency bugfix wasn't included in 15b4, so (if I understood
> you correctly), that might explain these symptoms - right ?

Yeah.

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-11 Thread Justin Pryzby

On Mon, Sep 12, 2022 at 02:25:48PM +1200, Thomas Munro wrote:
> On Mon, Sep 12, 2022 at 1:42 PM Justin Pryzby  wrote:
> > On Mon, Sep 12, 2022 at 10:44:38AM +1200, Thomas Munro wrote:
> > > On Sat, Sep 10, 2022 at 5:44 PM Justin Pryzby  
> > > wrote:
> > > > < 2022-09-09 19:37:25.835 CDT telsasoft >ERROR:  MultiXactId 133553154 
> > > > has not been created yet -- apparent wraparound
> > >
> > > I guess what happened here is that after one of your (apparently
> > > several?) OOM crashes, crash recovery didn't run all the way to the
> > > true end of the WAL due to the maintenance_io_concurrency=0 bug.  In
> > > the case you reported, it couldn't complete an end-of-recovery
> > > checkpoint until you disabled recovery_prefetch, but that's only
> > > because of the somewhat unusual way that vismap pages work.  In
> > > another case it might have been able to (bogusly) complete a
> > > checkpoint, leaving things in an inconsistent state.
> >
> > I think you're saying is that this can be explained by the
> > io_concurrency bug in recovery_prefetch, if run under 15b3.
> 
> Well I don't know, but it's one way I could think of that you could
> have a data page referring to a multixact that isn't on disk after
> recovery (because the data page happens to have been flushed, but we
> didn't replay the WAL that would create the multixact).
> 
> > But yesterday I started from initdb and restored this cluster from backup, 
> > and
> > started up sqlsmith, and sent some kill -9, and now got more corruption.
> > Looks like it took ~10 induced crashes before this happened.
> 
> $SUBJECT says 15b4, which doesn't have the fix.  Are you still using
> maintainance_io_concurrent=0?

Yeah ... I just realized that I've already forgotten the relevant
chronology.

The io_concurrency bugfix wasn't included in 15b4, so (if I understood
you correctly), that might explain these symptoms - right ?

-- 
Justin

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-11 Thread Thomas Munro

On Mon, Sep 12, 2022 at 1:42 PM Justin Pryzby  wrote:
> On Mon, Sep 12, 2022 at 10:44:38AM +1200, Thomas Munro wrote:
> > On Sat, Sep 10, 2022 at 5:44 PM Justin Pryzby  wrote:
> > > < 2022-09-09 19:37:25.835 CDT telsasoft >ERROR:  MultiXactId 133553154 
> > > has not been created yet -- apparent wraparound
> >
> > I guess what happened here is that after one of your (apparently
> > several?) OOM crashes, crash recovery didn't run all the way to the
> > true end of the WAL due to the maintenance_io_concurrency=0 bug.  In
> > the case you reported, it couldn't complete an end-of-recovery
> > checkpoint until you disabled recovery_prefetch, but that's only
> > because of the somewhat unusual way that vismap pages work.  In
> > another case it might have been able to (bogusly) complete a
> > checkpoint, leaving things in an inconsistent state.
>
> I think you're saying is that this can be explained by the
> io_concurrency bug in recovery_prefetch, if run under 15b3.

Well I don't know, but it's one way I could think of that you could
have a data page referring to a multixact that isn't on disk after
recovery (because the data page happens to have been flushed, but we
didn't replay the WAL that would create the multixact).

> But yesterday I started from initdb and restored this cluster from backup, and
> started up sqlsmith, and sent some kill -9, and now got more corruption.
> Looks like it took ~10 induced crashes before this happened.

$SUBJECT says 15b4, which doesn't have the fix.  Are you still using
maintainance_io_concurrent=0?

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-11 Thread Peter Geoghegan

On Sun, Sep 11, 2022 at 6:42 PM Justin Pryzby  wrote:
> I think you're saying is that this can be explained by the
> io_concurrency bug in recovery_prefetch, if run under 15b3.
>
> But yesterday I started from initdb and restored this cluster from backup, and
> started up sqlsmith, and sent some kill -9, and now got more corruption.
> Looks like it took ~10 induced crashes before this happened.

Have you tested fsync on the system?

The symptoms here are all over the place. This assertion failure seems
like a pretty good sign that the problems happen during recovery, or
because basic guarantees needed by for crash safety aren't met:

> #2  0x00962c5c in ExceptionalCondition 
> (conditionName=conditionName@entry=0x9ce238 "P_ISLEAF(opaque) && 
> !P_ISDELETED(opaque)", errorType=errorType@entry=0x9bad97 "FailedAssertion",
> fileName=fileName@entry=0x9cdcd1 "nbtpage.c", 
> lineNumber=lineNumber@entry=1778) at assert.c:69
> #3  0x00507e34 in _bt_rightsib_halfdeadflag 
> (rel=rel@entry=0x7f4138a238a8, leafrightsib=leafrightsib@entry=53) at 
> nbtpage.c:1778
> #4  0x00507fba in _bt_mark_page_halfdead 
> (rel=rel@entry=0x7f4138a238a8, leafbuf=leafbuf@entry=13637, 
> stack=stack@entry=0x144ca20) at nbtpage.c:2121

This shows that the basic rules for page deletion have somehow
seemingly been violated. It's as if a page deletion went ahead, but
didn't work as an atomic operation -- there were some lost writes for
some but not all pages. Actually, it looks like a mix of states from
before and after both the first and the second phases of page deletion
-- so not just one atomic operation.

-- 
Peter Geoghegan

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-11 Thread Justin Pryzby

On Mon, Sep 12, 2022 at 10:44:38AM +1200, Thomas Munro wrote:
> On Sat, Sep 10, 2022 at 5:44 PM Justin Pryzby  wrote:
> > < 2022-09-09 19:37:25.835 CDT telsasoft >ERROR:  MultiXactId 133553154 has 
> > not been created yet -- apparent wraparound
> 
> I guess what happened here is that after one of your (apparently
> several?) OOM crashes, crash recovery didn't run all the way to the
> true end of the WAL due to the maintenance_io_concurrency=0 bug.  In
> the case you reported, it couldn't complete an end-of-recovery
> checkpoint until you disabled recovery_prefetch, but that's only
> because of the somewhat unusual way that vismap pages work.  In
> another case it might have been able to (bogusly) complete a
> checkpoint, leaving things in an inconsistent state.

I think you're saying is that this can be explained by the
io_concurrency bug in recovery_prefetch, if run under 15b3.

But yesterday I started from initdb and restored this cluster from backup, and
started up sqlsmith, and sent some kill -9, and now got more corruption.
Looks like it took ~10 induced crashes before this happened.

At the moment, I have no reason to believe this issue is related to
prefetch_recovery; I am wondering about changes to vacuum.

< 2022-09-11 20:19:03.071 CDT telsasoft >ERROR:  MultiXactId 732646 has not 
been created yet -- apparent wraparound
< 2022-09-11 20:24:00.530 CDT telsasoft >ERROR:  MultiXactId 732646 has not 
been created yet -- apparent wraparound

Program terminated with signal 6, Aborted.
#0  0x7f413716b1f7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install 
glibc-2.17-196.el7_4.2.x86_64 libgcc-4.8.5-44.el7.x86_64 
libxml2-2.9.1-6.el7_9.6.x86_64 lz4-1.8.3-1.el7.x86_64 
xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0  0x7f413716b1f7 in raise () from /lib64/libc.so.6
#1  0x7f413716c8e8 in abort () from /lib64/libc.so.6
#2  0x00962c5c in ExceptionalCondition 
(conditionName=conditionName@entry=0x9ce238 "P_ISLEAF(opaque) && 
!P_ISDELETED(opaque)", errorType=errorType@entry=0x9bad97 "FailedAssertion", 
fileName=fileName@entry=0x9cdcd1 "nbtpage.c", 
lineNumber=lineNumber@entry=1778) at assert.c:69
#3  0x00507e34 in _bt_rightsib_halfdeadflag 
(rel=rel@entry=0x7f4138a238a8, leafrightsib=leafrightsib@entry=53) at 
nbtpage.c:1778
#4  0x00507fba in _bt_mark_page_halfdead (rel=rel@entry=0x7f4138a238a8, 
leafbuf=leafbuf@entry=13637, stack=stack@entry=0x144ca20) at nbtpage.c:2121
#5  0x0050af1d in _bt_pagedel (rel=rel@entry=0x7f4138a238a8, 
leafbuf=leafbuf@entry=13637, vstate=vstate@entry=0x7ffef18de8b0) at 
nbtpage.c:2004
#6  0x0050c996 in btvacuumpage (vstate=vstate@entry=0x7ffef18de8b0, 
scanblkno=scanblkno@entry=36) at nbtree.c:1342
#7  0x0050caf8 in btvacuumscan (info=info@entry=0x7ffef18deac0, 
stats=stats@entry=0x142fb70, callback=callback@entry=0x67e89b , 
callback_state=callback_state@entry=0x1461220, cycleid=)
at nbtree.c:997
#8  0x0050cc2f in btbulkdelete (info=0x7ffef18deac0, stats=0x142fb70, 
callback=0x67e89b , callback_state=0x1461220) at nbtree.c:801
#9  0x004fc64b in index_bulk_delete (info=info@entry=0x7ffef18deac0, 
istat=istat@entry=0x0, callback=callback@entry=0x67e89b , 
callback_state=callback_state@entry=0x1461220) at indexam.c:701
#10 0x0068108c in vac_bulkdel_one_index 
(ivinfo=ivinfo@entry=0x7ffef18deac0, istat=istat@entry=0x0, 
dead_items=0x1461220) at vacuum.c:2324
#11 0x004f72ae in lazy_vacuum_one_index (indrel=, 
istat=0x0, reltuples=, vacrel=vacrel@entry=0x142f100) at 
vacuumlazy.c:2726
#12 0x004f738b in lazy_vacuum_all_indexes 
(vacrel=vacrel@entry=0x142f100) at vacuumlazy.c:2328
#13 0x004f75df in lazy_vacuum (vacrel=vacrel@entry=0x142f100) at 
vacuumlazy.c:2261
#14 0x004f7f14 in lazy_scan_heap (vacrel=vacrel@entry=0x142f100) at 
vacuumlazy.c:1264
#15 0x004f895f in heap_vacuum_rel (rel=0x7f4138a67c00, 
params=0x143cbec, bstrategy=0x143ea20) at vacuumlazy.c:534
#16 0x0067f62b in table_relation_vacuum (bstrategy=, 
params=0x143cbec, rel=0x7f4138a67c00) at 
../../../src/include/access/tableam.h:1680
#17 vacuum_rel (relid=1249, relation=, 
params=params@entry=0x143cbec) at vacuum.c:2086
#18 0x0068065c in vacuum (relations=0x144a118, 
params=params@entry=0x143cbec, bstrategy=, 
bstrategy@entry=0x143ea20, isTopLevel=isTopLevel@entry=true) at vacuum.c:475
#19 0x00796a0e in autovacuum_do_vac_analyze (tab=tab@entry=0x143cbe8, 
bstrategy=bstrategy@entry=0x143ea20) at autovacuum.c:3149
#20 0x007987bf in do_autovacuum () at autovacuum.c:2472
#21 0x00798e72 in AutoVacWorkerMain (argc=argc@entry=0, 
argv=argv@entry=0x0) at autovacuum.c:1715
#22 0x00798eed in StartAutoVacWorker () at autovacuum.c:1493
#23 0x0079fe49 in StartAutovacuumWorker () at postmaster.c:5534
#24 0x007a0c44 in sigusr1_handler (postgres_signal_arg=) 
at postmaster.c:5239
#25 
#26 0x7f4

Re: Background writer and checkpointer in crash recovery

2022-09-11 Thread Justin Pryzby

On Tue, Aug 03, 2021 at 02:19:22PM +1200, Thomas Munro wrote:
> On Tue, Aug 3, 2021 at 9:52 AM Thomas Munro  wrote:
> > On Tue, Aug 3, 2021 at 1:17 AM Robert Haas  wrote:
> > > That's great. I just realized that this leaves us with identical
> > > RequestCheckpoint() calls in two nearby places. Is there any reason
> > > not to further simplify as in the attached?
> >
> > LGTM.
> 
> And pushed.

Gripe: this made the "ps" display worse than before.

7ff23c6d2 Run checkpointer and bgwriter in crash recovery.

A couple years ago, I complained that during the end-of-recovery
checkpoint, the "ps" display still said "recovering N", which made
it look like it was stuck on a particular WAL file.

That led to commit df9274adf, which updated the startup process's "ps"
to say "end-of-recovery checkpoint".

But since the start process no longer does the checkpoint, it still
says:

postgres 19738 11433  5 19:33 ?00:00:01 postgres: startup recovering 
0001000C00FB
postgres 19739 11433  3 19:33 ?00:00:00 postgres: checkpointer 
performing end-of-recovery checkpoint

That looks inconsistent.  It'd be better if the startup process's "ps"
were cleared.

-- 
Justin

Re: Bump MIN_WINNT to 0x0600 (Vista) as minimal runtime in 16~

2022-09-11 Thread Michael Paquier

On Mon, Sep 12, 2022 at 09:42:25AM +1200, Thomas Munro wrote:
> When looking the function up it made sense to use the name ending in
> '...A', but when calling directly I think we shouldn't use the A
> suffix, we should let the  macros do that for us[1].  (I
> wondered for a moment if that would even make Windows and Unix code
> the same, but sadly not due to the extra NULL arguments.)

Good idea, I did not noticed this part.  This should work equally, so
done this way and applied.  I am keeping an eye on the buildfarm.
--
Michael


signature.asc
Description: PGP signature

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-11 Thread Thomas Munro

On Sat, Sep 10, 2022 at 5:44 PM Justin Pryzby  wrote:
> < 2022-09-09 19:37:25.835 CDT telsasoft >ERROR:  MultiXactId 133553154 has 
> not been created yet -- apparent wraparound

I guess what happened here is that after one of your (apparently
several?) OOM crashes, crash recovery didn't run all the way to the
true end of the WAL due to the maintenance_io_concurrency=0 bug.  In
the case you reported, it couldn't complete an end-of-recovery
checkpoint until you disabled recovery_prefetch, but that's only
because of the somewhat unusual way that vismap pages work.  In
another case it might have been able to (bogusly) complete a
checkpoint, leaving things in an inconsistent state.

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

2022-09-11 Thread Thomas Munro

On Sat, Sep 10, 2022 at 5:01 PM Justin Pryzby  wrote:
> BTW, after a number of sigabrt's, I started seeing these during
> recovery:
>
> < 2022-09-09 19:44:04.180 CDT  >LOG:  unexpected pageaddr 1214/AF0FE000 in 
> log segment 0001121400B4, offset 1040384
> < 2022-09-09 23:20:50.830 CDT  >LOG:  unexpected pageaddr 1214/CF65C000 in 
> log segment 0001121400D8, offset 6668288

That's just what it looks like when we discover the end of the WAL by
hitting a page that hasn't been overwritten yet in a recycled WAL
segment, so the pageaddr is off by a multiple of 16MB.  Depending on
timing and chance you might be more used to seeing the error where we
hit zeroes in a partially filled page, the famous 'wanted 24, got 0',
and you can also hit a fully zero-initialised page 'invalid magic
number '.  All of these are expected, and more exotic errors are
possible with power loss torn writes or on crash of a streaming
standbys where we currently fail to zero the rest of overwritten
pages.

Re: Splitting up guc.c

2022-09-11 Thread Tom Lane

Here's a v3 that gets rid of guc_hooks.c in favor of moving the
hook functions to related modules (though some did end up in
variables.c for lack of a better idea).  I also pushed all the
hook function declarations to guc_hooks.h.  Unsurprisingly,
removal of guc.h #includes from header files led to discovery
of some surprising indirect dependencies, notably a lot of places
were evidently depending on indirect inclusions of array.h.

I think this is code-complete at this point.  I'd like to not
sit on it too long, because it'll inevitably get side-swiped
by additions of new GUCs.  On the other hand, pushing it in
the middle of a CF would presumably break other people's patches.
Maybe push it at the end of this CF, to give people a month to
rebase anything that's affected?

regards, tom lane



split-up-guc-code-3.patch.gz
Description: split-up-guc-code-3.patch.gz

Re: Bump MIN_WINNT to 0x0600 (Vista) as minimal runtime in 16~

2022-09-11 Thread Thomas Munro

On Sun, Sep 11, 2022 at 12:29 PM Michael Paquier  wrote:
> On Fri, Sep 09, 2022 at 08:11:09PM +0900, Michael Paquier wrote:
> > Based on what I can see, the Windows animals seem to have digested
> > 47bd0b3 (cygwin, MinGW and MSVC), so I think that we are good.

Great, that's a lot of nice cleanup.

> The last part that's worth adjusting is ldap_start_tls_sA(), which
> would lead to the attached simplification.

-if ((r = _ldap_start_tls_sA(*ldap, NULL, NULL, NULL, NULL))
!= LDAP_SUCCESS)
+if ((r = ldap_start_tls_sA(*ldap, NULL, NULL, NULL, NULL)) !=
LDAP_SUCCESS)

When looking the function up it made sense to use the name ending in
'...A', but when calling directly I think we shouldn't use the A
suffix, we should let the  macros do that for us[1].  (I
wondered for a moment if that would even make Windows and Unix code
the same, but sadly not due to the extra NULL arguments.)

[1] 
https://docs.microsoft.com/en-us/windows/win32/api/winldap/nf-winldap-ldap_start_tls_sa

Re: New docs chapter on Transaction Management and related changes

2022-09-11 Thread Robert Treat

On Wed, Sep 7, 2022 at 8:05 AM Simon Riggs  wrote:
>
> On Tue, 6 Sept 2022 at 21:33, Justin Pryzby  wrote:
> >
> > On Tue, Sep 06, 2022 at 04:16:02PM +0100, Simon Riggs wrote:
> > > New chapter on transaction management, plus a few related changes.
> > >
> > > Markup and links are not polished yet, so please comment initially on
> > > the topics, descriptions and wording.
> >
> I've also added further notes about prepared transactions.
>
> I attach a diff against the original patch, plus a new clean copy.
>

Some feedback on the v4 patch, hopefully useful.

+
+Transactions may be started explicitly using BEGIN and COMMIT
commands, known as
+a transaction block. A transaction will also be started and ended
implicitly for
+each request when outside of a transaction block.
+

Transactions may be managed explicitly using BEGIN and COMMIT commands, known as
a transaction block.

+Committed subtransactions are recorded as committed if the main transaction
+commits.  The word subtransaction is often abbreviated to "subxact".
+

Committed subtransactions are only recorded as committed if the main
transaction commits,
otherwise any work done in a subtransaction will be rolled back or
aborted. The word subtransaction is
often abbreviated as "subxact".

+
+Subtransactions may be started explicitly by using the SAVEPOINT
command, but may
+also be started in other ways, such as PL/pgSQL's EXCEPTION clause.
PL/Python and
+PL/TCL also support explicit subtransactions. Working with the C API, users may
+also call BeginInternalSubTransaction().
+

I think this paragraph (or something similar) should be the opening
paragraph for this section, so that readers are immediately given
context for what PostgreSQL considers to be a subtransaction.

+prepared transactions that were prepared before the last checkpoint.
In the typical
+case, shorter-lived prepared transactions are stored only in shared
memory and WAL.
+Currently-prepared transactions can be inspected using the
pg_prepared_xacts view.
+

Transactions that are currently prepared can be inspected using the
pg_prepated_xacts view.

* I thought the hyphenated wording looked odd, though I understand why
you used it. We don't use it elsewhere though (just `currently
prepared` san hyphen) so re-worded to match the other wording.

Robert Treat
https://xzilla.net

Re: Splitting up guc.c

2022-09-11 Thread Tom Lane

I wrote:
> Michael Paquier  writes:
>> One part that I have found a bit strange lately about guc.c is that we
>> have mix the core machinery with the SQL-callable parts.  What do you
>> think about the addition of a gucfuncs.c in src/backend/utils/adt/ to
>> split things a bit more?

> I might be wrong, but I think the SQL-callable stuff makes use
> of some APIs that are currently private in guc.c.  So we'd have
> to expose more API to make that possible.  Maybe that wouldn't
> be a bad thing, but it seems to be getting beyond the original
> idea here.

I tried this just to see, and it worked out better than I thought.
The key extra idea is to also pull out the functions implementing
the SET and SHOW commands, because (unsurprisingly) those are just
about in the same place dependency-wise as the SQL functions, and
they have some common subroutines.

I had to export get_config_unit_name(), config_enum_get_options(),
and _ShowOption() (here renamed to ShowGUCOption()) to make this
work.  That doesn't seem too awful.

v2 attached does this, without any further relocation of hook
functions as yet.  I now see these file sizes:

$ wc guc*c
  2629   9372  69467 guc-file.c
  6425  22282 176816 guc.c
  1048   3005  26962 guc_funcs.c
   939   2693  22915 guc_hooks.c
  4877  13163 126769 guc_tables.c
 15918  50515 422929 total
$ size guc*o
   textdata bss dec hex filename
  13653   4 112   1376935c9 guc-file.o
  46589   0 564   47153b831 guc.o
   8509   0   08509213d guc_funcs.o
   6951   0 11270631b97 guc_hooks.o
  43570   62998 216  106784   1a120 guc_tables.o

So this removes just about a thousand more lines from guc.c,
which seems worth doing.

regards, tom lane

split-up-guc-code-2.patch.gz
Description: split-up-guc-code-2.patch.gz

Re: why can't a table be part of the same publication as its schema

2022-09-11 Thread Mark Dilger

> On Sep 10, 2022, at 4:17 PM, Robert Haas  wrote:
> 
>>> I don't understand why we
>>> used this ALL TABLES IN SCHEMA language.
>> 
>> The conversation, as I recall, was that "ADD SCHEMA foo" would only mean all 
>> tables in foo, until publication of other object types became supported, at 
>> which point "ADD SCHEMA foo" would suddenly mean more than it did before.  
>> People might find that surprising, so the "ALL TABLES IN" was intended to 
>> future-proof against surprising behavioral changes.
> 
> If I encountered this syntax in a vacuum, that's not what I would
> think. I would think that ADD ALL TABLES IN SCHEMA meant add all the
> tables in the schema to the publication one by one as individual
> objects

Yes, it appears the syntax was chosen to avoid one kind of confusion, but 
created another kind.  Per the docs on this feature:

  FOR ALL TABLES IN SCHEMA
  Marks the publication as one that replicates changes for all tables in the 
specified list of schemas, including tables created in the future.

Like you, I wouldn't expect that definition, given the behavior of GRANT with 
respect to the same grammatical construction.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PATCH] initdb: do not exit after warn_on_mount_point

2022-09-11 Thread Tom Lane

Julien Rouhaud  writes:
> On Sat, Sep 10, 2022 at 07:14:59PM +, andrey.ara...@nixaid.com wrote:
>> Have slightly improved the logic so that it does not report an error
>> "directory \"%s\" exists but is not empty"
>> when it is only supposed to warn the user about the mountpoint, without
>> exiting.

> This definitely isn't a typo, as it's really strongly discouraged to use a
> mount point for the data directory.

Absolutely.  I think maybe the problem here is that warn_on_mount_point()
is a pretty bad name for that helper function, as this is not "just a
warning".

regards, tom lane

Check SubPlan clause for nonnullable rels/Vars

2022-09-11 Thread Richard Guo

Hi hackers,

While wandering around the codes of reducing outer joins, I noticed that
when determining which base rels/Vars are forced nonnullable by given
clause, we don't take SubPlan into consideration. Does anyone happen to
know what is the concern behind that?

IMO, for SubPlans of type ALL/ANY/ROWCOMPARE, we should be able to find
additional nonnullable rels/Vars by descending through their testexpr.
As we know, ALL_SUBLINK/ANY_SUBLINK combine results across tuples
produced by the subplan using AND/OR semantics. ROWCOMPARE_SUBLINK
doesn't allow multiple tuples from the subplan. So we can tell whether
the subplan is strict or not by checking its testexpr, leveraging the
existing codes in find_nonnullable_rels/vars_walker. Below is an
example:

# explain (costs off)
select * from a left join b on a.i = b.i where b.i = ANY (select i from c
where c.j = b.j);
QUERY PLAN
---
 Hash Join
   Hash Cond: (b.i = a.i)
   ->  Seq Scan on b
 Filter: (SubPlan 1)
 SubPlan 1
   ->  Seq Scan on c
 Filter: (j = b.j)
   ->  Hash
 ->  Seq Scan on a
(9 rows)

BTW, this change would also have impact on SpecialJoinInfo, especially
for the case of identity 3, because find_nonnullable_rels() is also used
to determine strict_relids from the clause. As an example, consider

select * from a left join b on a.i = b.i
left join c on b.j = ANY (select j from c);

Now we can know the SubPlan is strict for 'b'. Thus the b/c join would
be considered to be legal.

Thanks
Richard


v1-0001-Check-SubPlan-clause-for-nonnullable-rels-Vars.patch
Description: Binary data

Re: [PATCH] initdb: do not exit after warn_on_mount_point

2022-09-11 Thread Julien Rouhaud

Hi,

On Sat, Sep 10, 2022 at 07:14:59PM +, andrey.ara...@nixaid.com wrote:
>
> Have slightly improved the logic so that it does not report an error
> "directory \"%s\" exists but is not empty"
> when it is only supposed to warn the user about the mountpoint, without
> exiting.
>
> To me, my patch looks like a typo fix where exit(1) should not be called on
> the warn_on_mount_point(), but only warn and continue as more people are
> mounting the device at `/var/lib/postgresql/data` (PGDATA) in the
> containerized world (K8s deployments, especially now in the Akash Network I
> am working for) for making sure their data persist.

This definitely isn't a typo, as it's really strongly discouraged to use a
mount point for the data directory.  You can refer to this thread [1] for more
explanations.

[1] 
https://www.postgresql.org/message-id/flat/CAKoxK%2B6H40imynM5P31bf0DnpN-5f5zeROjcaj6BKVAjxdEA9w%40mail.gmail.com

Re: [PATCH] Clarify the comments about varlena header encoding

2022-09-11 Thread Aleksander Alekseev

Hi John,

Many thanks for the feedback!

> Or put the now-standard 0b in front.

Good idea.

> I think the problem is ambiguity about what a "toast pointer" is. This 
> comment:
>
> * struct varatt_external is a traditional "TOAST pointer", that is, the

Right. The comment for varatt_external says that it IS a TOAST
pointer. However the comments for varlena headers bit layout
implicitly include it into a TOAST pointer, which contradicts the
previous comments. I suggest we fix this ambiguity by explicitly
enumerating the type tag in the comments for varlena headers.

> The patch added xxx's for the type tag in a comment about
> the header. This is more misleading than what is there now.

OK, here is another attempt. Changes compared to v1:

* "xxx xxx xxx" were removed, according to the feedback
* 0b prefix was added in order to make sure the reader will not
misread this as a hex values
* The clarification about the type tag was added
* The references to "first case", "second case", etc were removed

Hopefully it's better now.

-- 
Best regards,
Aleksander Alekseev


v2-0001-Clarify-the-comments-about-varlena-header-encodin.patch
Description: Binary data

Re: [PATCH] initdb: do not exit after warn_on_mount_point

2022-09-11 Thread andrey . arapov

Hi Tom,

I've updated the patch by adding the explanation behind and more comments. 
(please see the attachment)

Have slightly improved the logic so that it does not report an error
"directory \"%s\" exists but is not empty"
when it is only supposed to warn the user about the mountpoint, without exiting.

To me, my patch looks like a typo fix where exit(1) should not be called on the 
warn_on_mount_point(),
but only warn and continue as more people are mounting the device at 
`/var/lib/postgresql/data` (PGDATA) in the containerized world (K8s 
deployments, especially now in the Akash Network I am working for) for making 
sure their data persist.

As a workaround, we either have to `rmdir /var/lib/postgresql/data/lost+found` 
before running `docker-entrypoint.sh postgres` which in turn calls the 
`initdb`, or, alternatively we have to pass 
`PGDATA=/var/lib/postgresql/data/` while mounting persistent storage 
over `/var/lib/postgresql/data` path so that it won't exit on the very first 
run.
To me, this in itself is an odd behavior, which led me to finding this typo 
(from my point of view) to which I've made this patch.

Please let me know if it makes sense or requires more information / explanation.

Kind regards,
Andrey Arapov

September 10, 2022 5:10 PM, "Tom Lane"  wrote:

> andrey.ara...@nixaid.com writes:
> 
>> please find my first patch for PostgreSQL is attached.
> 
> You haven't explained why you think this would be a good
> change, or even a safe one.
> 
> regards, tom lane

0001-initdb-do-not-exit-when-PGDATA-PGDATA-pg_wal-is-a-mo.patch
Description: Binary data

Re: Expand palloc/pg_malloc API

Re: [PATCH]Feature improvement for MERGE tab completion

pg_stat_statements locking

Re: Switching XLog source from archive to streaming when primary available

Re: [PATCH] initdb: do not exit after warn_on_mount_point

Re: broken table formatting in psql

Re: broken table formatting in psql

Re: Handle infinite recursion in logical replication setup

Re: [PATCH] initdb: do not exit after warn_on_mount_point

Re: [PATCH] Clarify the comments about varlena header encoding

RE: why can't a table be part of the same publication as its schema

Re: list of acknowledgments for PG15

Re: Switching XLog source from archive to streaming when primary available

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

Re: Background writer and checkpointer in crash recovery

Re: Bump MIN_WINNT to 0x0600 (Vista) as minimal runtime in 16~

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)

Re: Splitting up guc.c

Re: Bump MIN_WINNT to 0x0600 (Vista) as minimal runtime in 16~

Re: New docs chapter on Transaction Management and related changes

Re: Splitting up guc.c

Re: why can't a table be part of the same publication as its schema

Re: [PATCH] initdb: do not exit after warn_on_mount_point

Check SubPlan clause for nonnullable rels/Vars

Re: [PATCH] initdb: do not exit after warn_on_mount_point

Re: [PATCH] Clarify the comments about varlena header encoding

Re: [PATCH] initdb: do not exit after warn_on_mount_point

33 matches

Site Navigation

Mail list logo

Footer information