Re: SQL:2011 application time

2021-09-04 Thread Jaime Casanova
On Sat, Jul 03, 2021 at 10:46:55AM -0700, Paul A Jungwirth wrote:
> On Fri, Jul 2, 2021 at 2:39 PM Paul A Jungwirth
>  wrote:
> >
> > On Wed, Jun 30, 2021 at 10:39 AM Paul A Jungwirth
> >  wrote:
> > > Here is a set of patches to add SQL:2011 application-time support (aka
> > > valid-time).
> >
> > Here is a small fix to prevent `FOR PORTION OF valid_at FROM MAXVALUE
> > TO foo` or `FROM foo TO MINVALUE`. I rebased on latest master too.
> 
> Here is a patch set that cleans up the catalog docs for pg_period. The
> columns have changed since that was written, and also we use a
> different sgml structure on those pages now. Note pg_period still
> contains a couple essentially-unused columns, perislocal and
> perinhcount. Those are intended for supporting table inheritance, so
> I've left them in.
> 

Hi Paul,

Thanks for working on this. It would be a great improvement.

I wanted to test the patches but:

patch 01: does apply but doesn't compile, attached the compile errors.
patch 04: does not apply clean.

Please fix and resend.

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL
parse_utilcmd.c: En la función ‘generateClonedIndexStmt’:
parse_utilcmd.c:1730:2: aviso: ISO C90 prohíbe mezclar declaraciones y código 
[-Wdeclaration-after-statement]
  Period *p = makeNode(Period);
  ^~
tablecmds.c: En la función ‘ATPrepCmd’:
tablecmds.c:4637:24: error: tipo incompatible para el argumento 1 de 
‘ATSimplePermissions’
ATSimplePermissions(rel, ATT_TABLE);
^~~
tablecmds.c:403:48: nota: se esperaba ‘AlterTableType’ {también conocido como 
‘enum AlterTableType’} pero el argumento es de tipo ‘Relation’ {también 
conocido como ‘struct RelationData *’}
 static void ATSimplePermissions(AlterTableType cmdtype, Relation rel, int 
allowed_targets);
 ~~~^~~
tablecmds.c:308:23: aviso: el paso del argumento 2 de ‘ATSimplePermissions’ 
crea un puntero desde un entero sin una conversión [-Wint-conversion]
 #define  ATT_TABLE0x0001
   ^~
tablecmds.c:4637:29: nota: en expansión de macro ‘ATT_TABLE’
ATSimplePermissions(rel, ATT_TABLE);
 ^
tablecmds.c:403:66: nota: se esperaba ‘Relation’ {también conocido como ‘struct 
RelationData *’} pero el argumento es de tipo ‘int’
 static void ATSimplePermissions(AlterTableType cmdtype, Relation rel, int 
allowed_targets);
 ~^~~
tablecmds.c:4637:4: error: faltan argumentos para la función 
‘ATSimplePermissions’
ATSimplePermissions(rel, ATT_TABLE);
^~~
tablecmds.c:403:13: nota: se declara aquí
 static void ATSimplePermissions(AlterTableType cmdtype, Relation rel, int 
allowed_targets);
 ^~~
tablecmds.c:4645:24: error: tipo incompatible para el argumento 1 de 
‘ATSimplePermissions’
ATSimplePermissions(rel, ATT_TABLE);
^~~
tablecmds.c:403:48: nota: se esperaba ‘AlterTableType’ {también conocido como 
‘enum AlterTableType’} pero el argumento es de tipo ‘Relation’ {también 
conocido como ‘struct RelationData *’}
 static void ATSimplePermissions(AlterTableType cmdtype, Relation rel, int 
allowed_targets);
 ~~~^~~
tablecmds.c:308:23: aviso: el paso del argumento 2 de ‘ATSimplePermissions’ 
crea un puntero desde un entero sin una conversión [-Wint-conversion]
 #define  ATT_TABLE0x0001
   ^~
tablecmds.c:4645:29: nota: en expansión de macro ‘ATT_TABLE’
ATSimplePermissions(rel, ATT_TABLE);
 ^
tablecmds.c:403:66: nota: se esperaba ‘Relation’ {también conocido como ‘struct 
RelationData *’} pero el argumento es de tipo ‘int’
 static void ATSimplePermissions(AlterTableType cmdtype, Relation rel, int 
allowed_targets);
 ~^~~
tablecmds.c:4645:4: error: faltan argumentos para la función 
‘ATSimplePermissions’
ATSimplePermissions(rel, ATT_TABLE);
^~~
tablecmds.c:403:13: nota: se declara aquí
 static void ATSimplePermissions(AlterTableType cmdtype, Relation rel, int 
allowed_targets);
 ^~~
tablecmds.c: En la función ‘alter_table_type_to_string’:
tablecmds.c:6179:2: aviso: el valor de enumeración ‘AT_AddPeriod’ no se maneja 
en un switch [-Wswitch]
  switch (cmdtype)
  ^~
tablecmds.c:6179:2: aviso: el valor de enumeración ‘AT_DropPeriod’ no se maneja 
en un switch [-Wswitch]
tablecmds.c: En la función ‘ATExecDropPeriod’:
tablecmds.c:7968:23: error: tipo incompatible para el argumento 1 de 
‘ATSimplePermissions’
   ATSimplePermissions(rel, ATT_TABLE);
   ^~~
tablecmds.c:6325:36: nota: se esperaba ‘AlterTableType’ {también conocido como 
‘enum AlterTableType’} pero el argumento es de tipo ‘Relation’ {también 
conocido como ‘struct Rela

Re: Reduce lock level for ALTER TABLE ... ADD CHECK .. NOT VALID

2021-09-04 Thread Tom Lane
Simon Riggs  writes:
> 897795240cfaaed724af2f53ed2c50c9862f951f forgot to reduce the lock
> level for CHECK constraints when allowing them to be NOT VALID.
> This is simple and safe, since check constraints are not used in
> planning until validated.

Unfortunately, just asserting that it's safe doesn't make it so.

We have two things that we need to worry about when considering
reducing ALTER TABLE lock levels:

1. Is it semantically okay (which is what you claim above)?

2. Will onlooker processes see sufficiently-consistent catalog data
if they look at the table during the change?

Trying to reduce the lock level for ADD CHECK fails the second
test, because it has to alter two different catalogs.  It has
to increment pg_class.relchecks, and it has to make an entry in
pg_constraint.  This patch makes it possible for onlookers to
see a value of pg_class.relchecks that is inconsistent with what
they find in pg_constraint, and then they will blow up.

To demonstrate this, I applied the patch and then did this
in session 1:

regression=# create table mytable (f1 int check(f1 > 0), f2 int);
CREATE TABLE

I then started a second session, attached to it with gdb, and
set a breakpoint at CheckConstraintFetch.  Letting that session
continue, I told it

regression=# select * from mytable;

which of course reached the breakpoint at CheckConstraintFetch.
(At this point, session 2 has read the pg_class entry for mytable,
seen relchecks == 1, and now it wants to read pg_constraint.)

I then told session 1:

regression=# alter table mytable add check (f2 > 0);
ALTER TABLE

which it happily did thanks to the now-inadequate lock level.
I then released session 2 to continue, and behold it complains:

WARNING:  unexpected pg_constraint record found for relation "mytable"
LINE 1: select * from mytable;
  ^

(Pre-v14 branches would have made that an ERROR not a WARNING.)
That happens because the systable_beginscan() in CheckConstraintFetch
will get a new snapshot, so now it sees the new entry in pg_constraint,
making the count of entries inconsistent with what it found in pg_class.

It's possible that this could be made safe if we replaced the exact
"relchecks" count with a boolean "relhaschecks", analogous to the
way indexes are handled.  It's not clear to me though that the effort,
and ensuing compatibility costs for applications that look at pg_class,
would be repaid by having a bit more concurrency here.  One could
also worry about whether we really want to give up this consistency
cross-check between pg_class and pg_constraint.

regards, tom lane




Re: PG Docs - CREATE SUBSCRIPTION option list order

2021-09-04 Thread Tom Lane
Amit Kapila  writes:
> On Mon, Apr 19, 2021 at 10:32 AM Peter Smith  wrote:
>> Yes, if there were dozens of list items then I would agree that they
>> should be grouped somehow. But there aren't.

> I think this list is going to grow in the future as we enhance this
> subsystem. For example, the pending 2PC patch will add another
> parameter to this list.

Well, we've got nine now; growing to ten wouldn't be a lot.  However,
I think that grouping the options would be helpful because the existing
presentation seems extremely confusing.  In particular, I think it might
help to separate the options that only determine what happens during
CREATE SUBSCRIPTION from those that control how replication behaves later.
(Are the latter set the same ones that are shared with ALTER
SUBSCRIPTION?)

Also, I think a lot of these descriptions desperately need copy-editing.
The grammar is shoddy and the markup is inconsistent.

regards, tom lane




Re: [PATCH] Add tab-complete for backslash commands

2021-09-04 Thread Tom Lane
... BTW, I went ahead and pushed the changes in help.c,
since that part seemed uncontroversial.

regards, tom lane




Re: prevent immature WAL streaming

2021-09-04 Thread Alvaro Herrera
On 2021-Sep-03, Andres Freund wrote:

> > +#ifdef WAL_DEBUG
> > +   ereport(LOG,
> > +   (errmsg_internal("recovery overwriting broken 
> > contrecord at %X/%X (EndRecPtr: %X/%X)",
> > +
> > LSN_FORMAT_ARGS(abortedContrecordPtr),
> > +
> > LSN_FORMAT_ARGS(EndRecPtr;
> > +#endif
> 
> "broken" sounds a bit off. But then, it's just WAL_DEBUG. Which made me
> realize, isn't this missing a
> if (XLOG_DEBUG)?

Attached are the same patches as last night, except I added a test for
XLOG_DEBUG where pertinent.  (The elog(PANIC) is not made conditional on
that, since it's a cross-check rather than informative.)  Also fixed the
silly pg_rewind mistake I made.

I'll work on the new xlog record early next week.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"I can see support will not be a problem.  10 out of 10."(Simon Wittber)
  (http://archives.postgresql.org/pgsql-general/2004-12/msg00159.php)
>From 4173fc5000c9101604e6c64a795322771cb0687a Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 2 Sep 2021 17:21:46 -0400
Subject: [PATCH v4 1/4] Implement FIRST_IS_ABORTED_CONTRECORD

---
 src/backend/access/transam/xlog.c   | 55 +++--
 src/backend/access/transam/xlogreader.c | 39 +-
 src/include/access/xlog_internal.h  | 14 ++-
 src/include/access/xlogreader.h |  3 ++
 4 files changed, 105 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e51a7a749d..49912483d5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -586,6 +586,8 @@ typedef struct XLogCtlData
 	XLogRecPtr	replicationSlotMinLSN;	/* oldest LSN needed by any slot */
 
 	XLogSegNo	lastRemovedSegNo;	/* latest removed/recycled XLOG segment */
+	XLogRecPtr	abortedContrecordPtr; /* LSN of incomplete record at end of
+	   * WAL */
 
 	/* Fake LSN counter, for unlogged relations. Protected by ulsn_lck. */
 	XLogRecPtr	unloggedLSN;
@@ -848,6 +850,7 @@ static XLogSource XLogReceiptSource = XLOG_FROM_ANY;
 /* State information for XLOG reading */
 static XLogRecPtr ReadRecPtr;	/* start of last record read */
 static XLogRecPtr EndRecPtr;	/* end+1 of last record read */
+static XLogRecPtr abortedContrecordPtr;	/* end+1 of incomplete record */
 
 /*
  * Local copies of equivalent fields in the control file.  When running
@@ -2246,6 +2249,31 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic)
 		if (!Insert->forcePageWrites)
 			NewPage->xlp_info |= XLP_BKP_REMOVABLE;
 
+		/*
+		 * If the last page ended with an aborted partial continuation record,
+		 * mark the new page to indicate that the partial record can be
+		 * omitted.
+		 *
+		 * This happens only once at the end of recovery, so there's no race
+		 * condition here.
+		 */
+		if (XLogCtl->abortedContrecordPtr >= NewPageBeginPtr)
+		{
+#ifdef WAL_DEBUG
+			if (XLogCtl->abortedContrecordPtr != NewPageBeginPtr)
+elog(PANIC, "inconsistent aborted contrecord location %X/%X, expected %X/%X",
+	 LSN_FORMAT_ARGS(XLogCtl->abortedContrecordPtr),
+	 LSN_FORMAT_ARGS(NewPageBeginPtr));
+			if (XLOG_DEBUG)
+ereport(LOG,
+		(errmsg_internal("setting XLP_FIRST_IS_ABORTED_PARTIAL flag at %X/%X",
+		 LSN_FORMAT_ARGS(NewPageBeginPtr;
+#endif
+			NewPage->xlp_info |= XLP_FIRST_IS_ABORTED_PARTIAL;
+
+			XLogCtl->abortedContrecordPtr = InvalidXLogRecPtr;
+		}
+
 		/*
 		 * If first page of an XLOG segment file, make it a long header.
 		 */
@@ -4392,6 +4420,7 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
 		record = XLogReadRecord(xlogreader, &errormsg);
 		ReadRecPtr = xlogreader->ReadRecPtr;
 		EndRecPtr = xlogreader->EndRecPtr;
+		abortedContrecordPtr = xlogreader->abortedContrecordPtr;
 		if (record == NULL)
 		{
 			if (readFile >= 0)
@@ -7691,10 +7720,30 @@ StartupXLOG(void)
 	/*
 	 * Re-fetch the last valid or last applied record, so we can identify the
 	 * exact endpoint of what we consider the valid portion of WAL.
+	 *
+	 * When recovery ended in an incomplete record, continue writing from the
+	 * point where it went missing.  This leaves behind an initial part of
+	 * broken record, which rescues downstream which have already received
+	 * that first part.
 	 */
-	XLogBeginRead(xlogreader, LastRec);
-	record = ReadRecord(xlogreader, PANIC, false);
-	EndOfLog = EndRecPtr;
+	if (XLogRecPtrIsInvalid(abortedContrecordPtr))
+	{
+		XLogBeginRead(xlogreader, LastRec);
+		record = ReadRecord(xlogreader, PANIC, false);
+		EndOfLog = EndRecPtr;
+	}
+	else
+	{
+#ifdef WAL_DEBUG
+		if (XLOG_DEBUG)
+			ereport(LOG,
+	(errmsg_internal("recovery overwriting broken contrecord at %X/%X (EndRecPtr: %X/%X)",
+	 LSN_FORMAT_ARGS(abortedContrecordPtr),
+	 LSN_FORMAT_ARGS(EndRecPtr;
+#endif
+		EndOfLo

Re: [PATCH] Add tab-complete for backslash commands

2021-09-04 Thread Tom Lane
ilm...@ilmari.org (Dagfinn Ilmari =?utf-8?Q?Manns=C3=A5ker?=) writes:
> I've updated the commitfest entry to Ready for Committer.

I was about to push this but started to have second thoughts about it.
I'm not convinced that offering multiple variant spellings of the same
command is really such a great thing: I'm afraid that it will be more
confusing than helpful.  I particularly question why we'd offer both
single- and multiple-character versions, as the single-character
version seems entirely useless from a completion standpoint.

For example, up to now "\o" got you "\o ", which isn't amazingly
useful but maybe it serves to confirm that you typed a valid command.
This patch now forces you to choose between alternative spellings
of the exact same command, which is a waste of effort plus it will
make you stop to wonder whether they really are the same command.
It would be much better to either keep the old behavior, or just
immediately complete to "\out " and stay out of the user's way.

So I'd be inclined to take out the single-character versions of any
commands that we offer a longer spelling of.  I'm not dead set on that,
but I think the possibility ought to be discussed.

regards, tom lane




Re: prevent immature WAL streaming

2021-09-04 Thread Alvaro Herrera
On 2021-Sep-03, Andres Freund wrote:

> Hi,
> 
> On 2021-09-03 20:01:50 -0400, Alvaro Herrera wrote:
> > From 6abc5026f92b99d704bff527d1306eb8588635e9 Mon Sep 17 00:00:00 2001
> > From: Alvaro Herrera 
> > Date: Tue, 31 Aug 2021 20:55:10 -0400
> > Subject: [PATCH v3 1/5] Revert "Avoid creating archive status ".ready" files
> >  too early"
> 
> > This reverts commit 515e3d84a0b58b58eb30194209d2bc47ed349f5b.
> 
> I'd prefer to see this pushed soon. I've a bunch of patches to xlog.c that
> conflict with the prior changes, and rebasing back and forth isn't that much
> fun...

Done.

> > +   /*
> > +* If we were expecting a continuation record and got 
> > an "aborted
> > +* partial" flag, that means the continuation record 
> > was lost.
> > +* Ignore the record we were reading, since we now know 
> > it's broken
> > +* and lost forever, and restart the read by assuming 
> > the address
> > +* to read is the location where we found this flag.
> > +*/
> > +   if (pageHeader->xlp_info & XLP_FIRST_IS_ABORTED_PARTIAL)
> > +   {
> > +   ResetDecoder(state);
> > +   RecPtr = targetPagePtr;
> > +   goto restart;
> > +   }
> 
> I think we need to add more validation to this path. What I was proposing
> earlier is that we add a new special type of record at the start of an
> XLP_FIRST_IS_ABORTED_PARTIAL page, which contains a) lsn of the record we're
> aborting, b) checksum of the data up to this point.

Hmm, a new record type?  Yeah, we can do that, and sounds like it would
make things simpler too -- I wasn't too happy about adding clutter to
AdvanceXLInsertBuffer either, but the alternative I had in mind was that
we'd pass the flag to the checkpointer, which seemed quite annoying API
wise.  But if we add a new record, seems we can write it directly in
StartupXLOG and avoid most nastiness.  I'm not too sure about the
checksum up to this point, though, but I'll spend some time with the
idea to see how bad it is.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
Voy a acabar con todos los humanos / con los humanos yo acabaré
voy a acabar con todos (bis) / con todos los humanos acabaré ¡acabaré! (Bender)




Re: public schema default ACL

2021-09-04 Thread Noah Misch
On Thu, Sep 02, 2021 at 12:36:51PM +0200, Peter Eisentraut wrote:
> I think this patch represents the consensus.
> 
> The documentation looks okay.  Some places still refer to PostgreSQL 13,
> which should now be changed to 14.

Thanks.  I'll update s/13/14/ and/or s/14/15/ before the next step.

> I tried a couple of upgrade scenarios and it appeared to do the right thing.
> 
> This patch is actually two separate changes: First, change the owner of the
> public schema to "pg_database_owner"; second, change the default privileges
> set on the public schema by initdb.  I was a bit surprised that the former
> hadn't already be done in PG14.

Interesting.  That change requires a7a7be1, which is also not in v14.

Do you plan to change the CF entry, or should it remain in Needs Review with
no assigned reviewer?




Re: [PATCH] support tab-completion for single quote input with equal sign

2021-09-04 Thread Tom Lane
Jacob Champion  writes:
> t/010_tab_completion.pl .. 17/? 
> #   Failed test 'tab-completion after single quoted text input with equal 
> sign'
> #   at t/010_tab_completion.pl line 198.
> # Actual output was "CREATE SUBSCRIPTION my_sub CONNECTION 
> 'host=localhost port=5432 dbname=postgres' \a"
> # Did not match "(?^:PUBLICATION)"
> # Looks like you failed 1 test of 23.

Independently of the concerns I raised, I'm wondering how come you
are getting different results.  Which readline or libedit version
are you using, on what platform?

regards, tom lane




Re: [PATCH] support tab-completion for single quote input with equal sign

2021-09-04 Thread Tom Lane
I wrote:
> Surely this patch is completely wrong?  It needs more thought about
> the interaction with the existing logic for double quotes, ie single
> quote inside double quotes is not special, nor the reverse; nor should
> parentheses inside quotes be counted.  It also needs to be aware of
> backslashes in escape-style strings.

Actually ... those are just implementation details, and now that
I've thought about it a little more, I question the entire concept
of making single-quoted strings be single words in tab-complete's
view.  I think it's quite intentional that we don't do that;
if we did, it'd forever foreclose the possibility of tab-completing
*within* strings.  You don't have to look any further than CREATE
SUBSCRIPTION itself to see possible applications of that: someone
could wish that

CREATE SUBSCRIPTION my_sub CONNECTION 'db

would complete with "name=", or that  right after the quote
would offer a list of connection keywords.

(More generally, I'm afraid that people are already relying on this
behavior in other contexts, and thus that the proposed patch could
break more use-cases than it fixes.)

So now I think that this approach should be rejected, and that the
right thing is to fix the CREATE SUBSCRIPTION completion rules
to allow more than one "word" between CONNECTION and PUBLICATION.

Another idea that might be useful is to treat the opening and
closing quotes themselves as separate "words", which'd give
the CREATE SUBSCRIPTION rules a bit more to go on about when to
offer PUBLICATION.

regards, tom lane




Re: Column Filtering in Logical Replication

2021-09-04 Thread Alvaro Herrera
On 2021-Sep-04, Amit Kapila wrote:

> On Thu, Sep 2, 2021 at 2:19 PM Alvaro Herrera  wrote:
> >
> > On 2021-Sep-02, Rahila Syed wrote:
> >
> > > After thinking about this, I think it is best to remove the entire table
> > > from publication,
> > > if a column specified in the column filter is dropped from the table.
> >
> > Hmm, I think it would be cleanest to give responsibility to the user: if
> > the column to be dropped is in the filter, then raise an error, aborting
> > the drop.
> 
> Do you think that will make sense if the user used Cascade (Alter
> Table ... Drop Column ... Cascade)?

... ugh.  Since CASCADE is already defined to be a potentially-data-loss
operation, then that may be acceptable behavior.  For sure the default
RESTRICT behavior shouldn't do it, though.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: [PATCH] support tab-completion for single quote input with equal sign

2021-09-04 Thread Tom Lane
"tanghy.f...@fujitsu.com"  writes:
> [ v2-0001-support-tab-completion-for-single-quote-input-wit.patch ]

Surely this patch is completely wrong?  It needs more thought about
the interaction with the existing logic for double quotes, ie single
quote inside double quotes is not special, nor the reverse; nor should
parentheses inside quotes be counted.  It also needs to be aware of
backslashes in escape-style strings.

I kind of doubt that it's actually possible to parse string literals
correctly when working backward, as this function does.  For starters,
you won't know whether the string starts with "E", so you won't know
whether backslashes are special.  We've got away with backwards
parsing so far because the syntax rules for double-quoted strings are
so much simpler.  But if you want to handle single quotes, I think
you'll have to start by rearranging the code to parse forward.  That's
likely to be fairly ticklish, see the comment about

 * backwards scan has some interesting but intentional properties
 * concerning parenthesis handling.

I wish that whoever wrote that (which I think was me :-() had been
more explicit.  But I think that the point is that we look for a place
that's at the same parenthesis nesting level as the completion point,
not necessarily one that's globally outside of any parens.  That will
be messy to handle if we want to convert to scanning forwards from
the start of the string.

I kind of wonder if it isn't time to enlist the help of psqlscan.l
instead of doubling down on the idea that tab-complete.c should have
its own half-baked SQL lexer.

regards, tom lane




Re: [PATCH] Hooks at XactCommand level

2021-09-04 Thread Gilles Darold

I have changed the status of this proposal as rejected.


To resume the final state of this proposal there is no consensus on the 
interest to add a hook on start xact commands. Also the only useful case 
for this hook was to be able to have a server side automatic rollback at 
statement level. It can be regrettable because I don't think that 
PostgreSQL will have such feature before a long time (that's probably 
better) and a way to external implementation through an extension would 
be helpful for migration from other RDBMS like DB2 or Oracle. The only 
ways to have this feature is to handle the rollback at client side using 
savepoint, which is at least 3 times slower than a server side 
implementation, or not use such implementation at all. Outside not being 
performant it doesn't scale due to txid wraparound. And the last way is 
to use a proprietary forks of PostgreSQL, some are proposing this  feature.


--
Gilles Darold
http://www.darold.net/






Re: using an end-of-recovery record in all cases

2021-09-04 Thread Amit Kapila
On Tue, Aug 10, 2021 at 12:31 AM Robert Haas  wrote:
>
> On Thu, Jul 29, 2021 at 11:49 AM Robert Haas  wrote:
> > On Wed, Jul 28, 2021 at 3:28 PM Andres Freund  wrote:
> > > > Imagine if instead of
> > > > all the hairy logic we have now we just replaced this whole if
> > > > (IsInRecovery) stanza with this:
> > > >
> > > > if (InRecovery)
> > > > CreateEndOfRecoveryRecord();
> > > >
> > > > That would be WAY easier to reason about than the rat's nest we have
> > > > here today. Now, I am not sure what it would take to get there, but I
> > > > think that is the direction we ought to be heading.
> > >
> > > What are we going to do in the single user ([1]) case in this awesome 
> > > future?
> > > I guess we could just not create a checkpoint until single user mode is 
> > > shut
> > > down / creates a checkpoint for other reasons?
> >
> > It probably depends on who writes this thus-far-hypothetical patch,
> > but my thought is that we'd unconditionally request a checkpoint after
> > writing the end-of-recovery record, same as we do now if (promoted).
> > If we happened to be in single-user mode, then that checkpoint request
> > would turn into performing a checkpoint on the spot in the one and
> > only process we've got, but StartupXLOG() wouldn't really need to care
> > what happens under the hood after it called RequestCheckpoint().
>
> I decided to try writing a patch to use an end-of-recovery record
> rather than a checkpoint record in all cases. The patch itself was
> pretty simple but didn't work. There are two different reasons why it
> didn't work, which I'll explain in a minute. I'm not sure whether
> there are any other problems; these are the only two things that cause
> problems with 'make check-world', but that's hardly a guarantee of
> anything. Anyway, I thought it would be useful to report these issues
> first and hopefully get some feedback.
>
> The first problem I hit was that GetRunningTransactionData() does
> Assert(TransactionIdIsNormal(CurrentRunningXacts->latestCompletedXid)).
> While that does seem like a superficially reasonable thing to assert,
> StartupXLOG() initializes latestCompletedXid by calling
> TransactionIdRetreat on the value extracted from checkPoint.nextXid,
> and the first-ever checkpoint that is written at the beginning of WAL
> has a nextXid of 3, so when starting up from that checkpoint nextXid
> is 2, which is not a normal XID. When we try to create the next
> checkpoint, CreateCheckPoint() does LogStandbySnapshot() which calls
> GetRunningTransactionData() and the assertion fails. In the current
> code, we avoid this more or less accidentally because
> LogStandbySnapshot() is skipped when starting from a shutdown
> checkpoint. If we write an end-of-recovery record and then trigger a
> checkpoint afterwards, then we don't avoid the problem. Although I'm
> impishly tempted to suggest that we #define SecondNormalTransactionId
> 4 and then use that to create the first checkpoint instead of
> FirstNormalTransactionId -- naturally with no comments explaining why
> we're doing it -- I think the real problem is that the assertion is
> just wrong. CurrentRunningXacts->latestCompletedXid shouldn't be
> InvalidTransactionId or BootstrapTransactionId, but
> FrozenTransactionId is a legal, if corner-case, value, at least as the
> code stands today.
>
> Unfortunately we can't just relax the assertion, because the
> XLOG_RUNNING_XACTS record will eventually be handed to
> ProcArrayApplyRecoveryInfo() for processing ... and that function
> contains a matching assertion which would in turn fail. It in turn
> passes the value to MaintainLatestCompletedXidRecovery() which
> contains yet another matching assertion, so the restriction to normal
> XIDs here looks pretty deliberate. There are no comments, though, so
> the reader is left to guess why. I see one problem:
> MaintainLatestCompletedXidRecovery uses FullXidRelativeTo, which
> expects a normal XID. Perhaps it's best to just dodge the entire issue
> by skipping LogStandbySnapshot() if latestCompletedXid happens to be
> 2,
>

By reading above explanation, it seems like it is better to skip
LogStandbySnapshot() for this proposal. Can't we consider skipping
LogStandbySnapshot() by passing some explicit flag-like
'recovery_checkpoint' or something like that?

I think there is some prior discussion in another thread related to
this work but it would be easier to follow if you can summarize the
same.


With Regards,
Amit Kapila.