MERGE regress test

2022-11-16 Thread Teja Mupparti
Hi,



A quick question on the MERGE regress test:



https://github.com/postgres/postgres/blob/REL_15_STABLE/src/test/regress/expected/merge.out#L846



Should there be an ERROR here, or does the comment need a fix? What is the expected behavior?



Regards

Teja



Re: Corruption during WAL replay

2020-04-14 Thread Teja Mupparti
Thanks Kyotaro and Masahiko for the feedback. I think there is a consensus on 
the critical section around truncate, but I just want to emphasize the need for 
reversing the order of dropping the buffers and truncating.

 Repro details (with full_page_writes = off):

 1) Page on disk has an empty LP 1; insert into the page at LP 1
 2) Checkpoint STARTs (recovery REDO will eventually begin here)
 3) Delete all rows on the page (the page is now empty)
 4) Autovacuum kicks in and truncates the pages
    DropRelFileNodeBuffers - the dirty page is NOT written, LP 1 on disk is still empty
 5) Checkpoint completes
 6) Crash
 7) smgrtruncate - not reached (this is where the physical truncate would happen)

 Now crash recovery starts

 Replay of the delete record (step 3 above) reads the page with an empty LP 1, 
and the delete fails with a PANIC (the old page on disk never got the insert).

During recovery the truncate record is never even reached; WAL replay of the 
truncation would happen later, but recovery fails (repeatedly) before getting 
to that point.
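
To make that window concrete, here is a minimal C sketch of the ordering described above. It is not the actual PostgreSQL source: the routines really involved are RelationTruncate(), DropRelFileNodeBuffers() and smgrtruncate(), but the names and signatures below are simplified placeholders for illustration.

typedef struct RelStub RelStub;             /* stands in for Relation */

extern void wal_log_truncate(RelStub *rel, unsigned newblocks);    /* placeholder */
extern void drop_buffers_beyond(RelStub *rel, unsigned newblocks); /* placeholder */
extern void truncate_file(RelStub *rel, unsigned newblocks);       /* placeholder */

static void
relation_truncate_sketch(RelStub *rel, unsigned newblocks)
{
    /* 1. The truncation is WAL-logged before anything changes on disk. */
    wal_log_truncate(rel, newblocks);

    /* 2. Dirty buffers past newblocks are discarded without being written. */
    drop_buffers_beyond(rel, newblocks);

    /*
     * A crash here is the window the repro hits: the dropped dirty page
     * never reached disk, the file was never shortened, and replay of the
     * earlier delete (step 3) finds a stale page image and PANICs.
     */

    /* 3. Only now is the file physically truncated. */
    truncate_file(rel, newblocks);
}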

Best regards,
Teja


From: Kyotaro Horiguchi 
Sent: Monday, April 13, 2020 7:35 PM
To: masahiko.saw...@2ndquadrant.com 
Cc: and...@anarazel.de ; tejesw...@hotmail.com 
; pgsql-hack...@postgresql.org 
; hexexp...@comcast.net 
Subject: Re: Corruption during WAL replay

At Mon, 13 Apr 2020 18:53:26 +0900, Masahiko Sawada 
 wrote in
> On Mon, 13 Apr 2020 at 17:40, Andres Freund  wrote:
> >
> > Hi,
> >
> > On 2020-04-13 15:24:55 +0900, Masahiko Sawada wrote:
> > > On Sat, 11 Apr 2020 at 09:00, Teja Mupparti  wrote:
> > > /*
> > >  * We WAL-log the truncation before actually truncating, which means
> > >  * trouble if the truncation fails. If we then crash, the WAL replay
> > >  * likely isn't going to succeed in the truncation either, and cause a
> > >  * PANIC. It's tempting to put a critical section here, but that cure
> > >  * would be worse than the disease. It would turn a usually harmless
> > >  * failure to truncate, that might spell trouble at WAL replay, into a
> > >  * certain PANIC.
> > >  */
> >
> > Yea, but that reasoning is just plain *wrong*. It's *never* ok to WAL
> > log something and then not perform the action. This leads to the primary
> > / replica getting out of sync, crash recovery potentially not completing
> > (because of records referencing the should-be-truncated pages), ...

It was introduced in 2008 by commit 3396000684, for 8.4, so it can be
considered an oversight from when log-shipping was introduced.

The reason other operations like INSERT (which extend the underlying
file) are "safe" after an extension failure is that subsequent
operations proceed in shared buffers as if the new page exists, and
then try to extend the file again.  So if we want to continue working
after a truncation failure, we would need to make shared buffers pretend
that the truncated pages are gone.  But we don't have room for another
flag in the buffer header.  For example, BM_DIRTY && !BM_VALID might be
usable as the state meaning "the page should have been truncated but the
truncation has not succeeded yet", but I'm not sure.
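
To sketch that flag-combination idea concretely (purely hypothetical; BM_DIRTY and BM_VALID are real buffer-header bits, but the values and the helper below are made up for illustration, and nothing like this exists in PostgreSQL):

#define BM_DIRTY_SKETCH  (1U << 23)   /* illustrative values only, not the */
#define BM_VALID_SKETCH  (1U << 24)   /* real buf_internals.h definitions  */

/* "Dirty but not valid" repurposed as "should have been truncated, but the
 * truncate has not succeeded yet". */
static int
buffer_pending_truncate(unsigned buf_state)
{
    return (buf_state & BM_DIRTY_SKETCH) != 0 &&
           (buf_state & BM_VALID_SKETCH) == 0;
}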

Anyway, I think the prognosis after a truncation failure is far more hopeless
than after an extension failure in most cases, and I doubt it's worth
introducing such a complex feature only to cope with such a hopeless
situation.

In short, I think we should PANIC in that case.

> > > As a second idea, I wonder if we can defer truncation until commit
> > > time like smgrDoPendingDeletes mechanism. The sequence would be:
> >
> > This is mostly an issue during [auto]vacuum partially truncating the end
> > of the file. We intentionally release the AEL regularly to allow other
> > accesses to continue.
> >
> > For transactional truncations we don't go down this path (as we create a
> > new relfilenode).
> >
> >
> > > At RelationTruncate(),
> > > 1. WAL logging.
> > > 2. Remember buffers to be dropped.
> >
> > You definitely cannot do that, as explained above.
>
> Ah yes, you're right.
>
> So it seems to me that what we can currently do for this issue is to
> enclose the truncation operation in a critical section. IIUC it's not
> enough just to reverse the order of dropping buffers and physical file
> truncation, because that cannot solve the problem of inconsistency on the
> standby. And as Horiguchi-san mentioned, there is no need to reverse
> that order if we wrap the truncation operation in a critical
> section, because we can recover page changes during crash recovery. The
> strategy of writing out all dirty buffers before dropping them,
> proposed as (a) in [1], also seems insufficient.

Agreed.  Since it's not acceptable

Re: Corruption during WAL replay

2020-04-10 Thread Teja Mupparti
Thanks Andres and Kyotaro for the quick review.  I have fixed the typos and 
also included the critical section (emulated with a try/catch block, since 
palloc()s were causing issues in the truncate code). This time I used git 
format-patch.
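
For readers unfamiliar with what the try/catch emulation looks like in PostgreSQL code, a rough, hypothetical sketch follows; PG_TRY()/PG_CATCH()/PG_END_TRY() and elog() are the real facilities from elog.h, everything else reuses the placeholder helpers from the earlier sketch, and none of this is the contents of the attached patch.

static void
truncate_with_emulated_critical_section(RelStub *rel, unsigned newblocks)
{
    PG_TRY();
    {
        drop_buffers_beyond(rel, newblocks);   /* placeholder */
        truncate_file(rel, newblocks);         /* placeholder */
    }
    PG_CATCH();
    {
        /* The truncation is already WAL-logged, so we must not keep
         * running with the file left un-truncated. */
        elog(PANIC, "could not truncate relation");
    }
    PG_END_TRY();
}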

Regards
Teja




From: Andres Freund 
Sent: Monday, March 30, 2020 4:31 PM
To: Kyotaro Horiguchi 
Cc: tejesw...@hotmail.com ; pgsql-hack...@postgresql.org 
; hexexp...@comcast.net 
Subject: Re: Corruption during WAL replay

Hi,

On 2020-03-24 18:18:12 +0900, Kyotaro Horiguchi wrote:
> At Mon, 23 Mar 2020 20:56:59 +0000, Teja Mupparti  
> wrote in
> > The original bug report email and the relevant discussion are here
> ...
> > The crux of the fix: in the current code the engine drops the buffers and 
> > then truncates the file, and a crash after the buffer drop but before the 
> > truncate causes the corruption. The patch reverses the order, i.e. it 
> > truncates the file first and drops the buffers afterwards.
>
> BufferAlloc doesn't wait on BM_IO_IN_PROGRESS for a valid buffer.

I don't think that's true. For any of this to be relevant the buffer has
to be dirty, in which case BufferAlloc() has to call FlushBuffer(), which
in turn does a WaitIO() if BM_IO_IN_PROGRESS is set.

What path are you thinking of? Or alternatively, what am I missing?
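
Reduced to a pseudo-C sketch, with placeholder names standing in for the bufmgr.c internals (pins, locks and the buffer lookup are omitted), the path described above is:

typedef struct BufferStub BufferStub;

extern int  buffer_is_dirty(BufferStub *buf);             /* placeholder */
extern void flush_buffer_waiting_for_io(BufferStub *buf); /* placeholder */
extern void recycle_buffer(BufferStub *buf);              /* placeholder */

static void
evict_victim_sketch(BufferStub *victim)
{
    if (buffer_is_dirty(victim))
    {
        /*
         * BufferAlloc() must call FlushBuffer() for a dirty victim, and
         * FlushBuffer() does a WaitIO() if BM_IO_IN_PROGRESS is set, so the
         * buffer cannot be recycled while its I/O is still in flight.
         */
        flush_buffer_waiting_for_io(victim);
    }
    recycle_buffer(victim);
}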


> I'm not sure it's acceptable to remember all to-be-deleted buffers
> during truncation.

I don't see a real problem with it. Nor really a good alternative. Note
that for autovacuum truncations we'll only truncate a limited number of
buffers at once, and for most relation truncations we don't enter this
path (since we create a new relfilenode instead).


>
> +  /*START_CRIT_SECTION();*/

> Is this a point of contention?  It is not needed if we choose
> strategy (c) in [1], since that protocol aims to allow the server to
> continue running after a truncation failure.
>
> [1]: 
> https://www.postgresql.org/message-id/20191207001232.klidxnm756wqxvwx%40alap3.anarazel.de

I think it's entirely broken to continue running after a truncation
failure. We obviously have to WAL-log the truncation first (since
otherwise we can crash just after doing the truncation). But we cannot
WAL-log it and then keep running without performing the associated
action: the most obvious reason is that otherwise a replica will execute
the truncation, but the primary will not.

The whole justification for that behaviour, "It would turn a usually
harmless failure to truncate, that might spell trouble at WAL replay,
into a certain PANIC.", was always dubious (since on-disk and in-memory
state can now diverge), but it's clearly wrong once replication enters
the picture. There's just no alternative to a critical section here.
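
A minimal sketch of what a critical section around the truncation would mean, reusing the placeholder helpers from the earlier sketch: START_CRIT_SECTION() and END_CRIT_SECTION() are the real PostgreSQL macros (any ERROR raised in between is promoted to PANIC), while the rest is illustrative only, not a concrete patch.

static void
relation_truncate_crit_section_sketch(RelStub *rel, unsigned newblocks)
{
    START_CRIT_SECTION();

    /* Inside the critical section, any ERROR becomes a PANIC, so the WAL
     * record and the on-disk action either both happen or the server stops
     * rather than continuing in an inconsistent state. */
    wal_log_truncate(rel, newblocks);      /* placeholder for XLOG_SMGR_TRUNCATE logging */
    drop_buffers_beyond(rel, newblocks);   /* placeholder for DropRelFileNodeBuffers() */
    truncate_file(rel, newblocks);         /* placeholder for smgrtruncate() */

    END_CRIT_SECTION();
}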


If we are really concerned with truncation failing - I don't know why we
would be, we accept that we have to be able to modify files etc to stay
up - we can add a pre-check ensuring that permissions are set up
appropriately to allow us to truncate.
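
A hypothetical illustration of such a pre-check (not taken from any actual patch); the path handling and error reporting are left as placeholders:

#include <unistd.h>
#include <errno.h>

/* Hypothetical pre-check: verify the relation file is still writable before
 * entering the critical section, so an obviously doomed truncate fails with
 * a plain ERROR instead of a PANIC.  Returns 0 on success, else errno. */
static int
truncate_permission_precheck(const char *relfile_path)
{
    if (access(relfile_path, W_OK) != 0)
        return errno;
    return 0;
}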

Greetings,

Andres Freund


0001-Wal-replay-corruption.patch
Description: 0001-Wal-replay-corruption.patch


Corruption during WAL replay

2020-03-23 Thread Teja Mupparti
This is my *first* attempt to submit a Postgres patch; please let me know if I 
missed any part of the process or of the patch format (I used 
https://wiki.postgresql.org/wiki/Working_with_Git as a reference).



The original bug report email and the relevant discussion are here:



https://www.postgresql.org/message-id/20191207001232.klidxnm756wqxvwx%40alap3.anarazel.de

https://www.postgresql.org/message-id/822113470.250068.1573246011818%40connect.xfinity.com

https://www.postgresql.org/message-id/20191206230640.2dvdjpcgn46q3ks2%40alap3.anarazel.de

https://www.postgresql.org/message-id/1880.1281020...@sss.pgh.pa.us



The crux of the fix: in the current code the engine drops the buffers and then 
truncates the file, and a crash after the buffer drop but before the truncate 
causes the corruption. The patch reverses the order, i.e. it truncates the file 
first and drops the buffers afterwards.
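
Expressed against the earlier sketch of the current ordering, the proposed reversal would look roughly like this (placeholder helpers again, not the attached patch itself):

static void
relation_truncate_reversed_sketch(RelStub *rel, unsigned newblocks)
{
    /* WAL-log the truncation, as before. */
    wal_log_truncate(rel, newblocks);

    /* Physically truncate the file first ... */
    truncate_file(rel, newblocks);

    /* ... and only then drop the now-obsolete buffers, so a crash in between
     * no longer leaves stale page images on disk with the buffered changes
     * lost. */
    drop_buffers_beyond(rel, newblocks);
}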



Warm regards,

Teja



bug-fix-patch
Description: bug-fix-patch