Greetings,

* Kyotaro Horiguchi (horikyota....@gmail.com) wrote:
> At Tue, 10 Nov 2020 09:33:12 -0500, Stephen Frost <sfr...@snowman.net> wrote 
> in 
> > * Kyotaro Horiguchi (horikyota....@gmail.com) wrote:
> > > For fuel(?) of the discussion, I tried a very-quick PoC for in-place
> > > ALTER TABLE SET LOGGED/UNLOGGED and resulted as attached. After some
> > > trials of several ways, I drifted to the following way after poking
> > > several ways.
> > > 
> > > 1. Flip BM_PERMANENT of active buffers
> > > 2. adding/removing init fork
> > > 3. sync files,
> > > 4. Flip pg_class.relpersistence.
> > > 
> > > It always skips table copy in the SET UNLOGGED case, and only when
> > > wal_level=minimal in the SET LOGGED case.  Crash recovery seems
> > > working by some brief testing by hand.
> > 
> > Somehow missed that this patch more-or-less does what I was referring to
> > down-thread, but I did want to mention that it looks like it's missing a
> > necessary FlushRelationBuffers() call before the sync, otherwise there
> > could be dirty buffers for the relation that's being set to LOGGED (with
> > wal_level=minimal), which wouldn't be good.  See the comments above
> > smgrimmedsync().
> 
> Right. Thanks.  However, since SetRelFileNodeBuffersPersistence()
> called just above scans shared buffers so I don't want to just call
> FlushRelationBuffers() separately.  Instead, I added buffer-flush to
> SetRelFileNodeBuffersPersistence().

Maybe I'm missing something, but it sure looks like in the patch that
SetRelFileNodeBuffersPersistence() is being called after the
smgrimmedsync() call, and I don't think you get to just switch the order
of those- the sync is telling the kernel to make sure it's written to
disk, while the FlushBuffer() is just writing it into the kernel but
doesn't provide any guarantee that the data has actually made it to
disk.  We have to FlushBuffer() first, and then call smgrimmedsync().
Perhaps there's a way to avoid having to go through shared buffers
twice, and I generally agreed it'd be good if we could avoid doing so,
but this approach doesn't look like it actually works.

> FWIW this is a revised version of the PoC, which has some known
> problems.
> 
> - Flipping of Buffer persistence is not WAL-logged nor even be able to
>   be safely roll-backed. (It might be better to drop buffers).

Not sure if it'd be better to drop buffers or not, but figuring out how
to deal with rollback seems pretty important.  How is the persistence
change in the catalog not WAL-logged though..?

> - This version handles indexes but not yet handle toast relatins.

Would need to be fixed, of course.

> - tableAMs are supposed to support this feature. (but I'm not sure
>   it's worth allowing them not to do so).

Seems like they should.

Thanks,

Stephen

Attachment: signature.asc
Description: PGP signature

Reply via email to