On Fri, Aug 25, 2023 at 8:47 AM Heikki Linnakangas <hlinn...@iki.fi> wrote: > > I noticed another missing fsync() with unlogged tables, similar to the > one at [1]. > > RelationCopyStorage does this: > > > /* > > * When we WAL-logged rel pages, we must nonetheless fsync them. The > > * reason is that since we're copying outside shared buffers, a > > CHECKPOINT > > * occurring during the copy has no way to flush the previously > > written > > * data to disk (indeed it won't know the new rel even exists). A > > crash > > * later on would replay WAL from the checkpoint, therefore it > > wouldn't > > * replay our earlier WAL entries. If we do not fsync those pages > > here, > > * they might still not be on disk when the crash occurs. > > */ > > if (use_wal || copying_initfork) > > smgrimmedsync(dst, forkNum); > > That 'copying_initfork' condition is wrong. The first hint of that is > that 'use_wal' is always true for an init fork. I believe this was meant > to check for 'relpersistence == RELPERSISTENCE_UNLOGGED'. Otherwise, > this bad thing can happen: > > 1. Create an unlogged table > 2. ALTER TABLE unlogged_tbl SET TABLESPACE ... -- This calls > RelationCopyStorage > 3. a checkpoint happens while the command is running > 4. After the ALTER TABLE has finished, shut down postgres cleanly. > 5. Lose power > > When you recover, the unlogged table is not reset, because it was a > clean postgres shutdown. But the part of the file that was copied after > the checkpoint at step 3 was never fsync'd, so part of the file is lost. > I was able to reproduce with a VM that I kill to simulate step 5. > > This is the same scenario that the smgrimmedsync() call above protects > from for WAL-logged relations. But we got the condition wrong: instead > of worrying about the init fork, we need to call smgrimmedsync() for all > *other* forks of an unlogged relation. > > Fortunately the fix is trivial, see attached. I suspect we have similar > problems in a few other places, though. end_heap_rewrite(), _bt_load(), > and gist_indexsortbuild have a similar-looking smgrimmedsync() calls, > with no consideration for unlogged relations at all. I haven't tested > those, but they look wrong to me.
The patch looks reasonable to me. Is this [1] case in hash index build that I reported but didn't take the time to reproduce similar? - Melanie [1] https://www.postgresql.org/message-id/CAAKRu_bPc81M121pOEU7W%3D%2BwSWEebiLnrie4NpaFC%2BkWATFtSA%40mail.gmail.com