On Mon, Nov 20, 2017 at 7:35 PM, atorikoshi <torikoshi_atsushi...@lab.ntt.co.jp> wrote: > Hi, > > I put many queries into one transaction and made ReorderBuffer spill > data to disk, and sent SIGKILL to postgres before the end of the > transaction. > > After starting up postgres again, I observed the files spilled to > data wasn't deleted. > > I think these files should be deleted because its transaction was no > more valid, so no one can use these files. > > > Below is a reproduction instructions. > > ------------------------------------------------ > 1. Create table and publication at publiser. > > @pub =# CREATE TABLE t1( > id INT PRIMARY KEY, > name TEXT); > > @pub =# CREATE PUBLICATION pub FOR TABLE t1; > > 2. Create table and subscription at subscriber. > > @sub =# CREATE TABLE t1( > id INT PRIMARY KEY, > name TEXT > ); > > @sub =# CREATE SUBSCRIPTION sub > CONNECTION 'host=[hostname] port=[port] dbname=[dbname]' > PUBLICATION pub; > > 3. Put many queries into one transaction. > > @pub =# BEGIN; > INSERT INTO t1 > SELECT > i, > 'aaaaaaaaaa' > FROM > generate_series(1, 1000000) as i; > > 4. Then we can see spilled files. > > @pub $ ls -1 ${PGDATA}/pg_replslot/sub/ > state > xid-561-lsn-0-1000000.snap > xid-561-lsn-0-2000000.snap > xid-561-lsn-0-3000000.snap > xid-561-lsn-0-4000000.snap > xid-561-lsn-0-5000000.snap > xid-561-lsn-0-6000000.snap > xid-561-lsn-0-7000000.snap > xid-561-lsn-0-8000000.snap > xid-561-lsn-0-9000000.snap > > 5. Kill publisher's postgres process before COMMIT. > > @pub $ kill -s SIGKILL [pid of postgres] > > 6. Start publisher's postgres process. > > @pub $ pg_ctl start -D ${PGDATA} > > 7. After a while, we can see the files remaining. > (Immediately after starting publiser, we can not see these files.) > > @pub $ pg_ctl start -D ${PGDATA} > > When I configured with '--enable-cassert', below assertion error > was appeared. > > TRAP: FailedAssertion("!(txn->final_lsn != 0)", File: "reorderbuffer.c", > Line: 2576) > ------------------------------------------------ > > Attached patch sets final_lsn to the last ReorderBufferChange if > final_lsn == 0.
Thank you for the report. I could reproduce this issue with the above step. My analysis is, the cause of that a serialized reorder buffer isn't cleaned up is that the aborted transaction without an abort WAL record has no chance to set ReorderBufferTXN->final_lsn. So if there is such serialized transactions ReorderBufferRestoreCleanup cleanups no files, which is cause of the assertion failure (or a file being orphaned). What do you think? On detail of your patch, I'm not sure it's safe if we set the lsn of other than commit record or abort record to final_lsn. The comment in reorderbuffer.h says, typedef trcut ReorderBufferTXN { (snip) /* ---- * LSN of the record that lead to this xact to be committed or * aborted. This can be a * * plain commit record * * plain commit record, of a parent transaction * * prepared transaction commit * * plain abort record * * prepared transaction abort * * error during decoding * ---- */ XLogRecPtr final_lsn; But with your patch, we could set a lsn of a record that is other than what listed above to final_lsn. One way I came up with is to make ReorderBufferRestoreCleanup accept an invalid value of final_lsn and regards it as a aborted transaction that doesn't has a abort WAL record. So we can cleanup all serialized files if final_lsn of a transaction is invalid. Since I'm not very familiar with snapshot building part please check it. Anyway I think you should register this patch to the next commit fest so as not forget. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center