Hi,

When testing REPACK CONCURRENTLY, I noticed that all WAL is retained from the moment REPACK begins copying data into the new table until the command finishes replaying concurrent changes on the new table and stops the repack decoding worker.
I understand the reason: the REPACK command itself starts a long-running transaction, and logical decoding does not advance restart_lsn beyond the start position of the oldest running transaction. As a result, slot.restart_lsn remains unchanged, which prevents the checkpointer from recycling WAL.

Since REPACK can run for a long time (hours or even days), I'd like to confirm whether this is expected behavior or whether we plan to improve it in the future. Also, IIUC, REPACK without the CONCURRENTLY option does not have this issue.

Given that a REPACK is never restarted, I think the repack decoding worker should be able to advance restart_lsn each time it finishes writing changes (similar to how a physical slot behaves). To illustrate this, I've written a patch (attached) that implements this approach, and it works fine for me.

BTW, catalog_xmin also won't advance, but that doesn't seem to be a big issue: the REPACK transaction itself holds a snapshot that retains catalog tuples, so advancing catalog_xmin wouldn't change the situation anyway.

Thoughts?

Best Regards,
Hou zj
v1-0001-Allow-old-WALs-to-be-removed-during-REPACK-CONCUR.patch
