Simon Riggs <[EMAIL PROTECTED]> writes: > If the server crashes, we replay WAL. If we see a load start message, we > truncate the relation and note that a load has started. If there is WAL > data for the tuples, we replay it. If WAL replay ends without the load > transaction having successfully committed, then we truncate the table.
On further thought, this seems both risky and unnecessary. The reason it's risky is this scenario: * Backend 1 makes a LOAD-start WAL entry. * Backend 1 loads some data, extending the table beyond its former end. * Backend 1 errors out without committing its transaction. * Backend 2 inserts some data into the no-longer-locked table. It uses free space in one of the added pages, or maybe even adds new pages of its own. * Backend 2 commits. * System crashes, and we have to replay the above actions. In this scenario you cannot truncate at the end of replay without losing backend 2's committed data. You can think of various ways to avoid this risk (for instance, maybe *any* WAL-logged operation on the table should cause the pending TRUNCATE to be discarded) but they all seem expensive and/or still somewhat unsafe. The reason it's unnecessary is what's the point? All you're doing by not truncating is leaving some uncommitted tuples in the table. It's not the job of WAL recovery to get rid of such things; that's VACUUM's job. So what I'm thinking is we need no special WAL entries for this. What we need is just an operating mode of COPY in which it doesn't WAL-log its inserts, but instead fsyncs before completion, much like index build does. For safety it must do all its inserts into freshly-added pages; this is not to ensure truncatability, because we aren't going to do that anyway, but to ensure that we don't have unlogged operations changing pages that might contain committed tuples. (That would pose a risk of losing committed data to incomplete writes in case of system crash partway through. The same reason is why we need exclusive lock: else we might end up with pages containing a mix of logged and unlogged tuples.) Also there can be no indexes, since we don't want index entries pointing to unlogged tuples. And PITR can't be enabled. Otherwise no problem. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings