Re: [HACKERS] NOLOGGING option, or ?

Tom Lane Wed, 01 Jun 2005 11:17:48 -0700

Simon Riggs <[EMAIL PROTECTED]> writes:
> If the server crashes, we replay WAL. If we see a load start message, we
> truncate the relation and note that a load has started. If there is WAL
> data for the tuples, we replay it. If WAL replay ends without the load
> transaction having successfully committed, then we truncate the table.


On further thought, this seems both risky and unnecessary.

The reason it's risky is this scenario:

        * Backend 1 makes a LOAD-start WAL entry.

        * Backend 1 loads some data, extending the table beyond its
          former end.

        * Backend 1 errors out without committing its transaction.

        * Backend 2 inserts some data into the no-longer-locked table.
          It uses free space in one of the added pages, or maybe even
          adds new pages of its own.

        * Backend 2 commits.

        * System crashes, and we have to replay the above actions.

In this scenario you cannot truncate at the end of replay without losing
backend 2's committed data.
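
To make the failure concrete, here is a toy model of the scenario above. It is only a sketch: the table is a Python list of pages, visibility is reduced to "is the inserting xid in the committed set", and none of the names correspond to real backend code.

```python
# Toy model: a table is a list of pages; each page is a list of (xid, value).
# All names here are illustrative assumptions, not PostgreSQL internals.

def run_scenario():
    table = [[(1, "old row")]]          # one page, written by committed xid 1
    committed = {1}

    # Backend 1 (xid 2): the LOAD-start entry remembers the former end.
    former_end = len(table)
    table.append([(2, "bulk row A")])   # the load extends the table by a page
    # Backend 1 errors out: xid 2 never commits and its lock is released.

    # Backend 2 (xid 3): uses free space in the page backend 1 added, commits.
    table[-1].append((3, "row from backend 2"))
    committed.add(3)

    return table, committed, former_end


def visible_rows(table, committed):
    """Return values whose inserting transaction committed."""
    return [value for page in table for (xid, value) in page if xid in committed]


table, committed, former_end = run_scenario()

# Option A: truncate back to the former end when replay finishes.
rows_after_truncate = visible_rows(table[:former_end], committed)

# Option B: leave the pages alone; the aborted tuples are simply not visible.
rows_without_truncate = visible_rows(table, committed)

print(rows_after_truncate)      # ['old row']
print(rows_without_truncate)    # ['old row', 'row from backend 2']
```

Truncating drops backend 2's committed row along with backend 1's dead one; leaving the page in place keeps the committed row and merely leaves a dead tuple behind.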

You can think of various ways to avoid this risk (for instance, maybe
*any* WAL-logged operation on the table should cause the pending
TRUNCATE to be discarded) but they all seem expensive and/or still
somewhat unsafe.

The reason it's unnecessary is simple: what's the point?  All you're doing by not
truncating is leaving some uncommitted tuples in the table.  It's not
the job of WAL recovery to get rid of such things; that's VACUUM's job.

So what I'm thinking is we need no special WAL entries for this.  What
we need is just an operating mode of COPY in which it doesn't WAL-log
its inserts, but instead fsyncs before completion, much like index build
does.  For safety it must do all its inserts into freshly-added pages;
this is not to ensure truncatability, because we aren't going to do that
anyway, but to ensure that we don't have unlogged operations changing
pages that might contain committed tuples. (That would pose a risk of
losing committed data to incomplete writes in case of system crash
partway through.  The same reason is why we need an exclusive lock: else
we might end up with pages containing a mix of logged and unlogged
tuples.)  Also there can be no indexes, since we don't want index
entries pointing to unlogged tuples.  And PITR can't be enabled.
Otherwise no problem.
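
A rough sketch of the conditions and the write path described above, using a plain file as a stand-in for the relation. Everything here is an assumption made for illustration: the function names, the flags passed in, the 8192-byte page size, and the one-row-per-page layout are hypothetical and are not how COPY is or would be implemented.

```python
import os

PAGE_SIZE = 8192  # assumed page size for this toy file format


def can_skip_wal(has_exclusive_lock, index_count, pitr_enabled):
    """All three conditions from the text must hold to skip WAL-logging."""
    return has_exclusive_lock and index_count == 0 and not pitr_enabled


def bulk_load(path, rows, has_exclusive_lock, index_count, pitr_enabled):
    """Append rows into freshly added pages and fsync before returning.

    Returns the number of the first new page, or None if the caller must
    fall back to the ordinary WAL-logged path.
    """
    if not can_skip_wal(has_exclusive_lock, index_count, pitr_enabled):
        return None

    size = os.path.getsize(path) if os.path.exists(path) else 0
    pad = (-size) % PAGE_SIZE            # bytes needed to reach a page boundary
    first_new_page = (size + pad) // PAGE_SIZE

    with open(path, "ab") as f:
        # Pad out the last existing page so that no new row shares a page
        # with tuples that might already be committed.
        f.write(b"\0" * pad)
        for row in rows:
            data = row.encode("utf-8")
            if len(data) > PAGE_SIZE:
                raise ValueError("row does not fit in one page")
            f.write(data.ljust(PAGE_SIZE, b"\0"))   # one row per fresh page
        f.flush()
        os.fsync(f.fileno())             # data is on disk before "completion"

    return first_new_page
```

Calling `bulk_load("rel.dat", ["a", "b"], True, 0, False)` on a 100-byte file pads the first page and writes the rows into pages 1 and 2; with an index present or PITR enabled it returns None and touches nothing.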

                        regards, tom lane
