On 07/10/2015 02:06 AM, Tom Lane wrote:
> Andres Freund <and...@anarazel.de> writes:
>> On 2015-07-06 11:49:54 -0400, Tom Lane wrote:
>>> Rather than reverting cab9a0656c36739f, which would re-introduce a
>>> different performance problem, perhaps we could have COPY create a new
>>> relfilenode when it does this.  That should be safe if the table was
>>> previously empty.
>>
>> I'm not convinced that cab9a0656c36739f needs to survive in that
>> form. To me only allowing one COPY to benefit from the wal_level =
>> minimal optimization has a significantly higher cost than
>> cab9a0656c36739f.
>
> What evidence have you got to base that value judgement on?
>
> cab9a0656c36739f was based on an actual user complaint, so we have good
> evidence that there are people out there who care about the cost of
> truncating a table many times in one transaction.

Yeah, if we specifically made that case cheap in response to a complaint, it would be a regression to make it expensive again. We might get away with that in a new major version, but I would hate to backpatch it.

>> My tentative guess is that the best course is to
>> a) Make heap_truncate_one_rel() create a new relfilenode. That fixes the
>>    truncation replay issue.
>> b) Force new pages to be used when using the heap_sync mode in
>>    COPY. That avoids the INIT danger you found. It seems rather
>>    reasonable to avoid using pages that have already been the target of
>>    WAL logging here in general.
>
> And what reason is there to think that this would fix all the problems?
> We know of those two, but we've not exactly looked hard for other cases.

Hmm. Perhaps that could be made to work, but it feels pretty fragile. For example, you could have an insert trigger on the table that inserts additional rows into the same table, and those inserts would be intermixed with the rows inserted by COPY. You'll have to avoid that somehow.

Full-page images in general are a problem: if a checkpoint happens, and a trigger then modifies the page we're COPYing to in any way, you have the same issue. Even reading a page can cause a full-page image of it to be written: if you update a hint bit on the page while reading it, and checksums are enabled, and a checkpoint happened since the page was last updated, bang. I don't think that's a problem in this particular case, because there are no hint bits to be set on pages that we're COPYing to, but it's a whole new subtle assumption.
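To spell out the path I mean, this is roughly the logic in MarkBufferDirtyHint() in bufmgr.c (paraphrased, not verbatim):

    /* paraphrased from MarkBufferDirtyHint() */
    if (XLogHintBitIsNeeded() && (bufHdr->flags & BM_PERMANENT))
    {
        /*
         * Checksums (or wal_log_hints) are enabled, so even a
         * hint-bit-only change has to go through the WAL machinery;
         * XLogSaveBufferForHint() emits a full-page image if this is
         * the first change to the page since the last checkpoint.
         */
        lsn = XLogSaveBufferForHint(buffer, buffer_std);
    }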

I think we should
1. reliably and explicitly keep track of whether we've WAL-logged any TRUNCATE, INSERT/UPDATE+INIT, or any other full-page-logging operations on the relation, and
2. make sure we never skip WAL-logging again if we have.

Let's add a flag, rd_skip_wal_safe, to RelationData that's initially set when a new relfilenode is created, i.e. whenever rd_createSubid or rd_newRelfilenodeSubid is set. Whenever a TRUNCATE or a full-page image (including INSERT/UPDATE+INIT) is WAL-logged, clear the flag. In copy.c, only skip WAL-logging if the flag is still set. To deal with the case that the flag gets cleared in the middle of COPY, also check the flag whenever we're about to skip WAL-logging in heap_insert, and if it's been cleared, ignore the HEAP_INSERT_SKIP_WAL option and WAL-log anyway.
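In rough pseudo-C, the pieces would look something like this (just a sketch to show where the checks would go; the exact placement is of course up for discussion):

    /* in src/include/utils/rel.h */
    typedef struct RelationData
    {
        ...
        /*
         * True if it is safe to skip WAL-logging for this relation: the
         * relfilenode was created (or replaced) in the current
         * transaction, and no TRUNCATE, full-page image, or other WAL
         * record has been logged for it since.
         */
        bool        rd_skip_wal_safe;
    } RelationData;

    /* in CopyFrom(), where we currently decide to skip WAL */
    if (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
        cstate->rel->rd_newRelfilenodeSubid != InvalidSubTransactionId)
    {
        hi_options |= HEAP_INSERT_SKIP_FSM;
        if (!XLogIsNeeded() && cstate->rel->rd_skip_wal_safe)
            hi_options |= HEAP_INSERT_SKIP_WAL;
    }

    /* in heap_insert(), to catch the flag being cleared mid-COPY */
    if ((options & HEAP_INSERT_SKIP_WAL) && !relation->rd_skip_wal_safe)
        options &= ~HEAP_INSERT_SKIP_WAL;   /* WAL-log this one after all */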

Compared to the status quo, that disables the WAL-skipping optimization in the scenario where you CREATE, INSERT, then COPY to a table in the same transaction. I think that's acceptable.

(Alternatively, to handle the case that the flag gets cleared in the middle of COPY, add another flag to RelationData indicating that a WAL-skipping COPY is in progress, and refrain from WAL-logging any FPW-writing operations on the table while it's set (or any operations whatsoever). That'd be more efficient, but it's such a rare corner case that it hardly matters.)
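For illustration only, that alternative could look roughly like this (the second flag's name is made up):

    /* hypothetical second flag in RelationData */
    bool        rd_copy_skip_wal_in_progress;

    /* CopyFrom() would bracket its bulk-insert loop with it */
    cstate->rel->rd_copy_skip_wal_in_progress = true;
    /* ... heap_insert() calls with HEAP_INSERT_SKIP_WAL ... */
    cstate->rel->rd_copy_skip_wal_in_progress = false;

    /*
     * Heap operations on the same relation (e.g. a heap_update() fired
     * from a trigger) would see the flag and skip their own WAL-logging,
     * relying on COPY's final heap_sync() to make those changes durable
     * as well.
     */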

- Heikki


