On 07/10/2015 02:06 AM, Tom Lane wrote:
> Andres Freund <and...@anarazel.de> writes:
>> On 2015-07-06 11:49:54 -0400, Tom Lane wrote:
>>> Rather than reverting cab9a0656c36739f, which would re-introduce a
>>> different performance problem, perhaps we could have COPY create a new
>>> relfilenode when it does this.  That should be safe if the table was
>>> previously empty.
>>
>> I'm not convinced that cab9a0656c36739f needs to survive in that
>> form. To me only allowing one COPY to benefit from the wal_level =
>> minimal optimization has a significantly higher cost than
>> cab9a0656c36739f.
>
> What evidence have you got to base that value judgement on?
>
> cab9a0656c36739f was based on an actual user complaint, so we have good
> evidence that there are people out there who care about the cost of
> truncating a table many times in one transaction.

Yeah, if we specifically made that case cheap in response to a complaint, it would be a regression to make it expensive again. We might get away with that in a new major version, but I would hate to backpatch it.

>> My tentative guess is that the best course is to
>> a) Make heap_truncate_one_rel() create a new relfilenode. That fixes the
>>    truncation replay issue.
>> b) Force new pages to be used when using the heap_sync mode in
>>    COPY. That avoids the INIT danger you found. It seems rather
>>    reasonable to avoid using pages that have already been the target of
>>    WAL logging here in general.
>
> And what reason is there to think that this would fix all the problems?
> We know of those two, but we've not exactly looked hard for other cases.

Hmm. Perhaps that could be made to work, but it feels pretty fragile. For example, you could have an insert trigger on the table that inserts additional rows into the same table, and those inserts would be intermixed with the rows inserted by COPY. You'll have to avoid that somehow.

Full-page images in general are a problem: if a checkpoint happens, and a trigger then modifies the page we're COPYing to in any way, you have the same issue. Even reading a page can cause a full-page image of it to be written: if you update a hint bit on the page while reading it, and checksums are enabled, and a checkpoint happened since the page was last updated, bang. I don't think that's a problem in this particular case, because there are no hint bits to be set on pages that we're COPYing to, but it's a whole new subtle assumption.
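To spell out the path I mean, this is roughly the logic in MarkBufferDirtyHint() in bufmgr.c (paraphrased, not verbatim):

    /* paraphrased from MarkBufferDirtyHint() */
    if (XLogHintBitIsNeeded() && (bufHdr->flags & BM_PERMANENT))
    {
        /*
         * Checksums (or wal_log_hints) are enabled, so even a
         * hint-bit-only change has to go through the WAL machinery;
         * XLogSaveBufferForHint() emits a full-page image if this is
         * the first change to the page since the last checkpoint.
         */
        lsn = XLogSaveBufferForHint(buffer, buffer_std);
    }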

I think we should
1. reliably and explicitly keep track of whether we've WAL-logged any TRUNCATE, INSERT/UPDATE+INIT, or any other full-page-logging operations on the relation, and
2. make sure we never skip WAL-logging again if we have.

Let's add a flag, rd_skip_wal_safe, to RelationData that's initially set when a new relfilenode is created, i.e. whenever rd_createSubid or rd_newRelfilenodeSubid is set. Whenever a TRUNCATE or a full-page image (including INSERT/UPDATE+INIT) is WAL-logged, clear the flag. In copy.c, only skip WAL-logging if the flag is still set. To deal with the case that the flag gets cleared in the middle of COPY, also check the flag whenever we're about to skip WAL-logging in heap_insert, and if it's been cleared, ignore the HEAP_INSERT_SKIP_WAL option and WAL-log anyway.
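In rough pseudo-C, the pieces would look something like this (just a sketch to show where the checks would go; the exact placement is of course up for discussion):

    /* in src/include/utils/rel.h */
    typedef struct RelationData
    {
        ...
        /*
         * True if it is safe to skip WAL-logging for this relation: the
         * relfilenode was created (or replaced) in the current
         * transaction, and no TRUNCATE, full-page image, or other WAL
         * record has been logged for it since.
         */
        bool        rd_skip_wal_safe;
    } RelationData;

    /* in CopyFrom(), where we currently decide to skip WAL */
    if (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
        cstate->rel->rd_newRelfilenodeSubid != InvalidSubTransactionId)
    {
        hi_options |= HEAP_INSERT_SKIP_FSM;
        if (!XLogIsNeeded() && cstate->rel->rd_skip_wal_safe)
            hi_options |= HEAP_INSERT_SKIP_WAL;
    }

    /* in heap_insert(), to catch the flag being cleared mid-COPY */
    if ((options & HEAP_INSERT_SKIP_WAL) && !relation->rd_skip_wal_safe)
        options &= ~HEAP_INSERT_SKIP_WAL;   /* WAL-log this one after all */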

Compared to the status quo, that disables the WAL-skipping optimization in the scenario where you CREATE, INSERT, then COPY to a table in the same transaction. I think that's acceptable.

(Alternatively, to handle the case that the flag gets cleared in the middle of COPY, add another flag to RelationData indicating that a WAL-skipping COPY is in progress, and refrain from WAL-logging any FPW-writing operations on the table while it's set (or any operations whatsoever). That'd be more efficient, but it's such a rare corner case that it hardly matters.)
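For illustration only, that alternative could look roughly like this (the second flag's name is made up):

    /* hypothetical second flag in RelationData */
    bool        rd_copy_skip_wal_in_progress;

    /* CopyFrom() would bracket its bulk-insert loop with it */
    cstate->rel->rd_copy_skip_wal_in_progress = true;
    /* ... heap_insert() calls with HEAP_INSERT_SKIP_WAL ... */
    cstate->rel->rd_copy_skip_wal_in_progress = false;

    /*
     * Heap operations on the same relation (e.g. a heap_update() fired
     * from a trigger) would see the flag and skip their own WAL-logging,
     * relying on COPY's final heap_sync() to make those changes durable
     * as well.
     */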

- Heikki


