Andres Freund <and...@anarazel.de> writes:
> On 2017-11-25 14:50:41 -0500, Tom Lane wrote:
>> Meanwhile, skink has come back with a success as of 0f2458f, rendering
>> the other mystery even deeper --- although at least this confirms the
>> idea that the crashes we are seeing predate the generation.c patch.

> Hm, wonder whether that could be optimization dependent too.

Evidently :-(.  I examined my compiler's assembly output for the
relevant line, snapbuild.c:780:

    ReorderBufferAddNewTupleCids(builder->reorder, xlrec->top_xid, lsn,
                                 xlrec->target_node, xlrec->target_tid,
                                 xlrec->cmin, xlrec->cmax,
                                 xlrec->combocid);

which is

        movl    12(%rbx), %eax          combocid
        movq    16(%rbx), %rcx          target_node.spcNode + dbNode
        movq    %rbp, %rdx
        movl    24(%rbx), %r8d          target_node.relNode
        movq    56(%r12), %rdi
        movq    28(%rbx), %r9           target_tid ... 8 bytes worth
        movl    (%rbx), %esi            top_xid
        movl    %eax, 16(%rsp)
        movl    8(%rbx), %eax           cmax
        movl    %eax, 8(%rsp)
        movl    4(%rbx), %eax           cmin
        movl    %eax, (%rsp)
        call    ReorderBufferAddNewTupleCids

I've annotated the fetches from *rbx to match the data structure:

typedef struct xl_heap_new_cid
{
    /*
     * store toplevel xid so we don't have to merge cids from different
     * transactions
     */
    TransactionId top_xid;
    CommandId    cmin;
    CommandId    cmax;

    /*
     * don't really need the combocid since we have the actual values right in
     * this struct, but the padding makes it free and it's useful for
     * debugging.
     */
    CommandId    combocid;

    /*
     * Store the relfilenode/ctid pair to facilitate lookups.
     */
    RelFileNode target_node;
    ItemPointerData target_tid;
} xl_heap_new_cid;

and what's apparent is that it's grabbing 8 bytes not just 6 from
target_tid.  So the failure case must occur when the xl_heap_new_cid
WAL record is right at the end of the palloc'd record buffer and
valgrind notices that we're fetching padding bytes.

I suspect that gcc feels within its rights to do this because the
size/alignment of xl_heap_new_cid is a multiple of 4 bytes, so that
there ought to be 2 padding bytes at the end.

We could possibly prevent that by marking the whole xl_heap_new_cid
struct as packed/align(2), the same way ItemPointerData is marked,
but that would render accesses to all of its fields inefficient
on alignment-sensitive machines.  Moreover, if we go that route,
we have a boatload of other xlog record formats with similar hazards.

Instead I propose that we should make sure that the palloc request size
for XLogReaderState->main_data is always maxalign'd.  The existing
behavior in DecodeXLogRecord of palloc'ing it only just barely big
enough for the current record seems pretty brain-dead performance-wise
even without this consideration.  Generally, if we need to enlarge
that buffer, we should enlarge it significantly, IMO.

BTW, that comment in the middle of xl_heap_new_cid about "padding
makes this field free" seems entirely wrong.  By my count the
struct is currently 34 bytes, so if by padding it's talking about
maxalign alignment of the whole struct, that field is costing us 8 bytes
on maxalign-8 machines.  There's certainly no internal alignment
that would make it free.

                        regards, tom lane