On 10/28/22 9:54 PM, Andres Freund wrote:
Some food for thought: I think it's also completely fine to extend any relation over a certain size by multiple blocks, regardless of concurrency. E.g. 10 extra blocks on an 80MB relation is 0.1%. I don't have a good feel for what algorithm would make sense here; maybe something along the lines of extend = min(relpages / 2048, 128); if extend < 8 extend = 1; (presumably extending by just a couple extra pages doesn't help much without concurrency).

b) I found that it is quite beneficial to bulk-extend the relation with smgrextend() even without concurrency. The reason for that is primarily the aforementioned dirty buffers that our current extension method causes.

One bit that stumped me for quite a while is knowing how much to extend the relation by. RelationGetBufferForTuple() drives the decision whether / how much to bulk extend purely on the contention on the extension lock, which obviously does not work for non-concurrent workloads. After quite a while I figured out that we actually have good information on how much to extend by, at least for COPY / heap_multi_insert(): heap_multi_insert() can compute how much space is needed to store all tuples, and pass that on to RelationGetBufferForTuple(). For that to be accurate we need to recompute that number whenever we use an already partially filled page. That's not great, but doesn't appear to be a measurable overhead.
