On Sat, 4 May 2024 at 03:51, Matthias van de Meent
<boekewurm+postg...@gmail.com> wrote:
> Was a bump context considered? If so, why didn't it make the cut?
> If tuplestore_trim is the only reason why the type of context in patch
> 2 is a generation context, then couldn't we make the type of context
> conditional on state->eflags & EXEC_FLAG_REWIND, and use a bump
> context if we require rewind capabilities (i.e. where _trim is never
> effectively executed)?

I didn't really want to raise all this here, but to answer why I
didn't use bump...

There's a bit more that would need to be done to allow bump to work in
use-cases where no trim support is needed. Namely, if you look at
writetup_heap(), you'll see a heap_free_minimal_tuple(), which is
pfreeing the memory that was allocated for the tuple in either
tuplestore_puttupleslot(), tuplestore_puttuple() or
tuplestore_putvalues().   So basically, what happens if we're still
loading the tuplestore and we've spilled to disk, once the
tuplestore_put* function is called, we allocate memory for the tuple
that might get stored in RAM (we don't know yet), but then call
tuplestore_puttuple_common() which decides if the tuple goes to RAM or
disk, then because we're spilling to disk, the write function pfree's
the memory we allocate in the tuplestore_put function after the tuple
is safely written to the disk buffer.

This is a fairly inefficient design.  While, we do need to still form
a tuple and store it somewhere for tuplestore_putvalues(), we don't
need to do that for a heap tuple. I think it should be possible to
write directly from the table's page.

Overall tuplestore.c seems quite confused as to how it's meant to
work.  You have tuplestore_begin_heap() function, which is the *only*
external function to create a tuplestore.  We then pretend we're
agnostic about how we store tuples that won't fit in memory by
providing callbacks for read/write/copy, but we only have 1 set of
functions for those and instead of having some other begin method we
use when not dealing with heap tuples, we use some other
tuplestore_put* function.

It seems like another pass is required to fix all this and I think
that should be:

1. Get rid of the function pointers and just hardcode which static
functions we call to read/write/copy.
2. Rename tuplestore_begin_heap() to tuplestore_begin().
3. See if we can rearrange the code so that the copying to the tuple
context is only done when we are in TSS_INMEM. I'm not sure what that
would look like, but it's required before we could use bump so we
don't pfree a bump allocated chunk.

Or maybe there's a way to fix this by adding other begin functions and
making it work more like tuplesort.  I've not really looked into that.

I'd rather tackle these problems independently and I believe there are
much bigger wins to moving from aset to generation than generation to
bump, so that's where I've started.

Thanks for having a look at the patch.

David


Reply via email to