Re: Parallel heap vacuum

Masahiko Sawada Fri, 18 Jul 2025 19:00:21 -0700

On Thu, Jul 17, 2025 at 1:39 PM Melanie Plageman
<[email protected]> wrote:
>
> On Mon, Jun 30, 2025 at 9:41 AM Masahiko Sawada <[email protected]> wrote:
> >
> > I've rebased the patches to the current HEAD.
>
> Hi Sawada-san,
>
> I've started reviewing this. I initially meant to review the eager
> scan part, but I think it would be easier to do that if it were in a
> separate patch on top of the rest of the parallel heap vacuuming
> patches -- just during review. I had a hard time telling which parts
> of the design (and code) were necessitated by the eager scanning
> feature.


Thank you for reviewing the patch! I'll separate the changes of the
eager scanning part from the main patch in the next version patch.

> LV
> ---
> I don't think we should use "LV" and "lazy vacuum" anymore. Let's call
> it what it is: "heap vacuuming". We don't need to rename vacuumlazy.c
> right now, but if you are making new structs or substantially altering
> old ones, I wouldn't use "LV".

Agreed. I'll change accordingly in the next version patch.

>
> table AM callbacks
> -------------------------
> I think there is a fundamental tension here related to whether or not
> vacuumparallel.c should be table-AM agnostic. All of its code is
> invoked below heap_vacuum_rel(). I would argue that vacuumparallel.c
> is index AM-agnostic but heap-specific.
>
> I think this is what makes your table AM callbacks feel odd. Other
> table AM callbacks are invoked from general code -- for example from
> executor code that is agnostic to the underlying table AMs. But, all
> of the functions you added and the associated callbacks are invoked
> below heap_vacuum_rel() (as is all of vacuumparallel.c).

What does it exactly mean to make vacuumparallel.c index AM-agnostic
but heap-specific? I imagine we change the vacuumparallel.c so that it
calls heap's functions for parallel vacuuming such as initializing DSM
and parallel vacuum worker's entry function etc. But I'm not sure how
it works with non-heap table AMs that uses vacuumparallel.c.

I've tried to implement parallel heap scanning in vacuumlazy.c before
but I realized that there could be a lot of duplications between
vacuumlazy.c and vacuumparallel.c.

One possible change would be to have lazyvacuum.c pass the set of
callbacks to parallel_vacuum_init() instead of having them as table AM
callbacks. This removes the weirdness of the associated table AM
callbacks being invoked below heap_vacuum_rel() but it doesn't address
your point "vacuumparallel.c is index AM-agnostic but heap-specific".

> duplicated information across structs
> -----------------------------------------------
> I think a useful preliminary patch to this set would be to refactor
> LVRelState into multiple structs that each represent the static input,
> working state, and output of vacuum. You start to do this in one of
> your patches with LVScanData but it doesn't go far enough.
>
> In fact, I think there are members of all the parallel data structures
> you introduced and of those that are already present that overlap with
> members of the LVRelState and we could start putting these smaller
> structs. For my examples, I am referring both to existing code and
> code you added.
>
> For example, relnamespace, relname, and indname in LVRelState and
> ParallelVacuumState and ParallelVacuumState->heaprel and
> LVRelState->rel. Seems like you could have a smaller struct that is
> accessible to the vacuum error callback and also to the users in
> LVRelState.

I guess it's also related to the point of whether we should make
vacuumparallel.c heap-specific. Currently, since vacuumparallel.c is
independent from heap AM and has its own error callback
parallel_vacuum_error_callback(), it has duplicated information like
heaprel and indrels. Given vacuumparallel.c should not rely on the
heap-specific struct, I imagine that we store that information in
vacuumparallel.c side and have heap access to it. But which means it's
workable only during parallel vacuum.

> (Side note: In your patch, I find it confusing that these members of
> LVRelState are initialized in
> heap_parallel_vacuum_collect_dead_items() instead of
> heap_parallel_vacuum_initialize_worker().)

Yeah, I think at least some fields such as dbname, relnamespace, and
relname should be initialized in
heap_parallel_vacuum_initialize_worker().

>
> And even for cases where the information has to be passed from the
> leader to the worker, there is no reason you can't have the same
> struct but in shared memory for the parallel case and local memory for
> the serial case. For example with the struct members "aggressive",
> "skipwithvm", and "cutoffs" in LVRelState and ParallelLVShared. Or
> ParallelLVScanDesc->nblocks and LVRelState->rel_pages.
>
> Ideally the entire part of the LVRelState that is an unchanging
> input/reference data is in a struct which is in local memory for the
> serial and local parallel case and a single read-only location in the
> shared parallel case. Or, for the shared case, if there is a reason
> not to read from the shared memory, we copy them over to a local
> instance of the same struct. Maybe it is called HeapVacInput or
> similar.

ParallelLVShared is created to pass information to parallel vacuum
workers while keeping LVRelStates able to work locally. Suppose that
we create HeapVacInput including "aggressive", "cutoff", "skipwithvm",
and "rel_pages", LVRelState would have to have a pointer to a
HeapVacInput instance that is either on local memory or shared memory.
Since we also need to pass other information such as
initial_chunk_size and eager scanning related information to the
parallel vacuum worker, we would have to create something like
ParallelLVShared as the patch does. As a result, we would have two
structs that need to be shared on the shared buffer. Is that kind of
what you meant? or did you mean that we include parallel vacuum
related fields too to HeapVacInput struct?

>
> There are a bunch of other instances like this. For example, the
> leader initializes LVScanData->NewRelfrozenXid from
> vacrel->cutoffs.OldestXmin, then uses this to set
> ParallelLVShared->NewRelfrozenXid. Then the worker copies it from
> ParallelLVShared->NewRelfrozenXid back to LVScanData->NewRelfrozenXid.
> But the worker has access to ParallelLVShared->cutoffs. So you could
> either get it from there or allow the workers to access the first
> LVScanData and have that always belong to the leader. Either way, I
> don't think you should need NewRelfrozenXid in ParallelLVShared.

You're right. Both NewRelFrozenXid and NewRelminMxid don't need to be
shared and can be removed from ParallelLVShared.

>
> Another related concern: I don't understand why
> ParallelLVScanWorkerData->nallocated has to be in shared memory. I get
> that each worker has to keep track for itself how many blocks it has
> "processed" so far (either skipped or scanned) and I get that there
> has to be some coordination variable that keeps track of whether or
> not all blocks have been processed and where the next chunk is. But
> why does the worker's nallocated have to be in shared memory? It
> doesn't seem like the leader accesses it.

ParallelLVScanWorkerData->nallocated works in the same way as
ParallelBlockTableScanDescData.phs_nallocated. That is, it controls
how many blocks has been allocated to any of the workers so far. When
allocating a new chunk to scan, each worker gets
ParallelLVScanWorkerData->nallocated value and adds the chunk size to
it in an atomic operation (see parallel_lazy_scan_get_nextpage()).

>
> (Side note: I'm rather confused by why ParallelLVScanDesc is its own
> thing [instead of part of ParallelLVShared] -- not to mention its
> chunk_size member appears to be unused.)

I wanted to keep ParallelLVShared() mostly read-only and aimed to
share the information from the leader to the workers, whereas
ParallelLVScanDesc() is a pure scan state maintained by each worker
(and the leader).

chunk_size is used when allocating the new chunk (see
parallel_lazy_scan_get_nextpage()). It's initialized by
vacrel->plvstate->shared->initial_chunk_size since the first chunk
size could vary depending on eager scanning state, and is updated to
the fixed size PARALLEL_LV_CHUNK_SIZE from the next chunk.

>
> Separately, I think the fact that PVShared and ParallelVacuumState
> stayed almost untouched (and continue to have general,
> parallel-sounding names) with your patch but now mostly deal with
> stuff about indexes while most other parallel vacuuming content is in
> other structs is confusing. I think we need to consider which members
> in PVShared and ParallelVacuumState are phase II only and put that in
> an appropriately named structure or get more of the heap vacuuming
> specific members in those generally named structures.

Yeah, fields such as reltuples, estimated_count, and
maintenance_work_mem_worker are used only for phase II while some
fields such as relid, TidStore related fields, and vacuum delays
related fields are commonly used in phase I and II. I'll consider
creating a new struct to store phase II only information.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: Parallel heap vacuum

Reply via email to