On Tue, Mar 3, 2026 at 10:59 PM Daniil Davydov <[email protected]> wrote:
>
> Hi,
>
> On Tue, Mar 3, 2026 at 5:26 AM Masahiko Sawada <[email protected]> wrote:
> >
> > On Sun, Mar 1, 2026 at 6:46 AM Daniil Davydov <[email protected]> wrote:
> > >
> > > Thus, an a/v leader cannot launch any workers if max_parallel_workers
> > > is set to 0.
> >
> > Right. But this fact would actually support that limiting
> > autovacuum_max_parallel_workers by max_parallel_workers is more
> > appropriate, no?
>
> av_max_parallel_workers is really limited by max_parallel_workers only
> during shmem init. After that we can change it to a value that is higher
> than max_parallel_workers, and nothing bad will happen (obviously).
>
> So, my point was: why should we have this explicit limitation if it
> 1) doesn't guard us from anything bad and 2) can be violated at any time
> (via ALTER SYSTEM SET ...)?
>
> Now it seems to me that limiting our parameter by max_parallel_workers is
> more about grouping of logically related parameters, not a practical
> necessity.
I believe there is also a benefit for users when they want to disable all
parallel behavior. If av_max_parallel_workers belongs to the
max_parallel_workers group, they would have to set just
max_parallel_workers to 0. Otherwise, they would have to set both
max_parallel_workers and av_max_parallel_workers.

> > > I propose to do the same as we did for the try/catch block - add
> > > logging inside "autovacuum_worker_before_shmem_exit" with some unique
> > > message. Thus, we will be sure that the workers are released precisely
> > > in the "before_shmem_exit_hook".
> > >
> > > The alternative is to pass some additional information to the
> > > "ReleaseAllParallelWorkers" function (to supplement the log it emits),
> > > but it doesn't seem like a good solution to me.
> >
> > I'm not sure if it's important to check how
> > AutoVacuumReleaseAllParallelWorkers() has been called (either in the
> > PG_CATCH() block or by autovacuum_worker_before_shmem_exit()). We
> > would end up having to add a unique message to each caller of
> > AutoVacuumReleaseAllParallelWorkers() in the future. I guess it's more
> > important to make sure that all workers have been released in the end.
> >
> > In that sense, it would make more sense to check that all workers have
> > actually been released (i.e., checking by
> > get_parallel_autovacuum_free_workers()) after a parallel vacuum
> > instead of checking workers being released by debug logs. That is, we
> > can check at each test's end if get_parallel_autovacuum_free_workers()
> > returns the expected number after disabling parallel autovacuum.
>
> Sure, at first we want to check whether all workers have been
> released. But the ability to release them precisely in the try/catch
> block is also important, because if that doesn't happen, an a/v worker
> can "hold" these workers until it finishes vacuuming other tables (which
> can take a lot of time).
> Such a situation will surely degrade performance,
> so I think that we must check whether we can release workers precisely
> during ERROR handling. Do you agree with it?

I agree that we need to make sure that parallel workers are released even
during ERROR handling, but I don't think it's important to check, using
regression tests, the places where AutoVacuumReleaseAllParallelWorkers()
is called. It's more important and future-proof to check that all workers
are released according to the shmem data. In other words, even if we call
AutoVacuumReleaseAllParallelWorkers() in an unexpected call path in an
ERROR case, it's still okay if we successfully release all workers in the
end. These regression tests should test the database behavior, not which
specific code path is taken. If we can check that all workers are released
by checking the shmem, why do we need to check further where they are
released?

> I understand your concerns about adding a unique log message for each
> ReleaseAll call. But I cannot imagine a new situation when we would need
> to release workers in an emergency. If you think that it might be
> possible, I can propose adding a new optional parameter to the
> "ReleaseAll" function - something like "char *context_msg", which will
> be added to the elog placed inside this function.

I think we should not make the function complex just for testing purposes.
My point is that what we should be testing is the behavior -- specifically
whether parallel workers are released at the expected timing -- rather
than focusing on whether a specific code path was executed.

> > On second thoughts on the "planned" and "reserved", can we consider
> > what the patch implemented as "reserved" as the "planned" in
> > autovacuum cases? That is, in autovacuum cases, the "planned" number
> > considers the number of parallel degrees based on the number of
> > indexes (or the autovacuum_parallel_workers value) as well as the
> > number of workers that have actually been reserved.
> > In cases of an autovacuum_max_parallel_workers shortage, users would
> > notice by seeing logs that enough workers are not planned in the first
> > place against the number of indexes on the table. That might be less
> > confusing for users than introducing a new "reserved" concept in the
> > vacuum logs. Also, it slightly helps simplify the code.
>
> Yeah, it sounds tempting. But in this case we're shifting more
> responsibility to the user. For instance:
> If av_max_workers = 5 and there are two a/v leaders, each of which is
> trying to launch 3 parallel workers, we will see logs like "3 planned,
> 3 launched", "2 planned, 2 launched". IMHO, such a log doesn't imply
> that there is a shortage of workers. I.e., it is the user's
> responsibility to notice that the second a/v leader could have launched
> more than 2 workers for processing the table with (N + 2) indexes.
> In this case even our previous version of logging would give more
> information to the user: "3 planned, 3 launched", "3 planned,
> 2 launched".
>
> If we don't want to create a new "reserved" concept, maybe we can rename
> it to something more intuitive? For example, "n_abandoned" - the number
> of workers that we were unable to launch due to an
> av_max_parallel_workers shortage. If n_abandoned is 0 and
> n_launched < n_planned, the user can conclude that he should increase
> the max_parallel_workers parameter. And vice versa, if
> n_launched == n_planned and n_abandoned > 0, the user can conclude that
> he should increase the autovacuum_max_parallel_workers parameter.
>
> What do you think?

While I agree that showing only two numbers might lack some information
for users, I guess the same is true for max_parallel_maintenance_workers
and other parallel-query-related GUC parameters.
For instance, suppose we set max_parallel_maintenance_workers to 2. If the
table has 4 (large enough) indexes, we would plan to execute a parallel
vacuum with 2 workers instead of 4 due to the
max_parallel_maintenance_workers shortage, and it's even possible that
only 1 worker can launch due to a max_worker_processes shortage. In this
case, we currently consider that 2 workers are planned. Isn't it the same
situation as the case where we reserved 2 parallel vacuum workers for
autovacuum for the table with 4 indexes?

> > **Comments on the 0001 patch**
> >
> >  * of the worker list (see above).
> > @@ -299,6 +308,8 @@ typedef struct
> >     WorkerInfo av_startingWorker;
> >     AutoVacuumWorkItem av_workItems[NUM_WORKITEMS];
> >     pg_atomic_uint32 av_nworkersForBalance;
> > +   uint32 av_freeParallelWorkers;
> > +   uint32 av_maxParallelWorkers;
> > } AutoVacuumShmemStruct;
> >
> > We should use int32 instead of uint32.
>
> I don't mind, but I don't quite understand the reason. We assume that
> the minimal value for both variables is 0. Why shouldn't we use an
> unsigned data type?

Unsigned integers should be used for bit masks, flags, or when we need to
handle values larger than INT_MAX. Signed integers are preferable in other
cases: we're already using signed integers for controlling the number of
workers, and autovacuum_max_parallel_workers is defined as a signed int
(which could be stored in AutoVacuumShmem->av_maxParallelWorkers).

Here are some review comments.

* 0001 patch:

+       /* Cannot release more workers than reserved */
+       Assert(nworkers <= av_nworkers_reserved);

I think it's better to use Min() to cap the number of workers to be
released by av_nworkers_reserved, since Assert() is compiled out in
release builds.

* 0004 patch:

Can we write the same test cases while not relying on the 0002 patch
(i.e., worker usage logging)? We check the worker usage log at two places
in the regression tests.
The idea is that we write the number of
workers planned, reserved, and launched at DEBUG log level and check these
logs in the regression tests. Patches 0001, 0003, and 0004 can be merged
before push, while we might want more discussion on the 0002 patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
