On Wed, Mar 11, 2026 at 4:28 AM Daniil Davydov wrote:
>
>
> > While I agree that showing only two numbers might lack some
> > information for users, I guess the same is true for
> > max_parallel_maintenance_workers or other parallel-query-related
> > GUC parameters. For instance, suppose we set
> > max_parallel_maintenance_workers to 2, if the table has (large enough)
> > 4 indexes, we would plan to execute a parallel vacuum with 2 workers
> > instead of 4 due to max_parallel_maintenance_workers shortage, and it's
> > even possible that only 1 worker can launch due to
> > max_worker_processes shortage. In this case, we currently consider
> > that 2 workers are planned. Isn't it the same situation as the case
> > where we reserved 2 parallel vacuum workers for autovacuum for the
> > table with 4 indexes?
>
> I don't think that examples with other "max_parallel_" parameters will be
> appropriate, because these parameters are limiting the number of parallel
> workers for *single* operation/executor node/... .  At the same time,
> av_max_parallel_workers limits the total number of parallel workers across
> all a/v leaders.
>
> Regarding the situation that you provided :
> The number of planned workers is reduced inside the
> parallel_vacuum_compute_workers due to the max_parallel_maintenance_workers
> limit.  I.e. we cannot plan more workers than the config allows, and
> it's completely OK. No one expects the number of "planned workers" to be more
> than max_parallel_maintenance_workers.
>
> IMO there is no need to make efforts to track the shortage of
> max_parallel_maintenance_workers for the VACUUM (PARALLEL), because this
> parameter just plays the role of a limiter. We will consider only the
> shortage of max_parallel_workers, that can be determined by looking at
> "planned vs. launched".
>
> And here is a difference with a parallel autovacuum :
> av_max_parallel_workers is considered twice : in the
> "parallel_vacuum_compute_workers" and "ReserveWorkers" functions.
> So the low number of launched workers can be explained by the shortage of
> both av_max_parallel_workers and max_parallel_workers. Since we want to
> distinguish between these cases, we have added the "nreserved" concept.
>
> I see that a few modules can report something like "out of background worker
> slots" when they cannot launch more workers due to max_parallel_workers
> shortage (but modules depending on the "parallel.c" logic don't do so).
> This fact gave me another idea :
> If we don't want to log "nreserved" or some other similar value, maybe
> we should add logging after the "ReserveWorkers" function? I.e. if some
> workers cannot be reserved, we can emit a log like "out of parallel
> autovacuum workers. you should increase the av_max_parallel_workers
> parameter". Having this log can help the user distinguish between
> max_parallel_workers/av_max_parallel_workers shortage situations.
> What do you think?

My point is that the process of determining the number of workers
planned to launch is somewhat unclear to users in both cases. We
consider not only GUCs such as max_parallel_maintenance_workers but
also index AM definitions (i.e., amparallelvacuumoption) and index
sizes etc. But I agree that providing more detailed logs might help
users understand and notice the av_max_parallel_workers shortage.
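
To make the three numbers concrete, here is a toy Python sketch. It is
not PostgreSQL code: the function names and the simple min() rules are
assumptions made for illustration, and it ignores index AM options,
index sizes, and the leader's own participation.

```python
# Toy model, not PostgreSQL source. Names and rules are illustrative assumptions.

def plan_workers(eligible_indexes, per_operation_limit):
    """Workers we would like to use for one vacuum, capped by a per-operation limit."""
    return max(0, min(eligible_indexes, per_operation_limit))


def reserve_workers(nplanned, free_in_shared_pool):
    """Workers granted from a pool shared by all autovacuum leaders."""
    return max(0, min(nplanned, free_in_shared_pool))


def launch_workers(nreserved, free_bgworker_slots):
    """Workers that actually start, limited by free background worker slots."""
    return max(0, min(nreserved, free_bgworker_slots))


def explain(nplanned, nreserved, nlaunched):
    """Name which limit, if any, reduced the worker count at each step."""
    reasons = []
    if nreserved < nplanned:
        reasons.append("shared autovacuum pool exhausted")
    if nlaunched < nreserved:
        reasons.append("no free background worker slots")
    return reasons


nplanned = plan_workers(eligible_indexes=4, per_operation_limit=4)
nreserved = reserve_workers(nplanned, free_in_shared_pool=2)
nlaunched = launch_workers(nreserved, free_bgworker_slots=1)
print(nplanned, nreserved, nlaunched, explain(nplanned, nreserved, nlaunched))
```

With these inputs the three numbers are 4, 2, and 1, so both reasons
are reported. With only planned and launched (4 and 1), the two causes
could not be told apart.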

BTW this discussion made me think about changing av_max_parallel_workers
to control the number of workers per autovacuum worker instead (renaming
it to, say, max_parallel_workers_per_autovacuum_worker). Users
can compute the maximum number of parallel workers the system requires
by (autovacuum_worker_slots *
max_parallel_workers_per_autovacuum_worker). We would no longer need
the reservation and release logic. I'd like to hear your opinion.
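
For example, the upper bound could be computed like this (toy Python,
not PostgreSQL code; max_parallel_workers_per_autovacuum_worker is only
the name proposed above, and the values are made up):

```python
def max_parallel_autovacuum_workers(autovacuum_worker_slots,
                                    max_parallel_workers_per_autovacuum_worker):
    """Worst case: every autovacuum worker slot is busy and uses its full
    per-worker allowance of parallel workers."""
    if autovacuum_worker_slots < 0 or max_parallel_workers_per_autovacuum_worker < 0:
        raise ValueError("settings must be non-negative")
    return autovacuum_worker_slots * max_parallel_workers_per_autovacuum_worker


# Hypothetical settings: 16 autovacuum worker slots, 2 parallel workers each.
print(max_parallel_autovacuum_workers(16, 2))
```

Because each autovacuum worker would have its own fixed allowance,
nothing is shared between leaders, which is why the reservation step
would become unnecessary.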

>
> Summary :
> 1)
> I think that we should not look at maintenance vacuum while
> considering how to inform the user about parameters shortage for autovacuum,
> because we have a more complicated situation in case of autovacuum.
> 2)
> I suggest adding a separate log that will be emitted every time we are
> unable to start workers due to a shortage of av_max_parallel_workers.

For (2), do you mean that the worker writes these logs regardless of
the log_autovacuum_min_duration setting? I'm concerned that the server
logs would be flooded with these messages, especially when multiple
autovacuum workers are working very actively and the system is facing
a shortage of av_max_parallel_workers.

>
> > * 0004 patch:
> >
> > Can we write the same test cases while not relying on the 0002 patch
> > (i.e., worker usage logging)? We check the worker usage log at two
> > places in the regression tests. The idea is that we write the number
> > of workers planned, reserved, and launched in DEBUG log level and
> > check these logs in the regression tests. Patches 0001, 0003, and
> > 0004 can be merged before push while we might want more discussion on
> > the 0002 patch.
>
> Possibly we can introduce a new injection point, or a new log for it.
> But I assume that the subject of discussion in patch 0002 is the
> "nreserved" logic, and "nlaunched/nplanned" logic does not raise any
> questions.
>
> I suggest splitting the 0002 patch into two parts : 1) basic logic and
> 2) additional logic with nreserved or something else. The second part can be
> discussed in isolation from the patch set. If we do this, we may not have to
> change the tests. What do you think?

Assuming the basic logic means nlaunched/nplanned logic, yes, it would
be a nice idea. I think user-facing logging stuff can be developed as
an improvement independent from the main parallel autovacuum patch.
It's ideal if we can implement the main patch (with tests) without
relying on the user-facing logging.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

