On Mon, Sep 15, 2025 at 11:50 AM Alexander Korotkov
<[email protected]> wrote:
>
> Hi!
>
> On Tue, Aug 19, 2025 at 12:04 AM Masahiko Sawada <[email protected]> 
> wrote:
> >
> > On Mon, Aug 18, 2025 at 1:31 AM Daniil Davydov <[email protected]> wrote:
> > >
> > >
> > > On Fri, Aug 15, 2025 at 3:41 AM Masahiko Sawada <[email protected]> 
> > > wrote:
> > > >
> > >
> > > > 2. when an autovacuum worker (not parallel vacuum worker) who uses
> > > > parallel vacuum gets SIGHUP, it errors out with the error message
> > > > "parameter "max_stack_depth" cannot be set during a parallel
> > > > operation". Autovacuum checks the configuration file reload in
> > > > vacuum_delay_point(), and while reloading the configuration file, it
> > > > attempts to set max_stack_depth in
> > > > InitializeGUCOptionsFromEnvironment() (which is called by
> > > > ProcessConfigFileInternal()). However, it cannot change
> > > > max_stack_depth since the worker is in parallel mode but
> > > > max_stack_depth doesn't have GUC_ALLOW_IN_PARALLEL flag. This doesn't
> > > > happen in regular backends who are using parallel queries because they
> > > > check the configuration file reload at the end of each SQL command.
> > > >
> > >
> > > Hm, this is a really serious problem. I see only two ways to solve it
> > > (neither is really good):
> > > 1)
> > > Do not allow processing of the config file during parallel autovacuum
> > > execution.
> > >
> > > 2)
> > > Teach the autovacuum to enter parallel mode only during the index
> > > vacuum/cleanup phase. I'm a bit wary about it, because the design says
> > > that we should be in parallel mode during the whole parallel operation.
> > > But if we can make sure that all launched workers have exited, I don't
> > > see any reason why we can't just exit parallel mode at the end of
> > > parallel_vacuum_process_all_indexes.
> > >
> > > What do you think about it?
> >
> > Hmm, given that we're trying to support parallel heap vacuum on
> > another thread[1] and we will probably support it in autovacuums, it
> > seems to me that these approaches won't work.
> >
> > Another idea would be to allow autovacuum workers to process the
> > config file even in parallel mode. GUC changes in the leader worker
> > would not affect parallel vacuum workers, but that is fine with me. In
> > the context of autovacuum, only the specific GUC parameters related to
> > cost-based delays need to be propagated to parallel vacuum workers as
> > well. We probably need some changes to compute_parallel_delay() so that
> > parallel workers can compute the sleep time based on the new
> > vacuum_cost_limit and vacuum_cost_delay after the leader process
> > (i.e., the autovacuum worker) reloads the config file.
> >
> > >
> > > Again, thank you for the review. Please see the v10 patches (only 0001
> > > has been changed):
> > > 1) Reserve and release workers only inside
> > > parallel_vacuum_process_all_indexes.
> > > 2) Add a try/catch block to parallel_vacuum_process_all_indexes, so we
> > > can release workers even after an error. This required adding a static
> > > variable to account for the total number of reserved workers
> > > (av_nworkers_reserved).
> > > 3) Cap autovacuum_max_parallel_workers by max_worker_processes only
> > > inside the autovacuum code. The assign hook has been removed.
> > > 4) Use the shmem value for determining the maximum number of parallel
> > > autovacuum workers (eliminating the race condition between the launcher
> > > and the leader process).
> >
> > Thank you for updating the patch! I'll review the new version patches.
>
> I've rebased this patchset onto the current master.  That required me to
> move the new GUC definition to guc_parameters.dat.  I also adjusted
> typedefs.list and ran pgindent.  Some notes about the patch.

Thank you for updating the patch!

> I see parallel_vacuum_process_all_indexes() has a TRY/CATCH block.  From
> what I've heard, the overhead of setting up and taking these jumps is
> platform-dependent, and not harmless on some platforms.  Therefore, can we
> skip the TRY/CATCH block for non-autovacuum vacuum?  Possibly we could move
> it to AutoVacWorkerMain(), which would also save us from repeatedly setting
> up a jump in autovacuum workers.

I wonder if the TRY/CATCH block alone is enough to ensure that autovacuum
workers release the reserved parallel workers: PG_CATCH blocks are not run
on FATAL errors, which exit the process via proc_exit() without unwinding
through the TRY/CATCH stack.

> In general, I think this patchset badly lacks testing.  I think it needs
> TAP tests checking from the logs that autovacuum has been done in parallel.
> Also, it would be good to set up some injection points and check that
> reserved autovacuum parallel workers are released correctly in the case of
> errors.

+1

IIUC the patch still has one problem in terms of reloading the
configuration parameters during parallel mode as I mentioned
before[1].

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoBRRXbNJEvCjS-0XZgCEeRBzQPKmrSDjJ3wZ8TN28vaCQ%40mail.gmail.com

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
