On Mon, Sep 15, 2025 at 11:50 AM Alexander Korotkov <[email protected]> wrote:
>
> Hi!
>
> On Tue, Aug 19, 2025 at 12:04 AM Masahiko Sawada <[email protected]> wrote:
> >
> > On Mon, Aug 18, 2025 at 1:31 AM Daniil Davydov <[email protected]> wrote:
> > >
> > > On Fri, Aug 15, 2025 at 3:41 AM Masahiko Sawada <[email protected]> wrote:
> > > >
> > > > 2. when an autovacuum worker (not parallel vacuum worker) who uses
> > > > parallel vacuum gets SIGHUP, it errors out with the error message
> > > > "parameter "max_stack_depth" cannot be set during a parallel
> > > > operation". Autovacuum checks the configuration file reload in
> > > > vacuum_delay_point(), and while reloading the configuration file, it
> > > > attempts to set max_stack_depth in
> > > > InitializeGUCOptionsFromEnvironment() (which is called by
> > > > ProcessConfigFileInternal()). However, it cannot change
> > > > max_stack_depth since the worker is in parallel mode but
> > > > max_stack_depth doesn't have GUC_ALLOW_IN_PARALLEL flag. This doesn't
> > > > happen in regular backends who are using parallel queries because they
> > > > check the configuration file reload at the end of each SQL command.
> > >
> > > Hm, this is a really serious problem. I see only two ways to solve it
> > > (both are not really good) :
> > >
> > > 1)
> > > Do not allow processing of the config file during parallel autovacuum
> > > execution.
> > >
> > > 2)
> > > Teach the autovacuum to enter parallel mode only during the index
> > > vacuum/cleanup phase. I'm a bit wary about it, because the design says
> > > that we should be in parallel mode during the whole parallel operation.
> > > But actually, if we can make sure that all launched workers are exited,
> > > I don't see reasons, why can't we just exit parallel mode at the end of
> > > parallel_vacuum_process_all_indexes.
> > >
> > > What do you think about it?
> >
> > Hmm, given that we're trying to support parallel heap vacuum on
> > another thread[1] and we will probably support it in autovacuums, it
> > seems to me that these approaches won't work.
> >
> > Another idea would be to allow autovacuum workers to process the
> > config file even in parallel mode. GUC changes in the leader worker
> > would not affect parallel vacuum workers, but it is fine to me. In the
> > context of autovacuum, only specific GUC parameters related to
> > cost-based delays need to be affected also to parallel vacuum workers.
> > Probably we need some changes to compute_parallel_delay() so that
> > parallel workers can compute the sleep time based on the new
> > vacuum_cost_limit and vacuum_cost_delay after the leader process
> > (i.e., autovacuum worker) reloads the config file.
> >
> > > Again, thank you for the review. Please, see v10 patches (only 0001
> > > has been changed) :
> > > 1) Reserve and release workers only inside
> > > parallel_vacuum_process_all_indexes.
> > > 2) Add try/catch block to the parallel_vacuum_process_all_indexes, so
> > > we can release workers even after an error. This required adding a
> > > static variable to account for the total number of reserved workers
> > > (av_nworkers_reserved).
> > > 3) Cap autovacuum_max_parallel_workers by max_worker_processes only
> > > inside autovacuum code. Assign hook has been removed.
> > > 4) Use shmem value for determining the maximum number of parallel
> > > autovacuum workers (eliminate race condition between launcher and
> > > leader process).
> >
> > Thank you for updating the patch! I'll review the new version patches.
>
> I've rebased this patchset to the current master. That required me to
> move the new GUC definition to guc_parameters.dat. Also, I adjusted
> typedefs.list and made pgindent. Some notes about the patch.
Thank you for updating the patch!

> I see parallel_vacuum_process_all_indexes() have a TRY/CATCH block. As I
> heard, the overhead of setting/doing jumps is platform-dependent, and not
> harmless on some platforms. Therefore, can we skip TRY/CATCH block for
> non-autovacuum vacuum? Possibly we could move it to AutoVacWorkerMain(),
> that would save us from repeatedly setting a jump in autovacuum workers too.

I wonder if the TRY/CATCH block alone is insufficient to ensure that
autovacuum workers release the reserved parallel workers, since in FATAL
cases the catch block is never reached.

> In general, I think this patchset badly lack of testing. I think it needs
> tap tests checking from the logs that autovacuum has been done in parallel.
> Also, it would be good to set up some injection points, and check that
> reserved autovacuum parallel workers are getting released correctly in the
> case of errors.

+1

IIUC the patch still has one problem in terms of reloading the
configuration parameters during parallel mode, as I mentioned before[1].

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoBRRXbNJEvCjS-0XZgCEeRBzQPKmrSDjJ3wZ8TN28vaCQ%40mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
