RE: Assert while autovacuum was executing

Zhijie Hou (Fujitsu) Tue, 20 Jun 2023 21:12:44 -0700

On Tuesday, June 20, 2023 5:44 PM Amit Kapila <[email protected]> wrote:
> 
> On Mon, Jun 19, 2023 at 5:13 PM Amit Kapila <[email protected]>
> wrote:
> >
> > On Sun, Jun 18, 2023 at 12:18 AM Peter Geoghegan <[email protected]> wrote:
> > >
> > > On Sat, Jun 17, 2023 at 11:29 AM Jaime Casanova
> > > <[email protected]> wrote:
> > > > I have been testing 16beta1, last commit
> > > > a14e75eb0b6a73821e0d66c0d407372ec8376105
> > > > I just let sqlsmith do its magic before trying something else, and
> > > > today I found a core with the attached backtrace.
> > >
> > > The assertion that fails is the IsPageLockHeld assertion from commit
> 72e78d831a.
> > >
> >
> > I'll look into this and share my analysis.
> >
> 
> This failure mode appears to be introduced in commit 7d71d3dd08 (in
> PG16) where we started to process the config file after acquiring page lock
> during autovacuum. The problem here is that after acquiring page lock (a
> heavy-weight lock), while processing the config file, we tried to access the
> catalog cache which in turn attempts to acquire a lock on the catalog 
> relation,
> and that leads to the assertion failure. This is because of an existing rule 
> that we
> don't acquire any other heavyweight lock while holding the page lock except
> for relation extension. I think normally we should be careful about the lock
> ordering for heavy-weight locks to avoid deadlocks but here there may not be
> any existing hazard in acquiring a lock on the catalog table after acquiring 
> page
> lock on the gin index's metapage as I am not aware of a scenario where we can
> acquire them in reverse order. One naive idea is to have a parameter like
> vacuum_config_reload_safe to allow config reload during autovacuum and
> make it false for the gin index cleanup code path.


I also think it would be better to skip reloading config in page lock cases to 
be consistent
with the rule. And here is the patch which does the same.

I tried to reproduce the assert failure(before applying the patch) using the 
following steps:

1. I added a sleep before vacuum_delay_point in ginInsertCleanup and 
LOG("attach this process") before sleeping.
2. And changed few GUC to make autovacuum happen more frequently and then start 
the server.
-
autovacuum_naptime = 5s
autovacuum_vacuum_threshold = 1
autovacuum_vacuum_insert_threshold = 100
-

3. Then I execute the following sqls:
-
create table gin_test_tbl(i int4[]);
create index gin_test_idx on gin_test_tbl using gin (i)
  with (fastupdate = on, gin_pending_list_limit = 4096);
insert into gin_test_tbl select array[1, 2, g] from generate_series(1, 20000) g;
insert into gin_test_tbl select array[1, 3, g] from generate_series(1, 1000) g;
-
4. After a while, I can see the LOG from autovacuum worker and then use gdb to 
attach to the autovacuum worker.
5. When the autovacuum worker is blocked, I changed the 
"default_text_search_config = 'pg_catalog.public'" in configure file and reload 
it.
6. Release the autovacuum worker and then I can see the assert failure.

And I can see the assert failure doesn't happen after applying the patch.

> 
> The reason for the existing rule for page lock and relation extension locks 
> was
> to not allow them to participate in group locking which will allow other 
> parallel
> operations like a parallel vacuum where multiple workers can work on the same
> index, or parallel inserts, parallel copy, etc. The related commits are 
> 15ef6ff4b9,
> 72e78d831ab, 85f6b49c2c, and 3ba59ccc89. See 3ba59ccc89 for more details
> (To allow parallel inserts and parallel copy, we have ensured that relation
> extension and page locks don't participate in group locking which means such
> locks can conflict among the same group members. This is required as it is no
> safer for two related processes to extend the same relation or perform clean
> up in gin indexes at a time than for unrelated processes to do the same....).

Best Regards,
Hou zj

0001-disallow-reloading-config-when-holding-the-page-lock.patch
Description: 0001-disallow-reloading-config-when-holding-the-page-lock.patch

RE: Assert while autovacuum was executing

Reply via email to