Hi Thomas
I found some other problems which I want to share my change with you to make 
you confirm.
<1> I changed the code in smgr_alloc_sr to avoid dead lock.
​
  LWLockAcquire(mapping_lock, LW_EXCLUSIVE);
  flags = smgr_lock_sr(sr);
  Assert(flags & SR_SYNCING(forknum));
  + flags &= ~SR_SYNCING(forknum);
  if (flags & SR_JUST_DIRTIED(forknum))
  {
   /*
    * Someone else dirtied it while we were syncing, so we can't mark
    * it clean.  Let's give up on this SR and go around again.
    */
   smgr_unlock_sr(sr, flags);
   LWLockRelease(mapping_lock);
   goto retry;
  }
  /* This fork is clean! */
 - flags &= ~SR_SYNCING(forknum);
  flags &= ~SR_DIRTY(forknum);
 }

In smgr_drop_sr, if the sr is SR_SYNCING, it will retry until the sr is not 
SR_SYNCING.  But in smgr_alloc_sr, if the sr is SR_JUST_DIRTIED, it will retry 
to get another sr, although it has been synced by smgrimmedsync, the flag 
SR_SYNCING  doesn't changed. This might cause dead lock. So I changed your 
patch as above.

<2> I changed the code in smgr_drop_sr to avoid some corner cases
/* Mark it invalid and drop the mapping. */
smgr_unlock_sr(sr, ~SR_VALID);
+ for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
+  sr->nblocks[forknum] = InvalidBlockNumber;
hash_search_with_hash_value(sr_mapping_table,
       &reln->smgr_rnode,
       hash,
       HASH_REMOVE,
       NULL);

smgr_drop_sr just removes the hash entry from sr_mapping_table, but doesn't 
remove the sr_pool. But in smgrnblocks_fast, it just get sr from sr_pool, so I 
add some codes as above to avoid some corner cases get an unexpected result 
from smgrnblocks_fast. Is it necessary, I also want some advice from you.

Thanks a lot.
Buzhen


 ------------------原始邮件 ------------------
发件人:Thomas Munro <thomas.mu...@gmail.com>
发送时间:Tue Dec 29 22:52:59 2020
收件人:陈佳昕(步真) <buzhen....@alibaba-inc.com>
抄送:Amit Kapila <amit.kapil...@gmail.com>, Konstantin Knizhnik 
<k.knizh...@postgrespro.ru>, PostgreSQL Hackers 
<pgsql-hackers@lists.postgresql.org>
主题:Re: Re: Cache relation sizes?
On Wed, Dec 23, 2020 at 1:31 AM 陈佳昕(步真) <buzhen....@alibaba-inc.com> wrote:
> I studied your patch these days and found there might be a problem.
> When execute 'drop database', the smgr shared pool will not be removed 
> because of no call 'smgr_drop_sr'. Function 'dropdb' in dbcommands.c remove 
> the buffer from bufferpool and unlink the real files by 'rmtree', It doesn't 
> call smgrdounlinkall, so the smgr shared cache will not be dropped although 
> the table has been removed. This will cause some errors when smgr_alloc_str 
> -> smgropen、smgrimmedsync. Table file has been removed, so smgropen and 
> smgrimmedsync will get a unexpected result.

Hi Buzhen,

Thanks, you're right -- it needs to scan the pool of SRs and forget
everything from the database you're dropping.


Reply via email to