At Mon, 14 Mar 2022 09:15:11 +0300, Yura Sokolov <y.soko...@postgrespro.ru> 
wrote in 
> On Mon, 14/03/2022 at 14:31 +0900, Kyotaro Horiguchi wrote:
> > I'd like to ask you to remove nalloced from partitions and then add
> > a global atomic for the same use?
> 
> I really believe it should be global. I made it per-partition to
> avoid overcomplicating the first versions. Glad you mentioned it.
> 
> I thought to protect it with freeList[0].mutex, but an atomic is
> probably the better idea here. But which atomic to choose: uint64 or
> uint32? Based on sizeof(long)?
> Ok, I'll do it in the next version.

The current nentries is a long (= int64 on CentOS), and a uint32 could
support only roughly 2^32 * 8192 bytes = 32TB of shared buffers, which
doesn't seem like a safe enough margin.  So it would be uint64.
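For reference, a minimal sketch of what I mean, using pg_atomic_uint64
from port/atomics.h (the struct and helper names here are only
illustrative, not taken from the patch):

#include "port/atomics.h"

/* sketch: a single shared counter replacing the per-partition nalloced */
typedef struct HASHHDR_sketch
{
    /* ... freeList[] and the other fields elided ... */
    pg_atomic_uint64 nalloced;  /* total number of entries allocated */
} HASHHDR_sketch;

static inline void
count_allocated(HASHHDR_sketch *hctl, int nelem)
{
    /*
     * uint64, since 2^32 entries * 8kB pages is only 32TB of shared
     * buffers and a uint32 could conceivably wrap around.
     */
    pg_atomic_fetch_add_u64(&hctl->nalloced, nelem);
}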

> The whole get_hash_entry looks strange.
> Wouldn't it be better to cycle through partitions and only then go to
> get_hash_entry?
> Maybe there should be a bitmap of non-empty free lists? 32 bits for
> 32 partitions. But wouldn't the bitmap become a contention point
> itself?

The code puts significance on avoiding the contention caused by
visiting the freelists of other partitions, and seems to assume that a
freelist shortage rarely happens.
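To illustrate, here is a simplified paraphrase of that logic (not the
actual dynahash.c code; pop_freelist() is a hypothetical helper
standing in for the spinlock and list manipulation):

static HASHBUCKET
get_hash_entry_sketch(HTAB *hashp, int freelist_idx)
{
    HASHHDR    *hctl = hashp->hctl;
    HASHBUCKET  newElement;
    int         i;

    /* 1. Try our own partition's freelist first: the common, cheap,
     *    contention-free case. */
    newElement = pop_freelist(hctl, freelist_idx);
    if (newElement != NULL)
        return newElement;

    /* 2. Freelist is empty: allocate a fresh chunk of elements. */
    if (element_alloc(hashp, hctl->nelem_alloc, freelist_idx))
        return pop_freelist(hctl, freelist_idx);

    /* 3. Only when allocation fails, visit the other partitions'
     *    freelists and accept the contention that implies. */
    for (i = 1; i < NUM_FREELISTS; i++)
    {
        newElement = pop_freelist(hctl, (freelist_idx + i) % NUM_FREELISTS);
        if (newElement != NULL)
            return newElement;
    }

    return NULL;                /* out of memory */
}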

I tried pgbench runs at scale 100 (10 threads, 10 clients) with 128kB
shared buffers, and saw that get_hash_entry never takes the
!element_alloc() path: it always allocates a fresh entry, and the
count of newly allocated elements saturates at 30 by the middle of a
100-second run.

Then I tried the same with the patch applied, and I was surprised to
see that the rise in the number of newly allocated elements didn't
stop: it went up to 511 elements by the end of the 100-second run.  So
my concern was valid: the change in dynahash actually causes a
continuous/repeated shortage of freelist entries.  I'm not sure how
much impact it would have on performance if we changed get_hash_entry
to prefer other freelists, though.
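In the sketch above, that change would just move the borrowing loop
ahead of the element_alloc() call (again only hypothetical code):

    /* 2'. Borrow from the other partitions' freelists first. */
    for (i = 1; i < NUM_FREELISTS; i++)
    {
        newElement = pop_freelist(hctl, (freelist_idx + i) % NUM_FREELISTS);
        if (newElement != NULL)
            return newElement;
    }

    /* 3'. Allocate fresh entries only when every freelist is empty,
     *     so the table stops growing once entries exist somewhere. */
    if (element_alloc(hashp, hctl->nelem_alloc, freelist_idx))
        return pop_freelist(hctl, freelist_idx);

    return NULL;                /* out of memory */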


By the way, there's the following comment in StrategyInitialize.

>        * Initialize the shared buffer lookup hashtable.
>        *
>        * Since we can't tolerate running out of lookup table entries, we must
>        * be sure to specify an adequate table size here.  The maximum
>        * steady-state usage is of course NBuffers entries, but BufferAlloc()
>        * tries to insert a new entry before deleting the old.  In principle
>        * this could be happening in each partition concurrently, so we could
>        * need as many as NBuffers + NUM_BUFFER_PARTITIONS entries.
>        */
>       InitBufTable(NBuffers + NUM_BUFFER_PARTITIONS);

"but BufferAlloc() tries to insert a new entry before deleting the
old." gets false by this patch but still need that additional room for
stashed entries.  It seems like needing a fix.
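Perhaps something along these lines would do (wording is only a
suggestion, assuming the patch keeps the victim entry stashed while
the new one is inserted):

	 * Since we can't tolerate running out of lookup table entries, we must
	 * be sure to specify an adequate table size here.  The maximum
	 * steady-state usage is of course NBuffers entries, but BufferAlloc()
	 * keeps the old entry stashed while inserting its replacement.  In
	 * principle this could be happening in each partition concurrently, so
	 * we could need as many as NBuffers + NUM_BUFFER_PARTITIONS entries.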



regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

