On 09.10.2019 9:06, ideriha.take...@fujitsu.com wrote:
Hi, Konstantin

From: Konstantin Knizhnik [mailto:k.knizh...@postgrespro.ru]
I do not completely understand from your description when you are going
to evict an entry from the local cache.
Only once the transaction is committed? I think it would be more efficient
to also specify a memory threshold for the local cache size and use LRU or
some other eviction policy to remove data from the local cache.
Then, if the working set (accessed relations) fits in the local cache limit,
there will be no performance penalty compared with the current implementation.
There should be no difference at all on pgbench or other benchmarks
with a relatively small number of relations.
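To illustrate what I mean by eviction, here is a rough sketch using the
dlist from lib/ilist.h (all of the names below are made up for
illustration; nothing of this exists in the code yet):

/* Hypothetical per-backend cache entry with an LRU link and size accounting */
typedef struct LocalCatEntry
{
    dlist_node  lru_node;       /* position in the backend-local LRU list */
    Size        size;           /* memory consumed by this entry */
    /* ... cached catalog tuple ... */
} LocalCatEntry;

static dlist_head local_cache_lru = DLIST_STATIC_INIT(local_cache_lru);
static Size local_cache_used = 0;
static Size local_cache_limit = 8 * 1024 * 1024;    /* soft limit */

/* Move an entry to the head of the LRU list on every access */
static void
local_cache_touch(LocalCatEntry *entry)
{
    dlist_move_head(&local_cache_lru, &entry->lru_node);
}

/*
 * Evict least recently used entries until we are back under the soft limit.
 * Entries with uncommitted changes would have to be skipped here.
 */
static void
local_cache_enforce_limit(void)
{
    while (local_cache_used > local_cache_limit &&
           !dlist_is_empty(&local_cache_lru))
    {
        LocalCatEntry *victim =
            dlist_tail_element(LocalCatEntry, lru_node, &local_cache_lru);

        dlist_delete(&victim->lru_node);
        local_cache_used -= victim->size;
        pfree(victim);
    }
}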

If an entry is not found in the local cache, then we should look for it in
the global cache, and in case of a double cache miss, read it from disk.
I do not completely understand why we need to store references to
global cache entries in the local cache and use reference counters for global
cache entries.
Why can we not maintain just two independent caches?

While there really are databases with hundreds and even thousands of
tables, an application still typically works with only a small subset of them.
So I think that the "working set" can still fit in memory.  This is why I
think that in case of a local cache miss and a global cache hit, we should
copy the data from the global cache to the local cache, so that it can be
accessed in the future without any synchronization.
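In other words, the lookup order would be something like the following
(LookupLocalCache and friends are just placeholders for the respective
code paths, not existing functions):

/* Hypothetical three-level lookup: local cache, then global cache, then disk */
static LocalCatEntry *
catcache_lookup(const CatCacheKey *key)
{
    LocalCatEntry  *lentry;
    GlobalCatEntry *gentry;

    /* 1. Backend-local cache: no locking at all */
    lentry = LookupLocalCache(key);
    if (lentry != NULL)
        return lentry;

    /* 2. Shared global cache: needs a (partition) lock */
    gentry = LookupGlobalCache(key);
    if (gentry != NULL)
        return CopyIntoLocalCache(gentry);  /* later accesses are lock-free */

    /* 3. Double miss: read from the system catalogs */
    gentry = LoadFromCatalogAndInsertIntoGlobalCache(key);
    return CopyIntoLocalCache(gentry);
}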
Since we need to keep all uncommitted data in the local cache, there is
still a chance of local memory overflow (if some transaction creates or
alters too many tables).
But I think that is a very exotic and rare use case. The problem with
memory overflow usually arises when we have a large number of
backends, each maintaining its own catalog cache.
So I think that we should have a "soft" limit for the local cache and a "hard"
limit for the global cache.
Oh, that idea hadn't occurred to me at all. So the local cache is a sort of
first-level cache and the global cache is a second-level cache. That sounds great.
It would be good for performance, and having two GUC parameters limiting the
local cache and the global cache also gives the DBA complete memory control.
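For example, in an extension the two knobs could look roughly like this
(the parameter names local_catalog_cache_size and global_catalog_cache_size
are invented here; in core they would rather be entries in guc.c):

#include "postgres.h"
#include "fmgr.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

/* Hypothetical parameters, in kB */
static int local_catalog_cache_size;    /* soft limit, per backend */
static int global_catalog_cache_size;   /* hard limit, shared memory */

void
_PG_init(void)
{
    DefineCustomIntVariable("local_catalog_cache_size",
                            "Soft limit on the per-backend catalog cache.",
                            NULL,
                            &local_catalog_cache_size,
                            8192, 64, MAX_KILOBYTES,
                            PGC_USERSET,
                            GUC_UNIT_KB,
                            NULL, NULL, NULL);

    DefineCustomIntVariable("global_catalog_cache_size",
                            "Hard limit on the shared catalog cache.",
                            NULL,
                            &global_catalog_cache_size,
                            65536, 1024, MAX_KILOBYTES,
                            PGC_POSTMASTER,
                            GUC_UNIT_KB,
                            NULL, NULL, NULL);
}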
Yeah, uncommitted data should be kept locally, but that is the only exception.
Not having to keep track of references to the global cache from local cache
headers also seems less complex to implement. I'll look into the design.
(After sleeping on it)
What happens if there is a cache miss in local memory and the entry is found
in the global cache?
One possible way is to copy the found global cache entry into local memory.
If so, I'm just anxious about the cost of the memcpy. Another way is, for
example, to leave the entry in the global cache and not copy it into local
memory. In that case, searching the global cache every time seems expensive,
because we need to take a lock on at least the partition of the hash table.
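To put the second option in code, every access would look roughly like this
(GlobalCatCacheHash, GlobalCatEntry and GlobalCatCachePartitionLock() are
hypothetical; only the dynahash-plus-partition-lock pattern itself is real):

/* Hypothetical lookup that leaves the entry in shared memory */
GlobalCatEntry *
global_catcache_fetch(HTAB *GlobalCatCacheHash, const CatCacheKey *key)
{
    uint32      hashcode = get_hash_value(GlobalCatCacheHash, key);
    LWLock     *partlock = GlobalCatCachePartitionLock(hashcode);
    GlobalCatEntry *entry;
    bool        found;

    LWLockAcquire(partlock, LW_SHARED);
    entry = (GlobalCatEntry *)
        hash_search_with_hash_value(GlobalCatCacheHash, key, hashcode,
                                    HASH_FIND, &found);
    LWLockRelease(partlock);

    /*
     * Once the lock is released the entry may be evicted at any time,
     * which is exactly why we would need either a memcpy into local
     * memory or a reference count on the shared entry.
     */
    return found ? entry : NULL;
}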

The architecture in which the local cache holds a reference to the global
cache (strictly speaking, a pointer to a pointer to the global cache entry)
is more complex, but once a process has searched the global cache, it can
afterwards reach the entry just by checking that the reference is still
valid and traversing a couple of pointers.
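Roughly, the reference the local cache would hold looks like this (again,
all of the names are invented for illustration):

/* Shared entry; the generation is bumped whenever the entry is replaced */
typedef struct GlobalCatEntry
{
    uint32  generation;
    /* ... catalog tuple data ... */
} GlobalCatEntry;

/* Slot in shared memory pointing at the current entry (NULL once evicted) */
typedef struct GlobalCatHandle
{
    GlobalCatEntry *entry;
} GlobalCatHandle;

/* What the backend-local cache keeps: a pointer to the slot above */
typedef struct LocalCatRef
{
    GlobalCatHandle *handle;        /* pointer to pointer into shared memory */
    uint32           generation;    /* generation seen at the first lookup */
} LocalCatRef;

/*
 * Cheap revalidation on every later access; on failure we have to redo
 * the full global hash search (and take the partition lock again).
 */
static inline GlobalCatEntry *
local_ref_get(LocalCatRef *ref)
{
    GlobalCatEntry *entry = ref->handle->entry;

    if (entry != NULL && entry->generation == ref->generation)
        return entry;
    return NULL;    /* reference went stale */
}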

Regards,
Takeshi Ideriha

If the assumption that the working set of a backend (the set of tables accessed by this session) is small enough to fit in the backend's memory is true, then a global metadata cache is not needed at all: it is enough to limit the size of the local cache and implement some eviction algorithm. If data is not found in the local cache, it is loaded from the catalog in the standard way. This is the simplest solution, and maybe it is a good starting point for work in this direction.

If there are cases where an application needs to work with hundreds of tables (partitioning?), then we can either store references to the global cache in the local cache, or perform two lookups: first in the local cache and then in the global one.



