Re: Global shared meta cache

2020-02-01 Thread Tomas Vondra

This patch was broken and waiting for author since early December, so
I've marked it as returned with feedback. Feel free to resubmit an
updated version to a future commitfest.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Global shared meta cache

2019-11-30 Thread Michael Paquier
On Wed, Nov 06, 2019 at 02:55:30AM +, ideriha.take...@fujitsu.com wrote:
> Thank you for the reply.

Latest patch does not apply.  Please send a rebase.  Patch moved to
next CF, waiting on author.

Bip.
--
Michael


signature.asc
Description: PGP signature


RE: Global shared meta cache

2019-11-05 Thread ideriha.take...@fujitsu.com
>From: Konstantin Knizhnik [mailto:k.knizh...@postgrespro.ru]
>If the assumption that the working set of a backend (the set of tables
>accessed by this session) is small enough to fit in the backend's memory
>is true, then a global meta cache is not needed at all: it is enough to
>limit the size of the local cache and implement some eviction algorithm.
>If data is not found in the local cache, it is loaded from the catalog in
>the standard way.
>It is the simplest solution and maybe a good starting point for work in
>this direction.

Thank you for the reply.

I introduced a GUC so users can choose whether to use this feature or not.
But as you stated, if the data size is not so big, my suggestion does too
much and a simple threshold is enough.
The idea of a threshold has been discussed in another thread, so I'd like
to discuss it there.
Though it's not active these days, the ideas that have been discussed are
a memory limit, an access-time limit, and a hybrid of the two.
It seems to me that the discussion has converged on eviction by access
timestamp.
https://www.postgresql.org/message-id/flat/20161219.201505.11562604.horiguchi.kyot...@lab.ntt.co.jp

>If there are cases when an application needs to work with hundreds of
>tables (partitioning?) then we can either store references to the global
>cache in the local cache, or perform two lookups: in the local and global
>caches.

I think this is my target. In the case of many partitioned tables,
many (more than 100) columns, and many backends, a shared cache
would have more benefit for memory usage and performance
than a simple threshold alone.

I did an experiment before. Each parent table has about 145 integer
columns and is partitioned into about 350 child tables on average.
There are 11 parent tables and about 3,850 tables in total.
When I ran "select * from parent_table" against the 11 parent tables,
CacheMemoryContext alone consumed about 0.37GB, so with 100 backends it
would consume about 37GB.
This is because the number of syscache entries for pg_statistic is very
large (about 577,000). That is roughly the number of columns (145) times
the number of tables (3,850), i.e. about 558,000 entries.
(Sorry that the model and figures are not simple to understand.)

By the way, my current patch contains some redundant code.
For example, LWLocks are used in some places where spinlocks would
actually suffice.
Another is that the reference count of a local reference is
incremented/decremented even though the local reference cache doesn't
need to be protected.
I'll fix these things and submit statistics about memory usage and
performance.
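
To illustrate the second point, here is a minimal sketch (hypothetical
names, not code from the patch) of how a shared entry could be pinned
with an atomic counter (or equally a spinlock) while backend-local
references stay unlocked:

    /* Hypothetical sketch: pin a shared catcache entry without an LWLock. */
    #include "postgres.h"
    #include "port/atomics.h"

    typedef struct GlobalCatCTup
    {
        pg_atomic_uint32 refcount;  /* pins held by all backends */
        bool        dead;           /* set by invalidation */
        /* ... cached tuple data ... */
    } GlobalCatCTup;

    static inline void
    gct_pin(GlobalCatCTup *gct)
    {
        pg_atomic_fetch_add_u32(&gct->refcount, 1);
    }

    static inline void
    gct_unpin(GlobalCatCTup *gct)
    {
        /* the last unpin of a dead entry could reclaim it here */
        pg_atomic_fetch_sub_u32(&gct->refcount, 1);
    }

    /*
     * A backend-local reference, by contrast, lives in backend-private
     * memory and is touched by a single process, so a plain int suffices.
     */
    typedef struct LocalCatCRef
    {
        GlobalCatCTup *entry;
        int         local_refcount; /* no atomics or locks needed */
    } LocalCatCRef;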

Regards,
Takeshi Ideriha


Re: Global shared meta cache

2019-10-09 Thread Konstantin Knizhnik




On 09.10.2019 9:06, ideriha.take...@fujitsu.com wrote:

Hi, Konstantin


From: Konstantin Knizhnik [mailto:k.knizh...@postgrespro.ru]
I do not completely understand from your description when you are going
to evict an entry from the local cache.
Just once the transaction is committed? I think it will be more efficient
to also specify a memory threshold for the local cache size and use LRU
or some other eviction policy to remove data from the local cache.
So if the working set (accessed relations) fits in the local cache limit,
there will be no performance penalty compared with the current
implementation.
There should be no difference at all on pgbench or other benchmarks
with a relatively small number of relations.

If an entry is not found in the local cache, then we should look for it
in the global cache and, in case of a double cache miss, read it from
disk.
I do not completely understand why we need to store references to
global cache entries in the local cache and use reference counters for
global cache entries.

Why can we not maintain just two independent caches?

While there really are databases with hundreds and even thousands of
tables, an application still typically works with only some small subset
of them.
So I think that the "working set" can still fit in memory.  This is why I
think that in case of a local cache miss and a global cache hit, we
should copy the data from the global cache to the local cache to make it
possible to access it in the future

without any synchronization.

As far as we need to keep all uncommitted data in the local cache, there
is still a chance of local memory overflow (if some transaction creates
or alters too many tables).
But I think that is a very exotic and rare use case. The problem with
memory overflow usually takes place if we have a large number of
backends, each maintaining its own catalog cache.
So I think that we should have a "soft" limit for the local cache and a
"hard" limit for the global cache.

Oh, I hadn't come up with this idea at all. So the local cache is a sort
of first-level cache and the global cache is a second-level cache. That
sounds great.
It would be good for performance, and setting two GUC parameters for
limiting the local cache and the global cache gives the DBA complete
memory control.
Yeah, uncommitted data should be local, but it's the only exception.
Not keeping track of references to the global cache from the local cache
header seems less complex to implement. I'll look into the design.

(After sleeping on it)
What happens if there is a cache miss in local memory and it's found in
the global one?
One possible way is to copy the found global cache entry into local
memory. If so, I'm just anxious about the cost of the memcpy. Another way
is, for example, leaving the entry in the global cache and not copying it
into local memory. In this case, searching the global cache every time
seems expensive because we need to take a lock on at least the partition
of the hash table.

The architecture in which the local cache holds a reference to the global
cache (strictly speaking, a pointer to a pointer to the global cache
entry) is complex, but once a process has searched the global cache,
after that it can get the entry just by checking that the reference is
still valid and traversing a couple of pointers.

Regards,
Takeshi Ideriha


If the assumption that the working set of a backend (the set of tables
accessed by this session) is small enough to fit in the backend's memory
is true, then a global meta cache is not needed at all: it is enough to
limit the size of the local cache and implement some eviction algorithm.
If data is not found in the local cache, it is loaded from the catalog in
the standard way.
It is the simplest solution and maybe a good starting point for work in
this direction.


If there are cases when an application needs to work with hundreds of
tables (partitioning?) then we can either store references to the global
cache in the local cache, or perform two lookups: in the local and global
caches.







RE: Global shared meta cache

2019-10-09 Thread ideriha.take...@fujitsu.com
Hi, Konstantin

>>From: Konstantin Knizhnik [mailto:k.knizh...@postgrespro.ru]
>>I do not completely understand from your description when you are going
>>to evict an entry from the local cache.
>>Just once the transaction is committed? I think it will be more efficient
>>to also specify a memory threshold for the local cache size and use LRU
>>or some other eviction policy to remove data from the local cache.
>>So if the working set (accessed relations) fits in the local cache limit,
>>there will be no performance penalty compared with the current
>>implementation.
>>There should be no difference at all on pgbench or other benchmarks
>>with a relatively small number of relations.
>>
>>If an entry is not found in the local cache, then we should look for it
>>in the global cache and, in case of a double cache miss, read it from
>>disk.
>>I do not completely understand why we need to store references to
>>global cache entries in the local cache and use reference counters for
>>global cache entries.
>>Why can we not maintain just two independent caches?
>>
>>While there really are databases with hundreds and even thousands of
>>tables, an application still typically works with only some small subset
>>of them.
>>So I think that the "working set" can still fit in memory.  This is why I
>>think that in case of a local cache miss and a global cache hit, we
>>should copy the data from the global cache to the local cache to make it
>>possible to access it in the future
>>without any synchronization.
>>
>>As far as we need to keep all uncommitted data in the local cache, there
>>is still a chance of local memory overflow (if some transaction creates
>>or alters too many tables).
>>But I think that is a very exotic and rare use case. The problem with
>>memory overflow usually takes place if we have a large number of
>>backends, each maintaining its own catalog cache.
>>So I think that we should have a "soft" limit for the local cache and a
>>"hard" limit for the global cache.
>
>Oh, I hadn't come up with this idea at all. So the local cache is a sort
>of first-level cache and the global cache is a second-level cache. That
>sounds great.
>It would be good for performance, and setting two GUC parameters for
>limiting the local cache and the global cache gives the DBA complete
>memory control.
>Yeah, uncommitted data should be local, but it's the only exception.
>Not keeping track of references to the global cache from the local cache
>header seems less complex to implement. I'll look into the design.

(After sleeping on it)
What happens if there is a cache miss in local memory and it's found in
the global one?
One possible way is to copy the found global cache entry into local
memory. If so, I'm just anxious about the cost of the memcpy. Another way
is, for example, leaving the entry in the global cache and not copying it
into local memory. In this case, searching the global cache every time
seems expensive because we need to take a lock on at least the partition
of the hash table.

The architecture in which the local cache holds a reference to the global
cache (strictly speaking, a pointer to a pointer to the global cache
entry) is complex, but once a process has searched the global cache,
after that it can get the entry just by checking that the reference is
still valid and traversing a couple of pointers.
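
As an illustration only (hypothetical names, synchronization elided),
that double indirection could look like this:

    #include "postgres.h"

    typedef struct GlobalCatCTup GlobalCatCTup;

    /*
     * Lives in shared memory; its address stays stable even if the entry
     * is evicted and the slot reused, which is detected via a generation.
     */
    typedef struct GlobalCatCTupHandle
    {
        GlobalCatCTup *tuple;       /* NULL once the entry is evicted */
        uint32      generation;     /* bumped whenever the slot is recycled */
    } GlobalCatCTupHandle;

    /* Lives in the backend-local hash table. */
    typedef struct LocalCatCEntry
    {
        GlobalCatCTupHandle *handle;
        uint32      generation;     /* generation observed when cached */
    } LocalCatCEntry;

    static GlobalCatCTup *
    local_entry_deref(LocalCatCEntry *lentry)
    {
        GlobalCatCTupHandle *h = lentry->handle;

        /* Valid only if the slot was not recycled since we cached it. */
        if (h->tuple != NULL && h->generation == lentry->generation)
            return h->tuple;

        return NULL;                /* fall back to the shared hash lookup */
    }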

Regards,
Takeshi Ideriha


RE: Global shared meta cache

2019-10-08 Thread ideriha.take...@fujitsu.com
Hi, Alvaro

>
>The last patch we got here (a prototype) was almost a year ago.  There was
>substantial discussion about it, but no new version of the patch has been 
>posted.  Are
>we getting a proper patch soon, or did we give up on the approach entirely?

I'm sorry for the late response. I've started working on it again for the
upcoming commitfest.

Regards,
Takeshi Ideriha


RE: Global shared meta cache

2019-10-08 Thread ideriha.take...@fujitsu.com
Hi, Konstantin 

I'm very sorry for the late response and thank you for your feedback.
(I re-sent this email because my email address changed and the original
didn't get delivered to -hackers.)

>From: Konstantin Knizhnik [mailto:k.knizh...@postgrespro.ru]
>
>Takeshi-san,
>
>I am sorry for the late response - I was just waiting for a new version
>of the patch from you to review.

Though I haven't incorporated your idea yet, I made a PoC patch, which
supports regular CREATE TABLE, SELECT, and DROP TABLE.

TBH, the current patch is not very sophisticated.
It fails some installcheck items with the global catalog cache on, and is
around 2k LOC.

>I read your last proposal and it seems to be very reasonable.
> From my point of view we cannot reach an acceptable level of performance
>if we do not have a local cache at all.
>So, as you proposed, we should maintain a local cache for uncommitted data.
Yeah, I did this in my patch.

>I think that the size of the global cache should be limited (you have
>introduced a GUC for it).
>In principle it is possible to use dynamic shared memory and have an
>unlimited global cache.
>But I do not see much sense in it.
Yes. I limit the size of the global cache. Right now it doesn't support an
eviction policy such as LRU.
 
>I do not completely understand from your description when you are going
>to evict an entry from the local cache.
>Just once the transaction is committed? I think it will be more efficient
>to also specify a memory threshold for the local cache size and use LRU
>or some other eviction policy to remove data from the local cache.
>So if the working set (accessed relations) fits in the local cache limit,
>there will be no performance penalty compared with the current
>implementation.
>There should be no difference at all on pgbench or other benchmarks
>with a relatively small number of relations.
>
>If an entry is not found in the local cache, then we should look for it
>in the global cache and, in case of a double cache miss, read it from
>disk.
>I do not completely understand why we need to store references to
>global cache entries in the local cache and use reference counters for
>global cache entries.
>Why can we not maintain just two independent caches?
>
>While there really are databases with hundreds and even thousands of
>tables, an application still typically works with only some small subset
>of them.
>So I think that the "working set" can still fit in memory.  This is why I
>think that in case of a local cache miss and a global cache hit, we
>should copy the data from the global cache to the local cache to make it
>possible to access it in the future without any synchronization.
>
>As far as we need to keep all uncommitted data in the local cache, there
>is still a chance of local memory overflow (if some transaction creates
>or alters too many tables).
>But I think that is a very exotic and rare use case. The problem with
>memory overflow usually takes place if we have a large number of
>backends, each maintaining its own catalog cache.
>So I think that we should have a "soft" limit for the local cache and a
>"hard" limit for the global cache.

Oh, I hadn't come up with this idea at all. So the local cache is a sort
of first-level cache and the global cache is a second-level cache. That
sounds great.
It would be good for performance, and setting two GUC parameters for
limiting the local cache and the global cache gives the DBA complete
memory control.
Yeah, uncommitted data should be local, but it's the only exception.
Not keeping track of references to the global cache from the local cache
header seems less complex to implement. I'll look into the design.

>I didn't think much about cache invalidation. I read your proposal, but
>frankly speaking I do not understand why it should be so complicated.
>Why can't we immediately invalidate the entry in the global cache and
>lazily (as is done now using invalidation signals) invalidate the local
>caches?
>

I was overthinking about when the local/global cache is evicted. Simply,
the process reads the sinval messages and then invalidates the entry. If
the refcount is not zero, the process marks it dead to prevent other
processes from finding the obsolete entry in the global hash table.
The refcount of a global cache entry is raised between SearchSysCache()
and ReleaseSysCache().
Invalidating a global cache entry whose refcount is raised would cause
invalid memory access.
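
A sketch of that rule (hypothetical names; hash-table locking elided),
assuming a shared entry with an atomic refcount and a dead flag:

    #include "postgres.h"
    #include "port/atomics.h"

    typedef struct GlobalCatCTup
    {
        pg_atomic_uint32 refcount;  /* raised between Search/ReleaseSysCache */
        bool        dead;           /* obsolete, waiting for last unpin */
    } GlobalCatCTup;

    /* Hypothetical helper that unlinks and frees a shared entry. */
    static void global_hash_remove(GlobalCatCTup *gct);

    /* Called while processing a sinval message. */
    static void
    global_catcache_invalidate(GlobalCatCTup *gct)
    {
        if (pg_atomic_read_u32(&gct->refcount) == 0)
            global_hash_remove(gct);    /* nobody can be using it */
        else
            gct->dead = true;           /* hide from lookups; free on unpin */
    }

    /* Called at ReleaseSysCache time. */
    static void
    global_catcache_release(GlobalCatCTup *gct)
    {
        if (pg_atomic_fetch_sub_u32(&gct->refcount, 1) == 1 && gct->dead)
            global_hash_remove(gct);    /* we were the last reader */
    }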

Regards,
Takeshi Ideriha


0002-POC-global-catalog-cache.patch
Description: 0002-POC-global-catalog-cache.patch


0001-MemoryContext-for-shared-memory-based-on-DSA.patch
Description: 0001-MemoryContext-for-shared-memory-based-on-DSA.patch


Re: Global shared meta cache

2019-09-25 Thread Alvaro Herrera
The last patch we got here (a prototype) was almost a year ago.  There
was substantial discussion about it, but no new version of the patch has
been posted.  Are we getting a proper patch soon, or did we give up on
the approach entirely?

-- 
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Global shared meta cache

2019-08-01 Thread Konstantin Knizhnik

Takeshi-san,

I am sorry for the late response - I was just waiting for a new version
of the patch from you to review.

I read your last proposal and it seems to be very reasonable.
From my point of view we cannot reach an acceptable level of performance
if we do not have a local cache at all.

So, as you proposed, we should maintain a local cache for uncommitted data.

I think that the size of the global cache should be limited (you have
introduced a GUC for it).
In principle it is possible to use dynamic shared memory and have an
unlimited global cache.

But I do not see much sense in it.

I do not completely understand from your description when you are going
to evict an entry from the local cache.
Just once the transaction is committed? I think it will be more efficient
to also specify a memory threshold for the local cache size and use LRU
or some other eviction policy to remove data from the local cache.

So if the working set (accessed relations) fits in the local cache limit,
there will be no performance penalty compared with the current
implementation.
There should be no difference at all on pgbench or other benchmarks
with a relatively small number of relations.


If an entry is not found in the local cache, then we should look for it
in the global cache and, in case of a double cache miss, read it from
disk.
I do not completely understand why we need to store references to global
cache entries in the local cache and use reference counters for global
cache entries.

Why can we not maintain just two independent caches?

While there really are databases with hundreds and even thousands of
tables, an application still typically works with only some small subset
of them.
So I think that the "working set" can still fit in memory.  This is why I
think that in case of a local cache miss and a global cache hit, we
should copy the data from the global cache to the local cache to make it
possible to access it in the future without any synchronization.

As far as we need to keep all uncommitted data in the local cache, there
is still a chance of local memory overflow (if some transaction creates
or alters too many tables).
But I think that is a very exotic and rare use case. The problem with
memory overflow usually takes place if we have a large number of
backends, each maintaining its own catalog cache.
So I think that we should have a "soft" limit for the local cache and a
"hard" limit for the global cache.


I didn't think much about cache invalidation. I read your proposal, but
frankly speaking I do not understand why it should be so complicated.
Why can't we immediately invalidate the entry in the global cache and
lazily (as is done now using invalidation signals) invalidate the local
caches?




On 26.06.2019 9:23, Ideriha, Takeshi wrote:

Hi, everyone.


From: Ideriha, Takeshi [mailto:ideriha.take...@jp.fujitsu.com]
My current thoughts:
- Each catcache has a (maybe partial) HeapTupleHeader
- Put every catcache on shared memory and have no local catcache
- But a catcache entry for an aborted tuple is not put on shared memory
- A hash table exists per kind of CatCache
- These hash tables exist for each database, plus shared ones
  - e.g.) there is a hash table for pg_class of each DB

I talked about shared CatCache (SysCache) with Thomas at PGCon and he
suggested using sinval to control cache visibility instead of xid.
Based on this I've changed my design. I'll send a PoC patch in a week,
but let me share my idea beforehand. I'm sorry this email is too long to
read, but I'm happy if you have some comments.

Basically I won't make the shared catcache the default; I'll make it an option.

Both local and shared memory have hash tables for the catcache. A shared
hash entry is the catctup itself, and a local hash entry is a pointer to
the shared catctup. Actually, a local hash entry does not hold a direct
pointer but points to a handle of the shared catctup. The handle points
to the shared catctup and is located in shared memory. This is intended
to avoid dangling pointers in local hash entries due to eviction of a
shared catctup by LRU. (The detail about LRU will be written in another
email because I'll implement it later.)

* Search and Insert
Current postgres searches the (local) hash table and, on a miss, searches
the actual catalog (shared buffers and disk) and builds the cache,
building a negative cache entry if the tuple is not found.

In the new architecture, if the cache is not found in the local hash
table, postgres tries to search the shared one before consulting shared
buffers. Here is the detail. To begin with, postgres looks up the pointer
in the local hash table. If it's found, it dereferences the pointer and
gets the catctup. If not, it searches the shared hash table, and if the
catctup is found there, it inserts a pointer to it into the local hash
table. If it doesn't exist in the shared hash table either, postgres
searches the actual catalog, builds the cache, and in most cases inserts
it into the shared hash table and a pointer to it into the local one. The
exception is a cache entry made from an uncommitted catalog tuple, which
must not be seen by other processes. So an uncommitted cache entry is
built in local memory and

RE: Global shared meta cache

2019-06-26 Thread Ideriha, Takeshi
Hi, everyone.

>From: Ideriha, Takeshi [mailto:ideriha.take...@jp.fujitsu.com]
>My current thoughts:
>- Each catcache has a (maybe partial) HeapTupleHeader
>- Put every catcache on shared memory and have no local catcache
>- But a catcache entry for an aborted tuple is not put on shared memory
>- A hash table exists per kind of CatCache
>- These hash tables exist for each database, plus shared ones
>  - e.g.) there is a hash table for pg_class of each DB

I talked about shared CatCache (SysCache) with Thomas at PGCon and he
suggested using sinval to control cache visibility instead of xid. 
Based on this I've changed my design. I'll send a PoC patch in a week,
but let me share my idea beforehand. I'm sorry this email is too long to
read, but I'm happy if you have some comments.

Basically I won't make the shared catcache the default; I'll make it an option.

Both local and shared memory have hash tables for the catcache. A shared
hash entry is the catctup itself, and a local hash entry is a pointer to
the shared catctup. Actually, a local hash entry does not hold a direct
pointer but points to a handle of the shared catctup. The handle points
to the shared catctup and is located in shared memory. This is intended
to avoid dangling pointers in local hash entries due to eviction of a
shared catctup by LRU. (The detail about LRU will be written in another
email because I'll implement it later.)

* Search and Insert
Current postgres searches the (local) hash table and, on a miss, searches
the actual catalog (shared buffers and disk) and builds the cache,
building a negative cache entry if the tuple is not found.

In the new architecture, if the cache is not found in the local hash
table, postgres tries to search the shared one before consulting shared
buffers. Here is the detail. To begin with, postgres looks up the pointer
in the local hash table. If it's found, it dereferences the pointer and
gets the catctup. If not, it searches the shared hash table, and if the
catctup is found there, it inserts a pointer to it into the local hash
table. If it doesn't exist in the shared hash table either, postgres
searches the actual catalog, builds the cache, and in most cases inserts
it into the shared hash table and a pointer to it into the local one. The
exception is a cache entry made from an uncommitted catalog tuple, which
must not be seen by other processes. So an uncommitted cache entry is
built in local memory and pushed directly into the local table but not
the shared one. Lastly, if there is no tuple we're looking for, a
negative entry is put into the shared hash table.
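
In pseudo-C, that search order could look like this (a sketch only;
every helper function here is hypothetical, not from the patch):

    #include "postgres.h"
    #include "utils/catcache.h"

    static CatCTup *
    shared_syscache_search(CatCache *cache, Datum *keys)
    {
        CatCTup    *ct;

        /* 1. Backend-local hash: holds pointers (handles) to shared entries. */
        ct = local_hash_lookup(cache, keys);
        if (ct != NULL)
            return ct;

        /* 2. Shared hash: on a hit, remember the pointer locally. */
        ct = shared_hash_lookup(cache, keys);
        if (ct != NULL)
        {
            local_hash_insert(cache, keys, ct);
            return ct;
        }

        /* 3. Actual catalog (shared buffers and disk). */
        ct = build_from_catalog(cache, keys);
        if (ct == NULL)
        {
            /* Not found at all: install a negative entry in the shared table. */
            ct = build_negative_entry(cache, keys);
            shared_hash_insert(cache, keys, ct);
            return ct;
        }

        if (tuple_is_uncommitted(ct))
            local_hash_insert(cache, keys, ct);     /* local only, not shared */
        else
        {
            shared_hash_insert(cache, keys, ct);
            local_hash_insert(cache, keys, ct);
        }
        return ct;
    }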

* Invalidation and visibility control
Now let's talk about invalidation. Current cache invalidation is based on
the local and shared invalidation queues (sinval). When a transaction is
committed, sinval messages are queued into the shared one. Other
processes read and process sinval messages at their own timing.

In the shared catcache, I follow the current sinval in most parts. But
I'll change the action taken when a sinval message is queued up and read
by a process. When messages are added to the shared queue, identify the
corresponding shared cache entries (matched by hash value) and turn their
"obsolete flag" on. When a sinval message is read by a process, each
process deletes the local hash entries (pointers to handles). Each
process can see a shared catctup as long as its pointer (local entry) is
valid. Because the sinval messages are not processed yet, it's OK to keep
seeing the pointer to the possibly old cache. After the local entry is
invalidated, the local process tries to search the shared hash table so
as to always find a catctup whose obsolete flag is off. The process can
see the right shared cache after the invalidation messages are read
because it checks the obsolete flag, and an uncommitted cache entry never
exists in shared memory at all.

There is a subtle thing here. Always finding a shared catctup without the
obsolete mark assumes that the process has already read the sinval
messages. So before trying to search the shared table, I make the process
read the sinval messages. After they are read, the local cache status
becomes consistent with the action of getting a new cache entry. This
reading timing is almost the same as current postgres behavior because it
happens after a local cache miss both in the current design and in mine.
After a cache miss in the current design, a process opens the relation
and takes a heavyweight lock. At this time, in fact, it reads the sinval
messages. (These things are well summarized in the talk by Robert Haas at
PGCon [1].)

Lastly, we need to invalidate a shared catctup itself at some point. But
we cannot delete it as long as someone sees it. So I'll introduce a
refcounter. It's increased or decreased at the same timing as current
postgres manipulates the local refcounter of catctup and catclist, to
avoid a catctup being deleted while a catclist is used or vice versa
(that is, SearchCatCache/ReleaseCatCache). So a shared catctup is deleted
when its shared refcount becomes zero and its obsolete flag is on. Once
it has vanished from the shared cache, the obsolete cache never comes
back again because a process which tries to get the cache but fails in
the shared hash table has already read the sinval messages (in any

RE: Global shared meta cache

2019-05-22 Thread Ideriha, Takeshi
>From: Ideriha, Takeshi [mailto:ideriha.take...@jp.fujitsu.com]
>[TL; DR]
>The basic idea is the following 4 points:
>A. The user can choose which databases put their caches (relation and
>catalog) on shared memory, and how much memory is used
>B. Caches of committed data are on the shared memory. Caches of
>uncommitted data are on local memory.
>C. Caches on the shared memory have xid information (xmin, xmax)
>D. Evict not-recently-used caches from shared memory

I updated some thoughts about B and C for CatCache.
I would be very happy if you could give some comments.

>[B & C]
>Regarding B & C, the motivation is that we don't want other backends to
>see uncommitted tables.
>The search order is local memory -> shared memory -> disk.
>A local process searches the cache in shared memory based on its own
>snapshot and the xid of the cache entry.
>When the cache is not found in shared memory, a cache entry with xmin is
>made in shared memory (but not in the local one).
>
>When a cache definition is changed by DDL, a new cache entry is created
>in the local one, and thus subsequent commands refer to the local cache
>if needed.
>When it's committed, the local cache is cleared and the shared cache is
>updated. This update is done by adding xmax to the old entry and also
>making a new one with xmin. The idea behind adding a new one is that a
>newly created cache entry (new table or altered table) is likely to be
>used in subsequent transactions. At this point maybe we can make use of
>the current invalidation mechanism, even though the invalidation message
>to other backends is not sent.

My current thoughts:
- Each catcache has a (maybe partial) HeapTupleHeader
- Put every catcache on shared memory and have no local catcache
- But a catcache entry for an aborted tuple is not put on shared memory
- A hash table exists per kind of CatCache
- These hash tables exist for each database, plus shared ones
  - e.g.) there is a hash table for pg_class of each DB

Why I'm leaning toward not using a local cache:
- At commit time you need to copy the local cache to the global cache.
  This would delay the response time.
- Even if an uncommitted catcache entry is on shared memory, other
  transactions cannot see it. In my idea entries have xid information and
  visibility is checked by comparing the xmin/xmax of the catcache entry
  with the snapshot.

OK, then if we put the catcache on shared memory, we need to check its
visibility.
But if we used the exact same visibility check mechanism as for heap
tuples, it would take many more steps compared to the current local
catcache search.
The current visibility check is based on a snapshot check plus a
commit/abort check.
So I'm thinking to only put in-progress or committed caches there. This
would save the time for checking catcache status (commit/abort) while
searching the cache.
But basically I'm going to use the current visibility check mechanism
except for the commit/abort check (in other words, the check of clog).

This is how it works:
- When creating a catcache entry, copy the heap tuple with its
  HeapTupleHeader
- When an update/delete command for a catalog tuple is finished,
  update the xmax of the corresponding cache entry
- If there is a cache entry whose xmin is an aborted xid, delete the entry
- If there is a cache entry whose xmax is an aborted xid, clear the xmax
  information
- At commit time, no action is taken on the shared cache
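
A sketch of that check (my illustration, not the patch; aborted entries
are assumed to be removed eagerly, so only the snapshot part of the
usual visibility rules remains):

    #include "postgres.h"
    #include "access/transam.h"
    #include "utils/snapshot.h"

    /* Existing routine; its header location varies by version. */
    extern bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);

    /*
     * Sketch: is a shared cache entry stamped with (xmin, xmax) visible
     * to the given snapshot?  No clog check, because aborted entries are
     * deleted eagerly as described above.
     */
    static bool
    shared_cache_entry_visible(TransactionId xmin, TransactionId xmax,
                               Snapshot snapshot)
    {
        /* Inserter still in progress for this snapshot: not yet visible. */
        if (XidInMVCCSnapshot(xmin, snapshot))
            return false;

        /* Deleter visible to this snapshot: entry no longer visible. */
        if (TransactionIdIsValid(xmax) && !XidInMVCCSnapshot(xmax, snapshot))
            return false;

        return true;
    }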

Pending items are
- thoughts about shared relcache
- "vacuum" process for shared cache

Regards,
Ideriha Takeshi





RE: Global shared meta cache

2019-04-19 Thread Ideriha, Takeshi
>From: Ideriha, Takeshi [mailto:ideriha.take...@jp.fujitsu.com]
>Do you have any thoughts?
>
Hi, I updated my idea, hoping to get some feedback.

[TL; DR]
The basic idea is the following 4 points:
A. The user can choose which databases put their caches (relation and
catalog) on shared memory, and how much memory is used
B. Caches of committed data are on the shared memory. Caches of
uncommitted data are on local memory.
C. Caches on the shared memory have xid information (xmin, xmax)
D. Evict not-recently-used caches from shared memory


[A]
Regarding point A, I can imagine some databases are connected to by lots
of clients while others aren't.
So I introduced a new parameter in postgresql.conf, "shared_meta_cache",
which is disabled by default and needs a server restart to enable.
ex. shared_meta_cache = 'db1:500MB, db2:100MB'.

Some catcaches, like the one for pg_database, are shared across the whole
cluster, so such shared catcaches are allocated in a dedicated space
within shared memory.
This space can be controlled by a "shared_meta_global_catcache"
parameter, named after the global directory.
But I want this parameter to be hidden in postgresql.conf to keep it
simple for users. It's too detailed.
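
Not part of the proposal itself, but to make the value format concrete,
here is a plain-C sketch of parsing such a setting into (dbname, size)
pairs; for simplicity it assumes the unit is always MB:

    #include <stdio.h>
    #include <string.h>

    typedef struct SharedMetaCacheSetting
    {
        char        dbname[64];
        long        size_mb;
    } SharedMetaCacheSetting;

    /*
     * Parse a value like "db1:500MB, db2:100MB" into (dbname, size)
     * pairs.  The "MB" suffix is simply ignored in this sketch.
     * Returns the number of pairs parsed.
     */
    static int
    parse_shared_meta_cache(const char *value,
                            SharedMetaCacheSetting *out, int max_out)
    {
        char        buf[1024];
        char       *tok;
        char       *save;
        int         n = 0;

        snprintf(buf, sizeof(buf), "%s", value);
        for (tok = strtok_r(buf, ",", &save);
             tok != NULL && n < max_out;
             tok = strtok_r(NULL, ",", &save))
        {
            if (sscanf(tok, " %63[^ :]:%ld", out[n].dbname,
                       &out[n].size_mb) == 2)
                n++;
        }
        return n;
    }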

[B & C]
Regarding B & C, the motivation is that we don't want other backends to
see uncommitted tables.
The search order is local memory -> shared memory -> disk.
A local process searches the cache in shared memory based on its own
snapshot and the xid of the cache entry.
When the cache is not found in shared memory, a cache entry with xmin is
made in shared memory (but not in the local one).

When a cache definition is changed by DDL, a new cache entry is created
in the local one, and thus subsequent commands refer to the local cache
if needed.
When it's committed, the local cache is cleared and the shared cache is
updated. This update is done by adding xmax to the old entry and also
making a new one with xmin. The idea behind adding a new one is that a
newly created cache entry (new table or altered table) is likely to be
used in subsequent transactions. At this point maybe we can make use of
the current invalidation mechanism, even though the invalidation message
to other backends is not sent.

[D]
As for D, I'm thinking to benchmark with simple LRU. If the performance
is bad, I'll change to another algorithm such as clock.
We don't care about eviction of the local cache because its lifetime is
within a transaction, and I don't want it to bloat.

best regards,
Takeshi Ideriha





Re: Global shared meta cache

2019-02-03 Thread Michael Paquier
On Mon, Nov 26, 2018 at 12:12:09PM +, Ideriha, Takeshi wrote:
> On this allocation stuffs I'm trying to handle it in another thread
>  [1] in a broader way.

Based on the latest updates of this thread, this is waiting for
review, so moved to next CF.
--
Michael


signature.asc
Description: PGP signature


RE: Global shared meta cache

2018-11-26 Thread Ideriha, Takeshi
Hi,

>From: Ideriha, Takeshi [mailto:ideriha.take...@jp.fujitsu.com]
>Sent: Wednesday, October 3, 2018 3:18 PM
>At this moment this patch only allocates the catalog cache header and
>CatCache data in the shared memory area.
I'm trying to handle this allocation stuff in another thread [1] in a
broader way.

>The following are major items I haven't touched:
>- how to treat cache visibility and invalidation coming from transactions
>  including DDL

On this point some of you gave me comments before, but at that time I had
less knowledge and couldn't reply to them immediately. Sorry for that.

Right now I've hit upon two plans.
Plan A is that all of the work is done in shared memory and no local
cache is used.
Plan B is that both a shared cache and a local cache are used.
Based on the discussion several months ago in this thread, plan B would
probably be better.
But there are some variations of plan B, so I'd like to hear opinions.

A. Use only shared memory
Because everything would be done inside shared memory, it needs the same
machinery as the current shared_buffers.
That is, handling transactions including DDL in a proper way needs MVCC,
and cleaning up obsolete caches needs vacuum.
Taking advantage of MVCC and vacuum would work, but it seems pretty tough
to implement.
So another option is plan B, which handles version control of caches and
cleans them up in a different way.

B. Use both shared memory and local memory
The basic policy is that the shared memory keeps the latest version of
the cache as much as possible, and each cache entry has version
information (xmin, xmax).
The local cache is a kind of cache of the shared one and its lifetime is
temporary.

[Search cache]
When a backend wants to use a relation or catalog cache in a transaction,
it tries to find it in the following order:
1. local cache
2. shared cache
3. disk

At first there is no local cache, so it searches the shared cache and, if
found, loads the entry into local memory.
If the wanted cache entry is not found in shared memory either, the
backend fetches it from disk.

[Lifetime of local cache]
When ALTER TABLE/DROP TABLE is issued in a transaction, the relevant
local cache should be different from the original one.
On this point I'm thinking of two cases.
B-1: Create a local cache at the first reference and keep it until the
transaction ends.
 The relevant local cache is updated or deleted when the DROP/ALTER is
issued. It's freed when the transaction is committed or aborted.
B-2: The lifetime of the local cache is one snapshot. If the isolation
level is read committed, the local cache is deleted every time a command
is issued.

In case B-1, the sinval message machinery is necessary to update the
local cache, the same as the current machinery.
On the other hand, case B-2 doesn't need sinval messages because the
local cache is deleted once the snapshot duration expires.
From another point of view, there is a trade-off between B-1 and B-2:
B-1 would outweigh B-2 in terms of performance, but B-2 would use less
memory.

[Invalidation of shared cache]
I'm thinking that invalidating the shared cache can be the responsibility
of a backend which wants to see the latest version, rather than the one
that has committed the DROP/ALTER command. In my sketch cache entries
have their own version information, so a transaction can compare its
snapshot with the shared cache version, and if the entry is not the
wanted one, we can obtain it from disk.

Do you have any thoughts?

[1] 
https://www.postgresql.org/message-id/flat/4E72940DA2BF16479384A86D54D0988A6F1EE452%40G01JPEXMBKW04
 
Regards,
Takeshi Ideriha



RE: Global shared meta cache

2018-10-03 Thread Ideriha, Takeshi
Hi, 

Thank you for the previous discussion a while ago.
I'm afraid I haven't replied to all of it.

To move this development forward I attached a PoC patch.

I introduced a GUC called shared_catcache_mem to specify how much memory
is supposed to be allocated in the shared memory area.
It defaults to zero, which indicates that no catalog cache is shared and
caches are allocated in each backend's MemoryContext (same as current
Postgres).

At this moment this patch only allocates the catalog cache header and
CatCache data in the shared memory area.
It doesn't do much work yet, just starting and stopping the postgres
server with shared_catcache_mem non-zero.

The shared version of CatCacheHdr is put in the postmaster-initialized
shared memory so that backends can attach to it and build SysCache[] to
store pointers to the CatCaches.
Each CatCache, CatCTup and CatCList is also allocated in the shared
memory area, whose size limit is the value of shared_catcache_mem and
which is backed by DSA.
This area is first created in the postmaster-initialized shared memory
and re-initialized as a DSA area, because the address of
postmaster-initialized shared memory does not change among different
processes, which hopefully makes it easy to handle pointers in shared
memory.
(Though I'm still struggling to grasp the idea of DSA and the underlying
DSM...)
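
For reference, that dance roughly corresponds to this use of the existing
DSA API (a rough sketch; the size, function name, and tranche choice here
are mine, not the patch's):

    #include "postgres.h"
    #include "storage/lwlock.h"
    #include "storage/shmem.h"
    #include "utils/dsa.h"

    #define SHARED_CATCACHE_AREA_SIZE (100 * 1024 * 1024)   /* made-up size */

    static dsa_area *shared_catcache_area = NULL;

    void
    SharedCatCacheShmemInit(void)
    {
        bool        found;
        void       *base = ShmemInitStruct("shared catcache area",
                                           SHARED_CATCACHE_AREA_SIZE, &found);

        if (!found)
        {
            /* First caller: turn the raw region into a DSA area in place. */
            shared_catcache_area =
                dsa_create_in_place(base, SHARED_CATCACHE_AREA_SIZE,
                                    LWTRANCHE_FIRST_USER_DEFINED, NULL);
        }
        else
        {
            /* Other backends: attach to the area created above. */
            shared_catcache_area = dsa_attach_in_place(base, NULL);
        }
    }

(The region would of course also have to be counted at shared-memory
sizing time, and a real patch would register its own LWLock tranche.)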

The following are major items I haven't touched:
- making the hash table of each CatCache shared, for which I'm going to
  take advantage of dshash
- how to evict shared cache entries (LRU mechanism)
- how to treat cache visibility and invalidation coming from transactions
  including DDL
- how to alleviate the slowness compared to current PostgreSQL
- making the relcache shared as well as the catcache

If you have any insights/reactions/suggestions, please feel free to comment.


Takeshi Ideriha
Fujitsu Limited




0001-PoC-Allocate-catcache-on-the-shared-memory.patch
Description: 0001-PoC-Allocate-catcache-on-the-shared-memory.patch


RE: Global shared meta cache

2018-07-13 Thread Ideriha, Takeshi
Hi, Konstantin

>Hi,
>I really think that we need to move to global caches (and especially
>catalog caches) in Postgres.
>Modern NUMA servers may have hundreds of cores, and to be able to utilize
>all of them we may need to start a large number (hundreds) of backends.
>Memory overhead of the local cache multiplied by 1000 can be quite
>significant.

Yeah, thank you for the comment.


>I am quite skeptical concerning the performance results you have provided.
>Once the dataset completely fits in memory (which is true in your case),
>select-only pgbench with prepared statements should be about two times
>faster than without prepared statements. And in your case performance
>with prepared statements is even worse.
>
>I wonder if you have repeated each measurement multiple times, to make
>sure that it is not just a fluctuation.
>Also, which postgresql configuration have you used? If it is the default
>postgresql.conf with 128MB shared buffers, then you are measuring the
>time of disk access, and the catalog cache is not relevant for
>performance in this case.
>
>Below are the results I got with pgbench scale 100 (with scale 10 the
>results are slightly better) at my desktop with just 16GB of RAM and 4
>cores:
>
>                                | master branch | prototype | proto/master (%)
>--------------------------------+---------------+-----------+-----------------
>pgbench -c10 -T60 -Msimple -S   | 187189        | 182123    | 97%
>pgbench -c10 -T60 -Msimple      | 15495         | 15112     | 97%
>pgbench -c10 -T60 -Mprepared -S | 98273         | 92810     | 94%
>pgbench -c10 -T60 -Mprepared    | 25796         | 25169     | 97%
>
>As you see there are no surprises here: the negative effect of the shared
>cache is largest for the case of non-prepared selects (because the
>selects themselves are much faster than updates, and during compilation
>we have to access relations multiple times).
>

As you pointed out, my shared_buffers and scaling factor were too small.
I did the benchmark again with new settings, and my results seem to
reproduce yours.

On a machine with 128GB memory and 16 cores, shared_buffers was set to
32GB and the db was initialized with -s100.

TPS results follow (mean of 10 measurements, rounded to integers):
                                       | master branch | proto  | proto/master (%)
  -------------------------------------+---------------+--------+-----------------
  pgbench -c48 -T60 -j16 -Msimple -S   | 122140        | 114103 | 93
  pgbench -c48 -T60 -j16 -Msimple      | 7858          | 7822   | 100
  pgbench -c48 -T60 -j16 -Mprepared -S | 221740        | 210778 | 95
  pgbench -c48 -T60 -j16 -Mprepared    | 9257          | 8998   | 97
As you mentioned, the SELECT-only queries have more overhead.

(By the way, I think in a later email you mentioned the results when the
number of concurrent clients is larger. I'll also try to check that case.)


Takeshi Ideriha
Fujitsu Limited




Re: Global shared meta cache

2018-07-05 Thread Amit Langote
On 2018/07/05 23:00, Robert Haas wrote:
> With respect to partitioning specifically, it seems like we might be
> able to come up with some way of planning that doesn't need a full
> relcache entry for every partition, particularly if there are no
> partition-local objects (indexes, triggers, etc.).
We won't know that there are no partition-local objects until we open them
though, right?  As you said, there might be a way to refactor things such
that just knowing that there are no partition-local objects becomes
cheaper than doing a full-fledged RelationBuildDesc.

Thanks,
Amit




RE: Global shared meta cache

2018-07-05 Thread Ideriha, Takeshi
>-Original Message-
>From: se...@rielau.com [mailto:se...@rielau.com]
>Sent: Wednesday, June 27, 2018 2:04 AM
>To: Ideriha, Takeshi/出利葉 健 ; pgsql-hackers
>
>Subject: RE: Global shared meta cache
>
>Takeshi-san,
>
>
>>My customer created hundreds of thousands of partition tables and tried
>>to select data from hundreds of applications, which resulted in enormous
>>memory consumption because the local memory usage is multiplied by the
>>number of backends (ex. 100 backends X 1GB = 100GB).
>>Relation caches are loaded into each backend's local memory.
>My team and I have been working to make caches shared for the past two
>years, but we have chosen not to share the system and rel caches.
>The reason is that these caches play a big role in transactional DDL
>processing.
>When you do DDL, your backend can see all the changes since you update
>your own cache, but no one else sees them until you commit.
>You will find that dealing with that will be the true complexity.

Hi Serge,

Thank you for sharing your experience.
I hadn't thought much about the DDL visibility problem.
Introducing version control like MVCC to the catcache may solve the
problem, but it seems too much to me.
It may be good to keep a local catcache for in-progress transactions
rather than sharing everything.
(Other hackers also pointed this out.)

>Have you tried to simply cap the size of these caches?
>That's a rather straightforward piece of work and will get you quite far.
>We run with a 20MB syscache and a 10MB relcache with 100k+ objects and
>hundreds of backends. A dumb LRU is plenty good for the purpose.
>

I haven't tried yet but read some relevant discussion:
 
https://www.postgresql.org/message-id/flat/20161219.201505.11562604.horiguchi.kyot...@lab.ntt.co.jp

I think the cap solution alleviates memory bloat in some cases, but it is
still problematic when there are very many backends.

>That being said I would love to see these caches shared. :-)
Thank you!

Regards,
Takeshi


Re: Global shared meta cache

2018-07-05 Thread Andres Freund
Hi,

On 2018-07-05 10:00:13 -0400, Robert Haas wrote:
> I think we need to take a little bit broader view of this problem.
> For instance, maybe we could have backend-local caches that are kept
> relatively small, and then a larger shared cache that can hold more
> entries.

I think it's pretty much *required* that we have that. Not just for
speed, but for correctness. It'll otherwise be very hard to deal with
transactional DDL.  I'm pretty sure that we'll have to have everything
modified by the local transaction in that cache. There are a lot of
interesting additional problems with snapshots still, but that seems like
a baseline requirement.


> Obviously, loading an entry into our backend-private cache must be a
> LOT slower than consulting one.

I'm not actually sure it makes *that* much of a difference, if the local
cache is small.


> I would guess that we'd want to try to use something like the
> st_changecount protocol to make reads very cheap and writes
> comparatively more expensive, since by and large cache invalidations
> aren't that frequent.

That strikes me as hard to get right and fatal to get wrong. I suspect
normal RW locking ought to do fine.


> One experiment I think would be interesting is to study how much
> catcache traffic we're actually generating and see if we can find any
> way to reduce it.  For instance, somebody could write code to record
> the file and line number for every catcache lookup and then run a test
> workload (or several test workloads).  That might give us some ideas
> about stuff we could just decide not to cache, especially if we also
> knew how large each cache ended up being.

We definitely do a lot of redundant lookups for the same entries.

Greetings,

Andres Freund



Re: Global shared meta cache

2018-07-05 Thread Konstantin Knizhnik




On 05.07.2018 17:00, Robert Haas wrote:

On Mon, Jul 2, 2018 at 5:59 AM, Konstantin Knizhnik
 wrote:

But I am not sure that just using RW lock will be enough replace local cache
with global.

I'm pretty sure it won't.  In fact, no matter what kind of locking you
use, it's bound to cost something.  There is no such thing as a free
lunch.  It does not seem realistic to me to suppose that we're going
to just get rid of all of our backend-private caches and replace them
with a shared cache and somehow there will be no performance
regression.  Maybe someone will come up with something that is
surprisingly awesome, but I doubt it.

I think we need to take a little bit broader view of this problem.
For instance, maybe we could have backend-local caches that are kept
relatively small, and then a larger shared cache that can hold more
entries.  There are code comments someplace that say that
CLOBBER_CACHE_ALWAYS is about 100x slower than a regular build, and
CLOBBER_CACHE_RECURSIVELY is about 10000x slower.  Obviously, loading an
entry into our backend-private cache must be a LOT slower than
consulting one.  If the shared cache is only, say, 3x slower than the
backend-private cache, then we might be able to get away with having
only the really-frequently-accessed stuff in the actual private cache
and the somewhat-frequently-accessed stuff in the shared cache.  Now
that's probably still going to cost something, but maybe we can make
that "something" very small in the cases people are actually likely to
hit.  I would guess that we'd want to try to use something like the
st_changecount protocol to make reads very cheap and writes
comparatively more expensive, since by and large cache invalidations
aren't that frequent.

Another approach would be to consider whether we're caching too much
useless junk in the first place.  For instance, maybe there's some
stuff in the relcache that takes up a lot of memory compared to how
much of a performance boost it produces, or maybe there's some stuff
that could be represented more compactly.

One experiment I think would be interesting is to study how much
catcache traffic we're actually generating and see if we can find any
way to reduce it.  For instance, somebody could write code to record
the file and line number for every catcache lookup and then run a test
workload (or several test workloads).  That might give us some ideas
about stuff we could just decide not to cache, especially if we also
knew how large each cache ended up being.

With respect to partitioning specifically, it seems like we might be
able to come up with some way of planning that doesn't need a full
relcache entry for every partition, particularly if there are no
partition-local objects (indexes, triggers, etc.).  But that seems
like a hard refactoring, and even if we did it, what about execution
time?  So much code expects to be handed a Relation.  Still, I have a
suspicion that there might be some way to do better here with enough
work.


The 94%-of-master result at my desktop seems not so significant a
degradation.
But I tried this patch on a more powerful server with 24 physical cores,
and there the negative effect of global cache synchronization was much
more dramatic:
196k TPS vs. 395k TPS for select-only pgbench with -S -c 100 -j 10.
Almost two times!


So we really need a more sophisticated and smart solution for the problem
of global caches.
A combination of a small local cache and a big global cache looks like
the best alternative.





--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Global shared meta cache

2018-07-05 Thread Robert Haas
On Mon, Jul 2, 2018 at 5:59 AM, Konstantin Knizhnik
 wrote:
> But I am not sure that just using RW lock will be enough replace local cache
> with global.

I'm pretty sure it won't.  In fact, no matter what kind of locking you
use, it's bound to cost something.  There is no such thing as a free
lunch.  It does not seem realistic to me to suppose that we're going
to just get rid of all of our backend-private caches and replace them
with a shared cache and somehow there will be no performance
regression.  Maybe someone will come up with something that is
surprisingly awesome, but I doubt it.

I think we need to take a little bit broader view of this problem.
For instance, maybe we could have backend-local caches that are kept
relatively small, and then a larger shared cache that can hold more
entries.  There are code comments someplace that say that
CLOBBER_CACHE_ALWAYS is about 100x slower than a regular build, and
CLOBBER_CACHE_RECURSIVELY is about 10000x slower.  Obviously, loading an
entry into our backend-private cache must be a LOT slower than
consulting one.  If the shared cache is only, say, 3x slower than the
backend-private cache, then we might be able to get away with having
only the really-frequently-accessed stuff in the actual private cache
and the somewhat-frequently-accessed stuff in the shared cache.  Now
that's probably still going to cost something, but maybe we can make
that "something" very small in the cases people are actually likely to
hit.  I would guess that we'd want to try to use something like the
st_changecount protocol to make reads very cheap and writes
comparatively more expensive, since by and large cache invalidations
aren't that frequent.
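
For readers unfamiliar with it, the st_changecount idea is essentially a
seqlock: the writer makes a counter odd while modifying, and readers
retry until they see a stable even value. A generic sketch (not the
actual pgstat code; the payload type is hypothetical, and writers are
assumed to already exclude one another):

    #include "postgres.h"
    #include "port/atomics.h"

    typedef struct CachedData
    {
        Oid         relid;          /* hypothetical payload */
    } CachedData;

    typedef struct VersionedEntry
    {
        pg_atomic_uint32 changecount;   /* odd while a write is in progress */
        CachedData  data;
    } VersionedEntry;

    /* Writers must already hold a lock against each other. */
    static void
    entry_write(VersionedEntry *e, const CachedData *newdata)
    {
        pg_atomic_fetch_add_u32(&e->changecount, 1);    /* now odd */
        pg_memory_barrier();
        e->data = *newdata;
        pg_memory_barrier();
        pg_atomic_fetch_add_u32(&e->changecount, 1);    /* even again */
    }

    /* Readers take no lock; they retry until they get a consistent copy. */
    static void
    entry_read(VersionedEntry *e, CachedData *out)
    {
        for (;;)
        {
            uint32      before = pg_atomic_read_u32(&e->changecount);

            pg_memory_barrier();
            *out = e->data;
            pg_memory_barrier();
            if ((before & 1) == 0 &&
                before == pg_atomic_read_u32(&e->changecount))
                break;
        }
    }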

Another approach would be to consider whether we're caching too much
useless junk in the first place.  For instance, maybe there's some
stuff in the relcache that takes up a lot of memory compared to how
much of a performance boost it produces, or maybe there's some stuff
that could be represented more compactly.

One experiment I think would be interesting is to study how much
catcache traffic we're actually generating and see if we can find any
way to reduce it.  For instance, somebody could write code to record
the file and line number for every catcache lookup and then run a test
workload (or several test workloads).  That might give us some ideas
about stuff we could just decide not to cache, especially if we also
knew how large each cache ended up being.

With respect to partitioning specifically, it seems like we might be
able to come up with some way of planning that doesn't need a full
relcache entry for every partition, particularly if there are no
partition-local objects (indexes, triggers, etc.).  But that seems
like a hard refactoring, and even if we did it, what about execution
time?  So much code expects to be handed a Relation.  Still, I have a
suspicion that there might be some way to do better here with enough
work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



RE: Global shared meta cache

2018-07-04 Thread Ideriha, Takeshi
>-Original Message-
>From: AJG [mailto:ay...@gera.co.nz]
>Sent: Wednesday, June 27, 2018 3:21 AM
>To: pgsql-hack...@postgresql.org
>Subject: Re: Global shared meta cache
>
>Ideriha, Takeshi wrote
>> 2) benchmarked 3 times for each condition and got the average TPS.
>>                                 | master branch | prototype | proto/master (%)
>>--------------------------------+---------------+-----------+-----------------
>>pgbench -c48 -T60 -Msimple -S   | 131297        | 130541    | 101%
>>pgbench -c48 -T60 -Msimple      | 4956          | 4965      | 95%
>>pgbench -c48 -T60 -Mprepared -S | 129688        | 132538    | 97%
>>pgbench -c48 -T60 -Mprepared    | 5113          | 4615      | 84%
>>
>>
>> 001_global_meta_cache.patch (6K)
>> http://www.postgresql-archive.org/attachment/6026686/0/001_global_meta_cache.patch
>
>
>Hello,
>Apologies for the question. I thought I would just double check the
>percentages that have been presented.
>Is the percentage calculation correct?
>#1 and #3 look inverted to me (they say lower when they should be higher
>and vice versa), and #2 and #4 look incorrect generally (the percentages
>look much larger than they should be based on the numbers).
>
>I.e. for Msimple -S the prototype had slightly worse tps performance
>(130541) versus master (131297). I would expect the percentage to be e.g.
>99%, not 101%.
>
>But I may be misunderstanding something :)
>
>Also, Msimple is 4956 master versus 4965 prototype. Just a 9 tps change,
>a very slight improvement in tps, but the percentage provided is 95%. I
>would expect it to be just over 100%?
>Again, maybe I'm not understanding, and hoping it is just my error :)
>
>
>
>--
>Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
>
Hi, 
Thank you for the comments, and sorry for the late reply.
Thanks to you, I noticed I made a mistake.
As you pointed out, my calculation is wrong.

I also need to change some settings in postgresql.conf and pgbench,
so I'm going to measure performance again and submit the results.

Regards,
Takeshi Ideriha




Re: Global shared meta cache

2018-07-02 Thread Konstantin Knizhnik




On 26.06.2018 09:48, Ideriha, Takeshi wrote:

Hi, hackers!

My customer created hundreds of thousands of partition tables and tried
to select data from hundreds of applications, which resulted in enormous
memory consumption because the local memory usage is multiplied by the
number of backends (ex. 100 backends X 1GB = 100GB).
Relation caches are loaded into each backend's local memory.

To address this issue I'm trying to move meta caches like catcache or relcache 
into shared memory.

This topic seems to have been discussed several times.
For instance this thread:
https://www.postgresql.org/message-id/CA%2BTgmobjDw_SWsxyJwT9z-YOwWv0ietuQx5fb%3DWEYdDfvCbzGQ%40mail.gmail.com

In my understanding, it discussed moving the catcache and relcache to
shared memory rather than the current backend-local memory, and was most
concerned with performance overhead.

Robert Haas wrote:

I think it would be interesting for somebody to build a prototype here
that ignores all the problems but the first and uses some
straightforward, relatively unoptimized locking strategy for the first
problem. Then benchmark it. If the results show that the idea has
legs, then we can try to figure out what a real implementation would
look like.
(One possible approach: use Thomas Munro's DHT stuff to build the shared cache.)

I'm inspired by this comment and am now developing a prototype (please
see attached), but I haven't yet put the cache structure on shared
memory.
Instead, I put dummy data on shared memory which is initialized at
startup, and then acquire/release a lock just before/after
searching/creating a catcache entry.

I haven't considered the relcache or catcache lists either.
It is difficult for me to do everything at once in the right direction,
so I'm trying to make a small prototype and see whether I'm walking on
the proper path.

I tested pgbench to compare master branch with my patch.

0) Environment
- RHEL 7.4
- 16 cores
- 128 GB memory

1) Initialized with pgbench -i -s10

2) benchmarked 3 times for each conditions and got the average result of TPS.
                                | master branch | prototype | proto/master (%)
--------------------------------+---------------+-----------+-----------------
pgbench -c48 -T60 -Msimple -S   | 131297        | 130541    | 101%
pgbench -c48 -T60 -Msimple      | 4956          | 4965      | 95%
pgbench -c48 -T60 -Mprepared -S | 129688        | 132538    | 97%
pgbench -c48 -T60 -Mprepared    | 5113          | 4615      | 84%

This result seems to show that, except for the prepared protocol with
"not only SELECT", it didn't make much difference.



What do you think about it?
Before I dig deeper, I want to hear your thoughts.

Best regards,
Takeshi Ideriha



Hi,
I really think that we need to move to global caches (and especially
catalog caches) in Postgres.
Modern NUMA servers may have hundreds of cores, and to be able to utilize
all of them we may need to start a large number (hundreds) of backends.

Memory overhead of the local cache multiplied by 1000 can be quite
significant.

But I am not sure that just using an RW lock will be enough to replace
the local cache with a global one.

I am quite skeptical concerning the performance results you have provided.
Once the dataset completely fits in memory (which is true in your case),
select-only pgbench with prepared statements should be about two times
faster than without prepared statements. And in your case performance
with prepared statements is even worse.


I wonder if you have repeated each measurement multiple times, to make
sure that it is not just a fluctuation.
Also, which postgresql configuration have you used? If it is the default
postgresql.conf with 128MB shared buffers, then you are measuring the
time of disk access, and the catalog cache is not relevant for
performance in this case.


Below are the results I got with pgbench scale 100 (with scale 10 the
results are slightly better) at my desktop with just 16GB of RAM and 4
cores:


                                   | master branch | prototype | proto/master (%)
   --------------------------------+---------------+-----------+-----------------
   pgbench -c10 -T60 -Msimple -S   | 187189        | 182123    | 97%
   pgbench -c10 -T60 -Msimple      | 15495         | 15112     | 97%
   pgbench -c10 -T60 -Mprepared -S | 98273         | 92810     | 94%
   pgbench -c10 -T60 -Mprepared    | 25796         | 25169     | 97%

As you see there are no surprises here: the negative effect of the shared
cache is largest for the case of non-prepared selects (because the
selects themselves are much faster than updates, and during compilation
we have to access relations multiple times).











--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




RE: Global shared meta cache

2018-06-27 Thread Tsunakawa, Takayuki
From: Ideriha, Takeshi [mailto:ideriha.take...@jp.fujitsu.com]
> 1) Initialized with pgbench -i -s10
...
>pgbench -c48 -T60 -Msimple      | 4956 | 4965 | 95%

The scaling factor should be much greater than the number of clients.
Otherwise, multiple clients would conflict on the same row of the
branches table, which might hide the additional overhead of the global
metacache.

And I think -j should be 12 or so on your 16-core server, so that the pgbench 
client can maximize its concurrency.


Regards
Takayuki Tsunakawa







Re: Global shared meta cache

2018-06-27 Thread AJG
Ideriha, Takeshi wrote
> 2) benchmarked 3 times for each condition and got the average TPS.
>                                 | master branch | prototype | proto/master (%)
>--------------------------------+---------------+-----------+-----------------
>pgbench -c48 -T60 -Msimple -S   | 131297        | 130541    | 101%
>pgbench -c48 -T60 -Msimple      | 4956          | 4965      | 95%
>pgbench -c48 -T60 -Mprepared -S | 129688        | 132538    | 97%
>pgbench -c48 -T60 -Mprepared    | 5113          | 4615      | 84%
> 
> 
> 001_global_meta_cache.patch (6K)
> http://www.postgresql-archive.org/attachment/6026686/0/001_global_meta_cache.patch


Hello,
Apologies for the question. I thought I would just double check the
percentages that have been presented.
Is the percentage calculation correct?
#1 and #3 look inverted to me (they say lower when they should be higher
and vice versa), and
#2 and #4 look incorrect generally (the percentages look much larger than
they should be based on the numbers).

I.e. for Msimple -S the prototype had slightly worse tps performance
(130541) versus master (131297). I would expect the percentage to be e.g.
99%, not 101%.

But I may be misunderstanding something :)

Also, Msimple is 4956 master versus 4965 prototype. Just a 9 tps change,
a very slight improvement in tps, but the percentage provided is 95%. I
would expect it to be just over 100%?
Again, maybe I'm not understanding, and hoping it is just my error :)



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: Global shared meta cache

2018-06-26 Thread Andres Freund
Hi,

On 2018-06-26 06:48:28 +, Ideriha, Takeshi wrote:
> > I think it would be interesting for somebody to build a prototype here
> > that ignores all the problems but the first and uses some
> > straightforward, relatively unoptimized locking strategy for the first
> > problem. Then benchmark it. If the results show that the idea has
> > legs, then we can try to figure out what a real implementation would
> > look like.
> > (One possible approach: use Thomas Munro's DHT stuff to build the shared 
> > cache.)
> 
> I'm inspired by this comment and am now developing a prototype (please
> see attached), but I haven't yet put the cache structure on shared memory.
> 
> Instead, I put dummy data on shared memory which is initialized at
> startup, and then acquire/release a lock just before/after
> searching/creating a catcache entry.
> 
> I haven't considered the relcache or catcache lists either.
> It is difficult for me to do everything at once in the right direction,
> so I'm trying to make a small prototype and see whether I'm walking on
> the proper path.
> 
> I tested pgbench to compare master branch with my patch. 
> 
> 0) Environment
>- RHEL 7.4
>- 16 cores
>- 128 GB memory
> 
> 1) Initialized with pgbench -i -s10
> 
> 2) benchmarked 3 times for each condition and got the average TPS.
>                                 | master branch | prototype | proto/master (%)
>--------------------------------+---------------+-----------+-----------------
>pgbench -c48 -T60 -Msimple -S   | 131297        | 130541    | 101%
>pgbench -c48 -T60 -Msimple      | 4956          | 4965      | 95%
>pgbench -c48 -T60 -Mprepared -S | 129688        | 132538    | 97%
>pgbench -c48 -T60 -Mprepared    | 5113          | 4615      | 84%
>
> 
>   This result seems to show that, except for the prepared protocol with
> "not only SELECT", it didn't make much difference.

This seems like a pretty large regression to me. And that's an
extremely simplistic case, with tiny caches, and barely any changes to
the cache contents.

Greetings,

Andres Freund



RE: Global shared meta cache

2018-06-26 Thread serge
Takeshi-san,


>My customer created hundreds of thousands of partition tables and tried
>to select data from hundreds of applications, which resulted in enormous
>memory consumption because the local memory usage is multiplied by the
>number of backends (ex. 100 backends X 1GB = 100GB).
>Relation caches are loaded into each backend's local memory.
My team and I have been working to make caches shared for the past two
years, but we have chosen not to share the system and rel caches.
The reason is that these caches play a big role in transactional DDL
processing.
When you do DDL, your backend can see all the changes since you update
your own cache, but no one else sees them until you commit.
You will find that dealing with that will be the true complexity.
 
 
Have you tried to simply cap the size of these caches?
That's a rather straightforward piece of work and will get you quite far.
We run with a 20MB syscache and a 10MB relcache with 100k+ objects and
hundreds of backends.
A dumb LRU is plenty good for the purpose.
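
Such a dumb LRU cap could be as simple as the following sketch (my
illustration using PostgreSQL's dlist; the entry type is hypothetical and
hash-table bookkeeping is omitted):

    #include "postgres.h"
    #include "lib/ilist.h"

    typedef struct CappedCacheEntry
    {
        dlist_node  lru_node;       /* position in the recency list */
        Size        size;           /* memory accounted to this entry */
    } CappedCacheEntry;

    static dlist_head lru_list = DLIST_STATIC_INIT(lru_list);
    static Size cache_used = 0;
    static Size cache_limit = 20 * 1024 * 1024;     /* e.g. a 20MB syscache */

    /* Call on every cache hit: move the entry to the front of the list. */
    static void
    cache_touch(CappedCacheEntry *e)
    {
        dlist_delete(&e->lru_node);
        dlist_push_head(&lru_list, &e->lru_node);
    }

    /* Call after inserting a new entry: evict from the tail until under cap. */
    static void
    cache_enforce_cap(void)
    {
        while (cache_used > cache_limit && !dlist_is_empty(&lru_list))
        {
            CappedCacheEntry *victim =
                dlist_container(CappedCacheEntry, lru_node,
                                dlist_tail_node(&lru_list));

            dlist_delete(&victim->lru_node);
            cache_used -= victim->size;
            pfree(victim);      /* plus removal from the hash table */
        }
    }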
 
That being said I would love to see these caches shared. :-)
 
Cheers
Serge
Salesforce