Hi all, please ignore this mail; it is incomplete and I have to add more
information. Sorry, I accidentally pressed the send button while replying.
On Tue, Aug 29, 2017 at 11:27 AM, Mithun Cy <mithun...@enterprisedb.com> wrote:
> Cache the SnapshotData for reuse:
> ===========================
> In one of our perf analyses using the perf tool, GetSnapshotData showed
> a very high percentage of CPU cycles on a read-only workload when there
> is a very high number of concurrent connections (>= 64).
>
> Machine: cthulhu, an 8-node machine.
> ----------------------------------------------
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                128
> On-line CPU(s) list:   0-127
> Thread(s) per core:    2
> Core(s) per socket:    8
> Socket(s):             8
> NUMA node(s):          8
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 47
> Model name:            Intel(R) Xeon(R) CPU E7-8830 @ 2.13GHz
> Stepping:              2
> CPU MHz:               2128.000
> BogoMIPS:              4266.63
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              24576K
> NUMA node0 CPU(s):     0,65-71,96-103
> NUMA node1 CPU(s):     72-79,104-111
> NUMA node2 CPU(s):     80-87,112-119
> NUMA node3 CPU(s):     88-95,120-127
> NUMA node4 CPU(s):     1-8,33-40
> NUMA node5 CPU(s):     9-16,41-48
> NUMA node6 CPU(s):     17-24,49-56
> NUMA node7 CPU(s):     25-32,57-64
>
> Perf CPU cycles, 128 clients.
>
> On further analysis, this appears to be mainly due to cache misses
> while iterating through the large proc array to compute the
> SnapshotData. Also, on a read-only (read-heavy) workload the
> SnapshotData mostly remains the same, as no (or rarely a) transaction
> commits or rolls back. Taking advantage of this, we can save the
> previously computed SnapshotData in shared memory, and in
> GetSnapshotData we can use the saved SnapshotData instead of computing
> it again while it is still valid. A saved SnapshotData remains valid
> until a transaction ends.
>
> [Perf analysis Base]
>
> Samples: 421K of event 'cache-misses', Event count (approx.): 160069274
> Overhead  Command   Shared Object  Symbol
>   18.63%  postgres  postgres       [.] GetSnapshotData
>   11.98%  postgres  postgres       [.] _bt_compare
>   10.20%  postgres  postgres       [.] PinBuffer
>    8.63%  postgres  postgres       [.] LWLockAttemptLock
>    6.50%  postgres  postgres       [.] LWLockRelease
>
> [Perf analysis with Patch]
>
> Server configuration:
> ./postgres -c shared_buffers=8GB -N 300 -c min_wal_size=15GB -c
> max_wal_size=20GB -c checkpoint_timeout=900 -c
> maintenance_work_mem=1GB -c checkpoint_completion_target=0.9 -c
> wal_buffers=256MB &
>
> pgbench configuration:
> scale_factor = 300
> ./pgbench -c $threads -j $threads -T $time_for_reading -M prepared -S postgres
>
> The machine has 64 cores; with this patch I can see the server start to
> improve beyond 64 clients. I have tested up to 256 clients, which shows
> a performance improvement of nearly 39% at best [1]. This is the best
> case for the patch, where the once-computed SnapshotData is reused again
> and again.
>
> The worst case seems to be small, quick write transactions, which
> frequently invalidate the cached SnapshotData before it is reused by
> any next GetSnapshotData call. As of now, I have tested the simple
> update test (pgbench -M prepared -N) on the same machine with the above
> server configuration. I do not see much change in the TPS numbers.
>
> All TPS are the median of 3 runs.
> Clients   TPS-With Patch 05   TPS-Base       %Diff
> 1         752.461117          755.186777     -0.3%
> 64        32171.296537        31202.153576   +3.1%
> 128       41059.660769        40061.929658   +2.49%
>
> I will do some profiling and find out why this case is not costing us
> any performance due to caching overhead.
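>
> To make the intent concrete, the fast path in GetSnapshotData is meant
> to work roughly like this (a simplified sketch only; the helper names
> here are illustrative, not the actual patch functions):
>
>     /* Under shared ProcArrayLock, inside GetSnapshotData(). */
>     if (ProcGlobal->cached_snapshot_valid)
>     {
>         /* Fast path: copy the shared cached snapshot, no procArray scan. */
>         GetSnapshotDataFromCache(snapshot);     /* illustrative name */
>     }
>     else
>     {
>         /* Slow path: compute the snapshot from the procArray as before. */
>         GetSnapshotDataFromProcArray(snapshot); /* illustrative name */
>
>         /* Then try to publish the result for other backends to reuse. */
>         CacheSnapshotData(snapshot);            /* illustrative name */
>     }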
>
> []
>
> On Thu, Aug 3, 2017 at 2:16 AM, Andres Freund <and...@anarazel.de> wrote:
>> Hi,
>>
>> I think this patch should have a "cover letter" explaining what it's
>> trying to achieve, how it does so and why it's safe/correct. I think
>> it'd also be good to try to show some of the worst cases of this patch
>> (i.e. where the caching only adds overhead, but never gets used), and
>> some of the best cases (where the cache gets used quite often, and
>> reduces contention significantly).
>>
>> I wonder if there's a way to avoid copying the snapshot into the cache
>> in situations where we're barely ever going to need it. But right now
>> the only way I can think of is to look at the length of the
>> ProcArrayLock wait queue and count the exclusive locks therein - which'd
>> be expensive and intrusive...
>>
>> On 2017-08-02 15:56:21 +0530, Mithun Cy wrote:
>>> diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
>>> index a7e8cf2..4d7debe 100644
>>> --- a/src/backend/storage/ipc/procarray.c
>>> +++ b/src/backend/storage/ipc/procarray.c
>>> @@ -366,11 +366,13 @@ ProcArrayRemove(PGPROC *proc, TransactionId latestXid)
>>>                     (arrayP->numProcs - index - 1) * sizeof(int));
>>>             arrayP->pgprocnos[arrayP->numProcs - 1] = -1;  /* for debugging */
>>>             arrayP->numProcs--;
>>> +           ProcGlobal->is_snapshot_valid = false;
>>>             LWLockRelease(ProcArrayLock);
>>>             return;
>>>         }
>>>     }
>>>
>>> +   ProcGlobal->is_snapshot_valid = false;
>>>     /* Oops */
>>>     LWLockRelease(ProcArrayLock);
>>
>> Why do we need to do anything here? And if we need to, why not in
>> ProcArrayAdd?
>
> -- In ProcArrayRemove I can see that ShmemVariableCache->latestCompletedXid
> is set to latestXid, which is going to change the xmax in GetSnapshotData;
> hence I invalidated the snapshot there. ProcArrayAdd does not seem to
> affect the SnapshotData. I have now modified the patch to set
> cached_snapshot_valid to false just below the statement
> ShmemVariableCache->latestCompletedXid = latestXid only:
>
> +   if (TransactionIdIsValid(latestXid))
> +   {
> +       Assert(TransactionIdIsValid(allPgXact[proc->pgprocno].xid));
> +       /* Advance global latestCompletedXid while holding the lock */
> +       if (TransactionIdPrecedes(ShmemVariableCache->latestCompletedXid,
> +                                 latestXid))
> +           ShmemVariableCache->latestCompletedXid = latestXid;
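>
> With the invalidation added just below that assignment, the tail of the
> block looks roughly like this (a sketch of the revised hunk, not the
> exact patch text):
>
>         if (TransactionIdPrecedes(ShmemVariableCache->latestCompletedXid,
>                                   latestXid))
>             ShmemVariableCache->latestCompletedXid = latestXid;
>
>         /*
>          * latestCompletedXid determines every snapshot's xmax, so any
>          * previously cached snapshot is stale from this point on.
>          */
>         ProcGlobal->cached_snapshot_valid = false;
>     }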
> Before that, let me see.
>
>>> @@ -1552,6 +1557,8 @@ GetSnapshotData(Snapshot snapshot)
>>>                  errmsg("out of memory")));
>>>     }
>>>
>>> +   snapshot->takenDuringRecovery = RecoveryInProgress();
>>> +
>>>     /*
>>>      * It is sufficient to get shared lock on ProcArrayLock, even if we are
>>>      * going to set MyPgXact->xmin.
>>> @@ -1566,100 +1573,200 @@ GetSnapshotData(Snapshot snapshot)
>>>     /* initialize xmin calculation with xmax */
>>>     globalxmin = xmin = xmax;
>>>
>>> -   snapshot->takenDuringRecovery = RecoveryInProgress();
>>> -
>>
>> This movement of code seems fairly unrelated?
>
> -- Yes, not related. It was part of an early POC patch, so I had
> retained it as it was. I have now reverted that change and moved the
> assignment back under the ProcArrayLock.
>
>>>     if (!snapshot->takenDuringRecovery)
>>>     {
>>
>> Do we really need to restrict this to !recovery snapshots? It'd be nicer
>> if we could generalize it - I don't immediately see anything !recovery
>> specific in the logic here.
>
> -- Agreed. I will add one more patch on top of this to include the
> standby (recovery) case as well, to unify all the cases where we can
> cache the snapshot.
>
>>> -   int        *pgprocnos = arrayP->pgprocnos;
>>> -   int         numProcs;
>>> +   if (ProcGlobal->is_snapshot_valid)
>>> +   {
>>> +       Snapshot    csnap = &ProcGlobal->cached_snapshot;
>>> +       TransactionId *saved_xip;
>>> +       TransactionId *saved_subxip;
>>>
>>> -   /*
>>> -    * Spin over procArray checking xid, xmin, and subxids.  The goal is
>>> -    * to gather all active xids, find the lowest xmin, and try to record
>>> -    * subxids.
>>> -    */
>>> -   numProcs = arrayP->numProcs;
>>> -   for (index = 0; index < numProcs; index++)
>>> +       saved_xip = snapshot->xip;
>>> +       saved_subxip = snapshot->subxip;
>>> +
>>> +       memcpy(snapshot, csnap, sizeof(SnapshotData));
>>> +
>>> +       snapshot->xip = saved_xip;
>>> +       snapshot->subxip = saved_subxip;
>>> +
>>> +       memcpy(snapshot->xip, csnap->xip,
>>> +              sizeof(TransactionId) * csnap->xcnt);
>>> +       memcpy(snapshot->subxip, csnap->subxip,
>>> +              sizeof(TransactionId) * csnap->subxcnt);
>>> +       globalxmin = ProcGlobal->cached_snapshot_globalxmin;
>>> +       xmin = csnap->xmin;
>>> +
>>> +       count = csnap->xcnt;
>>> +       subcount = csnap->subxcnt;
>>> +       suboverflowed = csnap->suboverflowed;
>>> +
>>> +       Assert(TransactionIdIsValid(globalxmin));
>>> +       Assert(TransactionIdIsValid(xmin));
>>> +   }
>>
>> Can we move this to a separate function? In fact, could you check how
>> much effort it'd be to split the cached, !recovery, recovery cases into
>> three static inline helper functions? This is starting to be hard to
>> read, and the indentation added in this patch doesn't help... Consider
>> doing the recovery, !recovery cases in a preliminary separate patch.
>
> -- In this patch, I have moved the code into 2 functions: one to cache
> the snapshot data and another to get the snapshot data from the cache.
> In the next patch, I will try to unify these with the recovery
> (standby) case. For the recovery case, we are copying the xids from a
> simple xid array, KnownAssignedXids[]; I am not completely sure whether
> caching them brings similar performance benefits as above, so I will
> also try to get a perf report for that.
>
>>> +        * Let only one of the callers cache the computed snapshot, and
>>> +        * others can continue as before.
>>>          */
>>> -       if (!suboverflowed)
>>> +       if (!ProcGlobal->is_snapshot_valid &&
>>> +           LWLockConditionalAcquire(&ProcGlobal->CachedSnapshotLock,
>>> +                                    LW_EXCLUSIVE))
>>>         {
>>
>> I'd also move this to a helper function (the bit inside the lock).
>
> -- Fixed as suggested.
>
>>> +
>>> +           /*
>>> +            * The memory barrier has to be placed here to ensure that
>>> +            * is_snapshot_valid is set only after the snapshot is cached.
>>> +            */
>>> +           pg_write_barrier();
>>> +           ProcGlobal->is_snapshot_valid = true;
>>> +           LWLockRelease(&ProcGlobal->CachedSnapshotLock);
>>
>> LWLockRelease() is a full memory barrier, so this shouldn't be required.
>
> -- Okay; sorry, just to check my understanding: do you want me to set
> ProcGlobal->is_snapshot_valid after
> LWLockRelease(&ProcGlobal->CachedSnapshotLock)?
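> Something like the following, I assume (a sketch only; the helper name
> is illustrative, not the actual patch function):
>
>     /* Fill ProcGlobal->cached_snapshot while holding the try-lock. */
>     CacheSnapshotData(snapshot);            /* illustrative helper */
>
>     LWLockRelease(&ProcGlobal->CachedSnapshotLock);
>
>     /*
>      * The release acts as a full memory barrier, so once the flag below
>      * is seen as true, the cached snapshot contents written above are
>      * visible too, and no explicit pg_write_barrier() would be needed.
>      */
>     ProcGlobal->is_snapshot_valid = true;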
>
>>
>>> /*
>>> diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
>>> index 7dbaa81..7d06904 100644
>>> --- a/src/include/storage/proc.h
>>> +++ b/src/include/storage/proc.h
>>> @@ -16,6 +16,7 @@
>>>
>>>  #include "access/xlogdefs.h"
>>>  #include "lib/ilist.h"
>>> +#include "utils/snapshot.h"
>>>  #include "storage/latch.h"
>>>  #include "storage/lock.h"
>>>  #include "storage/pg_sema.h"
>>> @@ -253,6 +254,22 @@ typedef struct PROC_HDR
>>>     int         startupProcPid;
>>>     /* Buffer id of the buffer that Startup process waits for pin on, or -1 */
>>>     int         startupBufferPinWaitBufId;
>>> +
>>> +   /*
>>> +    * In GetSnapshotData we can reuse the previously computed snapshot if
>>> +    * no new transaction has committed or rolled back, thus saving a good
>>> +    * amount of computation cycles in GetSnapshotData, where we otherwise
>>> +    * need to iterate through the procArray every time we try to get the
>>> +    * current snapshot. The members below help us save the previously
>>> +    * computed snapshot in global shared memory, and any process which
>>> +    * wants the current snapshot can directly copy from them while they
>>> +    * are still valid.
>>> +    */
>>> +   SnapshotData cached_snapshot;   /* Previously saved snapshot */
>>> +   volatile bool is_snapshot_valid;    /* is above snapshot valid */
>>> +   TransactionId cached_snapshot_globalxmin;   /* globalxmin when above
>>
>> I'd rename is_snapshot_valid to 'cached_snapshot_valid' or such, it's
>> not clear from the name what it refers to.
>
> -- Renamed to cached_snapshot_valid.
>
>>> +   LWLock      CachedSnapshotLock; /* A try lock to make sure only one
>>> +                                    * process writes to cached_snapshot */
>>>  } PROC_HDR;
>>>
>>
>> Hm, I'd not add the lock to the struct, we normally don't do that for
>> the "in core" lwlocks that don't exist in a "configurable" number like
>> the buffer locks and such.
>
> -- I am not really sure about this. Can you please help me understand
> what exactly I should be doing here to get this corrected? Is it that I
> have to add it to lwlocknames.txt as a new LWLock?
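> If lwlocknames.txt is the way to go, I believe it would be a one-line
> entry there, something like the sketch below (the number is only an
> assumption; it would have to be the next free slot on current HEAD):
>
>     # src/backend/storage/lmgr/lwlocknames.txt
>     CachedSnapshotLock                  44
>
> The generated define should then let us drop the struct member and
> write:
>
>     if (LWLockConditionalAcquire(CachedSnapshotLock, LW_EXCLUSIVE))
>     {
>         /* cache the computed snapshot for reuse */
>         LWLockRelease(CachedSnapshotLock);
>     }
>
> --
> Thanks and Regards
> Mithun C Y
> EnterpriseDB: http://www.enterprisedb.com

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com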