On Mon, Jun 22, 2015 at 1:41 PM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
>
> <sorry, resent stalled post, wrong from>
>
>> It'd be interesting to see numbers for tiny, without the overly small
>> checkpoint timeout value. 30s is below the OS's writeback time.
>
> Here are some tests with longer timeout:
>
> tiny2: scale=10 shared_buffers=1GB checkpoint_timeout=5min
>        max_wal_size=1GB warmup=600 time=4000
>
> flsh |   full speed tps    |  percent of late tx, 4 clients, for tps:
> /srt | 1 client | 4 clients |  100 |  200 |  400 |  800 | 1200 | 1600
>  N/N | 930 +- 124 | 2560 +- 394 | 0.70 | 1.03 | 1.27 | 1.56 | 2.02 | 2.38
>  N/Y | 924 +- 122 | 2612 +- 326 | 0.63 | 0.79 | 0.94 | 1.15 | 1.45 | 1.67
>  Y/N | 907 +- 112 | 2590 +- 315 | 0.58 | 0.83 | 0.68 | 0.71 | 0.81 | 1.26
>  Y/Y | 915 +- 114 | 2590 +- 317 | 0.60 | 0.68 | 0.70 | 0.78 | 0.88 | 1.13
>
> There seems to be a small 1-2% performance benefit with 4 clients, which is
> reversed for 1 client. There are significantly and consistently fewer late
> transactions when the options are activated, and the performance is more
> stable (standard deviation reduced by 10-18%).
>
> The db is about 200 MB ~ 25000 pages; at 2500+ tps it is written over 40
> times in 5 minutes, so the checkpoint basically writes everything in 220
> seconds, i.e. about 0.9 MB/s. Given the preload phase, the buffers may be
> more or less in order in memory, so they may be written out in order anyway.
>
> medium2: scale=300 shared_buffers=5GB checkpoint_timeout=30min
>         max_wal_size=4GB warmup=1200 time=7500
>
> flsh |    full speed tps      | percent of late tx, 4 clients
> /srt |  1 client  | 4 clients |  100  |  200  |  400  |
>  N/N | 173 +- 289* | 198 +- 531* | 27.61 | 43.92 | 61.16 |
>  N/Y | 458 +- 327* | 743 +- 920* |  7.05 | 14.24 | 24.07 |
>  Y/N | 169 +- 166* | 187 +- 302* |  4.01 | 39.84 | 65.70 |
>  Y/Y | 546 +- 143  | 681 +- 459  |  1.55 |  3.51 |  2.84 |
>
> The effect of sorting is very positive (+150% to +270% tps).
> On this run, flushing has a positive (+20% with 1 client) or negative (-8%
> with 4 clients) effect on throughput, and late transactions are reduced by
> 92-95% when both options are activated.
>
Why is there a dip in performance with multiple clients? Could it be because
we now do more work while holding the buffer header lock in the code below?

BufferSync()
{
..
	for (buf_id = 0; buf_id < NBuffers; buf_id++)
	{
		volatile BufferDesc *bufHdr = GetBufferDescriptor(buf_id);
@@ -1621,32 +1719,185 @@ BufferSync(int flags)
		if ((bufHdr->flags & mask) == mask)
		{
+			Oid spc;
+			TableSpaceCountEntry *entry;
+			bool found;
+
			bufHdr->flags |= BM_CHECKPOINT_NEEDED;
+			CheckpointBufferIds[num_to_write] = buf_id;
			num_to_write++;
+
+			/* keep track of per tablespace buffers */
+			spc = bufHdr->tag.rnode.spcNode;
+			entry = (TableSpaceCountEntry *)
+				hash_search(spcBuffers, (void *) &spc, HASH_ENTER, &found);
+
+			if (found)
+				entry->count++;
+			else
+				entry->count = 1;
		}
..
}

BufferSync()
{
..
-	buf_id = StrategySyncStart(NULL, NULL);
-	num_to_scan = NBuffers;
+	active_spaces = nb_spaces;
+	space = 0;
	num_written = 0;
-	while (num_to_scan-- > 0)
+
+	while (active_spaces != 0)
..
}

The changed code doesn't seem to give any consideration to the clock-sweep
point, which might not be helpful in cases where the checkpoint could have
flushed soon-to-be-recycled buffers. I think flushing the sorted buffers
w.r.t. tablespaces is a good idea, but giving no preference to the
clock-sweep point means we could lose in some cases with this new change.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com