On 01/03/2017 11:56 PM, Tomas Vondra wrote:
Hi,

...
I'll push results for larger ones once those tests complete (possibly
tomorrow).


I just pushed additional results (from the additional scales) to the git repositories. On the larger (16/32-cores) machine with 2x e5-2620, the results look like this

   scale         minimal       replica         logical
  -----------------------------------------------------
   100             23968         24393           24393
   1000            23412         23656           23794
   10000            5283          5320            5197

and on the smaller one (i5-2500k with 4 cores) I got this:

   scale         minimal       replica         logical
  -----------------------------------------------------
   50               5884          5896            5873
   400              5324          5342            5478
   1000             5341          5439            5425

The scales were chosen so that the smallest one fits into shared buffers, the medium exceeds shared buffers but still fits into RAM, and the largest scale exceeds RAM.

The results seem to confirm that for this workload (regular pgbench), there's very little difference between the different WAL levels. Actually, the 'replica' seems a tad faster than 'minimal', but the difference may be easily due to noise.

I've also looked at the amount of WAL actually produced, by doing pgbench runs throttled to the same throughput, and counting the number of archived WAL segments & running pg_xlogdump. Interestingly enough, those two metrics differ quite a bit - for example for scale 1000 (on the 32-core machine), the 2h runs produced these number of WAL segments:

   minimal: 5515 (88.2GB)
   replica: 5587 (89.4GB)
   logical: 6058 (96.9GB)

so 'replica' adds ~1.3% and 'logical' ~9.8%. But per pg_xlogdump, the WAL amounts are only 73.3GB, 73.9GB and 74.4GB - a difference of only ~1.5% between minimal and logical. The values are also much lower than raw WAL size, so I assume it's because pg_xlogdump ignores some extra overhead, present in the segments. Moreover, the sequential nature of WAL writes means even the +10% is not a big deal (unless it results in saturating the bandwidth, but running on >90% is a bad idea anyway).

My conclusion from these results is that using 'wal_level=replica' by default seems fine. Perhaps even wal_level=logical would be OK, but that's probably a too big step for 10.0.

Any ideas how to construct a plausible workload where the differences are significantly larger? Running the tests on non-SSD storage might also be useful.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to