On 01/05/2017 02:23 PM, Magnus Hagander wrote:


On Thu, Jan 5, 2017 at 12:44 AM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:

    On 01/03/2017 11:56 PM, Tomas Vondra wrote:

        Hi,

    ...

        I'll push results for larger ones once those tests complete
        (possibly
        tomorrow).


    I just pushed additional results (from the additional scales) to the
    git repositories. On the larger (16/32-core) machine with 2x
    E5-2620, the throughput (in tps) looks like this

       scale         minimal       replica         logical
      -----------------------------------------------------
       100             23968         24393           24393
       1000            23412         23656           23794
       10000            5283          5320            5197

    and on the smaller one (i5-2500k with 4 cores) I got this:

       scale         minimal       replica         logical
      -----------------------------------------------------
       50               5884          5896            5873
       400              5324          5342            5478
       1000             5341          5439            5425

    The scales were chosen so that the smallest one fits into shared
    buffers, the medium exceeds shared buffers but still fits into RAM,
    and the largest scale exceeds RAM.
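For context on how the scales map to the memory tiers above, a rough rule of thumb (my assumption, not stated in the thread) is that a pgbench database occupies about 15 MB per scale unit, dominated by pgbench_accounts (100,000 rows per unit):

```shell
# Rough size estimate: ~15 MB per pgbench scale unit (rule of thumb,
# not an exact figure; actual size varies with fillfactor and bloat).
for scale in 100 1000 10000; do
  gb=$(awk -v s="$scale" 'BEGIN { printf "%.1f", s * 15 / 1000 }')
  echo "scale $scale ~= ${gb} GB"
done
```

Under that estimate, scale 100 is ~1.5 GB (fits in shared buffers), scale 1000 is ~15 GB (fits in RAM), and scale 10000 is ~150 GB (exceeds RAM on a typical test box).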

    The results seem to confirm that for this workload (regular
    pgbench), there's very little difference between the different WAL
    levels. Actually, 'replica' seems a tad faster than 'minimal', but
    the difference may easily be due to noise.

    I've also looked at the amount of WAL actually produced, by doing
    pgbench runs throttled to the same throughput, and counting the
    number of archived WAL segments & running pg_xlogdump. Interestingly
    enough, those two metrics differ quite a bit - for example for scale
    1000 (on the 32-core machine), the 2h runs produced these number of
    WAL segments:

       minimal: 5515 (88.2GB)
       replica: 5587 (89.4GB)
       logical: 6058 (96.9GB)

    so 'replica' adds ~1.3% and 'logical' ~9.8%. But per pg_xlogdump,
    the WAL amounts are only 73.3GB, 73.9GB and 74.4GB - a difference of
    only ~1.5% between minimal and logical. Those values are also much
    lower than the raw WAL size, so I assume pg_xlogdump ignores some
    per-segment overhead present in the archived files. Moreover, the
    sequential nature of WAL writes means even the +10% is not a big
    deal (unless it saturates the bandwidth, but running at >90% of the
    bandwidth is a bad idea anyway).
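As a sanity check on the archive numbers above: with the default 16 MB WAL segment size, raw archive volume is simply segments * 16 MB, and the per-level overhead can be recomputed against 'minimal' (5515 segments):

```shell
# Recompute the archive sizes and overheads quoted above from the
# segment counts (assumes the default 16 MB wal segment size).
for entry in minimal:5515 replica:5587 logical:6058; do
  level=${entry%%:*}; segs=${entry##*:}
  awk -v l="$level" -v s="$segs" -v base=5515 'BEGIN {
    printf "%-8s %5d segments = %5.1f GB (+%.1f%% vs minimal)\n",
           l, s, s * 16 / 1000, (s - base) * 100.0 / base
  }'
done
```

This reproduces the 88.2/89.4/96.9 GB figures and the ~1.3% and ~9.8% overheads quoted above.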


If you are using log archiving, it also means your log archive grows by
~10% (or ~8% if the overhead is measured against 'replica' rather than
'minimal').


... and that the standby has to chew through the additional 10% of WAL. We already have standbys that occasionally struggle to keep up with the master, and adding more load won't make them happy (even if just 10%).



    My conclusion from these results is that using 'wal_level=replica'
    by default seems fine. Perhaps even wal_level=logical would be OK,
    but that's probably too big a step for 10.0.
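For reference, the proposed default amounts to the following postgresql.conf setting (9.6 syntax; 'replica' merged the older 'archive' and 'hot_standby' levels in 9.6, and changing it currently requires a server restart):

```
# postgresql.conf sketch of the proposed default
wal_level = replica        # one of: minimal, replica, logical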



I think it sounds like 'replica' is the safe default.

If we can make it possible to go replica<->logical without a restart,
that makes it easy enough to increase it if necessary, and the default
still applies to most people (most people take backups, most people
probably don't do logical replication).


My thoughts, exactly.



    Any ideas how to construct a plausible workload where the
    differences are significantly larger? Running the tests on non-SSD
    storage might also be useful.


It's easy enough to construct a benchmark specifically to show the
difference, but it's hard to think of any actual "normal workload" that
does. Typically the optimization applies to things like bulk loading,
which is rarely done in isolation and does not lend itself to that type
of benchmarking very easily.
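The bulk-load optimization referred to here is, as I understand it, the WAL-skip path: at wal_level=minimal, COPY into a table created (or truncated) in the same transaction can skip WAL-logging the data and fsync the heap at commit instead, while at 'replica'/'logical' every row must be WAL-logged. A minimal SQL sketch (table name and file path are illustrative):

```sql
-- At wal_level = minimal, this COPY can skip WAL entirely because the
-- table was created in the same transaction; at 'replica'/'logical'
-- the full row data is WAL-logged.
BEGIN;
CREATE TABLE bulk_target (id int, payload text);
COPY bulk_target FROM '/path/to/data.csv' WITH (FORMAT csv);
COMMIT;
```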


I'm not sure I understand you correctly. You're saying that although it'd be easy to construct a benchmark showing a significant performance impact, it wouldn't represent a common workload. Correct?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

