On 10/31/2016 02:51 PM, Amit Kapila wrote:
> On Mon, Oct 31, 2016 at 12:02 AM, Tomas Vondra
> <tomas.von...@2ndquadrant.com> wrote:
>> Hi,
>>
>> On 10/27/2016 01:44 PM, Amit Kapila wrote:
>> I've read that analysis, but I'm not sure I see how it explains the
>> "zig zag" behavior. I do understand that shifting the contention to
>> some other (already busy) lock may negatively impact throughput, or
>> that the group_update may result in updating multiple clog pages, but
>> I don't understand two things:
>>
>> (1) Why this should result in the fluctuations we observe in some of
>> the cases. For example, why should we see 150k tps on 72 clients, then
>> drop to 92k with 108 clients, then back to 130k on 144 clients, then
>> 84k on 180 clients etc. That seems fairly strange.
> I don't think hitting multiple clog pages has much to do with
> client-count. However, we can wait to see your further detailed test
> report.
>> (2) Why this should affect all three patches, when only group_update
>> has to modify multiple clog pages.
> No, all three patches can be affected due to multiple clog pages.
> Read the second paragraph ("I think one of the probable reasons that
> could happen for both the approaches") in the same e-mail [1]. It is
> basically due to frequent release-and-reacquire of locks.
>>>> On logged tables it usually looks like this (i.e. a modest increase
>>>> for high client counts at the expense of significantly higher
>>>> variability):
>>>>
>>>> http://tvondra.bitbucket.org/#pgbench-3000-logged-sync-skip-64
>>> What variability are you referring to in those results?
>> Good question. What I mean by "variability" is how stable the tps is
>> during the benchmark (when measured at per-second granularity). For
>> example, let's run a 10-second benchmark, measuring the number of
>> transactions committed each second.
>>
>> Then all those runs do 1000 tps on average:
>>
>> run 1: 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000
>> run 2:  500, 1500,  500, 1500,  500, 1500,  500, 1500,  500, 1500
>> run 3:    0, 2000,    0, 2000,    0, 2000,    0, 2000,    0, 2000
> Generally, such behaviours are seen due to writes. Are WAL and DATA
> on the same disk in your tests?
Yes, there's one RAID device on 10 SSDs, with a 4GB cache on the
controller. I've done some tests and it easily handles >1.5 GB/s in
sequential writes and >500 MB/s in sustained random writes.
Also, let me point out that most of the tests were done so that the
whole data set fits into shared_buffers, and with no checkpoints during
the runs (so no writes to data files should really happen).
For example, these tests were done on scale 3000 (45GB data set) with
64GB shared buffers:

[a] http://tvondra.bitbucket.org/index2.html#pgbench-3000-unlogged-sync-noskip-64
[b] http://tvondra.bitbucket.org/index2.html#pgbench-3000-logged-async-noskip-64
and I could show similar cases with scale 300 on 16GB shared buffers.
In those cases, there's very little contention between WAL and the rest
of the database (in terms of I/O).
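Just to illustrate how the "fits into shared_buffers, no checkpoints
during the run" claim can be checked (a minimal sketch only, not part of
the actual test scripts; it assumes a Python client with psycopg2 and the
connection string is a placeholder): compare the database size against
shared_buffers, and sample the checkpoint counters in pg_stat_bgwriter
before and after the run - if they don't move, no checkpoint happened.

# illustrative sketch only -- adjust the DSN for the actual cluster
import psycopg2

conn = psycopg2.connect("dbname=pgbench")   # placeholder connection string
conn.autocommit = True
cur = conn.cursor()

# the data set should fit entirely into shared_buffers
cur.execute("""
    SELECT current_setting('shared_buffers'),
           pg_size_pretty(pg_database_size(current_database()))
""")
print("shared_buffers = %s, database size = %s" % cur.fetchone())

# sample this before and after the pgbench run; identical numbers mean
# no checkpoint happened during the run
cur.execute("SELECT checkpoints_timed + checkpoints_req FROM pg_stat_bgwriter")
print("checkpoints so far: %d" % cur.fetchone()[0])

cur.close()
conn.close()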
Moreover, this setup (a single device for the whole cluster) is very
common, so we can't simply neglect it.
But my main point here is that the trade-off in those cases may not
really be all that great, because you get the best performance at 36/72
clients, and then the tps drops and the variability increases. At least
not right now, before tackling contention on the WAL lock (or whatever
lock becomes the bottleneck).
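To make the "variability" point a bit more quantifiable (a toy example
only, reusing the made-up numbers from the quoted text above, not actual
results): per-second tps samples with the same mean can have a very
different spread, which the standard deviation or the coefficient of
variation shows directly. The per-second samples themselves can be
collected with pgbench's --progress=1 option.

# toy example: three hypothetical runs, all averaging 1000 tps, but with
# very different per-second behaviour
from statistics import mean, pstdev

runs = {
    "run 1": [1000] * 10,
    "run 2": [500, 1500] * 5,
    "run 3": [0, 2000] * 5,
}

for name, tps in runs.items():
    m = mean(tps)
    sd = pstdev(tps)    # population standard deviation of the per-second tps
    print("%s: mean = %.0f tps, stddev = %.0f, CV = %.2f" % (name, m, sd, sd / m))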
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services