Hi,

On 2019-08-01 08:52:52 +0200, Fabien COELHO wrote:
>   sh> time pgbench -r -T 30 -M prepared
>   ...
>   latency average = 2.425 ms
>   tps = 412.394420 (including connections establishing)
>   statement latencies in milliseconds:
>     0.001  \set aid random(1, 100000 * :scale)
>     0.000  \set bid random(1, 1 * :scale)
>     0.000  \set tid random(1, 10 * :scale)
>     0.000  \set delta random(-5000, 5000)
>     0.022  BEGIN;
>     0.061  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
>     0.038  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
>     0.046  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
>     0.042  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
>     0.036  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
>     2.178  END;
>   real    0m30.080s, user    0m0.406s, sys     0m0.689s
>
> The cost of the pgbench-interpreted part (\set) is under 1/1000.

I don't put much credence in those numbers for pgbench commands - they
don't include significant parts of the overhead of command
execution. Even just the fact that you need to process more commands
through the pretty slow pgbench interpreter has significant costs.
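
If you want to see that per-command cost in isolation, a quick sketch is to
pad a copy of the select-only script with a few pointless \set commands and
compare throughput (the more_sets.sql file name and the pad* variables are
made up for illustration; pgbench_100 is the scale-100 database used below):

$ cat > more_sets.sql <<'EOF'
\set aid random(1, 100000 * :scale)
\set pad1 random(1, 1000)
\set pad2 random(1, 1000)
\set pad3 random(1, 1000)
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
EOF
$ pgbench -Mprepared -n -c 8 -j 8 -T 10 -S pgbench_100
$ pgbench -Mprepared -n -c 8 -j 8 -T 10 -f more_sets.sql pgbench_100

The server executes the same single SELECT per transaction in both runs, so
whatever tps difference shows up is essentially pgbench-side work.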

Running pgbench -Mprepared -n -c 8 -j 8 -S pgbench_100 -T 10 -r -P1, for
example, shows pgbench using 189% CPU on my 4/8 core/thread laptop. That's
a pretty significant share.
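
(One easy way to see that number is the %CPU figure from /usr/bin/time, just
like in the read/write run further down:)

$ /usr/bin/time pgbench -Mprepared -n -c 8 -j 8 -S pgbench_100 -T 10 -r -P1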

And before you argue that that's just about a read-only workload:
Servers with either synchronous_commit=off, or with extremely fast WAL
commit due to BBUs/NVMe, are quite common. So you can easily get into
scenarios where pgbench overhead is an issue for read/write workloads too.

With synchronous_commit=off, I see, for example:

$ PGOPTIONS='-c synchronous_commit=off' /usr/bin/time pgbench -Mprepared -n -c 8 -j 8 pgbench_100 -T 10 -r
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: prepared
number of clients: 8
number of threads: 8
duration: 10 s
number of transactions actually processed: 179198
latency average = 0.447 ms
tps = 17892.824470 (including connections establishing)
tps = 17908.086839 (excluding connections establishing)
statement latencies in milliseconds:
         0.001  \set aid random(1, 100000 * :scale)
         0.000  \set bid random(1, 1 * :scale)
         0.000  \set tid random(1, 10 * :scale)
         0.000  \set delta random(-5000, 5000)
         0.042  BEGIN;
         0.086  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
         0.061  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
         0.070  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
         0.070  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
         0.058  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
         0.054  END;
6.65user 10.64system 0:10.02elapsed 172%CPU (0avgtext+0avgdata 4588maxresident)k
0inputs+0outputs (0major+367minor)pagefaults 0swaps

And the largest part of the overhead is in pgbench's interpreter loop:
+   12.35%  pgbench  pgbench                [.] threadRun
+    4.47%  pgbench  libpq.so.5.13          [.] pqParseInput3
+    3.54%  pgbench  pgbench                [.] dopr.constprop.0
+    3.30%  pgbench  pgbench                [.] fmtint
+    3.16%  pgbench  libc-2.28.so           [.] __strcmp_avx2
+    2.95%  pgbench  libc-2.28.so           [.] malloc
+    2.95%  pgbench  libpq.so.5.13          [.] PQsendQueryPrepared
+    2.15%  pgbench  libpq.so.5.13          [.] pqPutInt
+    1.93%  pgbench  pgbench                [.] getVariable
+    1.85%  pgbench  libc-2.28.so           [.] ppoll
+    1.85%  pgbench  libc-2.28.so           [.] __strlen_avx2
+    1.85%  pgbench  libpthread-2.28.so     [.] __libc_recv
+    1.66%  pgbench  libpq.so.5.13          [.] pqPutMsgStart
+    1.63%  pgbench  libpq.so.5.13          [.] pqGetInt

And that's just the standard pgbench read/write case, without
additional script commands or anything.
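
(That's perf report output; something along these lines - with the attach
method and durations picked arbitrarily - should give a comparable picture:)

$ PGOPTIONS='-c synchronous_commit=off' pgbench -Mprepared -n -c 8 -j 8 pgbench_100 -T 30 -r &
$ perf record -g -p $! -- sleep 10
$ perf report --no-children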


> The full time
> of the process itself accounts for 1.4%, below the inevitable system time,
> which is 2.3%. Pgbench overheads are pretty small compared to postgres
> connection and command execution, and to system time. The above used a local
> socket; if it were an actual remote network connection, the gap would be
> larger. A profile run could collect more data, but that does not seem
> necessary.

Well, duh, that's because you're completely IO bound. You're doing
~400 tps. That's *nothing*. All you're measuring is how fast the WAL can
be fdatasync()ed to disk (in your own numbers, END alone accounts for 2.178 ms
of the 2.425 ms average latency, i.e. roughly 90% of each transaction is spent
waiting for the commit flush).  Of *course* pgbench isn't a relevant overhead
in that case.  I really don't understand how this can be an argument.


> Also, pgbench overheads must be compared to an actual client application,
> which deals with a database through some language (PHP, Python, JS, Java…)
> whose interpreter would be written in C/C++ just like pgbench, and some
> library (ORM, DBI, JDBC…), possibly written in that same language and
> relying on libpq under the hood. Ok, there could be some JIT involved, but
> that will not change the fact that there are costs there too, and the client
> would have to do pretty much the same things that pgbench is doing, plus
> whatever the application has to do with the data.

Uh, but those clients aren't all running on a single machine.

Greetings,

Andres Freund

