Hi,

On Tue, Jan 15, 2019 at 1:14 AM, Tsunakawa, Takayuki wrote:

[ ... absent for a long time ]

I read the discussions of this thread.

If we want wait event info, we can currently sample pg_stat_activity and 
estimate a pseudo total duration for each wait event.
(I understand that wait event sampling does *not* represent wait event counts 
but wait event durations.)
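
For reference, a sampler along those lines might look like the small libpq 
program below (my own sketch, not anything from the patch; the connection 
string and the 10 ms interval are arbitrary assumptions). Each printed 
observation can be multiplied by the sampling interval afterwards to 
approximate a per-event duration.

/*
 * Sketch of a pg_stat_activity wait-event sampler (not part of the patch).
 * It connects with libpq, samples the currently reported wait events every
 * 10 ms, and prints one line per observed (type, event) pair per sample.
 */
#include <stdio.h>
#include <unistd.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn *conn = PQconnectdb("dbname=postgres");  /* assumed conninfo */

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    for (int i = 0; i < 1000; i++)      /* 1000 samples * 10 ms = ~10 s */
    {
        PGresult *res = PQexec(conn,
            "SELECT wait_event_type, wait_event, count(*)"
            "  FROM pg_stat_activity"
            " WHERE wait_event IS NOT NULL"
            " GROUP BY 1, 2");

        if (PQresultStatus(res) == PGRES_TUPLES_OK)
        {
            for (int row = 0; row < PQntuples(res); row++)
                printf("%s\t%s\t%s\n",
                       PQgetvalue(res, row, 0),
                       PQgetvalue(res, row, 1),
                       PQgetvalue(res, row, 2));
        }
        PQclear(res);
        usleep(10000);          /* 10 ms sampling interval */
    }

    PQfinish(conn);
    return 0;
}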

It is said that, in most cases, sampling is enough to find the cause of an 
issue. On the other hand, it is also said that sampling misses wait events on 
which the process doesn't spend much time. Can we really say we don't need 
such wait events just because they seem unimportant?

Also, while we can get the pseudo total duration from sampling, we can't know 
how many times each wait event occurred or how long it took on average each 
time. I think it would be better to also know the average time of each wait 
event when investigating the cause of an issue.

For example, the counts, total durations and average times of each wait event 
after running pgbench, collected from a server with my patch applied, are 
shown below.


transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100 
query mode: simple
number of clients: 8
number of threads: 8
duration: 60 s
number of transactions actually processed: 132259
latency average = 3.630 ms
tps = 2203.909727 (including connections establishing)
tps = 2204.248691 (excluding connections establishing)

 wait_event_type |      wait_event      | calls  |   times   | times/calls
-----------------+----------------------+--------+-----------+------------
 Client          | ClientRead           | 925732 | 115270950 |  124.52
 IO              | DataFileRead         | 220357 |   3219893 |   14.61
 LWLock          | WALWriteLock         | 195560 | 192684987 |  985.30
 IO              | DataFileWrite        | 156208 |   4440091 |   28.42
 IO              | WALWrite             |  51035 |   2925808 |   57.33
 IO              | WALSync              |  50812 |  49626240 |  976.66
 Lock            | transactionid        |   6179 |   9905727 | 1603.13
 LWLock          | buffer_content       |   5453 |    295123 |   54.12
 IO              | DataFileExtend       |   2367 |     90430 |   38.20
 LWLock          | wal_insert           |   2053 |     33620 |   16.38
 LWLock          | WALBufMappingLock    |   1301 |     23761 |   18.26
 LWLock          | CLogControlLock      |    514 |      3959 |    7.70
 LWLock          | buffer_mapping       |    162 |      1313 |    8.10
 LWLock          | XidGenLock           |    153 |      2399 |   15.68
 LWLock          | lock_manager         |    122 |      5051 |   41.40
 Lock            | tuple                |    105 |    114632 | 1091.73
 LWLock          | ProcArrayLock        |     99 |      1828 |   18.46
 LWLock          | CheckpointerCommLock |     33 |       854 |   25.88
 Lock            | extend               |     24 |     20493 |  853.88
 IO              | RelationMapRead      |     22 |        71 |    3.23
 LWLock          | buffer_io            |     11 |      7398 |  672.55
 IO              | SLRUWrite            |      4 |       249 |   62.25
 IO              | SLRUSync             |      4 |         0 |    0.00
 IPC             | ClogGroupUpdate      |      3 |       223 |   74.33
 LWLock          | SyncRepLock          |      1 |         1 |    1.00


We can see that WALWriteLock has the longest total duration (times), and 
ClientRead is the next one. We might get this information from sampling as 
well. We can also see that WALWriteLock's long total duration comes from its 
high average time, while ClientRead's total duration comes from its high call 
count. We can also see that transactionid, tuple, WALWriteLock, WALSync and 
extend have longer average times than the others. If we want to improve 
performance even a little, I think such info is useful.

The patch proposed in this thread was not accepted because it puts overhead on 
the server/backend.
The functions that measure wait event time, gettimeofday and clock_gettime, 
take time to execute, and since they would sit on a frequently called path 
(pgstat_report_wait_start/end), a lot of overhead is expected to be 
introduced.
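
Just to make that concrete, the following standalone model (my own sketch, not 
the actual patch; the wait_event_counts/wait_event_times arrays and the 
report_wait_* names are hypothetical) shows the general shape of such 
instrumentation: every reported wait pays for two clock_gettime() calls on top 
of the counter updates.

/*
 * Standalone model of timed wait-event reporting (not the real backend code).
 * The per-event arrays and the report functions are hypothetical; they only
 * illustrate that each wait adds two clock_gettime() calls.
 */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define MAX_WAIT_EVENTS 256

static uint64_t wait_event_counts[MAX_WAIT_EVENTS]; /* calls per event    */
static uint64_t wait_event_times[MAX_WAIT_EVENTS];  /* total ns per event */
static struct timespec wait_start;

static inline void
report_wait_start(int event)
{
    clock_gettime(CLOCK_MONOTONIC, &wait_start);    /* first extra call  */
    wait_event_counts[event]++;
}

static inline void
report_wait_end(int event)
{
    struct timespec now;
    int64_t         diff;

    clock_gettime(CLOCK_MONOTONIC, &now);           /* second extra call */
    diff = (int64_t) (now.tv_sec - wait_start.tv_sec) * 1000000000 +
           (now.tv_nsec - wait_start.tv_nsec);
    wait_event_times[event] += (uint64_t) diff;
}

int
main(void)
{
    /* Simulate one million very short waits on a single event. */
    for (int i = 0; i < 1000000; i++)
    {
        report_wait_start(0);
        report_wait_end(0);
    }

    printf("calls = %llu, times(ns) = %llu, times/calls = %.2f\n",
           (unsigned long long) wait_event_counts[0],
           (unsigned long long) wait_event_times[0],
           (double) wait_event_times[0] / wait_event_counts[0]);
    return 0;
}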

Even if we only count the wait events, there would be overhead, as pointed out 
in another thread. [1]

The overhead of collecting wait event info has been discussed for a long time, 
but I couldn't find actual measurement results, so I want to measure the 
overhead myself.
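
As a first step, the raw per-call cost of the clock functions themselves can 
be checked with a small standalone program like the one below (my own quick 
sketch; the iteration count is arbitrary).

/*
 * Quick standalone benchmark of the clock functions mentioned above.
 * It calls clock_gettime() and gettimeofday() in tight loops and prints
 * the average cost per call in nanoseconds.
 */
#include <stdio.h>
#include <sys/time.h>
#include <time.h>

#define LOOPS 10000000

static double
elapsed_ns(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e9 + (b.tv_nsec - a.tv_nsec);
}

int
main(void)
{
    struct timespec start, end, ts;
    struct timeval  tv;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < LOOPS; i++)
        clock_gettime(CLOCK_MONOTONIC, &ts);
    clock_gettime(CLOCK_MONOTONIC, &end);
    printf("clock_gettime: %.1f ns/call\n", elapsed_ns(start, end) / LOOPS);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < LOOPS; i++)
        gettimeofday(&tv, NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);
    printf("gettimeofday:  %.1f ns/call\n", elapsed_ns(start, end) / LOOPS);

    return 0;
}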

[1] 
https://www.postgresql.org/message-id/CA%2BTgmobf1NJD%2B_DfQG5qccG5YFSnxk3CgC2mh0-UHabznCQtYA%40mail.gmail.com

--
Yoshikazu Imai 
