RE: [Proposal] Add accumulated statistics for wait event

Phil Florent Thu, 04 Oct 2018 02:33:49 -0700

Hi,
I am a DB beginner, so please tell me. It says that you can find events that 
are bottlenecks in sampling, but as you saw above, you can not find events 
shorter than the sampling interval, right?

If an event occurs frequently and if it is reported in pg_stat_activity, you 
will catch it again and again while sampling, no matter it duration.
Hence you just need to

  *   Sample the sessions and consider the active ones. You need to know if 
they are waiting (PostgreSQL now provides detailed wait events) or if they are 
on the CPU
  *   Optionally collect information on the system context at the time of 
sampling (CPU, memory...), it can be provided by many tools like psutil python 
library for example

If the client application itself provides information it's even more 
interesting. With something like 
program/module/action/client_info/sofar/totalwork in application_name you are 
able to focus directly on different kind of activity. It can give you 
information like  "I/O waits are meaningful for my batch activity but not for 
my OLTP activity, if my goal is to improve response time for end users I have 
to consider that."

Best regards
Phil

________________________________
De : Yotsunaga, Naoki <yotsunaga.na...@jp.fujitsu.com>
Envoyé : jeudi 4 octobre 2018 10:31
À : 'Michael Paquier'; Phil Florent
Cc : Tomas Vondra; pgsql-hackers@lists.postgresql.org
Objet : RE: [Proposal] Add accumulated statistics for wait event

On Thu, July 26, 2018 at 1:25 AM, Michael Paquier wrote:
> Even if you have spiky workloads, sampling may miss those, but even with 
> adding counters for each event
> you would need to query the table holding the counters at an insane frequency 
> to be able to perhaps get
> something out of it as you need to do sampling of the counters as well to 
> extract deltas.

Hi, I was wondering why PostgreSQL did not have the number of wait events and 
wait time that other databases such as Oracle had as a function, and when I was 
looking for related threads, I got to this thread.

I am a DB beginner, so please tell me. It says that you can find events that 
are bottlenecks in sampling, but as you saw above, you can not find events 
shorter than the sampling interval, right?
If this short event has occurred quite a number of times and it was a 
considerable amount of time in total, can you solve this problem with sampling?
# I have asked, but I could not understand much of the discussion above and I 
do not know if such a case can exist.
Also, I think that it can be solved by higher the sampling frequency, but the 
load will be high, right? I think this method is not very practical.

Moreover, I think that it is not implemented due to the reason that sampling is 
good as described above and because it affects performance.
How about switching the function on / off and implementing with default off?
Do you care about performance degradation during bottleneck investigation?
When investigating the bottleneck, I think that it is better to find the cause 
even at the expense of performance.
# If you can detect with high frequency sampling, I think that it depends on 
whether the sampling or the function(the number of wait events and wait time) 
is high load.

Since I am a DB beginner, I think that it is saying strange things.
I am glad if you tell me.

-----
Naoki Yotsunaga

RE: [Proposal] Add accumulated statistics for wait event

Reply via email to