date:20221231

Re: How to analyze of short but heavy intermittent slowdown on BIND on production database (or BIND vs log_lock_waits)

2022-12-31 Thread Justin Pryzby

On Sat, Dec 31, 2022 at 02:26:08PM +0200, Maxim Boguk wrote:
> Hi,
> 
> When performing post-mortem analysis of some short latency spikes on a
> heavily loaded database, I found that the reason for (less than 10 second
> latency spike) wasn't on the EXECUTE stage but on the BIND stage.
> At the same time graphical monitoring shows that during this few second
> period there were some queries waiting in the BIND stage.
> 
> Logging setup:
> log_min_duration_statement=200ms
> log_lock_waits=on
> deadlock_timeout=100ms
> So I expected that every lock waiting over 100ms (>deadlock_timeout) should
> be in the log.
> But in the log I see only spikes on slow BIND but not lock waits logged.

What version postgres?  What settings have non-default values ?
What OS/version?  What environment/hardware?  VM/image/provider/...
What are the queries that are running BIND ?  What parameter types ?
Are the slow BINDs failing?  Are their paramters being logged ?
What else is running besides postgres ?  Are the DB clients local or
remote ?  It shouldn't matter, but what client library?

https://wiki.postgresql.org/wiki/Slow_Query_Questions

-- 
Justin

How to analyze of short but heavy intermittent slowdown on BIND on production database (or BIND vs log_lock_waits)

2022-12-31 Thread Maxim Boguk

Hi,

When performing post-mortem analysis of some short latency spikes on a
heavily loaded database, I found that the reason for (less than 10 second
latency spike) wasn't on the EXECUTE stage but on the BIND stage.
At the same time graphical monitoring shows that during this few second
period there were some queries waiting in the BIND stage.

Logging setup:
log_min_duration_statement=200ms
log_lock_waits=on
deadlock_timeout=100ms
So I expected that every lock waiting over 100ms (>deadlock_timeout) should
be in the log.
But in the log I see only spikes on slow BIND but not lock waits logged.
(
grep BIND /var/log/postgresql/postgresql-2022-12-29.log | grep 'duration' |
perl -pe 's/^(2022-12-29 \d\d:\d\d:\d).*$/$1/' | sort | uniq -c | less
...
  9 2022-12-29 00:12:5
  2 2022-12-29 00:13:1
  3 2022-12-29 00:13:5
!!!  68 2022-12-29 00:14:0
  5 2022-12-29 00:14:1
  3 2022-12-29 00:14:2
  2 2022-12-29 00:14:3
).
But no lock waits on the BIND stage logged during the problem period (and
no lock waits in general).
Similar issues happen a few times per day without any visible pattern (but
on the same tables usually).
No CPU or IO load/latency spikes found during problem periods.
No EXECUTE slowdown found in the log during that time.

So currently I have two hypotheses in research:
1)during BIND stage not every lock waits logged
2)there are some not a lock related intermittent slowdown of BIND

I ask for any ideas how to debug this issue (duration of such spike usually
under 1s but given how many TPS database serving - 1s is too much and
affect end users).


-- 
Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/

Phone UA: +380 99 143 
Phone AU: +61  45 218 5678

Re: How to analyze of short but heavy intermittent slowdown on BIND on production database (or BIND vs log_lock_waits)

How to analyze of short but heavy intermittent slowdown on BIND on production database (or BIND vs log_lock_waits)

2 matches

Site Navigation

Mail list logo

Footer information