Re: [HACKERS] auto_explain WAS: RFC: Timing Events

Greg Smith Wed, 21 Nov 2012 15:15:44 -0800

On 11/8/12 2:16 PM, Josh Berkus wrote:

> Also, logging only the long-running queries is less useful than people
> on this list seem to think.  When I'm doing real performance analysis, I
> need to see *everything* which was run, not just the slow stuff.  Often
> the real problem is a query which used to take 1.1ms, now takes 1.8ms,
> and gets run 400 times/second. Looking just at the slow queries won't
> tell you that.

No argument here. I've tried to be clear that some of these high-speed but lossy things I'm hacking on are not going to be useful for everyone. A solution that found most of these 1.8ms queries, but dropped some percentage under the highest peak load conditions, would still be very useful to me.
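To put numbers on the quoted example, here is a quick back-of-the-envelope calculation. It uses only the figures from the quote (1.1ms before, 1.8ms now, 400 executions per second); the variable names are just for illustration.

```python
# Figures taken from the quoted example above.
old_ms, new_ms, calls_per_sec = 1.1, 1.8, 400

# Extra execution time added to the server for every second of wall clock.
extra_ms_per_sec = round((new_ms - old_ms) * calls_per_sec, 3)

# Share of one backend-second that this single query now consumes.
busy_fraction = round(new_ms * calls_per_sec / 1000.0, 3)

print(extra_ms_per_sec)
print(busy_fraction)
```

That is 280ms of added work per second from a query whose individual executions all sit far below even a 100ms log_min_duration_statement threshold, so a slow-query log never shows it.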

An anecdote on this topic seems relevant. I have a troublesome production server that has moved log_min_duration_statement from 100ms to 500ms to 2000ms as the system grew. Even the original setting wasn't short enough to catch everything we would like to watch *now*, but seeing sub-second data is a dream at this point. The increases have been forced by logging contention becoming unmanageable when every client has to fight for the log to write to disk. I can see the buggers stack up as waiting for locks if I try to log shorter statements, stalling enough that it drags the whole server down under peak load.

If I could turn off logging just during those periods--basically, throwing entries away only when some output-rate-throttled component hit its limit--I could still find them in the data collected the rest of the time. There are some types of problems that also only occur under peak load that this idea would miss, but you'd still be likely to get *some* of them, statistically.
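As a rough sketch of what such a throttled component could look like, here is a token-bucket limiter that writes entries while under its rate limit and silently discards them once over it. This is an illustration only, not anything that exists in PostgreSQL; the class name ThrottledLog, its parameters, and the in-memory "written" list standing in for the log file are all assumptions.

```python
import time


class ThrottledLog:
    """Token-bucket throttle: drop log entries once the output rate limit is hit."""

    def __init__(self, max_per_sec, burst, clock=time.monotonic):
        self.rate = float(max_per_sec)   # tokens refilled per second
        self.capacity = float(burst)     # largest burst allowed
        self.tokens = float(burst)       # start with a full bucket
        self.clock = clock               # injectable so the limiter can be tested
        self.last = clock()
        self.written = []                # hypothetical stand-in for the real log
        self.dropped = 0

    def log(self, entry):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            self.written.append(entry)
            return True
        # Over the limit: throw the entry away rather than make the caller wait.
        self.dropped += 1
        return False
```

Keeping a count of dropped entries matters: it tells you afterwards how much of the peak period is missing from the data.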


There's a really hard trade-off here:

-Sometimes you must save data on every query to capture fine details
-Making every query wait for disk I/O is impractical

The sort of ideas you threw out for making things like auto-explain logging per-session can work. The main limitation there though is that it presumes you even know what area the problem is in the first place. I am not optimistic that covers a whole lot of ground either.

Upthread I talked about a background process that persists shared memory queues as a future consumer of timing events--one that might consume slow query data too. That is effectively acting as the ideal component I described above, one that only loses things when it exceeds the system's write capacity for saving them. I wouldn't want to try and retrofit the existing PostgreSQL logging facility for such a thing though. Log parsing as the way to collect data is filled with headaches anyway.

I don't see any other good way to resolve this trade-off. To help with the real-world situation you describe, ideally you dump all the query data somewhere, fast, and have the client move on. You can't make queries wait for storage; something else (with a buffer!) needs to take over that job.
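The shape of that idea can be sketched in a few lines: clients drop events into a bounded buffer without ever blocking, and a separate worker drains it to storage. This is a plain Python threading illustration under assumed names (LossyEventBuffer, record, sink), not the shared memory queue design itself.

```python
import queue
import threading

_STOP = object()   # sentinel telling the worker to exit


class LossyEventBuffer:
    """Bounded buffer: producers never wait, a background worker persists events."""

    def __init__(self, capacity, sink):
        self._q = queue.Queue(maxsize=capacity)
        self._sink = sink                  # callable that saves one event; may be slow
        self._lock = threading.Lock()
        self.dropped = 0
        self._worker = threading.Thread(target=self._drain, daemon=True)

    def start(self):
        self._worker.start()

    def record(self, event):
        """Called on the query path: returns immediately, drops when full."""
        try:
            self._q.put_nowait(event)
            return True
        except queue.Full:
            with self._lock:
                self.dropped += 1
            return False

    def _drain(self):
        # Only this thread ever waits on storage.
        while True:
            event = self._q.get()
            if event is _STOP:
                break
            self._sink(event)

    def close(self):
        self._q.put(_STOP)    # may wait for room; this is shutdown, not the query path
        self._worker.join()
```

Events are lost only when the worker falls behind by more than the buffer's capacity, which is the behavior described above: lossless under ideal conditions, lossy only past the system's write capacity.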

I can't find the URL right now, but at PG.EU someone was showing me a module that grabbed the new 9.2 logging hook and shipped the result to another server. Clients burn a little CPU and network time and they move on, and magically disk I/O goes out of their concerns. How much overhead persisting the data takes isn't the database server's job at all then. That's the sort of flexibility people need to have with logging eventually. Something so efficient that every client can afford to do it; it is capable of saving all events under ideal conditions; but under adverse ones, you have to keep going and accept the loss.
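I don't know how that module was implemented, so the following is only a guess at the general pattern: serialize the event, hand it to a non-blocking UDP socket, and count it as lost if the send cannot complete immediately. The class name UdpEventShipper, the JSON encoding, and the choice of UDP are all assumptions made for illustration.

```python
import json
import socket


class UdpEventShipper:
    """Fire-and-forget shipping of log events to a remote collector over UDP."""

    def __init__(self, host, port):
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.setblocking(False)   # a full send buffer raises instead of waiting
        self.lost = 0

    def ship(self, event):
        payload = json.dumps(event).encode("utf-8")
        try:
            self.sock.sendto(payload, self.addr)
            return True
        except OSError:
            # Includes BlockingIOError: keep going and accept the loss.
            self.lost += 1
            return False

    def close(self):
        self.sock.close()
```

The sender pays for serialization and one system call; whether the collector keeps up, and how it stores what it receives, is no longer the database server's problem.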


--
Greg Smith   2ndQuadrant US    [email protected]   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
