Greg,

First off, let me again praise the great work you and Peter are doing in
this area.
> Modeling this on pg_stat_statements includes the hope of packaging it as
> an extension with minor core hooks, and the idea that there would be a
> fixed size list of timing events available at any time.  Consider it a
> FIFO queue built using a circular buffer.  Dump events into there and
> provide a way to view them.  If some fall out before they're
> analyzed/saved, that doesn't seem a problem for now.

Agreed.

>  If you want 100%
> durable, the log is still available.  Eventually I'd like a consumer for
> these that wrote them to a history table as an option, but that seems a
> second priority after getting them into memory in the first place.

Yes, and easy for users/tools to implement once the basic data is out there.

>  I'd
> like that consumer thing for pg_stat_statements too, but so far we're
> getting by without it.  It seems like something that might benefit from
> the in-core queuing work one day too.

Well, we could actually use such a thing in general, and not just for
the timing events.  For example, it would be really useful to be able to
see, for example, pg_stat_user_tables from 2 days ago to estimate table
growth and activity, or pg_stat_replication from 10 minutes ago to
average replication lag.  There was a plug-in tool for this, I think
Itagaki developed it.  Anyone remember what/where it is?

So there's actually two things for a consumer: Timing Events and Counters.

> = Initial event list =
> 
> There are three particular events I have been thinking about, hoping
> that will be enough to make sure the structure is general enough to
> handle more later:
> 
> -log_lock_waits output.  This is my #1 goal.  Lock waits are incredibly
> common issues in larger systems, and the process for tracking them down
> right now is very painful.  Probably deadlocks too.
> 
> -log_temp_files output.  This everyone wants to monitor, but it won't be
> as useful unless query tieing (below) works.
> 
> -log_checkpoints.  The fight to make this data available via
> pg_stat_bgwriter has been useful.  You really need a trending system to
> do anything useful with the data that way though.  Being able to write a
> simple timing events query that looked for checkpoint spikes would be
> great for alerting, which is often done with something other than a
> trending package.  To give specific examples here, the graphs Munin
> makes for pg_stat_bgwriter data work, but they're not much of a help if
> what you really want is to trigger a Nagios alert when checkpoints are
> slow.  A Timing Event entry of a slow checkpoint would be easy to
> trigger alerts from.

I agree that these are the events to start with.  Let me add the
additional events I think we want to have, in the order of how useful
the information is for troubleshooting:

4. log_connections / log_disconnections
5. deadlock reporting
6. autovacuum logging

I also think that we should eventually expose a buffer of the last N
error messages, but that's a different sort of data.

> 
> = Data to save =
> 
> There's a few obvious fields that could be explicitly saved as regular
> columns:
> 
> -timestamp
> -user
> -database

+ application_name

> -event type code.  Presumably an integer that's decoded via a master
> list of these, in some header file.
> -running process id

+ running session id

> -running process details (below)
> -referenced process id.  The one causing the problem in situations like
> a lock wait.

- referenced session id

> -referenced process details (below)
> -serial number.  People who will want to poll & store this data might
> want a better way to do that than just a timestamp.  Would let you know
> when you lost some of it, too, so you'd know to consider increasing the
> buffer size or decrease the polling interval.

By the time the user looks at this information, details on the process
from pg_stat_activity will probably no longer be available.  This makes
data like process_id not that useful, and means we need to replicate all
relevant data from pg_stat_activity.

> Another way this data might be stored is to have a key/value composite
> type, and then make each timing event entry an array of those.

If only PostgreSQL had some sort of key-value composite storage type
available!  I have no idea where we would get such a thing.  ;-)

> I'm open to other ideas for how to store this.  If the events go into
> one place, the different data each saves turns into the classic
> "flexible schema" problem.

I think it's an excellent approach, not in the least because it lets us
add new data later, and deals with the variety of fields we need to
lock_waits.

> = Query text =
> 
> For the often multi-line query text that's dumped by log_lock_waits,
> I've thought of two possibilities so far:
> 
> -Mash them into a single text line using "\n" between them.  Now you're
> even back to where, with the right escaping, the entire timing event
> detail part could still collapse into a single text string.  That would
> kill the idea of simple parsing of key/value data stuffed there though.
> 
> -Create an array of lines to hold the two statements.  Can always make
> the view that shows the data unnest it.  Seems like this would compel
> putting the two query description arrays into their own fields though.

I think you need to have a separate field for the queries.  Do we have
anything we want to log which would have more than two queries?

> = Query Tieing =
> 
> I'd like to have a way to reference the normalized version of the active
> query or queries at the time when that's appropriate too.  That seems
> like it should be a key that's joinable with the normalized version in
> pg_stat_statements.  One of the ideas bounced around in the "Hash id in
> pg_stat_statements" thread, like assigning a numeric code for them,
> would seem fine for that.  This will turn dependent on one of those
> ideas being committed.

So, the problem with joining against pg_stat_statements is that a
special-purpose incident you're looking at (like a lock_wait) might have
been pushed "off the bottom" of pg_stat_statements even though it is
still visible in pg_stat_lock_waits.  No?

> = Limitations of scope =
> 
> I was thinking the act of loading the extension for this would turn the
> logging on, and the existing GUCs like log_lock_waits will also apply.
> It seems useful and easy to have a load time option for the module that
> says to disable all the replaced log file writes if you're using the
> timing event mechanism.  It would be nice to have a granular way to ask
> for some timing events:  "I want to see lock waits but not temp files".
>  That seems like scope creep I don't think is necessary to make this
> useful.

That depends on whether we think any of these types of events,
individually, will add substantial overhead.

> All these initial targets are events that do not occur that often, and
> I'm not too concerned about the performance of this implementation. Lock
> waits come out once per second per connection.  temp files shouldn't be
> that common.  And checkpoints are infrequent.  For now, I hope that only
> targeting infrequent events means the overhead of dumping the timing
> event into memory will be little overhead.  

log_connections/disconnections might not be low-frequency, though.  So
we should plan for that.

> Alternately, think about the overhead of these initial targets like
> this:  right now, all of these things are being written to the log file.
>  If you can afford to do that, you can certainly afford to write a copy
> to memory instead, and probably in addition too.

To say nothing of the overhead of parsing/processing them later ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to