Greg,

First off, let me again praise the great work you and Peter are doing in
this area.

> Modeling this on pg_stat_statements includes the hope of packaging it as
> an extension with minor core hooks, and the idea that there would be a
> fixed size list of timing events available at any time. Consider it a
> FIFO queue built using a circular buffer. Dump events into there and
> provide a way to view them. If some fall out before they're
> analyzed/saved, that doesn't seem a problem for now.

Agreed.
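Just to sketch what "a way to view them" might look like, assuming the
buffer gets exposed as a view (pg_timing_events and every column name
here are things I'm inventing for illustration, not anything proposed):

    -- Hypothetical view over the in-memory ring buffer.  Events that
    -- were overwritten before anyone read them simply never show up.
    SELECT seq, event_time, event_type, username, datname, detail
    FROM pg_timing_events
    ORDER BY seq DESC
    LIMIT 50;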
> If you want 100% durable, the log is still available. Eventually I'd
> like a consumer for these that wrote them to a history table as an
> option, but that seems a second priority after getting them into
> memory in the first place.

Yes, and easy for users/tools to implement once the basic data is out
there.

> I'd like that consumer thing for pg_stat_statements too, but so far
> we're getting by without it. It seems like something that might
> benefit from the in-core queuing work one day too.

Well, we could actually use such a thing in general, and not just for
the timing events. For example, it would be really useful to be able to
see pg_stat_user_tables from 2 days ago to estimate table growth and
activity, or pg_stat_replication from 10 minutes ago to average
replication lag. There was a plug-in tool for this; I think Itagaki
developed it. Anyone remember what/where it is?

So there are actually two things for a consumer: Timing Events and
Counters.

> = Initial event list =
>
> There are three particular events I have been thinking about, hoping
> that will be enough to make sure the structure is general enough to
> handle more later:
>
> -log_lock_waits output. This is my #1 goal. Lock waits are incredibly
> common issues in larger systems, and the process for tracking them
> down right now is very painful. Probably deadlocks too.
>
> -log_temp_files output. This everyone wants to monitor, but it won't
> be as useful unless query tying (below) works.
>
> -log_checkpoints. The fight to make this data available via
> pg_stat_bgwriter has been useful. You really need a trending system to
> do anything useful with the data that way though. Being able to write
> a simple timing events query that looked for checkpoint spikes would
> be great for alerting, which is often done with something other than a
> trending package. To give specific examples here, the graphs Munin
> makes for pg_stat_bgwriter data work, but they're not much of a help
> if what you really want is to trigger a Nagios alert when checkpoints
> are slow. A Timing Event entry of a slow checkpoint would be easy to
> trigger alerts from.

I agree that these are the events to start with. Let me add the
additional events I think we want to have, in order of how useful the
information is for troubleshooting:

4. log_connections / log_disconnections
5. deadlock reporting
6. autovacuum logging

I also think that we should eventually expose a buffer of the last N
error messages, but that's a different sort of data.

> = Data to save =
>
> There are a few obvious fields that could be explicitly saved as
> regular columns:
>
> -timestamp
> -user
> -database + application_name
> -event type code. Presumably an integer that's decoded via a master
> list of these, in some header file.
> -running process id + running session id
> -running process details (below)
> -referenced process id. The one causing the problem in situations like
> a lock wait.
> -referenced session id
> -referenced process details (below)
> -serial number. People who will want to poll & store this data might
> want a better way to do that than just a timestamp. Would let you know
> when you lost some of it, too, so you'd know to consider increasing
> the buffer size or decreasing the polling interval.

By the time the user looks at this information, details on the process
from pg_stat_activity will probably no longer be available. This makes
data like process_id not that useful, and means we need to replicate
all relevant data from pg_stat_activity.
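To make that concrete, a rough sketch of a history table holding those
columns plus the copied-in pg_stat_activity data; every name below is
invented for illustration, not part of the proposal:

    -- Hypothetical layout only.  Process details are copied into the
    -- event rather than referenced, since the pg_stat_activity row
    -- may be gone by the time anyone looks.
    CREATE TABLE timing_event_history (
        seq              bigint,      -- serial number; gaps show lost events
        event_time       timestamptz,
        event_type       integer,     -- decoded via a master list
        username         text,
        datname          text,
        application_name text,
        pid              integer,     -- running process id
        session_id       text,
        process_detail   text,        -- copied from pg_stat_activity
        ref_pid          integer,     -- e.g. the lock holder in a lock wait
        ref_session_id   text,
        ref_detail       text
    );

    -- The consumer you describe then only needs the serial number to
    -- pick up where its last poll left off:
    INSERT INTO timing_event_history
    SELECT *
    FROM pg_timing_events
    WHERE seq > (SELECT coalesce(max(seq), 0)
                 FROM timing_event_history);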
> Another way this data might be stored is to have a key/value composite
> type, and then make each timing event entry an array of those.

If only PostgreSQL had some sort of key-value composite storage type
available! I have no idea where we would get such a thing. ;-)
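(Meaning hstore, of course.)  Just to sketch how the variable part of a
lock wait event might look stored that way; the key names here are made
up for illustration:

    CREATE EXTENSION IF NOT EXISTS hstore;

    -- The flexible, per-event fields collapse into one key/value
    -- column, and individual values come back out with the -> operator.
    SELECT 'lock_type=>relation, mode=>ShareLock, wait_ms=>2045'::hstore
           -> 'wait_ms' AS wait_ms;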
> I'm open to other ideas for how to store this. If the events go into
> one place, the different data each saves turns into the classic
> "flexible schema" problem.

I think it's an excellent approach, not least because it lets us add
new data later, and deals with the variety of fields we need for
lock_waits.

> = Query text =
>
> For the often multi-line query text that's dumped by log_lock_waits,
> I've thought of two possibilities so far:
>
> -Mash them into a single text line using "\n" between them. Now you're
> even back to where, with the right escaping, the entire timing event
> detail part could still collapse into a single text string. That would
> kill the idea of simple parsing of key/value data stuffed there
> though.
>
> -Create an array of lines to hold the two statements. Can always make
> the view that shows the data unnest it. Seems like this would compel
> putting the two query description arrays into their own fields though.

I think you need to have a separate field for the queries. Do we have
anything we want to log which would have more than two queries?

> = Query Tying =
>
> I'd like to have a way to reference the normalized version of the
> active query or queries at the time when that's appropriate too. That
> seems like it should be a key that's joinable with the normalized
> version in pg_stat_statements. One of the ideas bounced around in the
> "Hash id in pg_stat_statements" thread, like assigning a numeric code
> for them, would seem fine for that. This will depend on one of those
> ideas being committed.

So, the problem with joining against pg_stat_statements is that a
special-purpose incident you're looking at (like a lock_wait) might
have been pushed "off the bottom" of pg_stat_statements even though it
is still visible in pg_stat_lock_waits. No?

> = Limitations of scope =
>
> I was thinking the act of loading the extension for this would turn
> the logging on, and the existing GUCs like log_lock_waits will also
> apply. It seems useful and easy to have a load time option for the
> module that says to disable all the replaced log file writes if you're
> using the timing event mechanism. It would be nice to have a granular
> way to ask for some timing events: "I want to see lock waits but not
> temp files". That seems like scope creep I don't think is necessary to
> make this useful.

That depends on whether we think any of these types of events,
individually, will add substantial overhead.

> All these initial targets are events that do not occur that often, and
> I'm not too concerned about the performance of this implementation.
> Lock waits come out once per second per connection. temp files
> shouldn't be that common. And checkpoints are infrequent. For now, I
> hope that only targeting infrequent events means dumping the timing
> event into memory will add little overhead.

log_connections/disconnections might not be low-frequency, though. So
we should plan for that.

> Alternately, think about the overhead of these initial targets like
> this: right now, all of these things are being written to the log
> file. If you can afford to do that, you can certainly afford to write
> a copy to memory instead, and probably in addition too.

To say nothing of the overhead of parsing/processing them later ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com