Re: [HACKERS] RFC: built-in historical query time profiling
> I see your point.  The ugliness of log-parsing beckons.

Maybe it would make sense to use a separate log server machine, where
the logs could be written to a database without impacting production?
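To make that concrete, one plausible way to offload logging, using
settings that exist in an 8.0-era postgresql.conf plus ordinary remote
syslog forwarding (the hostname here is made up):

    # postgresql.conf on the production host: route logs to syslog
    log_destination = 'syslog'
    syslog_facility = 'LOCAL0'

    # /etc/syslog.conf on the production host: forward that facility
    # to the dedicated log server, where a parser can load it into a db
    local0.*        @logserver.example.com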
Re: [HACKERS] RFC: built-in historical query time profiling
On Wednesday March 23 2005 5:14, Mark Kirkwood wrote:
> - decide on a snapshot interval (e.g. 30 seconds)
> - capture pg_stat_activity every interval and save the results
>   in a timestamped copy of this view (e.g. add a column
>   'snap_time')

That might serve for some purposes, but log-parsing sounds far simpler
and closer to our objectives.  For example, I want to be able to
identify a query that normally takes 10ms but is now taking 200ms...

Thanks.

Ed
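A sketch of the kind of check the parsed logs would allow, assuming
the parser loads rows into a hypothetical table
query_log(logged_at timestamp, query_text text, duration_ms float):

    -- Hourly duration profile for one query pattern; a jump from
    -- ~10ms to ~200ms shows up immediately in the avg/max columns.
    SELECT date_trunc('hour', logged_at) AS hour,
           count(*)         AS executions,
           avg(duration_ms) AS avg_ms,
           max(duration_ms) AS max_ms
    FROM query_log
    WHERE query_text LIKE 'SELECT * FROM result WHERE key = %'
    GROUP BY 1
    ORDER BY 1;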
Re: [HACKERS] RFC: built-in historical query time profiling
Ed L. wrote:
> On Wednesday March 23 2005 4:11, Mark Kirkwood wrote:
>> Is enabling the various postgresql.conf stats* options and
>> taking regular snapshots of pg_stat_activity a possible way to
>> get this?
>
> I don't see how; the duration is the key measurement I'm after,
> and I don't believe it is available anywhere but the logs.

I was thinking about something like:

- decide on a snapshot interval (e.g. 30 seconds)
- capture pg_stat_activity every interval and save the results
  in a timestamped copy of this view (e.g. add a column
  'snap_time')

You can then query this new table and find all those queries that are
taking a long time (you know a query is finished - modulo the
measurement interval - if it is no longer present in the snapshot).
Clearly the granularity limits the accuracy you can achieve, and there
could be some confusion about when a query is finished if a session
repeatedly executes exactly the same query, but it would certainly
bring any sudden change in query activity to attention.

cheers

Mark
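A minimal sketch of that snapshot setup, assuming the 8.0-era
pg_stat_activity columns (procpid, query_start, current_query) and
stats_command_string = on so that current_query is populated; the
table name is arbitrary:

    -- One-time setup: clone pg_stat_activity's shape, plus snap_time
    CREATE TABLE activity_snapshot AS
      SELECT now() AS snap_time, * FROM pg_stat_activity LIMIT 0;

    -- Run every interval (e.g. from cron every 30 seconds):
    INSERT INTO activity_snapshot
      SELECT now(), * FROM pg_stat_activity;

    -- A query still present across consecutive snapshots has been
    -- running for at least one interval:
    SELECT snap_time, procpid, query_start, current_query
    FROM activity_snapshot
    WHERE current_query <> '<IDLE>'
    ORDER BY snap_time, procpid;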
Re: [HACKERS] RFC: built-in historical query time profiling
On Wednesday March 23 2005 3:34, Tom Lane wrote:
> This is going to fall down on exactly the same objections that
> have been made to putting the log messages themselves into
> tables.  The worst one is that a failed transaction would fail
> to make any entry whatsoever.  There are also performance,
> locking, and security issues to think about.  (Example: heaven
> help you if someone decides to VACUUM FULL the log table ---
> that would block every other process due to exclusive lock.)

I see your point.  The ugliness of log-parsing beckons.

Thanks.

Ed
Re: [HACKERS] RFC: built-in historical query time profiling
On Wednesday March 23 2005 4:11, Mark Kirkwood wrote:
> Is enabling the various postgresql.conf stats* options and
> taking regular snapshots of pg_stat_activity a possible way to
> get this?

I don't see how; the duration is the key measurement I'm after, and I
don't believe it is available anywhere but the logs.

Thanks.

Ed
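For reference, the setting of that era that puts durations into the
logs in the first place (a real GUC; the threshold value shown is
illustrative):

    # postgresql.conf: log every completed statement with its duration
    log_min_duration_statement = 0      # in ms; -1 disables
    # or log only statements slower than 100ms:
    # log_min_duration_statement = 100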
Re: [HACKERS] RFC: built-in historical query time profiling
Ed L. wrote:
> Hackers,
>
> (some snippage...)
>
> Our Problem: We work with 75+ geographically distributed pg
> clusters; it is a significant challenge keeping tabs on
> performance.  We see degradations from rogue applications,
> vacuums, dumps, bloating indices, I/O and memory shortages, and
> so on.  Customers don't generally tell us when applications are
> slow, so we need to know for ourselves in a timely manner.
>
> At present, we can remotely and systematically query system
> relations for diskspace usage, detailed I/O usage,
> index/sequential scans, and more.  But our _ultimate_ DB
> performance measure is query execution time.  Obviously, you can
> measure that now in an ad hoc fashion with EXPLAIN ANALYZE, and
> by examining historical logs.  But we need to be able to see the
> history in a timely fashion to systematically identify
> customer-experienced execution time degradations for "query
> patterns of interest" without any visual log inspection
> whatsoever, and correlate those with other events.

Is enabling the various postgresql.conf stats* options and taking
regular snapshots of pg_stat_activity a possible way to get this?

regards

Mark
Re: [HACKERS] RFC: built-in historical query time profiling
"Ed L." <[EMAIL PROTECTED]> writes: > ... We can do > this by writing programs to periodically parse log files for > queries and durations, and then centralizing that information > into a db for analysis, similar to pqa's effort. That strikes me as exactly what you ought to be doing. > Suppose there were some sort of system relations like these: This is going to fall down on exactly the same objections that have been made to putting the log messages themselves into tables. The worst one is that a failed transaction would fail to make any entry whatsoever. There are also performance, locking, and security issues to think about. (Example: heaven help you if someone decides to VACUUM FULL the log table --- that would block every other process due to exclusive lock.) regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
[HACKERS] RFC: built-in historical query time profiling
Hackers,

I'd like to pose a problem we are facing (historical query time
profiling) and see if any of you interested backend gurus have an
opinion on the promise or design of a built-in backend solution
(optional built-in historical query time stats), and/or a willingness
to consider such a patch submission.

Our Problem:

We work with 75+ geographically distributed pg clusters; it is a
significant challenge keeping tabs on performance.  We see degradations
from rogue applications, vacuums, dumps, bloating indices, I/O and
memory shortages, and so on.  Customers don't generally tell us when
applications are slow, so we need to know for ourselves in a timely
manner.

At present, we can remotely and systematically query system relations
for diskspace usage, detailed I/O usage, index/sequential scans, and
more.  But our _ultimate_ DB performance measure is query execution
time.  Obviously, you can measure that now in an ad hoc fashion with
EXPLAIN ANALYZE, and by examining historical logs.  But we need to be
able to see the history in a timely fashion to systematically identify
customer-experienced execution time degradations for "query patterns
of interest" without any visual log inspection whatsoever, and to
correlate those with other events.  We can do this by writing programs
to periodically parse log files for queries and durations, and then
centralizing that information into a db for analysis, similar to pqa's
effort.  Short of a backend solution, that's what we'll do.

Backend solution?

But being able to query the database itself for historical execution
time statistics for "query patterns of interest" is very attractive.
Such functionality would seem generally useful for other deployments
as well.  Below is a rough novice sketch of an **optional** scheme for
doing so in the backend (I'm sure it's incomplete/faulty in this
presentation; I'm really trying to determine whether it has any fatal
shortcomings relative to the log-parsing approach).

Suppose there were some sort of system relations like these:

    pg_query_profile (
        id        integer,
        name      varchar not null unique,
        sql_regex varchar not null unique,
        enabled   boolean not null
    )

    pg_query_profile_history (
        profile_id   integer not null,   -- refs pg_query_profile.id
        count        integer,            -- number of matches in period
        avgdur       float not null,     -- avg duration in secs
        mindur       float not null,     -- min duration
        maxdur       float not null,     -- max duration
        errors       bigint not null,    -- errors in period
        period_start timestamp not null,
        period_end   timestamp not null
    )

Each row in pg_query_profile_history would represent execution time
stats for queries matching a given regex during a given interval.  The
sql_regex column would be a user-specified value matching "queries of
interest".  For example, if I were interested in profiling all queries
of the form "SELECT * FROM result WHERE key = 123", then maybe my
sql_regex would basically be

    INSERT INTO pg_query_profile (name, sql_regex)
    VALUES ('Result Queries', 'SELECT * FROM result WHERE key = \d+');

Then, as each query completed, it would (optionally!) be checked
against the existing pg_query_profile.sql_regex values for a pattern
match, and the matching profile's history row for the current period
would then be updated with the duration data.  I can imagine wishing
to collect this data for the 10-20 most common queries in 5-minute
intervals for the past 24 hours or so.
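I'm sure the real implementation would differ, but the per-query
rollup I have in mind amounts to something like this (pseudo-SQL; :dur
and :pid stand for the finished query's duration and matched profile
id):

    -- Fold one finished query's duration into the current period's row
    UPDATE pg_query_profile_history
    SET avgdur = (avgdur * count + :dur) / (count + 1),
        mindur = CASE WHEN :dur < mindur THEN :dur ELSE mindur END,
        maxdur = CASE WHEN :dur > maxdur THEN :dur ELSE maxdur END,
        count  = count + 1
    WHERE profile_id = :pid
      AND period_start <= now() AND now() < period_end;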
One could then systematically identify degradations beyond 1.0 seconds
with a query similar to the following:

    SELECT COUNT(1)
    FROM pg_query_profile_view
    WHERE name = 'Result Queries'
      AND avgdur > 1.0;

Once the data is there, it opens up a lot of possibilities for
systematic monitoring.

Some possible objections (O) and answers (A):

1) O: But wouldn't this impose too much overhead in the backend for
      transactions for folks who don't want/need this feature?
   A: Not if it were completely optional, right?

2) O: If enabled, there is no way you'd want to impose an update query
      on each select query!
   A: True.  I envision the query profile as cached in shared memory
      and only written to disk a user-configurable "every so often".

3) O: Regular expression evaluation is computationally expensive!
   A: I'm imagining it might add a few milliseconds to each query,
      which would be well worth the benefit to us in having the most
      important metric easily accessible.

GUC variables might include:

    query_profile          : boolean on/off for profiling
    query_profile_interval : how often to write out stats
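In postgresql.conf, that might look like this (hypothetical settings,
of course, since none of this exists today):

    # hypothetical: enable query time profiling
    query_profile = on
    # hypothetical: flush the shared-memory stats to disk every 300s
    query_profile_interval = 300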